

XML & ZIP: Explore Your Excel Workbook’s File Structure
Did you know that Excel files are actually a zipped collection of XML files? You can easily explore and edit the complete structure of your Excel workbook. All you have to do: Rename and unzip the .xlsx file. Let’s take a look at it.
Some background information about the XML format
Before we jump in: Some background information. Excel files use the Office Open XML (OOXML) file format. It’s an open standard, so other software and developers can also access your files. It’s documented well (the core document has more than 5,000 PDF pages). If you are interested, here are the publicly available documents: http://www.ecma-international.org/publications/standards/Ecma-376.htm
Each Excel workbook consists of a bunch of folders and XML files. XML stands for Extensible Markup Language. It works with opening and closing tags, e.g. <car>BMW</car>, and looks similar to HTML. The good thing about this: The source code is comparatively easy to read and understand.
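To make the idea of opening and closing tags a bit more concrete, here is a minimal sketch in Python using the standard library’s xml.etree.ElementTree module. The car element and its color attribute are made up for illustration and have nothing to do with Excel’s own tags.

```python
import xml.etree.ElementTree as ET

# A made-up element: opening tag with one attribute, text content, closing tag.
element = ET.fromstring('<car color="blue">BMW</car>')

print(element.tag)     # name of the tag
print(element.attrib)  # attributes of the opening tag, as a dictionary
print(element.text)    # text between the opening and the closing tag
```

The XML files inside a workbook are built the same way, just with more tags and deeper nesting.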
How to see the content of an Excel file
It’s actually very simple to see all the content of your Excel file: not the workbook as it appears in Excel, but the XML source files behind it. When you save your Excel workbook, the file name usually gets the .xlsx ending. That means your file is an Excel file which will open in Excel.
There is a trick:
- Save your workbook and close it.
- Rename it: Replace the .xlsx ending of the file name by .zip.
- Unzip it.
So you basically just replace .xlsx with .zip. That way, your file opens in Windows Explorer like any other ZIP archive and you can see all its content. When you open or extract the archive, you get a new folder with some subfolders. Each contains some XML files. You can open them with a text editor or, and that usually looks better, with a browser, e.g. Chrome.
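If you prefer a script over renaming, the same look inside the file is possible with Python’s standard zipfile module, because an .xlsx file is a ZIP archive whatever its ending says. The following is only a minimal sketch: the function name list_parts is made up, and the tiny in-memory archive is a stand-in for a real workbook so that the example runs on its own.

```python
import io
import zipfile


def list_parts(xlsx):
    """Return the names of all files inside an .xlsx archive.

    `xlsx` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(xlsx) as archive:
        return sorted(archive.namelist())


# Stand-in for a real workbook (an assumption for this demo):
# an in-memory ZIP holding two of the files a workbook contains.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("xl/workbook.xml", "<workbook/>")

for name in list_parts(demo):
    print(name)
```

With a real file you would pass its path instead, e.g. list_parts("Book1.xlsx"), where Book1.xlsx is a hypothetical file name.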
Contents


An Excel workbook can contain various types of data. All of them are somehow saved within your file.
An Excel workbook can have various types of content: charts, tables, PivotTables, drawings, images and of course the actual worksheet or cell contents. All of this must be saved somewhere in the Excel file. Furthermore, a lot of other information is stored: metadata, calculation chains, themes and styles, named ranges and the relationships between all these data and files.
The image on the right side provides a rough overview of all the types of data saved in an Excel file. You can divide the content into the following two categories:
- “Primary” content: The actual content which you can see and access through Excel. For example, the cell contents, images, PivotTables, Tables and so on.
- “Secondary” content: Things you don’t exactly see, but still necessary for your Excel file to work.
Basic contents


The contents of an empty Excel workbook.
Each Excel workbook has some basic contents. A completely empty workbook already comes with these folders and files:
- _rels (Folder): “/_rels/.rels” is an XML file where the starting package-level relationships are stored.
- docProps (Folder): In most cases this folder has two files: app.xml and core.xml. Those files hold some metadata. E.g. core.xml contains information about the author, who last modified the file, the date created and the date saved. app.xml, on the other hand, has information about the content of the file, e.g. if there are external links which need to be updated.
- xl (Folder): This folder has most of the information. By default it has three subfolders (“_rels”, “theme” and “worksheets”) as well as two more XML files: styles.xml and workbook.xml.
- [Content_Types].xml: The XML file [Content_Types] is the only file on this level. It has references to the XML files within the folders above.
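As an illustration of the metadata in docProps/core.xml, here is a rough Python sketch that reads the author and date fields straight from the archive. The function name read_core_properties, the demo archive and the names in it are assumptions for this example. The element names (dc:creator, cp:lastModifiedBy, dcterms:created, dcterms:modified) are the ones core.xml normally uses, but a given file may leave some of them out.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by docProps/core.xml.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

FIELDS = {
    "author": "dc:creator",
    "modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(xlsx):
    """Return selected metadata from docProps/core.xml (None if a field is missing)."""
    with zipfile.ZipFile(xlsx) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {key: root.findtext(path, default=None, namespaces=NS)
            for key, path in FIELDS.items()}


# Made-up core.xml inside an in-memory archive, so the sketch runs on its own.
CORE_XML = (
    '<cp:coreProperties xmlns:cp="%(cp)s" xmlns:dc="%(dc)s" xmlns:dcterms="%(dcterms)s">'
    "<dc:creator>Jane Doe</dc:creator>"
    "<cp:lastModifiedBy>John Doe</cp:lastModifiedBy>"
    "<dcterms:created>2016-01-31T10:00:00Z</dcterms:created>"
    "</cp:coreProperties>"
) % NS

demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("docProps/core.xml", CORE_XML)

props = read_core_properties(demo)
print(props)
```

The demo file has no “modified” entry on purpose, so that field comes back as None.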
How a worksheet is saved


Example: Let’s take a look at the source code for this worksheet.
There is one XML file for each Excel worksheet. Those worksheet XML files are saved within the folder “xl/worksheets”. So, let’s take a look at an example worksheet (see the screenshot on the right side).
This worksheet has some content in the cells A1, B1, A2 and B2. The cells A1, B1 and A2 only have text whereas the cell B2 has a formula (a VLOOKUP). There is also an image, and the cell I24 is selected. So what does the underlying source code look like?


XML file for the worksheet above.
Actually, the XML file doesn’t look too complicated. Let’s take a look at some selected lines of code:
- The tab has a dark green color. It’s formatted with the tag “<tabColor”, followed by the RGB value.
- This sheet is selected. So it got the tag “<sheetView tabSelected” set to 1.
- Number 3 is quite interesting: It has the content of cell A1. As you can see in the screenshot above, the text in cell A1 says “Lookup Value”. But instead of this text, the XML code just says “11”. This number 11 is an index into the XML file “sharedStrings.xml”. The index is zero-based, so 11 points to the 12th entry. That way Excel makes sure that each text is only saved once in order to save space.
- The cell B2 has a VLOOKUP formula. The complete formula is saved with the tag “<f>” and the calculated value – which you will see immediately when opening the workbook – is also stored under the tag “<v>”.
- The page margins are also saved.
The image is just inserted with the reference “<drawing r:id="rId1"/>”.
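The shared string lookup and the formula storage described above can be sketched in a few lines of Python. Everything here is a simplified assumption: the two XML snippets are hand-written stand-ins for sharedStrings.xml and a worksheet file (with small indexes instead of the 11 from the screenshot), the VLOOKUP arguments are invented, and the function names are made up. Real files contain more attributes and more cell types than this sketch handles.

```python
import xml.etree.ElementTree as ET

MAIN = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{" + MAIN["m"] + "}t"  # fully qualified name of the <t> element

# Hand-written stand-in for xl/sharedStrings.xml.
SHARED_STRINGS = """<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t>Lookup Value</t></si>
  <si><t>Result</t></si>
  <si><t>Apple</t></si>
</sst>"""

# Hand-written stand-in for a worksheet file such as xl/worksheets/sheet1.xml.
SHEET = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1" t="s"><v>1</v></c>
    </row>
    <row r="2">
      <c r="A2" t="s"><v>2</v></c>
      <c r="B2"><f>VLOOKUP(A2,D:E,2,FALSE)</f><v>42</v></c>
    </row>
  </sheetData>
</worksheet>"""


def load_shared_strings(xml_text):
    """Return the shared strings as a list; the list position is the index."""
    root = ET.fromstring(xml_text)
    # One entry can spread its text over several <t> elements, so join them.
    return ["".join(t.text or "" for t in si.iter(T_TAG))
            for si in root.findall("m:si", MAIN)]


def read_cells(sheet_xml, strings):
    """Map each cell reference to a (value, formula) pair."""
    cells = {}
    for c in ET.fromstring(sheet_xml).iterfind(".//m:c", MAIN):
        value = c.findtext("m:v", namespaces=MAIN)
        formula = c.findtext("m:f", namespaces=MAIN)
        if c.get("t") == "s" and value is not None:
            # t="s" marks a shared string: <v> holds a zero-based index.
            value = strings[int(value)]
        cells[c.get("r")] = (value, formula)
    return cells


cells = read_cells(SHEET, load_shared_strings(SHARED_STRINGS))
for ref, (value, formula) in cells.items():
    print(ref, value, formula)
```

Note that the cached result of the formula comes back as the text “42”. The sketch does not convert numbers.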
Advanced contents


Screenshot of the structure of an Excel file with some more contents.
Besides the basic worksheets, Excel files can have much more content. This content could be
- PivotTables with their cache. The actual source for the PivotTable doesn’t have to be included in the workbook file.
- Named ranges: The names are stored in the workbook.xml file in the xl folder.
- Drawings: Images and charts count as drawings. The image itself is stored in the folder xl/media, the charts in xl/charts.
- Tables: Data tables are saved with their headlines and range under xl/tables. The values are saved separately though. They are still in the corresponding worksheet XML file.
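To show what the named ranges in workbook.xml look like in practice, here is a small Python sketch. The XML snippet is a hand-written, stripped-down stand-in for a real workbook.xml, and the names TaxRate and Prices as well as the function read_defined_names are invented for this example.

```python
import xml.etree.ElementTree as ET

MAIN = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Stripped-down stand-in for xl/workbook.xml (real files carry more elements).
WORKBOOK = """<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <definedNames>
    <definedName name="TaxRate">Sheet1!$B$1</definedName>
    <definedName name="Prices">Sheet1!$D$2:$E$20</definedName>
  </definedNames>
</workbook>"""


def read_defined_names(workbook_xml):
    """Map each defined name to the reference or formula it stands for."""
    root = ET.fromstring(workbook_xml)
    return {dn.get("name"): dn.text
            for dn in root.findall("m:definedNames/m:definedName", MAIN)}


names = read_defined_names(WORKBOOK)
for name, target in names.items():
    print(name, "->", target)
```

A workbook without named ranges simply has no definedNames element, in which case the function returns an empty dictionary.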
More information
You want to know more? There are some interesting websites available:
- Overview of the Open XML SDK: https://msdn.microsoft.com/en-us/library/office/gg278316.aspx
- Information about the Open Packaging Conventions: https://en.wikipedia.org/wiki/Open_Packaging_Conventions
- A good summary of the XML and Excel file structure: http://www.jkp-ads.com/Articles/XMLAndExcel00.asp
- Archive of the underlying Office Open XML File Formats: http://www.ecma-international.org/publications/standards/Ecma-376.htm
Comments 15
How to import from Excel with cell coordinates in Power Query and Power BI – The BIccountant
[…] Explore your Excel file-structure in Windows Explorer by Professor Excel […]
Worker
When I try to just change the name of the file to put “.zip” at the end, it just saves as FileName.xlsm.zip.xlsm. I guess this is beyond just saving in a different format to zip? Maybe I’m just too old for excel these days?
Henrik Schiffner
No, just simply rename it in the Windows Explorer, not through “Save as” in Excel.
Michal
Is there a way to access encrypted excel file (I do know the password) without opening the file in excel first to remove the password?
daniel
MIS-250 INFORMATION SYSTEMS
Assignment 2 – Analysis with Excel
Instructions for the provided excel file
Download the archive that contains the initial excel file. The archive is named Assignment2.zip.
Extract the workbook from the archive.
Upload and submit your final Excel file. You may have to create a .zip archive containing the file in
order to upload it.
Information for the excel workbook file and explanation
The excel file contains three worksheets. The first worksheet (Orders) contains formulas and the other two
only values. The three worksheets can be used to track inventory for a set of parts, by entering the proper
formulas in the cells instead of just numerical values.
The values in Sheet(“Orders”) are the number of parts per month of certain types that were delivered
to a warehouse for a particular month. The formula computes the number of parts sold the previous
month minus how many were on hand at the start of the month and selects the larger of either this
difference or 0. That is the amount to order during the current month.
The values in Sheet(“Sales”) are the number of parts of certain types that were sold to customers from
the warehouse during a month.
The values in Sheet(“Available”) represent the number of each part available on the first day of the
month.
Before you begin, make sure you understand the data in the sheets. To help you understand the data, answer
the following questions.
How many parts with number A01 were on hand at the end of January 2016?
How many parts with number A01 were ordered and received by the last day of January 2016?
How many parts with number A01 were sold during the month of January 2016?
Follow the instructions
1. Add the first two values and subtract the third one to determine how many A01s were in the warehouse
on the first day of February 2016. Compare your computed value to the corresponding value shown in
the Sheet(“Available”).
2. Sort all the worksheets so that the part numbers (column headings) are in ascending order
alphabetically. Make sure that the corresponding numbers in adjacent rows are also rearranged as you
sort.
3. Replace the values in Sheet(“Available”) with formulas that compute the number on hand. The
calculation is a simple sum of two numbers minus another number.
4. Format the three inventory worksheets so that titles and subtitles stand out and so that negative
inventory amounts are in red.
5. Copy the three sheets, accepting the default names and making sure the three copies are in order at the
end of the workbook.
6. The company is thinking to use a set order quantity inventory policy which means about always
ordering the same amount of a particular part each month, rather than varying the amounts ordered from
month to month. The amount ordered each month would vary from part to part, but for a particular part
would be the same each month. Your job is to test a set order quantity inventory policy against last
year’s data by trying different set order amounts for each part.
To create a set order amount policy, replace the formulas in Sheet(“Orders (2)”) so that there is a single
fixed value in each column. The value used in one column can be different from the value used in the
next column, but the values in a single column should all be the same. That is, the same amount of a
particular part will be ordered every month, regardless of how many are sold the prior month. An
example Sheet(“Orders (20)”) with a set order amount is shown in Figure 1.
Figure 1 Orders Worksheet with Set Order Amount Inventory Policy
7. Try to find a set order amount value for each part so that the quantity on hand on March 1, 2017 is as
close as possible to 0 but still non-negative and the number of shortages (red values) is minimized. Stop
trying different set order amounts when you think you have found the ‘best’ set order amount for each
part (according to the stated criterion). To aid in your analysis, create a chart that compares the March
2017 Available values for the new ordering policy compared to the old policy. ***Note that there is not one “CORRECT” answer to this task.
8. How do you think the set order amount inventory policy compares to the policy previously used? Is
one better than the other? Why? Write four or five sentences that express your ideas and place these in
a worksheet named My Thoughts
Grading
10%
Matt
Hello Professor Ex,
Your information here and other pages on your site about “unlocking” the underlying xml data have been very helpful–
We occasionally receive xlsx files from other companies that my cohorts feel are missing data. They first view the file in “Protected View”, then later after opt to “Enable Editing” and find a lot of errors (#ref!, etc). If they close the file and re-open it in protected view again, they notice that the values make more sense there and seem to be without error, but are largely unusable because of the nature of “Protected View”.
In troubleshooting one such file issue, in which the file was missing all “Named Ranges”, I inspected the underlying workbook.xml file and noticed that there is a line that says:
with no “opening” “definedNames” line (and also not a properly closing one, since it is supposed to begin with the forward slash and not end with it?)
A few weeks later I came across an erroneous file that behaved the same way, and this time had some blank cells that didn’t make sense. I looked in the corresponding xml data for the sheet (sheet2.xml) and noticed that the blank cell, let’s say “KJ15” had the following “value”
Other properly filled out values looked right, such as:
2
I believe KJ15 should’ve read
10
based on my inspection of “sharedStrings2.xml”. There was no “sharedStrings.xml” file.
Have you ever seen this before in your illustrious Excel carrier? I can only think that the companies we are receiving these files from must have some kind of antivirus that is scrubbing parts of the xml files, or maybe they are using some strange add-in that is doing it?
Matt
Some items were, ironically, removed from that post. I’ll try it parentheses instead of the angle brackets:
Hello Professor Ex,
Your information here and other pages on your site about “unlocking” the underlying xml data have been very helpful–
We occasionally receive xlsx files from other companies that my cohorts feel are missing data. They first view the file in “Protected View”, then later after opt to “Enable Editing” and find a lot of errors (#ref!, etc). If they close the file and re-open it in protected view again, they notice that the values make more sense there and seem to be without error, but are largely unusable because of the nature of “Protected View”.
In troubleshooting one such file issue, I inspected the underlying workbook.xml file and noticed that there is a line that says:
( x:definedNames/ )
with no “opening” “definedNames” line (and also not a properly closing one, since it is supposed to begin with the forward slash and not end with it?)
A few weeks later I came across an erroneous file that behaved the same way, and this time had some blank cells that didn’t make sense. I looked in the corresponding xml data for the sheet (sheet2.xml) and noticed that the blank cell, let’s say “KJ15” had the following “value”
( x:c r=”KJ15″ s=”46″ )
( x:v/ )
Other properly filled out values looked right, such as:
( x:c r=”AO15″ s=”9″ t=”s” )
( x:v>2</x:v )
I believe KJ15 should’ve read
( x:c r=”KJ15″ s=”46″ )
( x:v ) 10 ( /x:v )
based on my inspection of “sharedStrings2.xml”. There was no “sharedStrings.xml” file.
Have you ever seen this before in your illustrious Excel carrier? I can only think that the companies we are receiving these files from must have some kind of antivirus that is scrubbing parts of the xml files, or maybe they are using some strange add-in that is doing it?
shreddrx
Hey there, i’m a bit confused as to how the styles are set in xlsx? I know that you have to use “s=styleindex” but when i look into the style properties, i’m not able to differentiate the style indexs, like how does the xlsx determine which style to use on the style xml sheet?
Hootan
Hi,
Is it possible to edit an xltx file by unzipping it and zip it as a xlsx file? what should be edited inside the files?
gm
it doesn’t work in the standard Windows 10 any more, thx Microsoft.
Ay
So you basically just replace the .xlsx by .zip. That way, your file will open in a new Windows Explorer and you can see all it’s content. When you enter or extract the contents it now you got a new folder with some subfolders. Each contains some XML files. You can open them with the text editor or – and that usually looks better – when a browser, e.g. Chrome
I don’t get any subfolders when I extract – only the file itself. Do you know why this might be?
camila
Hi, I found that 2 tabs were what was causing my spreadsheets to be heavy. However, when I delete the 2 tabs, the size doesn’t change.
What is interesting is that, if I save again as .zip, the 2 tabs that I just deleted still appear there. Any idea?
Turan
Hello,
Where macro are stored?
I have an excel file which is 480MB and I’m trying to understend why is so big. The contents under renamed zip are no more then 3MB ?!?!
Thanks
Henrik Schiffner
Hi Turan,
I don’t know your file, but only in very rare cases, a VBA macro would be 477MB large. Most common in such case are rather formatting issues. Try to delete all unused cell ranges, for example. Please also check out these article: How to reduce the file size: https://professor-excel.com/how-to-reduce-the-file-size-of-an-excel-workbook/ and what really helps to reduce the file size of Excel files: https://professor-excel.com/study-reduce-file-size-excel-workbook/
Best regards,
Henrik