XML & ZIP: Explore Your Excel Workbooks File Structure

Did you know that Excel files are actually a zipped collection of XML files? You can easily explore and edit the complete structure of your Excel workbook. All you have to do: Rename and unzip the .xlsx file. Let’s take a look at it.

Some background information about the XML format

Before we jump in: Some background information. Excel files use the Open Office XML File Format. It’s an open standard so that also other software and developers can easily access your files. It’s documented well (the core document has more than 5,000 PDF pages). For your reference – here are the publicly available documents: http://www.ecma-international.org/publications/standards/Ecma-376.htm

Each Excel workbook contains of a bunch of folders and XML files. XML stand for Extented Markup Language. It works with opening and closing tags, e.g. <car name>BMW</car name> and looks similar to HTML. The good thing about this: You can (comparatively) easy read and understand the source code.

How to see the content of an Excel file, including the XML structure

It’s actually very simple to see all the content of your Excel file. Not the workbook itself in Excel, but rather its source code files. If you save your Excel workbook, it usually got the .xlsx ending. That means, your file is an Excel file which will open in Excel.

There is a trick:

  1. Save your workbook and close it.
  2. Rename it: Replace the .xlsx ending of the file name by .zip.
  3. Unzip it.

So you basically just replace the .xlsx by .zip. That way, your file will open in a new Windows Explorer and you can see all it’s content. When you enter or extract the contents it now you got a new folder with some subfolders. Each contains some XML files. You can open them with the text editor or – and that usually looks better – when a browser, e.g. Chrome.

Contents

data, excel, file, type, contents, xml
An Excel workbook can contain various types of data. All of them are somehow saved within your file.

An Excel workbook can have various types of contents: The charts, tables, PivotTables, drawing, images and of course the actual worksheet or cell contents. This all must be saved somehow in the Excel file. But furthermore, there are many other information stored: Some meta data, calculation chains, themes and styles, named ranges and the relationship of all this data and files.

The image on the right side provides a rough overview of all the types of data saved in an Excel file. You can divide the content into the following two categories:

  • “Primary” content: The actual content which you can see and access through Excel. For example, the cell contents, images, PivotTables, Tables and so on.
  • “Secondary” content: Things you don’t exactly see, but still necessary for your Excel file to work.


Do you want to boost your productivity in Excel?

Get the Professor Excel ribbon!

Add more than 120 great features to Excel!


Basic XML contents

contents, excel, workbook, xml, rel
The contents of an empty Excel workbook.

Each Excel workbook has some basic contents. A completely empty workbook already comes with these folders and files:

  • _rels (Folder): “/_rels/.rels” is an XML file where the starting package-level relationships are stored.
  • docProps (Folder): In most cases this folder has two files: app.xml and core.xml. Those files have some metadata. E.g. core.xml contains information about the auther, modified by, data created and date saved. app.xml on the other hand has some more information rather about the content of the file, e.g. if there are external links which need to be updated
  • xl (Folder): This folder has most of the information. By default it has 3 subfolder: “_rels”, “theme” and “worksheets” as well as two more XML files styles.xml and workbook.xml.
  • [Content_Types].xml: The XML file [Content_Types] is the only file on this level. It has references to the XML files within the folders above.

How a worksheet is saved

example, worksheet, source, code, xml, excel
Example: Let’s take a look at the source code for this worksheet.

There is one XML file for each Excel worksheet. So, these worksheet XML files are saved within the folder “xl/worksheets”. So, let’s take a look at an example worksheet (see the screenshot on the right side).

This worksheet has some content in the cells A1, B1, A2 and B2. The cells A1, B1 and A2 only have text whereas the cell B2 has a formula (a VLOOKUP). There is also an image, the cell I24 is selected. So how does the underlying source code look like?

xml, file, excel, source, code
XML file for the worksheet above.

Actually, the XML file doesn’t look to complicated. Let’s take a look at some selected lines of code:

  1. The tab has a dark green color. It’s formatted with the tag “<tabColor”, followed by the RGB value.
  2. This sheet is selected. So it got the tag “<sheetView tabSelected” set to 1.
  3. Number 3 is quite interesting: It has the content of cell A1. As you can see in the screenshot above, the text in cell A1 says “Lookup Value”. But instead of this text, the XML code just says “11”. This number 11 refers to the 11th entry on the XML sheet “sharedStrings”. That way Excel makes sure that each text is only saved once in order to save space.
  4. The cell B2 has a VLOOKUP formula. That means, the complete formula is saved with the tag “<f>” and the calculated value – which you will see immediately when opening the workbook – is also stored under the tag “<v>”.
  5. The page margins are also saved.

The image is just inserted with the reference “<drawing r:id=”rId1″/>”.

Advanced contents

contents, charts, drawing, excel, folder
Screenshot of the structure of an Excel file with some more contents.

Besides the basic worksheets, Excel files can have much more content. So, this content could be

  • PivotTables with their cache. The actual source for the PivotTable doesn’t have to be included in the workbook file.
  • Named ranges: The named in the workbook.xml file in the xl folder.
  • Drawing: Images and charts count as drawings. The image itself will be stored in the folder xl/media the charts in xl/charts.
  • Tables: Data tables are saved with their headlines and range under xl/tables. The values are saved separately though. They are still in the corresponding worksheet XML file.

More information

You want to know more? There are some interesting websites available:

Henrik Schiffner is a freelance business consultant and software developer. He lives and works in Hamburg, Germany. Besides being an Excel enthusiast he loves photography and sports.

13 comments

  1. When I try to just change the name of the file to put “.zip” at the end, it just saves as FileName.xlsm.zip.xlsm. I guess this is beyond just saving in a different format to zip? Maybe I’m just too old for excel these days?

  2. Is there a way to access encrypted excel file (I do know the password) without opening the file in excel first to remove the password?

  3. MIS-250 INFORMATION SYSTEMS
    Assignment 2 – Analysis with Excel
    Instructions for the provided excel file
     Download the archive that contains the initial excel file. The archive is named Assignment2.zip.
    Extract the workbook from the archive.
     Upload and submit your final Excel file. You may have to create a .zip archive containing the file in
    order to upload it.
    Information for the excel workbook file and explanation
    The excel file contains three worksheets. The first worksheet (Orders) contains formulas and the other two
    only values. The three worksheets can be used to track inventory for a set of parts, by entering the proper
    formulas in the cells instead of just numerical values.
     The values in Sheet(“Orders”) are the number of parts per month of certain types that were delivered
    to a warehouse for a particular month. The formula computes the number of parts sold the previous
    month minus how many were on hand at the start of the month and selects the larger of either this
    difference or 0. That is the amount to order during the current month.
     The values in Sheet(“Sales”) are the number of parts of certain types that were sold to customers from
    the warehouse during a month.
     The values in Sheet(“Available”) represent the number of each part available on the first day of the
    month.
    Before you begin, make sure you understand the data in the sheets. To help you understand the data, answer
    the following questions.
     How many parts with number A01 were on hand at the end of January 2016?
     How many parts with number A01 were ordered and received by the last day of January 2016?
     How many parts with number A01 were sold during the month of January 2016?
    Follow the instructions
    1. Add the first two values and subtract the third one to determine how many A01s were in the warehouse
    on the first day of February 2016. Compare your computed value to the corresponding value shown in
    the Sheet(“Available”).
    2. Sort all the worksheets so that the part numbers (column headings) are in ascending order
    alphabetically. Make sure that the corresponding numbers in adjacent rows are also rearranged as you
    sort.
    2
    3. Replace the values in Sheet(“Available”) with formulas that compute the number on hand. The
    calculation is a simple sum of two numbers minus another number.
    4. Format the three inventory worksheets so that titles and subtitles stand out and so that negative
    inventory amounts are in red.
    5. Copy the three sheets, accepting the default names and making sure the three copies are in order at the
    end of the workbook.
    6. The company is thinking to use a set order quantity inventory policy which means about always
    ordering the same amount of a particular part each month, rather than varying the amounts ordered from
    month to month. The amount ordered each month would vary from part to part, but for a particular part
    would be the same each month. Your job is to test a set order quantity inventory policy against last
    year’s data by trying different set order amounts for each part.
    To create a set order amount policy, replace the formulas in Sheet(“Orders (2)”) so that there is a single
    fixed value in each column. The value used in one column can be different from the value used in the
    next column, but the values in a single column should all be the same. That is, the same amount of a
    particular part will be ordered every month, regardless of how many are sold the prior month. An
    example Sheet(“Orders (20)”) with a set order amount is shown in Figure 1.
    Figure 1 Orders Worksheet with Set Order Amount Inventory Policy 7. Try to find a set order amount value for each part so that the quantity on hand on March 1, 2017 is as
    close as possible to 0 but still non-negative and the number of shortages (red values) is minimized. Stop
    trying different set order amounts when you think you have found the ‘best’ set order amount for each
    part (according to the stated criterion). To aid in your analysis, create a chart that compares the March
    2017 Available values for the new ordering policy compared to the old policy. ***Note that there is not one “CORRECT” answer to this task.
    3
    8. How do you think the set order amount inventory policy compares to the policy previously used? Is
    one better than the other? Why? Write four or five sentences that express your ideas and place these in
    a worksheet named My Thoughts
    Grading
    10%

  4. Hello Professor Ex,
    Your information here and other pages on your site about “unlocking” the underlying xml data have been very helpful–
    We occasionally receive xlsx files from other companies that my cohorts feel are missing data. They first view the file in “Protected View”, then later after opt to “Enable Editing” and find a lot of errors (#ref!, etc). If they close the file and re-open it in protected view again, they notice that the values make more sense there and seem to be without error, but are largely unusable because of the nature of “Protected View”.
    In troubleshooting one such file issue, in which the file was missing all “Named Ranges”, I inspected the underlying workbook.xml file and noticed that there is a line that says:

    with no “opening” “definedNames” line (and also not a properly closing one, since it is supposed to begin with the forward slash and not end with it?)
    A few weeks later I came across an erroneous file that behaved the same way, and this time had some blank cells that didn’t make sense. I looked in the corresponding xml data for the sheet (sheet2.xml) and noticed that the blank cell, let’s say “KJ15” had the following “value”

    Other properly filled out values looked right, such as:

    2
    I believe KJ15 should’ve read

    10
    based on my inspection of “sharedStrings2.xml”. There was no “sharedStrings.xml” file.
    Have you ever seen this before in your illustrious Excel carrier? I can only think that the companies we are receiving these files from must have some kind of antivirus that is scrubbing parts of the xml files, or maybe they are using some strange add-in that is doing it?

  5. Some items were, ironically, removed from that post. I’ll try it parentheses instead of the angle brackets:
    Hello Professor Ex,
    Your information here and other pages on your site about “unlocking” the underlying xml data have been very helpful–
    We occasionally receive xlsx files from other companies that my cohorts feel are missing data. They first view the file in “Protected View”, then later after opt to “Enable Editing” and find a lot of errors (#ref!, etc). If they close the file and re-open it in protected view again, they notice that the values make more sense there and seem to be without error, but are largely unusable because of the nature of “Protected View”.
    In troubleshooting one such file issue, I inspected the underlying workbook.xml file and noticed that there is a line that says:
    ( x:definedNames/ )
    with no “opening” “definedNames” line (and also not a properly closing one, since it is supposed to begin with the forward slash and not end with it?)
    A few weeks later I came across an erroneous file that behaved the same way, and this time had some blank cells that didn’t make sense. I looked in the corresponding xml data for the sheet (sheet2.xml) and noticed that the blank cell, let’s say “KJ15” had the following “value”
    ( x:c r=”KJ15″ s=”46″ )
    ( x:v/ )
    Other properly filled out values looked right, such as:
    ( x:c r=”AO15″ s=”9″ t=”s” )
    ( x:v>2</x:v )
    I believe KJ15 should’ve read
    ( x:c r=”KJ15″ s=”46″ )
    ( x:v ) 10 ( /x:v )
    based on my inspection of “sharedStrings2.xml”. There was no “sharedStrings.xml” file.
    Have you ever seen this before in your illustrious Excel carrier? I can only think that the companies we are receiving these files from must have some kind of antivirus that is scrubbing parts of the xml files, or maybe they are using some strange add-in that is doing it?

  6. Hey there, i’m a bit confused as to how the styles are set in xlsx? I know that you have to use “s=styleindex” but when i look into the style properties, i’m not able to differentiate the style indexs, like how does the xlsx determine which style to use on the style xml sheet?

  7. Hi,
    Is it possible to edit an xltx file by unzipping it and zip it as a xlsx file? what should be edited inside the files?

  8. So you basically just replace the .xlsx by .zip. That way, your file will open in a new Windows Explorer and you can see all it’s content. When you enter or extract the contents it now you got a new folder with some subfolders. Each contains some XML files. You can open them with the text editor or – and that usually looks better – when a browser, e.g. Chrome

    I don’t get any subfolders when I extract – only the file itself. Do you know why this might be?

  9. Hi, I found that 2 tabs were what was causing my spreadsheets to be heavy. However, when I delete the 2 tabs, the size doesn’t change.

    What is interesting is that, if I save again as .zip, the 2 tabs that I just deleted still appear there. Any idea?

  10. Hello,

    Where macro are stored?
    I have an excel file which is 480MB and I’m trying to understend why is so big. The contents under renamed zip are no more then 3MB ?!?!

    Thanks

    1. Hi Turan,
      I don’t know your file, but only in very rare cases, a VBA macro would be 477MB large. Most common in such case are rather formatting issues. Try to delete all unused cell ranges, for example. Please also check out these article: How to reduce the file size: https://professor-excel.com/how-to-reduce-the-file-size-of-an-excel-workbook/ and what really helps to reduce the file size of Excel files: https://professor-excel.com/study-reduce-file-size-excel-workbook/
      Best regards,
      Henrik

Leave a comment

Your email address will not be published. Required fields are marked *