In this post I'll have peek at which XML formats there's available for spreadsheets, here primarily the Microsoft variant in Excel and secondary in Open Office. This is NOT an entry in the flame war between ODF and EOOXML, since i neither have the knowledge nor insight to contribute in that debate.
Lately I've been using Excel for several tasks and since the launch of Microsoft Office 2007 there's been a massive campaign about the support for XML in MS Office 2007 - but hey, theres been support for if for a long time, as I'll show in this post. Now it's not necessarily so easy to define support for XML, since this can potentially be a wide array of features and probably the ways you can use XML with MS Office 2007 has increased since MS Office XP and 2003.
Throughout this post I'll use the simplest examples possible - that's empty ones. I could had inserted a hello world but figured it would not matter.
Excel 2002 - MS Office XP
Excel 2002 which came with MS Office XP (In 2001 accordin to wikipedia, that's seven years ago). Note: what went wrong with the naming here?. From back then it was possible to save (and load) as XML, and the empty example would look like:
1 <?xml version="1.0"?> 2 <Workbook 3 xmlns="urn:schemas-microsoft-com:office:spreadsheet" 4 xmlns:o="urn:schemas-microsoft-com:office:office" 5 xmlns:x="urn:schemas-microsoft-com:office:excel" 6 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 7 xmlns:html="http://www.w3.org/TR/REC-html40"> 8 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> 9 <Author>Demo Boy</Author> 10 <LastAuthor>Demo Boy</LastAuthor> 11 <Created>2008-02-23T10:04:48Z</Created> 12 <LastSaved>2008-02-23T11:11:05Z</LastSaved> 13 <Company>Sweetxml</Company> 14 <Version>10.6830</Version> 15 </DocumentProperties> 16 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office"> 17 <DownloadComponents /> 18 <LocationOfComponents HRef="file:///\\" /> 19 </OfficeDocumentSettings> 20 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 21 <WindowHeight>12525</WindowHeight> 22 <WindowWidth>17115</WindowWidth> 23 <WindowTopX>360</WindowTopX> 24 <WindowTopY>120</WindowTopY> 25 <ProtectStructure>False</ProtectStructure> 26 <ProtectWindows>False</ProtectWindows> 27 </ExcelWorkbook> 28 <Styles> 29 <Style 30 ss:ID="Default" 31 ss:Name="Normal"> 32 <Alignment ss:Vertical="Bottom" /> 33 <Borders /> 34 <Font /> 35 <Interior /> 36 <NumberFormat /> 37 <Protection /> 38 </Style> 39 </Styles> 40 <Worksheet ss:Name="Ark1"> 41 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 42 <PageSetup> 43 <Header x:Margin="0" /> 44 <Footer x:Margin="0" /> 45 <PageMargins 46 x:Bottom="0.984251969" 47 x:Left="0.78740157499999996" 48 x:Right="0.78740157499999996" 49 x:Top="0.984251969" /> 50 </PageSetup> 51 <Selected /> 52 <ProtectObjects>False</ProtectObjects> 53 <ProtectScenarios>False</ProtectScenarios> 54 </WorksheetOptions> 55 </Worksheet> 56 <Worksheet ss:Name="Ark2"> 57 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 58 <PageSetup> 59 <Header x:Margin="0" /> 60 <Footer x:Margin="0" /> 61 <PageMargins 62 x:Bottom="0.984251969" 63 x:Left="0.78740157499999996" 64 x:Right="0.78740157499999996" 65 x:Top="0.984251969" /> 66 </PageSetup> 67 <ProtectObjects>False</ProtectObjects> 68 <ProtectScenarios>False</ProtectScenarios> 69 </WorksheetOptions> 70 </Worksheet> 71 <Worksheet ss:Name="Ark3"> 72 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 73 <PageSetup> 74 <Header x:Margin="0" /> 75 <Footer x:Margin="0" /> 76 <PageMargins 77 x:Bottom="0.984251969" 78 x:Left="0.78740157499999996" 79 x:Right="0.78740157499999996" 80 x:Top="0.984251969" /> 81 </PageSetup> 82 <ProtectObjects>False</ProtectObjects> 83 <ProtectScenarios>False</ProtectScenarios> 84 </WorksheetOptions> 85 </Worksheet> 86 </Workbook>
I'm just observing the impressive preciseness like with x:Bottom="0.984251969"
.
Excel 2003 - MS Office 2003
These much more logic to the names here since it's all 2003. As I recall it there was quite some talk about the XML support in MS Office 2003, and some of the schemas was published in the InfoStructureBase (more on that later). Here'e the possibility to save as XML data
, but that's based on some schema setup I've never tried, but there's also the same option to save as XML like:
1 <?xml version="1.0"?> 2 <?mso-application progid="Excel.Sheet"?> 3 <Workbook 4 xmlns="urn:schemas-microsoft-com:office:spreadsheet" 5 xmlns:o="urn:schemas-microsoft-com:office:office" 6 xmlns:x="urn:schemas-microsoft-com:office:excel" 7 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 8 xmlns:html="http://www.w3.org/TR/REC-html40"> 9 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> 10 <Author>Demo Boy</Author> 11 <LastAuthor>Demo Boy</LastAuthor> 12 <Created>2008-02-24T19:35:19Z</Created> 13 <LastSaved>2008-02-24T19:35:38Z</LastSaved> 14 <Company>Sweetxml</Company> 15 <Version>11.9999</Version> 16 </DocumentProperties> 17 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 18 <WindowHeight>12525</WindowHeight> 19 <WindowWidth>15195</WindowWidth> 20 <WindowTopX>480</WindowTopX> 21 <WindowTopY>120</WindowTopY> 22 <ProtectStructure>False</ProtectStructure> 23 <ProtectWindows>False</ProtectWindows> 24 </ExcelWorkbook> 25 <Styles> 26 <Style 27 ss:ID="Default" 28 ss:Name="Normal"> 29 <Alignment ss:Vertical="Bottom" /> 30 <Borders /> 31 <Font /> 32 <Interior /> 33 <NumberFormat /> 34 <Protection /> 35 </Style> 36 </Styles> 37 <Worksheet ss:Name="Sheet1"> 38 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 39 <Selected /> 40 <ProtectObjects>False</ProtectObjects> 41 <ProtectScenarios>False</ProtectScenarios> 42 </WorksheetOptions> 43 </Worksheet> 44 <Worksheet ss:Name="Sheet2"> 45 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 46 <ProtectObjects>False</ProtectObjects> 47 <ProtectScenarios>False</ProtectScenarios> 48 </WorksheetOptions> 49 </Worksheet> 50 <Worksheet ss:Name="Sheet3"> 51 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 52 <ProtectObjects>False</ProtectObjects> 53 <ProtectScenarios>False</ProtectScenarios> 54 </WorksheetOptions> 55 </Worksheet> 56 </Workbook>
Excel 2007 - MS Office 2007
With Excel 2007 you can still use the old XML format, and it would now look like:
1 <?xml version="1.0"?> 2 <?mso-application progid="Excel.Sheet"?> 3 <Workbook 4 xmlns="urn:schemas-microsoft-com:office:spreadsheet" 5 xmlns:o="urn:schemas-microsoft-com:office:office" 6 xmlns:x="urn:schemas-microsoft-com:office:excel" 7 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 8 xmlns:html="http://www.w3.org/TR/REC-html40"> 9 <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office"> 10 <Author>Demo Boy</Author> 11 <LastAuthor>Demo Boy</LastAuthor> 12 <Created>2008-02-24T19:55:23Z</Created> 13 <LastSaved>2008-02-24T19:56:21Z</LastSaved> 14 <Company>Sweetxml</Company> 15 <Version>12.00</Version> 16 </DocumentProperties> 17 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 18 <WindowHeight>11955</WindowHeight> 19 <WindowWidth>18975</WindowWidth> 20 <WindowTopX>120</WindowTopX> 21 <WindowTopY>45</WindowTopY> 22 <ProtectStructure>False</ProtectStructure> 23 <ProtectWindows>False</ProtectWindows> 24 </ExcelWorkbook> 25 <Styles> 26 <Style 27 ss:ID="Default" 28 ss:Name="Normal"> 29 <Alignment ss:Vertical="Bottom" /> 30 <Borders /> 31 <Font 32 ss:FontName="Calibri" 33 x:Family="Swiss" 34 ss:Size="11" 35 ss:Color="#000000" /> 36 <Interior /> 37 <NumberFormat /> 38 <Protection /> 39 </Style> 40 </Styles> 41 <Worksheet ss:Name="Sheet1"> 42 <Table 43 ss:ExpandedColumnCount="1" 44 ss:ExpandedRowCount="1" 45 x:FullColumns="1" 46 x:FullRows="1" 47 ss:DefaultRowHeight="15"> 48 </Table> 49 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 50 <PageSetup> 51 <Header x:Margin="0.3" /> 52 <Footer x:Margin="0.3" /> 53 <PageMargins 54 x:Bottom="0.75" 55 x:Left="0.7" 56 x:Right="0.7" 57 x:Top="0.75" /> 58 </PageSetup> 59 <Selected /> 60 <ProtectObjects>False</ProtectObjects> 61 <ProtectScenarios>False</ProtectScenarios> 62 </WorksheetOptions> 63 </Worksheet> 64 <Worksheet ss:Name="Sheet2"> 65 <Table 66 ss:ExpandedColumnCount="1" 67 ss:ExpandedRowCount="1" 68 x:FullColumns="1" 69 x:FullRows="1" 70 ss:DefaultRowHeight="15"> 71 </Table> 72 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 73 <PageSetup> 74 <Header x:Margin="0.3" /> 75 <Footer x:Margin="0.3" /> 76 <PageMargins 77 x:Bottom="0.75" 78 x:Left="0.7" 79 x:Right="0.7" 80 x:Top="0.75" /> 81 </PageSetup> 82 <ProtectObjects>False</ProtectObjects> 83 <ProtectScenarios>False</ProtectScenarios> 84 </WorksheetOptions> 85 </Worksheet> 86 <Worksheet ss:Name="Sheet3"> 87 <Table 88 ss:ExpandedColumnCount="1" 89 ss:ExpandedRowCount="1" 90 x:FullColumns="1" 91 x:FullRows="1" 92 ss:DefaultRowHeight="15"> 93 </Table> 94 <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel"> 95 <PageSetup> 96 <Header x:Margin="0.3" /> 97 <Footer x:Margin="0.3" /> 98 <PageMargins 99 x:Bottom="0.75" 100 x:Left="0.7" 101 x:Right="0.7" 102 x:Top="0.75" /> 103 </PageSetup> 104 <ProtectObjects>False</ProtectObjects> 105 <ProtectScenarios>False</ProtectScenarios> 106 </WorksheetOptions> 107 </Worksheet> 108 </Workbook>
but more prominently you could use EOOXML for spreadsheets (or how it's coined). This is an all different beast since it's an assembly that's zipped, by default containing the following files and with the suffix/extension .xlsx
:
$ unzip Book1.xlsx Archive: Book1.xlsx inflating: [Content_Types].xml inflating: _rels/.rels inflating: xl/_rels/workbook.xml.rels inflating: xl/workbook.xml inflating: xl/theme/theme1.xml inflating: xl/styles.xml inflating: xl/worksheets/sheet3.xml inflating: xl/worksheets/sheet2.xml inflating: xl/worksheets/sheet1.xml inflating: docProps/core.xml inflating: docProps/app.xml
Nothing less than 11 seperate files, that I've yet to figure out exactly how works but one of the central files is xl/workbook.xml
:
1 <?xml version="1.0" encoding="UTF-8" standalone="yes"?> 2 <workbook 3 xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" 4 xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"> 5 <fileVersion 6 appName="xl" 7 lastEdited="4" 8 lowestEdited="4" 9 rupBuild="4505" /> 10 <workbookPr defaultThemeVersion="124226" /> 11 <bookViews> 12 <workbookView 13 xWindow="120" 14 yWindow="45" 15 windowWidth="18975" 16 windowHeight="11955" /> 17 </bookViews> 18 <sheets> 19 <sheet 20 name="Sheet1" 21 sheetId="1" 22 r:id="rId1" /> 23 <sheet 24 name="Sheet2" 25 sheetId="2" 26 r:id="rId2" /> 27 <sheet 28 name="Sheet3" 29 sheetId="3" 30 r:id="rId3" /> 31 </sheets> 32 <calcPr calcId="125725" /> 33 </workbook>
Calc - Open Office 2
Calc that comes with Open Office 2 support the OpenDocument format (ODF, ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications) but also the MS Office 2003 XML format, as shown with this examples:
1 <?xml version="1.0" encoding="UTF-8"?> 2 <?mso-application progid="Excel.Sheet"?> 3 <Workbook 4 xmlns="urn:schemas-microsoft-com:office:spreadsheet" 5 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 6 xmlns:x="urn:schemas-microsoft-com:office:excel" 7 xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml" 8 xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet" 9 xmlns:o="urn:schemas-microsoft-com:office:office" 10 xmlns:html="http://www.w3.org/TR/REC-html40" 11 xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet"> 12 <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office"> 13 <Colors> 14 <Color> 15 <Index>3</Index> 16 <RGB>#c0c0c0</RGB> 17 </Color> 18 <Color> 19 <Index>4</Index> 20 <RGB>#ff0000</RGB> 21 </Color> 22 </Colors> 23 </OfficeDocumentSettings> 24 <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel"> 25 <WindowHeight>9000</WindowHeight> 26 <WindowWidth>13860</WindowWidth> 27 <WindowTopX>240</WindowTopX> 28 <WindowTopY>75</WindowTopY> 29 <ProtectStructure>False</ProtectStructure> 30 <ProtectWindows>False</ProtectWindows> 31 </ExcelWorkbook> 32 <Styles> 33 <Style 34 ss:ID="Default" 35 ss:Name="Default" /> 36 <Style 37 ss:ID="Result" 38 ss:Name="Result"> 39 <Font 40 ss:Bold="1" 41 ss:Italic="1" 42 ss:Underline="Single" /> 43 </Style> 44 <Style 45 ss:ID="Result2" 46 ss:Name="Result2"> 47 <Font 48 ss:Bold="1" 49 ss:Italic="1" 50 ss:Underline="Single" /> 51 <NumberFormat ss:Format="Currency" /> 52 </Style> 53 <Style 54 ss:ID="Heading" 55 ss:Name="Heading"> 56 <Alignment ss:Horizontal="Center" /> 57 <Font 58 ss:Bold="1" 59 ss:Italic="1" 60 ss:Size="16" /> 61 </Style> 62 <Style 63 ss:ID="Heading1" 64 ss:Name="Heading1"> 65 <Alignment 66 ss:Horizontal="Center" 67 ss:Rotate="90" /> 68 <Font 69 ss:Bold="1" 70 ss:Italic="1" 71 ss:Size="16" /> 72 </Style> 73 <Style ss:ID="co1" /> 74 <Style ss:ID="ta1" /> 75 </Styles> 76 <ss:Worksheet ss:Name="Sheet1"> 77 <Table ss:StyleID="ta1"> 78 <Column 79 ss:StyleID="Default" 80 ss:Width="64.26" /> 81 <Row ss:Height="12.1896"> 82 <Cell ss:Index="1" /> 83 </Row> 84 </Table> 85 <x:WorksheetOptions /> 86 </ss:Worksheet> 87 <ss:Worksheet ss:Name="Sheet2"> 88 <Table ss:StyleID="ta1"> 89 <Column 90 ss:StyleID="Default" 91 ss:Width="64.26" /> 92 <Row ss:Height="12.1896"> 93 <Cell ss:Index="1" /> 94 </Row> 95 </Table> 96 <x:WorksheetOptions /> 97 </ss:Worksheet> 98 <ss:Worksheet ss:Name="Sheet3"> 99 <Table ss:StyleID="ta1"> 100 <Column 101 ss:StyleID="Default" 102 ss:Width="64.26" /> 103 <Row ss:Height="12.1896"> 104 <Cell ss:Index="1" /> 105 </Row> 106 </Table> 107 <x:WorksheetOptions /> 108 </ss:Worksheet> 109 </Workbook>
The ODF format is in the form of .ods
files which again are an assembly like the one for EOOXML:
$ unzip mappe1.ods Archive: mappe1.ods extracting: mimetype creating: Configurations2/statusbar/ inflating: Configurations2/accelerator/current.xml creating: Configurations2/floater/ creating: Configurations2/popupmenu/ creating: Configurations2/progressbar/ creating: Configurations2/menubar/ creating: Configurations2/toolbar/ creating: Configurations2/images/Bitmaps/ inflating: content.xml inflating: styles.xml extracting: meta.xml inflating: Thumbnails/thumbnail.png inflating: settings.xml inflating: META-INF/manifest.xml
So spreadsheets can be loaded/saved as XML file formats and for the pragmatic programmer it looks like the XML format that came with MS Office 2003 can be used as an interoperable format. The only gotchas that I know of is that very often functions or macros are involved with the use of spreadsheets and here interoperability is not to be expected in general, and in general following a specified format in terms of XML Schema still leaves room for lack of interoperability.
There's a couple of MSDN articles about the format:
- XML in Excel and the Spreadsheet Component
- Dive into SpreadsheetML (Part 1 of 2)
- Dive into SpreadsheetML (Part 2 of 2)
The Schemas can be downloaded, they'rea called Office 2003: XML Reference Schemas (it's an .exe
file but just as a zip arhcive). As mentioned earlier you can also find the schemas in the ISB under XML Spreadsheet 2002. Here's an alternative representation of the Introduction to the Microsoft Office 2003 Reference Schemas:
- urn:schemas-microsoft-com:office:component:spreadsheet [Spreadsheet component]
- uuid:C2F41010-65B3-11d1-A29F-00AA00C14882 [Data Type]
This namespace contains the data types that can be used for columns in PivotTables.
- http://schemas.microsoft.com/office/excel/2003/xml [XML Spreadsheet 2003]
This namespace contains functionality added in Microsoft Office Excel 2003, which mainly includes mapping XML documents
- urn:schemas-microsoft-com:office:excel [XML Spreadsheet 2003]
This namespace defines elements and attributes used to describe more complex features of Excel, such as PivotTables, worksheet options, and validation.
- urn:schemas-microsoft-com:office:office [Common Properties]
This namespace defines the shared office schema and document properties collection (such as author, creation date) and document statistics (such as word count).
- #RowsetSchema [Model Row]
This namespace provides a description of a generalized row element that is used as a model for validation purposes when using PivotTables.
- urn:schemas-microsoft-com:rowset [Rowset]
This namespace is used to describe the attibutes of the row element for PivotTables.
- uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882 [Rowset Definition]
This namespace contains elements and attributes which describe the row elements for PivotTables.
- http://schemas.microsoft.com/data/udc [Universal Data Connection]
This namespace contains elements and attributes which describe Universal Data Connections that are used in Microsoft Office Excel 2003 to connect to external data sources.
- http://schemas.microsoft.com/data/udc/soap [Universal Data Connection SOAP]
This namespace defines external data connections when using the SOAP protocol.
- http://schemas.microsoft.com/data/udc/xmlfile [Universal Data Connection XML File]
This namespace defines only one element, File, which specifies the pathname of an XML file that is used by UDC.
- urn:schemas-microsoft-com:vml [Vector Markup Language]
VML is an application of Extensible Markup Language (XML) 1.0 which defines a format for the encoding of vector information together with additional markup to describe how that information may be displayed and edited.