Friday, February 29, 2008

A quick peek at XML formats for Spreadsheets

pencil icon, that"s clickable to start editing the post

In this post I'll have peek at which XML formats there's available for spreadsheets, here primarily the Microsoft variant in Excel and secondary in Open Office. This is NOT an entry in the flame war between ODF and EOOXML, since i neither have the knowledge nor insight to contribute in that debate.

Lately I've been using Excel for several tasks and since the launch of Microsoft Office 2007 there's been a massive campaign about the support for XML in MS Office 2007 - but hey, theres been support for if for a long time, as I'll show in this post. Now it's not necessarily so easy to define support for XML, since this can potentially be a wide array of features and probably the ways you can use XML with MS Office 2007 has increased since MS Office XP and 2003.

Throughout this post I'll use the simplest examples possible - that's empty ones. I could had inserted a hello world but figured it would not matter.

Excel 2002 - MS Office XP

Excel 2002 which came with MS Office XP (In 2001 accordin to wikipedia, that's seven years ago). Note: what went wrong with the naming here?. From back then it was possible to save (and load) as XML, and the empty example would look like:

    1 <?xml version="1.0"?>
    2 <Workbook
    3   xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    4   xmlns:o="urn:schemas-microsoft-com:office:office"
    5   xmlns:x="urn:schemas-microsoft-com:office:excel"
    6   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
    7   xmlns:html="http://www.w3.org/TR/REC-html40">
    8   <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
    9     <Author>Demo Boy</Author>
   10     <LastAuthor>Demo Boy</LastAuthor>
   11     <Created>2008-02-23T10:04:48Z</Created>
   12     <LastSaved>2008-02-23T11:11:05Z</LastSaved>
   13     <Company>Sweetxml</Company>
   14     <Version>10.6830</Version>
   15   </DocumentProperties>
   16   <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
   17     <DownloadComponents />
   18     <LocationOfComponents HRef="file:///\\" />
   19   </OfficeDocumentSettings>
   20   <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
   21     <WindowHeight>12525</WindowHeight>
   22     <WindowWidth>17115</WindowWidth>
   23     <WindowTopX>360</WindowTopX>
   24     <WindowTopY>120</WindowTopY>
   25     <ProtectStructure>False</ProtectStructure>
   26     <ProtectWindows>False</ProtectWindows>
   27   </ExcelWorkbook>
   28   <Styles>
   29     <Style
   30       ss:ID="Default"
   31       ss:Name="Normal">
   32       <Alignment ss:Vertical="Bottom" />
   33       <Borders />
   34       <Font />
   35       <Interior />
   36       <NumberFormat />
   37       <Protection />
   38     </Style>
   39   </Styles>
   40   <Worksheet ss:Name="Ark1">
   41     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   42       <PageSetup>
   43         <Header x:Margin="0" />
   44         <Footer x:Margin="0" />
   45         <PageMargins
   46           x:Bottom="0.984251969"
   47           x:Left="0.78740157499999996"
   48           x:Right="0.78740157499999996"
   49           x:Top="0.984251969" />
   50       </PageSetup>
   51       <Selected />
   52       <ProtectObjects>False</ProtectObjects>
   53       <ProtectScenarios>False</ProtectScenarios>
   54     </WorksheetOptions>
   55   </Worksheet>
   56   <Worksheet ss:Name="Ark2">
   57     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   58       <PageSetup>
   59         <Header x:Margin="0" />
   60         <Footer x:Margin="0" />
   61         <PageMargins
   62           x:Bottom="0.984251969"
   63           x:Left="0.78740157499999996"
   64           x:Right="0.78740157499999996"
   65           x:Top="0.984251969" />
   66       </PageSetup>
   67       <ProtectObjects>False</ProtectObjects>
   68       <ProtectScenarios>False</ProtectScenarios>
   69     </WorksheetOptions>
   70   </Worksheet>
   71   <Worksheet ss:Name="Ark3">
   72     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   73       <PageSetup>
   74         <Header x:Margin="0" />
   75         <Footer x:Margin="0" />
   76         <PageMargins
   77           x:Bottom="0.984251969"
   78           x:Left="0.78740157499999996"
   79           x:Right="0.78740157499999996"
   80           x:Top="0.984251969" />
   81       </PageSetup>
   82       <ProtectObjects>False</ProtectObjects>
   83       <ProtectScenarios>False</ProtectScenarios>
   84     </WorksheetOptions>
   85   </Worksheet>
   86 </Workbook>

I'm just observing the impressive preciseness like with x:Bottom="0.984251969".

Excel 2003 - MS Office 2003

These much more logic to the names here since it's all 2003. As I recall it there was quite some talk about the XML support in MS Office 2003, and some of the schemas was published in the InfoStructureBase (more on that later). Here'e the possibility to save as XML data, but that's based on some schema setup I've never tried, but there's also the same option to save as XML like:

    1 <?xml version="1.0"?>
    2 <?mso-application progid="Excel.Sheet"?>
    3 <Workbook
    4   xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    5   xmlns:o="urn:schemas-microsoft-com:office:office"
    6   xmlns:x="urn:schemas-microsoft-com:office:excel"
    7   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
    8   xmlns:html="http://www.w3.org/TR/REC-html40">
    9   <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
   10     <Author>Demo Boy</Author>
   11     <LastAuthor>Demo Boy</LastAuthor>
   12     <Created>2008-02-24T19:35:19Z</Created>
   13     <LastSaved>2008-02-24T19:35:38Z</LastSaved>
   14     <Company>Sweetxml</Company>
   15     <Version>11.9999</Version>
   16   </DocumentProperties>
   17   <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
   18     <WindowHeight>12525</WindowHeight>
   19     <WindowWidth>15195</WindowWidth>
   20     <WindowTopX>480</WindowTopX>
   21     <WindowTopY>120</WindowTopY>
   22     <ProtectStructure>False</ProtectStructure>
   23     <ProtectWindows>False</ProtectWindows>
   24   </ExcelWorkbook>
   25   <Styles>
   26     <Style
   27       ss:ID="Default"
   28       ss:Name="Normal">
   29       <Alignment ss:Vertical="Bottom" />
   30       <Borders />
   31       <Font />
   32       <Interior />
   33       <NumberFormat />
   34       <Protection />
   35     </Style>
   36   </Styles>
   37   <Worksheet ss:Name="Sheet1">
   38     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   39       <Selected />
   40       <ProtectObjects>False</ProtectObjects>
   41       <ProtectScenarios>False</ProtectScenarios>
   42     </WorksheetOptions>
   43   </Worksheet>
   44   <Worksheet ss:Name="Sheet2">
   45     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   46       <ProtectObjects>False</ProtectObjects>
   47       <ProtectScenarios>False</ProtectScenarios>
   48     </WorksheetOptions>
   49   </Worksheet>
   50   <Worksheet ss:Name="Sheet3">
   51     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   52       <ProtectObjects>False</ProtectObjects>
   53       <ProtectScenarios>False</ProtectScenarios>
   54     </WorksheetOptions>
   55   </Worksheet>
   56 </Workbook>

Excel 2007 - MS Office 2007

With Excel 2007 you can still use the old XML format, and it would now look like:

    1 <?xml version="1.0"?>
    2 <?mso-application progid="Excel.Sheet"?>
    3 <Workbook
    4   xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    5   xmlns:o="urn:schemas-microsoft-com:office:office"
    6   xmlns:x="urn:schemas-microsoft-com:office:excel"
    7   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
    8   xmlns:html="http://www.w3.org/TR/REC-html40">
    9   <DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
   10     <Author>Demo Boy</Author>
   11     <LastAuthor>Demo Boy</LastAuthor>
   12     <Created>2008-02-24T19:55:23Z</Created>
   13     <LastSaved>2008-02-24T19:56:21Z</LastSaved>
   14     <Company>Sweetxml</Company>
   15     <Version>12.00</Version>
   16   </DocumentProperties>
   17   <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
   18     <WindowHeight>11955</WindowHeight>
   19     <WindowWidth>18975</WindowWidth>
   20     <WindowTopX>120</WindowTopX>
   21     <WindowTopY>45</WindowTopY>
   22     <ProtectStructure>False</ProtectStructure>
   23     <ProtectWindows>False</ProtectWindows>
   24   </ExcelWorkbook>
   25   <Styles>
   26     <Style
   27       ss:ID="Default"
   28       ss:Name="Normal">
   29       <Alignment ss:Vertical="Bottom" />
   30       <Borders />
   31       <Font
   32         ss:FontName="Calibri"
   33         x:Family="Swiss"
   34         ss:Size="11"
   35         ss:Color="#000000" />
   36       <Interior />
   37       <NumberFormat />
   38       <Protection />
   39     </Style>
   40   </Styles>
   41   <Worksheet ss:Name="Sheet1">
   42     <Table
   43       ss:ExpandedColumnCount="1"
   44       ss:ExpandedRowCount="1"
   45       x:FullColumns="1"
   46       x:FullRows="1"
   47       ss:DefaultRowHeight="15">
   48     </Table>
   49     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   50       <PageSetup>
   51         <Header x:Margin="0.3" />
   52         <Footer x:Margin="0.3" />
   53         <PageMargins
   54           x:Bottom="0.75"
   55           x:Left="0.7"
   56           x:Right="0.7"
   57           x:Top="0.75" />
   58       </PageSetup>
   59       <Selected />
   60       <ProtectObjects>False</ProtectObjects>
   61       <ProtectScenarios>False</ProtectScenarios>
   62     </WorksheetOptions>
   63   </Worksheet>
   64   <Worksheet ss:Name="Sheet2">
   65     <Table
   66       ss:ExpandedColumnCount="1"
   67       ss:ExpandedRowCount="1"
   68       x:FullColumns="1"
   69       x:FullRows="1"
   70       ss:DefaultRowHeight="15">
   71     </Table>
   72     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   73       <PageSetup>
   74         <Header x:Margin="0.3" />
   75         <Footer x:Margin="0.3" />
   76         <PageMargins
   77           x:Bottom="0.75"
   78           x:Left="0.7"
   79           x:Right="0.7"
   80           x:Top="0.75" />
   81       </PageSetup>
   82       <ProtectObjects>False</ProtectObjects>
   83       <ProtectScenarios>False</ProtectScenarios>
   84     </WorksheetOptions>
   85   </Worksheet>
   86   <Worksheet ss:Name="Sheet3">
   87     <Table
   88       ss:ExpandedColumnCount="1"
   89       ss:ExpandedRowCount="1"
   90       x:FullColumns="1"
   91       x:FullRows="1"
   92       ss:DefaultRowHeight="15">
   93     </Table>
   94     <WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
   95       <PageSetup>
   96         <Header x:Margin="0.3" />
   97         <Footer x:Margin="0.3" />
   98         <PageMargins
   99           x:Bottom="0.75"
  100           x:Left="0.7"
  101           x:Right="0.7"
  102           x:Top="0.75" />
  103       </PageSetup>
  104       <ProtectObjects>False</ProtectObjects>
  105       <ProtectScenarios>False</ProtectScenarios>
  106     </WorksheetOptions>
  107   </Worksheet>
  108 </Workbook>

but more prominently you could use EOOXML for spreadsheets (or how it's coined). This is an all different beast since it's an assembly that's zipped, by default containing the following files and with the suffix/extension .xlsx:

$ unzip Book1.xlsx
Archive:  Book1.xlsx
  inflating: [Content_Types].xml
  inflating: _rels/.rels
  inflating: xl/_rels/workbook.xml.rels
  inflating: xl/workbook.xml
  inflating: xl/theme/theme1.xml
  inflating: xl/styles.xml
  inflating: xl/worksheets/sheet3.xml
  inflating: xl/worksheets/sheet2.xml
  inflating: xl/worksheets/sheet1.xml
  inflating: docProps/core.xml
  inflating: docProps/app.xml

Nothing less than 11 seperate files, that I've yet to figure out exactly how works but one of the central files is xl/workbook.xml:

    1 <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    2 <workbook
    3   xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    4   xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    5   <fileVersion
    6     appName="xl"
    7     lastEdited="4"
    8     lowestEdited="4"
    9     rupBuild="4505" />
   10   <workbookPr defaultThemeVersion="124226" />
   11   <bookViews>
   12     <workbookView
   13       xWindow="120"
   14       yWindow="45"
   15       windowWidth="18975"
   16       windowHeight="11955" />
   17   </bookViews>
   18   <sheets>
   19     <sheet
   20       name="Sheet1"
   21       sheetId="1"
   22       r:id="rId1" />
   23     <sheet
   24       name="Sheet2"
   25       sheetId="2"
   26       r:id="rId2" />
   27     <sheet
   28       name="Sheet3"
   29       sheetId="3"
   30       r:id="rId3" />
   31   </sheets>
   32   <calcPr calcId="125725" />
   33 </workbook>

Calc - Open Office 2

Calc that comes with Open Office 2 support the OpenDocument format (ODF, ISO/IEC 26300, full name: OASIS Open Document Format for Office Applications) but also the MS Office 2003 XML format, as shown with this examples:

    1 <?xml version="1.0" encoding="UTF-8"?>
    2 <?mso-application progid="Excel.Sheet"?>
    3 <Workbook
    4   xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    5   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    6   xmlns:x="urn:schemas-microsoft-com:office:excel"
    7   xmlns:x2="http://schemas.microsoft.com/office/excel/2003/xml"
    8   xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
    9   xmlns:o="urn:schemas-microsoft-com:office:office"
   10   xmlns:html="http://www.w3.org/TR/REC-html40"
   11   xmlns:c="urn:schemas-microsoft-com:office:component:spreadsheet">
   12   <OfficeDocumentSettings xmlns="urn:schemas-microsoft-com:office:office">
   13     <Colors>
   14       <Color>
   15         <Index>3</Index>
   16         <RGB>#c0c0c0</RGB>
   17       </Color>
   18       <Color>
   19         <Index>4</Index>
   20         <RGB>#ff0000</RGB>
   21       </Color>
   22     </Colors>
   23   </OfficeDocumentSettings>
   24   <ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
   25     <WindowHeight>9000</WindowHeight>
   26     <WindowWidth>13860</WindowWidth>
   27     <WindowTopX>240</WindowTopX>
   28     <WindowTopY>75</WindowTopY>
   29     <ProtectStructure>False</ProtectStructure>
   30     <ProtectWindows>False</ProtectWindows>
   31   </ExcelWorkbook>
   32   <Styles>
   33     <Style
   34       ss:ID="Default"
   35       ss:Name="Default" />
   36     <Style
   37       ss:ID="Result"
   38       ss:Name="Result">
   39       <Font
   40         ss:Bold="1"
   41         ss:Italic="1"
   42         ss:Underline="Single" />
   43     </Style>
   44     <Style
   45       ss:ID="Result2"
   46       ss:Name="Result2">
   47       <Font
   48         ss:Bold="1"
   49         ss:Italic="1"
   50         ss:Underline="Single" />
   51       <NumberFormat ss:Format="Currency" />
   52     </Style>
   53     <Style
   54       ss:ID="Heading"
   55       ss:Name="Heading">
   56       <Alignment ss:Horizontal="Center" />
   57       <Font
   58         ss:Bold="1"
   59         ss:Italic="1"
   60         ss:Size="16" />
   61     </Style>
   62     <Style
   63       ss:ID="Heading1"
   64       ss:Name="Heading1">
   65       <Alignment
   66         ss:Horizontal="Center"
   67         ss:Rotate="90" />
   68       <Font
   69         ss:Bold="1"
   70         ss:Italic="1"
   71         ss:Size="16" />
   72     </Style>
   73     <Style ss:ID="co1" />
   74     <Style ss:ID="ta1" />
   75   </Styles>
   76   <ss:Worksheet ss:Name="Sheet1">
   77     <Table ss:StyleID="ta1">
   78       <Column
   79         ss:StyleID="Default"
   80         ss:Width="64.26" />
   81       <Row ss:Height="12.1896">
   82         <Cell ss:Index="1" />
   83       </Row>
   84     </Table>
   85     <x:WorksheetOptions />
   86   </ss:Worksheet>
   87   <ss:Worksheet ss:Name="Sheet2">
   88     <Table ss:StyleID="ta1">
   89       <Column
   90         ss:StyleID="Default"
   91         ss:Width="64.26" />
   92       <Row ss:Height="12.1896">
   93         <Cell ss:Index="1" />
   94       </Row>
   95     </Table>
   96     <x:WorksheetOptions />
   97   </ss:Worksheet>
   98   <ss:Worksheet ss:Name="Sheet3">
   99     <Table ss:StyleID="ta1">
  100       <Column
  101         ss:StyleID="Default"
  102         ss:Width="64.26" />
  103       <Row ss:Height="12.1896">
  104         <Cell ss:Index="1" />
  105       </Row>
  106     </Table>
  107     <x:WorksheetOptions />
  108   </ss:Worksheet>
  109 </Workbook>

The ODF format is in the form of .ods files which again are an assembly like the one for EOOXML:

$ unzip mappe1.ods
Archive:  mappe1.ods
 extracting: mimetype
   creating: Configurations2/statusbar/
  inflating: Configurations2/accelerator/current.xml
   creating: Configurations2/floater/
   creating: Configurations2/popupmenu/
   creating: Configurations2/progressbar/
   creating: Configurations2/menubar/
   creating: Configurations2/toolbar/
   creating: Configurations2/images/Bitmaps/
  inflating: content.xml
  inflating: styles.xml
 extracting: meta.xml
  inflating: Thumbnails/thumbnail.png
  inflating: settings.xml
  inflating: META-INF/manifest.xml

So spreadsheets can be loaded/saved as XML file formats and for the pragmatic programmer it looks like the XML format that came with MS Office 2003 can be used as an interoperable format. The only gotchas that I know of is that very often functions or macros are involved with the use of spreadsheets and here interoperability is not to be expected in general, and in general following a specified format in terms of XML Schema still leaves room for lack of interoperability.

There's a couple of MSDN articles about the format:

The Schemas can be downloaded, they'rea called Office 2003: XML Reference Schemas (it's an .exe file but just as a zip arhcive). As mentioned earlier you can also find the schemas in the ISB under XML Spreadsheet 2002. Here's an alternative representation of the Introduction to the Microsoft Office 2003 Reference Schemas:

urn:schemas-microsoft-com:office:component:spreadsheet [Spreadsheet component]

uuid:C2F41010-65B3-11d1-A29F-00AA00C14882 [Data Type]

This namespace contains the data types that can be used for columns in PivotTables.

http://schemas.microsoft.com/office/excel/2003/xml [XML Spreadsheet 2003]

This namespace contains functionality added in Microsoft Office Excel 2003, which mainly includes mapping XML documents

  • urn:schemas-microsoft-com:office:excel [XML Spreadsheet 2003]
  • This namespace defines elements and attributes used to describe more complex features of Excel, such as PivotTables, worksheet options, and validation.

    urn:schemas-microsoft-com:office:office [Common Properties]

    This namespace defines the shared office schema and document properties collection (such as author, creation date) and document statistics (such as word count).

    #RowsetSchema [Model Row]

    This namespace provides a description of a generalized row element that is used as a model for validation purposes when using PivotTables.

  • urn:schemas-microsoft-com:rowset [Rowset]
  • This namespace is used to describe the attibutes of the row element for PivotTables.

    uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882 [Rowset Definition]

    This namespace contains elements and attributes which describe the row elements for PivotTables.

    http://schemas.microsoft.com/data/udc [Universal Data Connection]

    This namespace contains elements and attributes which describe Universal Data Connections that are used in Microsoft Office Excel 2003 to connect to external data sources.

    http://schemas.microsoft.com/data/udc/soap [Universal Data Connection SOAP]

    This namespace defines external data connections when using the SOAP protocol.

    http://schemas.microsoft.com/data/udc/xmlfile [Universal Data Connection XML File]

    This namespace defines only one element, File, which specifies the pathname of an XML file that is used by UDC.

    urn:schemas-microsoft-com:vml [Vector Markup Language]

    VML is an application of Extensible Markup Language (XML) 1.0 which defines a format for the encoding of vector information together with additional markup to describe how that information may be displayed and edited.

    0 comments :