Saturday, March 29, 2008

A minimal Word document in OOXML format (WordprocessingML)


I wanted to take a first real look at OOXML documents (for Word) and decided to go for a minimal example. Luckily I was not the first to do this, so I found several resources, which also took me on a detour around compiling .NET code from the command line and one aspect of versioning.

CreateDOCX

The first resource I found was a blog entry by Doug Mahugh called CreateDOCX Sample Program. I downloaded the example code, jumped into the Debug directory and ran the program with the following result:

C:\misc\CreateDOCX\CreateDOCX\bin\Debug>CreateDOCX.exe sx.doc "Hello Sweetxml.org"

Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly 'WindowsBase, Version=3.0.51116.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35' or one of its dependencies. Den angivne fil blev ikke fundet.
File name: 'WindowsBase, Version=3.0.51116.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'
   at CreateDOCX.SaveDOCX(String fileName, String BodyText)
   at CreateDOCX.Main() in D:\projects\OpenXmlDeveloper\PackagingAPI\CreateDOCX\CreateDOCX\CreateDOCX.cs:line 26

WRN: Assembly binding logging is turned OFF.
To enable assembly bind failure logging, set the registry value [HKLM\Software\Microsoft\Fusion!EnableLog] (DWORD) to 1.
Note: There is some performance penalty associated with assembly bind failure logging.
To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].

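The Danish part of the message means "The specified file was not found." I'm not really sure why the downloaded binary depends on exactly that WindowsBase version, so I decided to compile the source myself from the command line with this little batch file: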

echo on

set csc="C:\WINDOWS\Microsoft.NET\Framework\v3.5\csc.exe"
set winBase="C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0\WindowsBase.dll"

set exe="build\simpleDocx.exe"

del /Q build\*

%csc%  /nologo  /target:exe /reference:%winBase% /out:%exe% CreateDOCX\CreateDOCX.cs

%exe% sx.docx "Hello Sweetxml.org"

After running this successfully I was ready to open the file with Word 2007, only to get a warning that the file was created in a previous beta version.

After comparing it with a similar document created with Word 2007 I found the reason. I should have paid attention to when the article was written (2006), because the namespace used in the program, http://schemas.openxmlformats.org/wordprocessingml/2006/3/main, isn't the one used in the final version (standardization in progress), which is http://schemas.openxmlformats.org/wordprocessingml/2006/main. This behaviour is described in the article Message when you try to open a file that was saved in a prerelease version of the 2007 Office programs: "This file was created in a previous beta version" or "This workbook was created in an earlier beta version".
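To check which of the two namespaces a given file uses without opening Word, something like the following sketch could work. It is written in Python rather than the C# of the sample, and it assumes the main part is stored as word/document.xml, which is what Word itself writes; a generated file may use another part name, so `part_name` is a parameter. The function names are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

FINAL_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
PRERELEASE_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/3/main"


def root_namespace(xml_bytes):
    """Return the namespace URI of the root element, or '' if it has none."""
    tag = ET.fromstring(xml_bytes).tag  # ElementTree spells tags as '{uri}local'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def main_part_namespace(docx, part_name="word/document.xml"):
    """Read one part from a .docx (path or file object) and return its root namespace.

    The default part name is an assumption; pass the real one if it differs.
    """
    with zipfile.ZipFile(docx) as package:
        return root_namespace(package.read(part_name))


def describe(namespace):
    """Classify a namespace URI against the two URIs mentioned in the post."""
    if namespace == FINAL_NS:
        return "final"
    if namespace == PRERELEASE_NS:
        return "prerelease"
    return "unknown"
```

Calling `describe(main_part_namespace("sx.docx"))` on the file produced by the batch file above would then be expected to report the prerelease namespace, provided the part name matches.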

As for versioning, Word 2007 is aware of this earlier version. Hopefully it is close to the released version, since handling multiple versions can otherwise turn into a big challenge. Keeping up with the standardization process will demand exactly that, and supporting multiple versions with possibly different data models and/or semantics is in general hard to handle.

The simple 'Hello Sweetxml.org' document shown in Word 2007

A truly minimal document

A better resource is Jesper Lund Stocholm's blog entry (in Danish), Venstrehåndsarbejde på Version2, where he creates a truly minimal OOXML Word document, the "Hello World!" OOXML document.

The Ecma Office Open XML File Formats Standard - Primer and whitepaper

Last but not least, and had I only read the (proposed) standard earlier I would have known this, the Primer [PDF] (Ecma TC45 Final Draft Part 3) has a minimal document in section 2.2, Basic Document Structure.

The whitepaper from Ecma TC45, OFFICE OPEN XML OVERVIEW, also has a minimal example in section 5.6, MINIMAL WORDPROCESSINGML DOCUMENT.
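To make the idea of a minimal package concrete, here is a rough sketch of my own, not copied from any of the resources above and written in Python rather than C#. It assumes three parts are enough: the content types, the package relationships, and the main document part. The part name word/document.xml and the function names are choices made for this illustration.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of the relationship parts and of the main part.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels"'
    ' ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Override PartName="/word/document.xml"'
    ' ContentType="application/vnd.openxmlformats-officedocument'
    '.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points from the package root to the main document part.
PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1"'
    ' Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"'
    ' Target="word/document.xml"/>'
    '</Relationships>'
)


def document_xml(text):
    """Main part: document > body > p > r > t, with the text XML-escaped."""
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f'<w:t>{escape(text)}</w:t>'
        '</w:r></w:p></w:body></w:document>'
    )


def build_minimal_docx(text):
    """Return the bytes of a ZIP package holding the three parts above."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document_xml(text))
    return buffer.getvalue()
```

Writing the result to disk, for example with `open("hello.docx", "wb").write(build_minimal_docx("Hello Sweetxml.org"))`, should give a file that Word accepts, though I have only reasoned about that and not tried it against every version.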

Debugging corrupt documents

Fiddling around with the content of a more complex document/assembly will easily give you this experience when you try to open it in Word 2007:

Document is corrupt prompt from Word 2007

And I guess that is just the risk I have to accept if I alter documents by hand or with only basic tool support, where the only test options are schema validation and trying to open the documents with Word or another compliant tool.
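Short of full schema validation, a few cheap checks can at least catch the most obvious mistakes before Word gets to complain. The sketch below is in Python rather than C# and only covers what I would assume are the basics: the two mandatory package files exist, every XML part is well-formed, and the package-level relationships point at parts that are actually present. The name `check_package` is made up for this illustration, and passing these checks does not mean Word will accept the document.

```python
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def check_package(docx):
    """Return a list of problems found in a .docx (path or file object)."""
    problems = []
    parsed = {}
    with zipfile.ZipFile(docx) as package:
        names = set(package.namelist())
        # Every XML part must at least be well-formed.
        for name in sorted(names):
            if name.endswith((".xml", ".rels")):
                try:
                    parsed[name] = ET.fromstring(package.read(name))
                except ET.ParseError as error:
                    problems.append(f"{name}: not well-formed XML ({error})")

    # These two files are expected in every package.
    for required in ("[Content_Types].xml", "_rels/.rels"):
        if required not in names:
            problems.append(f"{required}: missing from the package")

    # Internal package-level relationships must point at existing parts.
    rels = parsed.get("_rels/.rels")
    if rels is not None:
        for rel in rels.iter(f"{{{REL_NS}}}Relationship"):
            if rel.get("TargetMode") == "External":
                continue
            target = rel.get("Target", "").lstrip("/")
            if target not in names:
                problems.append(
                    f"_rels/.rels: target {target!r} ({rel.get('Id')}) is not in the package"
                )
    return problems
```

An empty list only means none of these particular checks failed; relationships belonging to individual parts and the content types themselves are not examined here.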

1 comment:

Anonymous said...

Hi Brian,

Since you are using the .NET 3.5 Framework, you could optimize Doug's code to something like:


// Needs references to WindowsBase.dll and System.Xml.Linq.dll
using System;
using System.IO;
using System.IO.Packaging;
using System.Xml.Linq;

public static void createDocx35(string fileName, string BodyText)
{
    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Build the main document part with LINQ to XML: document > body > p > r > t
    XElement document =
        new XElement(w + "document",
            new XElement(w + "body",
                new XElement(w + "p",
                    new XElement(w + "r",
                        new XElement(w + "t", BodyText)))));

    // The part name is arbitrary; the relationship created below marks it as the main document
    Uri uri = new Uri("/sovs_og_kartofler/document.xml", UriKind.Relative);

    // Disposing the package flushes and closes it
    using (Package pkgOutputDoc = Package.Open(fileName, FileMode.Create, FileAccess.ReadWrite))
    {
        PackagePart partDocumentXML = pkgOutputDoc.CreatePart(uri,
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml");

        // Write the XML into the part and close the stream before adding the relationship
        using (StreamWriter streamStartPart =
            new StreamWriter(partDocumentXML.GetStream(FileMode.Create, FileAccess.Write)))
        {
            document.Save(streamStartPart);
        }

        pkgOutputDoc.CreateRelationship(uri, TargetMode.Internal,
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument", "rId1");
    }
}


But perhaps you have already removed the DOM boilerplate code :-)

-René