About the Microsoft SDK for Open XML Formats
Office Open XML (OpenXML) is an open standard for word-processing documents, presentations, and spreadsheets that can be freely implemented by multiple applications on different platforms. OpenXML is designed to faithfully represent existing word-processing documents, presentations, and spreadsheets that are encoded in the binary formats defined by Microsoft Office applications. The need for OpenXML is simple: billions of documents now exist, but the information in them is tightly coupled to the programs that created them. The purpose of the OpenXML standard is to decouple documents from the Microsoft Office applications that created them, so that other applications can manipulate those documents independently of proprietary formats and without loss of data.
Structure of an OpenXML Package
An OpenXML file is stored in a ZIP archive for packaging and compression. You can view the structure of any OpenXML file using a ZIP viewer. An OpenXML document is composed of multiple parts. The relationships between the parts are themselves stored in parts. The ZIP format supports random access to each part. For example, an application can move a slide from one Microsoft Office PowerPoint 2007 presentation to another presentation without parsing the slide content. Likewise, an application can strip all of the comments out of a word-processing document without parsing any of its contents.
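The ZIP packaging described above can be explored with any ZIP library. The following is a minimal sketch in Python using only the standard library; it is not part of the SDK. The helper names (list_parts, read_part, build_sample_package) and the tiny sample package are hypothetical and exist only for illustration.

```python
import io
import zipfile


def list_parts(package):
    """Return the names of the parts stored in an OpenXML package (a ZIP archive)."""
    with zipfile.ZipFile(package) as archive:
        return ["/" + info.filename for info in archive.infolist() if not info.is_dir()]


def read_part(package, part_name):
    """Read the raw bytes of one part without touching any other part."""
    with zipfile.ZipFile(package) as archive:
        return archive.read(part_name.lstrip("/"))


def build_sample_package():
    """Build a tiny, hypothetical package in memory (placeholder content only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("_rels/.rels", "<Relationships/>")
        archive.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_parts(build_sample_package()):
        print(name)
```

To inspect a real file, pass its path instead of the in-memory sample, for example list_parts("report.docx"), where the file name is a placeholder.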
Most parts in an OpenXML package are XML markup (embedded media such as images remain binary). Because XML is structured plain text, you can view the contents of a part using a text reader, or you can parse the contents using technologies such as XPath.
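As an illustration of querying a part, the sketch below uses the limited XPath support in Python's standard xml.etree.ElementTree module to collect paragraph text from WordprocessingML markup. The function name paragraph_texts and the sample markup are hypothetical; the namespace URI is the WordprocessingML main namespace.

```python
import xml.etree.ElementTree as ET

# Namespace prefix map used by the XPath expressions below.
W_NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}

# Hypothetical sample: a minimal main document part with two paragraphs.
SAMPLE_DOCUMENT = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body>"
    "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    "</w:body>"
    "</w:document>"
)


def paragraph_texts(document_xml):
    """Return the text of each w:p element, joining the w:t runs it contains."""
    root = ET.fromstring(document_xml)
    paragraphs = []
    for paragraph in root.findall(".//w:p", W_NS):
        runs = paragraph.findall(".//w:t", W_NS)
        paragraphs.append("".join(run.text or "" for run in runs))
    return paragraphs


if __name__ == "__main__":
    print(paragraph_texts(SAMPLE_DOCUMENT))
```

ElementTree implements only a subset of XPath; a full XPath engine would be needed for more complex queries.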
Structurally, an OpenXML document is an Open Packaging Conventions (OPC) package. As stated previously, a package is composed of a collection of parts. Each part has a part name that consists of a sequence of segments, that is, a pathname such as "/word/theme/theme1.xml". The package contains a [Content_Types].xml part that allows you to determine the content type of every part in the package. The set of explicit relationships for a source package or part is contained in a relationships part whose name ends with the .rels extension.
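The lookup rules described above can be sketched as follows, again in standard-library Python rather than with the SDK. The function names and sample markup are hypothetical. The sketch assumes the usual OPC behavior: an Override entry for the exact part name takes precedence over a Default entry for the file extension, and the relationships part for a source part is stored in a "_rels" folder next to that part.

```python
import posixpath
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Hypothetical sample of a [Content_Types].xml part.
SAMPLE_CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)


def content_type_of(part_name, content_types_xml):
    """Resolve a part's content type: Override by part name first, then Default by extension."""
    root = ET.fromstring(content_types_xml)
    for override in root.findall(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.findall(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


def relationships_part_name(source_part=None):
    """Return the name of the .rels part for a source part, or for the package if None."""
    if source_part is None:
        return "/_rels/.rels"
    folder, name = posixpath.split(source_part)
    return posixpath.join(folder, "_rels", name + ".rels")


if __name__ == "__main__":
    print(content_type_of("/word/theme/theme1.xml", SAMPLE_CONTENT_TYPES))
    print(relationships_part_name("/word/document.xml"))
```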
Word 2007 documents are defined using WordprocessingML markup. A document is composed of a collection of stories where each story is one of the following:
• Main document (the only required story)
• Glossary document
• Header and footer
• Comments
• Text box
• Footnote and endnote
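One way to discover which stories a document contains is to read the relationships part of the main document and look at the relationship types. The following Python sketch assumes the relationship type URIs used by Word 2007 packages; the function name stories_in and the sample markup are hypothetical. Text boxes do not appear here because they are stored inline within another story rather than in a part of their own.

```python
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Relationship types that identify story parts (assumed Word 2007 values).
STORY_TYPES = {
    OFFICE_REL + "glossaryDocument": "Glossary document",
    OFFICE_REL + "header": "Header",
    OFFICE_REL + "footer": "Footer",
    OFFICE_REL + "comments": "Comments",
    OFFICE_REL + "footnotes": "Footnotes",
    OFFICE_REL + "endnotes": "Endnotes",
}

# Hypothetical sample of a /word/_rels/document.xml.rels part.
SAMPLE_RELS = (
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    f'<Relationship Id="rId1" Type="{OFFICE_REL}styles" Target="styles.xml"/>'
    f'<Relationship Id="rId2" Type="{OFFICE_REL}header" Target="header1.xml"/>'
    f'<Relationship Id="rId3" Type="{OFFICE_REL}comments" Target="comments.xml"/>'
    "</Relationships>"
)


def stories_in(rels_xml):
    """Map each relationship target that is a story part to a readable story label."""
    root = ET.fromstring(rels_xml)
    found = {}
    for relationship in root.findall(REL_NS + "Relationship"):
        label = STORY_TYPES.get(relationship.get("Type"))
        if label is not None:
            found[relationship.get("Target")] = label
    return found


if __name__ == "__main__":
    print(stories_in(SAMPLE_RELS))
```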
PowerPoint 2007 presentations are described by PresentationML markup. Presentation packages can contain the following parts:
• Slide master
• Notes master
• Handout master
• Slide layout
• Notes
An Excel 2007 workbook is described by SpreadsheetML markup. Workbook packages can contain the following parts:
• Workbook part (required)
• One or more worksheets
• Charts
• Tables
• Custom XML
The Microsoft SDK for Open XML Formats Technology Preview
The object model in the Microsoft SDK for Open XML Formats Technology Preview simplifies the manipulation of OpenXML packages. The Open XML object model encapsulates many of the tasks that developers typically perform on OpenXML packages, so you can perform complex operations with just a few lines of code. Common tasks include the following:
• Search: With a few lines of code, you can search a collection of Excel 2007 worksheets for arbitrary data.
• Document assembly: You can create documents by combining the parts of existing documents programmatically. For example, you can pull slides from various PowerPoint 2007 presentations to create a single presentation.
• Validation: With a few lines of code, you can validate the parts in a package or validate an entire package against a schema.
• Data update: With the Open XML object model, you can easily modify the data in multiple packages.
• Privacy: With a few lines of code, you can remove comments and other personal information from a document before it is distributed.
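To give a sense of the work that the object model encapsulates, the following sketch performs the search task by hand in standard-library Python. It does not use the SDK and is not the SDK's API. It assumes the conventional part names that Excel 2007 writes ("xl/worksheets/..." and "xl/sharedStrings.xml") instead of following relationships, and the names find_cells and build_sample_workbook are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

S_URI = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S_NS = {"s": S_URI}


def find_cells(package, needle):
    """Return (part name, cell reference) pairs for cells whose text contains needle."""
    hits = []
    with zipfile.ZipFile(package) as archive:
        names = archive.namelist()
        # Load the shared string table, if present; cells with t="s" index into it.
        shared = []
        if "xl/sharedStrings.xml" in names:
            table = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for item in table.findall("s:si", S_NS):
                shared.append("".join(t.text or "" for t in item.iter("{%s}t" % S_URI)))
        for name in names:
            # Assumption: worksheets live under the conventional folder name.
            if not (name.startswith("xl/worksheets/") and name.endswith(".xml")):
                continue
            sheet = ET.fromstring(archive.read(name))
            for cell in sheet.findall(".//s:c", S_NS):
                value = cell.find("s:v", S_NS)
                if value is None or value.text is None:
                    continue
                text = value.text
                if cell.get("t") == "s":
                    text = shared[int(text)]
                if needle in text:
                    hits.append(("/" + name, cell.get("r")))
    return hits


def build_sample_workbook():
    """Build a tiny, hypothetical two-sheet workbook package in memory."""
    shared = '<sst xmlns="%s"><si><t>Revenue</t></si><si><t>Cost</t></si></sst>' % S_URI
    sheet1 = (
        '<worksheet xmlns="%s"><sheetData><row r="1">'
        '<c r="A1" t="s"><v>0</v></c><c r="B1"><v>1200</v></c>'
        "</row></sheetData></worksheet>" % S_URI
    )
    sheet2 = (
        '<worksheet xmlns="%s"><sheetData>'
        '<row r="1"><c r="A1" t="s"><v>1</v></c></row>'
        '<row r="2"><c r="A2" t="s"><v>0</v></c></row>'
        "</sheetData></worksheet>" % S_URI
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/sharedStrings.xml", shared)
        archive.writestr("xl/worksheets/sheet1.xml", sheet1)
        archive.writestr("xl/worksheets/sheet2.xml", sheet2)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(find_cells(build_sample_workbook(), "Revenue"))
```

Even this simplified version has to deal with the ZIP container, XML namespaces, and the shared string table, which is the kind of detail the object model is intended to hide.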
The Open XML object model can be used from any language supported by the .NET Framework. The help topics in this SDK provide code samples in C# and Visual Basic .NET.
Using the code samples in the help topics in this SDK as a starting point, you can take advantage of the OpenXML standards in the 2007 Office system. The Open XML object model relieves much of the tedium of working with OPC documents and is well worth your time to explore.

