Intermediate development releases are available for download here.
A first beta version of the Office Binary (doc, xls, ppt) Translator to Open XML is available for download. The converter accepts files in the Word 97-2007 binary file format and produces Open XML documents.
This community technology preview release is a development snapshot demonstrating the project’s progress. It does not yet support the full feature set of Word documents. Currently converted features are:
Please feel free to try it out and give feedback. There are also some sample documents available for download.
Download:
Structured storage (also known as compound file) is a technology for storing hierarchical data within a single file. Microsoft Office uses structured storage as a container for binary Office documents (doc, xls, ppt).
Such a structured storage container is made up of a number of virtual streams which contain text, data and control structures of the binary Office documents, i.e. the container is like a small file system of its own. The content of these streams or subfiles is document type-specific, i.e. Word documents contain other streams than Excel spreadsheets or PowerPoint presentations.
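As a small illustration of the container idea, a compound file can be recognized by the eight-byte signature at the start of its header before any stream is opened. The sketch below is independent of the assembly described on this page; the class and method names are hypothetical and chosen only for this example.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the translator's assembly.
static class CompoundFileSignature
{
    // Signature stored in the first eight bytes of a compound file header.
    static readonly byte[] Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns true if the given bytes start with the compound file signature.
    public static bool Matches(byte[] header)
    {
        if (header == null || header.Length < Magic.Length) return false;
        for (int i = 0; i < Magic.Length; i++)
        {
            if (header[i] != Magic[i]) return false;
        }
        return true;
    }

    // Reads the first eight bytes of a file and checks them against the signature.
    public static bool IsCompoundFile(string path)
    {
        byte[] header = new byte[Magic.Length];
        using (FileStream fs = File.OpenRead(path))
        {
            int total = 0;
            while (total < header.Length)
            {
                int n = fs.Read(header, total, header.Length - total);
                if (n == 0) break; // end of file reached before eight bytes were read
                total += n;
            }
            return total == header.Length && Matches(header);
        }
    }
}
```

A check like this only tells you that a file is a structured storage container; which streams it holds still depends on the document type.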
Based on the compound binary file format specification made available by Microsoft, we have developed a .NET/C# assembly to open and read such a structured storage file. The assembly does not make any Windows API calls, so porting it to another platform (e.g. using Mono) is feasible.
The assembly implements the following major classes:
StorageReader
VirtualStream
These classes implement the following main methods for accessing a virtual stream:
public StorageReader(string fileName)
public VirtualStream GetStream(string path)
public int Read(byte[] array, int count, int position) / public int ReadByte(int position)
Consequently, reading a stream in a structured storage consists of the following sequence of calls:
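A minimal sketch of that sequence, using only the constructor and methods listed above. The file name, the buffer size and the `using` directive for the assembly's namespace are assumptions made for illustration, as is the reading of `position` as an offset into the stream. `WordDocument` is the name of the main stream in a binary Word file.

```csharp
using System;
// Assumption: namespace of the structured storage assembly; adjust to the actual one.
using DIaLOGIKa.b2xtranslator.StructuredStorage.Reader;

class ReadStreamExample
{
    static void Main(string[] args)
    {
        // 1. Open the compound file (the path is a placeholder).
        StorageReader reader = new StorageReader("sample.doc");

        // 2. Look up a virtual stream by its path inside the container.
        VirtualStream stream = reader.GetStream("WordDocument");

        // 3. Read bytes from the stream. Assumption: 'count' is the number of
        //    bytes requested, 'position' is the offset within the stream, and
        //    the return value is the number of bytes actually read.
        byte[] buffer = new byte[512];
        int bytesRead = stream.Read(buffer, buffer.Length, 0);

        Console.WriteLine("Read {0} bytes from WordDocument", bytesRead);
    }
}
```

The sketch needs a reference to the assembly to compile. Any cleanup call on the reader is left out because none is listed above.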

The package available for download additionally contains a command-line test application called CompoundFileExtractTest. This test application can be used to extract the streams of one or more compound files.
Usage: CompoundFileExtractTest.exe <file1> [<file2> ...]
The streams of a given file are extracted to a folder named after the file, with '_' prepended and every occurrence of '.' replaced by '_'. Example: the streams of a file named 'file.name.ext' are extracted to a folder named '_file_name_ext'.
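The naming rule can be sketched in a few lines. This is an illustration of the rule as stated, not the code used by CompoundFileExtractTest; the helper name is hypothetical, and it assumes the input is a bare file name without a directory part.

```csharp
using System;

// Hypothetical helper illustrating the folder naming rule.
static class ExtractFolderName
{
    // Prefixes the file name with '_' and replaces every '.' with '_'.
    public static string For(string fileName)
    {
        return "_" + fileName.Replace('.', '_');
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(ExtractFolderName.For("file.name.ext")); // prints "_file_name_ext"
    }
}
```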
Download: