Office Binary (doc, xls, ppt) Translator to Open XML

Intermediate development releases are available for download here.

Office Binary (doc, xls, ppt) Translator to Open XML 0.1 M1 Beta CTP

A first beta version of the Office Binary (doc, xls, ppt) Translator to Open XML is available for download. The converter accepts files in the Word 97-2007 binary file format and produces Open XML documents.

This community technology preview release is a development snapshot demonstrating the project’s progress. It does not yet support the full feature set of Word documents. The features currently converted are:

Please feel free to try it out and give feedback. Some sample documents are also available for download.

Download:

 

Structured Storage Reader

Structured storage (also known as compound file) is a technology for storing hierarchical data within a single file. Microsoft Office uses structured storage as the container for binary Office documents (doc, xls, ppt).

A structured storage container is made up of a number of virtual streams that hold the text, data and control structures of a binary Office document. In effect, the container is a small file system of its own. The content of these streams (or subfiles) depends on the document type: Word documents contain different streams than Excel spreadsheets or PowerPoint presentations.
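To make the container idea concrete, here is a minimal sketch of how the start of a compound file can be recognized and its sector sizes read. It is written in Python for brevity and is not the API of our .NET/C# assembly. The function name `read_header` and the returned dictionary keys are hypothetical. The signature bytes and field offsets follow the published compound binary file format specification as we understand it.

```python
import struct

# Signature that opens every compound file, per the format specification.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_header(data: bytes) -> dict:
    """Parse the few header fields needed to compute sector sizes.

    Hypothetical helper for illustration only; not part of the project's assembly.
    """
    if len(data) < 34:
        raise ValueError("too short to contain a compound file header")
    if data[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file: signature mismatch")
    # Bytes 8..23 hold a class id, which this sketch ignores.
    # Bytes 24..33 hold five little-endian 16-bit fields:
    # minor version, major version, byte order, sector shift, mini sector shift.
    _minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", data, 24
    )
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    return {
        "major_version": major,
        "sector_size": 1 << sector_shift,        # e.g. shift 9 gives 512 bytes
        "mini_sector_size": 1 << mini_shift,     # e.g. shift 6 gives 64 bytes
    }
```

A full reader would go on to follow the sector allocation tables and the directory to locate each virtual stream; the sketch stops at the header because that is enough to show that the file is a self-describing container.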

Based on the compound binary file format specification made available by Microsoft, we have developed a .NET/C# assembly to open and read such structured storage files. The assembly makes no Windows API calls, so porting it to another platform (e.g. using Mono) is feasible.

The assembly implements the following major classes:

These classes implement the following main methods for accessing a virtual stream:

Consequently, reading a stream in a structured storage consists of the following sequence of calls:

Storage Reader: Usage

The package available for download also contains a command-line test application called CompoundFileExtractTest, which extracts the streams of one or more compound files.
Usage: CompoundFileExtractTest.exe <file1> [<file2> ...]
The streams of each file are extracted to a folder named after the file, with '_' prepended and every '.' replaced by '_'. Example: the streams of a file named 'file.name.ext' are extracted to a folder named '_file_name_ext'.
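The folder naming rule can be sketched in a few lines. This is an illustration in Python, not the code of CompoundFileExtractTest. The function name `extraction_folder_name` is hypothetical, and the sketch assumes that only the file name, not any leading directory, takes part in the rule.

```python
from pathlib import PurePath


def extraction_folder_name(file_path: str) -> str:
    """Return '_' followed by the file name with every '.' replaced by '_'.

    Hypothetical helper; assumes any directory part of the path is ignored.
    """
    file_name = PurePath(file_path).name
    return "_" + file_name.replace(".", "_")
```

Called with 'file.name.ext', this returns '_file_name_ext', matching the example above.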

Download:

Project page on SourceForge
