This community technology preview is a development snapshot that demonstrates the project’s progress. It does not yet support the full feature set of Word documents. Currently converted features are:
Download:
The binary file format underlying Excel is known as BIFF (Binary Interchange File Format). For analysis and debugging purposes, we began work on the xls/xlsx mapping by developing a small tool for viewing the BIFF structure of a binary Excel sheet.
Since this byproduct is quite useful for analysing an Excel sheet, we decided to make it available here as well.
Launch BiffView by double-clicking BiffView.exe, then select a binary Excel spreadsheet using the Browse button.
When you click the Create button, BiffView creates an HTML file that describes the structure of the Excel spreadsheet and opens it in your browser.
Clicking one of the underlined structure elements, e.g. INTERFACEHDR, displays an explanation of that element.
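To illustrate the kind of structure such a viewer displays: a BIFF stream is a sequence of records, each consisting of a 2-byte record type, a 2-byte payload length (both little-endian) and the payload itself. The following C# fragment is a minimal sketch of walking those records and is not BiffView's actual code. The class name BiffRecordWalker and the hand-built sample bytes are purely illustrative, and the sketch assumes the BIFF data has already been extracted into a seekable stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Illustrative sketch only: walks the [type][length][payload] records of a BIFF stream.
public static class BiffRecordWalker
{
    // Returns one line of text per record found in the (seekable) stream.
    public static List<string> Walk(Stream biffStream)
    {
        List<string> lines = new List<string>();
        BinaryReader reader = new BinaryReader(biffStream);

        // Each record header is 4 bytes: 2-byte type, 2-byte payload length.
        while (biffStream.Length - biffStream.Position >= 4)
        {
            ushort type = reader.ReadUInt16();    // BinaryReader reads little-endian
            ushort length = reader.ReadUInt16();
            byte[] payload = reader.ReadBytes(length);
            if (payload.Length < length)
            {
                lines.Add(string.Format("0x{0:X4}: truncated record", type));
                break;
            }
            lines.Add(string.Format("0x{0:X4}: {1} byte(s)", type, length));
        }
        return lines;
    }

    public static void Main()
    {
        // Hand-built sample: BOF (0x0809), INTERFACEHDR (0x00E1), EOF (0x000A).
        // The payload bytes are placeholders, not complete record data.
        byte[] sample = new byte[]
        {
            0x09, 0x08, 0x04, 0x00, 0x00, 0x06, 0x05, 0x00,
            0xE1, 0x00, 0x02, 0x00, 0xB0, 0x04,
            0x0A, 0x00, 0x00, 0x00
        };

        foreach (string line in Walk(new MemoryStream(sample)))
        {
            Console.WriteLine(line);
        }
    }
}
```

A real viewer would additionally map each record type to its name and decode the payload fields; this sketch only reports the type and size of each record.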
Structured storage (also known as a compound file) is a technology for storing hierarchical data within a single file. Microsoft Office uses structured storage as the container for binary Office documents (doc, xls, ppt). It is also used in OpenXML documents for storing OLE objects and macros.
Such a container is made up of a number of virtual streams that hold the text, data and control structures of the binary Office document; in effect, the container is a small file system of its own. The content of these streams, or subfiles, is specific to the document type: Word documents contain different streams than Excel spreadsheets or PowerPoint presentations.
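As a small illustration of the container format: every compound file starts with a fixed header whose first eight bytes are a signature and whose bytes 30-31 hold the sector shift (the sector size is 2 raised to that value). The following C# fragment is a minimal sketch that checks the signature and derives the sector size. It is not taken from the assembly described here; the class name CompoundFileHeaderProbe and the synthetic header in Main are illustrative only.

```csharp
using System;
using System.IO;

// Illustrative sketch only: inspects the fixed header at the start of a compound file.
public static class CompoundFileHeaderProbe
{
    // Signature bytes that open every compound file.
    private static readonly byte[] Signature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Returns the sector size in bytes, or -1 if the signature does not match.
    public static int ReadSectorSize(Stream input)
    {
        BinaryReader reader = new BinaryReader(input);
        byte[] header = reader.ReadBytes(32);
        if (header.Length < 32) return -1;

        for (int i = 0; i < Signature.Length; i++)
        {
            if (header[i] != Signature[i]) return -1;
        }

        // Bytes 30-31 hold the sector shift (little-endian); sector size = 2^shift.
        int sectorShift = header[30] | (header[31] << 8);
        return 1 << sectorShift;
    }

    public static void Main()
    {
        // Synthetic 512-byte header: only the signature and sector shift are filled in.
        byte[] header = new byte[512];
        Array.Copy(Signature, header, Signature.Length);
        header[30] = 0x09; // sector shift 9 -> 512-byte sectors

        Console.WriteLine(ReadSectorSize(new MemoryStream(header))); // prints 512
    }
}
```

Locating the individual streams requires following the sector allocation tables and directory entries that the header points to, which is the part a full reader implementation has to handle.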
Based on the compound binary file format specification made available by Microsoft, we have developed a .NET/C# assembly to open and read such a structured storage file. For the latest version of the Office Binary (doc) to OpenXML Translator (doc2x), we have also added support for creating and writing structured storage files. As mentioned above, this is necessary for storing OLE objects and macros.
This assembly makes no Windows API calls, so porting it to another platform (e.g. using Mono) is feasible.
The assembly implements the following major classes:
StructuredStorageReader
StructuredStorageWriter
RootDirectoryEntry / StorageDirectoryEntry / StreamDirectoryEntry
VirtualStream
VirtualStreamReader
These classes implement the following main methods for accessing a virtual stream or for writing a compound file:
StructuredStorageReader(string fileName)
VirtualStream GetStream(string path)
byte[] ReadBytes(long position, int count) / byte ReadByte()
StorageDirectoryEntry AddStorageDirectoryEntry(string name)
void AddStreamDirectoryEntry(string name, Stream stream)
void write(Stream outputStream)
Consequently, reading a stream in a structured storage consists of the following sequence of calls:
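A minimal sketch of such a sequence, built only from the class and method names listed above. Several details are assumptions rather than confirmed API: that the constructor taking a file name belongs to StructuredStorageReader, that VirtualStreamReader is constructed from a VirtualStream, and that ReadBytes is a member of VirtualStreamReader. The assembly's namespace is not stated here, so the matching using directive is left out. "WordDocument" is the main stream of a binary Word document; the class name ReadStreamExample is illustrative.

```csharp
using System;
// A using directive for the assembly's namespace is also needed; it is not stated in this text.

public static class ReadStreamExample
{
    public static void Main(string[] args)
    {
        // args[0]: path to a binary Office document, e.g. a .doc file.
        // Assumption: this constructor belongs to StructuredStorageReader.
        StructuredStorageReader storageReader = new StructuredStorageReader(args[0]);

        // Look up a virtual stream by its path inside the container.
        VirtualStream stream = storageReader.GetStream("WordDocument");

        // Assumption: VirtualStreamReader wraps a VirtualStream and provides ReadBytes.
        VirtualStreamReader reader = new VirtualStreamReader(stream);
        byte[] firstBytes = reader.ReadBytes(0, 32);

        Console.WriteLine("Read {0} bytes from the WordDocument stream.", firstBytes.Length);
    }
}
```

This fragment compiles only with a reference to the structured storage assembly.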
Writing a structured storage to a stream can be achieved as follows:
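A minimal sketch of the writing side, again using only the names listed above. The following are assumptions rather than confirmed API: that StructuredStorageWriter has a parameterless constructor, that it exposes its root entry through a property named RootDirectoryEntry, that the root entry can be treated as a StorageDirectoryEntry, and that both Add methods are members of StorageDirectoryEntry. The storage and stream names and the class name WriteStorageExample are illustrative.

```csharp
using System.IO;
// A using directive for the assembly's namespace is also needed; it is not stated in this text.

public static class WriteStorageExample
{
    public static void Main(string[] args)
    {
        // Assumption: the writer can be created without arguments.
        StructuredStorageWriter storageWriter = new StructuredStorageWriter();

        // Assumption: the writer exposes its root entry through a property of this name.
        StorageDirectoryEntry root = storageWriter.RootDirectoryEntry;

        // Build a small hierarchy: one storage holding one stream.
        StorageDirectoryEntry storage = root.AddStorageDirectoryEntry("ExampleStorage");
        Stream payload = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        storage.AddStreamDirectoryEntry("ExampleStream", payload);

        // Serialize the whole compound file to the output stream (args[0] is the target path).
        using (FileStream output = File.Create(args[0]))
        {
            storageWriter.write(output);
        }
    }
}
```

This fragment compiles only with a reference to the structured storage assembly.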
The package available for download also contains two command-line test applications:
Usage: CompoundFileExtractTest.exe <file1> [<file2> ...]
Usage: CompoundFileReadWriteExtractTest.exe <file>
You can download the complete source code from our download page on SourceForge.net.
To compile the source distribution, you will need Microsoft Visual Studio 2005 or the free Microsoft Visual C# 2005 Express Edition.