Office Binary (doc, xls, ppt) Translator to Open XML

BiffView Tool

Excel's binary file format is known as BIFF (Binary Interchange File Format). For analysis and debugging purposes, we began work on the xls/xlsx mapping by developing a small tool for viewing the BIFF structure of a binary Excel spreadsheet.

Since this by-product is quite useful for analysing Excel spreadsheets, we decided to make it available here as well.
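To give an idea of what "viewing the BIFF structure" involves, here is a minimal Python sketch that walks the records of a raw BIFF byte stream. It is not BiffView's code: the function names and the tiny name table are illustrative assumptions, and the sample bytes are hand-made rather than taken from a real workbook. It relies only on the record layout of BIFF, where each record starts with a 2-byte identifier and a 2-byte payload length, both little-endian.

```python
import struct
from typing import Iterator, List, Tuple

# A deliberately tiny table of BIFF8 record identifiers (illustrative only).
RECORD_NAMES = {
    0x0809: "BOF",
    0x000A: "EOF",
    0x00E1: "INTERFACEHDR",
}


def iter_biff_records(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (record_id, payload) pairs from a raw BIFF byte stream."""
    offset = 0
    while offset + 4 <= len(data):
        # Record header: identifier and payload length, little-endian uint16.
        record_id, length = struct.unpack_from("<HH", data, offset)
        offset += 4
        payload = data[offset:offset + length]
        if len(payload) < length:
            raise ValueError("truncated record at offset %d" % (offset - 4))
        yield record_id, payload
        offset += length


def describe(data: bytes) -> List[str]:
    """Return one human-readable line per record."""
    return [
        "%s (0x%04X), %d bytes" % (RECORD_NAMES.get(rid, "UNKNOWN"), rid, len(payload))
        for rid, payload in iter_biff_records(data)
    ]


# Hand-made sample: three records with synthetic payloads.
sample = (
    struct.pack("<HH", 0x0809, 4) + b"\x00\x06\x05\x00"
    + struct.pack("<HH", 0x00E1, 2) + b"\xb0\x04"
    + struct.pack("<HH", 0x000A, 0)
)

for line in describe(sample):
    print(line)
```

In a real xls file the records live in a stream inside the structured storage container, so that stream has to be extracted before a walker like this can be applied.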

Installation and Usage Information

To install BiffView, extract the zip file to a suitable folder (e.g. c:\Program Files\BiffView).

Launch BiffView by double-clicking BiffView.exe and select a binary Excel spreadsheet using the Browse button.

When you click the Create button, BiffView generates an HTML file that describes the structure of the Excel spreadsheet and opens it in your browser.

Clicking one of the underlined structure elements, e.g. INTERFACEHDR, displays an explanation of that element.

Structured Storage Class

Structured storage (also known as compound file) is a technology for storing hierarchical data within a single file. Microsoft Office uses structured storage as the container for binary Office documents (doc, xls, ppt). In addition, it is used in Open XML documents for storing OLE objects and macros.
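As a rough illustration of the container format, the following Python sketch checks the 8-byte signature at the start of a compound file and decodes a few header fields. The field offsets follow Microsoft's published compound file specification; the function name, the returned dictionary keys and the synthetic sample header are assumptions made for this example and are not part of the .NET assembly described here.

```python
import struct

# Every compound file starts with this 8-byte signature.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def parse_cfb_header(header: bytes) -> dict:
    """Decode selected fields of a 512-byte compound file header."""
    if len(header) < 512:
        raise ValueError("a compound file header is 512 bytes long")
    if header[:8] != CFB_SIGNATURE:
        raise ValueError("not a compound file: bad signature")

    # Offsets 24..33: minor version, major version, byte order mark,
    # sector shift and mini sector shift (all little-endian uint16).
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24
    )
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")

    # Offsets 44..51: number of FAT sectors, first directory sector.
    num_fat_sectors, first_dir_sector = struct.unpack_from("<II", header, 44)
    # Offset 56: streams smaller than this size live in the mini stream.
    (mini_cutoff,) = struct.unpack_from("<I", header, 56)

    return {
        "major_version": major,
        "minor_version": minor,
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "num_fat_sectors": num_fat_sectors,
        "first_dir_sector": first_dir_sector,
        "mini_stream_cutoff": mini_cutoff,
    }


# Synthetic version 3 header, built by hand for demonstration purposes.
_h = bytearray(512)
_h[:8] = CFB_SIGNATURE
struct.pack_into("<5H", _h, 24, 0x003E, 3, 0xFFFE, 9, 6)
struct.pack_into("<II", _h, 44, 1, 1)
struct.pack_into("<I", _h, 56, 4096)
sample_header = bytes(_h)

info = parse_cfb_header(sample_header)
```

To inspect a real document, the first 512 bytes of a doc, xls or ppt file can be passed to the function instead of the synthetic header.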

Such a structured storage container is made up of a number of virtual streams that hold the text, data and control structures of a binary Office document; in effect, the container is a small file system of its own. The content of these streams (or subfiles) depends on the document type: Word documents contain different streams than Excel spreadsheets or PowerPoint presentations.
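Because the set of streams differs by document type, the name of the main stream is enough to tell the three formats apart. The sketch below assumes the stream names have already been read from the container's directory; the function name and the result strings are made up for illustration, while the three stream names are the ones used by Word, Excel and PowerPoint for their main document stream.

```python
from typing import Iterable

# Main stream name of each binary Office format.
MAIN_STREAMS = {
    "WordDocument": "Word document (doc)",
    "Workbook": "Excel spreadsheet (xls)",
    "PowerPoint Document": "PowerPoint presentation (ppt)",
}


def guess_document_type(stream_names: Iterable[str]) -> str:
    """Guess the Office document type from the streams found in a container."""
    for name in stream_names:
        if name in MAIN_STREAMS:
            return MAIN_STREAMS[name]
    return "unknown"


# Example listings as they might be read from a container's directory.
word_streams = ["WordDocument", "1Table", "\x05SummaryInformation"]
excel_streams = ["\x05SummaryInformation", "Workbook"]

print(guess_document_type(word_streams))
print(guess_document_type(excel_streams))
```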

Based on the compound binary file format specification published by Microsoft, we have developed a .NET/C# assembly that opens and reads such structured storage files. For the latest version of the Office Binary (doc) to OpenXML Translator (doc2x), we have also added support for creating and writing structured storage files. As mentioned above, this is necessary for storing OLE objects and macros.
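The core of reading a stream from a compound file is following its sector chain through the file allocation table (FAT). The Python sketch below shows that idea on a hand-built byte string. It is a simplified illustration under stated assumptions, not the assembly's implementation: the function name and parameters are hypothetical, the FAT is passed in as an already decoded list, and small streams stored in the mini stream are not handled.

```python
from typing import List

ENDOFCHAIN = 0xFFFFFFFE  # marks the last sector of a chain
FREESECT = 0xFFFFFFFF    # marks an unused sector


def read_chain(data: bytes, fat: List[int], start_sector: int,
               size: int, sector_size: int = 512) -> bytes:
    """Collect the sectors of one stream by following its FAT chain."""
    chunks = []
    seen = set()
    sector = start_sector
    while sector != ENDOFCHAIN:
        # Guard against loops and out-of-range entries in damaged files.
        if sector in seen or sector >= len(fat):
            raise ValueError("corrupt sector chain")
        seen.add(sector)
        # Sector 0 starts right after the header, which occupies the
        # first sector-sized block of the file (512-byte sectors assumed).
        offset = (sector + 1) * sector_size
        chunks.append(data[offset:offset + sector_size])
        sector = fat[sector]
    # The last sector is padded, so cut the result to the stream size.
    return b"".join(chunks)[:size]


# Hand-built example: a 612-byte stream stored in sectors 2 and 0.
header = bytes(512)
sector0 = (b"B" * 100).ljust(512, b"\x00")
sector1 = bytes(512)
sector2 = b"A" * 512
image = header + sector0 + sector1 + sector2
fat = [ENDOFCHAIN, FREESECT, 0]  # sector 2 -> sector 0 -> end of chain

stream = read_chain(image, fat, start_sector=2, size=612)
```

Writing a compound file is the reverse process: the data is split into sectors, the chain is recorded in the FAT, and the start sector and size are stored in the stream's directory entry.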

This assembly does not call any Windows API functions, so porting it to another platform (e.g. using Mono) is feasible.

The assembly implements the following major classes:

These classes implement the following main methods for accessing a virtual stream or for writing a compound file:

Consequently, reading a stream in a structured storage consists of the following sequence of calls:

Storage Reader: Usage

Writing a structured storage to a stream can be achieved as follows:

Storage Writer: Usage

The package available for download additionally contains command line test applications:

Download:

Source Distribution

Download

You can download the complete source code from our download page on SourceForge.net.

Minimum Software Requirements

To compile the source distribution, you will need Microsoft Visual Studio 2005 or the free Microsoft Visual C# 2005 Express Edition.
