Office Binary (doc, xls, ppt) Translator to Open XML

b2xTranslator Team Blog

Monday 22 September 2008

Crossing things off the checklist

It has been three weeks since we started work on the next doc2x build, and we have already implemented some core features of Phase II. So let’s have a look at the current state of development:

  • We enhanced our paragraph conversion and are now able to convert the “Floating Properties” of paragraphs. As a result, the converter fully supports Frames and Textboxes, regardless of whether the Textbox is a floating paragraph or a floating shape.
  • A second core feature that we have already finished is the conversion of “Comments”.
  • In addition, we fixed many open bugs and made the converter more stable and efficient. By the way, all of you are welcome to test the converters and submit bugs to the tracker on SourceForge.net!

But there is still work to do before the final release in December: we are currently tackling the conversion of OLE objects, charts and macros. They are stored as several “Structured Storage Streams” in the binary file format. These streams need to be bundled into a “Structured Storage File” inside the “OpenXml” archive.
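To illustrate the last step only, here is a minimal Python sketch of placing an already-built Structured Storage File into an Open XML package, which is a ZIP archive. It is not the translator's code: the helper name `add_embedded_part`, the part name `word/embeddings/oleObject1.bin` and the placeholder payload are assumptions made for this example, and a real package would also need the matching relationship and content-type entries.

```python
import io
import zipfile

# Signature that opens every Structured Storage (compound) file.
COMPOUND_FILE_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def add_embedded_part(package: io.BytesIO, part_name: str, payload: bytes) -> None:
    """Append one binary part to an existing ZIP-based package (hypothetical helper)."""
    with zipfile.ZipFile(package, mode="a", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(part_name, payload)


# Stand-in for the package that the translator has already produced.
package = io.BytesIO()
with zipfile.ZipFile(package, mode="w") as archive:
    archive.writestr("word/document.xml", "<w:document/>")

# Placeholder payload: only the signature is real, the rest is padding.
# A real writer would bundle the collected streams into a valid storage file.
storage_file = COMPOUND_FILE_SIGNATURE + b"\x00" * 504

add_embedded_part(package, "word/embeddings/oleObject1.bin", storage_file)

with zipfile.ZipFile(package) as archive:
    part_names = archive.namelist()
    embedded = archive.read("word/embeddings/oleObject1.bin")
```

The hard part, building the Structured Storage File from the individual streams, is deliberately left out here.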

So one of the main goals of the next week will be to extend our “StructuredStorageReader” library and turn it into a “StructuredStorageWriter” ;) If you want to know more about OLE objects, charts and macros, it might interest you that we just added new documentation to our How To Guides section.

Finally, I can tell you that we are going to release the first milestone of Phase II as planned at the beginning of October.
So long!

Friday 12 September 2008

Binary Translator – Phase II

The Office Binary Translator to OpenXML for Word documents, which we developed during the first half of 2008, was already more than a prototype or proof of concept: a large number of binary Word documents could be translated to OpenXML without any loss of information, and in some cases our resulting documents were even better than the documents created using the converter integrated in Office 2007.

Although our translator is already quite mature, a number of more complex and less frequently used features are not yet translated, e.g.:

  • frame paragraphs lose their floating properties and are translated to normal paragraphs
  • OLE Objects are currently not supported and are lost after translation
  • SmartArt, Charts and Comments are currently not supported and are lost after translation
  • macros are currently not translated

In addition, some features have not yet been completely implemented:

  • only 55 of about 200 different shape types are currently supported
  • Track Changes: Due to its high complexity, the revision marking (track changes) feature is not yet completely implemented; however, paragraph and character property modifications are implemented
  • a number of bugs have been reported which are not yet fixed

We are going to tackle all these features and bugs in Phase II of the Binary Translator project.

Hopefully, our work will be made easier by the improved specification of the binary formats, which Microsoft released in June (we will keep you informed about our findings).

Our schedule: the project starts now, in September, and two intermediate milestones are planned for the beginning of October and the beginning of November. The final version is planned for the beginning of December. In addition to unit testing, we are going to run elaborate testing routines to ensure a stable, high-quality translator release.

Some weeks of hard coding and testing work lie ahead of us, so let’s roll up our sleeves and get it done …
