Friday 26 June 2009
Phase III Reloaded
By makz, Friday 26 June 2009 at 10:44 :: posted to General
After the “final” milestone of Phase III had already been released, we decided to add two more milestones.
b2xTranslator Team Blog
Thursday 7 May 2009
By Wolfgang Keber, Thursday 7 May 2009 at 08:21 :: posted to General
Phase III of the b2xTranslator project resulted in fairly mature tools for translating Microsoft binary Office documents into OpenXML. The main purpose of this project was certainly to demonstrate that it is feasible to develop such translators using only the available file format specifications, both for the binary formats and for OpenXML.
But you can find some more goodies in this project:
Is this now the end of the line? No, I don’t think so. Apart from a number of features still missing from the Excel and PowerPoint translators, such as chart translation or pivot tables, we have more ideas for enhancing the b2xTranslator, e.g.
These are just some of our ideas for a future version.
What about your input? We would appreciate your feedback in our discussions on how Phase IV of the b2xTranslator should be shaped.
Thursday 26 March 2009
By Wolfgang Keber, Thursday 26 March 2009 at 14:56 :: posted to General
All scheduled features have now been implemented in Phase III/Milestone 2 of the b2xTranslator:
The detailed feature mapping description can be found here.
Since the binary translator for Word documents (doc2x) was already quite complete, in Phase III we mainly concentrated on high-priority bug fixes.
In the coming weeks we will focus on testing and code stabilization in order to release a stable, production-quality version in April. You will then also find an updated b2xTranslator web site on SourceForge, including the test reports for the translators and updated documentation on the binary Excel and PowerPoint file formats and on the xls/xlsx and ppt/pptx mappings.
And we are already paving the way for another phase of the b2xTranslator with some exciting new features …
Stay tuned!
Friday 27 February 2009
By Wolfgang Keber, Friday 27 February 2009 at 14:17 :: posted to General
Time flies when you're having fun developing such awesome software. I can't believe that it's already time to release M1 of Phase III.
The ppt2x and xls2x translators are now in quite good shape, i.e. you can already run some serious translation jobs. Nevertheless, there is still some room for improvement (stay tuned for M2!).
Another nice feature of M1 is the single setup procedure (including context menu registration) for all the translators.
Have some fun with the new release, too!
Wednesday 21 January 2009
By Wolfgang Keber, Wednesday 21 January 2009 at 16:21 :: posted to General
Previous releases of the b2xTranslator were mainly focused on the translation from doc to docx (doc2x) while the translators for xls to xlsx (xls2x) and ppt to pptx (ppt2x) have been neglected a bit.
This is going to change in Phase III of the b2xTranslator project: the doc2x translator is quite mature now (nevertheless, some of the high-priority bugs reported on SourceForge are going to be fixed in Phase III); consequently, the two other translators, xls2x and ppt2x, will benefit from new feature implementations.
With this in mind, the project scope centers on the following topics:
While Microsoft provides a migration path from binary Office formats to OpenXML with Office 2007 and, for earlier Office versions, the File Format Compatibility Pack, the b2xTranslator project is still necessary for the following reasons:
There is also some other very interesting news coming from Microsoft's document format teams: they've published another set of document-format implementation notes, this time for the ECMA-376 1st Edition implementation in Office 2007 SP2. As with the ODF 1.1 implementation notes published in December, the goal of publishing these notes is to help other implementers improve interoperability with Office by transparently documenting the details of Microsoft's implementation.
To get to the ECMA-376 implementer notes, go to the DII home page and click on Reference and then select ECMA-376 1st Edition from the dropdown list. You'll then see a treeview control in the panel on the left, which contains the entire structure of the ECMA-376 spec.
Also check Doug's blog for more information ...
Friday 5 December 2008
By Wolfgang Keber, Friday 5 December 2008 at 08:46 :: posted to General
Phase II of the binary translator for Word documents (doc to docx) has been finished. The latest version has been extensively tested and offers a number of new features as described in the previous blog entries.
Let me thank all internal and external contributors who helped to improve the translation quality. In particular, I was amazed and happy at the same time that some of you really looked deep into our code and identified quite a few issues. Thanks again!
We are now looking forward to improving the two other binary translators as well: xls to xlsx and ppt to pptx. We are currently planning the feature scope of the next release and hope you will follow our work and contribute to it as you did for the Word translator. For example, feature requests can be submitted via the SourceForge tracking system.
Wednesday 5 November 2008
By Wolfgang Keber, Wednesday 5 November 2008 at 14:05 :: posted to General
M2 is a big leap forward. Let me describe some of the highlights.
Twofold Interoperability
Of course, the main purpose of the binary translator is to provide interoperability between binary (legacy) Word documents and the new OpenXML world. However, it is also interoperable at the platform level: the binary translator is not bound to Windows; it runs on any platform supporting Mono.
Cross Platform Interoperability -- Use Case
To mention just one use case: suppose you prefer to run Linux on your servers and want to implement a service there that translates doc to docx. No problem: just run the binary translator with Mono.
Broadening the Scope
The binary translator translates not only your documents from doc to docx but also your templates from dot to dotm. Such a translation includes:
Of course, the internal document type and the file extension are set properly, i.e. the translation of a doc file containing macros results in a docm file (and not in a docx file).
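The naming rule above can be sketched as a small lookup. This is a minimal illustration in Python, not the translator's own code (the b2xTranslator runs on .NET/Mono); the function names are hypothetical, and both the `.dotx` case for macro-free templates and the main-part content types come from the standard OpenXML package conventions rather than from this post.

```python
from pathlib import Path

# Main-part content types defined by the OpenXML package conventions.
MAIN_CONTENT_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml",
    ".docm": "application/vnd.ms-word.document.macroEnabled.main+xml",
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template.main+xml",
    ".dotm": "application/vnd.ms-word.template.macroEnabledTemplate.main+xml",
}


def target_extension(source: str, has_macros: bool) -> str:
    """Pick the OpenXML extension for a binary Word document or template."""
    suffix = Path(source).suffix.lower()
    if suffix == ".doc":
        return ".docm" if has_macros else ".docx"
    if suffix == ".dot":
        return ".dotm" if has_macros else ".dotx"
    raise ValueError(f"not a binary Word document or template: {source}")


def target_name_and_type(source: str, has_macros: bool) -> tuple[str, str]:
    """Return the output file name and the matching main-part content type."""
    ext = target_extension(source, has_macros)
    return str(Path(source).with_suffix(ext)), MAIN_CONTENT_TYPES[ext]


if __name__ == "__main__":
    print(target_name_and_type("report.doc", has_macros=True))
```

The point of pairing the extension with the content type is that both must agree: renaming a file to .docm without switching the internal content type would not make it a macro-enabled document.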
What Else?
Most of the defects reported on SourceForge are fixed (the remaining ones will follow in the near future, see Next Steps below :-). The shape implementation is now complete (except for shapes that won’t be supported, such as the Action* buttons).
The “StructuredStorage” library which was the basic component for reading structured storage files is now also able to create such storages. This extension was necessary for creating macros and OLE objects in the OpenXML documents.
Field handling, in particular for form fields, was improved and revision marks (track changes) and comments are now translated.
An installation routine (MSI) makes it easier for you to install the binary translator under Windows.
Next Steps
Some features still need some more fine-tuning, e.g. charts and SmartArts. We will certainly address this in November. In addition, we are planning to test all the new features of the binary translator extensively before making the final release available at the beginning of December.
Please let us know your feature requests and feedback.
Have fun!
Wednesday 1 October 2008
By Wolfgang Keber, Wednesday 1 October 2008 at 13:11 :: posted to General
Today we have uploaded the first milestone release of Phase II – on schedule :-)
You can download the executables and sources as usual from the download area on SourceForge. Some more information is available on the project web site under Supplementary Downloads:
These "convert & open" tests have proven once again to be a valuable tool for detecting problems: a test run yesterday resulted in an unacceptable error rate of 25%. The analysis revealed that most of the problems were related to the newly implemented comment translation feature (the binary file format specification is not very clear here; we will analyse this in more detail and report on it). The issue could easily be remedied, and another "convert & open" test run today resulted in an error rate of only 2% (the remaining erroneous documents will be analysed and the problems fixed in the coming days).
In a nutshell, M1 is a small but quite important interim release. We will continue our efforts to implement outstanding features and fix the problems found.
Stay tuned for future releases!
Monday 22 September 2008
By makz, Monday 22 September 2008 at 09:53 :: posted to General
Three weeks have passed since we started work on the next doc2x build, and we have already implemented some core features of Phase II. So let’s have a look at the current state of development:
But there is still something to do before the final release in December: currently we are tackling the conversion of OLE objects, charts and macros. They are stored as several “Structured Storage Streams” in the binary file format. These streams need to be bundled into a “Structured Storage File” in the “OpenXml” archive.
So one of the main goals for next week will be to extend our “StructuredStorageReader” library and turn it into a “StructuredStorageWriter” ;) If you want to know more about OLE objects, charts and macros, you might be interested to hear that we have just added new documentation to our How To Guides section.
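For readers curious about what a “Structured Storage File” looks like on disk, here is a minimal sketch, assuming the compound file header layout from Microsoft's [MS-CFB] specification: an 8-byte signature followed, at offset 0x18, by version, byte-order and sector-size fields. The helper names are hypothetical and are not part of our StructuredStorage library; the builder only fills the fixed prefix fields, so a real writer would still have to emit the FAT, directory and stream sectors.

```python
import struct

# Every compound file ("structured storage") starts with this 8-byte signature.
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
HEADER_SIZE = 512


def parse_cfb_header(header: bytes) -> dict:
    """Validate the signature and decode the fixed fields at offsets 0x18-0x21."""
    if len(header) < HEADER_SIZE:
        raise ValueError("a compound file header is 512 bytes long")
    if header[:8] != CFB_SIGNATURE:
        raise ValueError("not a structured storage (compound) file")
    # Five little-endian uint16 values: minor version, major version,
    # byte order mark, sector shift and mini sector shift.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", header, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte order mark")
    return {
        "major_version": major,
        "minor_version": minor,
        "sector_size": 1 << sector_shift,     # 512 for version 3, 4096 for version 4
        "mini_sector_size": 1 << mini_shift,  # mini sector shift is 6, i.e. 64 bytes
    }


def build_cfb_header_prefix(major: int = 3) -> bytes:
    """Writer side: fill only the fixed prefix fields of a 512-byte header.

    FAT, DIFAT and directory fields stay zero, so this is NOT a complete file.
    """
    if major not in (3, 4):
        raise ValueError("only versions 3 and 4 are defined")
    header = bytearray(HEADER_SIZE)
    header[:8] = CFB_SIGNATURE
    sector_shift = 9 if major == 3 else 12
    struct.pack_into("<5H", header, 0x18, 0x003E, major, 0xFFFE, sector_shift, 6)
    return bytes(header)
```

Round-tripping `build_cfb_header_prefix()` through `parse_cfb_header()` is a cheap sanity check of the kind that helps when turning a reader into a writer.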
Finally, I can tell you that we are going to release the first milestone of Phase II at the beginning of October, as planned.
So long!
Friday 12 September 2008
By Wolfgang Keber, Friday 12 September 2008 at 00:21 :: posted to General
The Office Binary Translator to OpenXML for Word documents, which we developed during the first half of 2008, was already more than a prototype or proof of concept: a large number of binary Word documents could be translated to OpenXML without any loss of information, and in some cases our resulting documents were even better than the documents created using the converter integrated in Office 2007.
Although our translator is already quite mature, a number of more complex and less frequently used features are not yet mapped/translated, e.g.
In addition, some features have not yet been completely implemented:
We are going to tackle all these features and bugs in Phase II of the Binary Translator project.
Hopefully, our work will be facilitated by the improved specification of the binary formats which Microsoft released in June (we will keep you informed about our findings).
Our schedule: the project starts now, in September, and two intermediate milestones are planned for the beginning of October and the beginning of November. The final version is planned for the beginning of December. In addition to unit testing, we are going to carry out elaborate testing routines to guarantee a stable, high-quality translator release.
Some weeks of hard coding and testing work are waiting for us – let’s roll up the sleeves and get it done …
Thursday 10 July 2008
By Wolfgang Keber, Thursday 10 July 2008 at 01:56 :: posted to General
Microsoft did a good job in disclosing the binary Office file format specifications in February 2008 (see http://www.microsoft.com/interop/docs/officebinaryformats.mspx). Everyone can now use this information to build tools that access existing content in binary Office documents and convert it to another format (e.g. OpenXML) or use it in some other way.
I put the word everyone in italics for a good reason: from a legal point of view everyone can actually do it, no question (see also Microsoft’s Open Specification Promise http://www.microsoft.com/interop/osp/default.mspx).
However, be aware! You really need some patience and persistence, including a good amount of willingness to struggle through all the bits and bytes which define the contents and layout of binary Office documents. Sometimes we brooded for many hours over the hex dump of a Word document on one side and Microsoft’s (cryptic :-) February release of the specification on the other, until we understood the intricacies of a binary substructure such as the PICF structure.
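To give a flavour of this work, the sketch below reads the first fields of a PICF header from raw bytes. It assumes the layout as we read it in [MS-DOC]: a signed 32-bit `lcb` (size of the PICF plus the picture data that follows), an unsigned 16-bit `cbHeader` that must be 0x44, and then the `mm` field of the embedded MFPF. The function and the `payload_size` key are hypothetical names for illustration; please check the field details against the current specification.

```python
import struct

PICF_HEADER_SIZE = 0x44  # cbHeader value required by the spec (68 bytes)


def read_picf_prefix(data: bytes, offset: int = 0) -> dict:
    """Decode lcb, cbHeader and MFPF.mm from a PICF starting at `offset`."""
    if len(data) < offset + PICF_HEADER_SIZE:
        raise ValueError("buffer too short for a PICF header")
    # Little-endian: int32 lcb, uint16 cbHeader, uint16 mm (first field of the MFPF).
    lcb, cb_header, mm = struct.unpack_from("<iHH", data, offset)
    if cb_header != PICF_HEADER_SIZE:
        raise ValueError(f"cbHeader is {cb_header:#x}, expected 0x44")
    return {
        "lcb": lcb,
        "cbHeader": cb_header,
        "mm": mm,
        # Number of bytes that follow the fixed header (the picture data etc.).
        "payload_size": lcb - cb_header,
    }
```

Checking a "MUST" value such as `cbHeader` early is a cheap way to notice that you are reading from the wrong offset before misinterpreting the rest of the structure.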
But don’t be too afraid of this. There is some good help for you:
The translator from doc to docx is already quite mature and only a few feature mappings are missing. The two other translators (ppt to pptx and xls to xlsx) are more in a proof-of-concept phase and certainly need some improvements.
We are currently discussing and planning how this project can evolve in the future. If you have some special requirements or ideas, don’t hesitate to contact us.
Enjoy the M2 release and stay tuned for our plans for the future!