Microsoft Makes XML the File Format for the Next Version of Microsoft Office
June 01, 2005
Q&A: Senior Vice President Steven Sinofsky explains how making XML the default file format is likely to help customers cut costs for data storage and bandwidth, improve security and boost data recovery.

REDMOND, Wash., June 1, 2005 — When Microsoft announced support for XML in Microsoft Office 2000 seven years ago, many corporate computing customers were unfamiliar with the business value possible from a common data format capable of being understood across applications, platforms and the Internet. Today, with more than 300,000 developers building XML into their solutions, according to Microsoft estimates, times have changed.

Steven Sinofsky, Senior Vice President, Office

And they're about to change again. Just days ahead of Tech·Ed 2005, Microsoft today announced that it is adopting XML as the default file format for the next major version of its Microsoft Office software, currently code-named "Office 12." To understand why Microsoft is making this change and what it means to customers, software developers, and the industry at large, PressPass spoke with Steven Sinofsky, senior vice president, Office.

PressPass: A new file format for Microsoft Office is a big deal. What's the context for this change?

Sinofsky: Two weeks ago, [Microsoft Chairman and Chief Software Architect] Bill Gates laid out our vision of "The New World of Work." That vision brings together emerging trends that are familiar to almost everyone who uses a computer in the workplace: exponential growth in the volume of business information people have to manage--and in the challenge of gaining business insight from that information; 24-by-7 connectivity leading to 24-by-7 work demands; an explosion in the need to collaborate efficiently and securely with people anywhere in the world.

Microsoft Office 2003 did a great job of beginning to address new workplace challenges. But there's more we can do to give people even greater control over their information, their time, their jobs and their results. The next version of Microsoft Office software, code-named "Office 12," does this. The introduction of new default XML file formats, the Microsoft Office XML Open Formats, is one of our key innovations.

PressPass: Why is XML so important?

Sinofsky: XML enables companies to capture information so it can be repurposed and reused however and whenever the organization needs to use it, regardless of platform. Building on XML support in Microsoft Office, customers can improve data flow throughout their organizations. They can build customized business process and productivity solutions that help information workers make a greater impact on their business.

For example, information that individuals create or capture on their desktops now can be connected directly to key business processes via XML, streamlining the management of those processes and reducing the need to re-key information in separate systems. Think of a customer-service representative who now can respond to a customer issue using standard document components stored on a server, rather than having to retype an entire document.

Likewise, XML can unlock information currently stored in back-end systems, which can then be processed and re-purposed on the desktop in the Office applications with which people are already very familiar. For example, executives could analyze up-to-the-minute performance of their companies with a desktop analysis program that receives real-time updates from separate back-end databases for financial, sales, and inventory status.

PressPass: But this isn't the first integration of XML in Microsoft Office.

Sinofsky: That's right. We began XML support for Office with Office 2000, when we introduced XML-based document properties, and then continued with Office XP, when we introduced SpreadsheetML, a way to use XML with the Excel file format. In Office 2003, we introduced WordprocessingML, a way to use XML with the Microsoft Word file format, and Microsoft InfoPath 2003, an information-gathering program that is entirely based on XML. We also included support for XML-based Web services integration with several of our programs to ensure data could be easily transported into and out of the desktop applications to back-end systems.

PressPass: So what's new about the Microsoft Office XML Open Formats?

Sinofsky: The Microsoft Office XML Open Formats introduce significantly enhanced XML formats for Microsoft Word and Excel, and the first XML format for Microsoft PowerPoint. The formats use consistent, application-specific XML markup, are based entirely on XML, and use industry-standard ZIP compression technology.

The new formats improve file and data management, data recovery, and interoperability with line-of-business systems beyond what's possible with Office 2003 binary files. And any program that supports XML -- it doesn't have to be part of Office or even from Microsoft -- can access and work with data in the new file format. Because the information is stored in XML, customers can use standard transformations to extract or repurpose the file data.
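
As a rough illustration of that point, the following Python sketch reads text out of an XML part stored in a ZIP container using only general-purpose libraries. The part name (document.xml) and the tag names (document, paragraph) are hypothetical placeholders chosen for this example; the interview does not describe the actual schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical content: the real part names and tags are not given in the interview.
SAMPLE_XML = (
    b"<document>"
    b"<paragraph>Quarterly results</paragraph>"
    b"<paragraph>Next steps</paragraph>"
    b"</document>"
)


def build_sample_package():
    """Create an in-memory ZIP container holding one XML part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("document.xml", SAMPLE_XML)
    return buf.getvalue()


def extract_paragraphs(package, part_name="document.xml"):
    """Open the container, parse one XML part, and return its paragraph text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(part_name))
    return [p.text or "" for p in root.iter("paragraph")]


paragraphs = extract_paragraphs(build_sample_package())
print(paragraphs)
```

Nothing in the sketch depends on Office code; any language with ZIP and XML libraries could take the same approach.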

PressPass: Why is Microsoft doing this?

Sinofsky: The short answer is because these capabilities -- improved file and data management, improved interoperability, and a published file-format specification -- are exactly what customers have asked us for.

The slightly longer answer is what these capabilities do for our customers. For example, in the area of interoperability, the new format enables the building of archives of documents that can be used without Office code if required. And information created in Office can be integrated easily with back-office systems. So we're seeing a way to make Office more compelling to customers. Interoperability also means huge benefits to the larger software industry, since it enables other vendors to tap into Office documents and file formats, and have information contained in Office files flow more easily to and from third-party systems.

PressPass: Your customers and partners have a huge investment in the current Microsoft Office file formats. What will happen to that investment and how will you help them to move to "Office 12"?

Sinofsky: We made it a priority to ensure that customers and the industry at large can adopt "Office 12" with the least effort possible, benefit from its new file formats, and continue to gain maximum benefit from their existing Microsoft Office files. So, the first thing that flows from that effort is full backward compatibility with the versions of Microsoft Office that the vast majority of people and businesses are already using: Office 2000, Office XP, and Office 2003. Customers who use these versions can download an innovative, free patch we created that allows them to open, edit and save files using the new format from within their earlier versions of Office.

Next, the current .doc, .xls, and .ppt binary file formats will be fully compatible with "Office 12." People can save to these formats from "Office 12" without concern. When "Office 12" is installed, the default file formats can be set to whichever format a person chooses, which is particularly helpful in a managed desktop environment. This will help to ensure that people who use "Office 12" can continue to work with third-party solutions based on earlier versions of Office, as well as with their colleagues, suppliers, customers and others who haven't yet upgraded to "Office 12." In addition, documents will always be saved in the same format that they started in, which will make working with server-based documents or e-mail attachments in a workgroup setting completely seamless.

In the months leading up to the release, we'll provide more information about the new format--including drafts of the schema--to ensure that developers and IT pros can be prepared long before the product ships.

PressPass: How are you enabling the interoperability you described earlier?

Sinofsky: XML is inherently interoperable because it is a text-based standard that has been defined by the W3C. It can be consumed and created by a wide variety of tools already on the market today. We have used this standard as the foundation for the new Office XML Open Formats, which are open, published document formats. In addition, we are publishing the formats with a royalty-free license, so any customer or technology provider can use them in its own systems without financial consideration to Microsoft. This will ensure that everyone can create, access, and modify documents in these formats.

PressPass: Won't this make it easier for your competitors to copy Microsoft Office?

Sinofsky: Certainly this will make it easier for other developers to use our formats to build solutions that don't require Office. However, the ability of other technology providers to use the new file format to integrate their solutions with the Microsoft Office System is an important capability that the industry frequently requests. We feel it's to everyone's advantage to respond. Customers also know that the true value of a desktop application is not the format in which data is stored but the full breadth of capabilities offered by that application, along with the quality and security of the user experience that it provides.

PressPass: You mentioned new file- and data-management capabilities. What are they?

Sinofsky: A key benefit of the new format is substantially smaller file sizes -- up to 75 percent smaller than comparable Office 2003 files. This is one of the advantages we get from combining XML and ZIP technologies to store our files. Because XML is a text-based format that compresses very well, and the ZIP container supports compressing its contents, we are able to achieve these significant reductions in file size.

That compression means smaller Word documents, Excel spreadsheets, and PowerPoint presentations--although compression and decompression happen automatically and users are never asked to ZIP or UNZIP files. Imagine all the documents stored across an enterprise and you can see how the benefits of a smaller file size will add up quickly in reduced bandwidth to share them over the network and reduced storage requirements to archive them.
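
The effect of storing repetitive XML text in a compressed ZIP container can be seen with a small Python sketch. The sheet, row and cell tags and the sheet.xml part name are invented for illustration, and the sizes it prints say nothing about the actual savings for real Office files.

```python
import io
import zipfile


def zipped_size(xml_text, part_name="sheet.xml"):
    """Return the size in bytes of a ZIP container holding the XML text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(part_name, xml_text)
    return len(buf.getvalue())


# Hypothetical spreadsheet-like markup: the same tags repeat for every row.
rows = "".join(
    f"<row><cell>{i}</cell><cell>{i * 2}</cell></row>" for i in range(1000)
)
xml_text = f"<sheet>{rows}</sheet>"

raw_size = len(xml_text.encode("utf-8"))
packed_size = zipped_size(xml_text)
print(f"uncompressed: {raw_size} bytes, in ZIP container: {packed_size} bytes")
```

Because the same tag names recur on every row, the compressor has a great deal of redundancy to remove, which is the property of XML text described above.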

With more and more documents traveling as e-mail attachments or on removable storage, the chance that a network or storage failure will corrupt a document increases. So it's important that the new file formats also will improve data recovery--and since data is the lifeblood of most businesses, better data recovery has the potential to save companies tremendous amounts of money.

PressPass: How do the new file formats do that?

Sinofsky: In the new formats, each type of data within a file is segmented and stored separately. So, when one file component is corrupt, the remainder of the file will still open within the application. For example, if a chart were to become damaged, this would not prevent people from opening every other part of the document, without the chart. This is different from the binary formats used in older applications, where corruption of a particular piece of data could prevent the entire file from loading properly.

Also, for those parts that do become corrupt or damaged, Office applications can detect these defects and attempt to "fix" a document when it is opened by restoring the proper data structure to the content. Missing or improperly written XML data can be re-written to ensure that the files are compliant with the file format specification, and to improve the chances of opening the files correctly. And because the XML format is a text file, simply compressed, it's easier for any tool or person to recover information because the content is readily transparent.
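
A minimal Python sketch of that segmented approach follows, assuming a container whose parts are separate XML files. The part names (text.xml, chart.xml, styles.xml) are hypothetical, and the sketch only skips a damaged part; it does not attempt the repair step described above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package(parts):
    """Write each named part into an in-memory ZIP container."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def load_parts(package):
    """Parse every part independently; collect damaged parts instead of failing."""
    good, damaged = {}, []
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        for name in zf.namelist():
            try:
                good[name] = ET.fromstring(zf.read(name))
            except (ET.ParseError, zipfile.BadZipFile):
                damaged.append(name)
    return good, damaged


package = build_package({
    "text.xml": "<text><p>Body copy</p></text>",
    "chart.xml": "<chart><series>1,2,3</ser",  # simulated damage: truncated XML
    "styles.xml": "<styles><style name='Normal'/></styles>",
})

good, damaged = load_parts(package)
print("readable parts:", sorted(good))
print("damaged parts:", damaged)
```

Since each part is parsed on its own, the truncated chart part is reported as damaged while the text and style parts remain usable.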

PressPass: Security remains a key concern of companies. Do the Office XML Open Formats contribute to heightened information security?

Sinofsky: One area of feedback that was very clear from our customers is the need to identify and protect the sensitive information that they store within documents. Comments, tracked changes and document metadata are the types of information they don't want leaking outside their firewalls. The Office XML Open Formats store each type of data as a separate tag within the file, making it easy to detect and remove specific types of content. For example, the comments that are stored inside a document as part of a review can be detected and removed before the document travels outside the company. In fact, a developer could write a solution to ensure that Web pages that are about to be published do not contain documents with embedded comments.
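
The kind of solution a developer might write can be sketched in a few lines of Python. This is an illustration only: it assumes review comments are stored as standalone elements named comment, which is a hypothetical tag, since the interview does not give the real markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the element names are placeholders, not the real schema.
SAMPLE = (
    b"<document>"
    b"<p>Revenue rose 4 percent.</p>"
    b'<comment author="reviewer">Verify this figure.</comment>'
    b"</document>"
)


def strip_comments(xml_bytes, tag="comment"):
    """Remove every element with the given tag; return cleaned XML and a count."""
    root = ET.fromstring(xml_bytes)
    removed = 0
    # Snapshot the tree first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
                removed += 1
    return ET.tostring(root), removed


cleaned, removed = strip_comments(SAMPLE)
print(removed, "comment(s) removed")
print(cleaned.decode("utf-8"))
```

A publishing workflow could run a check like this on each document and block or clean any file for which the count is not zero.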

The Office XML Open Formats also help to improve security against documents with embedded code or macros. By default, the new Word, Excel and PowerPoint file formats will not execute embedded code. So, if a person receives an e-mail message with a Word document attached, he or she could open that attachment knowing that the document would not execute harmful code. The Microsoft Office XML Open Formats will include a special-purpose format with a separate file extension for files with embedded code, enabling IT staff to quickly identify files that contain such code.
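
Because files with embedded code carry their own extension, screening for them can be as simple as a filename check. The Python sketch below assumes placeholder extensions (.codedoc and .codesheet), which are invented for this example; the interview does not name the actual extensions.

```python
from pathlib import PurePath

# Hypothetical placeholders: the real extensions are not named in the interview.
CODE_ENABLED_EXTENSIONS = {".codedoc", ".codesheet"}


def files_with_embedded_code(filenames, flagged=CODE_ENABLED_EXTENSIONS):
    """Return the filenames whose extension marks them as containing code."""
    return [
        name for name in filenames
        if PurePath(name).suffix.lower() in flagged
    ]


attachments = ["budget.sheet", "report.CODEDOC", "notes.txt"]
flagged_files = files_with_embedded_code(attachments)
print(flagged_files)
```

The comparison is case-insensitive so that a renamed or upper-cased extension is still caught.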

Of course, our customers are already less vulnerable to macro-based security attacks when they use Microsoft Office, thanks to the improvements that have been made to Outlook starting with Outlook 98, as well as the security measures around Visual Basic for Applications (VBA) starting with Office 97 and the introduction of digitally signed macros in Office 2000.

PressPass: A final thought?

Sinofsky: We're about to release new technology that has the potential to make a hugely positive impact on workers' effectiveness and productivity without requiring a minute of additional training on their part. We're very excited about that.
