What’s new in Word Automation Services

This week’s post comes from Zeyad Rajabi, who helped create Word Automation Services in Office 2010 and has been driving improvements to the services in the new Office.

In Office 2010, we introduced a brand new SharePoint service called Word Automation Services, which allows developers to harness the capabilities of Word on the server. With it, developers can perform the following types of file operations:

  • Converting between document formats (e.g., DOC to DOCX)
  • Converting to fixed formats (e.g., PDF)
  • Updating the Table of Contents, the Table of Authorities, and index fields
  • Recalculating all field types
  • Importing “alternate format chunks”
  • Setting the compatibility mode of the document to the latest version or to previous versions of Word

We created this service because we wanted to help developers avoid the challenges of automating the Word client application as documented by this famous Knowledge Base article: http://support.microsoft.com/kb/257757.

In Office 2013, we’ve made improvements to Word Automation Services based on some great user feedback. We think these changes make the service even easier to use and allow it to accommodate additional scenarios.

Feedback from Office 2010 Developers

Word Automation Services was initially created for bulk file operation scenarios, so the service was optimized to perform file operations on many files at a time. These file operations were performed asynchronously based on a SharePoint Timer Job, which means service operations were only kicked off when the SharePoint Timer Job ran. As a result, Word Automation Services could start work no more often than once a minute. Developers who wanted a conversion to be kicked off synchronously were out of luck. The Timer Job-based design works well for bulk operations, but it is not ideal for scenarios involving a small number of documents.

Additionally, we heard from customers that requiring files to exist physically on SharePoint in order to consume, create, or edit them via Word Automation Services was limiting. This requirement meant a developer always had to work within the context of SharePoint when taking advantage of the service; the only way to deal with files outside of SharePoint was to first get those files onto SharePoint. In addition, there were several scenarios where the output of the service was not the final output of the solution. In those cases, a developer was forced to manually move the intermediate files created by the service. Developers didn’t want the extra performance cost of hitting the SharePoint content database more than necessary.

On demand file operations

As part of SharePoint 2013, you will now be able to create on demand file operation requests to Word Automation Services. These requests are processed immediately and have higher priority than traditional asynchronous Timer Job-based requests. These on demand file operation requests do not depend on the SharePoint Timer Job. Think of these on demand file operation requests as synchronous Word Automation Services requests. On demand file operation requests can only be made for one file at a time as opposed to the existing Timer Job-based requests, which can handle many files at a time.

Word Automation Services can handle asynchronous and synchronous file operation requests at the same time. It maintains two separate queues: one for on demand (immediate) file operation requests and one for SharePoint Timer Job-based requests. Word Automation Services pauses all Timer Job-based requests whenever there is at least one on demand request, and the Timer Job-based requests resume once all on demand requests have been processed. This prioritization allows the service to respond to on demand requests more quickly. To prevent on demand requests from completely starving Timer Job-based requests, we periodically let Timer Job-based requests be processed ahead of on demand requests.
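To make the prioritization concrete, here is a small, self-contained model of a two-queue scheduler. It is only a sketch: the class name, the method names, and the rule "after N on demand requests are served while Timer Job-based work is waiting, serve one Timer Job-based request" are all assumptions made for illustration. The post does not describe how the service actually decides when to let Timer Job-based requests through.

```csharp
using System;
using System.Collections.Generic;

// Illustrative model only. The names and the streak-based fairness rule are
// assumptions, not the actual Word Automation Services implementation.
public sealed class TwoQueueScheduler
{
    private readonly Queue<string> _onDemand = new Queue<string>();
    private readonly Queue<string> _timerJob = new Queue<string>();
    private readonly int _maxOnDemandStreak;
    private int _streak; // on demand requests served while Timer Job work was waiting

    public TwoQueueScheduler(int maxOnDemandStreak)
    {
        if (maxOnDemandStreak < 1)
            throw new ArgumentOutOfRangeException(nameof(maxOnDemandStreak));
        _maxOnDemandStreak = maxOnDemandStreak;
    }

    public void EnqueueOnDemand(string request) { _onDemand.Enqueue(request); }
    public void EnqueueTimerJob(string request) { _timerJob.Enqueue(request); }

    // Returns the next request to process, or null when both queues are empty.
    public string Next()
    {
        bool timerWaiting = _timerJob.Count > 0;
        bool onDemandWaiting = _onDemand.Count > 0;

        // Timer Job-based work runs when nothing is waiting on demand, or when
        // the starvation guard says it has waited through a full streak.
        if (timerWaiting && (!onDemandWaiting || _streak >= _maxOnDemandStreak))
        {
            _streak = 0;
            return _timerJob.Dequeue();
        }

        if (onDemandWaiting)
        {
            // Only count the streak while Timer Job-based work is being held back.
            _streak = timerWaiting ? _streak + 1 : 0;
            return _onDemand.Dequeue();
        }

        return null;
    }
}
```

With a streak limit of 2, three on demand requests (A, B, C) and two Timer Job-based requests (T1, T2) queued up front are served in the order A, B, T1, C, T2: on demand requests go first, but T1 is not held back indefinitely.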

[Diagram: Word Automation Services 2013 architecture]

Stream support

In addition to on demand file operation requests, Word Automation Services now supports streams. You are no longer limited to working on files stored in SharePoint libraries. Using streams, you can use Word Automation Services functionality on files stored outside of SharePoint.

You will only be able to use streams with Word Automation Services when using on demand file operation requests. In other words, streams will not work with Timer Job-based requests.

Code comparison

We tried to make coding Word Automation Services solutions as easy as possible. With only a few lines of code, you’ll be able to integrate Word Automation Services into your solution. Here is sample code for a Timer Job-based request:

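using Microsoft.Office.Word.Server.Conversions;

// Queue a DOCX-to-PDF conversion under the current user's identity.
ConversionJob pdfJob = new ConversionJob("Word Automation Services");
pdfJob.UserToken = myWebsite.CurrentUser.UserToken;
pdfJob.AddFile(outputFilename, outputFilename.Replace(".docx", ".pdf"));
pdfJob.Start(); // adds the request to the Timer Job queue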

In the above code sample, the conversion request is triggered when the Start() method is invoked. At that point, the request is added to the Timer Job queue, where it waits for the SharePoint Timer Job to start it.

Working with on demand file operation requests is very similar. Instead of creating a ConversionJob object, you create a SyncConverter object. The SyncConverter object operates on one file at a time, and each request is processed immediately. Here is sample code for an on demand request:

// Convert the contents of inStream to PDF immediately and write the result to outStream.
SyncConverter syncConv = new SyncConverter("Word Automation Services");
syncConv.Settings.OutputFormat = SaveFormat.PDF;
ConversionItemInfo convInfo = syncConv.Convert(inStream, outStream);
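To tie this back to the stream support described earlier, the sketch below converts a file that lives outside SharePoint and checks the result. Treat it as an illustration under assumptions rather than a reference implementation: the class and method names and both paths are hypothetical, the service application is assumed to be named "Word Automation Services", and the code is assumed to run on a SharePoint 2013 server with a valid SharePoint context. It cannot run on a machine without SharePoint.

```csharp
using System;
using System.IO;
using Microsoft.Office.Word.Server.Conversions;

public static class StreamConversionSketch
{
    // Hypothetical helper: converts a .docx on the local file system to PDF
    // without first storing the document in a SharePoint library.
    public static void ConvertLocalFile(string docxPath, string pdfPath)
    {
        // Assumption: the service application uses this name.
        SyncConverter converter = new SyncConverter("Word Automation Services");
        converter.Settings.OutputFormat = SaveFormat.PDF;

        using (FileStream input = File.OpenRead(docxPath))
        using (MemoryStream output = new MemoryStream())
        {
            // On demand request: the call returns once the conversion finishes.
            ConversionItemInfo info = converter.Convert(input, output);

            if (!info.Succeeded)
            {
                throw new InvalidOperationException(
                    "Conversion failed: " + info.ErrorMessage);
            }

            // Write the PDF only after the conversion is known to have succeeded.
            File.WriteAllBytes(pdfPath, output.ToArray());
        }
    }
}
```

Buffering the output in a MemoryStream means a failed conversion never leaves a partial PDF on disk. For very large documents, writing directly to a FileStream may be the better trade-off.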

Get more out of Word Automation Services

Word Automation Services does not cover every file operation scenario. Take, for example, a solution that needs to merge and modify content within a document. The service was not created to be a replacement for the Word client object model. Instead, the service is one half of a replacement for the existing object model, the other half being the Open XML SDK.

The Open XML SDK was designed to handle tasks that don’t require application logic and layout, such as inserting or deleting content (paragraphs, tables, pictures), inserting data from other data sources, sanitizing content (removing content, accepting tracked changes), etc.
Word Automation Services was designed to handle file operations that do require application logic and layout, such as reading and laying out all Word document formats, converting to and from different file formats, recalculating dynamic fields, etc. These two pieces can be used together to enable rich, end-to-end solutions that never require automating Word client applications. Check out Brian Jones’ blog for more details and examples of using the Open XML SDK.

I hope the improvements mentioned in this post will make it even easier for you to use Word Automation Services in your solutions. Tell us what you think in the comments below the post.

The Word Automation Services feature crew, 2/3 of which is shown here, is excited to share these improvements with you.