Anand Chakravarty has been an SDET on the Machine Translation team for the past 2.5 years, has been at Microsoft for 8 years, and was the first product tester on the MT team (and is “still having fun with testing MT :-)”). Today’s guest blog is about testing translation quality.
One of the first questions that comes to mind, when talking about verifying the quality of a translation system, is: how do you measure the quality, or, to be precise, the accuracy of a translation? Translating between human languages using computers is a field that is almost half a century old. The area is challenging enough that even the best machine translation systems currently available do not come close to linguistic quality that would be entirely satisfactory.
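To make the measurement question concrete: automated metrics such as BLEU score a candidate translation by its word overlap with a human reference translation. As a rough illustration (not the metric our team uses, which the post does not specify), here is a minimal clipped unigram-precision function, a much-simplified cousin of BLEU:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate words that also appear in the reference.
    Each reference word may be matched at most once ("clipping")."""
    cand_words = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    if not cand_words:
        return 0.0
    matches = 0
    for word in cand_words:
        if ref_counts[word] > 0:
            matches += 1
            ref_counts[word] -= 1  # consume this reference word
    return matches / len(cand_words)

# Three of four candidate words match the reference.
print(unigram_precision("the note was wrong", "the note was incorrect"))  # 0.75
```

Real metrics add n-gram matching, brevity penalties, and multiple references, but the core idea, scoring a system's output against trusted human translations, is the same.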
Part of the challenge lies in the many different data points that humans process in order to understand the meaning of spoken or written text. There is the syntax, the parsing, the semantics, the context, the disambiguation, the reordering, all of which, and more, go into understanding a sentence. And this is just the sentence in one language. Now consider applying all of that to rebuild the sentence in another language and make it equally meaningful.
Some examples might help to make this point clearer. The term ‘Olympics 2008’ is fairly unambiguous. Similarly, one might expect the term ‘Elections 2008’ to mean the presidential elections in the USA. However, if the user is from, say, Canada, it would more likely refer to the local elections there.
A more general, and hence more common, example is a sentence like ‘The note was wrong’. Is the word ‘note’ a reference to an informative message or to a musical term? The proper translation depends upon context. Use more context, and your chances of producing an accurate translation improve. This, however, comes at a cost: the more context the system tries to obtain, the slower its performance. Smart shipping decisions involve striking the right balance between improving the accuracy of translation and delivering a workable translation result to users. Of course, both are important; the key is to understand where to direct improvement efforts, depending on how useful the end result is to the user.
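One way to think about this accuracy-versus-latency trade-off is as a deadline with a fallback. The sketch below is purely illustrative and not how Microsoft Translator is implemented: `translate_with_context` and `translate_fast` are hypothetical stand-ins for a slow context-aware path and a quick context-free path, and the time budget decides which answer the user actually sees.

```python
import concurrent.futures
import time

def translate_fast(sentence: str) -> str:
    """Hypothetical low-latency path: little or no context, lower accuracy."""
    return f"[fast] {sentence}"

def translate_with_context(sentence: str, context: str) -> str:
    """Hypothetical slower path that consults surrounding text to disambiguate."""
    time.sleep(0.5)  # simulate the extra work of gathering and using context
    return f"[contextual] {sentence}"

def translate_with_budget(sentence: str, context: str, budget_s: float) -> str:
    """Try the high-quality path; fall back to the fast one if it misses the deadline."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(translate_with_context, sentence, context)
        try:
            return future.result(timeout=budget_s)
        except concurrent.futures.TimeoutError:
            return translate_fast(sentence)
    finally:
        pool.shutdown(wait=False)  # don't block the caller on the abandoned attempt

print(translate_with_budget("The note was wrong", "a discussion of a musical score", budget_s=0.1))
```

With a tight budget the user gets the fast (possibly less accurate) translation immediately; with a generous budget they get the context-aware one. The product decision is where to set that budget.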
This becomes particularly interesting when translating documents or web pages, instead of just individual sentences. Let us say a translation request has been received for a web page containing 100 sentences. Depending on the architecture of the translation system, these sentences could all go to one process, or be distributed across multiple processes and machines. Either way, the time taken to translate the page in its entirety is at least the time taken to translate the slowest sentence. How long do we spend translating a sentence before that invested time becomes detrimental to the user’s experience? In pursuit of the best translation, we might end up blocking the user from getting anything informative in response to their translation request. The utility of the system is thus governed by decisions that balance linguistic quality against application performance.
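The "slowest sentence dominates" point can be sketched in a few lines. This is an illustration only, with `translate_sentence` as a fake stand-in for a real MT call whose latency varies with sentence length; when the sentences fan out across a worker pool, the page is ready when the last (slowest) sentence finishes, not after the sum of all the individual times.

```python
import concurrent.futures
import time

def translate_sentence(sentence: str) -> str:
    """Stand-in for a real MT call; latency grows with sentence length."""
    time.sleep(0.01 * len(sentence.split()))
    return sentence.upper()  # placeholder "translation"

sentences = [
    "Short one.",
    "A somewhat longer sentence that takes more time to translate.",
    "A medium-length sentence here.",
]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=len(sentences)) as pool:
    translated = list(pool.map(translate_sentence, sentences))
elapsed = time.perf_counter() - start
# With one worker per sentence, `elapsed` tracks the slowest sentence,
# not the sum of all sentences.
print(translated)
```

Which is exactly why a single pathological sentence, given an unbounded time budget, can hold the entire page hostage.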
With the Microsoft Translator product, there is the additional feature of our Bilingual Viewer, something unique among publicly available translation products. It supports parallel text highlighting and synchronized scrolling, and presents the page(s) with progressive rendering. This adds another layer to what our users see, and consequently another layer to polish and finish.
In the coming weeks, we hope to bring you more details of the specific areas that were, and are being, tested to ship a top-quality translation system. Feel free to post any questions you have on this matter, something you always wanted to ask :-), in the Comments section.