Impact of controlled language on translation quality and post-editing in a statistical machine translation environment
This paper investigates the relationships among controlled language (CL), machine translation (MT) quality, and post-editing (PE). Previous research has shown that the use of CL improves the quality of MT. By extension, we assume that the use of CL will lead to greater productivity or reduced PE effort. The paper examines whether this three-way relationship among CL, MT quality, and PE holds. Beginning with a set of CL rules, we determine what types of CL rules have the greatest cross-linguistic impact on MT quality. We create two sets of English data, one which violates the CL rules and the other which conforms to them. We translate both sets of sentences into four typologically different languages (Dutch, Chinese, Arabic, and French) using MSR-MT, a statistical machine translation system developed at Microsoft. We measure the degree of impact of CL rules on MT quality based on the difference in human evaluation as well as BLEU scores between the two sets of MT output. Finally, we examine whether the use of CL improves productivity in terms of reduced PE effort, using character-based edit-distance.