Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Impact of Controlled Language on Machine-Translation Quality and Post-Editing Efforts
Results from experiments conducted by Microsoft Research’s Machine Translation Incubation Team to investigate how using controlled language (good English) affects post-editing productivity and the overall quality of our statistical machine-translation system.
Microsoft Research Asia Chinese Word-Segmentation Data Set
A set of manually annotated Chinese word-segmentation data and specifications for training and testing a Chinese word-segmentation system for research purposes. The data was extracted from the People’s Daily, which we have licensed for commercial…
Conditional Maximum-Entropy Training Tool
This tool enables training and testing of maximum-entropy models using a general-feature file format. The tool also supports RProp and GIS as training algorithms.
Virtual Earth MapCruncher
MapCruncher lets users quickly convert existing maps into an online format that’s as fast and easy to use as Virtual Earth. PDF and raster maps can be converted in minutes just by clicking on corresponding…
NLP Data Sets for Comparative Study of Parameter-Estimation Methods
Data sets for comparative study of parameter-estimation methods for statistical natural-language processing.
HCRF Acoustic Model Trainer
Source code and scripts for acoustic model training for phonetic classification.