Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
ClueWeb 09 Labeled Near-Duplicate News Articles
This data release is a companion to the paper "Duplicate News Story Detection Revisited" by Omar Alonso, Dennis Fetterly, and Mark Manasse, published at the Ninth Asia Information Retrieval Societies Conference (AIRS 2013) in December…
C# package for language identification
This package implements several algorithms for language identification and includes two sets of pre-compiled language profiles. One set covers 52 languages and was trained on Wikipedia (i.e., a well-written corpus); the other covers 26 languages…
NoReplyAll Outlook Add-In
The primary function is to add buttons to several of the Outlook ribbons to prevent people from doing a reply-all to your message, or forwarding it (using a facility built into Outlook & Exchange which is…
JPEG XR HttpModule for IIS
The JPEG XR HttpModule for IIS enables websites to transparently take advantage of the JPEG XR image format by automatically redirecting requests for JPEG and PNG images to a JPEG XR version (if one exists).…
Entailment: An Effective Metric for Comparing and Evaluating Hierarchical and Non-hierarchical Annotation Schemes
Hierarchical or nested annotation of linguistic data often co-exists with simpler non-hierarchical or flat counterparts, a classic example being that of annotations used for parsing and chunking. In this work, we propose a general strategy…