Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
The use of Melodic Scales in Bollywood Music: An Empirical Study
Hindi film music, which is commonly referred to as Bollywood music, is one of the most popular forms of music in the world today. One of the reasons for its popularity has been the willingness…
Image Cropping Dataset
The Image Cropping Dataset contains the cropping parameters for 1000 images that were manually cropped by an experienced photographer. The cropping parameters give the coordinates of the upper-left and lower-right corners of the crop box.…
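The excerpt does not state the file format, the coordinate origin, or whether the lower-right corner is inclusive. The following is a minimal sketch of how a crop box defined by its upper-left and lower-right corners could be applied to an image. It assumes integer pixel coordinates with the origin at the image's upper-left corner and an exclusive lower-right corner (matching Python slicing). The `CropBox` type and the helper functions are hypothetical and are not part of the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    """Hypothetical record for one crop.

    (left, top) is the upper-left corner and (right, bottom) the lower-right
    corner, in pixels. Assumption: right/bottom are exclusive, as in slicing.
    """
    left: int
    top: int
    right: int
    bottom: int

    def validate(self, width: int, height: int) -> None:
        """Raise ValueError if the box is empty or extends past the image."""
        if not (0 <= self.left < self.right <= width
                and 0 <= self.top < self.bottom <= height):
            raise ValueError(f"{self} does not fit a {width}x{height} image")

    @property
    def size(self) -> tuple[int, int]:
        """(width, height) of the cropped region."""
        return (self.right - self.left, self.bottom - self.top)


def apply_crop(image: list[list[int]], box: CropBox) -> list[list[int]]:
    """Crop a row-major image (indexed image[y][x]) to the given box."""
    height = len(image)
    width = len(image[0]) if height else 0
    box.validate(width, height)
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def kept_area_fraction(box: CropBox, width: int, height: int) -> float:
    """Fraction of the original image area that the crop keeps."""
    w, h = box.size
    return (w * h) / (width * height)


if __name__ == "__main__":
    # A toy 4-wide, 3-tall "image" whose pixel value encodes 10*y + x.
    image = [[10 * y + x for x in range(4)] for y in range(3)]
    box = CropBox(left=1, top=0, right=3, bottom=2)
    print(apply_crop(image, box))  # [[1, 2], [11, 12]]
```

If the dataset instead uses inclusive corners, adding 1 to `right` and `bottom` before slicing would give the equivalent result.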
MSR Identity Toolbox (With Binaries)
This is the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. Version 1.0…
MSR Identity Toolbox (Without Binaries)
This is the MSR Identity Toolbox: A MATLAB toolbox for speaker-recognition research. This toolbox contains a collection of MATLAB tools and routines that can be used for research and development in speaker recognition. Version 1.0…
Powergrading Short Answer Grading Corpus
This corpus contains the original data analyzed in the following paper: Basu, Jacobs, and Vanderwende, “Powergrading: a Clustering Approach to Amplify Human Effort for Short Answer Grading,” Transactions of the ACL, 2013. It consists of…
ClueWeb 09 Labeled Near-Duplicate News Articles
This data release is a companion to the paper "Duplicate News Story Detection Revisited" by Omar Alonso, Dennis Fetterly, and Mark Manasse, published at the Ninth Asia Information Retrieval Societies Conference (AIRS 2013) in December…