Rajasthani Hindi Speech Data
This dataset consists of audio recordings of participants reading out stories in Rajasthani Hindi, one sentence at a time. We had 98 participants from Soda, Rajasthan. Each participant read 30 stories. In total, we have…
Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
A dataset containing aggregated and anonymized queries from across the world with coronavirus intent. This dataset was curated from Bing search logs (desktop users only) over the period of Jan 1st, 2020 – (Current Month…
InnerEye-DICOM-RT contains tools to convert medical datasets in NIFTI format to DICOM-RT. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning. Most of the work is done by a .NET Core 2.1 project…
This release open-sources part of our recent research, “SEED-Encoder”. It includes the weights of the pretrained model and the code to add to our existing open-source repo ANCE…
CyberBattleSim is an experimentation research platform to investigate the interaction of automated agents operating in a simulated abstract enterprise network environment. The simulation provides a high-level abstraction of computer networks and cyber security concepts. Its…
Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis. In many cases, the best way to share sensitive datasets is not to share the actual sensitive datasets, but user interfaces to derived…
Code-switching or code-mixing (CM) refers to the juxtaposition of linguistic units from two or more languages in a single conversation or sometimes even a single utterance. It is quite commonly observed in speech conversations of…
This repository provides a set of code samples illustrating how the VROOM Cross-Reality (XR) telepresence prototype system was assembled. We hope that it will help other researchers prototype similar XR telepresence experiences.
This is a quick start guide for the document ranking task in the TREC Deep Learning (TREC-DL) benchmark. If you are new to TREC-DL, then this repository may make it more…
This code was written to conduct the experiments on finding twin smooth integers that are published in an academic paper at Eurocrypt. The code does not contain any cryptographic algorithms, but can be used to find…