Below is an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Self-training with Weak Supervision [Code]
State-of-the-art deep neural networks require large-scale labeled training data that is often either expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be…
Approximate nearest neighbor Negative Contrastive Estimation (ANCE)
A novel embedding training algorithm that leverages ANN search and achieves state-of-the-art retrieval on the TREC DL 2019 and OpenQA benchmarks.
Verified DICE for STM32H7 Microcontrollers
This repository contains a Verified Boot implementation for the STM32H7 devices (specifically STM32H753ZI, STM32H743ZI) for the paper. The implementation contains DICE* code generated from the dice-star repository and implements the Hardware Abstraction Interface. The cmake…
Rajasthani Hindi Speech Data
This dataset consists of audio recordings of participants reading out stories in Rajasthani Hindi, one sentence at a time. We had 98 participants from Soda, Rajasthan. Each participant read 30 stories. In total, we have…
Bing Coronavirus Query Set
A dataset of aggregated and anonymized queries with Coronavirus intent from across the world. This dataset was curated from the Bing search logs (desktop users only) over the period of Jan 1st, 2020 – (Current Month…
InnerEye-DICOM-RT
InnerEye-DICOM-RT contains tools to convert medical datasets in NIfTI format to DICOM-RT. Datasets converted using this tool can be consumed directly by InnerEye-DeepLearning. Most of the work is done by a .NET Core 2.1 project…
SEED-Encoder
This release open-sources part of our recent research, “SEED-Encoder”. It includes the weights of the pretrained model and the code to add to our existing open-source repo ANCE…
CyberBattleSim
CyberBattleSim is an experimentation research platform to investigate the interaction of automated agents operating in a simulated abstract enterprise network environment. The simulation provides a high-level abstraction of computer networks and cyber security concepts. Its…
Synthetic data showcase
Generates synthetic data and user interfaces for privacy-preserving data sharing and analysis. In many cases, the best way to share sensitive datasets is not to share the actual sensitive datasets, but user interfaces to derived…