Discover below an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
DIABLo: Deep individual-agnostic binaural localizer
Public release of sample videos to show the performance of a localization model (DIABLo). The model itself (or any related training or test data) will NOT be part of this release. We will only release…
CheckList NLI Data and Code Release
Similar to ID 3397 and ID 3860, the broad goal of this project is to research and develop neuro-symbolic systems for Natural Language Inference (NLI) to leverage the “correctness” guarantees and interpretability of symbolic…
HittER: Hierarchical Transformers for Knowledge Graph Embeddings [Code]
HittER generates embeddings for large-scale knowledge graphs and performs link prediction using a hierarchical Transformer model. It appeared at EMNLP 2021.
Protein sequence models
Codebase for generative modeling of protein sequence and structure, including code for CNNs and GNNs and custom data handling code.
CLUES: Few-Shot Learning Evaluation in Natural Language Understanding
This repo contains the data and the source code for the baseline models from the NeurIPS 2021 benchmark paper on the Constrained Language Understanding Evaluation Standard (CLUES), released under the MIT License.
Aerial Wildlife Detection
AIDE: Annotation Interface for Data-driven Ecology – Tools for detecting wildlife in aerial images using active learning
DiCE: A library for generating Diverse Counterfactual Explanations
DiCE is a Python library that can generate counterfactual explanations for any machine learning classifier. Counterfactual explanations present “what-if” perturbations of the input such that an ML classifier outputs a different class for those perturbations…
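To make the "what-if" idea concrete, here is a minimal sketch of a counterfactual search in plain Python. It is an illustration under stated assumptions, not DiCE's API or algorithm: the toy classifier `classify`, the feature names `income` and `debt`, and the helper `counterfactuals` are all hypothetical. It simply tries small perturbations of the input and keeps the closest ones that change the predicted class, and it does not attempt the diversity objective that DiCE itself targets.

```python
from itertools import product


def classify(income, debt):
    """Hypothetical toy classifier: approve (1) when income - 2 * debt >= 50."""
    return 1 if income - 2 * debt >= 50 else 0


def counterfactuals(income, debt, steps=range(-30, 31, 10), k=3):
    """Return up to k perturbed inputs whose predicted class differs from the
    original prediction, ordered by L1 distance from the original input.

    This is a brute-force grid search used only for illustration.
    """
    original = classify(income, debt)
    candidates = []
    for d_income, d_debt in product(steps, repeat=2):
        new_income, new_debt = income + d_income, debt + d_debt
        if new_income < 0 or new_debt < 0:
            continue  # skip infeasible (negative) feature values
        if classify(new_income, new_debt) != original:
            distance = abs(d_income) + abs(d_debt)
            candidates.append((distance, new_income, new_debt))
    candidates.sort()  # closest perturbations first
    return [(inc, dbt) for _, inc, dbt in candidates[:k]]


if __name__ == "__main__":
    # The original input (income=60, debt=10) is rejected by the toy classifier.
    print(classify(60, 10))
    # Each returned pair is a "what-if" input that would be approved instead.
    print(counterfactuals(60, 10))
```

Each returned pair reads as a statement of the form "had the income or debt been this value, the outcome would have been different". A real library would typically also handle categorical features, feasibility constraints, and diversity among the returned examples.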