Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Web Demonstration and Explanation Dataset
This data was collected for and used in our ACL 2020 paper that demonstrates the potential to effectively combine explanations and demonstrations to learn web-based procedures. This data consists of 520 explanations and corresponding demonstrations…
SPLASH: Semantic Parsing with Language ASsistance from Humans
SPLASH is a dataset for the task of semantic parse correction with natural language feedback. The task, the dataset, and baseline results are presented in: Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback, Ahmed…
Beluga Sounds
Using machine learning to detect beluga whale calls in hydrophone recordings. Of the five populations of beluga whales in Alaska, the Cook Inlet population is the smallest and has declined by about seventy-five percent since…
VL-BERT
VL-BERT is a simple yet powerful pre-trainable generic representation for visual-linguistic tasks. It is pre-trained on a massive-scale caption dataset and a text-only corpus, and can be fine-tuned for various downstream visual-linguistic tasks, such as Visual…
RaCT
This repository implements Ranking-Critical Training (RaCT) for Collaborative Filtering, accepted at the International Conference on Learning Representations (ICLR) 2020. By using an actor-critic architecture to fine-tune a differentiable collaborative filtering model, we can improve the performance…
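Collaborative filtering models are commonly evaluated with ranking metrics such as NDCG@k, which depend on sorting predicted scores and are therefore not differentiable with respect to those scores. That gap is presumably what a learned ranking critic is meant to bridge. The minimal sketch below only computes NDCG@k with binary relevance for one user's score vector; it does not implement RaCT's actor-critic training, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set


def scores_to_ranking(scores: Sequence[float]) -> List[int]:
    """Return item indices sorted by predicted score, highest first (non-differentiable)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def dcg_at_k(ranked_items: Sequence[int], relevant: Set[int], k: int) -> float:
    """Discounted cumulative gain with binary relevance over the top-k items."""
    # rank is 0-based, so the first position is discounted by log2(2) = 1.
    return sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )


def ndcg_at_k(ranked_items: Sequence[int], relevant: Set[int], k: int) -> float:
    """DCG normalized by the best achievable DCG for this set of relevant items."""
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg_at_k(ranked_items, relevant, k) / ideal if ideal > 0 else 0.0


ranking = scores_to_ranking([0.9, 0.1, 0.8, 0.3])
print(ranking, ndcg_at_k(ranking, {0, 3}, 3))
```

Because the score-to-rank step is a sort, a gradient cannot flow through it directly; a critic that predicts this value from the model's output is one way to obtain a differentiable training signal.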
BERT-nmt
BERT-fused NMT is a new algorithm in which we first use BERT to extract representations for an input sequence, and then fuse those representations with each layer of the encoder and decoder of the…
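To make "fused with each layer" concrete, here is a minimal sketch of one encoder step that averages self-attention over the NMT hidden states with attention from those states to the BERT output. Assumptions of this sketch: the two attention outputs are simply averaged; learned projections, the feed-forward sublayer, residual connections, and layer normalization are omitted; and the BERT output has the same width as the encoder states. The function names are hypothetical and this is not the repository's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention with identity projections (a simplification)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def bert_fused_encoder_step(hidden: Matrix, bert_out: Matrix) -> Matrix:
    """Hypothetical fused step: mean of self-attention and attention over BERT features."""
    self_attn = attention(hidden, hidden, hidden)
    bert_attn = attention(hidden, bert_out, bert_out)  # queries come from the NMT encoder
    return [[0.5 * (a + b) for a, b in zip(ra, rb)] for ra, rb in zip(self_attn, bert_attn)]


hidden = [[1.0, 0.0], [0.0, 1.0]]
print(bert_fused_encoder_step(hidden, [[2.0, 2.0]]))
```

Queries always come from the NMT states, so the BERT sequence may differ in length from the source-side hidden states; only the feature width has to match in this simplified version.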
KG-A2C
KG-A2C is a reinforcement learning agent that builds a dynamic knowledge graph while exploring and generates natural language using a template-based action space, outperforming all current agents on a wide set of text-based games.
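The two ingredients named above, a knowledge graph that grows during exploration and a template-based action space, can be sketched as follows. This is a toy illustration under stated assumptions: triples are taken as already extracted from the game text, candidate objects are drawn from the graph's entities (one plausible way to connect the two pieces), and every action is enumerated rather than chosen by a learned policy. The class and function names are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class KnowledgeGraph:
    """Toy dynamic knowledge graph: a growing set of (subject, relation, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def update(self, new_triples: Iterable[Triple]) -> None:
        # Assumption: triple extraction from observations happens elsewhere.
        self.triples.update(new_triples)

    def entities(self) -> List[str]:
        subjects = {s for s, _, _ in self.triples}
        objects = {o for _, _, o in self.triples}
        return sorted(subjects | objects)


def fill_templates(templates: List[str], objects: List[str]) -> List[str]:
    """Expand every OBJ slot in each template with every candidate object, left to right."""
    actions = []
    for template in templates:
        slots = template.count("OBJ")
        for combo in product(objects, repeat=slots):  # repeat=0 yields one empty tuple
            action = template
            for obj in combo:
                action = action.replace("OBJ", obj, 1)
            actions.append(action)
    return actions


kg = KnowledgeGraph()
kg.update([("you", "in", "kitchen"), ("knife", "in", "kitchen")])
candidates = [e for e in kg.entities() if e != "you"]
print(fill_templates(["look", "take OBJ", "put OBJ in OBJ"], candidates))
```

Enumerating combinations grows quickly with the number of slots and objects, which is why restricting objects to entities the agent has actually encountered helps keep the action space manageable.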
FreeLB
FreeLB is an adversarial training approach for improving transformer-based language models on Natural Language Understanding tasks. It accumulates parameter gradients over the ascent steps and updates the parameters with the accumulated gradients, which is approximately…
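The accumulate-then-update pattern described above can be illustrated on a toy linear model with squared loss, where all gradients are written out by hand. Assumptions of this sketch: the perturbation starts at zero for determinism, each ascent step moves along the normalized perturbation gradient and is projected back onto an L2 ball of radius `epsilon`, and the hypothetical `freelb_step` function and its parameters are illustrative rather than the repository's API.

```python
import math
from typing import List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def freelb_step(w: Vector, b: float, x: Vector, y: float, k: int = 3,
                adv_lr: float = 0.1, epsilon: float = 0.5,
                lr: float = 0.1) -> Tuple[Vector, float]:
    """One hypothetical FreeLB-style update for loss 0.5 * (w . (x + delta) + b - y) ** 2."""
    delta = [0.0] * len(x)  # assumption: zero initialization for a deterministic sketch
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for _ in range(k):
        perturbed = [xi + di for xi, di in zip(x, delta)]
        residual = sum(wi * pi for wi, pi in zip(w, perturbed)) + b - y
        # Accumulate the parameter gradients (averaged over k) at the current perturbation.
        grad_w = [g + residual * p / k for g, p in zip(grad_w, perturbed)]
        grad_b += residual / k
        # Gradient ascent on delta along the normalized gradient dL/d(delta) = residual * w.
        grad_delta = [residual * wi for wi in w]
        norm = l2_norm(grad_delta)
        if norm > 0:
            delta = [d + adv_lr * g / norm for d, g in zip(delta, grad_delta)]
        # Project delta back onto the L2 ball of radius epsilon.
        d_norm = l2_norm(delta)
        if d_norm > epsilon:
            delta = [d * epsilon / d_norm for d in delta]
    # A single descent step on the parameters, using the accumulated gradients.
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b


print(freelb_step([1.0, 0.0], 0.0, [1.0, 1.0], 3.0, k=2))
```

The parameters are updated only once per `k` ascent steps, so each parameter update reflects the loss at several perturbed inputs for roughly the cost of the ascent passes that are needed anyway.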
Prevalent
Prevalent: A Pretrained Generic VLN Agent (code cleanup in progress). This repository contains source code to reproduce the results presented in the paper Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training, CVPR…