close-up image of interlocking gears turning with a rainbow gradient overlay

Research Tools: code, datasets, & models

Discover an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community below. These experimental technologies—available through Azure AI Foundry Labs (opens in new tab)—offer a glimpse into the future of AI innovation.

Current selections

Sort by: Most recent

Clear selections

Search within these results

Published Date

Dataset Source Code

MInference: Accelerating Pre-filling for Long-context LLMs via Dynamic Sparse Attention

MInference 1.0 leverages the dynamic sparse nature of LLMs’ attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern each head belongs to, then…

GitHub Project Publication Publication Publication

Dataset Source Code

Soroush: Max-min fair resource allocation solution

Soroush is a general and scalable max-min fair allocator. It consists of a group of approximate and heuristic methods that (a) solve at most one optimization, and (b) enable users to control the trade-offs between…

GitHub Publication

Dataset Source Code

Federated Learning under Distributed Concept Drift (FedDrift)

This repository is the source code for our paper: Federated Learning under Distributed Concept Drift (AISTATS’23).

GitHub Publication

Dataset Source Code

NeMoEval: A Benchmark Tool for Natural Language-based Network Management

This is a benchmark tool to evaluate natural language-based network management using LLM-generated code.

GitHub Publication

Dataset Source Code

Yatesbury: A Benchmark for East-West Network Security

This dataset serves as a benchmark for evaluting the performance and efficiency of anomaly detectors in east-west data center network traffic.

GitHub Publication

Dataset Source Code

Almagery Open Access

AImagery is an AI-powered multisensory relaxation system designed to reduce anxiety by providing personalized immersive experiences. The system uses AI to create guided imagery based on individual preferences and physiological feedback, incorporating elements like auditory…

GitHub

Dataset Source Code

VeriSMo Research Prototype

VeriSMo: A formally verified security module for AMD confidential VMs.

GitHub Publication

Dataset Source Code

ORCAS: Open Resource for Click Analysis in Search

ORCAS is a click-based dataset associated with the TREC Deep Learning Track. It covers 1.4 million of the TREC DL documents, providing 18 million connections to 10 million distinct queries.

GitHub Publication

Dataset Source Code

TREC Tip-of-the-Tongue Track

Tip-of-the-tongue (ToT) known-item retrieval is defined as “an item identification task in which the searcher has previously experienced an item but cannot recall a reliable identifier” (i.e., “It’s on the tip of my tongue…”). The…

GitHub Publication Publication Publication

Dataset Source Code

TREC Deep Learning Track

The TREC Deep Learning Track studies information retrieval in a large training data regime. This is the case where the number of training queries with at least one positive label is at least in the…

GitHub Publication Publication Publication Publication Publication Publication Publication Publication Publication