Raha: Failure analysis of a production WAN
Raha (MetaOpt) uses our open-source heuristic analyzer to quantify the impact of failures on a traffic-engineered WAN.
Discover an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community below. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Implicit language models are language models that compute their outputs as a fixed-point iteration rather than a single forward pass. Implicit models offer greater representational power than transformers, as we show in the accompanying…
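The fixed-point idea can be illustrated with a toy sketch: instead of applying a layer once, we iterate z ← f(z, x) until z stops changing. This is purely illustrative (plain NumPy, a made-up contraction map), not the repository's actual model.

```python
import numpy as np

def fixed_point_layer(f, x, z_init, tol=1e-6, max_iter=100):
    """Iterate z <- f(z, x) until convergence.

    Illustrative sketch of a fixed-point (equilibrium) layer, not the
    implementation from the implicit-language-models repository.
    """
    z = z_init
    for _ in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy contraction map: z = tanh(W @ z + x). The weights are scaled small
# so the map is a contraction and the iteration provably converges.
rng = np.random.default_rng(0)
W = 0.125 * rng.standard_normal((4, 4))
x = rng.standard_normal(4)
z_star = fixed_point_layer(lambda z, x: np.tanh(W @ z + x), x, np.zeros(4))
# At convergence, z_star satisfies z_star = tanh(W @ z_star + x).
```

The output of the "layer" is the equilibrium point itself, which is what gives implicit models depth beyond any fixed number of stacked layers.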
This repository provides the code to construct the HORIZON benchmark — a large-scale, cross-domain benchmark built by refactoring the popular Amazon-Reviews 2023 dataset for evaluating sequential recommendation and user behavior modeling. We do not release…
MarkItDown is a lightweight Python utility for converting various files to Markdown for use with LLMs and related text analysis pipelines. To this end, it is most comparable to textract, but with a focus on…
We introduce Reprompting, an iterative sampling algorithm that automatically learns the Chain-of-Thought (CoT) recipes for a given task without human intervention. Through Gibbs sampling, Reprompting infers the CoT recipes that work consistently well for a…
Humans can learn to solve new tasks by inducing high-level strategies from example solutions to similar problems and then adapting these strategies to solve unseen problems. Can we use large language models to induce such…
Magma is a multimodal foundation model designed to both understand and act in digital and physical environments. Magma builds on the foundation-model paradigm that pretraining on larger and more diverse datasets allows…
OG-RAG enhances Large Language Models (LLMs) with domain-specific ontologies for improved factual accuracy and contextually relevant responses in fields with specialized workflows like agriculture, healthcare, knowledge work, and more. Paper: OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For…
Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction. As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach…
Recent advancements, such as DeepSeek-Prover-V2-671B and Kimina-Prover-Preview-72B, demonstrate a prevailing trend in leveraging reinforcement learning (RL)-based large-scale training for automated theorem proving. Surprisingly, we discover that even without any training, careful neuro-symbolic coordination of existing…