Research Tools: code, datasets, & models

Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.

Dataset Source Code

Structured Entity Extraction

Structured Entity Extraction and the Approximate Entity Set OverlaP (AESOP) metric are designed to appropriately assess model performance.

GitHub Project Publication

Dataset Source Code

AttentionEngine: A Custom Model Optimization Framework

AttentionEngine accelerates transformer attention variants by generating efficient custom kernels, enabling model designers to easily create new variants with our flexible API.

GitHub

Dataset Source Code

TerraTrace: Spatio-Temporal Signatures for Land Use Analytics

Understanding land use over time is critical to tracking events related to climate change, such as deforestation. However, the satellite-based remote sensing tools used for monitoring struggle to differentiate vegetation types in farms and orchards…

GitHub Project Project

Dataset Source Code

SeerAttention

SeerAttention is a learning-based method that enables block-level sparse attention for long-context LLMs without using predefined static patterns or heuristic methods. It can be applied in the post-training or fine-tuning stages. The Attention Gate units learn…

GitHub

Tool

OmniParser V2

OmniParser is an advanced vision-based screen parsing module that converts user interface (UI) screenshots into structured elements, allowing agents to execute actions across various applications using visual data. By harnessing large vision-language model capabilities,…

Access Publication

Dataset Source Code

MICON (Molecular-Image Contrastive Learning)

This is the repository for the paper “Causal integration of chemical structures in self-supervised learning improves representations of microscopy images for morphological profiling”. Learning effective representations of cells in microscopy images can fuel many applications. Here,…

GitHub Publication

Dataset Source Code

PromptPex

PromptPex is a tool for exploring and testing AI model prompts. It is intended for developers who have prompts as part of their code base. PromptPex treats a prompt as a function…

GitHub

Dataset Source Code

ProtNote: a multimodal method for protein-function annotation

ProtNote is a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction.

GitHub

Dataset Source Code

AutoVerus

Automatically synthesize proof annotations that help Verus prove the correctness of Rust code.

GitHub Project Publication Publication

Tool

FStar Data Set v2

This dataset is Version 2.0 of the FStar Data Set. Its primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI, in short). Given a specification of a program and proof…

Access Project Publication