Researcher tools: code, datasets, & models

An index of datasets, SDKs, APIs and other open source code created by Microsoft researchers and shared with the broader academic community. We also maintain a collection highlighting some of the tools you’ll find here.

Dataset Source Code

Research Analysis Tools

Rats is a collection of tools that help researchers define and run experiments. It is a modular, extensible framework that currently supports building and running pipelines and integrating configs and services.

GitHub

Dataset Source Code

MOFDiff

MOFDiff is a diffusion model for generating coarse-grained (CG) MOF structures. The codebase also contains code for deconstructing and reconstructing all-atom MOF structures, which is used to train MOFDiff and to assemble the CG structures that MOFDiff generates.

GitHub Publication

Dataset Source Code

AI Controller Interface (AICI)

The AI Controller Interface is a system design and implementation that enables custom user code (AI Controllers, implemented as lightweight virtual machines) to tightly, efficiently, and securely integrate with LLM decoding in a cloud service.…

GitHub

Download

UDOP

UDOP adopts an encoder-decoder Transformer architecture based on T5 for document AI tasks like document image classification, document parsing and document visual question answering. You can use the model for document image classification, document parsing…

Download Publication

Dataset Source Code

Node Engine

Node Engine is a Python service that executes a computational flow. It is designed for rapid prototyping of services and applications, for example as a chatbot service within a larger system. Each call to the service…

GitHub

Dataset Source Code

Diffy Config Analyzer

Diffy is a research prototype tool that analyzes JSON configuration files. The goal of Diffy is to assist with configuration management. It compares JSON files and warns about potential issues by finding configuration settings that…

GitHub

Dataset Source Code

Neural Invariant Ranker

Official code release of our EMNLP 2023 work NeuralInvariantRanker. We have designed a ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a…

GitHub Publication

Download

KITAB Dataset

🕮 KITAB is a challenging dataset and a dynamic data collection approach for testing the abilities of Large Language Models (LLMs) to answer information retrieval queries with constraint filters. A filtering query with constraints can be…

Download

Dataset Source Code

SAMMO

Structure-aware Multi-Objective Metaprompt Optimization Library for Python. A flexible, easy-to-use library for running and optimizing prompts for Large Language Models (LLMs).

GitHub

Dataset Source Code

CodePlan

CodePlan is a research project that formalizes repository-level coding tasks as planning problems and uses static analysis and large language models (LLMs) to solve them. This replication package is for the paper titled “CodePlan: Repository-level…

GitHub