Below is an index of datasets, SDKs, APIs, and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
ProtNote: a multimodal method for protein-function annotation
ProtNote is a multimodal deep learning model that leverages free-form text to enable both supervised and zero-shot protein function prediction.
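The zero-shot idea can be illustrated with a minimal sketch. It assumes that a protein and each free-text function description have already been embedded into a shared vector space by hypothetical encoders, and it ranks descriptions by cosine similarity. ProtNote's actual architecture and scoring head may differ. Treat this as an illustration of why text-defined labels allow zero-shot scoring, not as the model itself.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_functions(protein: Vector, descriptions: Dict[str, Vector]) -> List[Tuple[str, float]]:
    """Score a protein embedding against embeddings of free-text function descriptions.

    Because labels are just text, a description never seen during training can
    still be embedded and scored, which is what makes zero-shot prediction possible.
    """
    scored = [(label, cosine(protein, emb)) for label, emb in descriptions.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 3-d vectors standing in for the outputs of hypothetical protein and text encoders.
    protein_vec = [0.9, 0.1, 0.0]
    label_vecs = {
        "ATP binding": [1.0, 0.0, 0.0],
        "membrane transport": [0.0, 1.0, 0.0],
        "DNA repair": [0.0, 0.0, 1.0],
    }
    for label, score in rank_functions(protein_vec, label_vecs):
        print(f"{label}: {score:.3f}")
```

Adding a new function to the candidate set only requires embedding its description. No classifier output layer has to be retrained.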
AutoVerus
Automatically synthesize proof annotations that help Verus prove the correctness of Rust code.
FStar Data Set v2
This dataset is version 2.0 of the FStar Data Set. Its primary objective is to train and evaluate Proof-oriented Programming with AI (PoPAI). Given a specification of a program and proof…
FStar Data Set v1
This dataset contains programs and proofs in the F* proof-oriented programming language. The data, proposed in Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming, is an archive of source code, build artifacts, and metadata assembled from eight…
Privacy-preserving in-context learning with differentially private few-shot generation
A codebase for performing privacy-preserving in-context learning with differentially private few-shot generation.
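One common way to generate text under differential privacy works in three steps. First, split the private examples into disjoint subsets. Next, let the prompt built from each subset propose a next token. Finally, release the token with the highest noisy vote count. The sketch below shows that aggregation step with Gaussian noise. It is a generic illustration under stated assumptions, not this codebase's implementation. It omits the privacy accounting that maps `noise_scale` to an (ε, δ) guarantee, and the votes are supplied directly in place of real language-model calls.

```python
import random
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def partition(examples: Sequence[T], num_subsets: int) -> List[List[T]]:
    """Split private examples into disjoint subsets so each example influences only one vote."""
    return [list(examples[i::num_subsets]) for i in range(num_subsets)]


def noisy_argmax_token(votes: List[str], vocab: List[str], noise_scale: float, rng: random.Random) -> str:
    """Return the vocabulary token with the highest vote count after adding Gaussian noise.

    Each vote comes from one disjoint subset, so changing a single private
    example changes at most one vote.
    """
    counts = Counter(votes)  # missing tokens count as 0
    noisy = {tok: counts[tok] + rng.gauss(0.0, noise_scale) for tok in vocab}
    return max(noisy, key=noisy.get)


if __name__ == "__main__":
    rng = random.Random(0)
    subsets = partition(list(range(10)), 5)
    print(subsets)
    # Hypothetical next-token proposals, one per subset's prompt.
    votes = ["positive", "positive", "negative", "positive", "neutral"]
    print(noisy_argmax_token(votes, ["positive", "negative", "neutral"], noise_scale=1.0, rng=rng))
```

Repeating the noisy vote token by token yields a synthetic demonstration. That demonstration, rather than the private examples themselves, can then be placed in the in-context prompt.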
Eureka ML Insights
This repository contains the code for Eureka ML Insights, a framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings. The framework is designed to help researchers and practitioners run reproducible evaluations…
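The following sketch reports accuracy per slice next to the overall number, which shows how a single score can hide important differences. It is a generic illustration of disaggregated reporting, not the Eureka ML Insights API. The slice names and results are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def disaggregated_accuracy(results: Iterable[Tuple[str, bool]]) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy plus accuracy per slice (e.g. task category or difficulty)."""
    totals: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for slice_name, is_correct in results:
        totals[slice_name] += 1
        correct[slice_name] += int(is_correct)
    n = sum(totals.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_slice = {name: correct[name] / totals[name] for name in totals}
    return overall, per_slice


if __name__ == "__main__":
    # Hypothetical per-item outcomes: (slice, answered correctly?)
    outcomes = [("algebra", True), ("algebra", True), ("geometry", True), ("geometry", False)]
    overall, per_slice = disaggregated_accuracy(outcomes)
    print(overall, per_slice)
```

Here an overall accuracy of 0.75 hides the fact that the model is perfect on one slice and at chance level on the other.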
rStar
A self-play mutual reasoning approach that significantly improves the reasoning capabilities of small language models (SLMs) without fine-tuning or relying on superior models. rStar decouples reasoning into a self-play mutual generation-discrimination process.
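The generation-discrimination idea can be sketched as a filter over candidate reasoning trajectories. A discriminator receives a partial trajectory and completes it. The trajectory counts as mutually consistent when that completion reaches the same answer. In this sketch, the trajectories and the `complete_fn` discriminator are hypothetical stand-ins for the two SLMs. The prefix-fraction range is an assumed parameter, and returning `None` when nothing is consistent is a simplification. This is not rStar's implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    steps: List[str]  # intermediate reasoning steps proposed by the generator
    answer: str       # final answer the trajectory reaches
    score: float      # generator-side confidence, assumed to be given


def is_mutually_consistent(
    traj: Trajectory,
    complete_fn: Callable[[List[str]], str],
    rng: random.Random,
    min_frac: float = 0.2,
    max_frac: float = 0.8,
) -> bool:
    """Give the discriminator a random prefix of the steps and check that it reaches the same answer."""
    cut = max(1, int(len(traj.steps) * rng.uniform(min_frac, max_frac)))
    return complete_fn(traj.steps[:cut]) == traj.answer


def select_answer(
    trajectories: List[Trajectory],
    complete_fn: Callable[[List[str]], str],
    rng: random.Random,
) -> Optional[str]:
    """Pick the highest-scoring answer among mutually consistent trajectories (None if there are none)."""
    consistent = [t for t in trajectories if is_mutually_consistent(t, complete_fn, rng)]
    if not consistent:
        return None
    return max(consistent, key=lambda t: t.score).answer


if __name__ == "__main__":
    # Toy discriminator: it "knows" that 2 + 2 = 4.
    def disc(prefix: List[str]) -> str:
        return "4" if prefix and prefix[0] == "2 + 2" else "?"

    candidates = [
        Trajectory(["2 + 2", "= 4"], "4", 0.5),
        Trajectory(["2 + 2", "= 5"], "5", 0.9),
    ]
    print(select_answer(candidates, disc, random.Random(0)))
```

The higher-scoring but wrong trajectory is discarded because the discriminator, given the same starting steps, does not reach its answer.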
VPTQ
Vector Post-Training Quantization (VPTQ) is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on LLMs at an extremely low bit-width (…
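Vector quantization groups weights into short vectors and replaces each vector with the index of its nearest entry in a learned codebook. With vectors of length v and a codebook of K entries, the indices cost log2(K)/v bits per weight, which is how bit-widths below one bit per weight become possible. The sketch below uses plain k-means to build the codebook. VPTQ's own codebook optimization is more elaborate and is not reproduced here, and the function names and parameters are illustrative.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v: Vector, codebook: List[Vector]) -> int:
    """Index of the codebook entry closest to v (ties go to the lowest index)."""
    return min(range(len(codebook)), key=lambda c: sq_dist(v, codebook[c]))


def kmeans(vectors: List[Vector], k: int, iters: int, rng: random.Random) -> List[Vector]:
    """Plain Lloyd's k-means; an empty cluster keeps its previous centroid."""
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for v in vectors:
            buckets[nearest(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def vector_quantize(weights: List[float], vec_len: int, k: int, iters: int = 10, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Split a flat weight list into length-vec_len vectors and encode each as a codebook index."""
    if len(weights) % vec_len != 0:
        raise ValueError("len(weights) must be a multiple of vec_len")
    vectors = [weights[i:i + vec_len] for i in range(0, len(weights), vec_len)]
    codebook = kmeans(vectors, k, iters, random.Random(seed))
    indices = [nearest(v, codebook) for v in vectors]
    return codebook, indices


def dequantize(codebook: List[Vector], indices: List[int]) -> List[float]:
    """Rebuild the flat weight list by looking up each index in the codebook."""
    return [x for i in indices for x in codebook[i]]


def bits_per_weight(vec_len: int, k: int) -> float:
    """Index storage cost per weight, ignoring the codebook itself: log2(k) / vec_len."""
    return math.log2(k) / vec_len


if __name__ == "__main__":
    rng = random.Random(42)
    w = [rng.gauss(0.0, 1.0) for _ in range(256)]
    cb, idx = vector_quantize(w, vec_len=4, k=16)
    w_hat = dequantize(cb, idx)
    mse = sum((a - b) ** 2 for a, b in zip(w, w_hat)) / len(w)
    print(f"{bits_per_weight(4, 16)} bits/weight, MSE = {mse:.4f}")
```

Longer vectors or smaller codebooks lower the bit-width further at the cost of higher reconstruction error. The sketch ignores the codebook storage cost, which is amortized over all the weights that share it.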