Below is an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Phi-1.5
The language model phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source that consists of various NLP synthetic texts. When…
Phi-1
The language model phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python code from The Stack v1.2, Q&A content…
InferredBugs
InferredBugs is a metadata-rich dataset of bugs and fixes in the Java and C# programming languages, extracted using Infer (for Java) and InferSharp (for C#). The dataset has been constructed by systematically analyzing open-source repositories, scrutinizing…
NoFunEval
This repository hosts the official code and data artifact for the paper “NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness”. The work is a comprehensive evaluation of code language models on real-world code…
Lymphoma lesion segmentation DNN
Lymphoma lesion segmentation and quantitation play a pivotal role in the diagnosis, treatment planning, and monitoring of lymphoma patients. Accurate segmentation allows for the precise delineation of pathological regions, aiding clinicians in assessing disease extent…
Sarathi-Serve
Sarathi-Serve (a research prototype) is a high-throughput, low-latency LLM serving framework. This repository contains a benchmark suite for evaluating LLM performance from a systems point of view. It contains various workloads and scheduling…
VIDUR: LLM Simulator
Vidur is a high-fidelity and extensible LLM inference simulator. It can help you plan capacity, find the best configuration for your LLM deployments, and test new research ideas such as new scheduling algorithms, optimizations…
MX PyTorch Emulation Library
PyTorch emulation library for Microscaling (MX)-compatible data formats
Platform for Situated Intelligence Framework
\psi (Platform for Situated Intelligence) is an open-source, extensible framework that accelerates development and research of multimodal, integrative AI systems. The platform consists of three layers. The Runtime layer provides a parallel programming model centered…