Protein sequence models
Codebase for generative modeling of protein sequence and structure, including code for CNNs and GNNs and custom data handling code.
Discover an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community below. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
This repo contains the data and source code for the baseline models in the NeurIPS 2021 benchmark paper on the Constrained Language Understanding Evaluation Standard (CLUES), released under the MIT License.
AIDE: Annotation Interface for Data-driven Ecology – Tools for detecting wildlife in aerial images using active learning
DiCE is a Python library that can generate counterfactual explanations for any machine learning classifier. Counterfactual explanations present “what-if” perturbations of the input such that an ML classifier outputs a different class for those perturbations…
As part of this release, Navana Tech and Microsoft Research India are open-sourcing 1,648 hours of validated Odia speech data and a baseline model for Odia speech recognition. The speech dataset consists of recordings in…
We present LiST, a new method for efficient fine-tuning of large pre-trained language models (PLMs) in few-shot learning settings. Using two key techniques, LiST significantly improves over recent methods that adopt prompt fine-tuning. The first…
LITMUS Predictor supports simulating performance in ~100 languages given training observations of the desired task and model. Each training observation specifies the fine-tuning data size and the test performance in different languages. Further, the tool provides support for constructing…
DistIR is an intermediate representation (IR) and associated set of tools for optimizing distributed machine learning computations (both training and inference). An IR is a format for representing programs used by compilers and software analysis…