Jigsaw Datasets
Jigsaw Dataset: Natural language to Python Pandas code. Two datasets (PandasEval1 and PandasEval2) described in our paper, “Jigsaw: Large Language Models meet Program Synthesis”.
Discover below an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
Microsoft Collective Communication Library (MSCCL) is a platform to execute custom collective communication algorithms for multiple accelerators supported by Microsoft Azure.
The goal of this project is to use audio recordings and corresponding annotations to build an automatic classifier for calls from four different species of blue whales, and to estimate the total number of calls…
Maximal Update Parametrization (μP) and Hyperparameter Transfer (μTransfer), in association with the paper: Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
This repository contains the PyTorch implementation of the COMPASS model proposed in our paper: COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems. COMPASS aims to build general purpose representations for autonomous systems from multimodal observations. Given…
To examine the cognitive processes of remembering and imagining and their traces in language, we introduce Hippocorpus, a dataset of 6,854 English diary-like short stories about recalled and imagined events. Using a crowdsourcing framework, we…
A platform to display the carbon neutralization information for researchers, decision-makers, and other participants in the community.
A repository for training models from high-resolution aerial imagery and a dataset of predicted poultry barns across the United States.