ToxiGen
Toxic language detection systems often falsely flag text that contains minority group mentions as toxic, as those groups are often the targets of online hate. Such over-reliance on spurious correlations also causes systems to struggle…
Discover an index of datasets, SDKs, APIs and open-source tools developed by Microsoft researchers and shared with the global academic community below. These experimental technologies, available through Azure AI Foundry Labs, offer a glimpse into the future of AI innovation.
We introduce our full experimental data as Hybrid Hiring, a large-scale dataset for studying human-AI decision-making that is collected and evaluated on real-world candidates. Comprising 38,400 human judgements and over 9,600 unique prediction…
Source code and data for the CVPR 2022 paper “Learning to Detect Scene Landmarks for Camera Localization”.
Microsoft is working to make data that is relevant to important social problems as open as possible, including by contributing open data ourselves. The Data for Society resource center provides access to Microsoft’s open datasets,…
Here, we provide a plug-in-and-play implementation of Admin, which stabilizes previously diverged Transformer training and achieves better performance without introducing additional hyper-parameters. The design of Admin is half-precision friendly and can be reparameterized into the original…
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale.
This repo contains the source code of the Python package loralib and several examples of how to integrate it with PyTorch models, such as those in HuggingFace. We only support PyTorch for now. See our…
Implementation of MoLeR: a generative model of molecular graphs which supports scaffold-constrained generation. This open-source code accompanies our paper “Learning to Extend Molecular Scaffolds with Structural Motifs”, which has been accepted at the ICLR 2022…
GitHub link to Iris – pretrained summarization models for structured datasets and cardinality estimation.