Microsoft Research Blog

Artificial intelligence

  1. Generating Examples From CLI Usage: Can Transformers Help? 

    April 27, 2022

Continuous evolution in modern software often causes documentation, tutorials, and examples to fall out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can cause programs to fail, run less efficiently, or even be less secure. In response, programmers need to…

  2. Varuna: Scalable, Low-cost Training of Massive Deep Learning Models 

    April 22, 2022

Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NVLink and InfiniBand. Besides being expensive, such dependence on hyper-clusters and custom high-speed interconnects limits the size…

  3. Generating Programming Puzzles to Train Language Models 

    April 19, 2022 | Patrick Haluptzok, Matthew Bowers, and Adam Tauman Kalai

This work shows how one can use large-scale language models (LMs) to automatically generate programming problems with verified solutions, in the form of “programming puzzles,” which can in turn be used to fine-tune other LMs to solve more difficult programming puzzles. This work builds…

  4. Task-Oriented Dialogue System as Natural Language Generation 

    April 10, 2022

In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify the complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity…

  5. Learning to Extend Molecular Scaffolds with Structural Motifs 

    April 1, 2022

    Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated…

  6. Taming Sparsely Activated Transformer with Stochastic Experts 

    April 1, 2022

Sparsely activated models (SAMs), such as Mixture-of-Experts (MoE), can easily scale to an outrageously large number of parameters without a significant increase in computational cost. However, SAMs are reported to be parameter-inefficient, in that larger models do not always lead to better performance. While most…

  7. LoRA: Low-Rank Adaptation of Large Language Models 

    April 1, 2022

    An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying…

  8. Efficient Self-supervised Vision Transformers for Representation Learning 

    April 1, 2022

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning. First, we show through a comprehensive empirical study that multi-stage architectures with sparse self-attention can significantly reduce modeling complexity, but at the cost of losing the ability to capture…

  9. Active label cleaning for improved dataset quality under resource constraints 

    March 4, 2022

Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have an often-overlooked confounding effect on the assessment of model performance. Nevertheless, employing experts to remove label noise by fully re-annotating large datasets is infeasible in…

  10. Is explainable AI a race against model complexity? 

    March 1, 2022 | Advait Sarkar

    Explaining the behaviour of intelligent systems will get increasingly and perhaps intractably challenging as models grow in size and complexity. We may not be able to expect an explanation for every prediction made by a brain-scale model, nor can we expect explanations to remain objective…
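The "programming puzzles with verified solutions" setup described in item 3 can be sketched in a few lines: a puzzle is simply a function that returns `True` iff its argument is a valid solution, so any candidate (human- or LM-generated) can be checked mechanically. The concrete puzzle, solution, and `verified` helper below are illustrative assumptions, not examples from the paper:

```python
def f(x: str) -> bool:
    """Puzzle: is x a string of exactly five 'a' characters?"""
    return len(x) == 5 and set(x) == {"a"}

def g() -> str:
    """A candidate solution for the puzzle f."""
    return "a" * 5

def verified(puzzle, candidate) -> bool:
    """Keep a generated (puzzle, solution) pair only if it verifies."""
    try:
        return puzzle(candidate) is True
    except Exception:
        return False

assert verified(f, g())         # a verified pair could be kept for fine-tuning
assert not verified(f, "aaaa")  # a failing candidate would be discarded
```

Because verification is just a function call, generated puzzle–solution pairs can be filtered automatically before being used as training data.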
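The low-rank adaptation idea behind item 7 (LoRA) can be sketched in NumPy: the pre-trained weight matrix `W` stays frozen, and only a low-rank update `B @ A` is trained, so trainable parameters scale with the rank `r` rather than with the full weight dimensions. The layer sizes, rank, and scaling constant below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a frozen weight W plus a trained low-rank update B @ A.

    W: (d_out, d_in) frozen pre-trained weight
    A: (r, d_in) and B: (d_out, r) are the only trained matrices.
    """
    r = A.shape[0]
    delta = (B @ A) * (alpha / r)   # low-rank update, scaled by alpha / r
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))            # B starts at zero, so the update is a no-op at first
x = rng.normal(size=(1, d_in))

# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Here the adapter adds `r * (d_in + d_out)` trainable parameters (384) instead of the `d_out * d_in` (2,048) that full fine-tuning of this layer would touch; at GPT-3 scale that gap is what makes full fine-tuning impractical.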