Microsoft Research Blog

Artificial intelligence

  1. Generating Examples From CLI Usage: Can Transformers Help? 

    April 27, 2022

Continuous evolution in modern software often causes documentation, tutorials, and examples to fall out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can cause programs to fail, run less efficiently, or even be less secure. In response, programmers need to…

  2. Varuna: Scalable, Low-cost Training of Massive Deep Learning Models 

    April 22, 2022

Systems for training massive deep learning models (billions of parameters) today assume and require specialized "hyper-clusters": hundreds or thousands of GPUs wired with specialized high-bandwidth interconnects such as NVLink and InfiniBand. Besides being expensive, such dependence on hyper-clusters and custom high-speed interconnects limits the size…

  3. Generating Programming Puzzles to Train Language Models 

    April 19, 2022 | Patrick Haluptzok, Matthew Bowers, and Adam Tauman Kalai

This work shows how one can use large-scale language models (LMs) to automatically generate programming problems with verified solutions, in the form of “programming puzzles,” which can in turn be used to fine-tune other LMs to solve more difficult programming puzzles. This work builds…

  4. Task-Oriented Dialogue System as Natural Language Generation 

    April 10, 2022

In this paper, we propose to formulate the task-oriented dialogue system as a purely natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify the complicated delexicalization preprocessing. However, directly applying this method suffers heavily from the dialogue entity…

  5. Learning to Extend Molecular Scaffolds with Structural Motifs 

    April 1, 2022

    Recent advancements in deep learning-based modeling of molecules promise to accelerate in silico drug discovery. A plethora of generative models is available, building molecules either atom-by-atom and bond-by-bond or fragment-by-fragment. However, many drug discovery projects require a fixed scaffold to be present in the generated…

  6. Taming Sparsely Activated Transformer with Stochastic Experts 

    April 1, 2022

Sparsely activated models (SAMs), such as Mixture-of-Experts (MoE), can easily scale to an outrageously large number of parameters without a significant increase in computational cost. However, SAMs are reported to be parameter-inefficient, in that larger models do not always lead to better performance. While most…

  7. LoRA: Low-Rank Adaptation of Large Language Models 

    April 1, 2022

    An important paradigm of natural language processing consists of large-scale pre-training on general domain data and adaptation to particular tasks or domains. As we pre-train larger models, full fine-tuning, which retrains all model parameters, becomes less feasible. Using GPT-3 175B as an example -- deploying…

  8. Efficient Self-supervised Vision Transformers for Representation Learning 

    April 1, 2022

This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning. First, we show through a comprehensive empirical study that multi-stage architectures with sparse self-attention can significantly reduce modeling complexity, but at the cost of losing the ability to capture…

  9. Active label cleaning for improved dataset quality under resource constraints 

    March 4, 2022

Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have an often-overlooked confounding effect on the assessment of model performance. Nevertheless, employing experts to remove label noise by fully re-annotating large datasets is infeasible in…

  10. Is explainable AI a race against model complexity? 

    March 1, 2022 | Advait Sarkar

    Explaining the behaviour of intelligent systems will get increasingly and perhaps intractably challenging as models grow in size and complexity. We may not be able to expect an explanation for every prediction made by a brain-scale model, nor can we expect explanations to remain objective…
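The "programming puzzles with verified solutions" setup described in item 3 can be sketched in a few lines: a puzzle is simply a function that returns `True` iff its argument is a valid solution, so any candidate (human- or LM-generated) can be checked mechanically. The concrete puzzle, solution, and `verified` helper below are illustrative assumptions, not examples from the paper:

```python
def f(x: str) -> bool:
    """Puzzle: is x a string of exactly five 'a' characters?"""
    return len(x) == 5 and set(x) == {"a"}

def g() -> str:
    """A candidate solution for the puzzle f."""
    return "a" * 5

def verified(puzzle, candidate) -> bool:
    """Keep a generated (puzzle, solution) pair only if it verifies."""
    try:
        return puzzle(candidate) is True
    except Exception:
        return False

assert verified(f, g())         # a verified pair could be kept for fine-tuning
assert not verified(f, "aaaa")  # a failing candidate would be discarded
```

Because verification is just a function call, generated puzzle–solution pairs can be filtered automatically before being used as training data.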
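The low-rank adaptation idea behind item 7 (LoRA) can be sketched in NumPy: the pre-trained weight matrix `W` stays frozen, and only a low-rank update `B @ A` is trained, so trainable parameters scale with the rank `r` rather than with the full weight dimensions. The layer sizes, rank, and scaling constant below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """Linear layer with a frozen weight W plus a trained low-rank update B @ A.

    W: (d_out, d_in) frozen pre-trained weight
    A: (r, d_in) and B: (d_out, r) are the only trained matrices.
    """
    r = A.shape[0]
    delta = (B @ A) * (alpha / r)   # low-rank update, scaled by alpha / r
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 32, 4
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))            # B starts at zero, so the update is a no-op at first
x = rng.normal(size=(1, d_in))

# With B = 0 the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B), x @ W.T)
```

Here the adapter adds `r * (d_in + d_out)` trainable parameters (384) instead of the `d_out * d_in` (2,048) that full fine-tuning of this layer would touch; at GPT-3 scale that gap is what makes full fine-tuning impractical.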