Microsoft Research Blog

Artificial intelligence

  1. Meta-Learning for Variational Inference 

    April 12, 2021

    Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the selection of the associated divergence measure, as VI approximates the intractable distribution by minimizing this divergence. In this…
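    The divergence-minimization view of VI described above can be sketched with a toy example (illustrative only, not the paper's method): fitting a 1-D Gaussian q to a Gaussian target p by gradient descent on the closed-form KL divergence between them.

```python
import math

# Target p = N(2, 1.5^2); variational family q = N(mu_q, sig_q^2).
mu_p, sig_p = 2.0, 1.5

def kl_q_p(mu_q, sig_q):
    """Closed-form reverse KL(q || p) between two 1-D Gaussians."""
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2) - 0.5)

mu_q, sig_q, lr = 0.0, 0.5, 0.1
for _ in range(2000):
    # Analytic gradients of KL(q || p) w.r.t. the variational parameters.
    grad_mu = (mu_q - mu_p) / sig_p ** 2
    grad_sig = -1.0 / sig_q + sig_q / sig_p ** 2
    mu_q -= lr * grad_mu
    sig_q -= lr * grad_sig

# q converges to the target, driving the divergence toward zero.
print(round(mu_q, 3), round(sig_q, 3))
```

    Swapping in a different divergence measure (e.g. a forward KL or an alpha-divergence) changes which gradients are followed and hence what approximation q converges to — the selection problem the excerpt refers to.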

  2. UniDrop: A Simple Yet Effective Technique to Improve Transformer without Extra Cost 

    April 10, 2021

    The Transformer architecture achieves great success in a wide range of natural language processing tasks. The over-parameterization of the Transformer model has motivated many works to alleviate its overfitting for superior performance. Through our explorations, we find that simple techniques such as dropout can greatly boost model performance with…
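    As a concrete illustration of the dropout technique mentioned in the excerpt (a minimal sketch, not code from the paper): inverted dropout zeroes each activation with probability p during training and rescales the survivors so the expected activation is unchanged at inference time.

```python
import random

def dropout(activations, p=0.1, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p) so the expected value is
    unchanged; at inference, pass activations through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0 for a in activations]

acts = [0.5, -1.2, 3.0, 0.0, 2.2]
print(dropout(acts, p=0.5, training=False))  # inference: unchanged
```

    The regularizing effect comes from forcing the network not to rely on any single unit, which is why such a simple technique can reduce overfitting in over-parameterized models.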

  3. SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching 

    April 9, 2021

    We present SOLOIST, a new method that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-based auto-regressive language model, which subsumes different dialog modules into a single neural model. We pre-train, on heterogeneous…

  4. Dual Self-Attention with Co-Attention Networks for Visual Question Answering 

    April 8, 2021

    Visual Question Answering (VQA), an important task in understanding vision and language, has attracted wide interest. In previous VQA methods, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are generally used to extract visual and textual features respectively, and…

  5. BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification 

    April 4, 2021 | Ishani Mondal

    Healthcare predictive analytics aids medical decision-making, diagnosis prediction, and drug review analysis. Prediction accuracy is therefore an important criterion, which also necessitates robust predictive language models. However, deep learning models have been proven vulnerable to insignificantly perturbed input instances, which are less likely…

  6. A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist 

    April 1, 2021 | Shaily Bhatt, Rahul Jain, Sandipan Dandapat, and Sunayana Sitaram

    Despite state-of-the-art performance, NLP systems can be fragile in real-world situations. This is often due to insufficient understanding of the capabilities and limitations of models and the heavy reliance on standard evaluation benchmarks. Research into non-standard evaluation to mitigate this brittleness is gaining increasing attention…

  7. GCM: A Toolkit for Generating Synthetic Code-mixed Text 

    April 1, 2021

    Code-mixing is common in multilingual communities around the world, and processing it is challenging due to the lack of labeled and unlabeled data. We describe a tool that can automatically generate code-mixed data given parallel data in two languages. We implement two linguistic theories of…

  8. Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End 

    April 1, 2021 | Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, and Amit Sharma

    Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation…
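    The two explanation styles described in the excerpt can be illustrated on a linear classifier (a hypothetical toy sketch, unrelated to the authors' implementation): attributions score each feature's contribution, while a counterfactual is the smallest input change that flips the prediction.

```python
def predict(w, b, x):
    """Linear classifier: class 1 if w·x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def attributions(w, x):
    """Feature attributions for a linear model: each feature's
    contribution to the score is simply weight * value."""
    return [wi * xi for wi, xi in zip(w, x)]

def counterfactual(w, b, x, overshoot=1e-6):
    """Smallest L2 change that moves x just across the linear decision
    boundary: step along the weight direction by the signed distance."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    step = -(score / sum(wi * wi for wi in w)) * (1 + overshoot)
    return [xi + step * wi for wi, xi in zip(w, x)]

w, b = [1.0, -2.0], 0.5
x = [1.0, 1.0]                 # score = 1 - 2 + 0.5 = -0.5 → class 0
print(attributions(w, x))      # per-feature contributions: [1.0, -2.0]
cf = counterfactual(w, b, x)   # nearest input on the other side
print(predict(w, b, x), predict(w, b, cf))  # 0 1
```

    For non-linear models both computations become non-trivial, which is part of why unifying the two views, as the paper proposes, is interesting.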

  9. Platform for Situated Intelligence 

    March 29, 2021

    We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable…

  10. CvT: Introducing Convolutions to Vision Transformers 

    March 28, 2021

    In this paper, we present a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers…
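    The idea of introducing convolutions into the token pipeline can be sketched as follows (a hypothetical single-channel toy, not the CvT implementation): a strided convolution produces overlapping tokens, unlike ViT's disjoint patch splitting.

```python
import numpy as np

def conv_token_embed(img, kernel, stride):
    """Turn an image into a sequence of tokens by sliding a convolution
    kernel with the given stride; stride < kernel size gives overlapping
    receptive fields, unlike ViT's disjoint patch splitting."""
    H, W = img.shape
    k = kernel.shape[0]
    tokens = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            tokens.append(float((img[i:i + k, j:j + k] * kernel).sum()))
    return np.array(tokens)

img = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3)) / 9.0          # a simple averaging kernel
tokens = conv_token_embed(img, kernel, stride=2)
print(tokens.shape)  # 3x3 grid of overlapping tokens -> (9,)
```

    Stacking such strided embeddings shrinks the token grid stage by stage, which is one way a hierarchy of Transformers can be built on convolutionally produced tokens.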

  11. Mask Attention Networks: Rethinking and Strengthen Transformer 

    March 24, 2021

    The Transformer is an attention-based neural network consisting of two sublayers, namely the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research explores enhancing the two sublayers separately to improve the capability of the Transformer for text representation. In this paper, we present a novel understanding…
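    The two-sublayer structure described in the excerpt can be sketched in a few lines (an illustrative post-norm toy with random weights, not the paper's model): each layer applies self-attention, then a feed-forward network, each wrapped in a residual connection and layer normalization.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the token sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v

def ffn(x, W1, b1, W2, b2):
    # Position-wise two-layer MLP with a ReLU non-linearity.
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

def transformer_layer(x, params):
    # Post-norm arrangement: each sublayer's output is added to its
    # input (residual connection), then layer-normalized.
    x = layer_norm(x + self_attention(x, *params["attn"]))
    x = layer_norm(x + ffn(x, *params["ffn"]))
    return x

rng = np.random.default_rng(0)
d, d_ff, n = 8, 16, 4
params = {
    "attn": [rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
    "ffn": [rng.normal(size=(d, d_ff)) * 0.1, np.zeros(d_ff),
            rng.normal(size=(d_ff, d)) * 0.1, np.zeros(d)],
}
out = transformer_layer(rng.normal(size=(n, d)), params)
print(out.shape)  # (4, 8)
```

    Treating the SAN and FFN sublayers jointly rather than separately — the direction the paper's excerpt points toward — would operate on exactly this pair of stages.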