Microsoft Research Blog

Artificial intelligence

  1. Meta-Learning for Variational Inference 

    April 12, 2021

    Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the selection of the associated divergence measure, as VI approximates the intractable distribution by minimizing this divergence. In this…
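    The divergence-minimization view of VI described above can be sketched with a toy example (illustrative only, not the paper's method): fitting a 1-D Gaussian q to a Gaussian target p by gradient descent on the closed-form KL divergence between them.

```python
import math

# Target p = N(2, 1.5^2); variational family q = N(mu_q, sig_q^2).
mu_p, sig_p = 2.0, 1.5

def kl_q_p(mu_q, sig_q):
    """Closed-form reverse KL(q || p) between two 1-D Gaussians."""
    return (math.log(sig_p / sig_q)
            + (sig_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sig_p ** 2) - 0.5)

mu_q, sig_q, lr = 0.0, 0.5, 0.1
for _ in range(2000):
    # Analytic gradients of KL(q || p) w.r.t. the variational parameters.
    grad_mu = (mu_q - mu_p) / sig_p ** 2
    grad_sig = -1.0 / sig_q + sig_q / sig_p ** 2
    mu_q -= lr * grad_mu
    sig_q -= lr * grad_sig

# q converges to the target, driving the divergence toward zero.
print(round(mu_q, 3), round(sig_q, 3))
```

    Swapping in a different divergence measure (e.g. a forward KL or an alpha-divergence) changes which gradients are followed and hence what approximation q converges to — the selection problem the excerpt refers to.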

  2. UniDrop: A Simple Yet Effective Technique to Improve Transformer without Extra Cost 

    April 10, 2021

    The Transformer architecture achieves great success in a wide range of natural language processing tasks. The over-parameterization of the Transformer model has motivated many works to alleviate its overfitting for superior performance. Through our explorations, we find that simple techniques such as dropout can greatly boost model performance with…
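    As a concrete illustration of the dropout technique mentioned in the excerpt (a minimal sketch, not code from the paper): inverted dropout zeroes each activation with probability p during training and rescales the survivors so the expected activation is unchanged at inference time.

```python
import random

def dropout(activations, p=0.1, training=True):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p) so the expected value is
    unchanged; at inference, pass activations through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if random.random() < keep else 0.0 for a in activations]

acts = [0.5, -1.2, 3.0, 0.0, 2.2]
print(dropout(acts, p=0.5, training=False))  # inference: unchanged
```

    The regularizing effect comes from forcing the network not to rely on any single unit, which is why such a simple technique can reduce overfitting in over-parameterized models.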

  3. SOLOIST: Building Task Bots at Scale with Transfer Learning and Machine Teaching 

    April 9, 2021

    We present SOLOIST, a new method that uses transfer learning and machine teaching to build task bots at scale. We parameterize classical modular task-oriented dialog systems using a Transformer-based auto-regressive language model, which subsumes different dialog modules into a single neural model. We pre-train, on heterogeneous…

  4. Dual Self-Attention with Co-Attention Networks for Visual Question Answering 

    April 8, 2021

    Visual Question Answering (VQA), an important task in understanding vision and language, has attracted wide interest. In previous VQA methods, Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are generally used to extract visual and textual features respectively, and…

  5. BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification 

    April 4, 2021 | Ishani Mondal

    Healthcare predictive analytics aids medical decision-making, diagnosis prediction, and drug review analysis. Prediction accuracy is therefore an important criterion, which also necessitates robust predictive language models. However, deep learning models have been proven vulnerable to insignificantly perturbed input instances, which are less likely…

  6. A Case Study of Efficacy and Challenges in Practical Human-in-Loop Evaluation of NLP Systems Using Checklist 

    April 1, 2021 | Shaily Bhatt, Rahul Jain, Sandipan Dandapat, and Sunayana Sitaram

    Despite state-of-the-art performance, NLP systems can be fragile in real-world situations. This is often due to insufficient understanding of the capabilities and limitations of models and the heavy reliance on standard evaluation benchmarks. Research into non-standard evaluation to mitigate this brittleness is gaining increasing attention…

  7. GCM: A Toolkit for Generating Synthetic Code-mixed Text 

    April 1, 2021

    Code-mixing is common in multilingual communities around the world, and processing it is challenging due to the lack of labeled and unlabeled data. We describe a tool that can automatically generate code-mixed data given parallel data in two languages. We implement two linguistic theories of…

  8. Towards Unifying Feature Attribution and Counterfactual Explanations: Different Means to the Same End 

    April 1, 2021 | Ramaravind Kommiya Mothilal, Divyat Mahajan, Chenhao Tan, and Amit Sharma

    Feature attributions and counterfactual explanations are popular approaches to explain an ML model. The former assigns an importance score to each input feature, while the latter provides input examples with minimal changes to alter the model's predictions. To unify these approaches, we provide an interpretation…
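    The two explanation styles described in the excerpt can be illustrated on a linear classifier (a hypothetical toy sketch, unrelated to the authors' implementation): attributions score each feature's contribution, while a counterfactual is the smallest input change that flips the prediction.

```python
def predict(w, b, x):
    """Linear classifier: class 1 if w·x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def attributions(w, x):
    """Feature attributions for a linear model: each feature's
    contribution to the score is simply weight * value."""
    return [wi * xi for wi, xi in zip(w, x)]

def counterfactual(w, b, x, overshoot=1e-6):
    """Smallest L2 change that moves x just across the linear decision
    boundary: step along the weight direction by the signed distance."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    step = -(score / sum(wi * wi for wi in w)) * (1 + overshoot)
    return [xi + step * wi for wi, xi in zip(w, x)]

w, b = [1.0, -2.0], 0.5
x = [1.0, 1.0]                 # score = 1 - 2 + 0.5 = -0.5 → class 0
print(attributions(w, x))      # per-feature contributions: [1.0, -2.0]
cf = counterfactual(w, b, x)   # nearest input on the other side
print(predict(w, b, x), predict(w, b, cf))  # 0 1
```

    For non-linear models both computations become non-trivial, which is part of why unifying the two views, as the paper proposes, is interesting.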

  9. Platform for Situated Intelligence 

    March 29, 2021

    We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable…

  10. CvT: Introducing Convolutions to Vision Transformers 

    March 28, 2021

    In this paper, we present a new architecture, named Convolutional vision Transformer (CvT), that improves Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs. This is accomplished through two primary modifications: a hierarchy of Transformers…
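    The idea of introducing convolutions into the token pipeline can be sketched as follows (a hypothetical single-channel toy, not the CvT implementation): a strided convolution produces overlapping tokens, unlike ViT's disjoint patch splitting.

```python
import numpy as np

def conv_token_embed(img, kernel, stride):
    """Turn an image into a sequence of tokens by sliding a convolution
    kernel with the given stride; stride < kernel size gives overlapping
    receptive fields, unlike ViT's disjoint patch splitting."""
    H, W = img.shape
    k = kernel.shape[0]
    tokens = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            tokens.append(float((img[i:i + k, j:j + k] * kernel).sum()))
    return np.array(tokens)

img = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3)) / 9.0          # a simple averaging kernel
tokens = conv_token_embed(img, kernel, stride=2)
print(tokens.shape)  # 3x3 grid of overlapping tokens -> (9,)
```

    Stacking such strided embeddings shrinks the token grid stage by stage, which is one way a hierarchy of Transformers can be built on convolutionally produced tokens.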

  11. Mask Attention Networks: Rethinking and Strengthen Transformer 

    March 24, 2021

    The Transformer is an attention-based neural network consisting of two sublayers, namely the Self-Attention Network (SAN) and the Feed-Forward Network (FFN). Existing research explores enhancing the two sublayers separately to improve the capability of the Transformer for text representation. In this paper, we present a novel understanding…
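    The two-sublayer structure described in the excerpt can be sketched in a few lines (an illustrative post-norm toy with random weights, not the paper's model): each layer applies self-attention, then a feed-forward network, each wrapped in a residual connection and layer normalization.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Single-head scaled dot-product attention over the token sequence.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v

def ffn(x, W1, b1, W2, b2):
    # Position-wise two-layer MLP with a ReLU non-linearity.
    return np.maximum(x @ W1 + b1, 0) @ W2 + b2

def transformer_layer(x, params):
    # Post-norm arrangement: each sublayer's output is added to its
    # input (residual connection), then layer-normalized.
    x = layer_norm(x + self_attention(x, *params["attn"]))
    x = layer_norm(x + ffn(x, *params["ffn"]))
    return x

rng = np.random.default_rng(0)
d, d_ff, n = 8, 16, 4
params = {
    "attn": [rng.normal(size=(d, d)) * 0.1 for _ in range(3)],
    "ffn": [rng.normal(size=(d, d_ff)) * 0.1, np.zeros(d_ff),
            rng.normal(size=(d_ff, d)) * 0.1, np.zeros(d)],
}
out = transformer_layer(rng.normal(size=(n, d)), params)
print(out.shape)  # (4, 8)
```

    Treating the SAN and FFN sublayers jointly rather than separately — the direction the paper's excerpt points toward — would operate on exactly this pair of stages.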