Microsoft Research Blog

Artificial intelligence

  1. ChaCha for Online AutoML 

    July 17, 2021

    We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings. ChaCha handles the process of determining a champion and scheduling a set of "live" challengers over time based on sample complexity bounds. It is guaranteed to have sublinear…
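The champion-challenger idea above can be sketched in a few lines: keep running-average losses for a champion configuration and a set of challengers, and promote a challenger once it beats the champion by a margin. This is an illustrative toy (the class name, the fixed `margin`, and the promotion rule are stand-ins), not ChaCha's actual scheduler or its sample-complexity bounds.

```python
class ChampionChallenger:
    """Minimal champion-challenger loop for online hyperparameter
    selection. Illustrative only: the real ChaCha algorithm schedules
    'live' challengers using sample complexity bounds, not a fixed margin."""

    def __init__(self, configs, margin=0.05):
        # config -> [loss_sum, observation_count]
        self.stats = {c: [0.0, 0] for c in configs}
        self.champion = configs[0]
        self.margin = margin

    def update(self, config, loss):
        s = self.stats[config]
        s[0] += loss
        s[1] += 1
        # Promote a challenger whose average loss beats the champion's
        # by more than the margin (a stand-in for a statistical test).
        if config != self.champion:
            if self._avg(config) + self.margin < self._avg(self.champion):
                self.champion = config

    def _avg(self, config):
        total, n = self.stats[config]
        return total / n if n else float("inf")
```

In an online learning loop, each incoming example would be scored by the champion and the live challengers, and their observed losses fed to `update`.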

  2. Leveraging Lead Bias for Zero-shot Abstractive News Summarization 

    July 10, 2021

    Lead bias is a common phenomenon in news summarization, where early parts of an article often contain the most salient information. While many algorithms exploit this fact in summary generation, lead bias has a detrimental effect on teaching the model to discriminate and extract important information.…
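Lead bias is the reason the simple "lead-n" baseline (take the first few sentences of the article as the summary) is hard to beat on news data. A minimal sketch, with naive sentence splitting for illustration; the function name and the regex are mine, not from the paper:

```python
import re

def lead_n_summary(article: str, n: int = 3) -> str:
    """Lead-n baseline: return the first n sentences as the summary,
    exploiting the lead bias of news articles. Sentence splitting here
    is a naive regex on ., !, ? for illustration only."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n])
```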

  3. Confidence-Budget Matching for Sequential Budgeted Learning 

    July 4, 2021 | Yonathan Efroni, Nadav Merlis, Aadirupa Saha, and Shie Mannor

    A core element in decision-making under uncertainty is the feedback on the quality of the performed actions. However, in many applications, such feedback is restricted. For example, in recommendation systems, repeatedly asking the user to provide feedback on the quality of recommendations will annoy them.…

  4. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters 

    July 3, 2021

    We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this,…
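The general adapter idea referenced in the title can be sketched as a small bottleneck module with a residual connection, trained while the pretrained model's own weights stay frozen — which is why injecting a new kind of knowledge need not overwrite a previous one. This is a generic adapter sketch (names and shapes are mine), not K-Adapter's exact architecture:

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: project hidden states down, apply a
    nonlinearity, project back up, and add a residual connection.
    The pretrained model's weights are untouched; only the small
    adapter matrices W_down and W_up would be trained."""
    z = np.maximum(h @ W_down, 0.0)  # down-projection + ReLU
    return h + z @ W_up              # up-projection + residual
```

With the adapter weights at zero, the module is an identity map, so a freshly added adapter does not disturb the pretrained model's behavior.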

  5. PatchMatch-Based Neighborhood Consensus for Semantic Correspondence 

    June 21, 2021 | Jae Yong Lee, Joseph DeGol, Victor Fragoso, and Sudipta Sinha

    We address the problem of estimating dense correspondences between two images depicting different but semantically related scenes. End-to-end trainable deep neural networks incorporating neighborhood consensus cues are currently the best methods for this task. However, these architectures require exhaustive matching and 4D convolutions over matching costs for all…

  6. Learning and Generalization in Overparameterized Normalizing Flows 

    June 18, 2021 | Kulin Shah, Amit Deshpande, and Navin Goyal

    In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not…

  7. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training 

    June 14, 2021

    In this work, we formulate cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, the information-theoretic framework inspires us to propose a pre-training task based on…
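A standard way to maximize mutual information between paired texts (e.g., a sentence and its translation) is a contrastive InfoNCE-style loss: each anchor should score its own positive highest among all positives in the batch. The sketch below illustrates that information-theoretic view in general; it is not InfoXLM's specific pre-training task, and the function name and temperature value are mine:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss, a standard lower bound on mutual
    information between paired views. Rows of `anchors` and
    `positives` are matched pairs (e.g., sentence and translation
    embeddings); row i should be most similar to positive i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # -log p(correct pair)
```

Aligned pairs yield a low loss; shuffling the positives (breaking the pairing) makes it rise, which is exactly the signal that drives the representations of parallel texts together.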

  8. Consistency Regularization for Cross-Lingual Fine-Tuning 

    June 14, 2021

    Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations, i.e.,…
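The general shape of such a consistency penalty is a divergence between the model's predicted distributions on an original example and on an augmented version of it; symmetric KL is one common choice. A minimal sketch of that general idea, not the paper's exact regularizer:

```python
import numpy as np

def consistency_penalty(p, q, eps=1e-12):
    """Symmetric KL divergence between two predicted probability
    distributions: p for the original example, q for an augmented
    version. Adding this to the task loss penalizes prediction
    sensitivity to the augmentation."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)
```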

  9. Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment 

    June 12, 2021

    Cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask…
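The random-masking step the snippet cuts off at can be illustrated with a simple token masker; in the paper, the model would then be asked to recover word alignments through the masked positions, which this sketch does not reproduce. The function name, mask ratio, and mask token are my placeholders:

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with a mask token,
    the standard masking step used in masked-language-model-style
    pre-training. Deterministic for a fixed seed (illustration only)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_ratio else t for t in tokens]
```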