Microsoft Research Blog

Artificial intelligence

  1. ChaCha for Online AutoML 

    July 17, 2021

    We propose the ChaCha (Champion-Challengers) algorithm for making an online choice of hyperparameters in online learning settings. ChaCha handles the process of determining a champion and scheduling a set of "live" challengers over time based on sample complexity bounds. It is guaranteed to have sublinear…
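The champion-challenger idea above can be sketched in a few lines: keep running-average losses for a champion configuration and a set of challengers, and promote a challenger once it beats the champion by a margin. This is an illustrative toy (the class name, the fixed `margin`, and the promotion rule are stand-ins), not ChaCha's actual scheduler or its sample-complexity bounds.

```python
class ChampionChallenger:
    """Minimal champion-challenger loop for online hyperparameter
    selection. Illustrative only: the real ChaCha algorithm schedules
    'live' challengers using sample complexity bounds, not a fixed margin."""

    def __init__(self, configs, margin=0.05):
        # config -> [loss_sum, observation_count]
        self.stats = {c: [0.0, 0] for c in configs}
        self.champion = configs[0]
        self.margin = margin

    def update(self, config, loss):
        s = self.stats[config]
        s[0] += loss
        s[1] += 1
        # Promote a challenger whose average loss beats the champion's
        # by more than the margin (a stand-in for a statistical test).
        if config != self.champion:
            if self._avg(config) + self.margin < self._avg(self.champion):
                self.champion = config

    def _avg(self, config):
        total, n = self.stats[config]
        return total / n if n else float("inf")
```

In an online learning loop, each incoming example would be scored by the champion and the live challengers, and their observed losses fed to `update`.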

  2. Leveraging Lead Bias for Zero-shot Abstractive News Summarization 

    July 10, 2021

    Lead bias is a common phenomenon in news summarization, where early parts of an article often contain the most salient information. While many algorithms exploit this fact in summary generation, lead bias has a detrimental effect on teaching the model to discriminate and extract important information.…
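Lead bias is the reason the simple "lead-n" baseline (take the first few sentences of the article as the summary) is hard to beat on news data. A minimal sketch, with naive sentence splitting for illustration; the function name and the regex are mine, not from the paper:

```python
import re

def lead_n_summary(article: str, n: int = 3) -> str:
    """Lead-n baseline: return the first n sentences as the summary,
    exploiting the lead bias of news articles. Sentence splitting here
    is a naive regex on ., !, ? for illustration only."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:n])
```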

  3. Confidence-Budget Matching for Sequential Budgeted Learning 

    July 4, 2021 | Yonathan Efroni, Nadav Merlis, Aadirupa Saha, and Shie Mannor

    A core element in decision-making under uncertainty is the feedback on the quality of the performed actions. However, in many applications, such feedback is restricted. For example, in recommendation systems, repeatedly asking the user to provide feedback on the quality of recommendations will annoy them.…

  4. K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters 

    July 3, 2021

    We study the problem of injecting knowledge into large pre-trained models like BERT and RoBERTa. Existing methods typically update the original parameters of pre-trained models when injecting knowledge. However, when multiple kinds of knowledge are injected, they may suffer from catastrophic forgetting. To address this,…
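The general adapter idea referenced in the title can be sketched as a small bottleneck module with a residual connection, trained while the pretrained model's own weights stay frozen — which is why injecting a new kind of knowledge need not overwrite a previous one. This is a generic adapter sketch (names and shapes are mine), not K-Adapter's exact architecture:

```python
import numpy as np

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter: project hidden states down, apply a
    nonlinearity, project back up, and add a residual connection.
    The pretrained model's weights are untouched; only the small
    adapter matrices W_down and W_up would be trained."""
    z = np.maximum(h @ W_down, 0.0)  # down-projection + ReLU
    return h + z @ W_up              # up-projection + residual
```

With the adapter weights at zero, the module is an identity map, so a freshly added adapter does not disturb the pretrained model's behavior.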

  5. PatchMatch-Based Neighborhood Consensus for Semantic Correspondence 

    June 21, 2021 | Jae Yong Lee, Joseph DeGol, Victor Fragoso, and Sudipta Sinha

    We address the problem of estimating dense correspondences between two images depicting different but semantically related scenes. End-to-end trainable deep neural networks incorporating neighborhood consensus cues are currently the best methods for this task. However, these architectures require exhaustive matching and 4D convolutions over matching costs for all…

  6. Learning and Generalization in Overparameterized Normalizing Flows 

    June 18, 2021 | Kulin Shah, Amit Deshpande, and Navin Goyal

    In supervised learning, it is known that overparameterized neural networks with one hidden layer provably and efficiently learn and generalize, when trained using stochastic gradient descent with sufficiently small learning rate and suitable initialization. In contrast, the benefit of overparameterization in unsupervised learning is not…

  7. InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training 

    June 14, 2021

    In this work, we formulate cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts. The unified view helps us to better understand the existing methods for learning cross-lingual representations. More importantly, the information-theoretic framework inspires us to propose a pre-training task based on…
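A standard way to maximize mutual information between paired texts (e.g., a sentence and its translation) is a contrastive InfoNCE-style loss: each anchor should score its own positive highest among all positives in the batch. The sketch below illustrates that information-theoretic view in general; it is not InfoXLM's specific pre-training task, and the function name and temperature value are mine:

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss, a standard lower bound on mutual
    information between paired views. Rows of `anchors` and
    `positives` are matched pairs (e.g., sentence and translation
    embeddings); row i should be most similar to positive i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # -log p(correct pair)
```

Aligned pairs yield a low loss; shuffling the positives (breaking the pairing) makes it rise, which is exactly the signal that drives the representations of parallel texts together.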

  8. Consistency Regularization for Cross-Lingual Fine-Tuning 

    June 14, 2021

    Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others. In this work, we propose to improve cross-lingual fine-tuning with consistency regularization. Specifically, we use example consistency regularization to penalize the prediction sensitivity to four types of data augmentations, i.e.,…
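The general shape of such a consistency penalty is a divergence between the model's predicted distributions on an original example and on an augmented version of it; symmetric KL is one common choice. A minimal sketch of that general idea, not the paper's exact regularizer:

```python
import numpy as np

def consistency_penalty(p, q, eps=1e-12):
    """Symmetric KL divergence between two predicted probability
    distributions: p for the original example, q for an augmented
    version. Adding this to the task loss penalizes prediction
    sensitivity to the augmentation."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    kl_pq = np.sum(p * np.log(p / q))
    kl_qp = np.sum(q * np.log(q / p))
    return 0.5 * (kl_pq + kl_qp)
```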

  9. Improving Pretrained Cross-Lingual Language Models via Self-Labeled Word Alignment 

    June 12, 2021

    Cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences. In this paper, we introduce denoising word alignment as a new cross-lingual pre-training task. Specifically, the model first self-labels word alignments for parallel sentences. Then we randomly mask…
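The random-masking step the snippet cuts off at can be illustrated with a simple token masker; in the paper, the model would then be asked to recover word alignments through the masked positions, which this sketch does not reproduce. The function name, mask ratio, and mask token are my placeholders:

```python
import random

def mask_tokens(tokens, mask_ratio=0.15, mask_token="[MASK]", seed=0):
    """Randomly replace a fraction of tokens with a mask token,
    the standard masking step used in masked-language-model-style
    pre-training. Deterministic for a fixed seed (illustration only)."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < mask_ratio else t for t in tokens]
```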