Microsoft Research Blog

Artificial intelligence

Token-wise Curriculum Learning for Neural Machine Translation

March 19, 2021

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages where the amount of training data is limited. To address this limitation, we…
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

March 15, 2021

Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, fatefully suffer from slow inference speed due to enormous computation cost mainly from cross-modal attention in Transformer architecture. When applied to real-life applications, such latency and computation demand severely deter…
Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning

March 11, 2021

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category. This paper explores data augmentation -- a technique particularly suitable for training with limited data…
Are NLP Models Really Able to Solve Simple Math Word Problems?

March 11, 2021 | Arkil Patel, Satwik Bhattamishra, and Navin Goyal

The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in the test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are…
MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization

March 10, 2021 | Chenguang Zhu, Yang Liu, Jie Mei, and Michael Zeng

This paper introduces MediaSum, a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overview and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is…
RMP2: A Structured Composable Policy Class for Robot Learning

March 9, 2021

We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has…
Stronger NAS with Weaker Predictors

February 20, 2021

Neural Architecture Search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS approaches attempt to address such heavy computation costs with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, these predictors, however, are…
Inducing a hierarchy for multi-class classification problems

February 19, 2021

In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure and classical flat classifiers must be employed. In this…
Training Large-Scale News Recommenders with Pretrained Language Models in the Loop

February 17, 2021

News recommendation calls for deep insights of news articles' underlying semantics. Therefore, pretrained language models (PLMs), like BERT and RoBERTa, may substantially contribute to the recommendation quality. However, it's extremely challenging to have news recommenders trained together with such big models: the learning of news…
Detecting Anomalous Time Series by GAMLSS-Akaike-Weights-Scoring

February 11, 2021 | Cole Sodja

An extensible statistical framework for detecting anomalous time series including those with heavy-tailed distributions and nonstationarity in higher-order moments is introduced based on penalized likelihood distributional regression. Specifically, generalized additive models for location, scale, and shape are used to infer sample path representations defined by…
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

February 8, 2021

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform…
Educational Question Mining At Scale: Prediction, Analysis and Personalization

February 6, 2021

Online education platforms enable teachers to share a large number of educational resources such as questions to form exercises and quizzes for students. With large volumes of available questions, it is important to have an automated way to quantify their properties and intelligently select them…
