Microsoft Research Blog

Artificial intelligence

  1. Token-wise Curriculum Learning for Neural Machine Translation 

    March 19, 2021

    Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage. This is not always achievable for low-resource languages, where the amount of training data is limited. To address this limitation, we…

  2. LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval 

    March 15, 2021

    Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, suffer from slow inference speed due to enormous computation cost, mainly from cross-modal attention in the Transformer architecture. When applied to real-life applications, such latency and computation demand severely deter…

  3. Are NLP Models Really Able to Solve Simple Math Word Problems? 

    March 11, 2021 | Arkil Patel, Satwik Bhattamishra, and Navin Goyal

    The problem of designing NLP solvers for math word problems (MWP) has seen sustained research activity and steady gains in test accuracy. Since existing solvers achieve high performance on the benchmark datasets for elementary level MWPs containing one-unknown arithmetic word problems, such problems are…

  4. MediaSum: A Large-scale Media Interview Dataset for Dialogue Summarization 

    March 10, 2021 | Chenguang Zhu, Yang Liu, Jie Mei, and Michael Zeng

    MediaSum is a large-scale media interview dataset consisting of 463.6K transcripts with abstractive summaries. To create this dataset, we collect interview transcripts from NPR and CNN and employ the overview and topic descriptions as summaries. Compared with existing public corpora for dialogue summarization, our dataset is…

  5. RMP2: A Structured Composable Policy Class for Robot Learning 

    March 9, 2021

    We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has…

  6. Stronger NAS with Weaker Predictors 

    February 20, 2021

    Neural Architecture Search (NAS) often trains and evaluates a large number of architectures. Recent predictor-based NAS approaches attempt to address such heavy computation costs with two key steps: sampling some architecture-performance pairs and fitting a proxy accuracy predictor. Given limited samples, these predictors, however, are…

  7. Inducing a hierarchy for multi-class classification problems 

    February 19, 2021

    In applications where categorical labels follow a natural hierarchy, classification methods that exploit the label structure often outperform those that do not. Unfortunately, the majority of classification datasets do not come pre-equipped with a hierarchical structure and classical flat classifiers must be employed. In this…

  8. Training Large-Scale News Recommenders with Pretrained Language Models in the Loop 

    February 17, 2021

    News recommendation calls for deep insight into the underlying semantics of news articles. Therefore, pretrained language models (PLMs), like BERT and RoBERTa, may substantially contribute to the recommendation quality. However, it's extremely challenging to train news recommenders together with such big models: the learning of news…

  9. Detecting Anomalous Time Series by GAMLSS-Akaike-Weights-Scoring 

    February 11, 2021 | Cole Sodja

    An extensible statistical framework for detecting anomalous time series including those with heavy-tailed distributions and nonstationarity in higher-order moments is introduced based on penalized likelihood distributional regression. Specifically, generalized additive models for location, scale, and shape are used to infer sample path representations defined by…