Microsoft Research Blog

Artificial intelligence

  1. When deep learning met code search 

    August 11, 2019

    Several recent proposals apply deep neural networks to code search with natural language queries. Common across these proposals is the idea of embedding code and natural language queries into real-valued vectors and then using vector distance to approximate semantic correlation between code…

  2. Towards Ethical Deployment of AI for Conservation Systems 

    August 1, 2019

    The ability to collect, aggregate, and process “big data”, particularly with artificial intelligence (AI) tools, has the potential to facilitate breakthrough research in conservation. As institutions develop and deploy such systems at scale, it is critical that they be designed with ethics, fairness, and transparency…

  3. Gradient Boosting With Piece-Wise Linear Regression Trees 

    July 31, 2019 | Yu Shi, Jian Li, and Zhize Li

    Gradient Boosted Decision Trees (GBDT) is a very successful ensemble learning algorithm widely used across a variety of applications. Recently, several variants of GBDT training algorithms and implementations have been designed and heavily optimized in some very popular open-source toolkits including XGBoost, LightGBM and…

  4. Machine Learning at the Network Edge: A Survey 

    July 30, 2019

    Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous in recent years. This has led to the generation of large quantities of data in real time, which is an appealing target for AI systems. However, deploying machine learning models on such end-devices is nearly…

  5. Improving Subseasonal Forecasting in the Western U.S. with Machine Learning 

    July 24, 2019

    Water managers in the western United States (U.S.) rely on long-term forecasts of temperature and precipitation to prepare for droughts and other wet weather extremes. To improve the accuracy of these long-term forecasts, the U.S. Bureau of Reclamation and the National Oceanic and Atmospheric Administration…

  6. Zero-Shot Adaptive Transfer for Conversational Language Understanding 

    July 16, 2019 | Sungjin Lee and Rahul Jha

    Conversational agents such as Alexa and Google Assistant constantly need to increase their language understanding capabilities by adding new domains. A massive amount of labeled data is required for training each new domain. While domain adaptation approaches alleviate the annotation cost, prior approaches suffer from…

  7. Unsupervised Learning with Contrastive Latent Variable Models 

    July 16, 2019 | Kristen Severson, Soumya Ghosh, and Kenney Ng

    In unsupervised learning, dimensionality reduction is an important tool for data exploration and visualization. Because these aims are typically open-ended, it can be useful to frame the problem as looking for patterns that are enriched in one dataset relative to another. These pairs of datasets…

  8. Machine-learning-guided directed evolution for protein engineering 

    July 14, 2019 | Kevin Kaichuang Yang, Zachary Wu, and Frances H Arnold

    Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the…

  9. Efficient Pipeline for Camera Trap Image Review 

    July 14, 2019 | Sara Beery, Dan Morris, and Siyu Yang

    Biologists all over the world use camera traps to monitor biodiversity and wildlife population density. The computer vision community has been making strides towards automating the species classification challenge in camera traps, but it has proven difficult to apply models trained in one region…

  10. Towards Improving Neural Named Entity Recognition with Gazetteers 

    July 1, 2019 | Tianyu Liu, Jin-Ge Yao, and Chin-Yew Lin

    Most of the recently proposed neural models for named entity recognition have been purely data-driven, with a strong emphasis on eliminating the effort of collecting external resources or designing hand-crafted features. This could increase the chance of overfitting since the models cannot access…