Microsoft Research Blog

  1. Diagram showing the role of computational modelling in the early-stage drug-discovery process. After target identification and the screening of many molecules to identify possible candidates, optimization proceeds either through human-led cycles of synthesis and testing in the laboratory or, with computational modelling, largely in silico, so that only a small fraction of candidate molecules must be synthesized and tested. Just as with in vitro testing, in silico testing must be followed by clinical trials before the drug reaches the market.

    FS-Mol: Bringing Deep Learning to Early-Stage Drug Discovery 

    December 10, 2021 | Marc Brockschmidt and Megan Stanley

    The drug development process is an iterative one that consists of discovery, design, and testing. Historically, drugs were derived from plants and discovered through trial-and-error experiments. Fortunately, this drug discovery process now occurs in a lab, with each iteration of custom-designed compounds producing a more…

  2. Diagram of BugLab

    Finding and fixing bugs with deep learning 

    December 8, 2021 | Miltos Allamanis and Marc Brockschmidt

    Finding and fixing bugs in code is a time-consuming, and often frustrating, part of everyday work for software developers. Can deep learning address this problem and help developers deliver better software, faster? In a new paper, Self-Supervised Bug Detection and Repair, presented at the 2021…

  3. A graphic illustrating that SynapseML unifies a variety of different ML frameworks (including LightGBM, Azure Cognitive Services, Deep Learning, reinforcement learning), scales (including single node, cluster, and serverless + elastic), paradigms (including batch, streaming, and serving), cloud data stores, and languages.

    SynapseML: A simple, multilingual, and massively parallel machine learning library 

    November 17, 2021 | Mark Hamilton

    Today, we’re excited to announce the release of SynapseML (previously MMLSpark), an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. Building production-ready distributed ML pipelines can be difficult, even for the most seasoned developer. Composing tools from different ecosystems often…

  4. Diagram: how the image-text contrastive and translated-text contrastive tasks work together to align the spaces of images, English text, and non-English text. Image-caption training data with an image-text contrastive loss aligns the image and English domains, while the non-English domain is not significantly affected; parallel-corpus training data with a translated-text contrastive loss aligns the English and non-English domains, leaving the image domain largely unaffected. The resulting effect is that all three domains intersect.

    Turing Bletchley: A Universal Image Language Representation model by Microsoft 

    November 1, 2021 | Saurabh Tiwary

    Today, the Microsoft Turing team is thrilled to introduce Turing Bletchley, a 2.5-billion parameter Universal Image Language Representation model (T-UILR) that can perform image-language tasks in 94 languages. T-Bletchley has an image encoder and a universal language encoder that vectorize input images and text, respectively…
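The teaser above describes contrastive objectives (image-text and translated-text) that pull matching pairs together in a shared embedding space. As a rough illustration only, and not the Turing team's implementation, a symmetric InfoNCE-style contrastive loss can be sketched in NumPy; the `temperature` value and batch shapes here are illustrative assumptions:

```python
import numpy as np

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss between two batches of
    embeddings (e.g. image vectors and text vectors).

    Row i of `a` and row i of `b` are treated as a positive pair; every
    other pairing in the batch serves as a negative.
    """
    # L2-normalize so dot products become cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # pairwise similarity matrix
    n = len(a)

    def cross_entropy(l):
        # Softmax cross-entropy where the correct class for row i is column i.
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the two directions (e.g. image-to-text and text-to-image).
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2
```

Perfectly matched embedding pairs drive this loss toward zero, while mismatched pairs are penalized; training two encoders to minimize such a loss is what aligns the domains depicted in the diagram.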

  5. ACAV100M text on top of a series of small images.

    ACAV100M: Scaling up self-supervised audio-visual learning with automatically curated internet videos 

    October 28, 2021 | Yale Song

    The natural association between visual observations and their corresponding sounds has exhibited powerful self-supervision signals for learning video representations, which makes the ever-growing amount of online video an attractive data source for self-supervised learning. However, online videos often provide imperfectly aligned audio-visual signals because of…