Microsoft Research Blog


  1. A graphic illustrating that SynapseML unifies a variety of different ML frameworks (including LightGBM, Azure Cognitive Services, Deep Learning, reinforcement learning), scales (including single node, cluster, and serverless + elastic), paradigms (including batch, streaming, and serving), cloud data stores, and languages.

    SynapseML: A simple, multilingual, and massively parallel machine learning library 

    November 17, 2021 | Mark Hamilton

    Today, we’re excited to announce the release of SynapseML (previously MMLSpark), an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. Building production-ready distributed ML pipelines can be difficult, even for the most seasoned developer. Composing tools from different ecosystems often…

  2. An illustration of how the image text contrastive and translation text contrastive tasks work together to help align the space of images, English text, and non-English text. On the left side of the illustration, the three domains (Image Domain, English Domain, and Non-English Domain) are segregated. An arrow labeled “Image-Captions training data” points to another depiction of the three domains where the image domain and the English domain intersect but the non-English domain is still separate and shown in gray to show that it’s not significantly affected. A two-headed arrow with the label “Image-Text contrastive loss” is drawn between the image and English domains. Towards the bottom of the image, an arrow labeled “Parallel corpus training data” points to another depiction of the three domains where the English domain and the non-English domain intersect but the image domain is separate and shown in gray to indicate that it is not significantly affected. A two-headed arrow with the label “Translated Text Contrastive loss” is drawn between the English and non-English domains. Finally, a third arrow with the label “Resulting Effect” is drawn to the right of the image which points to a depiction of all three domains intersecting.

    Turing Bletchley: A Universal Image Language Representation model by Microsoft 

    November 1, 2021 | Saurabh Tiwary

    Today, the Microsoft Turing team is thrilled to introduce Turing Bletchley, a 2.5-billion-parameter Universal Image Language Representation model (T-UILR) that can perform image-language tasks in 94 languages. T-Bletchley has an image encoder and a universal language encoder that vectorize input image and text, respectively…

  3. ACAV100M text on top of a series of small images.

    ACAV100M: Scaling up self-supervised audio-visual learning with automatically curated internet videos 

    October 28, 2021 | Yale Song

    The natural association between visual observations and their corresponding sounds has exhibited powerful self-supervision signals for learning video representations, which makes the ever-growing amount of online video an attractive data source for self-supervised learning. However, online videos often provide imperfectly aligned audio-visual signals because of…

  4. On the left, text reads ORBIT benchmark dataset: 77 blind and low-vision collectors, 486 objects, 3,822 videos, and 2,687,934 frames. On the right, a graphic of a face mask with a line that connects to a picture of a cloth mask with a black and white zig-zag pattern. The line reads seven to eight videos per object. Below the face mask graphic, there are three yellow objects resembling a watering can, a key, and a comb. A line next to these reads two to ten objects per user. The objects are falling into a green bucket, with a line to the right of the bucket that reads user’s bucket.

    Announcing the ORBIT dataset: Advancing real-world few-shot learning using teachable object recognition 

    October 19, 2021 | Daniela Massiceti, Cecily Morrison, Katja Hofmann, and Ed Cutrell

    Object recognition systems have made spectacular advances in recent years, but they rely on training datasets with thousands of high-quality, labelled examples per object category. Learning new objects from only a few examples could open the door to many new applications. For example, robotics manufacturing…