Microsoft Research Blog

  1. [Image: ACAV100M title card over a grid of small video thumbnails]

    ACAV100M: Scaling up self-supervised audio-visual learning with automatically curated internet videos 

    October 28, 2021 | Yale Song

    The natural association between visual observations and their corresponding sounds provides powerful self-supervision signals for learning video representations, which makes the ever-growing amount of online video an attractive data source for self-supervised learning. However, online videos often provide imperfectly aligned audio-visual signals because of…
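    Methods in this space typically train a video encoder and an audio encoder so that the embeddings of a clip and its own soundtrack score higher than those of mismatched clip–audio pairs from the same batch. As a rough illustration only, and not the paper’s actual objective, a symmetric cross-modal contrastive (InfoNCE) loss could be sketched like this, with batch size, embedding width, and temperature all chosen arbitrarily:

    ```python
    import torch
    import torch.nn.functional as F

    def audio_visual_contrastive_loss(video_emb, audio_emb, temperature=0.07):
        """Symmetric InfoNCE: a clip's video embedding should match its own
        audio embedding (the diagonal) better than any other audio in the batch."""
        v = F.normalize(video_emb, dim=1)
        a = F.normalize(audio_emb, dim=1)
        logits = v @ a.t() / temperature           # (B, B) similarity matrix
        targets = torch.arange(v.size(0))          # matching pairs lie on the diagonal
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # Hypothetical batch of 8 clips with 128-dim video and audio embeddings.
    loss = audio_visual_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
    ```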

  2. [Image: ORBIT benchmark dataset statistics: 77 blind and low-vision collectors, 486 objects, 3,822 videos, and 2,687,934 frames; seven to eight videos per object and two to ten objects per user, grouped into each user’s bucket]

    Announcing the ORBIT dataset: Advancing real-world few-shot learning using teachable object recognition 

    October 19, 2021 | Daniela Massiceti, Cecily Morrison, Katja Hofmann, and Ed Cutrell

    Object recognition systems have made spectacular advances in recent years, but they rely on training datasets with thousands of high-quality, labelled examples per object category. Learning new objects from only a few examples could open the door to many new applications. For example, robotics manufacturing…
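    A common recipe for recognizing a new object from a handful of examples, and a reasonable mental model for teachable object recognition, is nearest-class-mean (“prototypical”) classification over learned embeddings. The sketch below is illustrative only; the function, embedding width, and episode sizes are hypothetical rather than taken from the ORBIT benchmark:

    ```python
    import torch

    def prototype_classify(support_emb, support_labels, query_emb, num_classes):
        """Average the few labelled support embeddings per class into a
        prototype, then assign each query to its nearest prototype."""
        prototypes = torch.stack([
            support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
        ])                                          # (num_classes, D)
        dists = torch.cdist(query_emb, prototypes)  # (num_query, num_classes)
        return dists.argmin(dim=1)                  # predicted class per query

    # Hypothetical episode: 5 objects, 5 support clips each, 64-dim embeddings.
    support = torch.randn(25, 64)
    labels = torch.arange(5).repeat_interleave(5)
    preds = prototype_classify(support, labels, torch.randn(10, 64), num_classes=5)
    ```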

  3. [Chart: Trend of sizes of state-of-the-art NLP models over time]

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

    October 11, 2021 | Ali Alvi and Paresh Kharya

    We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to…

  4. [Chart: CTDC global dataset on victims of trafficking]

    Real-world evidence and the path from data to impact 

    September 23, 2021 | Darren Edge and Jonathan Larson

    From the intense shock of the COVID-19 pandemic to the effects of climate change, our global society has never faced greater risk. The Societal Resilience team at Microsoft Research was established in recognition of this risk and tasked with developing open technologies that enable a scalable response in times of crisis. And just as we think about scalability…

  5. [Diagram: the MEB model, a sparse neural network in which binary input features are mapped to 15-dimension embeddings, sum-pooled within each of 49 feature groups, concatenated into a 735-dimension vector, and passed through two dense layers to produce a click probability; the example features are generated from the query “Microsoft Windows” and the document www.microsoft.com/en-us/windows]

    Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance 

    August 4, 2021 | Junyan Chen, Frédéric Dubut, Jason (Zengzhong) Li, and Rangan Majumder

    Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing’s search experience and surpassing human…
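    The diagram described above is concrete enough to sketch: each active binary feature is looked up in a 15-dimension embedding table, the embeddings are sum-pooled within each of 49 feature groups, the pooled vectors are concatenated into a 735-dimension vector (49 × 15), and two dense layers map it to a click probability. A minimal PyTorch sketch under those assumptions follows; the toy vocabulary and hidden width are hypothetical and nowhere near the real model’s 135 billion parameters:

    ```python
    import torch
    import torch.nn as nn

    class MEBSketch(nn.Module):
        """Toy stand-in for the architecture in the figure: binary features ->
        15-dim embeddings -> sum pooling per group -> concat -> 2 dense layers."""

        def __init__(self, vocab_size=1_000_000, num_groups=49,
                     embed_dim=15, hidden_dim=512):
            super().__init__()
            # Only the active features are looked up, which is what keeps the
            # model sparse despite the huge embedding table.
            self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="sum")
            self.mlp = nn.Sequential(
                nn.Linear(num_groups * embed_dim, hidden_dim),  # 49 * 15 = 735
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, group_feature_ids):
            # group_feature_ids: one LongTensor of active feature ids per group.
            pooled = [self.embedding(ids.unsqueeze(0)) for ids in group_feature_ids]
            x = torch.cat(pooled, dim=1)           # (1, 735)
            return torch.sigmoid(self.mlp(x))      # click probability

    model = MEBSketch()
    # Hypothetical example: a few active binary features in each of the 49 groups.
    features = [torch.randint(0, 1_000_000, (3,)) for _ in range(49)]
    print(model(features))                         # shape (1, 1), value in [0, 1]
    ```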