Microsoft Research Blog

  1. Figure: The ORBIT benchmark dataset at a glance: 77 blind and low-vision collectors, 486 objects, 3,822 videos, and 2,687,934 frames, with two to ten objects per collector and roughly seven to eight videos per object.

    Announcing the ORBIT dataset: Advancing real-world few-shot learning using teachable object recognition 

    October 19, 2021 | Daniela Massiceti, Cecily Morrison, Katja Hofmann, and Ed Cutrell

    Object recognition systems have made spectacular advances in recent years, but they rely on training datasets with thousands of high-quality, labelled examples per object category. Learning new objects from only a few examples could open the door to many new applications. For example, robotics manufacturing…
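
    To make the few-shot setting concrete, here is a minimal Python sketch of how a personalized "teachable" task might be assembled from a single collector's clips, mirroring the dataset's shape (two to ten objects per user, about seven to eight videos per object). The sample_teachable_task helper and its data layout are illustrative assumptions, not the ORBIT benchmark's actual API.

    ```python
    import random

    def sample_teachable_task(user_videos, support_per_obj=5, query_per_obj=2):
        # user_videos maps each of one collector's objects to its video clips
        support, query = {}, {}
        for obj, videos in user_videos.items():
            shuffled = random.sample(videos, len(videos))
            # clips the recognizer learns the object from
            support[obj] = shuffled[:support_per_obj]
            # held-out clips used to evaluate recognition of that object
            query[obj] = shuffled[support_per_obj:support_per_obj + query_per_obj]
        return support, query

    # hypothetical collector with three objects, each filmed seven or eight times
    user = {
        "house keys": [f"keys_{i}" for i in range(7)],
        "mug": [f"mug_{i}" for i in range(8)],
        "white cane": [f"cane_{i}" for i in range(7)],
    }
    support_set, query_set = sample_teachable_task(user)
    ```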

  2. Figure 1. Trend of sizes of state-of-the-art NLP models over time

    Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

    October 11, 2021 | Ali Alvi and Paresh Kharya

    We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration between Microsoft and NVIDIA to…

  3. Chart: CTDC global dataset on victims of trafficking

    Real-world evidence and the path from data to impact 

    September 23, 2021 | Darren Edge and Jonathan Larson

    From the intense shock of the COVID-19 pandemic to the effects of climate change, our global society has never faced greater risk. The Societal Resilience team at Microsoft Research was established in recognition of this risk and tasked with developing open technologies that enable a scalable response in times of crisis. And just as we think about scalability…

  4. Figure: Technical diagram of the MEB model. MEB is a sparse neural network that takes binary features as input, embeds each feature into a 15-dimension vector, applies sum pooling over each of 49 feature groups, concatenates the results into a 735-dimension vector, and passes it through two dense layers to produce a click probability. The features shown are generated from the example query “Microsoft Windows” and the document www.microsoft.com/en-us/windows.

    Make Every feature Binary: A 135B parameter sparse neural network for massively improved search relevance 

    August 4, 2021 | Junyan Chen, Frédéric Dubut, Jason (Zengzhong) Li, and Rangan Majumder

    Recently, Transformer-based deep learning models like GPT-3 have been getting a lot of attention in the machine learning world. These models excel at understanding semantic relationships, and they have contributed to large improvements in Microsoft Bing’s search experience and surpassing human…
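
    To make the diagram above concrete, here is a minimal PyTorch sketch of that structure: per-group embedding tables for the binary features, sum pooling within each of the 49 groups, concatenation into a 735-dimension vector, and two dense layers that output a click probability. The vocabulary sizes, hidden width, and class name are illustrative assumptions, not the production MEB configuration (which has 135 billion parameters).

    ```python
    import torch
    import torch.nn as nn

    NUM_GROUPS = 49                       # feature groups in the MEB diagram
    EMBED_DIM = 15                        # per-feature embedding size
    POOLED_DIM = NUM_GROUPS * EMBED_DIM   # 49 * 15 = 735

    class MEBSketch(nn.Module):
        def __init__(self, vocab_sizes, hidden_dim=256):  # sizes are assumptions
            super().__init__()
            # one table per feature group; each active binary feature
            # contributes a 15-dimension embedding, summed within its group
            self.tables = nn.ModuleList(
                nn.EmbeddingBag(v, EMBED_DIM, mode="sum") for v in vocab_sizes
            )
            # two dense layers producing a click probability
            self.mlp = nn.Sequential(
                nn.Linear(POOLED_DIM, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, group_feature_ids):
            # group_feature_ids: one LongTensor per group holding the indices
            # of the binary features active for this (query, document) pair
            pooled = [t(ids.unsqueeze(0)) for t, ids in zip(self.tables, group_feature_ids)]
            x = torch.cat(pooled, dim=1)        # shape (1, 735)
            return torch.sigmoid(self.mlp(x))   # click probability

    model = MEBSketch(vocab_sizes=[10_000] * NUM_GROUPS)
    features = [torch.randint(0, 10_000, (5,)) for _ in range(NUM_GROUPS)]
    print(model(features))
    ```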

  5. Figure: Whereas the NNGP and NTK limits essentially consider only the network around initialization, the feature learning limit incorporates the entire training trajectory. The NNGP limit can be thought of as the limit of the first forward pass, and the NTK limit as the limit of the first backward pass; the feature learning limit, introduced in this work, accounts for the many cycles of forward and backward passes over the full course of SGD training.

    On infinitely wide neural networks that exhibit feature learning 

    July 22, 2021 | Edward Hu and Greg Yang

    In the pursuit of understanding the fundamentals of the natural world, scientists have had success approaching discoveries from both bottom-up and top-down directions. Neuroscience is a great example of the former. Spanish anatomist Santiago Ramón y Cajal discovered the neuron in the…
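
    For reference, the two “initialization” limits mentioned in the figure above correspond to standard kernels from the infinite-width literature; the notation below is the conventional one and is not taken from the post itself.

    ```latex
    % NNGP kernel: covariance of the network's outputs under random
    % initialization (the infinite-width limit of the first forward pass)
    K(x, x') = \mathbb{E}_{\theta \sim \mathrm{init}}\left[ f_\theta(x) \, f_\theta(x') \right]

    % Neural Tangent Kernel: inner product of parameter gradients at
    % initialization (the limit of the first backward pass); in the NTK limit
    % this kernel stays fixed throughout training, so no features are learned
    \Theta(x, x') = \nabla_\theta f_\theta(x)^{\top} \, \nabla_\theta f_\theta(x')
    ```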