Microsoft Research Blog

Artificial intelligence

  1. Dual-Alignment Pre-training for Cross-lingual Sentence Embedding 

    May 16, 2023

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, which has not been fully explored previously. Based on our…

  2. Incident-aware Duplicate Ticket Aggregation for Cloud Systems 

    May 14, 2023

In cloud systems, incidents are potential threats to customer satisfaction and business revenue. When customers are affected by incidents, they often request customer support service (CSS) from the cloud provider by submitting a support ticket. Many tickets may be duplicates, as they are reported in…

  3. Code Execution with Pre-trained Language Models 

    May 8, 2023

    Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well…

  4. Automatic Prompt Optimization with “Gradient Descent” and Beam Search 

    May 4, 2023

Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on hand-written prompts that require onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired…

  5. AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers 

    May 1, 2023

Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing work on MoE mostly considers a homogeneous design in which the same number of experts of the same size is placed uniformly throughout the network. Furthermore, existing MoE work does not consider…

  6. A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training 

    May 1, 2023 | Nitay Calderon, Subhabrata (Subho) Mukherjee, Roi Reichart, and Amir Kantor

    Modern Natural Language Generation (NLG) models come with massive computational and storage requirements. In this work, we study the potential of compressing them, which is crucial for real-world applications serving millions of users. We focus on Knowledge Distillation (KD) techniques, in which a small student…

  7. LMGQS: A Large-scale Dataset for Query-focused Summarization 

    May 1, 2023

    Query-focused summarization (QFS) aims to extract or generate a summary of an input document that directly answers or is relevant to a given query. The lack of large-scale datasets in the form of documents, queries, and summaries has hindered model development in this area. In…

  8. Cornet: Learning Table Formatting Rules By Example 

    May 1, 2023

    Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on rules. Unfortunately, writing such formatting rules can be…

  9. Benchmarking Spatial Relationships in Text-to-Image Generation 

    May 1, 2023

Spatial understanding is a fundamental aspect of computer vision and integral to human-level reasoning about images, making it an important component of grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial…
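The first post above mentions dual encoder models trained with a sentence-level translation ranking task. As a rough illustration only, an in-batch softmax version of such a ranking loss can be sketched as follows; the function name, the NumPy implementation, and the `scale` value are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def translation_ranking_loss(src_emb, tgt_emb, scale=20.0):
    """In-batch softmax translation ranking loss for a dual encoder (sketch).

    Rows i of src_emb and tgt_emb embed a translation pair; every other
    row in the batch serves as a negative. `scale` is a softmax
    temperature chosen here for illustration.
    """
    # L2-normalize so the dot product is cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = scale * (src @ tgt.T)  # (batch, batch) similarity logits

    # Softmax cross-entropy with the true translation (the diagonal)
    # as the positive class for each source sentence.
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Training would minimize this loss over batches of parallel sentences, pushing each source embedding toward its translation and away from the other targets in the batch.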