Microsoft Research Blog

Artificial intelligence

  1. Dual-Alignment Pre-training for Cross-lingual Sentence Embedding 

    May 16, 2023

Recent studies have shown that dual encoder models trained with the sentence-level translation ranking task are effective for cross-lingual sentence embedding. However, our research indicates that token-level alignment is also crucial in multilingual scenarios, which has not been fully explored previously. Based on our…

  2. Incident-aware Duplicate Ticket Aggregation for Cloud Systems 

    May 14, 2023

In cloud systems, incidents are potential threats to customer satisfaction and business revenue. When customers are affected by incidents, they often request customer support service (CSS) from the cloud provider by submitting a support ticket. Many tickets may be duplicates, as they are reported in…

  3. Code Execution with Pre-trained Language Models 

    May 8, 2023

    Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures. In this paper, we investigate how well…

  4. Automatic Prompt Optimization with “Gradient Descent” and Beam Search 

    May 4, 2023

Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on hand-written prompts that require onerous trial-and-error effort. We propose a simple and nonparametric solution to this problem, Automatic Prompt Optimization (APO), which is inspired…

  5. AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers 

    May 1, 2023

Mixture-of-Experts (MoE) models have obtained state-of-the-art performance in Neural Machine Translation (NMT) tasks. Existing work on MoE mostly considers a homogeneous design in which the same number of experts of the same size is placed uniformly throughout the network. Furthermore, existing MoE work does not consider…

  6. A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training 

    May 1, 2023 | Nitay Calderon, Subhabrata (Subho) Mukherjee, Roi Reichart, and Amir Kantor

    Modern Natural Language Generation (NLG) models come with massive computational and storage requirements. In this work, we study the potential of compressing them, which is crucial for real-world applications serving millions of users. We focus on Knowledge Distillation (KD) techniques, in which a small student…

  7. LMGQS: A Large-scale Dataset for Query-focused Summarization 

    May 1, 2023

    Query-focused summarization (QFS) aims to extract or generate a summary of an input document that directly answers or is relevant to a given query. The lack of large-scale datasets in the form of documents, queries, and summaries has hindered model development in this area. In…

  8. Cornet: Learning Table Formatting Rules By Example 

    May 1, 2023

    Spreadsheets are widely used for table manipulation and presentation. Stylistic formatting of these tables is an important property for presentation and analysis. As a result, popular spreadsheet software, such as Excel, supports automatically formatting tables based on rules. Unfortunately, writing such formatting rules can be…

  9. Benchmarking Spatial Relationships in Text-to-Image Generation 

    May 1, 2023

Spatial understanding is a fundamental aspect of computer vision and integral to human-level reasoning about images, making it an important component of grounded language understanding. While recent text-to-image synthesis (T2I) models have shown unprecedented improvements in photorealism, it is unclear whether they have reliable spatial…
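The first post above mentions dual encoder models trained with a sentence-level translation ranking task. As a rough illustration only, an in-batch softmax version of such a ranking loss can be sketched as follows; the function name, the NumPy implementation, and the `scale` value are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def translation_ranking_loss(src_emb, tgt_emb, scale=20.0):
    """In-batch softmax translation ranking loss for a dual encoder (sketch).

    Rows i of src_emb and tgt_emb embed a translation pair; every other
    row in the batch serves as a negative. `scale` is a softmax
    temperature chosen here for illustration.
    """
    # L2-normalize so the dot product is cosine similarity.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = scale * (src @ tgt.T)  # (batch, batch) similarity logits

    # Softmax cross-entropy with the true translation (the diagonal)
    # as the positive class for each source sentence.
    logits = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Training would minimize this loss over batches of parallel sentences, pushing each source embedding toward its translation and away from the other targets in the batch.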