Microsoft Research Blog


  1. Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency 

    November 1, 2025

    Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the…
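
    As a rough illustration of the plain SC baseline described above (not the Slim-SC pruning method itself), the sketch below samples several reasoning chains and majority-votes over their final answers; generate_reasoning_chain is a hypothetical sampling call standing in for whatever decoding API is actually used.

    ```python
    # Minimal Self-Consistency (SC) sketch: sample N chains, majority-vote the answers.
    # generate_reasoning_chain is a hypothetical callable, not an API from the post.
    from collections import Counter

    def self_consistency(prompt, generate_reasoning_chain, n_chains=8):
        answers = []
        for _ in range(n_chains):
            _chain, answer = generate_reasoning_chain(prompt)  # independent sample
            answers.append(answer)
        # The SC answer is the most frequent final answer across all chains.
        return Counter(answers).most_common(1)[0][0]
    ```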

  2. Avatars in mixed-reality meetings: A longitudinal field study of realistic versus cartoon facial likeness effects on communication, task satisfaction, presence, and emotional perception 

    November 1, 2025

    We conducted a within-subjects study to examine how realistic faces and cartoon faces on avatars affect communication, task satisfaction, sense of presence, and mood perception in mixed reality meetings. Over the course of two weeks, six groups of co-workers (14 people) held recurring meetings using…

  3. Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations 

    November 1, 2025

    LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all training paradigm (Lucy et al., 2024), and there is limited research on which personalization aspects each group expects. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA),…

  4. GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration 

    November 1, 2025

    Text-to-video generation models have shown significant progress in recent years. However, they still struggle to generate complex dynamic scenes from compositional text prompts, such as attribute binding for multiple objects, temporal dynamics associated with different objects, and interactions between objects. Our key motivation…

  5. Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models 

    November 1, 2025

    The biases exhibited by text-to-image (TTI) models are often treated as independent, though in reality, they may be deeply interrelated. Addressing bias along one dimension—such as ethnicity or age—can inadvertently affect another, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial…

  6. Supporting Industry Computing Researchers in Assessing, Articulating, and Addressing the Potential Negative Societal Impact of Their Work 

    November 1, 2025 | Wesley Hanwen Deng, Solon Barocas, and Jennifer Wortman Vaughan

    Recent years have witnessed increasing calls for computing researchers to grapple with the societal impacts of their work. Tools such as impact assessments have gained prominence as a method to uncover potential impacts, and a number of publication venues now encourage authors to include an…

  7. Table-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning 

    November 1, 2025

    Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code, data cleaning) continues to be sub-optimal. To improve their performance, task-specific fine-tuning is often needed, which, however, requires expensive…

  8. Sherlock: Reliable and efficient workflow execution 

    November 1, 2025

    With the increasing adoption of large language models (LLMs), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even…

  9. Generative Caching for Structurally Similar Prompts and Responses 

    November 1, 2025

    Large Language Models (LLMs) are increasingly being used to plan, reason, and execute tasks across various scenarios. Use cases like repeatable workflows, chatbots, and AI agents often involve recurring tasks and tend to reuse similar prompts when interacting with the LLM. This opens up opportunities…
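
    As a toy sketch of the opportunity described above (keying a response cache on prompt structure rather than exact text, so recurring, structurally similar prompts can share cached work), the snippet below uses a hypothetical normalize() heuristic that masks variable fields; the paper's actual generative-caching approach, which adapts cached responses to new prompts, is not shown here.

    ```python
    # Toy structural-prompt cache: prompts that differ only in variable fields
    # (numbers, quoted strings) share a cache entry. Illustration only; the
    # normalize() heuristic is an assumption, not the method from the paper.
    import re

    _cache = {}

    def normalize(prompt: str) -> str:
        """Mask quoted strings and numbers so recurring prompt templates collide."""
        prompt = re.sub(r'"[^"]*"', '"<STR>"', prompt)
        return re.sub(r"\d+", "<NUM>", prompt)

    def cached_call(prompt: str, llm_call):
        key = normalize(prompt)
        if key not in _cache:
            _cache[key] = llm_call(prompt)  # call the LLM only on a structural miss
        return _cache[key]
    ```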