Microsoft Research Blog


  1. Slim-SC: Thought Pruning for Efficient Scaling with Self-Consistency 

    November 1, 2025

    Recently, Test-Time Scaling (TTS) has gained increasing attention for improving LLM reasoning performance at test time without retraining the model. A notable TTS technique is Self-Consistency (SC), which generates multiple reasoning chains in parallel and selects the final answer via majority voting. While effective, the…
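
    As a rough illustration of the plain SC baseline described above (not the Slim-SC pruning method itself), the sketch below samples several reasoning chains and majority-votes over their final answers; generate_reasoning_chain is a hypothetical sampling call standing in for whatever decoding API is actually used.

    ```python
    # Minimal Self-Consistency (SC) sketch: sample N chains, majority-vote the answers.
    # generate_reasoning_chain is a hypothetical callable, not an API from the post.
    from collections import Counter

    def self_consistency(prompt, generate_reasoning_chain, n_chains=8):
        answers = []
        for _ in range(n_chains):
            _chain, answer = generate_reasoning_chain(prompt)  # independent sample
            answers.append(answer)
        # The SC answer is the most frequent final answer across all chains.
        return Counter(answers).most_common(1)[0][0]
    ```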

  2. Avatars in mixed-reality meetings: A longitudinal field study of realistic versus cartoon facial likeness effects on communication, task satisfaction, presence, and emotional perception 

    November 1, 2025

    We conducted a within-subjects study to examine how realistic faces and cartoon faces on avatars affect communication, task satisfaction, sense of presence, and mood perception in mixed reality meetings. Over the course of two weeks, six groups of co-workers (14 people) held recurring meetings using…

  3. Group Preference Alignment: Customized LLM Response Generation from In-Situ Conversations 

    November 1, 2025

    LLMs often fail to meet the specialized needs of distinct user groups due to their one-size-fits-all training paradigm (Lucy et al., 2024), and there is limited research on which personalization aspects each group expects. To address these limitations, we propose a group-aware personalization framework, Group Preference Alignment (GPA),…

  4. GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration 

    November 1, 2025

    Text-to-video generation models have shown significant progress in recent years. However, they still struggle to generate complex dynamic scenes from compositional text prompts, such as attribute binding for multiple objects, temporal dynamics associated with different objects, and interactions between objects. Our key motivation…

  5. Mitigate One, Skew Another? Tackling Intersectional Biases in Text-to-Image Models 

    November 1, 2025

    The biases exhibited by text-to-image (TTI) models are often treated as independent, though in reality, they may be deeply interrelated. Addressing bias along one dimension—such as ethnicity or age—can inadvertently affect another, like gender, either mitigating or exacerbating existing disparities. Understanding these interdependencies is crucial…

  6. Supporting Industry Computing Researchers in Assessing, Articulating, and Addressing the Potential Negative Societal Impact of Their Work 

    November 1, 2025 | Wesley Hanwen Deng, Solon Barocas, and Jennifer Wortman Vaughan

    Recent years have witnessed increasing calls for computing researchers to grapple with the societal impacts of their work. Tools such as impact assessments have gained prominence as a method to uncover potential impacts, and a number of publication venues now encourage authors to include an…

  7. Table-Specialist: Language Model Specialists for Tables using Iterative Fine-tuning 

    November 1, 2025

    Language models such as GPT and Llama have shown remarkable ability on diverse natural language tasks, yet their performance on complex table tasks (e.g., NL-to-Code, data cleaning) continues to be sub-optimal. To improve their performance, task-specific fine-tuning is often needed, which, however, requires expensive…

  8. Sherlock: Reliable and efficient workflow execution 

    November 1, 2025

    With the increasing adoption of large language models (LLMs), agentic workflows, which compose multiple LLM calls with tools, retrieval, and reasoning steps, are increasingly replacing traditional applications. However, such workflows are inherently error-prone: incorrect or partially correct output at one step can propagate or even…

  9. Generative Caching for Structurally Similar Prompts and Responses 

    November 1, 2025

    Large Language Models (LLMs) are increasingly being used to plan, reason, and execute tasks across various scenarios. Use cases like repeatable workflows, chatbots, and AI agents often involve recurring tasks and tend to reuse similar prompts when interacting with the LLM. This opens up opportunities…
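
    As a toy sketch of the opportunity described above (keying a response cache on prompt structure rather than exact text, so recurring, structurally similar prompts can share cached work), the snippet below uses a hypothetical normalize() heuristic that masks variable fields; the paper's actual generative-caching approach, which adapts cached responses to new prompts, is not shown here.

    ```python
    # Toy structural-prompt cache: prompts that differ only in variable fields
    # (numbers, quoted strings) share a cache entry. Illustration only; the
    # normalize() heuristic is an assumption, not the method from the paper.
    import re

    _cache = {}

    def normalize(prompt: str) -> str:
        """Mask quoted strings and numbers so recurring prompt templates collide."""
        prompt = re.sub(r'"[^"]*"', '"<STR>"', prompt)
        return re.sub(r"\d+", "<NUM>", prompt)

    def cached_call(prompt: str, llm_call):
        key = normalize(prompt)
        if key not in _cache:
            _cache[key] = llm_call(prompt)  # call the LLM only on a structural miss
        return _cache[key]
    ```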