Microsoft Research Blog

  1. VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents 

    October 18, 2025

    A key challenge in training Vision-Language Model (VLM) agents, compared to Large Language Model (LLM) agents, lies in the shift from textual states to complex visual observations. This transition introduces partial observability and demands robust world modeling. We ask: Can VLM agents construct internal world models…

  2. Distant conversational speech recognition: Challenges and Opportunities

    October 17, 2025 | Dr. Samuele Cornell and Sunit Sivasankaran

    State-of-the-art ASR systems excel on close-talk benchmarks but struggle with far-field conversational speech, where error rates remain above 20%. Current benchmark datasets inadequately assess generalization across domains and real-world conditions, often relying on oracle segmentation that yields overly optimistic results. Distant ASR (DASR) faces unique…

  3. Ultra Ethernet for next-generation AI and HPC workloads 

    October 17, 2025 | Torsten Hoefler, Abdul Kabbani, and Sujata Banerjee

    The Ultra Ethernet Consortium set out to redefine Ethernet-based interconnects for AI and high-performance computing (HPC), culminating in the recent release of its first specification (version 1.0). This talk will highlight key innovations that distinguish Ultra Ethernet from existing solutions, ranging from lossy operation—both with…

  4. BRAIN SIGNALS TO ACTION: Monitoring and Explaining User Cognitive Load with Foundation Models

    Passive monitoring of cognitive load can enable personalized user experiences and even accelerate human learning by leveraging closed-loop adaptive training systems. Electroencephalography (EEG) provides a cost-effective, non-invasive window into brain activity, yet conventional methods struggle with cross-subject variability. Leveraging the power of large pretrained brain…

  5. IronDict: Transparent Dictionaries from Polynomial Commitments 

    October 17, 2025 | Hossein Hafezi and Melissa Chase

    We present IronDict, a transparent dictionary construction based on polynomial commitment schemes. Transparent dictionaries enable an untrusted server to maintain a mutable dictionary and provably serve clients' lookup queries. A major open challenge is supporting efficient auditing by lightweight clients. Previous solutions either incurred high…

  6. FOA Tokenizer: Learning Discrete Representations of Spatial Audio with Multichannel VQ-GAN

    October 17, 2025 | Parthasaarathy Sudarsanam and Hannes Gamper

    Spatial audio captures the directional and environmental characteristics of sound, enabling immersive listening experiences. First-Order Ambisonics (FOA) provides a compact representation of spatial audio by encoding the sound field’s directional components across four channels, allowing full-scene coverage independent of microphone array geometry. A key advantage…

  7. Efficient Secure Aggregation for Federated Learning

    October 17, 2025 | Varun Madathil and Melissa Chase

    Federated Learning (FL) trains a global model by having each selected device push only its model update to a central server, keeping raw data local. However, those updates can still leak sensitive information unless the server learns only their sum. A naïve approach is to run…

  8. Microsoft Research Asia — StarLeap Program 

    October 16, 2025

    The StarLeap Program, launched by Microsoft Research Asia (MSRA), is designed to provide exceptional students with the opportunity to collaborate with multiple research teams at MSRA and to address real-world, frontier research challenges. Since its establishment in January 2021, the program has received enthusiastic responses…

  9. Towards a Responsible AI Organizational Maturity Model

    [Figure: five maturity levels (Latent, Emerging, Developing, Realizing, Leading)]

    October 16, 2025

    Artificial intelligence (AI) holds tremendous potential but also poses consequential risks. Regulation frameworks like the EU AI Act aim to mitigate these risks, yet organizations struggle to understand and operationalize Responsible AI (RAI). We introduce the RAI Organizational Maturity (RAI-OM) framework as an initial step…