Evaluating LLM Reasoning Beyond Correctness and CoT
What does it truly mean for a language model to “reason”? Current evaluations reward models for correct standalone answers, but correctness alone reveals little about the process that produced them. We argue that reasoning should be understood not as a static chain of steps but as a…