Microsoft Research Blog

English

Serving Models, Fast and Slow:Optimizing Heterogeneous LLM Inferencing Workloads at Scale

June 8, 2026

Global cloud service providers handle inference workloads for Large Language Models (LLMs) that span latency-sensitive (e.g., chatbots) and insensitive (e.g., report writing) tasks, resulting in diverse and often conflicting Service Level Agreement (SLA) requirements. Managing such mixed workloads is challenging due to the complexity of…
Computer Vision in the Wild Workshop at CVPR 2026

June 3, 2026

Full workshop title: The 5th Workshop on Computer Vision in the Wild (CVinW): Towards Unified Multimodal Agents for Reasoning in the Wild Host conference: The Conference on Computer Vision and Pattern Recognition (CVPR) (opens in new tab) | June 3-4, 2026 Workshop organizers: Reuben Tan, Zhengyuan…
Enter, Exit, Page Fault, Leak: Testing Isolation Boundaries for Microarchitectural Leaks

May 1, 2026

CPUs provide isolation mechanisms like virtualization and privilege levels to protect software. Yet these focus on architectural isolation while typically overlooking microarchitectural side channels, exemplified by Meltdown and Foreshadow. Software must therefore supplement architectural defenses with ad-hoc microarchitectural patches, which are constantly evolving as new…
DroidSpeak: Efficient Context Sharing for Multiple-LLM Inference

May 1, 2026

Large Language Models (LLMs) are increasingly employed in complex workflows, where different LLMs and fine-tuned variants collaboratively address complex tasks. However, these systems face significant inefficiencies due to redundant context processing of the shared context. We propose DroidSpeak, a framework that optimizes context sharing between…
Harvesting Spare CPU Resources in Container Systems

May 1, 2026

Container platforms like Kubernetes are widely adopted for deploying latency-sensitive cloud services, and CPU resources for these containers are over-provisioned to ensure low 99th percentile tail latency under peak load. At the same time, cloud services exhibit bursty traffic patterns resulting in CPU usage variability…
Cognitive Load Estimation Using Brain Foundation Models and Interpretability for BCIs

May 1, 2026 | Deeksha M Shama, Dimitra Emmanouilidou, and Ivan Tashev

Figure 1: Overall pipeline of cognitive load estimation with brain foundation model (BFM) in adaptive training systems. We investigate the use of Brain Foundational Models (BFMs) for continuous cognitive-load monitoring and examine key challenges in scalability, generalization, interpretability. We apply BFMs to continuous cognitive-load estimation…
Concord: Learning Network Configuration Contracts

April 27, 2026

Misconfiguration is frequently cited as a leading cause of service disruptions and outages. To prevent misconfiguration, we introduce network contracts — lightweight configuration checks that run efficiently, localize errors to specific lines, and require no heavyweight modeling of network protocols. We develop a tool Concord…
EgoBrain: Synergizing Minds and Eyes For Human Action Understanding

April 23, 2026

The integration of brain-computer interfaces (BCIs), in particular electroencephalography (EEG), with artificial intelligence (AI) has shown tremendous promise in decoding human cognition and behavior from neural signals. In particular, the rise of multimodal AI models have brought new possibilities that have never been imagined before.…
Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

April 23, 2026

Reinforcement learning (RL) has become a cornerstone for enhancing the reasoning capabilities of large language models (LLMs), with recent innovations such as Group Relative Policy Optimization (GRPO) demonstrating exceptional effectiveness. In this study, we identify a critical yet underexplored issue in RL training: low-probability tokens…
Algorithm Generation via Creative Ideation

April 23, 2026 | Ruiying Ma, Chieh-Jan Mike Liang, Yanjie Gao, and Francis Y. Yan

Designing system algorithms remains challenging, where the discontinuous nature of the solution space often forces system engineers to rely on generic heuristics at the expense of performance. We study whether LLMs can practically drive algorithm generation, and find that they are biased towards well-known generic…
VidGuard-R1: AI-Generated Video Detection and Explanation via Reasoning MLLMs and RL

April 23, 2026

With the rapid advancement of AI-generated videos, there is an urgent need for effective detection tools to mitigate societal risks such as misinformation and reputational harm. In addition to accurate classification, it is essential that detection models provide interpretable explanations to ensure transparency for regulators…
Learning to Generate Unit Test via Adversarial Reinforcement Learning

April 1, 2026 | Dongjun Lee, Changho Hwang, and Kimin Lee

Unit testing is a core practice in programming, enabling systematic evaluation of programs produced by human developers or large language models (LLMs). Given the challenges in writing comprehensive unit tests, LLMs have been employed to automate unit test generation, yet methods for training LLMs to…

No results