Microsoft Research Blog

  1. Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts 

    March 3, 2026

    While large language models (LLMs) fine-tuned with lightweight adapters achieve strong performance across diverse tasks, their performance on individual tasks depends on the fine-tuning strategy. Fusing independently trained models with different strengths has shown promise for multi-task learning through three main strategies: ensembling, which combines…
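    The three fusion strategies named in the teaser can be contrasted with a toy sketch (my own illustration, not the paper's method): ensembling averages the experts' outputs, merging averages their parameters, and routing dispatches each input to a single expert. The expert functions and weight vectors below are hypothetical stand-ins.

    ```python
    import numpy as np

    # Two toy "experts", reduced to functions mapping an input to class probabilities.
    def expert_a(x):
        return np.array([0.8, 0.2])

    def expert_b(x):
        return np.array([0.3, 0.7])

    def ensemble(x):
        # Ensembling: combine the experts' output distributions (here, a mean).
        return (expert_a(x) + expert_b(x)) / 2

    # Merging: average parameters instead of outputs (shown on toy weight vectors).
    w_a, w_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
    w_merged = (w_a + w_b) / 2

    def route(x, scores):
        # Routing: send the input to the single best-scoring expert.
        return expert_a(x) if scores[0] >= scores[1] else expert_b(x)
    ```

    The trade-off the post studies lives in exactly this contrast: ensembling pays inference cost for every expert, merging pays nothing extra but can blur expert strengths, and routing depends on how well the router scores inputs.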

  2. Contextualized Privacy Defense for LLM Agents 

    March 3, 2026

    LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive defenses, such as prompting and guarding. These paradigms are insufficient for supporting contextual, proactive privacy decisions in multi-step agent…

  3. Beyond Swahili: Designing Inclusive AI for Bantu Languages 

    March 2, 2026 | Alfred Malengo Kondoro

    Swahili has become one of the most consistently represented African languages in modern AI benchmarks, spanning machine translation, language modeling, and multilingual evaluation suites, far exceeding the coverage of any other Bantu language. This prominence reflects its scale, standardization, and regional reach, but it also…

  4. CARE: Towards Clinical Accountability in Multi-Modal Medical Reasoning with an Evidence-Grounded Agentic Framework 

    March 2, 2026

    Large visual language models (VLMs) have shown strong multi-modal medical reasoning ability, but most operate as end-to-end black boxes, diverging from clinicians' evidence-based, staged workflows and hindering clinical accountability. Complementarily, expert visual grounding models can accurately localize regions of interest (ROIs), providing explicit, reliable evidence that…

  5. Learning to Draft: Adaptive Speculative Decoding with Reinforcement Learning 

    March 2, 2026

    Speculative decoding accelerates large language model (LLM) inference by using a small draft model to generate candidate tokens for a larger target model to verify. The efficacy of this technique hinges on the trade-off between the time spent on drafting candidates and verifying them. However,…
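    The draft-and-verify loop the teaser describes can be sketched in a few lines (a greedy toy, with made-up lookup-table "models"; the post's RL-learned drafter is not shown). Accepted draft tokens advance for free; the first disagreement falls back to the target model's token.

    ```python
    def draft_model(prefix, k):
        # Cheap drafter: proposes up to k candidate next tokens (toy lookup table).
        guesses = {(): ["the", "cat", "sat"]}
        return guesses.get(tuple(prefix), [])[:k]

    def target_model(prefix):
        # Expensive target: the verified next token for a prefix (toy lookup table).
        truth = {(): "the", ("the",): "cat", ("the", "cat"): "sits"}
        return truth.get(tuple(prefix))

    def speculative_step(prefix, k=3):
        """Accept drafted tokens until one disagrees with the target, then take
        the target's token instead. Shown sequentially here; in practice the k
        verifications run in a single batched call, which is the source of the
        speed-up when draft quality is high."""
        drafted = draft_model(prefix, k)
        out = list(prefix)
        for tok in drafted:
            verified = target_model(out)
            if verified == tok:
                out.append(tok)          # accepted: drafter matched the target
            else:
                if verified is not None:
                    out.append(verified)  # rejected: keep the target's token
                break
        return out
    ```

    The trade-off the post mentions is visible here: a larger k amortizes verification better but wastes drafting work whenever an early token is rejected.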

  6. Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search 

    March 2, 2026

    LLM-based agents for machine learning engineering (MLE) predominantly rely on tree search, a form of gradient-free optimization that uses scalar validation scores to rank candidates. As LLM reasoning capabilities improve, exhaustive enumeration becomes increasingly inefficient compared to directed updates, analogous to how accurate gradients enable…

  7. Advancing earth observation through machine learning: A TorchGeo tutorial 

    March 1, 2026

    Earth observation machine learning pipelines differ fundamentally from standard computer vision workflows. Imagery is typically delivered as large, georeferenced scenes, labels may be raster masks or vector geometries in distinct coordinate reference systems, and both training and evaluation often require spatially aware sampling and splitting…

  8. From pixels to patches: Pooling strategies for earth embeddings 

    March 1, 2026 | Isaac Corley, Caleb Robinson, Inbal Becker-Reshef, and Juan M. Lavista Ferres

    As geospatial foundation models shift from patch-level to pixel-level embeddings, practitioners must aggregate thousands of pixel vectors into patch representations that preserve class-discriminative signal while matching downstream label resolution. The default choice, mean pooling, discards within-patch variability and can reduce accuracy by more than 10%…
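    The pooling question in the teaser can be made concrete with a small sketch (my own illustration; the post's actual strategies may differ). Mean pooling collapses all per-pixel vectors into one average, discarding within-patch spread; keeping a second-moment statistic alongside the mean is one simple way to retain some of that variability.

    ```python
    import numpy as np

    # A 16x16 patch of 8-dimensional per-pixel embeddings (synthetic data).
    rng = np.random.default_rng(0)
    pixels = rng.normal(size=(256, 8))

    mean_pooled = pixels.mean(axis=0)   # default choice: within-patch spread is lost
    std_pooled = pixels.std(axis=0)     # per-dimension spread across the patch

    # One richer patch descriptor: concatenate mean and std (16-dim instead of 8).
    patch_embedding = np.concatenate([mean_pooled, std_pooled])
    ```

    Matching downstream label resolution then becomes a question of which pixels feed each pooled vector, not just which statistic is used.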