Microsoft Research Blog

Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models

March 8, 2026

Modern code generation models exhibit longer outputs, accelerated capability growth, and changed training dynamics, rendering traditional training methodologies, algorithms, and datasets ineffective for improving their performance. To address these training bottlenecks, we propose MicroCoder-GRPO, an improved Group Relative Policy Optimization approach with three innovations: conditional…
Catalyst Lab

March 6, 2026

Catalyst Lab works end-to-end on high-value, high-uncertainty problems, from foundational theory to production code. Catalyst Lab advances foundational ideas and builds them into end-to-end systems. We work on ambitious technical challenges that benefit from tight iteration between research and execution. That means developing new ideas,…
CROSS — Leveraging AI ASICs for Homomorphic Encryption

March 6, 2026 | Jianming Tong

Artificial Intelligence (AI) is driving a new industrial revolution, increasingly transforming human workflows into digital tokens, i.e., tokenizing the entire world. However, this transformation exposes sensitive data at an unprecedented scale, leading to severe privacy breaches that have stalled AI's adoption. Homomorphic Encryption (HE) provides strong data…
Efficient Distributed Orthonormal Optimizers for Large-Scale Training

March 6, 2026 | Kwangjun Ahn

Kwangjun delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. This topic has been rapidly gaining attention in the community, emerging as a strong successor to AdamW following the success of orthonormal optimizers in training production-scale models…
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

March 6, 2026

What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts, character traits,…
LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis

March 6, 2026

GPU design space exploration (DSE) for modern AI workloads, such as Large Language Model (LLM) inference, is challenging because of GPUs' vast, multi-modal design spaces, high simulation costs, and complex design optimization objectives (e.g., performance, power, and area trade-offs). Existing automated DSE methods are often prohibitively expensive,…
Latent Policy Steering through One-Step Flow Policies

March 5, 2026 | Hokyun Im, Andrey Kolobov, Jianlong Fu, and Youngwoon Lee

Offline reinforcement learning (RL) allows robots to learn from offline datasets without risky exploration. Yet, offline RL's performance often hinges on a brittle trade-off between (1) return maximization, which can push policies outside the dataset support, and (2) behavioral constraints, which typically require sensitive hyperparameter…
Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity

March 5, 2026

Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more…
SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity

March 5, 2026

NVIDIA's 2:4 Sparse Tensor Cores deliver 2x throughput but demand strict 50% pruning -- a ratio that collapses LLM reasoning accuracy (Qwen3: 54% to 15%). Milder $(2N-2):2N$ patterns (e.g., 6:8, 25% pruning) preserve accuracy yet receive no hardware support, falling back to dense execution without…
Scaling Agentic Capabilities, Not Context: Efficient Reinforcement Finetuning for Large Toolspaces

March 5, 2026

Agentic systems operating over large tool ecosystems must plan and execute long-horizon workflows under weak or non-verifiable supervision. While frontier models mitigate these challenges through scale and large context budgets, small language models (SLMs) remain brittle: eager tool loading saturates context, execution errors compound over…
Research Intern – AI Safety and Security

March 4, 2026

Protecting large language models (LLMs) from malicious inputs is critical. LLMs can also be used to protect users from malicious attacks. The Deep Learning Team in Microsoft Research – Redmond is seeking Research Interns interested in the areas of LLM safety or using LLMs for…
Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

March 4, 2026

We are pleased to announce Phi-4-reasoning-vision-15B, a 15 billion parameter open‑weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a…