Microsoft Research Blog


  1. Breaking Training Bottlenecks: Effective and Stable Reinforcement Learning for Coding Models 

    March 8, 2026

    Modern code generation models produce longer outputs, improve in capability faster, and exhibit different training dynamics, rendering traditional training methodologies, algorithms, and datasets ineffective for improving their performance. To address these training bottlenecks, we propose MicroCoder-GRPO, an improved Group Relative Policy Optimization approach with three innovations: conditional…
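
The group-relative baseline at the heart of GRPO can be sketched in a few lines (the truncated "conditional" innovations of MicroCoder-GRPO are not shown, and the function name here is illustrative): each sampled completion's reward is normalized against the mean and standard deviation of its own sampling group, so no learned value model is needed.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO advantage estimate: normalize each sampled
    completion's reward by the mean and standard deviation of its
    own sampling group, replacing a learned value-function baseline."""
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Rewards for four completions sampled from the same prompt.
advs = group_relative_advantages([0.0, 0.5, 0.5, 1.0])
```

Because the baseline is the group mean, advantages within a group always sum to zero; completions above the group average get positive advantages and are reinforced.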

  2. Catalyst Lab

    March 6, 2026

    Catalyst Lab works end-to-end on high-value, high-uncertainty problems, from foundational theory to production code. Catalyst Lab advances foundational ideas and builds them into end-to-end systems. We work on ambitious technical challenges that benefit from tight iteration between research and execution. That means developing new ideas,…

  3. CROSS — Leveraging AI ASICs for Homomorphic Encryption

    March 6, 2026 | Jianming Tong

    Artificial Intelligence (AI) is driving a new industrial revolution, increasingly transforming human workflows into digital tokens, i.e., tokenizing the entire world. However, this transformation exposes sensitive data at an unprecedented scale, leading to privacy breaches that have stalled AI adoption. Homomorphic Encryption (HE) provides strong data…
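
Production HE schemes for AI workloads are lattice-based, but the core idea, computing on ciphertexts without ever decrypting them, can be illustrated with a toy additively homomorphic Paillier scheme. The parameters below are tiny and insecure, chosen purely for illustration:

```python
import math, random

# Toy Paillier keypair with tiny, insecure primes (illustration only).
p, q = 47, 59
n = p * q                     # public modulus
n2 = n * n
lam = math.lcm(p - 1, q - 1)  # private key
mu = pow(lam, -1, n)          # valid because we fix g = n + 1

def encrypt(m):
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # c = (1 + n)^m * r^n mod n^2
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # L(x) = (x - 1) / n recovers m from c^lam mod n^2
    x = pow(c, lam, n2)
    return ((x - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds plaintexts.
a, b = 123, 456
assert decrypt((encrypt(a) * encrypt(b)) % n2) == (a + b) % n
```

A server holding only the ciphertexts can compute the product (and hence the encrypted sum) without learning `a` or `b`; modern lattice-based schemes extend this to the multiplications and rotations needed for neural-network inference, which is where specialized hardware comes in.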

  4. Efficient Distributed Orthonormal Optimizers for Large-Scale Training

    March 6, 2026 | Kwangjun Ahn

    Kwangjun delivered a 50-minute technical talk on recent advances in orthonormal update methods for large-scale AI model training. This topic has been rapidly gaining attention in the community, emerging as a strong successor to AdamW following the success of orthonormal optimizers in training production-scale models…
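
Orthonormal update methods replace a raw gradient matrix with (an approximation of) its nearest orthogonal matrix before applying it. As a hedged illustration, not the specific optimizer from the talk, here is the classical cubic Newton-Schulz iteration in pure Python; production optimizers of this family typically use a tuned higher-order polynomial and run in low precision on GPUs:

```python
def matmul(A, B):
    # Plain-Python matrix product, adequate for a small demo.
    return [[sum(a * b for a, b in zip(row, col))
             for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def newton_schulz_orthogonalize(X, steps=10):
    """Cubic Newton-Schulz iteration X <- 0.5 * X * (3I - X^T X).
    Converges to the orthogonal polar factor of X when all singular
    values of the input lie in (0, sqrt(3))."""
    n = len(X[0])
    for _ in range(steps):
        G = matmul(transpose(X), X)  # X^T X
        M = [[(3.0 if i == j else 0.0) - G[i][j] for j in range(n)]
             for i in range(n)]      # 3I - X^T X
        X = [[0.5 * v for v in row] for row in matmul(X, M)]
    return X

# A diagonal matrix with singular values 1.2 and 0.4 is driven
# toward the identity, its orthogonal polar factor.
Q = newton_schulz_orthogonalize([[1.2, 0.0], [0.0, 0.4]])
```

The effect of orthogonalizing the update is to equalize the step size across all directions of the gradient, rather than letting a few dominant singular directions absorb most of the learning-rate budget.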

  5. Lost in Stories: Consistency Bugs in Long Story Generation by LLMs 

    March 6, 2026

    What happens when a storyteller forgets its own story? Large Language Models (LLMs) can now generate narratives spanning tens of thousands of words, but they often fail to maintain consistency throughout. When generating long-form narratives, these models can contradict their own established facts, character traits,…

  6. LUMINA: LLM-Guided GPU Architecture Exploration via Bottleneck Analysis 

    March 6, 2026

    GPU design space exploration (DSE) for modern AI workloads, such as Large Language Model (LLM) inference, is challenging because of GPUs' vast, multi-modal design spaces, high simulation costs, and complex design optimization objectives (e.g., performance, power, and area trade-offs). Existing automated DSE methods are often prohibitively expensive,…

  7. Latent Policy Steering through One-Step Flow Policies 

    March 5, 2026 | Hokyun Im, Andrey Kolobov, Jianlong Fu, and Youngwoon Lee

    Offline reinforcement learning (RL) allows robots to learn from offline datasets without risky exploration. Yet, offline RL's performance often hinges on a brittle trade-off between (1) return maximization, which can push policies outside the dataset support, and (2) behavioral constraints, which typically require sensitive hyperparameter…

  8. Sparse-BitNet: 1.58-bit LLMs are Naturally Friendly to Semi-Structured Sparsity 

    March 5, 2026

    Semi-structured N:M sparsity and low-bit quantization (e.g., 1.58-bit BitNet) are two promising approaches for improving the efficiency of large language models (LLMs), yet they have largely been studied in isolation. In this work, we investigate their interaction and show that 1.58-bit BitNet is naturally more…
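
To see why 1.58-bit weights are naturally friendly to sparsity, consider absmean ternary quantization, a common formulation for BitNet-style models: every weight is scaled by the row's mean absolute value and rounded to {-1, 0, +1}, so small weights land on exactly zero before any pruning happens. A minimal sketch (function name is illustrative):

```python
def ternary_quantize(row, eps=1e-8):
    """Absmean ternary quantization in the style of 1.58-bit BitNet:
    scale by the mean absolute weight, then round each entry to the
    nearest value in {-1, 0, +1}."""
    gamma = sum(abs(w) for w in row) / len(row) + eps
    q = [max(-1, min(1, round(w / gamma))) for w in row]
    return q, gamma

# Small-magnitude weights quantize to exact zeros.
q, gamma = ternary_quantize([0.9, -0.05, 0.02, -1.1])
```

In this tiny example half the quantized entries are already zero, which is the kind of pre-existing sparsity that a semi-structured N:M pattern can exploit with little additional accuracy loss.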

  9. SlideSparse: Fast and Flexible (2N-2):2N Structured Sparsity 

    March 5, 2026

    NVIDIA's 2:4 Sparse Tensor Cores deliver 2x throughput but demand strict 50% pruning -- a ratio that collapses LLM reasoning accuracy (Qwen3: 54% to 15%). Milder (2N-2):2N patterns (e.g., 6:8, 25% pruning) preserve accuracy yet receive no hardware support, falling back to dense execution without…
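
A (2N-2):2N pattern keeps the 2N-2 largest-magnitude weights in every contiguous group of 2N, zeroing only two per group. A minimal magnitude-pruning sketch of the pattern itself (not SlideSparse's kernel machinery; names are illustrative):

```python
def prune_2n_minus_2(values, N):
    """Zero the two smallest-magnitude entries in each group of 2N,
    leaving a (2N-2):2N pattern (N=4 gives 6:8, i.e. 25% pruning)."""
    group = 2 * N
    out = list(values)
    for start in range(0, len(out), group):
        # Group indices sorted by ascending magnitude.
        idx = sorted(range(start, min(start + group, len(out))),
                     key=lambda i: abs(out[i]))
        for i in idx[:2]:  # zero the two smallest magnitudes
            out[i] = 0.0
    return out

# One group of 8 (N=4): the 0.5 and 0.2 entries are pruned.
pruned = prune_2n_minus_2([5, 1, -3, 0.5, 2, 4, -6, 0.2], N=4)
```

Compared with 2:4, the larger group gives the pruner more freedom: only the two weakest of eight weights are dropped instead of two of every four, which is why the milder ratio preserves reasoning accuracy.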

  10. Research Intern – AI Safety and Security 

    March 4, 2026

    Protecting large language models (LLMs) from malicious inputs is critical. LLMs can also be used to protect users from malicious attacks. The Deep Learning Team in Microsoft Research – Redmond is seeking Research Interns interested in the areas of LLM safety or using LLMs for…

  11. Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

    March 4, 2026

    We are pleased to announce Phi-4-reasoning-vision-15B, a 15 billion parameter open-weight multimodal reasoning model, available through Microsoft Foundry, HuggingFace and GitHub. Phi-4-reasoning-vision-15B is a broadly capable model that can be used for a…