| Sidharth Sinha, Anson Bastos, Xuchao Zhang, Akshay Nambi, Rujia Wang, and Chetan Bansal
Deploying large language models (LLMs) in real-world, high-stakes settings is harder than it should be. In high-stakes settings like law, medicine, and cloud incident response, performance and reliability can quickly break down because adapting models to domain-specific requirements is a…
By Ziyi Liu (opens in new tab), Bahar Sarrafzadeh, Pei Zhou, Longqi Yang (opens in new tab), Ashish Sharma Imagine you are in a high-stakes group discussion, stuck in a circular argument with no consensus in sight. Now, imagine an AI agent sitting at that…
By Corby Rosset, Pratyusha Sharma, Andrew Zhao, Miguel Gonzalez-Fernandez, Ahmed Awadallah We share lessons learned from building a best-in-class verifier for computer use agent trajectories on the web, called the Universal Verifier. False positive rates drop to near zero (vs.…
| Doug Burger, Amy Luers, and Ishai Menache
Doug Burger, sustainability expert Amy Luers, and optimization researcher Ishai Menache examine the global emissions implications of datacenter operations, efficiency gains, and AI's potential across electrification, materials, and food systems.
Microsoft transformed cloud fulfillment with the Intelligent Fulfillment Service (IFS), which integrates machine learning, optimization, and generative AI.
| Jaime Teevan, Sonia Jaffe, Rebecca Janssen, Nancy Baym, Siân Lindley, Bahar Sarrafzadeh, Brent Hecht, Jenna Butler, Jake Hofman, and Sean Rintel
For the past five years, the New Future of Work report has captured how work is changing. This year, the shift feels especially sharp. Previous editions have focused on technology’s role in increasing productivity by automating tasks, accelerating communication, and…
| Jaime Teevan, Jenna Butler, Jake Hofman, and Rebecca Janssen
Microsoft Chief Scientist Jaime Teevan and researchers Jenna Butler, Jake Hofman, and Rebecca Janssen unpack the New Future of Work Report 2025 and explore the ideal AI-driven working world. Plus, is AI a tool or a collaborator? And why the answer matters.
Vasilis Kontonis, Yuchen Zeng, Shivam Garg, Lingjiao Chen, Hao Tang, Ziyan Wang, Ahmed Awadallah, Eric Horvitz, John Langford, Dimitris Papailiopoulos We taught models to compress their own chain-of-thought mid-generation. Peak KV cache drops 2–3x, throughput nearly doubles, and the…
近年来,图像生成模型的飞速发展令人瞩目。从早期的通用图像生成,到如今逐步迈向更具实用价值的视觉内容创作,这一领域正经历从“好看”到“好用”的关键跃迁。然而,在繁荣表象之下,一个核心挑战正日益凸显:现有主流评测基准仍以自然图像为主,缺乏面向商业设计场景的系统性评估,无法有效衡量模型在结构化和多重约束下的表现。 与通用图像相比,商业视觉文档往往包含高密度文本、复杂版式结构以及多种视觉元素的协同布局,其...