Given a language model, can we tell whether it is truly reasoning or whether its performance owes only to pattern recognition and memorization?
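Editor's note: Large language models (LLMs) have shown strong capabilities in language generation and basic reasoning, but they still fall clearly short in mathematical problem solving, and in particular struggle to handle both complex computation and theorem proving. A key reason is that existing models generally rely on a single reasoning paradigm (such as natural language, code, or symbolic reasoning) and lack the flexible reasoning that humans bring to a problem. To address this, Microsoft Research Asia and Tsinghua University jointly proposed the Chain-of-Reasoning (CoR) framework, which brings in natural language, code, and...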
| Kathleen Sullivan and Amanda Craig Deckard
In the series finale, Amanda Craig Deckard returns to examine what Microsoft has learned about testing as a governance tool. She also explores the roles of rigor, standardization, and interpretability in testing and what’s next for Microsoft’s AI governance work.
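Editor's note: Faced with information-dense long videos that run for hours, even today's most powerful large language models (LLMs) and vision-language models (VLMs) struggle to cope. To address this, Microsoft Research Asia proposed the Deep Video Discovery (DVD) agent, which uses reasoning-driven operation and tool coordination to explore more efficient and more intelligent video understanding. On several challenging benchmarks, DVD shows leading performance, further pushing long-video understanding toward an era of "usable" and "controllable" intelligence. In recent years, large language models (LLM...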
Alex Lu, Stan Hua, Lauren Erdman Artificial intelligence (AI) is transforming healthcare with applications from interpreting electronic healthcare records to detecting cancer from medical images. We ask: are…
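Editor's note: Welcome to the "What's New in Research" column! "What's New in Research" brings together the latest innovations and research news from Microsoft Research Asia. Here you can quickly browse the lab's highlights and keep a sharp sense of what is happening at the frontier. From July 13 to July 19, ICML, one of the world's top academic conferences in artificial intelligence and machine learning, was held in Vancouver. Multiple papers from Microsoft Research Asia were accepted. The previous issue of "What's New at ICML" featured research related to decision models, covering reinforcement learning, RLHF, diffusion modeling, and other directions. This issue will focus...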
| Shirley Wu, Michel Galley, Baolin Peng, Swadheen Shukla, and Jianfeng Gao
Recipient of an ICML 2025 Outstanding Paper Award, CollabLLM improves how LLMs collaborate with users, including knowing when to ask questions and how to adapt tone and communication style to different situations. This approach helps move AI toward more user-centric…
In the news | Microsoft Blog
Announcing Microsoft Elevate and the AI Economy Institute—to ensure that as AI transforms our world, we’re putting people first by equipping them with the skills, knowledge, and tools to thrive with AI.