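Editor's note: As virtual digital human technology advances rapidly, giving 3D avatars realism and expressiveness remains one of the core challenges in computer vision and graphics. VASA-3D, newly proposed by Microsoft Research Asia, generates realistic 3D talking avatars from a single portrait photo that can be driven in real time. It removes the dependence of traditional methods on multi-view data and raises emotional expressiveness and the subtlety of facial micro-expressions to a new level. The work has been accepted to NeurIPS 2025. From virtual avatars in video conferencing to the metaverse's digital...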
Computer-use agents are AI systems that autonomously navigate and interact with software applications through graphical user interfaces (GUIs), and they are emerging as a new capability in artificial intelligence. By navigating and manipulating the same visual interfaces that people use,…
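Editor's note: As "bigger, faster, more efficient" AI computing becomes the direction the industry pursues, breakthroughs in the system software layer matter just as much as hardware innovation. WaferLLM, jointly proposed by Microsoft Research Asia and the University of Edinburgh, is an important step in this exploration. The research focuses on system software optimization for wafer-scale AI computing platforms, rethinking everything from architectural principles to inference performance, and offers a new perspective on the future of AI computing. The paper has been accepted to OSDI 2025. As AI models keep growing in scale and computational complexity rises sharply, traditional chip...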
In the news | LinkedIn Article
Around the world, the dangers of extreme weather are a daily reality. In 2024, extreme weather displaced or disrupted the lives of more than 800,000 people worldwide, a reminder that accurate, timely forecasts aren’t just about data; they’re about people. From farmers deciding when to plant to coastal communities preparing for hurricanes, better forecasting can save lives,…
In the news | Microsoft Sustainably Speaking
AI is transforming conservation in ways that were out of reach just a few years ago. The biodiversity crisis spans continents and species, so our tools must be open, adaptable, and collaborative—able to move from one habitat or use case…
In recent years, as the shift toward agentic AI has accelerated, automation has advanced to handle increasingly complex tasks, from document and code generation to image creation, visual understanding, and mathematical reasoning. This trend points to the growing need to…
Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu
MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.
AI tools can perform poorly in non-Western languages and lack critical cultural context for many populations. Project Gecko uses small language models to bring vital expertise to farmers in underserved areas using local languages and multi-modal content.