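Editor's note: As virtual digital human technology advances rapidly, giving 3D avatars realism and expressiveness remains one of the core challenges in computer vision and graphics. VASA-3D, newly proposed by Microsoft Research Asia, generates a realistic 3D talking-head avatar that can be driven in real time from a single portrait photo. It removes the dependence of traditional methods on multi-view data and raises emotional expressiveness and the fidelity of facial micro-expressions to a new level. The work has been accepted at NeurIPS 2025. From virtual avatars in video conferencing to the metaverse...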
In the news | European Disability Forum
The future of technology and disability
The afternoon started with a presentation by Dr Cecily Morrison, Senior Principal Research Manager, Microsoft Research Cambridge. Dr Morrison leads a diverse team to ensure technological solutions are inclusive. She explained some examples of practical…
Computer-use agents are AI systems that autonomously navigate and interact with software applications through graphical user interfaces (GUIs), and they are emerging as a new capability in artificial intelligence. By navigating and manipulating the same visual interfaces that people use,…
In the news | BBC Science
Dream engineering has been named the most powerful idea of the 21st century so far by BBC Science Focus experts, marking a profound shift in how science understands dreaming. Once considered subjective and largely inaccessible to experimentation, dreams…
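Editor's note: As "bigger, faster, more efficient" AI computing becomes the direction the industry pursues, breakthroughs in the system software layer matter as much as hardware innovation. WaferLLM, jointly proposed by Microsoft Research Asia and the University of Edinburgh, is an important attempt in this direction. The research focuses on system software optimization for wafer-scale AI computing platforms, rethinking everything from architectural principles to inference performance, and offers a new perspective on the future of AI computing. The paper has been accepted at OSDI 2025. As AI models keep growing in scale and computational complexity rises sharply, traditional chip...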
In the news | Microsoft Sustainably Speaking
AI is transforming conservation in ways that were out of reach just a few years ago. The biodiversity crisis spans continents and species, so our tools must be open, adaptable, and collaborative—able to move from one habitat or use case…
In the news | LinkedIn Article
Around the world, the dangers of extreme weather are a daily reality. In 2024, extreme weather displaced or disrupted the lives of more than 800,000 people worldwide, a reminder that accurate, timely forecasts aren’t just about data; they’re about people. From farmers deciding when to plant to coastal communities preparing for hurricanes, better…
In recent years, as the shift toward agentic AI has accelerated, automation has advanced to handle increasingly complex tasks, from document and code generation to image creation, visual understanding, and mathematical reasoning. This trend points to the growing need to…
Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu
MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.