Project Gecko

News and features

Three white icons on a blue-to-purple gradient background: the first icon shows an image/photo; the second icon depicts a computer monitor with vertical bars; the third icon displays three connected circles with user silhouettes.

Microsoft Research Blog

MMCTAgent: Enabling multimodal reasoning over large video and image collections

November 12, 2025 | Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu

MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.