Project Gecko

News and features


Microsoft Research Blog

MMCTAgent: Enabling multimodal reasoning over large video and image collections

November 12, 2025 | Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu

MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.