Microsoft Research Blog
MMCTAgent: Enabling multimodal reasoning over large video and image collections
Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu
MMCTAgent enables dynamic multimodal reasoning through iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding to handle complex tasks such as analyzing long videos and large image collections.
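The idea of iterative planning and reflection can be illustrated with a small plan, act, and critique loop. This is a minimal, hypothetical sketch in plain Python. It is not MMCTAgent's implementation and does not use the AutoGen API. The names `reason`, `plan_fn`, `answer_fn`, `critic_fn`, `Trace`, and the stub tools are all assumptions made for illustration. A planner picks tools, the tools produce observations, a draft answer is formed, and a critic either accepts the answer or returns feedback that shapes the next plan.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    """Record of one reasoning round (hypothetical structure)."""
    plan: list[str]
    observations: list[str]
    answer: str
    accepted: bool
    feedback: str


def reason(
    question: str,
    plan_fn: Callable[[str, str], list[str]],
    tools: dict[str, Callable[[str], str]],
    answer_fn: Callable[[str, list[str]], str],
    critic_fn: Callable[[str, str, list[str]], tuple[bool, str]],
    max_rounds: int = 3,
) -> tuple[str, list[Trace]]:
    """Plan, run tools, draft an answer, then reflect until the critic accepts."""
    feedback = ""
    answer = ""
    history: list[Trace] = []
    for _ in range(max_rounds):
        # The planner sees the critic's feedback from the previous round.
        plan = plan_fn(question, feedback)
        # Run only the tools the planner selected (unknown names are skipped).
        observations = [tools[name](question) for name in plan if name in tools]
        answer = answer_fn(question, observations)
        # Reflection step: the critic judges the answer against the evidence.
        accepted, feedback = critic_fn(question, answer, observations)
        history.append(Trace(plan, observations, answer, accepted, feedback))
        if accepted:
            break
    return answer, history


# Hypothetical stub tools standing in for vision and audio modules.
tools = {
    "detect_objects": lambda q: "objects: car, person",
    "transcribe_audio": lambda q: "transcript: 'turn left here'",
}


def plan_fn(question: str, feedback: str) -> list[str]:
    # Start with vision only; add audio if the critic asked for it.
    plan = ["detect_objects"]
    if "audio" in feedback:
        plan.append("transcribe_audio")
    return plan


def answer_fn(question: str, observations: list[str]) -> str:
    return " | ".join(observations)


def critic_fn(question: str, answer: str, observations: list[str]) -> tuple[bool, str]:
    # Accept only when some audio evidence supports the answer.
    if any(o.startswith("transcript") for o in observations):
        return True, ""
    return False, "missing audio evidence"


answer, history = reason(
    "What does the driver say near the car?", plan_fn, tools, answer_fn, critic_fn
)
print(len(history), answer)
# prints: 2 objects: car, person | transcript: 'turn left here'
```

In this toy run the first round is rejected because it lacks audio evidence, and the critic's feedback leads the planner to add a second tool in the next round. Capping the loop with `max_rounds` keeps a critic that never accepts from looping forever.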