News & features
Loading…
Microsoft Research Blog
MMCTAgent: Enabling multimodal reasoning over large video and image collections
| Akshay Nambi, Kavyansh Chourasia, and Tanuja Ganu
MMCTAgent enables dynamic multimodal reasoning with iterative planning and reflection. Built on Microsoft’s AutoGen framework, it integrates language, vision, and temporal understanding for complex tasks like long video and image analysis.
In the news | Microsoft Research Story
Advancing AI to meet needs of the global majority
AI tools can perform poorly in non-Western languages and lack critical cultural context for many populations. Project Gecko uses small language models to bring vital expertise to farmers in underserved areas using local languages and multi-modal content.