Learning in Data Scarce Visual and Multimodal Applications Using Vectorized and Composable Representations

February 20, 2019
Ajay Divakaran, Yi Yao | SRI International

The Vision and Learning group at SRI's Center for Vision Technologies (CVT) has developed a framework and a suite of algorithms for machine learning under data-scarce conditions. We present our results on zero-shot object detection and on multi-way retrieval for social media applications using multimodal embeddings. We then present a novel spatiotemporal graph convolutional network that enables composable representations, and we show results on activity detection.
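The abstract does not say how the multimodal embeddings are used. A common pattern is to map each modality into one shared vector space and score candidates by cosine similarity. This is shown here only as an illustrative sketch, not as SRI's method. Under this assumption, zero-shot labeling reduces to picking the class (possibly one with no training images) whose semantic embedding lies closest to a region's projected feature. Multi-way retrieval reduces to ranking items of any modality against a query. The function names (`zero_shot_label`, `retrieve`) and the toy three-dimensional vectors are hypothetical. A real system would learn the projections into the shared space.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def zero_shot_label(region: Sequence[float],
                    class_embeddings: Dict[str, Sequence[float]]) -> str:
    """Hypothetical zero-shot step: return the class whose semantic embedding is
    most similar to the region feature (assumed already projected into the
    shared space). Classes need only an embedding, not labeled examples."""
    return max(class_embeddings, key=lambda name: cosine(region, class_embeddings[name]))


def retrieve(query: Sequence[float],
             gallery: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Hypothetical retrieval step: rank gallery items (any modality) by
    cosine similarity to the query and return the top-k keys."""
    ranked = sorted(gallery, key=lambda key: cosine(query, gallery[key]), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy vectors standing in for learned embeddings (illustrative only).
    classes = {"zebra": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.95]}
    region = [0.8, 0.3, 0.1]
    print(zero_shot_label(region, classes))  # zebra

    gallery = {"img_a": [0.1, 0.9, 0.1], "img_b": [0.85, 0.2, 0.05], "img_c": [0.0, 0.1, 1.0]}
    print(retrieve(region, gallery, k=2))  # ['img_b', 'img_a']
```

Every modality lives in the same space. The same `retrieve` call can therefore rank images for a text query or captions for an image query; only the encoders that produce the vectors differ.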

- Sudipta Sinha, Principal Researcher
Research Area
- Algorithms
- Computer vision
Research Lab
- Microsoft Research Lab - Redmond