Learning in Data Scarce Visual and Multimodal Applications Using Vectorized and Composable Representations
- Ajay Divakaran, Yi Yao | SRI International
The vision and learning group at SRI's Center for Vision Technologies (CVT) has developed a framework and a suite of algorithms for machine learning in data-scarce conditions. We present our results on zero-shot object detection and on multi-way retrieval for social media applications using multimodal embeddings. We then present a novel spatiotemporal graph convolutional network that enables composable representations, and show results on activity detection.
Sudipta Sinha
Principal Researcher