Learning in Data Scarce Visual and Multimodal Applications Using Vectorized and Composable Representations

  • Ajay Divakaran, Yi Yao | SRI International

The vision and learning group at SRI's Center for Vision Technologies (CVT) has developed a framework and a suite of algorithms for machine learning in data-scarce conditions. We present our results on zero-shot object detection and on multi-way retrieval for social media applications using multimodal embeddings. We then present a novel spatiotemporal graph convolutional network that enables composable representations, and show results on activity detection.

[Slides]