Hairmony: Fairness-aware hairstyle classification
We present a method for prediction of a person’s hairstyle from a single image. Despite growing use cases in user digitization and enrollment for virtual experiences, available methods are limited, particularly in the range of…
Look Ma, no markers: holistic performance capture without the hassle
We tackle the problem of highly-accurate, holistic performance capture for the face, body and hands simultaneously. Motion-capture technologies used in film and game production typically focus only on face, body or hand capture independently, involve…
OmniParser V2
OmniParser is an advanced vision-based screen parsing module that converts user interface (UI) screenshots into structured elements, allowing agents to execute actions across various applications using visual data . By harnessing large vision-language model capabilities,…
Interactive Multimodal Futures (IMF)
Interactive Multimodal Futures focuses on creating interactive systems and experiences that blend the richness and complexity of people and their real, physical world with advanced technology. We seek to leverage multimodal generative AI models that…