E2 TTS
Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS E2 TTS (Embarrassingly Easy TTS) is a fully non-autoregressive zero-shot text-to-speech (TTS) system capable of generating the voice of any speaker. Despite its extremely simple model architecture and training…
Making Sentence Embeddings Robust to User-Generated Content
This seminar was hosted by Microsoft Research Africa, Nairobi together with the Microsoft AI for Good team in May 2024. User-generated content (UGC), e.g. social media posts written in “Internet language”, presents a lot of…
Insights into the Challenges and Opportunities of Large Multi-Modal Models for Blind and Low Vision Users: CLIP
Daniela Massiceti delves into the transformative potential of multimodal models such as CLIP for assistive technologies. Specifically focusing on the blind/low-vision community, the talk explores the current distance from realizing this potential and the advancements…
Panel: Generative AI for Global Impact: Challenges and Opportunities
Microsoft researchers discuss the challenges and opportunities of making AI more inclusive and impactful for everyone—from data that represents a broader range of communities and cultures to novel use cases for AI that are globally…
DOSA
A dataset of social artifacts from different Indian geographical subcultures. This repo hosts the code to run experiments on the DOSA dataset.