ECBD: Evidence-Centered Benchmark Design for NLP
Final intern talk: Distilling Self-Supervised-Learning-Based Speech Quality Assessment into Compact
Speaker: Benjamin Stahl. Host: Hannes Gamper. In this talk, we explore advancements in computational models for speech quality assessment. Self-supervised learning models have emerged as powerful front-ends, outperforming supervised-only models. However, their large size renders them…
Advances in Natural Language Generation for Indian Languages
Much of the recent progress in natural language generation (NLG) has been in the context of English and, more generally, high-resource languages. Indian languages, however, have yet to see similar paradigm shifts despite their speaking…
MInference: Million-Tokens Prompt Inference for Long-context LLMs
MInference 1.0 leverages the dynamic sparse nature of LLMs’ attention, which exhibits some static patterns, to speed up the pre-filling for long-context LLMs. It first determines offline which sparse pattern…
EmoCtrl-TTS
Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech. EmoCtrl-TTS is an emotion-controllable zero-shot TTS that can generate highly emotional speech with non-verbal vocalizations such as laughter and crying for any speaker. EmoCtrl-TTS is purely a…
Research Focus: Week of June 24, 2024
In this issue: RENC makes 5G vRAN servers more energy efficient; CoExplorer uses AI to keep video meetings on track; Automatic bug detection in LLM-powered text-based games; MAIRA-2: Grounded radiology report generation.
E2 TTS
Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS. E2 TTS (Embarrassingly Easy TTS) is a fully non-autoregressive zero-shot text-to-speech (TTS) system capable of generating the voice of any speaker. Despite its extremely simple model architecture and training…