Audio and acoustics

Microsoft Research Blog

Research Focus: Week of September 9, 2024

September 12, 2024 | Sara Abdali, Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Jinyu Li, Sheng Zhao, Naoyuki Kanda, Carmen Badea, Christian Bird, Tom Zimmermann, Rob DeLine, Nicole Forsgren, Denae Ford Robinson, Xenofon Foukas

Investigating vulnerabilities in LLMs; A novel total-duration-aware (TDA) duration model for text-to-speech (TTS); Generative expert metric system through iterative prompt priming; Integrity protection in 5G fronthaul networks.

Publication

Multi-label audio classification with a noisy zero-shot teacher

Sebastian Braun, Hannes Gamper

International Workshop on Acoustic Signal Enhancement (IWAENC) | September 2024

Publication

Target conversation extraction: Source separation using turn-taking dynamics

Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Interspeech 2024 | September 2024

Publication

Knowledge boosting during low-latency inference

Tuochao Chen, Malek Itani, Vidya Srinivas, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota

Interspeech 2024 | September 2024

Publication

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Sefik Emre Eskimez, Xiaofei Wang, Manthan Thakker, Chung-Hsien Tsai, Canrun Li, Zhen Xiao, Hemin Yang, Zirun Zhu, Min Tang, Jinyu Li, Sheng Zhao, Naoyuki Kanda

Interspeech 2024 | September 2024

Video

Final intern talk: Distilling Self-Supervised-Learning-Based Speech Quality Assessment into Compact

July 18, 2024 | Benjamin Stahl, Hannes Gamper

Speaker: Benjamin Stahl. Host: Hannes Gamper. In this talk, we explore advancements in computational models for speech quality assessment. Self-supervised learning models have emerged as powerful front-ends, outperforming supervised-only models. However, their large size renders them…

42:02

Publication

Autoregressive Speech Synthesis without Vector Quantization

Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

July 2024

Publication

VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment

Bing Han, Long Zhou, Shujie Liu, Sanyuan Chen, Lingwei Meng, Yanming Qian, Yanqing Liu, Sheng Zhao, Jinyu Li, Furu Wei

June 2024

Publication

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

ICML | June 2024

Project

TransVIP

Speech-to-Speech Translation System with Voice and Isochrony Preservation. We introduce TransVIP, a novel model framework that leverages diverse datasets in a cascade fashion yet facilitates end-to-end inference through joint probability. Furthermore, we propose…