A novel paradigm for nonlinear speech processing through local singularity analysis

  • Khalid Daoudi | INRIA-Bordeaux

The existence of nonlinear and turbulent phenomena in the speech production process has been theoretically and experimentally established. However, most current approaches in speech processing are based on linear techniques that rely on the linear source-filter model. Despite their undeniable importance, these linear approaches cannot adequately capture all the complex dynamics of speech. For this reason, nonlinear speech processing has gained significant attention in recent years.

Among the numerous attempts to develop nonlinear methods and models for speech processing, one class has drawn analogies from the study of turbulent flows and dynamical systems in statistical physics. I will start the talk by giving a brief overview of such methods and argue that they belong to the first phase of complex systems theory, where only global measurements of the degree of complexity can be achieved. This fact, added to the difficulty of computing such measurements in practice, limits the usefulness and applications of these methods. Their most widely used application is signal classification, such as voice pathology detection.

Since the 1990s, a new phase in complex systems theory has emerged in which it is possible to quantify complexity in a geometrical and local manner. Within this framework, the GeoStat Group and its collaborators have developed the so-called Microcanonical Multiscale Formalism (MMF) for natural image processing. In MMF, the relation between geometry and statistics is unlocked through the notions of local singularity/predictability exponents and system reconstructability. During the last three years, we have been conducting research to adapt MMF to the particular case of speech signals, viewed as realizations of a complex system. A particular aspect of our strategy has been to study the potential of MMF in fundamental speech problems and to develop efficient and robust processing algorithms. I will show that, with an appropriate definition and estimation of singularity exponents, critical system transitions can be identified, thus providing interesting descriptions of some speech dynamics and characteristics. As a consequence, we have achieved promising results and outperformed state-of-the-art linear techniques in several speech applications, such as speech segmentation, glottal closure instant (GCI) identification, sparse source modeling and coding. These promising results open up many perspectives that we will discuss at the end of the talk.

Speaker Details

Khalid Daoudi received both the Master's and Ph.D. degrees in applied mathematics from University Paris 9 Dauphine in 1993 and 1996, respectively. His Ph.D. dissertation was prepared at INRIA-Rocquencourt, France. During 1997, he held a post-doctoral position at the Department of Mathematics, Ecole Polytechnique de Montréal, Canada. From December 1997 to July 1999, he held a post-doctoral position at the Stochastic Systems Group (SSG) of the Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology (MIT), Cambridge, USA. In October 1999, he started a permanent position at INRIA, with the Speech Group of INRIA/LORIA. From October 2003 to February 2009, he was on leave at CNRS with the Samova team of IRIT in Toulouse. Since March 2009, he has been at INRIA-Bordeaux, where he co-founded the GeoStat team (http://geostat.bordeaux.inria.fr/). His research interests include statistical modeling and estimation, machine learning, multiscale signal processing, and speech processing.

Series: Microsoft Research Talks