Decomposition of Speech and Sound into Modulators and Carriers

  • Les Atlas | University of Washington

…musical tones are the simpler and more regular elements of the sensations of hearing, and that we have consequently first to study the laws and peculiarities of this class of sensations.

— Hermann von Helmholtz, On the Sensations of Tone as a Physiological Basis for the Theory of Music, 2nd English Edition (A. Ellis), translated from the 4th German Edition of 1877, Longman Green, London, 1885, Page 7.

It has been 135 years since this passage was written, yet we still have no formal foundation for going beyond what Helmholtz brilliantly saw as the building blocks he called “musical tones,” which we now simply call “frequency.” Helmholtz also saw that “beats of simple tones” and “beats due to combinational tones” or “differential tones” [op. cit., Page 159.] formed sum and difference beats. We now call the generalization of this effect “modulations” or “envelopes.”
Since the time of Helmholtz, science and technology have developed radio and then very high-speed digital communications, revolutionizing the way we now live. Concepts from the AM and FM radio communications of the 1920s and 1930s still provide a perhaps outdated foundation. Researchers conventionally model the above modulations as “envelopes,” which multiply “carriers” or, equivalently, “temporal fine structure.” These envelopes, as typically derived after subband filtering, are Hilbert envelopes, squared and lowpass-filtered real envelopes, or, with perhaps the closest connection to physiology, rectified and lowpass-filtered real envelopes. Yet, as will be argued, science’s current foundation and methods for envelopes and temporal fine structure are still not as advanced as Helmholtz’s were for single tones and harmonics.
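As a rough illustration of the three conventional envelope estimates named above, here is a minimal NumPy sketch. It is not taken from the talk. It assumes the input is already the output of a single subband filter, uses an idealized FFT brick-wall lowpass in place of a practical filter, and applies gain factors (2 and π) that are only appropriate for a sinusoidal carrier. All function names, the 50 Hz cutoff, and the test signal are hypothetical choices made for this example.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: zero the negative-frequency bins."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def lowpass(x, cutoff_hz, fs):
    """Idealized brick-wall lowpass (assumption: stands in for a real filter)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def hilbert_envelope(x):
    """Magnitude of the analytic signal."""
    return np.abs(analytic_signal(x))

def squared_envelope(x, cutoff_hz, fs):
    """Square, lowpass, then undo the squaring (gain 2 assumes a sinusoidal carrier)."""
    return np.sqrt(np.maximum(2.0 * lowpass(x ** 2, cutoff_hz, fs), 0.0))

def rectified_envelope(x, cutoff_hz, fs):
    """Half-wave rectify and lowpass (gain pi assumes a sinusoidal carrier)."""
    return np.pi * lowpass(np.maximum(x, 0.0), cutoff_hz, fs)

# Hypothetical test signal: a 4 Hz modulator on a 1 kHz carrier, 1 second long.
fs = 16000
t = np.arange(fs) / fs
m = 1.0 + 0.5 * np.cos(2 * np.pi * 4 * t)   # slow, strictly positive modulator
x = m * np.cos(2 * np.pi * 1000 * t)        # modulator times carrier

env_h = hilbert_envelope(x)
env_s = squared_envelope(x, cutoff_hz=50.0, fs=fs)
env_r = rectified_envelope(x, cutoff_hz=50.0, fs=fs)

for name, env in [("hilbert", env_h), ("squared", env_s), ("rectified", env_r)]:
    print(name, "max relative error:", np.max(np.abs(env - m) / m))
```

For this idealized single-carrier signal, all three estimates track the modulator closely. The rectified version is only approximate here, because sampling aliases some of the rectifier's harmonics into the passband.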
Our talk will begin with demonstrations of simple two-complexes which have identical envelopes yet sound obviously different. We will show, assuming sufficiently low-rate envelopes, how important it is to remove this ambiguity, especially for speech. We will then suggest how a novel modulator/carrier decomposition, which takes into account the common types of dynamic content seen in speech and sound, counteracts this ambiguity. New conceptual results, in conjunction with auditory filters taking the role of frequency multiplexing, as in OFDM in modern high-speed data communications, raise new questions about potent roles of temporal fine structure in everyday audio and speech. These results suggest novel features for recognition of speech in noise, reverberation, and/or multiple simultaneous talkers.
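One simple way an envelope ambiguity of this kind can arise is sketched below. This is an assumption made for illustration and is not necessarily the demonstration used in the talk. Two complexes, each the sum of two equal-amplitude tones with the same frequency spacing but a different frequency region, have the same Hilbert envelope, since that envelope depends only on the spacing. The frequencies, sample rate, and function names are hypothetical.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the FFT-based analytic signal."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * weights))

def two_tone(f_low_hz, spacing_hz, fs, n):
    """Sum of two equal-amplitude cosines separated by spacing_hz."""
    t = np.arange(n) / fs
    return (np.cos(2 * np.pi * f_low_hz * t)
            + np.cos(2 * np.pi * (f_low_hz + spacing_hz) * t))

fs = 16000
n = fs  # one second, so every tone falls exactly on an FFT bin

a = two_tone(400.0, 100.0, fs, n)    # tones at 400 Hz and 500 Hz
b = two_tone(1000.0, 100.0, fs, n)   # tones at 1000 Hz and 1100 Hz

env_a = hilbert_envelope(a)
env_b = hilbert_envelope(b)

# Same 100 Hz spacing gives the same envelope, 2*|cos(pi * 100 * t)|,
# even though the waveforms (and hence the carriers) differ.
envelopes_match = np.allclose(env_a, env_b, atol=1e-8)
waveforms_differ = np.max(np.abs(a - b)) > 1.0
print("envelopes match:", envelopes_match)
print("waveforms differ:", waveforms_differ)
```

In this sketch the envelope alone cannot distinguish the two signals. Everything that separates them lives in the carrier, which is the part the envelope discards.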

Speaker Details

Les Atlas received his M.S. and Ph.D. degrees in Electrical Engineering from Stanford University in 1979 and 1984, respectively. He joined the University of Washington in 1984, where he is a Professor of Electrical Engineering. Professor Atlas received a 2004 Fulbright Senior Research Scholar Award and a 2012 Virginia Merrill Bloedel Scholar Award. He is a Fellow of the IEEE “for contributions to time-varying spectral analysis and acoustical signal processing.” Prof. Atlas has presented invited tutorials on demodulation signal processing at IEEE signal processing and other large international conferences, such as Eurospeech. His current research is on complementary statistics and demodulation theory for signal processing, especially for acoustics.

      Jeff Running

Series: Microsoft Research Talks