Abstract

This paper presents a technique for high-accuracy tracking of
formants, or vocal tract resonances, using a novel nonlinear
predictor and a target-directed temporal constraint.
The nonlinear predictor is constructed from a parameter-free,
discrete mapping function from the formant (frequencies and
bandwidths) space to the LPC-cepstral space, with trainable
residuals. In this study we examine the key role that vocal tract
resonance targets play in tracking accuracy. Experimental results
show that, owing to the use of the targets, the tracked formants in
the consonantal regions (including closures and short pauses)
of the speech utterance exhibit the same dynamic properties as
those in the vocalic regions and reflect the underlying vocal tract
resonances. The results also demonstrate the effectiveness of
training the prediction-residual parameters and of incorporating
the target-based constraint in obtaining high-accuracy formant
estimates, especially for non-sonorant portions of speech.
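
As an illustration of the kind of mapping described above, the following is a minimal sketch and not necessarily the predictor used in this work. It assumes the standard closed-form cepstrum of an all-pole model, in which each resonance with frequency f_k and bandwidth b_k contributes (2/n) exp(-pi n b_k / f_s) cos(2 pi n f_k / f_s) to the n-th LPC-cepstral coefficient. The function name, argument names, and the optional additive residual are hypothetical choices made for this example.

```python
import math


def formants_to_lpc_cepstrum(freqs_hz, bandwidths_hz, n_ceps, sample_rate_hz,
                             residual=None):
    """Map formant frequencies and bandwidths to LPC-cepstral coefficients.

    Assumption: each formant is a conjugate pole pair of an all-pole model,
    with radius exp(-pi * b / fs) and angle 2 * pi * f / fs.
    `residual` is a hypothetical additive correction (one value per
    coefficient), standing in for a trained prediction residual.
    """
    if len(freqs_hz) != len(bandwidths_hz):
        raise ValueError("freqs_hz and bandwidths_hz must have equal length")
    ceps = []
    for n in range(1, n_ceps + 1):
        total = 0.0
        for f, b in zip(freqs_hz, bandwidths_hz):
            decay = math.exp(-math.pi * n * b / sample_rate_hz)
            total += decay * math.cos(2.0 * math.pi * n * f / sample_rate_hz)
        ceps.append(2.0 * total / n)
    if residual is not None:
        # Add the per-coefficient residual to the parameter-free prediction.
        ceps = [c + r for c, r in zip(ceps, residual)]
    return ceps


# Example with illustrative (not measured) formant values for a vowel.
example = formants_to_lpc_cepstrum(
    freqs_hz=[500.0, 1500.0, 2500.0],
    bandwidths_hz=[60.0, 90.0, 120.0],
    n_ceps=12,
    sample_rate_hz=8000.0,
)
```

The mapping itself has no trainable parameters; in this sketch only the optional residual term would be learned from data.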
