Abstract

It has been shown that standard cepstral speaker recognition models
can be enhanced by region-constrained models, where features
are extracted only from certain speech regions defined by linguistic
or prosodic criteria. Such region-constrained models can capture
features that are more stable, highly idiosyncratic, or simply complementary
to the baseline system. In this paper we ask whether another major
class of speaker recognition models, those based on MLLR speaker
adaptation transforms, can also benefit from region-constrained feature
extraction. In our approach, we define regions by phonetic
and prosodic criteria derived from automatic speech recognition
output, and perform MLLR estimation using only the frames selected by
these criteria. The resulting transform features are appended to those
of a state-of-the-art MLLR speaker recognition system and jointly
modeled by SVMs. Multiple regions can be added in this fashion.
We find consistent gains over the baseline system on the NIST SRE2010
speaker verification task.
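
The pipeline described above (select frames by a region criterion, estimate an MLLR transform from only those frames, then append the flattened transform to the baseline feature vector) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it estimates a single global MLLR mean transform in closed form against a diagonal-covariance GMM that stands in for the recognizer's acoustic models, and it assumes per-frame boolean region masks, which in practice would come from ASR alignments. The helper names (`select_region_frames`, `estimate_mllr_mean_transform`, `region_mllr_features`) are hypothetical.

```python
import numpy as np


def select_region_frames(features, region_mask):
    """Keep only frames flagged by a region criterion (hypothetical boolean mask,
    e.g. frames in a given phone class or prosodic region from ASR output)."""
    return features[np.asarray(region_mask, dtype=bool)]


def gaussian_posteriors(frames, weights, means, variances):
    """Occupation probabilities gamma_m(t) under a diagonal-covariance GMM."""
    diff = frames[:, None, :] - means[None, :, :]                  # (T, M, D)
    log_lik = -0.5 * np.sum(diff**2 / variances
                            + np.log(2 * np.pi * variances), axis=2)
    log_lik += np.log(weights)
    log_lik -= log_lik.max(axis=1, keepdims=True)                  # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)                  # (T, M)


def estimate_mllr_mean_transform(frames, weights, means, variances):
    """Closed-form MLLR mean transform W, shape (D, D+1), for diagonal covariances.

    The adapted mean of Gaussian m is W @ [1, mu_m]. Each row i solves
    G_i w_i = k_i with G_i = sum_m occ_m / var_mi * xi_m xi_m^T and
    k_i = sum_m (sum_t gamma_m(t) o_ti) / var_mi * xi_m.
    Assumption: enough occupied Gaussians for G_i to be invertible; a real
    system would need regularization or regression classes for sparse regions.
    """
    num_gauss, dim = means.shape
    gamma = gaussian_posteriors(frames, weights, means, variances)  # (T, M)
    occ = gamma.sum(axis=0)                                         # (M,)
    first_order = gamma.T @ frames                                  # (M, D)
    xi = np.hstack([np.ones((num_gauss, 1)), means])                # extended means (M, D+1)
    W = np.zeros((dim, dim + 1))
    for i in range(dim):
        inv_var = 1.0 / variances[:, i]                             # (M,)
        G = (xi * (occ * inv_var)[:, None]).T @ xi                  # (D+1, D+1)
        k = xi.T @ (first_order[:, i] * inv_var)                    # (D+1,)
        W[i] = np.linalg.solve(G, k)
    return W


def region_mllr_features(features, region_masks, weights, means, variances):
    """Concatenate one flattened MLLR transform per region mask (hypothetical helper).

    Passing an all-True mask first yields a stand-in for the baseline transform,
    with the region-constrained transforms appended after it.
    """
    blocks = []
    for mask in region_masks:
        frames = select_region_frames(features, mask)
        W = estimate_mllr_mean_transform(frames, weights, means, variances)
        blocks.append(W.ravel())
    return np.concatenate(blocks)
```

The resulting vector is what would be handed to the SVM for joint modeling; each additional region simply adds another block of (D+1)·D transform coefficients, which is why multiple regions can be appended without changing the modeling back end.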
