Abstract

We describe recent progress in the field of prosodic modeling for
speaker verification. In a previous paper, we proposed a technique
for modeling syllable-based prosodic features that uses a multinomial
subspace model for feature extraction and within-class covariance
normalization or linear discriminant analysis for session variability
compensation. In this paper, we show that performance can
be significantly improved with the use of probabilistic linear discriminant
analysis (PLDA) for session variability compensation. This
system does not require score normalization. We report an equal error
rate below 7% on a NIST 2008 task. To our knowledge, this is the
best reported result to date for a prosodic system for speaker recognition.
Fusion of this system with a state-of-the-art acoustic baseline
system yields a 10% relative improvement in the new detection cost
function (DCF) as defined by NIST.
