Abstract

Speaker verification is the task of verifying a speaker's claimed identity from that speaker's speech signal (voice print). To learn a similarity score between each pair of target and trial utterances, we investigate two discriminative learning frameworks: Fisher mapping followed by SVM learning, and utterance transform followed by Iterative Cohort Modeling (ICM). In both methods, a mapping converts each speech utterance from a variable-length acoustic feature sequence into a fixed-dimensional vector. SVM learning constructs a classifier in the mapped vector space for speaker verification. ICM instead learns a metric in this vector space by incorporating discriminative learning methods, and the resulting metric is then used by a nearest-neighbor classifier for speaker verification. Experiments conducted on the NIST02 corpus show that both discriminative learning methods outperform the baseline GMM-UBM system. Furthermore, we observe that the ICM-based method is more effective than the SVM-based method, indicating that the metric learning scheme is more powerful at constructing a good metric in the mapped vector space.
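
As an illustration only, the pipeline shared by both frameworks (map a variable-length utterance to a fixed-dimensional vector, then score target/trial pairs in that space) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method evaluated here: mean pooling stands in for the Fisher mapping or utterance transform, a hand-set diagonal weight vector stands in for the metric that ICM would learn, and the names `pool_utterance`, `weighted_sq_distance`, and `verify` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def pool_utterance(frames: Sequence[Sequence[float]]) -> Vector:
    """Map a variable-length frame sequence to one fixed-dimensional vector.

    ASSUMPTION: mean pooling is a simple stand-in for the Fisher mapping
    or utterance transform; it is not the mapping used in the paper.
    """
    if not frames:
        raise ValueError("utterance has no frames")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def weighted_sq_distance(x: Vector, y: Vector, weights: Vector) -> float:
    """Squared distance under a diagonal metric.

    ASSUMPTION: `weights` is a placeholder for a learned metric.
    """
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights))


def verify(target: Vector, trial: Vector, cohort: Sequence[Vector],
           weights: Vector) -> bool:
    """Nearest-neighbor decision: accept the claim only if the target
    vector is strictly closer to the trial than every cohort vector."""
    d_target = weighted_sq_distance(trial, target, weights)
    return all(d_target < weighted_sq_distance(trial, c, weights) for c in cohort)


# Toy data: utterances with 2-dimensional frames and different lengths.
target_vec = pool_utterance([[1.0, 2.0], [3.0, 4.0]])
genuine_vec = pool_utterance([[2.0, 3.0], [2.0, 3.0], [2.5, 3.5]])
impostor_vec = pool_utterance([[5.0, 5.0]])
cohort_vecs = [[0.0, 0.0], [5.0, 5.0]]
metric_weights = [1.0, 1.0]  # identity metric, i.e. plain Euclidean distance

genuine_accepted = verify(target_vec, genuine_vec, cohort_vecs, metric_weights)
impostor_accepted = verify(target_vec, impostor_vec, cohort_vecs, metric_weights)
```

In this toy setting the genuine trial lies closest to the target and is accepted, while the impostor trial coincides with a cohort vector and is rejected. A learned metric would change only `metric_weights` (or replace the distance function), leaving the nearest-neighbor decision rule unchanged.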