Abstract

Traditionally, the Universal Background Model (UBM) is viewed as the background model of the entire acoustic feature space. We propose a novel interpretation of the UBM as a mapping function that transforms variable-length observations (speech utterances) into fixed-dimensional feature vectors (sufficient statistics). After this mapping, a similarity measure is computed on the fixed-dimensional features. Based on this interpretation, we propose a new similarity measure that yields more than 10% relative improvement over the conventional UBM-MAP framework in both equal error rate and detection cost function.
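
The mapping view described above can be illustrated with a minimal sketch. It assumes that the UBM is a diagonal-covariance Gaussian mixture model and that the sufficient statistics are the zeroth- and first-order statistics accumulated from frame posteriors; the abstract does not specify either choice, so both are assumptions. The function name `ubm_map_utterance` and the toy parameter values are hypothetical and serve only to show that utterances of different lengths map to statistics of the same size. The proposed similarity measure itself is not sketched here.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def ubm_map_utterance(frames, weights, means, variances):
    """Map a variable-length list of frames to fixed-size statistics.

    Returns (n, f), where n[c] is the zeroth-order statistic (soft frame
    count) of component c and f[c] is the first-order statistic
    (posterior-weighted sum of frames) of component c.
    """
    num_comp, dim = len(weights), len(means[0])
    n = [0.0] * num_comp
    f = [[0.0] * dim for _ in range(num_comp)]
    for x in frames:
        # Unnormalized log posterior of each component for this frame.
        log_p = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Normalize with log-sum-exp for numerical stability.
        mx = max(log_p)
        log_norm = mx + math.log(sum(math.exp(lp - mx) for lp in log_p))
        for c, lp in enumerate(log_p):
            post = math.exp(lp - log_norm)
            n[c] += post
            for d in range(dim):
                f[c][d] += post * x[d]
    return n, f


# Toy 2-component UBM over 2-dimensional features (made-up values).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [3.0, 3.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

short_utt = [[0.1, -0.2], [2.9, 3.1]]
long_utt = [[0.1, -0.2], [2.9, 3.1], [0.3, 0.0], [3.2, 2.8], [-0.1, 0.2]]

n_short, f_short = ubm_map_utterance(short_utt, WEIGHTS, MEANS, VARIANCES)
n_long, f_long = ubm_map_utterance(long_utt, WEIGHTS, MEANS, VARIANCES)
```

Both utterances yield one count and one 2-dimensional sum per mixture component, regardless of the number of frames, so any similarity measure defined on these statistics operates on inputs of fixed size.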