Abstract

Owing to physiological and linguistic differences between speakers, the spectral patterns of the same phoneme produced by two speakers can be quite dissimilar. Without appropriate alignment along the frequency axis, this mismatch reduces modeling efficiency and degrades performance. In this paper, a novel data-driven framework is proposed to align the frequency axes of two speakers. This alignment is essentially a frequency-domain correspondence between the two speakers. To establish the correspondence, we formulate the task as a globally optimal matching problem. Local matching of frequency bins is achieved by comparing local features of the spectrogram along the frequency bins; these features capture the local patterns in the spectrogram. Given the local matching scores, dynamic programming is then applied to find the optimal correspondence. Experiments on the TIMIT and TIDIGITS corpora clearly show the effectiveness of this method.
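
The dynamic programming step can be illustrated with a minimal sketch. It is not the implementation used in this paper. It assumes a precomputed matrix of local matching scores (higher is better), a monotonic path from the lowest to the highest frequency bin of both speakers, and the three step types common in dynamic time warping. The function name `align_frequency_bins`, the step set, and the sum-of-scores objective are all illustrative assumptions.

```python
import numpy as np


def align_frequency_bins(score):
    """Find a monotonic bin-to-bin path that maximizes the summed score.

    score[i, j] is an assumed local matching score between frequency
    bin i of speaker A and frequency bin j of speaker B.
    Returns the path as a list of (i, j) pairs and its total score.
    """
    score = np.asarray(score, dtype=float)
    n, m = score.shape
    total = np.full((n, m), -np.inf)    # best cumulative score ending at (i, j)
    back = np.zeros((n, m), dtype=int)  # 0: diagonal, 1: from (i-1, j), 2: from (i, j-1)
    total[0, 0] = score[0, 0]

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            candidates = (
                total[i - 1, j - 1] if i > 0 and j > 0 else -np.inf,
                total[i - 1, j] if i > 0 else -np.inf,
                total[i, j - 1] if j > 0 else -np.inf,
            )
            k = int(np.argmax(candidates))  # ties favor the diagonal step
            back[i, j] = k
            total[i, j] = candidates[k] + score[i, j]

    # Trace the optimal path back from the highest bins to the lowest.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        k = back[i, j]
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
        path.append((i, j))
    path.reverse()
    return path, float(total[n - 1, m - 1])


# Toy example: bins match best along the diagonal.
path, best = align_frequency_bins(np.eye(3))
print(path, best)
```

Because the objective here is a plain sum, non-negative scores can favor longer paths. A practical variant would likely normalize by path length or use costs rather than similarities, which this sketch does not attempt.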