Abstract

This paper proposes a post-refinement method that uses fine context-dependent Gaussian mixture models (GMMs) for the automatic segmentation task. To describe the waveform evolution across a boundary, a GMM is trained on a super feature vector extracted from multiple evenly spaced frames near that boundary. A classification and regression tree (CART) is used to cluster acoustically similar GMMs, so that the GMM at each leaf node can be reliably trained from the limited number of manually labeled boundaries. An accuracy of 90% is thus achieved when only 250 manually labeled sentences are provided to train the refining models.
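
To make the super feature vector concrete, the following is a minimal sketch of one way it could be assembled: the per-frame feature vectors of several evenly spaced frames centered on a candidate boundary are concatenated into a single vector. The function name `super_feature_vector`, the parameters `num_each_side` and `spacing`, and the clamping of out-of-range indices at the utterance edges are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence


def super_feature_vector(
    frames: Sequence[Sequence[float]],
    boundary_index: int,
    num_each_side: int = 2,   # assumed: frames taken on each side of the boundary
    spacing: int = 1,         # assumed: distance (in frames) between sampled frames
) -> List[float]:
    """Concatenate features of evenly spaced frames around a boundary frame."""
    if not frames:
        raise ValueError("frames must not be empty")
    last = len(frames) - 1
    vector: List[float] = []
    for k in range(-num_each_side, num_each_side + 1):
        # Assumption: indices past the utterance edges are clamped to the
        # first or last frame rather than padded with zeros.
        idx = min(max(boundary_index + k * spacing, 0), last)
        vector.extend(frames[idx])
    return vector
```

With `num_each_side = N` and per-frame dimension `D`, the resulting vector has `(2N + 1) * D` components; a GMM would then be fitted to such vectors collected at the manually labeled boundaries.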