In a recent study, we proposed soft margin estimation (SME) to
learn parameters of continuous density hidden Markov models
(HMMs). Our earlier experiments on connected-digit recognition
showed that SME offers substantial advantages over other state-of-the-art
discriminative training methods. In this paper, we illustrate
SME from the perspective of statistical learning theory and show that,
by including a margin in the formulation of its objective function, SME
can directly minimize the approximate test risk, whereas
most other training methods aim to minimize only the empirical
risk. We test SME on the 5k-word Wall Street Journal task, and
find that the proposed approach achieves a relative word error rate
reduction of about 10% over our best baseline results across different
experimental configurations. We believe this is the first attempt to
show the effectiveness of margin-based acoustic modeling for large
vocabulary continuous speech recognition. We also expect further
performance improvements in the future because the approximate
test risk minimization principle offers a flexible yet rigorous
framework that facilitates the incorporation of new margin-based
optimization criteria into HMM training.