As speech recognition matures and becomes more
practical in commercial English applications, localization has
quickly become the bottleneck to deploying more speech features.
Not only are some technologies highly language dependent, but
there are also simply not enough speech experts across the large
number of target languages to develop the data modules and
investigate potential performance-related issues. This paper shows
how data-driven methods like Utterance Classification (UC)
successfully address these issues. Our experiments demonstrate that
UC performs as well as or better than hand-crafted Context-Free
Grammars (CFGs) for spontaneous Mandarin speech
understanding, even when applied without linguistic knowledge.
We also discuss two pragmatic modifications of the UC algorithm,
adopted to handle multiple-choice answers and to be more robust
to feature selection.