Abstract

Voice search technology has been successfully applied to help drivers reply to SMS messages in automobiles: a predefined set of SMS message templates is searched with ASR hypotheses to form the reply candidate list. To organize the template set efficiently and improve the quality of the reply candidate list, we propose applying an n-gram translation model and logistic regression to detect paraphrased SMS messages. Both proposed algorithms outperform the edit-distance-based paraphrase detection baseline, yielding relative equal error rate (EER) reductions of 40.9% and 50.5%, respectively.
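As an illustration of the kind of baseline referred to above, the following is a minimal sketch of edit-distance-based paraphrase detection. It is not the implementation evaluated in this work: word-level tokenization, normalization by the longer message length, the function names `edit_distance` and `is_paraphrase`, and the threshold value 0.25 are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def is_paraphrase(msg_a, msg_b, threshold=0.25):
    """Hypothetical decision rule: two messages count as paraphrases when
    their length-normalized word-level edit distance is at most `threshold`.
    The threshold is an assumed value, not one taken from the paper."""
    a, b = msg_a.lower().split(), msg_b.lower().split()
    if not a and not b:
        return True
    score = edit_distance(a, b) / max(len(a), len(b))
    return score <= threshold
```

Sweeping `threshold` over a labeled set of message pairs trades false acceptances against false rejections; the EER is the operating point at which the two error rates are equal.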