SpeechPad: Multimodal Text Entry on Mobile Devices


August 26, 2005


Bo-June (Paul) Hsu


Massachusetts Institute of Technology and Microsoft Research


As SMS and IM become more prevalent on cell phones and other mobile devices, the need for efficient and robust text entry interfaces grows. By combining speech and keypad input, the SpeechPad prototype application demonstrates the design of a multimodal interface that improves the text entry rate on keyboard-less mobile devices. The use of the familiar per-word entry interface increases the perceived accuracy and hides recognition latency. Furthermore, efficient T9-based filtering of the recognition lattice and n-gram results achieves graceful degradation as recognition accuracy decreases.

In this talk, we will examine the design, algorithms, and implementation considerations of this multimodal text entry interface. We will also show the Pocket PC SpeechPad prototype, which some have labeled the next killer app for speech recognition.
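As a rough illustration of the T9-based filtering idea mentioned in the abstract, the following minimal sketch keeps only candidate words whose keypad digit sequence starts with the keys pressed so far, ranking speech hypotheses ahead of n-gram predictions. It is not the SpeechPad implementation: the function names, the flat scored word lists (standing in for a recognition lattice and n-gram model), and the scores are all assumptions made for this example.

```python
# Hypothetical sketch of T9-style candidate filtering; not the SpeechPad code.
# Standard phone keypad letter groups.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: d for d, letters in KEYPAD.items() for ch in letters}


def to_digits(word):
    """Map a word to its keypad digit string, skipping non-letters."""
    return "".join(LETTER_TO_DIGIT.get(ch, "") for ch in word.lower())


def filter_candidates(keys, speech_hyps, ngram_hyps):
    """Return words consistent with the pressed keys.

    speech_hyps and ngram_hyps are lists of (word, score) pairs; both are
    assumed inputs. Speech hypotheses are listed first, so when none of
    them match the keys, the n-gram predictions still supply candidates.
    """
    ranked, seen = [], set()
    for source in (speech_hyps, ngram_hyps):
        for word, _score in sorted(source, key=lambda pair: -pair[1]):
            if word not in seen and to_digits(word).startswith(keys):
                seen.add(word)
                ranked.append(word)
    return ranked


if __name__ == "__main__":
    speech = [("good", 0.6), ("home", 0.3), ("could", 0.1)]
    ngram = [("gone", 0.5), ("the", 0.4), ("hood", 0.1)]
    print(filter_candidates("4663", speech, ngram))
    print(filter_candidates("8", speech, ngram))
```

With these made-up inputs, the keys 4663 keep "good" and "home" from the speech hypotheses and add "gone" and "hood" from the n-gram list, while the key 8 matches no speech hypothesis and falls back to the n-gram word "the". That fallback is one simple way to picture graceful degradation when recognition is wrong.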


Bo-June (Paul) Hsu

Bo-June (Paul) Hsu is a Ph.D. student at the Massachusetts Institute of Technology, working with James Glass in the Spoken Language Systems group. He is currently a research intern in the Microsoft Speech Research Group, working with Milind Mahajan on multimodal text entry interfaces. After receiving his B.A. and M.S. degrees in Electrical Engineering and Computer Science from Harvard University in 2000, Paul joined the Microsoft Speech Components Group, where he worked as a software design engineer on the speech recognizer front end and APIs before starting his Ph.D. program at MIT four years later. His research interests include multimodal speech interfaces and spoken query retrieval.