Multi-Modal Interaction Group


The mission of the Multi-Modal Interaction (MMI) Group is to develop state-of-the-art technologies and industry-leading product solutions for multimodal interaction, including Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Optical Character Recognition (OCR), Document Understanding (DU), Ink Analysis and Recognition, and Gesture Recognition. Our current research focuses on visual document intelligence, including universal OCR, universal math OCR, universal table understanding, universal layout analysis, universal information extraction, and synthetic data generation. By deploying these technologies in the Azure Read cognitive service, the Azure Form Recognizer applied AI service, and AI Builder in Power Platform, Microsoft has been empowering numerous customers to achieve more by unlocking information hidden in image and PDF documents, enabling process automation, knowledge mining, and other industry-specific document intelligence applications.