Information Extraction Crossing Language, Robustness and Domain Barriers
- Imed Zitouni | Microsoft - Bing
Modern communication technologies have made massive amounts of real-time news information in several languages readily available. This has created a need for news-monitoring systems that allow users to monitor multilingual news media in near real time and to search over stored content. One example of such a system is the Translingual Automatic Language Exploration System, codenamed TALES. In this talk I will briefly describe the architecture of TALES and focus on its information extraction component. Information extraction is a crucial step toward understanding a text, as it identifies the important conceptual objects in a discourse and the relations between them. I will address the portability of our approach to different languages and show a method for propagating information from resource-rich languages into low-resource ones. In contrast to other approaches that focus on clean text, I will also show the robustness of our technique to less well-formed input. For example, information extraction in a multilingual broadcast processing system has to deal with inaccurate automatic transcription and translation. The resulting presence of non-target-language text yields many false alarms, which raises the research problem of making information extraction robust to such noisy input text. If time permits, I will also discuss the application and adaptation of these techniques to the health-care domain.
Speaker Details
Imed Zitouni joined Microsoft Bing recently. Before that, he had been a research member of the IBM Multilingual NLP group since 2004. Before joining IBM, he was a scientist at the startup company DIALOCA in 1998-99 and then a research staff member at Bell Laboratories from 1999 to 2004. He received his M.Sc. and Ph.D. with the highest honors from the University of Nancy 1, France, in 1996 and 2000, respectively. In 1995, he obtained an MEng degree in computer science from ENSI, a prestigious national computer institute in Tunisia. His research interests include natural language processing, information retrieval, machine translation, spoken dialogue systems, speech recognition, and machine learning. He is a senior member of IEEE, a member of the IEEE Speech and Language Processing Technical Committee (99-11), the Information Officer of the ACL SIG on Semitic Languages, and a member of ISCA and ACL. He served as team lead on several NLP projects at IBM and as chair and reviewing committee member for several conferences and journals. He has also authored or co-authored more than 75 papers in international conferences and journals. Imed’s recent book is “Multilingual Natural Language Processing Applications: From Theory to Practice”, published by Prentice Hall.
Imed Zitouni
Principal Research Manager
Jeff Running
Series: Microsoft Research Talks