Emotion Recognition in Speech Signal: Experimental Study, Development and Applications
- Valery Petrushin | Accenture Technology Labs
In this talk I will give an overview of my research on emotion expression and emotion recognition in the speech signal and its applications. Two proprietary databases of emotional utterances were used in this research. The first consists of 700 utterances in English pronounced by 30 subjects portraying five emotional states: unemotional (normal), anger, happiness, sadness, and fear. The second consists of 3,660 utterances in Russian by 61 subjects portraying six emotional states: unemotional, anger, happiness, sadness, fear, and surprise. An experimental study was conducted to determine how well people recognize emotions in speech. Based on its results, the most reliably recognized utterances were selected for feature selection and for training recognizers. Several machine learning techniques were applied to create recognition agents, including k-nearest neighbors, neural networks, and ensembles of neural networks. The agents recognize five emotional states with the following accuracy: normal (unemotional) state, 55-75%; happiness, 60-70%; anger, 70-80%; sadness, 75-85%; and fear, 35-55%. The overall average accuracy is about 70%. The agents can be adapted to a particular environment depending on the parameters of the speech signal and the number of target emotional states. For a practical application, an agent was created that analyzes telephone-quality speech and distinguishes between two emotional states (“agitation”, which includes anger, happiness, and fear, and “calm”, which includes the normal state and sadness) with 77% accuracy. This agent was used as part of a decision support system for prioritizing voice messages and assigning an appropriate human agent to respond to each message in a call center environment.
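The abstract does not name the feature set or toolkits used in the lab, so the following is only a minimal sketch, assuming librosa for feature extraction and scikit-learn for the classifiers, of how prosodic statistics (pitch and energy) could feed a k-nearest-neighbors model and an ensemble of small neural networks in the spirit of the agents described above. Every feature and hyperparameter here is an illustrative assumption, not the proprietary pipeline from the talk.

# Sketch of an emotion-recognition pipeline: prosodic statistics feed a
# k-NN classifier plus an ensemble of small neural networks, combined by
# soft voting. Assumes librosa and scikit-learn; features and
# hyperparameters are illustrative assumptions only.
import numpy as np
import librosa
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler

def prosodic_features(path: str) -> np.ndarray:
    """Summarize an utterance by pitch and energy statistics."""
    y, sr = librosa.load(path, sr=8000)            # telephone-quality rate
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)  # frame-wise pitch track
    rms = librosa.feature.rms(y=y)[0]              # frame-wise energy
    stats = lambda x: [np.mean(x), np.std(x), np.min(x), np.max(x)]
    return np.array(stats(f0) + stats(rms))

def build_recognizer(n_networks: int = 3) -> Pipeline:
    """Soft-voting ensemble of k-NN plus several small neural networks."""
    members = [("knn", KNeighborsClassifier(n_neighbors=5))]
    for i in range(n_networks):                    # differently seeded nets
        members.append((f"mlp{i}", MLPClassifier(hidden_layer_sizes=(16,),
                                                 max_iter=2000,
                                                 random_state=i)))
    return make_pipeline(StandardScaler(),
                         VotingClassifier(members, voting="soft"))

# Usage (X stacks prosodic_features over labeled utterances; y holds labels
# such as "normal", "anger", "happiness", "sadness", "fear"):
#   clf = build_recognizer()
#   clf.fit(X_train, y_train)
#   print(clf.score(X_test, y_test))

Under these assumptions, the two-state “agitation”/“calm” agent described above amounts to the same pipeline with the five labels merged into two before training.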
I will also summarize other research topics in the lab, including fast pitch-synchronous segmentation of the speech signal, the use of speech analysis techniques for language learning, and video clip recognition using a joint audio-visual model.
Speaker Details
Dr. Valery Petrushin received his Ph.D. in computer science from the Glushkov Institute for Cybernetics, Kiev, USSR, in 1983. He worked there as a researcher (1983-1990) and then as Director of the Intelligent Tutoring Systems Lab (1991-1994), doing research in student modeling and adaptive knowledge testing using probabilistic inference approaches (Bayesian belief networks, the Dempster-Shafer approach). From 1994 to 1997 he worked as a researcher at the School of Computer Science at Georgia Tech, where he did research in Web-based education using learning environments; at the same time he did research in gene recognition (using Markov chains and hidden Markov models) in collaboration with faculty at the School of Biology. Since 1997 he has been a Senior Researcher at Accenture Technology Labs in Chicago. His research interests include data mining and customer modeling; speech processing (voice quality, emotion recognition and synthesis, speaker separation, signal segmentation); speech and multimedia annotation and information retrieval; and spoken language learning. Dr. Petrushin is the author of more than 150 papers, two books, and two university-level textbooks, and holds six US patents.