Automatic Key Term Extraction from Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features

  • Yun-Nung Chen
  • Yu Huang
  • Sheng-Yi Kong
  • Lin-Shan Lee

Proceedings of The 3rd IEEE Workshop on Spoken Language Technology (SLT 2010)

Published by IEEE - Institute of Electrical and Electronics Engineers

Best Student Paper Award (2/~150; < 2%)

This paper proposes a set of approaches to automatically extract key terms from spoken course lectures, including their audio signals, ASR transcriptions and slides. We divide key terms into two types, key phrases and keywords, and develop different approaches to extract them in that order. Key phrases are extracted using right/left branching entropy; keywords are extracted by learning from three sets of features: prosodic features, lexical features, and semantic features from Probabilistic Latent Semantic Analysis (PLSA). The learning approaches include an unsupervised method (K-means exemplar) and two supervised ones (AdaBoost and neural network). Encouraging preliminary results were obtained on a corpus of course lectures, and all the proposed approaches and feature sets were found to be useful.
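To illustrate the idea behind right/left branching entropy, here is a minimal sketch in Python. It is not the authors' implementation: the function name `branching_entropy`, the toy `tokens` list, and the use of plain relative frequencies are all assumptions made for illustration, and the abstract does not state how the paper turns these entropies into phrase boundaries. The sketch only computes the entropy of the words that follow (right) or precede (left) a candidate word sequence. A low value suggests the sequence is usually continued the same way, while a high value suggests a likely phrase boundary.

```python
import math
from collections import Counter


def branching_entropy(tokens, phrase, direction="right"):
    """Entropy (in bits) of the words adjacent to `phrase` in `tokens`.

    direction="right" looks at the word following each occurrence,
    direction="left" looks at the word preceding it.
    Hypothetical helper for illustration only.
    """
    phrase = tuple(phrase)
    n = len(phrase)
    neighbors = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) != phrase:
            continue
        # Index of the adjacent word on the requested side.
        j = i + n if direction == "right" else i - 1
        if 0 <= j < len(tokens):
            neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    # H = sum_w p(w) * log2(1 / p(w)), with p(w) = count(w) / total.
    return sum((c / total) * math.log2(total / c) for c in neighbors.values())


# Toy transcript (made up for this example).
tokens = (
    "the hidden markov model is a model the hidden markov chain "
    "uses hidden markov model states"
).split()

h_inside = branching_entropy(tokens, ["hidden"], "right")
h_right = branching_entropy(tokens, ["hidden", "markov"], "right")
h_left = branching_entropy(tokens, ["hidden", "markov"], "left")
```

In this toy transcript "hidden" is always followed by "markov", so its right branching entropy is zero, whereas "hidden markov" is followed and preceded by varying words, so both of its entropies are higher. A pattern like this is the kind of signal a branching-entropy method could use to decide where a key phrase starts and ends.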