A New Pre-training Method for Training Deep Learning Models with Application to Spoken Language Understanding

  • Asli Celikyilmaz,
  • Ruhi Sarikaya,
  • Dilek Hakkani-Tür,
  • Xiaohu Liu,
  • Nikhil Ramesh,
  • Gokhan Tur

Proceedings of the 17th Annual Conference of the International Speech Communication Association (INTERSPEECH 2016)

We propose a simple and efficient approach for pre-training
deep learning models with application to slot filling tasks in
spoken language understanding. The proposed approach leverages
unlabeled data to train the models and is generic enough to
work with any deep learning model. In this study, we consider
the CNN2CRF architecture, which combines a Convolutional Neural
Network (CNN) with a Conditional Random Field (CRF) as the
top layer, since it has shown great potential for learning useful
representations for supervised sequence learning tasks. The
proposed pre-training approach with this architecture learns the
feature representations from both labeled and unlabeled data at
the CNN layer, covering features that would not be observed
in limited labeled data. At the CRF layer, the predicted word
classes of the unlabeled data serve as latent sequence labels and
are used together with the labeled sequences. These latent label
sequences, in principle, have a regularizing effect on the labeled
sequences, yielding a model that generalizes better. This allows
the network to learn representations that are useful not only for
slot tagging with labeled data but also for capturing dependencies
both within and between latent clusters of unseen words. The
proposed pre-training method with the CNN2CRF architecture
achieves significant gains over the strongest semi-supervised
baseline.
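
To make the architecture concrete, the following is a minimal PyTorch sketch of a CNN2CRF-style tagger, not the authors' implementation: a 1-D convolution over word embeddings produces per-token features, and a linear-chain CRF on top scores whole label sequences. The class name `CNN2CRF`, the hyperparameters, and the omission of padding masks are illustrative assumptions; the use of predicted word classes as latent labels for unlabeled sentences is indicated only in a comment.

```python
import torch
import torch.nn as nn

class CNN2CRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128, kernel=3):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # Convolution over the token sequence; padding keeps the length fixed.
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)
        self.proj = nn.Linear(hidden, num_tags)  # per-token emission scores
        self.trans = nn.Parameter(torch.zeros(num_tags, num_tags))  # CRF transitions

    def emissions(self, tokens):                      # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)          # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq_len, hidden)
        return self.proj(h)                           # (batch, seq_len, num_tags)

    def neg_log_likelihood(self, tokens, tags):
        """CRF loss: log-partition minus the score of the given tag sequence.
        During pre-training, `tags` for unlabeled sentences would be the
        latent labels predicted for the words (an assumption in this sketch)."""
        em = self.emissions(tokens)                   # (B, T, K)
        B, T, K = em.shape
        # Score of the given tag sequences: emissions plus transitions.
        gold = em.gather(2, tags.unsqueeze(2)).squeeze(2).sum(1)
        gold = gold + self.trans[tags[:, :-1], tags[:, 1:]].sum(1)
        # Forward algorithm for the log-partition function.
        alpha = em[:, 0]                              # (B, K)
        for t in range(1, T):
            alpha = torch.logsumexp(
                alpha.unsqueeze(2) + self.trans.unsqueeze(0), dim=1) + em[:, t]
        logZ = torch.logsumexp(alpha, dim=1)
        return (logZ - gold).mean()
```

Under this reading, pre-training and fine-tuning minimize the same CRF objective; only the source of `tags` changes (latent labels for unlabeled sentences, gold slot tags for labeled ones), which is what lets the latent sequences act as a regularizer on the labeled ones.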