Natural Language Processing From Scratch


December 3, 2009


Ronan Collobert


NEC Labs


We will describe recent advances in deep learning techniques for
Natural Language Processing (NLP). Traditional NLP approaches favor
shallow systems, possibly cascaded, with carefully hand-crafted
features. In this work we purposefully disregard domain-specific
knowledge in favor of large-scale, semi-supervised, end-to-end
learning. Our systems consist of several feature layers, with an
increasing level of abstraction at each layer, that is, a multi-layer
neural network. We will describe training techniques that scale easily
to a billion unlabeled words, and discuss multi-task learning across
different NLP tasks as well as end-to-end structured output learning.
We will demonstrate state-of-the-art accuracies with considerable
speedups.
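A minimal sketch of the kind of architecture the abstract describes: a first layer maps words to learned feature vectors (a lookup table), and further layers build increasingly abstract representations before scoring tags. This is an illustrative toy, not the exact system presented in the talk; the vocabulary, layer sizes, and function names are all hypothetical.

```python
# Illustrative sketch (assumptions, not the talk's actual system): a
# multi-layer neural network for NLP where the first "feature layer"
# is a word-embedding lookup table and later layers add abstraction.
import numpy as np

rng = np.random.default_rng(0)

vocab = {"the": 0, "cat": 1, "sat": 2}   # toy vocabulary (hypothetical)
embed_dim, hidden_dim, n_tags = 4, 8, 3  # hypothetical layer sizes

# Layer 1: lookup table mapping word indices to feature vectors.
E = rng.normal(size=(len(vocab), embed_dim))
# Layer 2: hidden layer combining the features of a 3-word window.
W1 = rng.normal(size=(3 * embed_dim, hidden_dim))
# Layer 3: output layer scoring each tag (e.g. a part-of-speech tag).
W2 = rng.normal(size=(hidden_dim, n_tags))

def score_window(words):
    """Score the candidate tags for the middle word of a 3-word window."""
    x = np.concatenate([E[vocab[w]] for w in words])  # feature layer
    h = np.tanh(x @ W1)                               # abstraction layer
    return h @ W2                                     # one score per tag

scores = score_window(["the", "cat", "sat"])
print(scores.shape)  # (3,): one score per candidate tag
```

In practice every layer, including the lookup table, is trained jointly by backpropagation, which is what makes the system end-to-end.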


Ronan Collobert

Ronan Collobert received his master's degree in pure mathematics from the University of Rennes (France) in 2000. He then pursued graduate studies in machine learning at the University of Montreal and at IDIAP (Switzerland) under the Bengio brothers, and received his PhD in 2004 from the University of Paris VI. He joined NEC Labs (USA) in January 2005 as a postdoc and became a research staff member after about one year. His research interests have always focused on large-scale machine-learning algorithms, with a particular interest in semi-supervised learning and “deep” learning architectures. Two years ago, his research shifted to the natural language processing area, moving slowly towards automatic text understanding.