Statistically-Enhanced New Word Identification in a Rule-Based Chinese System

  • Andi Wu
  • Zixin Jiang

Published by the Association for Computational Linguistics

This paper presents a mechanism for identifying new words in Chinese text within a rule-based system: probabilities are used to filter candidate character strings and to assign parts of speech (POS) to the selected strings. This mechanism avoids both the sparse-data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.
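The abstract does not give the exact probability model, so what follows is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes (hypothetically) that the rule-based system hands over candidate strings of unknown characters, that character-level statistics come from a segmented, POS-tagged corpus, that a candidate is accepted when the product of its characters' "occurs inside a multi-character word" probabilities reaches an arbitrary threshold, and that the POS is the tag maximizing the product of P(tag | character, position in word). All function names, the threshold, the smoothing constants and the toy corpus are hypothetical.

```python
from collections import Counter
from math import prod


def char_position(i, n):
    """Position label of the i-th character in an n-character word."""
    if n == 1:
        return "single"
    if i == 0:
        return "begin"
    if i == n - 1:
        return "end"
    return "middle"


def estimate_tables(tagged_words):
    """Estimate character-level probabilities from (word, pos) pairs.

    Hypothetical model: P(char occurs inside a multi-character word) and
    P(tag | char, position), both as relative frequencies.
    """
    char_total = Counter()      # occurrences of each character
    char_in_word = Counter()    # occurrences inside multi-character words
    char_pos_total = Counter()  # (char, position) occurrences
    char_pos_tag = Counter()    # (char, position, tag) occurrences
    for word, tag in tagged_words:
        n = len(word)
        for i, ch in enumerate(word):
            position = char_position(i, n)
            char_total[ch] += 1
            if n > 1:
                char_in_word[ch] += 1
            char_pos_total[(ch, position)] += 1
            char_pos_tag[(ch, position, tag)] += 1
    in_word_prob = {ch: char_in_word[ch] / char_total[ch] for ch in char_total}
    tag_prob = {key: count / char_pos_total[key[:2]] for key, count in char_pos_tag.items()}
    return in_word_prob, tag_prob


def word_score(candidate, in_word_prob, unseen=0.5):
    """Product of per-character in-word probabilities (unseen chars get a guess)."""
    return prod(in_word_prob.get(ch, unseen) for ch in candidate)


def best_tag(candidate, tag_prob, tags, floor=1e-6):
    """Tag maximizing the product of P(tag | char, position); unseen events get a floor."""
    n = len(candidate)

    def score(tag):
        return prod(
            tag_prob.get((ch, char_position(i, n), tag), floor)
            for i, ch in enumerate(candidate)
        )

    return max(tags, key=score)


def identify_new_words(candidates, in_word_prob, tag_prob, tags, threshold=0.5):
    """Filter candidate strings by score, then attach the most probable tag."""
    return [
        (c, best_tag(c, tag_prob, tags))
        for c in candidates
        if word_score(c, in_word_prob) >= threshold
    ]
```

With the toy corpus `[("网络", "N"), ("网站", "N"), ("人民", "N"), ("农民", "N"), ("上网", "V"), ("的", "DE"), ("了", "LE")]`, calling `identify_new_words(["网民", "的了"], ...)` keeps 网民 tagged `N` and rejects 的了, whose characters only ever occur as single-character words. In the system the abstract describes, accepted strings would then be passed on to the rule-based parser; the sketch stops at filtering and tagging.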