Statistically-Enhanced New Word Identification in a Rule-Based Chinese System

Andi Wu; Zixin Jiang

November 2000

Published by Association for Computational Linguistics

This paper presents a mechanism for identifying new words in Chinese text, in which probabilities are used to filter candidate character strings and to assign parts of speech (POS) to the selected strings within a rule-based system. This mechanism avoids both the sparse-data problem of purely statistical approaches and the over-generation problem of rule-based approaches. It improves parser coverage and provides a tool for the lexical acquisition of new words.
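The abstract does not say which probabilities are used or how candidates are generated, so the following is only a minimal Python sketch of the general idea, built on stated assumptions. Candidates are assumed to be contiguous spans of two or more single-character tokens left over after dictionary segmentation. A hypothetical table `IN_WORD_PROB` gives, for each character, the probability that it occurs inside a multi-character word. A second hypothetical table `POS_PROB` gives the probability that a word containing that character has a given tag. All numbers, the threshold, and the function names are invented for illustration and are not taken from the paper.

```python
from math import prod

# Hypothetical statistics (illustrative values only, not from the paper).
# IN_WORD_PROB[c]: assumed probability that character c occurs inside a
# multi-character word rather than as a standalone single-character word.
IN_WORD_PROB = {"去": 0.1, "网": 0.8, "吧": 0.6}

# POS_PROB[c][tag]: assumed probability that a word containing c has this tag.
POS_PROB = {
    "去": {"noun": 0.1, "verb": 0.8, "adj": 0.1},
    "网": {"noun": 0.8, "verb": 0.15, "adj": 0.05},
    "吧": {"noun": 0.6, "verb": 0.1, "adj": 0.3},
}

TAGS = ("adj", "noun", "verb")
THRESHOLD = 0.3  # hypothetical cutoff for accepting a candidate


def candidate_strings(tokens):
    """Return every span of 2+ adjacent single-character tokens."""
    candidates, run = [], []
    for tok in tokens + [""]:  # empty sentinel flushes the final run
        if len(tok) == 1:
            run.append(tok)
            continue
        # Emit all sub-spans of length >= 2 from the current run.
        for i in range(len(run)):
            for j in range(i + 2, len(run) + 1):
                candidates.append("".join(run[i:j]))
        run = []
    return candidates


def word_score(candidate):
    """Joint probability that every character is word-internal.

    Unknown characters get probability 0, so they are always filtered out.
    """
    return prod(IN_WORD_PROB.get(c, 0.0) for c in candidate)


def filter_candidates(candidates, threshold=THRESHOLD):
    """Keep only candidates whose score reaches the threshold."""
    return [s for s in candidates if word_score(s) >= threshold]


def assign_pos(candidate):
    """Pick the tag maximizing the product of per-character tag probabilities."""
    def score(tag):
        return prod(POS_PROB.get(c, {}).get(tag, 0.0) for c in candidate)
    return max(TAGS, key=score)


tokens = ["他们", "去", "网", "吧", "上网"]
for word in filter_candidates(candidate_strings(tokens)):
    print(word, assign_pos(word))
```

With these made-up numbers, the spans 去网, 去网吧 and 网吧 are proposed. Only 网吧 (0.8 × 0.6 = 0.48) passes the 0.3 threshold, and it is tagged as a noun. The point of the filter is to avoid over-generation: a rule system alone would accept every span. The sketch stops at producing tagged candidates. How such candidates are integrated into the rule-based parser is outside its scope.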