Word Segmentation In Sentence Analysis
- Andi Wu
- Zixin Jiang
Published by Tsinghua University Press
This paper presents a model of language processing in which word segmentation is an integral part of sentence analysis. We show that using a parser enables the best ambiguity resolution in word segmentation. The lexical component of the model resolves most of the ambiguities, but final disambiguation takes place during parsing. In this model, word segmentation is a by-product of sentence analysis: the correct segmentation is represented by the leaves of the parse tree. We also show that the complexity usually associated with using a parser for segmentation can be reduced dramatically by a dictionary that contains information useful for word segmentation. With the aid of such information, sentence analysis is reasonably fast and avoids the problems encountered in other parser-based approaches. The model is implemented in NLPWin, the general-purpose language understanding system developed at Microsoft Research. A demo of the system is available.
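One way to picture the division of labour described above is a word lattice: dictionary lookup proposes every candidate word, and a later stage picks one path through the lattice. The Python sketch below is a toy under stated assumptions, not the NLPWin implementation. The lexicon, the `True` "unbreakable" flag (a hypothetical stand-in for the segmentation information the dictionary is said to carry), the single-character fallback, and the names `build_lattice` and `segmentations` are all illustrative inventions. The parser itself is not modeled; the sketch only enumerates the segmentations a parser would have to choose among.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, str]  # (start offset, end offset, word)


def build_lattice(sentence: str, lexicon: Dict[str, bool], max_len: int = 4) -> List[Edge]:
    """Collect every candidate word in the sentence as a lattice edge.

    Hypothetical scheme: lexicon values flag a word as unbreakable (True),
    meaning no shorter candidate may be proposed inside its span.
    """
    edges: List[Edge] = []
    for start in range(len(sentence)):
        # Single characters are always candidates, so every offset stays reachable.
        edges.append((start, start + 1, sentence[start]))
        for end in range(start + 2, min(len(sentence), start + max_len) + 1):
            word = sentence[start:end]
            if word in lexicon:
                edges.append((start, end, word))
    # Drop any edge lying strictly inside a span flagged as unbreakable.
    solid = [(s, e) for s, e, w in edges if lexicon.get(w, False)]
    return [
        (s, e, w)
        for s, e, w in edges
        if not any(ss <= s and e <= se and (ss, se) != (s, e) for ss, se in solid)
    ]


def segmentations(sentence: str, edges: List[Edge]) -> List[List[str]]:
    """Enumerate every path from offset 0 to len(sentence) through the lattice."""
    by_start: Dict[int, List[Edge]] = {}
    for edge in edges:
        by_start.setdefault(edge[0], []).append(edge)

    def walk(pos: int) -> List[List[str]]:
        if pos == len(sentence):
            return [[]]
        paths: List[List[str]] = []
        for _, end, word in by_start.get(pos, []):
            for rest in walk(end):
                paths.append([word] + rest)
        return paths

    return walk(0)


if __name__ == "__main__":
    # Toy lexicon (hypothetical); only 起源 is marked unbreakable.
    lexicon = {"研究": False, "研究生": False, "生命": False, "起源": True}
    sentence = "研究生命起源"
    segs = segmentations(sentence, build_lattice(sentence, lexicon))
    shortest = min(len(seg) for seg in segs)
    # Print the fewest-words candidates; a purely lexical heuristic stops here.
    for seg in segs:
        if len(seg) == shortest:
            print("/".join(seg))
    # prints "研究/生命/起源" then "研究生/命/起源"
```

With 起源 flagged as unbreakable, the lattice yields five segmentations instead of ten, because 起 and 源 are never proposed separately. The fewest-words tie between 研究/生命/起源 ("study the origin of life") and 研究生/命/起源 ("graduate student / fate / origin") cannot be broken lexically here; in the model the abstract describes, that final choice would fall to the parser.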