Determining Similarity and Inferring Relations in a Lexical Knowledge Base
This dissertation describes the creation of a large-scale, richly structured lexical knowledge base (LKB) from complex structures of labeled semantic relations. These structures were automatically extracted using a natural language parser from the definitions and example sentences contained in two machine-readable dictionaries. The structures were then completely inverted and propagated across all of the relevant headwords in the dictionaries to create the LKB. A method is described for efficiently accessing salient paths of semantic relations between words in the LKB using weights assigned to those paths. The weights are based on a unique computation called averaged vertex probability. Extended paths, created by joining sub-paths from two different semantic relation structures, are allowed in order to increase the coverage of the information in the LKB. A novel procedure is used to determine the similarity between words in the LKB based on the patterns of the semantic relation paths connecting those words. The patterns were obtained by extensive training using word pairs from an online thesaurus and a specially created anti-thesaurus. The similarity procedure and the path accessing mechanism are used together to infer semantic relations that are not explicitly stored in the LKB. In particular, the utility of such inferences is discussed in the context of disambiguating phrasal attachments in a natural language understanding system. Quantitative results indicate that the size and coverage of the LKB created in this research, and the effectiveness of the methods for accessing the explicit and implicit information it contains, represent significant progress toward the development of a truly broad-coverage semantic component for natural language processing.
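The inversion step and the joining of sub-paths into extended paths can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy entries, the relation labels (`Hyp`, `Part`, `Purp`), the `Of` suffix for inverted relations, and the function names `invert` and `extended_paths` do not come from the dissertation. Path weighting by averaged vertex probability is deliberately left out, since the abstract does not define it.

```python
from collections import defaultdict

# Hypothetical toy data: each headword maps to the labeled relations
# extracted from its own dictionary entry.
STRUCTURES = {
    "car": [("car", "Hyp", "vehicle"), ("car", "Part", "engine")],
    "truck": [("truck", "Hyp", "vehicle"), ("truck", "Purp", "haul")],
}


def invert(structures):
    """Index every relation under both of its words.

    Each index entry records (relation, other word, originating headword),
    so a relation is reachable from either end.
    """
    index = defaultdict(list)
    for headword, triples in structures.items():
        for source, relation, target in triples:
            index[source].append((relation, target, headword))
            # The "Of" suffix marks the inverted direction (an assumed convention).
            index[target].append((relation + "Of", source, headword))
    return index


def extended_paths(index, start, end):
    """Join two one-step sub-paths at a shared word.

    Only joins whose two halves come from different structures are kept,
    mirroring the idea of an extended path.
    """
    paths = []
    for rel1, mid, origin1 in index.get(start, []):
        for rel2, word, origin2 in index.get(mid, []):
            if word == end and origin1 != origin2:
                paths.append((start, rel1, mid, rel2, end))
    return paths


index = invert(STRUCTURES)
paths = extended_paths(index, "car", "truck")
print(paths)
```

In this toy case, "car" and "truck" are never related within a single entry, yet the inverted index connects them through the shared word "vehicle". A fuller version would rank such paths by weight and allow sub-paths longer than one relation.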