Leveraging Syntactic Information for Text Normalization

  • Deborah A. Coughlin

Syntactic information provided by a broad-coverage parser aids text normalization. This paper introduces a text normalizer for text-to-speech (TTS) and language modeling that uses this syntactic information to improve output quality. The normalizer takes raw text as input and outputs text in which abbreviations, numerals, and symbols are spelled out as words. When the syntactic information provided by the parser is considered, abbreviations that are ambiguous in part of speech, ambiguous abbreviations in coordinated structures, and quantified measure abbreviations can be rewritten correctly.
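
As a rough illustration of the idea, the following minimal Python sketch shows how part-of-speech tags and a preceding quantity could steer the expansion of an abbreviation. It is not the normalizer described in this paper: the tables `POS_EXPANSIONS`, `MEASURES`, and `NUMBER_WORDS`, the function `normalize`, the tag names, and the example abbreviations are all hypothetical, and hand-supplied (token, tag) pairs stand in for the output of a broad-coverage parser. Coordinated structures are not handled.

```python
# Hypothetical expansion table: (abbreviation, part of speech) -> spelled-out form.
# The entries are illustrative assumptions, not taken from the paper.
POS_EXPANSIONS = {
    ("est.", "VERB"): "estimated",
    ("est.", "NOUN"): "estimate",
}

# Hypothetical measure abbreviations: abbreviation -> (singular, plural).
MEASURES = {
    "ft.": ("foot", "feet"),
    "lb.": ("pound", "pounds"),
}

# Tiny numeral table, just large enough for the examples.
NUMBER_WORDS = {"1": "one", "2": "two", "3": "three", "5": "five", "10": "ten"}


def normalize(tagged_tokens):
    """Spell out abbreviations and numerals in a list of (token, pos) pairs.

    The POS tags are assumed to come from a parser; here they are given by hand.
    """
    out = []
    for i, (tok, pos) in enumerate(tagged_tokens):
        key = tok.lower()
        if key in MEASURES:
            # Quantified measure: the preceding numeral decides singular vs. plural.
            prev = tagged_tokens[i - 1][0] if i > 0 else None
            singular, plural = MEASURES[key]
            out.append(singular if prev == "1" else plural)
        elif (key, pos) in POS_EXPANSIONS:
            # POS-ambiguous abbreviation: the tag selects the expansion.
            out.append(POS_EXPANSIONS[(key, pos)])
        elif tok in NUMBER_WORDS:
            out.append(NUMBER_WORDS[tok])
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    sentence = [("The", "DET"), ("board", "NOUN"), ("is", "VERB"),
                ("5", "NUM"), ("ft.", "NOUN"), ("long", "ADJ")]
    print(normalize(sentence))
```

In this sketch the same abbreviation `est.` is rewritten differently depending on its tag, and `ft.` becomes "foot" or "feet" depending on the quantity before it, which is the kind of decision a purely lexical lookup cannot make.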