The data set made available by the PASCAL Recognizing Textual Entailment (RTE) Challenge provides a valuable opportunity to focus on the very difficult task of determining whether one sentence (the hypothesis, H) is entailed by another (the text, T). In RTE-1 (2005), we submitted an analysis of the test data aimed at isolating the set of T-H pairs whose categorization could be accurately predicted based solely on syntactic cues (Vanderwende and Dolan, 2005). The intent of our analysis was to isolate the impact of syntactic analysis in the limit, rather than that of any given parser; we therefore relied on human annotators to decide whether syntactic information from an idealized parser would be sufficient to make a judgment. We found that 34% of the test items could be handled by syntax alone, including basic alternations, and that 48% could be handled by syntax plus a general-purpose thesaurus. Given that the test data is split evenly between True and False entailments, an accuracy of 74% is in principle achievable for a system with access to a general-purpose thesaurus: such a system answers the 48% it can determine correctly and, by guessing randomly on the remaining 52%, gets half of those right as well (48% + 26% = 74%).

With these numbers as our goal, we have developed MENT (Microsoft ENTailment), a system that predicts entailment using syntactic features and a general-purpose thesaurus, in addition to an overall alignment score. MENT takes as its premise that it is easier for a syntactic system to predict False entailments, following the observation in Vanderwende and Dolan (2005) that 243/800 test items could be determined to be False using syntax and a thesaurus, while only roughly half as many, 147/800, could be determined to be True.
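The 74% ceiling above can be checked with a short back-of-the-envelope calculation; the sketch below simply plugs in the figures reported in the analysis (48% decidable, a balanced 800-item test set) and is illustrative only, not part of the MENT system.

```python
# Back-of-the-envelope check of the 74% accuracy ceiling.
# Figures are taken from the analysis above: 48% of items are
# decidable with syntax plus a thesaurus; the 800-item test set
# is split evenly between True and False entailments.
total = 800
resolved_fraction = 0.48

# Decidable items are assumed to be answered correctly; on the
# undecidable remainder the system guesses randomly, and on a
# balanced set random guessing is right half the time.
ceiling = resolved_fraction * 1.0 + (1 - resolved_fraction) * 0.5
print(f"achievable accuracy: {ceiling:.0%}")  # achievable accuracy: 74%
```

The same arithmetic explains why the syntax-only figure of 34% would yield a lower ceiling of 67% (34% + 33%), motivating the addition of the thesaurus.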