TaxiXNLI
This repository contains the data associated with the paper "Analyzing the Effects of Reasoning Types on Cross-Lingual Transfer Performance", published at the EMNLP 2021 Multilingual Representation Learning workshop. The data is a multilingual extension of the dataset released in ID 3860. As in ID 3397 and ID 3860, the broad goal of this project is to research and develop Neuro-Symbolic systems for Natural Language Inference (NLI) that bring the "correctness" guarantees and interpretability of symbolic systems to neural network inference.

In ID 3860, we proposed the TaxiNLI data, in which we annotated an already public NLI dataset (MultiNLI) with the types of reasoning required to solve each example. In MultiNLI, each example has a premise sentence, a hypothesis sentence, and a label. The TaxiNLI annotations only add 0/1 flags against reasoning categories such as lexical, syntactic, spatial, etc.

In this data, we automatically translate the TaxiNLI dataset using Bing into Spanish, French, Russian, Hindi, Arabic, Vietnamese, Chinese, Swahili, and Urdu. We also take another public dataset (XNLI), sample 1.4k of its examples, and annotate them with a few selected categories of interest (Negation, Boolean, Spatial, Causal, Temporal, Knowledge).
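As a rough illustration of the annotation scheme described above (a premise, a hypothesis, a label, and 0/1 flags per reasoning category), the sketch below filters and counts examples by category. The field names, the `language` field, and the two records are hypothetical and made up for illustration; they are not taken from the released files, whose actual column names may differ.

```python
from typing import Dict, List

# Hypothetical field names, based on the categories listed in the text.
CATEGORIES = ["negation", "boolean", "spatial", "causal", "temporal", "knowledge"]

# Made-up records for illustration only; not drawn from the dataset.
examples: List[Dict[str, object]] = [
    {
        "premise": "The cat is not on the mat.",
        "hypothesis": "The cat is on the mat.",
        "label": "contradiction",
        "language": "en",
        "negation": 1, "boolean": 0, "spatial": 1,
        "causal": 0, "temporal": 0, "knowledge": 0,
    },
    {
        "premise": "She left before the meeting started.",
        "hypothesis": "She attended the whole meeting.",
        "label": "contradiction",
        "language": "en",
        "negation": 0, "boolean": 0, "spatial": 0,
        "causal": 0, "temporal": 1, "knowledge": 0,
    },
]


def with_category(rows, category):
    """Return the rows whose 0/1 flag for `category` is set to 1."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    return [row for row in rows if row[category] == 1]


def category_counts(rows):
    """Count how many rows carry each reasoning category."""
    return {c: sum(row[c] for row in rows) for c in CATEGORIES}


if __name__ == "__main__":
    for row in with_category(examples, "temporal"):
        print(row["premise"], "->", row["label"])
    print(category_counts(examples))
```

Because each category is an independent 0/1 flag, a single example can carry several reasoning types at once, which is why the counts are computed per category rather than by partitioning the examples.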