TUTA: Tree-based Transformers for Generally Structured Table Pre-training

Tables of diverse structures are widely used for data presentation and management. Existing research mainly focuses on relational tables, neglecting other common table structures. In this paper, we propose TUTA, a unified pre-training architecture for understanding generally structured tables. Since understanding a table requires spatial, hierarchical, and semantic information, we enhance transformers with three core structure-aware mechanisms. First, we propose a novel tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information in generally structured tables. Upon this tree, we propose tree-based attention and tree-based position embeddings to better capture spatial and hierarchical information. Moreover, to capture table information progressively, we devise three pre-training objectives that enable representations at the token, cell, and table levels. We pre-train TUTA on a large volume of unlabeled tables and fine-tune it on two critical tasks in table semantic structure understanding: cell type classification and table type classification. Experimental results show that TUTA is highly effective, achieving state-of-the-art performance on five widely studied datasets.
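
To make the bi-dimensional coordinate tree concrete, here is a minimal Python sketch of the idea as described above: each cell is located by its path in a top (column) header tree and a left (row) header tree, and distances along the two trees can be combined, e.g., to constrain attention. All names and the exact distance definition here are illustrative assumptions, not the paper's reference implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TreeNode:
    """A node in a header tree; children are ordered sub-headers."""
    name: str
    children: List["TreeNode"] = field(default_factory=list)

# A coordinate is the path of child indices from the tree root to a node.
Coordinate = Tuple[int, ...]

def tree_distance(a: Coordinate, b: Coordinate) -> int:
    """Number of tree steps between two nodes via their common ancestor."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)

def cell_distance(top_a: Coordinate, left_a: Coordinate,
                  top_b: Coordinate, left_b: Coordinate) -> int:
    """A cell's bi-dimensional coordinate pairs its top- and left-tree
    paths; summing the two tree distances gives one plausible notion of
    structural closeness between cells (an assumption of this sketch)."""
    return tree_distance(top_a, top_b) + tree_distance(left_a, left_b)

# Example: two cells sharing the same left-header parent but under
# sibling column headers are closer than cells in unrelated subtrees.
print(cell_distance((0, 1), (2,), (0, 0), (2,)))  # -> 2
```

A distance of this kind could, for instance, gate tree-based attention so that each cell attends only to structurally nearby cells; the thresholding scheme itself is not specified in the abstract.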