Semantic Table Structure Identification in Spreadsheets
- Yakun Zhang
- Xiao Lv
- Haoyu Dong
- Wensheng Dou
- Shi Han
- Dongmei Zhang
- Jun Wei
- Ye Dan
ISSTA
Research
Spreadsheets are widely used in various business tasks and contain large amounts of valuable data. However, spreadsheet tables are usually organized in a semi-structured way and contain complicated semantic structures, e.g., header types and relations among headers. Because these semantic table structures are rarely documented, existing data analysis and error detection tools can hardly understand spreadsheet tables. Identifying semantic table structures in spreadsheet tables can therefore greatly benefit various analysis tasks on spreadsheets.
In this paper, we propose Tasi (Table structure identification) to automatically identify semantic table structures in spreadsheets. Based on the contents, styles, and spatial locations of table headers, Tasi adopts a multi-classifier to predict potential header types and relations, and then integrates all predicted header types and relations into consistent semantic table structures. We further propose TasiError, which detects spreadsheet errors based on the semantic table structures identified by Tasi. Our experiments on real-world spreadsheets show that Tasi can precisely identify semantic table structures in spreadsheets, and that TasiError can detect real-world spreadsheet errors with higher precision (75.2%) and recall (82.9%) than existing approaches.
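As a rough illustration of why an identified header relation makes error detection possible, the sketch below assumes a simplified table in which one header has already been labeled as an aggregation of its sibling headers (e.g., a "Total" column summing two quarterly columns). The `Header` class, its `kind` and `members` fields, and `find_aggregation_errors` are hypothetical names chosen for this example. This is not Tasi's multi-classifier or TasiError's actual detection logic, only a minimal consistency check under those assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    """Hypothetical header node: a label plus an assumed type ('data' or 'aggregation')."""
    label: str
    kind: str = "data"
    # For an aggregation header: labels of the headers it is assumed to sum over.
    members: list = field(default_factory=list)


def find_aggregation_errors(headers, rows, tol=1e-9):
    """Return (row_index, label, expected, actual) for every aggregation cell
    whose value differs from the sum of its member cells by more than tol.

    headers: list of Header objects describing the table's columns.
    rows: list of dicts mapping a header label to a numeric cell value.
    """
    errors = []
    for i, row in enumerate(rows):
        for h in headers:
            if h.kind != "aggregation":
                continue  # only aggregation headers carry a checkable relation
            expected = sum(row[m] for m in h.members)
            actual = row[h.label]
            if abs(expected - actual) > tol:
                errors.append((i, h.label, expected, actual))
    return errors


headers = [
    Header("Q1"),
    Header("Q2"),
    Header("Total", kind="aggregation", members=["Q1", "Q2"]),
]
rows = [
    {"Q1": 10, "Q2": 5, "Total": 15},
    {"Q1": 7, "Q2": 3, "Total": 11},  # Total disagrees with Q1 + Q2
]
print(find_aggregation_errors(headers, rows))  # [(1, 'Total', 10, 11)]
```

Without the header relation (that "Total" aggregates "Q1" and "Q2"), a tool would have no basis for flagging the second row; this is the kind of knowledge that semantic table structures are meant to supply.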