Expandable Group Identification in Spreadsheets

Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE'18)

Spreadsheets are widely used in various business tasks. Spreadsheet users often store similar data and computations by repeating a block of cells (a unit) in their spreadsheets. We call a unit together with all of its repetitions an expandable group. All units in an expandable group share the same or similar formats and semantics. Because spreadsheets serve as data storage and management tools, expandable groups represent their fundamental structure. However, existing spreadsheet systems do not recognize expandable groups. Therefore, other spreadsheet analysis tools, e.g., for data integration and fault detection, cannot use this structure to perform precise analysis.

In this paper, we propose ExpCheck to automatically extract expandable groups in spreadsheets. We observe that contiguous units that share similar formats and semantics are likely to form an expandable group. Inspired by this, we inspect the format of each cell and its corresponding semantics, and then classify cells into expandable groups according to their similarity. We evaluate ExpCheck on 120 spreadsheets randomly sampled from the EUSES and VEnron corpora. The experimental results show that ExpCheck is effective: it detects expandable groups with an F1-measure of 73.1%, significantly outperforming the state-of-the-art techniques (F1-measure of 13.3%).
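The observation above, that adjacent units with similar formats tend to form one group, can be illustrated with a minimal sketch. This is not ExpCheck's algorithm, which the abstract does not specify. It assumes that a unit is a single row, that "format" is only a coarse cell kind (text, number, formula, empty), and that similarity means an exact match of row signatures. The names `cell_signature`, `row_signature`, and `find_candidate_groups` are hypothetical.

```python
from itertools import groupby


def cell_signature(cell):
    """Reduce a cell value to a coarse kind (an assumed stand-in for 'format')."""
    if cell is None or cell == "":
        return "empty"
    if isinstance(cell, str) and cell.startswith("="):
        return "formula"
    if isinstance(cell, (int, float)):
        return "number"
    return "text"


def row_signature(row):
    """Signature of a row: the tuple of its cell kinds, in column order."""
    return tuple(cell_signature(cell) for cell in row)


def find_candidate_groups(rows, min_units=2):
    """Return (first_row, last_row) index pairs, 0-based and inclusive, for runs
    of at least `min_units` adjacent rows that share the same signature.

    Assumption: one row is one unit; runs of entirely empty rows are ignored.
    """
    groups = []
    index = 0
    for signature, run in groupby(rows, key=row_signature):
        length = len(list(run))
        has_content = any(kind != "empty" for kind in signature)
        if length >= min_units and has_content:
            groups.append((index, index + length - 1))
        index += length
    return groups


# Illustrative sheet: a header, three repeated item rows, and a summary row.
sheet = [
    ["Item", "Qty", "Price", "Total"],
    ["Pen", 3, 1.5, "=B2*C2"],
    ["Ink", 2, 4.0, "=B3*C3"],
    ["Pad", 5, 2.0, "=B4*C4"],
    ["Sum", None, None, "=SUM(D2:D4)"],
]
print(find_candidate_groups(sheet))  # [(1, 3)]
```

In this toy sheet only the three item rows share a signature, so they are reported as one candidate group. A realistic detector would also need to handle multi-row or column-wise units, approximate rather than exact similarity, and the cell semantics that the abstract mentions.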