Automated Refactoring of Nested-IF Formulae in Spreadsheets

Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (FSE'18 industrial track) |

DOI

Spreadsheets are the most popular end-user programming software, where formulae act like programs and also have smells. One well recognized smell is the use of nested-IF expressions, which have low readability and high cognitive cost for users, and are error-prone during reuse or maintenance. End users usually lack essential programming language knowledge and skills to tackle or even realize this problem, yet no automatic approaches are currently available.

This paper proposes the rst exploration of the nest-if usage status against two large-scale spreadsheet corpora containing over 80,000 industry-level spreadsheets. It turns out the use of nested-IF expressions are surprisingly common among end users. We then present an approach to tackling this problem through automatic formula refactoring. The general idea of the automatic approach is two-fold. First, we detect and remove logic redundancy based on the AST of a formula. Second, we identify higher-level semantics that have been represented with fragmented and scattered syntax, and reassemble the syntax using concise built-in functions. A comprehensive evaluation with over 28 million nested-IF formulae reveals that the approach is able to relieve the smell of over 90% of nested-IF formulae.