Understanding and Inferring Units in Spreadsheets

IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)

Published by IEEE


Numbers in spreadsheets often have units: metres, grams, dollars, etc. Spreadsheet cells typically cannot carry unit information, and even where they can, users may not be motivated to provide it. However, unit information is extremely valuable: it allows us to detect and prevent an entire class of spreadsheet errors, such as accidentally adding values of different units. What if we could infer the unit of any value in a spreadsheet, with little or no work from the user?
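The sketch below makes this class of error concrete. It assumes, purely for illustration, that a unit is a map from base-unit names to integer exponents. The paper does not prescribe this representation. Addition requires both operands to share a unit, while multiplication and division combine exponents:

```python
from collections import Counter
from typing import Dict

# Hypothetical representation: a unit maps base units to exponents,
# e.g. {"m": 1, "s": -1} means metres per second.
Unit = Dict[str, int]


def unit(**exponents: int) -> Unit:
    """Build a unit, dropping bases whose exponent is zero."""
    return {base: exp for base, exp in exponents.items() if exp != 0}


def add_units(a: Unit, b: Unit) -> Unit:
    """Addition/subtraction is only meaningful when both units match."""
    if a != b:
        raise ValueError(f"unit mismatch: {a} + {b}")
    return a


def mul_units(a: Unit, b: Unit) -> Unit:
    """Multiplication adds exponents; bases that cancel out are removed."""
    total = Counter(a)
    total.update(b)  # Counter.update adds counts and keeps negative values
    return {base: exp for base, exp in total.items() if exp != 0}


def div_units(a: Unit, b: Unit) -> Unit:
    """Division multiplies by the inverse unit (negated exponents)."""
    return mul_units(a, {base: -exp for base, exp in b.items()})
```

Under this model, checking a formula such as `=A1+B1`, where A1 holds metres and B1 holds grams, raises the mismatch error. Dividing metres by seconds yields `{"m": 1, "s": -1}`.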

We present a novel method for predicting units and dimensions in spreadsheets, the first such method that combines logical constraint solving and probabilistic unit labelling. Our approach identifies and formalises the critical cells in spreadsheets that bound the user cost of unit annotation. Separately, we apply machine learning to infer probabilistic unit labels from table headers. To contextualise the accuracy of our system, we discuss the attention investment trade-off for unit inference.
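The abstract does not detail the algorithm, so the following is a hypothetical, much-simplified sketch and not the authors' method. It shows how logical constraints and probabilistic labels can interact. Cells that a formula forces to share a unit (for example, the operands and result of `=B2+B3`) are merged into groups with union-find. Each group then takes the unit with the highest joint log-probability under per-cell label distributions. Those distributions are hand-written here, standing in for a learned header-based labeller. In this toy model, a group with no label evidence is where a user annotation would be needed:

```python
import math
from typing import Dict, List, Optional, Tuple


def find(parent: Dict[str, str], x: str) -> str:
    """Return the representative of x's group, halving paths as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_cells(cells: List[str],
                same_unit: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Union cells that formulas constrain to share a unit."""
    parent = {c: c for c in cells}
    for a, b in same_unit:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    groups: Dict[str, List[str]] = {}
    for c in cells:
        groups.setdefault(find(parent, c), []).append(c)
    return groups


def infer_units(cells: List[str],
                same_unit: List[Tuple[str, str]],
                label_probs: Dict[str, Dict[str, float]]) -> Dict[str, Optional[str]]:
    """Assign each group the unit maximising the summed log-probability.

    label_probs[cell] maps candidate units to probabilities (hypothetically
    derived from table headers). Groups without any evidence get None.
    """
    result: Dict[str, Optional[str]] = {}
    for members in group_cells(cells, same_unit).values():
        candidates = set()
        for c in members:
            candidates |= set(label_probs.get(c, {}))

        def score(u: str) -> float:
            # Small floor so a unit a cell never proposed is penalised, not impossible.
            return sum(math.log(label_probs[c].get(u, 1e-6))
                       for c in members if c in label_probs)

        best = max(sorted(candidates), key=score) if candidates else None
        for c in members:
            result[c] = best
    return result
```

For example, if B4 contains `=B2+B3`, B2's header suggests kilograms (0.7) or grams (0.3), and B3's suggests kilograms (0.6) or pounds (0.4), then all three cells are labelled `kg`. A cell C2 with no header evidence remains `None` and would need annotating.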