Collecting a Heap of Shapes

Proceedings of the International Symposium on Software Testing and Analysis, 2013

Published by ACM

Distinguished Paper Award

The program heap is fundamentally a simple mathematical concept – a set of objects and a connectivity relation on them. However, a large gap exists between the set of heap structures that could be constructed and those that programmers actually build. To understand this gap, we empirically study heap structures and sharing relations in large object-oriented programs. To scale and make sense of real-world heaps, any analysis must employ abstraction; our abstraction groups sets of objects by role and the aliasing present in pointer sets. We find that the heaps of real-world programs are, in practice, fundamentally simple structures that are largely constructed from a small number of simple structures and sharing idioms, such as the sharing of immutable or unique (e.g., singleton) objects. For instance, we find that, under our abstraction, 53-75% of pointers build tree structures and we classify all but 7-18% of aliasing pointers. These results provide actionable information for rethinking the design of annotation systems, memory allocation/collection, and program analyses.
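
As a rough illustration of the heap-as-graph view described above, the following Python sketch models a heap as a list of pointer edges and counts how many pointers target an object that nothing else points to. It is a simplification on a concrete toy heap and is not the paper's abstraction, which groups objects by role; the function name `tree_pointer_fraction` and the example object names are hypothetical. An in-degree of one is used here only as a proxy for "unshared", and it does not by itself rule out cycles.

```python
from collections import Counter


def tree_pointer_fraction(edges):
    """Return the fraction of pointers whose target has exactly one incoming pointer.

    `edges` is a list of (source, target) pairs, one per pointer.
    This is an illustrative proxy for "tree-like" pointers, not the
    role-based abstraction used in the paper.
    """
    if not edges:
        return 0.0
    # Count how many pointers refer to each object.
    in_degree = Counter(target for _, target in edges)
    # A pointer is counted as unshared when its target has no other referrer.
    unshared = sum(1 for _, target in edges if in_degree[target] == 1)
    return unshared / len(edges)


# Hypothetical toy heap: two unshared children and one shared object.
edges = [
    ("root", "a"),
    ("root", "b"),
    ("a", "config"),  # "config" is aliased: both "a" and "b" point to it
    ("b", "config"),
]
fraction = tree_pointer_fraction(edges)
```

In this toy heap, the two pointers into `config` are the aliasing pointers; a fuller analysis would go on to classify why they are shared, for example because the target is immutable or a singleton.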