Clone Detection via Structural Abstraction
- William S. Evans
- Christopher W. Fraser
MSR-TR-2005-104
This paper describes the design, implementation, and application of a new algorithm to detect cloned code. It operates on the abstract syntax trees formed by many compilers as an intermediate representation. It extends prior work by identifying clones even when arbitrary subtrees have been changed. On a 16,000-line code corpus, 20-50% of its clones eluded previous methods. The method also identifies cloning in declarations, so it is somewhat more general than conventional procedural abstraction.
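The idea of matching clones "even when arbitrary subtrees have been changed" can be illustrated with a small sketch. The code below is not the report's algorithm; it is a minimal anti-unification example, assuming syntax trees are encoded as nested tuples of the form `(label, child, ...)` with strings as leaves. The names `anti_unify`, `size`, `holes`, and `HOLE` are hypothetical and chosen only for illustration. Two trees are generalized into one shared pattern in which every differing subtree is replaced by a hole.

```python
# Illustrative sketch only: trees are nested tuples (label, child, ...),
# leaves are plain strings. This encoding is an assumption, not the
# report's representation.
HOLE = "?"  # placeholder for a subtree that differs between the two trees


def anti_unify(a, b):
    """Return the most specific pattern that matches both trees.

    Subtrees that differ in label or arity are replaced by HOLE.
    """
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        # Same operator and same number of children: generalize child by child.
        return (a[0],) + tuple(anti_unify(x, y) for x, y in zip(a[1:], b[1:]))
    if a == b:
        return a  # identical leaves (or identical subtrees)
    return HOLE


def size(pattern):
    """Count the non-hole nodes shared by both trees."""
    if pattern == HOLE:
        return 0
    if isinstance(pattern, tuple):
        return 1 + sum(size(child) for child in pattern[1:])
    return 1


def holes(pattern):
    """Count the holes, i.e. the parameters an abstracted procedure would need."""
    if pattern == HOLE:
        return 1
    if isinstance(pattern, tuple):
        return sum(holes(child) for child in pattern[1:])
    return 0


# x = a * b + 1   versus   x = f(y, z) + 1
first = ("assign", "x", ("add", ("mul", "a", "b"), "1"))
second = ("assign", "x", ("add", ("call", "f", "y", "z"), "1"))

pattern = anti_unify(first, second)
print(pattern)         # ('assign', 'x', ('add', '?', '1'))
print(size(pattern))   # 4
print(holes(pattern))  # 1
```

In this toy setting, a pair of trees would count as a clone candidate when the shared pattern is large and the number of holes is small. Any such thresholds are assumptions made for this sketch and are not taken from the report.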