Abstract

Large organizations like Microsoft tend to rely on formal requirements documentation in order to specify and design the software products that they develop. These documents are meant to be tightly coupled with the actual implementation of the features they describe. In this paper we evaluate the value of high-level topic-based requirements traceability and issue report traceability in the version control system, using Latent Dirichlet Allocation (LDA). We evaluate LDA topics on practitioners and check if the topics and trends extracted match the perception that industrial Program Managers and Developers have about the effort put into addressing certain topics. We then replicate this study again on Open Source Developers using issue reports from issue trackers instead of requirements, confirming our previous industrial conclusions. We found that efforts extracted as commits from version control systems relevant to a topic often matched the perception of the managers and developers of what actually occurred at that time. Furthermore we found evidence that many of the identified topics made sense to practitioners and matched their perception of what occurred. But for some topics, we found that practitioners had difficulty interpreting and labelling them. In summary, we investigate the high-level traceability of requirements topics and issue/bug report topics to version control commits via topic analysis and validate with the actual stakeholders the relevance of these topics extracted from requirements and issues.