In this talk I will cover the results of my recent PhD work on Software Process Recovery: the recovery of software development processes from evidence that developers leave behind. Towards the end of the talk I will show explicitly how Software Process Recovery could be leveraged and applied by Microsoft and other companies.
Developing a large software system involves many complicated, varied, and inter-dependent tasks, and these tasks are typically implemented using a combination of defined processes, semi-automated tools, and ad hoc practices. Stakeholders in the development process — including software developers, managers, and customers — often want to be able to track the actual practices being employed within a project. For example, a customer may wish to be sure that the process is ISO 9000 compliant, a manager may wish to track the amount of testing that has been done in the current iteration, and a developer may wish to determine who has recently been working on a subsystem that has had several major bugs appear in it.
However, extracting the software development processes from an existing project is expensive if one must rely upon manual inspection of artifacts and interviews of developers and their managers. Previously, researchers have suggested the live observation and instrumentation of a project to allow for more measurement, but this is costly, invasive, and also requires a live running project.
In this work, we propose an approach that we call software process recovery, that is based on after-the-fact analysis of various kinds of software development artifacts. We use a variety of supervised and unsupervised techniques from machine learning, topic analysis, natural language processing, and statistics on software repositories such as version control systems, bug trackers, and mailing list archives. We show how we can combine all of these methods to recover process signals that we map back to software development processes such as the Unified Process. The Unified Process has been visualized using a time-line view that shows effort per parallel discipline occurring across time. This visualization is called the Unified Process diagram. We use this diagram as inspiration to produce Recovered Unified Process Views (RUPV) that are a concrete version of this theoretical Unified Process diagram. We then validate these methods using case studies of multiple open source software systems.