Empirical Software Engineering, Version 2.0

  • Tim Menzies | West Virginia University

The rapid pace of software development innovation challenges empirical software research to keep up if it is to deliver actionable and useful results to practitioners. The empirical software engineering research field has not always been able to deliver this. Recently, it has become increasingly apparent that rigorous data collection and analysis can be so expensive and time-consuming that empirical software engineering studies, which seek to understand the costs and benefits of software development solutions in practice, greatly lag the pace of innovation in the field. In too many cases, a trusted body of empirical results can only be built up after the innovative solutions under study are already well on their way to obsolescence or to becoming standard practice. However, we argue that recent advances put a sustainable and increased research pace within our reach. A suitably scaled-up and nimble empirical research approach must be based upon:

  • The “crowdsourcing” of tough empirical problems. Ben Shneiderman advocates Science 2.0: a vast space of web-based data which everyone can analyze, and where anyone might find important new insights. “The growth of the World Wide Web … continues to reorder whole disciplines and industries. … It is time for researchers in science to take network collaboration to the next phase and reap the potential intellectual and societal payoffs.” [1] In Science 2.0, the pace of discovery and communication is increased by orders of magnitude over current practice [2]. A Science 2.0 approach to empirical software engineering addresses fundamental weaknesses in contemporary software engineering research.

  • Automated or computer-assisted approaches to data synthesis, analysis, and interpretation.
  • The ability to connect technical issues, data, and results back to the business drivers that affect an organization’s resource availability.
  • Low-cost, non-intrusive ways for:
      ◦ Getting results to practitioners;
      ◦ Allowing practitioners to comment upon and refine the results;
      ◦ Suggesting what practitioners should do with this information.

This talk discusses each of these four areas and the technologies that make each possible, using real results from practice to illustrate the points. We further suggest how these approaches can be used to better share and leverage results across the community of empirical researchers; such sharing is necessary if we are to scale up to the tougher questions already appearing on the horizon.

Speaker Details

The presenter has direct experience in running large SE data repositories (PROMISE [3]). This technical briefing will extrapolate from that experience to discuss the costs and benefits of Science 2.0 for SE. Tim Menzies (PhD, UNSW) is an Associate Professor in CSEE at WVU and the author of over 200 refereed publications. At WVU, he has been a lead researcher on projects for NSF, NIJ, DoD, and NASA’s Office of Safety and Mission Assurance, as well as on SBIRs and STTRs with private companies. He teaches data mining and artificial intelligence. Tim is the co-founder of the PROMISE conference series devoted to reproducible experiments in software engineering. In 2012, he will be the program committee co-chair of the IEEE Automated Software Engineering conference.

REFERENCES:

  1. B. Shneiderman. Science 2.0. Science, 319(5868):1349–1350, March 2008.
  2. A. Zeller. Keynote address, MSR’07. http://msr.uwaterloo.ca/msr2007/Empirical-SE-2.0-Zeller.pdf
  3. The PROMISE data repository: http://promisedata.org/data