{"id":199828,"date":"2012-10-27T10:15:47","date_gmt":"2012-10-27T10:15:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/events\/dapse13-international-workshop-on-data-analysis-patterns-in-software-engineering\/"},"modified":"2025-08-06T12:02:12","modified_gmt":"2025-08-06T19:02:12","slug":"dapse13","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/dapse13\/","title":{"rendered":"DAPSE\u201913: International Workshop on Data Analysis Patterns in Software Engineering"},"content":{"rendered":"\n\n<p>Tuesday, May 21, 2013<br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/www.sanfranciscoregency.hyatt.com\/hyatt\/hotels\/index.jsp\">Hyatt Regency San Francisco<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n5 Embarcadero Center (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/binged.it\/yXP3z7\">Map<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>)<br \/>\nSan Francisco, California, USA 94111<\/p>\n<p>Workshop in conjunction with the\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/2013.icse-conferences.org\/\" target=\"_blank\">ICSE 2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0conference.<\/p>\n<h2>Important Dates<\/h2>\n<p>Workshop paper submissions due<br \/>\nFebruary 7, 2013 (archival papers)<\/p>\n<p>Notification of authors<br \/>\nFebruary 28, 2013<\/p>\n<p>Camera-ready copies<br \/>\nMarch 7, 2013<\/p>\n<p>Non-archival submissions accepted until<br \/>\nApril 24, 2013<\/p>\n<h2>Submission Site<\/h2>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" 
href=\"https:\/\/www.easychair.org\/conferences\/?conf=dapse2013\">https:\/\/www.easychair.org\/conferences\/?conf=dapse2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<div class=\"conM \">\n<p>Data scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists.<\/p>\n<p>Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in form of data analysis patterns, that is, analyses of data that leads to meaningful conclusions and can be reused for comparable data. In the workshop we will compile a catalog of such patterns that will help both experienced and emerging data scientists to better communicate about data analysis. The workshop is intended for anyone interested in how to analyze data correctly and efficiently in a community accepted way.<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Workshop Program<\/h2>\n<p>8:30 &#8211; 9:00 Introductions and discussions of plans and goals from the chairs.<\/p>\n<p>9:00 &#8211; 10:00 lightning talks<\/p>\n<p>(5 min presentation with 2 minutes for questions)<\/p>\n<ul>\n<li>\n<div>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/div>\n<\/li>\n<li>\n<div>Emanuel Giger and Harald Gall. Effect Size Analysis<\/div>\n<\/li>\n<li>\n<div>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/div>\n<\/li>\n<li>\n<div>David Weiss and Audris Mockus. The Chunking Pattern<\/div>\n<\/li>\n<li>\n<div>Barbara Russo. 
Parametric Classi\ufb01cation over Multiple Samples<\/div>\n<\/li>\n<li>\n<div>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/div>\n<\/li>\n<li>\n<div>Barbara Russo and Maximilian Steff. Commit Histories<\/div>\n<\/li>\n<li>\n<div>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/div>\n<\/li>\n<\/ul>\n<p dir=\"ltr\">10:00 &#8211; 10:30 Break<\/p>\n<p dir=\"ltr\">10:30 &#8211; 11:15 lightning talks 2<\/p>\n<ul>\n<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. Building Statistical Language Models of Code<\/li>\n<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\n<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\n<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. Patterns for Extracting High Level Information from Bug Reports<\/li>\n<li>Burak Turhan. Relevancy Filtering<\/li>\n<\/ul>\n<p>11:15 &#8211; 12:00<\/p>\n<p>Discussion on what makes a good data analysis pattern.<\/p>\n<p>12:00 &#8211; 13:30 Lunch<\/p>\n<p>13:30 &#8211; 14:45 Breakout discussion groups<\/p>\n<p>14:45 &#8211; 15:30 Breakout groups present<\/p>\n<p>15:30 &#8211; 16:00 Break<\/p>\n<p>16:00 &#8211; 17:00 Workshop Discussion.<\/p>\n<p>Potential topics include:<\/p>\n<ul>\n<li>How do we &#8220;evangelize&#8221; patterns?<\/li>\n<li>How can we make patterns reusable?<\/li>\n<li>What needs exist for data analysis patterns?<\/li>\n<li>What are common data analysis mistakes and how can we or patterns help others avoid them.<\/li>\n<li>What is the right way to catalog the patterns?<\/li>\n<li>Where should data analysis patterns live? 
Should there be a web resource where people post info on patterns?<\/li>\n<li>Additional topics solicited from attendees.<\/li>\n<\/ul>\n<p>17:00 Wrap up. Discussion of future events.<\/p>\n<p>17:30 End<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Accepted Papers<\/h2>\n<ul>\n<li>Emanuel Giger and Harald Gall. Effect Size Analysis<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/li>\n<li>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/li>\n<li>David Weiss and Audris Mockus. The Chunking Pattern<\/li>\n<li>Barbara Russo. Parametric Classi\ufb01cation over Multiple Samples<\/li>\n<li>Barbara Russo and Maximilian Steff. Commit Histories<\/li>\n<li>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/li>\n<li>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. Patterns for Extracting High Level Information from Bug Reports<\/li>\n<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. Building Statistical Language Models of Code<\/li>\n<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\n<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\n<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\n<\/ul>\n<\/div>\n<div class=\"conM \">\n<h2>Non-Archival Accepted Papers<\/h2>\n<ul>\n<li>Burak Turhan. Relevancy Filtering.<\/li>\n<\/ul>\n<\/div>\n<div class=\"conM \">\n<h2>Submissions<\/h2>\n<p>We solicit papers (2-3 pages) describing one or more data analysis pattern. 
Authors should use the form that is most suited to describe the pattern. Where possible, we encourage authors to describe each pattern as follows:<\/p>\n<ul>\n<li>Pattern name: a handle for the pattern<\/li>\n<li>Problem: when to apply the pattern<\/li>\n<li>Solution: how to apply the pattern<\/li>\n<li>Consequence: results and trade-offs of applying the pattern, common mistakes in applying the pattern to be avoided, etc.<\/li>\n<li>Examples: briefly summarize and\/or cite example applications of the pattern in the literature; if possible, include R snippets or Weka code to apply the pattern, etc.<\/li>\n<\/ul>\n<p>There are two options for submitting a proposal.<\/p>\n<ul>\n<li><strong>Archival Papers:<\/strong> Submit the pattern by February 7, 2013. If accepted,\u00a0it\u00a0will be published in the workshop proceedings and the ACM and IEEE Digital Libraries.<\/li>\n<li><strong>Non-Archival Papers:<\/strong> Submit the paper by April 24, 2013; we will\u00a0send the notification within two weeks.\u00a0If accepted, the paper will be published on the workshop web pages only; non-archival papers will not be published in the workshop proceedings or the ACM and IEEE Digital Libraries.<\/li>\n<\/ul>\n<p>Both archival and non-archival papers will be reviewed by a program committee and accepted based on the clarity of the description and how broadly their proposed pattern might be applicable. Prior application of the pattern by the authors is not a requirement. This workshop is more interested in the mechanics and choice of the data analysis than in the impact of published results.<\/p>\n<p>Upon notification of acceptance, all authors of accepted archival papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera-ready versions. 
At least one author of each paper is expected to present the paper at the workshop.<\/p>\n<p>All submitted papers must conform to the<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/2013.icse-conferences.org\/content\/submission-guidelines\"> ICSE 2013 formatting and submission instructions<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0and must not exceed the page limits mentioned above, including figures and references. All submissions must be in English. Papers must be submitted electronically, in PDF format, using the submission site hosted by EasyChair:<br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/www.easychair.org\/conferences\/?conf=dapse2013\">https:\/\/www.easychair.org\/conferences\/?conf=dapse2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><em>It is the desire of the organizers that discussion of research at the workshop does not preclude publication of closely related material at conferences or journals. Authors of accepted papers will be able to choose whether to include their papers in the workshop proceedings.<\/em><\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Format<\/h2>\n<p>The workshop will consist of the following sessions:<\/p>\n<ul>\n<li><em>Lightning session. <\/em>Authors of accepted papers will give a lightning talk in the morning to present their proposed pattern (about 5-10 minutes depending on the number of accepted papers).<\/li>\n<li><em>Discussion session.<\/em> This session has two goals: (1) Group the patterns into pattern types. (2) Refine the pattern groups and the interactions between patterns. 
For example, we expect that some patterns could be composed into more powerful patterns while other patterns could be split into smaller patterns.<\/li>\n<li><em>Breakout session.<\/em> For the next session, participants will break out into groups and try to use the data analysis patterns to solve several data science tasks provided by the workshop organizers. The tasks will come from both academic research and industry. The goal of this session is to assess the usefulness as well as the completeness of the patterns identified. We expect that patterns will be refined and new patterns will be discovered. At the end of the session, each group will present its findings in a 5-minute blitz presentation.<\/li>\n<\/ul>\n<p>Before the workshop there will be a blog to promote and discuss accepted patterns.<\/p>\n<p>After the workshop there will be a Dagstuhl seminar on software development analytics building on the outcomes of this workshop, to which selected authors will be invited.\u00a0Furthermore, the organizers plan to edit a book on \u201cData Science for Software Engineers\u201d with a collection of data analysis patterns. Selected authors from the workshop will be invited to contribute chapters to this book.<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Example of a Pattern<\/h2>\n<p>For illustrative purposes, here\u2019s an example pattern in short and simplified form. For the workshop, we expect the discussion to be more comprehensive. We do welcome both simple and complex analysis patterns.<\/p>\n<table class=\" tWiz tableBorder\">\n<tbody>\n<tr>\n<td><b>Pattern name:<\/b> Contrast<\/p>\n<p><b>Problem:<\/b><br \/>\nDetermine if there is a difference in one or more <i>properties<\/i> between <i>two<\/i> <i>populations.<\/i><\/p>\n<p><b>Solution: <\/b><br \/>\n1. Apply a hypothesis test (Student's t-test for parametric data, Mann-Whitney test for non-parametric data) to check if the property is statistically different between populations.<\/p>\n<p>2. 
Determine the magnitude of the difference, either through visualization (e.g., boxplot) or, when appropriate, through the mean or median.<\/p>\n<p><b>Discussion: <\/b><br \/>\nEither step without the other can be misleading. For large populations, tiny differences might be statistically significant. In contrast, for small populations, large differences might not be statistically significant.<\/p>\n<p>Choosing the wrong hypothesis test is a common mistake.<\/p>\n<p><b>Examples:<\/b><br \/>\nAt ICSE 2009, Bird et al. used a Mann-Whitney test to compare the defect proneness (=the property) between distributed and co-located binaries (=two populations). See Figure 5 in their paper for a sample visualization of the differences between the two populations.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<h2>Workshop Organizers<\/h2>\n<p>Christian Bird<br \/>\nMicrosoft Research, USA<\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/menzies.us\/\">Tim Menzies<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nWest Virginia University, USA<\/p>\n<p>Thomas Zimmermann\u00a0(contact)<br \/>\nMicrosoft Research, USA<\/p>\n<h2>Program Committee<\/h2>\n<p>Lionel Briand, University of Luxembourg, Luxembourg<\/p>\n<p>Yuanfang Cai, Drexel University, USA<\/p>\n<p>Prem Devanbu, University of California at Davis, USA<\/p>\n<p>Massimiliano Di Penta, University of Sannio, Italy<\/p>\n<p>Harald Gall, University of Zurich, Switzerland<\/p>\n<p>Michael Godfrey, University of Waterloo, Canada<\/p>\n<p>Tracy Hall, Brunel University, UK<\/p>\n<p>Shi Han, Microsoft Research, China<\/p>\n<p>Ahmed Hassan, Queen&#8217;s University, Canada<\/p>\n<p>Abram Hindle, University of Alberta, Canada<\/p>\n<p>Sung Kim, Hong Kong University of Science and Technology, 
China<\/p>\n<p>Michele Lanza, University of Lugano, Switzerland<\/p>\n<p>Audris Mockus, Avaya Labs Research, USA<\/p>\n<p>Emerson Murphy-Hill, North Carolina State University, USA<\/p>\n<p>Venkatesh-Prasad Ranganath, Microsoft Research, India<\/p>\n<p>Romain Robbes, University in Chile, Chile<\/p>\n<p>Pete Rotella, Cisco Systems, USA<\/p>\n<p>Anita Sarma, University of Nebraska, USA<\/p>\n<p>Carolyn Seaman, University of Maryland, USA<\/p>\n<p>Martin Shepperd, Brunel University, UK<\/p>\n<p>Burak Turhan, University of Oulu, Finland<\/p>\n<p>Stefan Wagner, University of Stuttgart, Germany<\/p>\n<p>Patrick Wagstrom, IBM, USA<\/p>\n<p>Laurie Williams, North Carolina State University, USA<\/p>\n<p>Ye Yang, Chinese Academy of Sciences, China<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Tuesday, May 21, 2013 Hyatt Regency San Francisco (opens in new tab) 5 Embarcadero Center (Map (opens in new tab)) San Francisco, California, USA 94111 Workshop in conjunction with the\u00a0ICSE 2013 (opens in new tab)\u00a0conference. 
Important Dates Workshop paper submissions due February 7, 2013 (archival papers) Notification of authors February 28, 2013 Camera-ready copies March [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2013-05-21","msr_enddate":"2013-05-21","msr_location":"San Francisco, CA, USA","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"","msr_hide_region":false,"msr_private_event":true,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[13560],"msr-region":[],"msr-event-type":[],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-199828","msr-event","type-msr-event","status-publish","hentry","msr-research-area-programming-languages-software-engineering","msr-locale-en_us"],"msr_about":"<!-- wp:msr\/event-details {\"title\":\"DAPSE\u201913: International Workshop on Data Analysis Patterns in Software Engineering\",\"backgroundColor\":\"grey\"} \/-->\n\n<!-- wp:msr\/content-tabs --><!-- wp:msr\/content-tab {\"title\":\"About\"} --><!-- wp:freeform --><p>Tuesday, May 21, 2013<br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/www.sanfranciscoregency.hyatt.com\/hyatt\/hotels\/index.jsp\">Hyatt Regency San Francisco<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\n5 Embarcadero Center (<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/binged.it\/yXP3z7\">Map<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>)<br \/>\nSan Francisco, California, USA 94111<\/p>\n<p>Workshop in conjunction with the\u00a0<a class=\"msr-external-link glyph-append 
glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/2013.icse-conferences.org\/\" target=\"_blank\">ICSE 2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0conference.<\/p>\n<h2>Important Dates<\/h2>\n<p>Workshop paper submissions due<br \/>\nFebruary 7, 2013 (archival papers)<\/p>\n<p>Notification of authors<br \/>\nFebruary 28, 2013<\/p>\n<p>Camera-ready copies<br \/>\nMarch 7, 2013<\/p>\n<p>Non-archival submissions accepted until<br \/>\nApril 24, 2013<\/p>\n<h2>Submission Site<\/h2>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/www.easychair.org\/conferences\/?conf=dapse2013\">https:\/\/www.easychair.org\/conferences\/?conf=dapse2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<div class=\"conM \">\n<p>Data scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists.<\/p>\n<p>Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in form of data analysis patterns, that is, analyses of data that leads to meaningful conclusions and can be reused for comparable data. In the workshop we will compile a catalog of such patterns that will help both experienced and emerging data scientists to better communicate about data analysis. 
The workshop is intended for anyone interested in how to analyze data correctly and efficiently in a community accepted way.<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Workshop Program<\/h2>\n<p>8:30 &#8211; 9:00 Introductions and discussions of plans and goals from the chairs.<\/p>\n<p>9:00 &#8211; 10:00 lightning talks<\/p>\n<p>(5 min presentation with 2 minutes for questions)<\/p>\n<ul>\n<li>\n<div>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/div>\n<\/li>\n<li>\n<div>Emanuel Giger and Harald Gall. Effect Size Analysis<\/div>\n<\/li>\n<li>\n<div>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/div>\n<\/li>\n<li>\n<div>David Weiss and Audris Mockus. The Chunking Pattern<\/div>\n<\/li>\n<li>\n<div>Barbara Russo. Parametric Classi\ufb01cation over Multiple Samples<\/div>\n<\/li>\n<li>\n<div>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/div>\n<\/li>\n<li>\n<div>Barbara Russo and Maximilian Steff. Commit Histories<\/div>\n<\/li>\n<li>\n<div>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/div>\n<\/li>\n<\/ul>\n<p dir=\"ltr\">10:00 &#8211; 10:30 Break<\/p>\n<p dir=\"ltr\">10:30 &#8211; 11:15 lightning talks 2<\/p>\n<ul>\n<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. Building Statistical Language Models of Code<\/li>\n<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\n<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\n<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. 
Patterns for Extracting High Level Information from Bug Reports<\/li>\n<li>Burak Turhan. Relevancy Filtering<\/li>\n<\/ul>\n<p>11:15 &#8211; 12:00<\/p>\n<p>Discussion on what makes a good data analysis pattern.<\/p>\n<p>12:00 &#8211; 13:30 Lunch<\/p>\n<p>13:30 &#8211; 14:45 Breakout discussion groups<\/p>\n<p>14:45 &#8211; 15:30 Breakout groups present<\/p>\n<p>15:30 &#8211; 16:00 Break<\/p>\n<p>16:00 &#8211; 17:00 Workshop Discussion.<\/p>\n<p>Potential topics include:<\/p>\n<ul>\n<li>How do we &#8220;evangelize&#8221; patterns?<\/li>\n<li>How can we make patterns reusable?<\/li>\n<li>What needs exist for data analysis patterns?<\/li>\n<li>What are common data analysis mistakes and how can we or patterns help others avoid them.<\/li>\n<li>What is the right way to catalog the patterns?<\/li>\n<li>Where should data analysis patterns live? Should there be a web resource where people post info on patterns?<\/li>\n<li>Additional topics solicited from attendees.<\/li>\n<\/ul>\n<p>17:00 Wrap up. Discussion of future events.<\/p>\n<p>17:30 End<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Accepted Papers<\/h2>\n<ul>\n<li>Emanuel Giger and Harald Gall. Effect Size Analysis<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/li>\n<li>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/li>\n<li>David Weiss and Audris Mockus. The Chunking Pattern<\/li>\n<li>Barbara Russo. Parametric Classi\ufb01cation over Multiple Samples<\/li>\n<li>Barbara Russo and Maximilian Steff. Commit Histories<\/li>\n<li>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/li>\n<li>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/li>\n<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. 
Patterns for Extracting High Level Information from Bug Reports<\/li>\n<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. Building Statistical Language Models of Code<\/li>\n<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\n<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\n<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\n<\/ul>\n<\/div>\n<div class=\"conM \">\n<h2>Non-Archival Accepted Papers<\/h2>\n<ul>\n<li>Burak Turhan. Relevancy Filtering.<\/li>\n<\/ul>\n<\/div>\n<div class=\"conM \">\n<h2>Submissions<\/h2>\n<p>We solicit papers (2-3 pages) describing one or more data analysis pattern. Authors should use the form that is most suited to describe the pattern. Where possible, we encourage authors to describe pattern as follows<\/p>\n<ul>\n<li>Pattern name: a handle for the pattern<\/li>\n<li>Problem: when to apply the pattern<\/li>\n<li>Solution: how to apply the pattern<\/li>\n<li>Consequence: results and trade-offs of applying the pattern, common mistakes in applying the pattern to be avoided, etc.<\/li>\n<li>Examples: brief summary and\/or cite example applications of the pattern in literature; if possible, R snippets or Weka code to apply the pattern, etc.<\/li>\n<\/ul>\n<p>There are two options for submitting a proposal.<\/p>\n<ul>\n<li><strong>Archival Papers:<\/strong> Submit the pattern by February 7, 2013. 
If accepted,\u00a0it\u00a0will be published in the workshop proceedings and the ACM and IEEE Digital Libraries.<\/li>\n<li><strong>Non-Archival Papers:<\/strong> Submit the paper by April 24, 2013; we will\u00a0send the notification within two weeks.\u00a0If accepted, the paper will be published on the workshop web-pages only; non-archival papers will not be published in the workshop proceedings and the ACM and IEEE Digital Libraries.<\/li>\n<\/ul>\n<p>Both archival and non-archival papers will be reviewed by a program committee and accepted based on the clarity of the description and how broadly their proposed pattern might be applicable. Prior application of the pattern by the authors is not a requirement. This workshop is more interested in the mechanics and choice of the data analysis than the impact of published results.<\/p>\n<p>Upon notification of acceptance, all authors of accepted archival papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to present the paper at the workshop.<\/p>\n<p>All submitted papers must conform to the<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/2013.icse-conferences.org\/content\/submission-guidelines\"> ICSE 2013 formatting and submission instructions<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0and must not exceed the page limits mentioned above, including figures and references. All submissions must be in English. 
Papers must be submitted electronically, in PDF format, using the submission site hosted by EasyChair:<br \/>\n<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"https:\/\/www.easychair.org\/conferences\/?conf=dapse2013\">https:\/\/www.easychair.org\/conferences\/?conf=dapse2013<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/p>\n<p><em>It is the desire of the organizers that discussion of research at the workshop does not preclude publication of closely related material at conferences or journals. Authors of accepted papers will be able to choose whether to include their papers in the workshop proceedings.<\/em><\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Format<\/h2>\n<p>The workshop will consist of the following sessions:<\/p>\n<ul>\n<li><em>Lightning session. <\/em>Authors of accepted papers will give a lightning talk in the morning to present their proposed pattern (about 5-10 minutes depending on the number of accepted papers).<\/li>\n<li><em>Discussion session.<\/em> This session has two goals: (1) Group the patterns into pattern types. (2) Refine the pattern groups and the interactions between patterns. For example, we expect that some patterns could be composed into more powerful patterns while other patterns could be split into smaller pattern.<\/li>\n<li><em>Breakout session.<\/em> For the next session, participants will break out into groups and try to use the data analysis patterns to solve several data science tasks provided by the workshop organizers. The tasks will come from academic research but also from industry. The goal of this session is to assess the usefulness as well as the completeness of the patterns identified. We expect that patterns will be refined and new patterns will be discovered. 
At the end of the session, each group will present its findings in a 5-minute blitz presentation.<\/li>\n<\/ul>\n<p>Before the workshop there will be a blog to promote and discuss accepted patterns.<\/p>\n<p>After the workshop there will be a Dagstuhl seminar on software development analytics building on the outcomes of this workshop, to which selected authors will be invited.\u00a0Furthermore, the organizers plan to edit a book on \u201cData Science for Software Engineers\u201d with a collection of data analysis patterns. Selected authors from the workshop will be invited to contribute chapters to this book.<\/p>\n<\/div>\n<div class=\"conM \">\n<h2>Example of a Pattern<\/h2>\n<p>For illustrative purposes, here\u2019s an example pattern in short and simplified form. For the workshop, we expect the discussion to be more comprehensive. We do welcome both simple and complex analysis patterns.<\/p>\n<table class=\" tWiz tableBorder\">\n<tbody>\n<tr>\n<td><b>Pattern name:<\/b> Contrast<\/p>\n<p><b>Problem:<\/b><br \/>\nDetermine if there is a difference in one or more <i>properties<\/i> between <i>two<\/i> <i>populations.<\/i><\/p>\n<p><b>Solution: <\/b><br \/>\n1. Apply a hypothesis test (Student's t-test for parametric data, Mann-Whitney test for non-parametric data) to check if the property is statistically different between populations.<\/p>\n<p>2. Determine the magnitude of the difference, either through visualization (e.g., boxplot) or, when appropriate, through the mean or median.<\/p>\n<p><b>Discussion: <\/b><br \/>\nEither step without the other can be misleading. For large populations, tiny differences might be statistically significant. In contrast, for small populations, large differences might not be statistically significant.<\/p>\n<p>Choosing the wrong hypothesis test is a common mistake.<\/p>\n<p><b>Examples:<\/b><br \/>\nAt ICSE 2009, Bird et al. 
used a Mann-Whitney test to compare the defect proneness (=the property) between distributed and co-located binaries (=two populations). See Figure 5 in their paper for a sample visualization of the differences between the two populations.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<\/div>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Organization\"} --><!-- wp:freeform --><h2>Workshop Organizers<\/h2>\n<p>Christian Bird<br \/>\nMicrosoft Research, USA<\/p>\n<p><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" target=\"_blank\" href=\"http:\/\/menzies.us\/\">Tim Menzies<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><br \/>\nWest Virginia University, USA<\/p>\n<p>Thomas Zimmermann\u00a0(contact)<br \/>\nMicrosoft Research, USA<\/p>\n<h2>Program Committee<\/h2>\n<p>Lionel Briand, University of Luxembourg, Luxembourg<\/p>\n<p>Yuanfang Cai, Drexel University, USA<\/p>\n<p>Prem Devanbu, University of California at Davis, USA<\/p>\n<p>Massimiliano Di Penta, University of Sannio, Italy<\/p>\n<p>Harald Gall, University of Zurich, Switzerland<\/p>\n<p>Michael Godfrey, University of Waterloo, Canada<\/p>\n<p>Tracy Hall, Brunel University, UK<\/p>\n<p>Shi Han, Microsoft Research, China<\/p>\n<p>Ahmed Hassan, Queen&#8217;s University, Canada<\/p>\n<p>Abram Hindle, University of Alberta, Canada<\/p>\n<p>Sung Kim, Hong Kong University of Science and Technology, China<\/p>\n<p>Michele Lanza, University of Lugano, Switzerland<\/p>\n<p>Audris Mockus, Avaya Labs Research, USA<\/p>\n<p>Emerson Murphy-Hill, North Carolina State University, USA<\/p>\n<p>Venkatesh-Prasad Ranganath, Microsoft Research, India<\/p>\n<p>Romain Robbes, University of Chile, Chile<\/p>\n<p>Pete Rotella, Cisco Systems, USA<\/p>\n<p>Anita Sarma, University of Nebraska, USA<\/p>\n<p>Carolyn Seaman, 
University of Maryland, USA<\/p>\n<p>Martin Shepperd, Brunel University, UK<\/p>\n<p>Burak Turhan, University of Oulu, Finland<\/p>\n<p>Stefan Wagner, University of Stuttgart, Germany<\/p>\n<p>Patrick Wagstrom, IBM, USA<\/p>\n<p>Laurie Williams, North Carolina State University, USA<\/p>\n<p>Ye Yang, Chinese Academy of Sciences, China<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- \/wp:msr\/content-tabs -->","tab-content":[{"id":0,"name":"About","content":"<div class=\"conM \">\r\n\r\nData scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists.\r\n\r\nData science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in form of data analysis patterns, that is, analyses of data that leads to meaningful conclusions and can be reused for comparable data. In the workshop we will compile a catalog of such patterns that will help both experienced and emerging data scientists to better communicate about data analysis. The workshop is intended for anyone interested in how to analyze data correctly and efficiently in a community accepted way.\r\n\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Workshop Program<\/h2>\r\n8:30 - 9:00 Introductions and discussions of plans and goals from the chairs.\r\n\r\n9:00 - 10:00 lightning talks\r\n\r\n(5 min presentation with 2 minutes for questions)\r\n<ul>\r\n \t<li>\r\n<div>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/div><\/li>\r\n \t<li>\r\n<div>Emanuel Giger and Harald Gall. 
Effect Size Analysis<\/div><\/li>\r\n \t<li>\r\n<div>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/div><\/li>\r\n \t<li>\r\n<div>David Weiss and Audris Mockus. The Chunking Pattern<\/div><\/li>\r\n \t<li>\r\n<div>Barbara Russo. Parametric Classi\ufb01cation over Multiple Samples<\/div><\/li>\r\n \t<li>\r\n<div>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/div><\/li>\r\n \t<li>\r\n<div>Barbara Russo and Maximilian Steff. Commit Histories<\/div><\/li>\r\n \t<li>\r\n<div>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/div><\/li>\r\n<\/ul>\r\n<p dir=\"ltr\">10:00 - 10:30 Break<\/p>\r\n<p dir=\"ltr\">10:30 - 11:15 lightning talks 2<\/p>\r\n\r\n<ul>\r\n \t<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. Building Statistical Language Models of Code<\/li>\r\n \t<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\r\n \t<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\r\n \t<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\r\n \t<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. Patterns for Extracting High Level Information from Bug Reports<\/li>\r\n \t<li>Burak Turhan. 
Relevancy Filtering<\/li>\r\n<\/ul>\r\n11:15 - 12:00\r\n\r\nDiscussion on what makes a good data analysis pattern.\r\n\r\n12:00 - 13:30 Lunch\r\n\r\n13:30 - 14:45 Breakout discussion groups\r\n\r\n14:45 - 15:30 Breakout groups present\r\n\r\n15:30 - 16:00 Break\r\n\r\n16:00 - 17:00 Workshop Discussion.\r\n\r\nPotential topics include:\r\n<ul>\r\n \t<li>How do we \"evangelize\" patterns?<\/li>\r\n \t<li>How can we make patterns reusable?<\/li>\r\n \t<li>What needs exist for data analysis patterns?<\/li>\r\n \t<li>What are common data analysis mistakes and how can we or patterns help others avoid them.<\/li>\r\n \t<li>What is the right way to catalog the patterns?<\/li>\r\n \t<li>Where should data analysis patterns live? Should there be a web resource where people post info on patterns?<\/li>\r\n \t<li>Additional topics solicited from attendees.<\/li>\r\n<\/ul>\r\n17:00 Wrap up. Discussion of future events.\r\n\r\n17:30 End\r\n\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Accepted Papers<\/h2>\r\n<ul>\r\n \t<li>Emanuel Giger and Harald Gall. Effect Size Analysis<\/li>\r\n \t<li>Rodrigo Souza, Christina Chavez and Roberto Bittencourt. Patterns for Cleaning Up Bug Data<\/li>\r\n \t<li>Xiaobing Sun, Ying Chen, Bin Li and Bixin Li. Exploring Software Engineering Data with Formal Concept Analysis<\/li>\r\n \t<li>David Weiss and Audris Mockus. The Chunking Pattern<\/li>\r\n \t<li>Barbara Russo. Parametric Classi\ufb01cation over Multiple Samples<\/li>\r\n \t<li>Barbara Russo and Maximilian Steff. Commit Histories<\/li>\r\n \t<li>Sandro Morasca. Data Analysis Anti-Patterns in Empirical Software Engineering<\/li>\r\n \t<li>Olga Baysal, Oleksii Kononenko, Reid Holmes and Mike Godfrey. Extracting Artifact Lifecycle Models From Metadata History<\/li>\r\n \t<li>Rodrigo Souza, Christina Chavez and Roberto A. Bittencourt. Patterns for Extracting High Level Information from Bug Reports<\/li>\r\n \t<li>Peter Schulam, Roni Rosenfeld and Premkumar Devanbu. 
Building Statistical Language Models of Code<\/li>\r\n \t<li>Scott McGrath, Dhundy Kiran Bastola and Harvey Siy. Concept to Commit: A pattern designed to trace code changes from user requests to change implementation by analyzing mailing lists and code repositories.<\/li>\r\n \t<li>Emmanuel Letier and Camilo Fitzgerald. Measure what Counts: An Evaluation Pattern for Software Data Analysis<\/li>\r\n \t<li>Venkatesh Prasad Ranganath and Jithin Thomas. Structural and Temporal Patterns-based Features<\/li>\r\n<\/ul>\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Non-Archival Accepted Papers<\/h2>\r\n<ul>\r\n \t<li>Burak Turhan. Relevancy Filtering.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Submissions<\/h2>\r\nWe solicit papers (2-3 pages) describing one or more data analysis pattern. Authors should use the form that is most suited to describe the pattern. Where possible, we encourage authors to describe pattern as follows\r\n<ul>\r\n \t<li>Pattern name: a handle for the pattern<\/li>\r\n \t<li>Problem: when to apply the pattern<\/li>\r\n \t<li>Solution: how to apply the pattern<\/li>\r\n \t<li>Consequence: results and trade-offs of applying the pattern, common mistakes in applying the pattern to be avoided, etc.<\/li>\r\n \t<li>Examples: brief summary and\/or cite example applications of the pattern in literature; if possible, R snippets or Weka code to apply the pattern, etc.<\/li>\r\n<\/ul>\r\nThere are two options for submitting a proposal.\r\n<ul>\r\n \t<li><strong>Archival Papers:<\/strong> Submit the pattern by February 7, 2013. 
If accepted,\u00a0it\u00a0will be published in the workshop proceedings and the ACM and IEEE Digital Libraries.<\/li>\r\n \t<li><strong>Non-Archival Papers:<\/strong> Submit the paper by April 24, 2013; we will\u00a0send the notification within two weeks.\u00a0If accepted, the paper will be published on the workshop web-pages only; non-archival papers will not be published in the workshop proceedings and the ACM and IEEE Digital Libraries.<\/li>\r\n<\/ul>\r\nBoth archival and non-archival papers will be reviewed by a program committee and accepted based on the clarity of the description and how broadly their proposed pattern might be applicable. Prior application of the pattern by the authors is not a requirement. This workshop is more interested in the mechanics and choice of the data analysis than the impact of published results.\r\n\r\nUpon notification of acceptance, all authors of accepted archival papers will be asked to complete an IEEE Copyright form and will receive further instructions for preparing their camera ready versions. At least one author of each paper is expected to present the paper at the workshop.\r\n\r\nAll submitted papers must conform to the<a href=\"http:\/\/2013.icse-conferences.org\/content\/submission-guidelines\"> ICSE 2013 formatting and submission instructions<\/a>\u00a0and must not exceed the page limits mentioned above, including figures and references. All submissions must be in English. Papers must be submitted electronically, in PDF format, using the submission site hosted by EasyChair:\r\n<a href=\"https:\/\/www.easychair.org\/conferences\/?conf=dapse2013\">https:\/\/www.easychair.org\/conferences\/?conf=dapse2013<\/a>\r\n\r\n<em>It is the desire of the organizers that discussion of research at the workshop does not preclude publication of closely related material at conferences or journals. 
Authors of accepted papers will be able to choose whether to include their papers in the workshop proceedings.<\/em>\r\n\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Format<\/h2>\r\nThe workshop will consist of the following sessions:\r\n<ul>\r\n \t<li><em>Lightning session. <\/em>Authors of accepted papers will give a lightning talk in the morning to present their proposed pattern (about 5-10 minutes depending on the number of accepted papers).<\/li>\r\n \t<li><em>Discussion session.<\/em> This session has two goals: (1) Group the patterns into pattern types. (2) Refine the pattern groups and the interactions between patterns. For example, we expect that some patterns could be composed into more powerful patterns while other patterns could be split into smaller patterns.<\/li>\r\n \t<li><em>Breakout session.<\/em> For the next session, participants will break out into groups and try to use the data analysis patterns to solve several data science tasks provided by the workshop organizers. The tasks will come from academic research but also from industry. The goal of this session is to assess the usefulness as well as the completeness of the patterns identified. We expect that patterns will be refined and new patterns will be discovered. At the end of the session each group presents their findings in a 5-minute blitz presentation.<\/li>\r\n<\/ul>\r\nBefore the workshop there will be a blog to promote and discuss accepted patterns.\r\n\r\nAfter the workshop there will be a Dagstuhl seminar on software development analytics building on the outcomes of this workshop, to which selected authors will be invited.\u00a0Furthermore, the organizers plan to edit a book on \u201cData Science for Software Engineers\u201d with a collection of data analysis patterns. 
Selected authors from the workshop will be invited to contribute chapters to this book.\r\n\r\n<\/div>\r\n<div class=\"conM \">\r\n<h2>Example of a Pattern<\/h2>\r\nFor illustrative purposes, here\u2019s an example pattern in short and simplified form. For the workshop, we expect the discussion to be more comprehensive. We do welcome both simple and complex analysis patterns.\r\n<table class=\" tWiz tableBorder\">\r\n<tbody>\r\n<tr>\r\n<td><b>Pattern name:<\/b> Contrast\r\n\r\n<b>Problem:<\/b>\r\nDetermine if there is a difference in one or more <i>properties<\/i> between <i>two<\/i> <i>populations.<\/i>\r\n\r\n<b>Solution: <\/b>\r\n1. Apply a hypothesis test (Student\u2019s t-test for parametric data, Mann-Whitney test for non-parametric data) to check if the property is statistically different between populations.\r\n\r\n2. Determine the magnitude of the difference, either through visualization (e.g., boxplot) or when appropriate through mean or median.\r\n\r\n<b>Discussion: <\/b>\r\nEither step without the other can be misleading. For large populations, tiny differences might be statistically significant. In contrast, for small populations, large differences might not be statistically significant.\r\n\r\nChoosing the wrong hypothesis test is a common mistake.\r\n\r\n<b>Examples:<\/b>\r\nFor example, at ICSE 2009, Bird et al. used a Mann-Whitney test to compare the defect proneness (=the property) between distributed and co-located binaries (=two populations). 
See Figure 5 in their paper for a sample visualization of the differences between the two population.<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n\r\n<\/div>"},{"id":1,"name":"Organization","content":"<h2>Workshop Organizers<\/h2>\r\nChristian Bird\r\nMicrosoft Research, USA\r\n\r\n<a href=\"http:\/\/menzies.us\/\">Tim Menzies<\/a>\r\nWest Virginia University, USA\r\n\r\nThomas Zimmermann\u00a0(contact)\r\nMicrosoft Research, USA\r\n<h2>Program Committee<\/h2>\r\nLionel Briand, University of Luxembourg, Luxembourg\r\n\r\nYuanfang Cai, Drexel University, USA\r\n\r\nPrem Devanbu, University of California at Davis, USA\r\n\r\nMassimiliano Di Penta, University of Sannio, Italy\r\n\r\nHarald Gall, University of Zurich, Switzerland\r\n\r\nMichael Godfrey, University of Waterloo, Canada\r\n\r\nTracy Hall, Brunel University, UK\r\n\r\nShi Han, Microsoft Research, China\r\n\r\nAhmed Hassan, Queen's University, Canada\r\n\r\nAbram Hindle, University of Alberta, Canada\r\n\r\nSung Kim, Hongkong University of Science and Technology, China\r\n\r\nMichele Lanza, University of Lugano, Switzerland\r\n\r\nAudris Mockus, Avaya Labs Research, USA\r\n\r\nEmerson Murphy-Hill, North Carolina State University, USA\r\n\r\nVenkatesh-Prasad Ranganath, Microsoft Research, India\r\n\r\nRomain Robbes, University in Chile, Chile\r\n\r\nPete Rotella, Cisco Systems, USA\r\n\r\nAnita Sarma, University of Nebraska, USA\r\n\r\nCarolyn Seaman, University of Maryland, USA\r\n\r\nMartin Shepperd, Brunel University, UK\r\n\r\nBurak Turhan, University of Oulu, Finland\r\n\r\nStefan Wagner, University of Stuttgart, Germany\r\n\r\nPatrick Wagstrom, IBM, USA\r\n\r\nLaurie Williams, North Carolina State University, USA\r\n\r\nYe Yang, Chinese Academy of Sciences, China"}],"msr_startdate":"2013-05-21","msr_enddate":"2013-05-21","msr_event_time":"","msr_location":"San Francisco, CA, USA","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"May 21, 
2013","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Data scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists. Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best practices in form of data analysis patterns, that is, analyses of data that&hellip;","msr_research_lab":[199565],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/199828","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/199828\/revisions"}],"predecessor-version":[{"id":1147398,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/199828\/revisions\/1147398"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=199828"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=199828"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=199828"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/r
esearch\/wp-json\/wp\/v2\/msr-event-type?post=199828"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=199828"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=199828"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=199828"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=199828"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=199828"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}