{"id":239534,"date":"2016-06-17T08:21:31","date_gmt":"2016-06-17T15:21:31","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-event&#038;p=239534"},"modified":"2025-08-06T12:00:23","modified_gmt":"2025-08-06T19:00:23","slug":"software-engineering-mix-volume-2-large-scale-data-analysis-of-software-repositories","status":"publish","type":"msr-event","link":"https:\/\/www.microsoft.com\/en-us\/research\/event\/software-engineering-mix-volume-2-large-scale-data-analysis-of-software-repositories\/","title":{"rendered":"Software Engineering Mix Volume 2: Large-scale Data Analysis of Software Repositories"},"content":{"rendered":"\n\n<p><span style=\"line-height: 1.5;\">Microsoft Conference Centre, Hood<\/span><\/p>\n<p><span style=\"line-height: 1.5;\">Software Engineering Mix was part of the <\/span><a style=\"line-height: 1.5;\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/faculty-summit-2016\/\" target=\"_blank\">Microsoft Research Faculty Summit 2016<\/a><span style=\"line-height: 1.5;\">.<\/span><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Software Engineering Mix (SE-MIX) provided a forum for our colleagues from academia to interact directly with Microsoft engineers. The program featured talks from academics: highlights of published research that is highly relevant for Microsoft and blue sky talks summarizing emerging research areas. In addition, practitioners gave presentations about theoretical and pragmatic engineering challenges they face, soliciting help from academia. A coffee round table setting was used to facilitate discussions. This session built on the success of SEIF Days, which provided a discussion forum about the future of software engineering.<\/p>\n<p>The topic of this year&#8217;s SE-MIX was the large-scale data analysis of software repositories (like GitHub for example). Many teams are using GitHub for their OSS projects and would like to have a richer understanding and insight into that activity. While some projects like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/ghtorrent.org\" target=\"_blank\">GHTorrent<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/githubarchive.org\" target=\"_blank\">GitHub Archive<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0exist, and some insights are available for analyzing a single project, everyone touching this topic sees an enormous potential in the data. The SE-MIX was intended to jumpstart connections between academia and Microsoft on the vast opportunities in leveraging GitHub data and data from other software repositories to develop software more efficiently.<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Speakers talked about open source and data analysis of large-scale software repositories like GitHub. The SE-MIX featured Open Source Live, a showcase of open source projects related to Microsoft.<\/p>\n<p><strong>8:30-10:30\u00a0 First Session<\/strong><\/p>\n<ul>\n<li>Welcome (10 minutes)<\/li>\n<li>Judith Bishop, Microsoft. <em>Industrial Research and Open Source \u2013 Reasons and Results<\/em> (20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/SE-MIX-Bishop.pptx\">slides<\/a><\/li>\n<li>Jeff McAffer, Microsoft. <em>GitHub Insight: Understanding Open Source<\/em> (20 minutes)<\/li>\n<li>Mei Nagappan, Rochester Institute of Technology. <em>Curating GitHub for Engineered Software Projects <\/em>(20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/CuratingGithub_MSR_FS_SEMix.pdf\">slides<\/a><\/li>\n<li>Speed Dating between academics and Microsoft engineers (50 minutes)<\/li>\n<\/ul>\n<p><em>10:30-10:50\u00a0 Break<\/em><\/p>\n<p><strong>10:50-12:00\u00a0 Second Session<\/strong><\/p>\n<ul>\n<li>Vladimir Filkov, University of California, Davis. <em>How to analyze GitHub traces to ask important questions and get actionable answers?<\/em> (20 minutes)<\/li>\n<li>Cristina Manu & Daniel Quirk, Microsoft. <em>Spot &#8211; A distributed system for source code analysis<\/em> (20 minutes)<\/li>\n<li>Laura Dabbish, Carnegie Mellon University. <em>The social life of software repositories:\u00a0What large scale software analysis can learn from small scale qualitative research<\/em> (20 minutes)<\/li>\n<li>Preparation for Group Brain Storming (10 minutes)<\/li>\n<\/ul>\n<p><em>12:00-13:20\u00a0 Lunch break \/ Group picture<\/em><\/p>\n<p><strong>13:20-15:30\u00a0 Third Session<\/strong><\/p>\n<ul>\n<li>Group Brain Storming (65-80 minutes)<\/li>\n<li>Wrap-up (5 minutes) followed by<\/li>\n<li>Open Source Live! Showcase (45-60 minutes)<\/li>\n<\/ul>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p><div class='content-column col-1-2'><ul>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/abegel\/\" target=\"_blank\">Andrew Begel<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cbird\/\" target=\"_blank\">Christian Bird<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jbishop\/\" target=\"_blank\">Judith Bishop<\/a><\/li>\n<\/ul><\/div><\/p>\n<p><div class='content-column col-1-2 last_column'><ul>\n<li>Trevor Carnahan<\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/honzhang\/\" target=\"_blank\">Hongyu Zhang<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tzimmer\/\" target=\"_blank\">Thomas Zimmermann<\/a><\/li>\n<\/ul><\/div><div class='clear_column'><\/div><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft Conference Centre, Hood Software Engineering Mix was part of the Microsoft Research Faculty Summit 2016.Opens in a new tab Software Engineering Mix (SE-MIX) provided a forum for our colleagues from academia to interact directly with Microsoft engineers. The program featured talks from academics: highlights of published research that is highly relevant for Microsoft and [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr_startdate":"2016-07-15","msr_enddate":"","msr_location":"Redmond, WA, USA","msr_expirationdate":"","msr_event_recording_link":"","msr_event_link":"","msr_event_link_redirect":false,"msr_event_time":"8:30 AM \u2013 3:30 PM","msr_hide_region":false,"msr_private_event":false,"msr_hide_image_in_river":0,"footnotes":""},"research-area":[13563],"msr-region":[197900],"msr-event-type":[197944],"msr-video-type":[],"msr-locale":[268875],"msr-program-audience":[],"msr-post-option":[],"msr-impact-theme":[],"class_list":["post-239534","msr-event","type-msr-event","status-publish","hentry","msr-research-area-data-platform-analytics","msr-region-north-america","msr-event-type-hosted-by-microsoft","msr-locale-en_us"],"msr_about":"<!-- wp:msr\/event-details {\"title\":\"Software Engineering Mix Volume 2: Large-scale Data Analysis of Software Repositories\",\"backgroundColor\":\"grey\"} \/-->\n\n<!-- wp:msr\/content-tabs --><!-- wp:msr\/content-tab {\"title\":\"About\"} --><!-- wp:freeform --><p><span style=\"line-height: 1.5;\">Microsoft Conference Centre, Hood<\/span><\/p>\n<p><span style=\"line-height: 1.5;\">Software Engineering Mix was part of the <\/span><a style=\"line-height: 1.5;\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/event\/faculty-summit-2016\/\" target=\"_blank\">Microsoft Research Faculty Summit 2016<\/a><span style=\"line-height: 1.5;\">.<\/span><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<p>Software Engineering Mix (SE-MIX) provided a forum for our colleagues from academia to interact directly with Microsoft engineers. The program featured talks from academics: highlights of published research that is highly relevant for Microsoft and blue sky talks summarizing emerging research areas. In addition, practitioners gave presentations about theoretical and pragmatic engineering challenges they face, soliciting help from academia. A coffee round table setting was used to facilitate discussions. This session built on the success of SEIF Days, which provided a discussion forum about the future of software engineering.<\/p>\n<p>The topic of this year&#8217;s SE-MIX was the large-scale data analysis of software repositories (like GitHub for example). Many teams are using GitHub for their OSS projects and would like to have a richer understanding and insight into that activity. While some projects like <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/ghtorrent.org\" target=\"_blank\">GHTorrent<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0and <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" href=\"http:\/\/githubarchive.org\" target=\"_blank\">GitHub Archive<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0exist, and some insights are available for analyzing a single project, everyone touching this topic sees an enormous potential in the data. The SE-MIX was intended to jumpstart connections between academia and Microsoft on the vast opportunities in leveraging GitHub data and data from other software repositories to develop software more efficiently.<span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Agenda\"} --><!-- wp:freeform --><p>Speakers talked about open source and data analysis of large-scale software repositories like GitHub. The SE-MIX featured Open Source Live, a showcase of open source projects related to Microsoft.<\/p>\n<p><strong>8:30-10:30\u00a0 First Session<\/strong><\/p>\n<ul>\n<li>Welcome (10 minutes)<\/li>\n<li>Judith Bishop, Microsoft. <em>Industrial Research and Open Source \u2013 Reasons and Results<\/em> (20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/SE-MIX-Bishop.pptx\">slides<\/a><\/li>\n<li>Jeff McAffer, Microsoft. <em>GitHub Insight: Understanding Open Source<\/em> (20 minutes)<\/li>\n<li>Mei Nagappan, Rochester Institute of Technology. <em>Curating GitHub for Engineered Software Projects <\/em>(20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/CuratingGithub_MSR_FS_SEMix.pdf\">slides<\/a><\/li>\n<li>Speed Dating between academics and Microsoft engineers (50 minutes)<\/li>\n<\/ul>\n<p><em>10:30-10:50\u00a0 Break<\/em><\/p>\n<p><strong>10:50-12:00\u00a0 Second Session<\/strong><\/p>\n<ul>\n<li>Vladimir Filkov, University of California, Davis. <em>How to analyze GitHub traces to ask important questions and get actionable answers?<\/em> (20 minutes)<\/li>\n<li>Cristina Manu &amp; Daniel Quirk, Microsoft. <em>Spot &#8211; A distributed system for source code analysis<\/em> (20 minutes)<\/li>\n<li>Laura Dabbish, Carnegie Mellon University. <em>The social life of software repositories:\u00a0What large scale software analysis can learn from small scale qualitative research<\/em> (20 minutes)<\/li>\n<li>Preparation for Group Brain Storming (10 minutes)<\/li>\n<\/ul>\n<p><em>12:00-13:20\u00a0 Lunch break \/ Group picture<\/em><\/p>\n<p><strong>13:20-15:30\u00a0 Third Session<\/strong><\/p>\n<ul>\n<li>Group Brain Storming (65-80 minutes)<\/li>\n<li>Wrap-up (5 minutes) followed by<\/li>\n<li>Open Source Live! Showcase (45-60 minutes)<\/li>\n<\/ul>\n<p><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- wp:msr\/content-tab {\"title\":\"Organizers\"} --><!-- wp:freeform --><p><div class='content-column col-1-2'><ul>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/abegel\/\" target=\"_blank\">Andrew Begel<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cbird\/\" target=\"_blank\">Christian Bird<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jbishop\/\" target=\"_blank\">Judith Bishop<\/a><\/li>\n<\/ul><\/div><\/p>\n<p><div class='content-column col-1-2 last_column'><ul>\n<li>Trevor Carnahan<\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/honzhang\/\" target=\"_blank\">Hongyu Zhang<\/a><\/li>\n<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tzimmer\/\" target=\"_blank\">Thomas Zimmermann<\/a><\/li>\n<\/ul><\/div><div class='clear_column'><\/div><span id=\"label-external-link\" class=\"sr-only\" aria-hidden=\"true\">Opens in a new tab<\/span><\/p>\n<!-- \/wp:freeform --><!-- \/wp:msr\/content-tab --><!-- \/wp:msr\/content-tabs -->","tab-content":[{"id":0,"name":"About","content":"Software Engineering Mix (SE-MIX) provided a forum for our colleagues from academia to interact directly with Microsoft engineers. The program featured talks from academics: highlights of published research that is highly relevant for Microsoft and blue sky talks summarizing emerging research areas. In addition, practitioners gave presentations about theoretical and pragmatic engineering challenges they face, soliciting help from academia. A coffee round table setting was used to facilitate discussions. This session built on the success of SEIF Days, which provided a discussion forum about the future of software engineering.\r\n\r\nThe topic of this year's SE-MIX was the large-scale data analysis of software repositories (like GitHub for example). Many teams are using GitHub for their OSS projects and would like to have a richer understanding and insight into that activity. While some projects like <a href=\"http:\/\/ghtorrent.org\" target=\"_blank\">GHTorrent<\/a>\u00a0and <a href=\"http:\/\/githubarchive.org\" target=\"_blank\">GitHub Archive<\/a>\u00a0exist, and some insights are available for analyzing a single project, everyone touching this topic sees an enormous potential in the data. The SE-MIX was intended to jumpstart connections between academia and Microsoft on the vast opportunities in leveraging GitHub data and data from other software repositories to develop software more efficiently."},{"id":1,"name":"Agenda","content":"Speakers talked about open source and data analysis of large-scale software repositories like GitHub. The SE-MIX featured Open Source Live, a showcase of open source projects related to Microsoft.\r\n\r\n<strong>8:30-10:30\u00a0 First Session<\/strong>\r\n<ul>\r\n \t<li>Welcome (10 minutes)<\/li>\r\n \t<li>Judith Bishop, Microsoft. <em>Industrial Research and Open Source \u2013 Reasons and Results<\/em> (20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/SE-MIX-Bishop.pptx\">slides<\/a><\/li>\r\n \t<li>Jeff McAffer, Microsoft. <em>GitHub Insight: Understanding Open Source<\/em> (20 minutes)<\/li>\r\n \t<li>Mei Nagappan, Rochester Institute of Technology. <em>Curating GitHub for Engineered Software Projects <\/em>(20 minutes) <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/06\/CuratingGithub_MSR_FS_SEMix.pdf\">slides<\/a><\/li>\r\n \t<li>Speed Dating between academics and Microsoft engineers (50 minutes)<\/li>\r\n<\/ul>\r\n<em>10:30-10:50\u00a0 Break<\/em>\r\n\r\n<strong>10:50-12:00\u00a0 Second Session<\/strong>\r\n<ul>\r\n \t<li>Vladimir Filkov, University of California, Davis. <em>How to analyze GitHub traces to ask important questions and get actionable answers?<\/em> (20 minutes)<\/li>\r\n \t<li>Cristina Manu &amp; Daniel Quirk, Microsoft. <em>Spot - A distributed system for source code analysis<\/em> (20 minutes)<\/li>\r\n \t<li>Laura Dabbish, Carnegie Mellon University. <em>The social life of software repositories:\u00a0What large scale software analysis can learn from small scale qualitative research<\/em> (20 minutes)<\/li>\r\n \t<li>Preparation for Group Brain Storming (10 minutes)<\/li>\r\n<\/ul>\r\n<em>12:00-13:20\u00a0 Lunch break \/ Group picture<\/em>\r\n\r\n<strong>13:20-15:30\u00a0 Third Session<\/strong>\r\n<ul>\r\n \t<li>Group Brain Storming (65-80 minutes)<\/li>\r\n \t<li>Wrap-up (5 minutes) followed by<\/li>\r\n \t<li>Open Source Live! Showcase (45-60 minutes)<\/li>\r\n<\/ul>"},{"id":2,"name":"Organizers","content":"[col-1-2]\r\n<ul>\r\n \t<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/abegel\/\" target=\"_blank\">Andrew Begel<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/cbird\/\" target=\"_blank\">Christian Bird<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jbishop\/\" target=\"_blank\">Judith Bishop<\/a><\/li>\r\n<\/ul>\r\n[\/col-1-2]\r\n\r\n[col-1-2_last]\r\n<ul>\r\n \t<li>Trevor Carnahan<\/li>\r\n \t<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/honzhang\/\" target=\"_blank\">Hongyu Zhang<\/a><\/li>\r\n \t<li><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/tzimmer\/\" target=\"_blank\">Thomas Zimmermann<\/a><\/li>\r\n<\/ul>\r\n[\/col-1-2_last]"}],"msr_startdate":"2016-07-15","msr_enddate":"","msr_event_time":"8:30 AM \u2013 3:30 PM","msr_location":"Redmond, WA, USA","msr_event_link":"","msr_event_recording_link":"","msr_startdate_formatted":"July 15, 2016","msr_register_text":"Watch now","msr_cta_link":"","msr_cta_text":"","msr_cta_bi_name":"","featured_image_thumbnail":null,"event_excerpt":"Software Engineering Mix (SE-MIX) provided a forum for our colleagues from academia to interact directly with Microsoft engineers. The program featured talks from academics: highlights of published research that is highly relevant for Microsoft and blue sky talks summarizing emerging research areas. In addition, practitioners gave presentations about theoretical and pragmatic engineering challenges they face, soliciting help from academia. A coffee round table setting was used to facilitate discussions. This session built on the success&hellip;","msr_research_lab":[],"related-researchers":[],"msr_impact_theme":[],"related-academic-programs":[],"related-groups":[],"related-projects":[],"related-opportunities":[],"related-publications":[],"related-videos":[],"related-posts":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/239534","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-event"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/239534\/revisions"}],"predecessor-version":[{"id":1147329,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event\/239534\/revisions\/1147329"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=239534"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=239534"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=239534"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=239534"},{"taxonomy":"msr-video-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-video-type?post=239534"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=239534"},{"taxonomy":"msr-program-audience","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-program-audience?post=239534"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=239534"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=239534"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}