{"id":283292,"date":"2016-08-26T09:00:31","date_gmt":"2016-08-26T16:00:31","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=283292"},"modified":"2016-08-29T13:36:36","modified_gmt":"2016-08-29T20:36:36","slug":"summer-school-data-science-research-trigger-real-world-changes","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/summer-school-data-science-research-trigger-real-world-changes\/","title":{"rendered":"Summer school data science research could trigger real world changes"},"content":{"rendered":"<p><em>By John Kaiser, Writer, Microsoft Research<\/em><\/p>\n<p>Microsoft Research hosted its third annual <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/ds3.research.microsoft.com\/\" target=\"_blank\">Data Science Summer School<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> in New York City as a diverse group of undergraduate students deployed some of the latest data crunching techniques on millions of rows of anonymized data in an effort to uncover useful information.<\/p>\n<p>\u201cWe\u2019re really hoping to give them a flavor of solving a research problem that hasn&#8217;t yet been solved,\u201d said <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jmh\/\">Jake Hofman<\/a>, one of several Microsoft Research instructors leading the intensive eight-week hands-on course that concluded in August. 
<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/github.com\/msr-ds3\/coursework\" target=\"_blank\">Coursework<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0for the program is freely available on GitHub.<\/p>\n<h2>Data points to tweaking incentives at Airbnb<\/h2>\n<div id=\"attachment_283790\" style=\"width: 640px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-283790\" class=\"wp-image-283790\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/AirBnbMentors-1024x682.jpg\" alt=\"AirBnb\" width=\"630\" height=\"420\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/AirBnbMentors-1024x682.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/AirBnbMentors-300x200.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/AirBnbMentors-768x511.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/AirBnbMentors.jpg 1600w\" sizes=\"auto, (max-width: 630px) 100vw, 630px\" \/><p id=\"caption-attachment-283790\" class=\"wp-caption-text\">Airbnb team, left to right: Shawndra Hill (MSR, mentor), Chris Riederer (Columbia, teaching assistant\/mentor), Erica Ram (student), Louise Lai (student), Jacqueline Curran (student), Kaciny Calixte (student), Fernando Diaz (MSR, mentor), Amit Sharma (MSR, mentor)<\/p><\/div>\n<p>This year marked the first time that student-led research relied on machine learning algorithms to predict actual outcomes. 
In a project called \u201c<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/ds3.research.microsoft.com\/doc\/airbnb.pdf\" target=\"_blank\">Airbnb: Predicting Loyalty<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d the students tapped decision tree learning techniques \u2014 \u201cusing decision trees to find patterns in the given data to predict on unseen data.\u201d Most importantly, they were able to pinpoint how the company might tweak specific factors to encourage guests to book another stay or incentivize hosts to open up their home another time.<\/p>\n<p>Students looked for patterns indicating a higher or lower probability of being a repeat customer.<\/p>\n<p>\u201cHow does host loyalty interplay with guest loyalty?\u201d asked summer school student Louise Lai, in describing one of the primary areas of focus for the Airbnb study group. \u201cWe\u2019re looking at that interplay as something very new and very distinct for the sharing economy.\u201d<\/p>\n<p>For the Airbnb student project, Lai was joined by Kaciny Calixte, Jacqueline Curran and Erica Ram.<\/p>\n<p>Explaining how \u201cpredictive models show that reviews and interaction between hosts and guests is of great importance,\u201d the study concluded that \u201cAirbnb could potentially boost return-rates of first time guests by providing them with incentives to stay at highly-rated properties.\u201d<\/p>\n<p>The project relied on two datasets collected by <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/insideairbnb.com\/about.html\" target=\"_blank\">InsideAirbnb<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which describes itself as an \u201cindependent, non-commercial set of tools and data\u201d that allows anyone to \u201cexplore how Airbnb is really being used in cities around the 
world.\u201d<\/p>\n<p>In another sign of the maturing field of data science, this year marked the first time that student projects used pre-existing datasets without the need for modification.<\/p>\n<p>\u201cIt does raise the bar for the types of questions that get asked. We\u2019re seeing the tools improve, we\u2019re seeing more and more interesting datasets out there.\u201d<\/p>\n<p>And certainly Microsoft Azure\u2019s point-and-click graphical interface makes it seem easy \u2014 if you know what to look for.<\/p>\n<p>\u201cWhat we&#8217;re trying to train our students on is more around what questions to ask and how to answer them,\u201d Hofman added.<\/p>\n<h2>Taxi data points to carpooling push to counter redundant trips<\/h2>\n<div id=\"attachment_283796\" style=\"width: 641px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-283796\" class=\"wp-image-283796\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/FareShareMentors-1024x682.jpg\" alt=\"FareShare\" width=\"631\" height=\"420\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/FareShareMentors-1024x682.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/FareShareMentors-300x200.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/FareShareMentors-768x511.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/08\/FareShareMentors.jpg 1600w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><p id=\"caption-attachment-283796\" class=\"wp-caption-text\">Fare Share team, left to right: Chris Riederer (Columbia, teaching assistant\/mentor), Abraham Neuwirth (student), Jai Punjwani (student), Fatima Chebchoub (student), Marieme Toure (student), Ashton Anderson (MSR, mentor), Sid Sen (MSR, mentor), Jake Hofman (MSR, mentor)<\/p><\/div>\n<p>The other student project, \u201c<a 
class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/ds3.research.microsoft.com\/doc\/nyctaxi.pdf\" target=\"_blank\">Fare Share: Flow and Efficiency in NYC&#8217;s Taxi System<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>,\u201d tapped into what\u2019s officially known as the \u201c2013 Yellow Taxi Driver Set,\u201d which contains anonymized driver IDs, trip time, distance, point of origin, destination and other information.<\/p>\n<p>In the group\u2019s final presentation, summer school student Jai Punjwani described the dataset this way: \u201cImagine if you have info on every single cab in New York City and you able to see where every single cab was going \u2014 who they were picking up, who they were dropping off, what they were doing afterward.\u201d<\/p>\n<p>Punjwani is now entering his junior year in computer science at Adelphi University, where he\u2019s developed an Android app that enables students \u201cto find each other and study at his university.\u201d For the taxi data student project, Punjwani was joined by Abraham Neuwirth, Marieme Toure and Fatima Chebchoub.<\/p>\n<p>The research project focused on a single month of data, which included more than 13 million rides, for an average of 420,000 trips per day, driven by over 32,000 different drivers.<\/p>\n<p>Unlike a similar project in 2009 that yielded largely inconclusive results, this year\u2019s study zeroed in on longer trips to specific destinations, revealing that large numbers of taxis ferried just a single passenger on various popular commute and transit routes. 
The study notes that on \u201cweekday mornings around 7 a.m., there are roughly 25 redundant trips from Port Authority to Rockefeller Center that take place every five minutes for the duration of rush hour.\u201d<\/p>\n<p>The students concluded that a \u201ctaxi stand policy requiring people to wait no more than five minutes to carpool with another rider at these locations could improve the system by upwards of 5 percent, eliminating more than 650,000 trips. That translates into a potential savings to consumers of more than $8.5 million.\u201d<\/p>\n<p>It\u2019s a good example of how data science can shine a light on efficiencies that would otherwise go unnoticed, and it shows how research could spur new policies.<\/p>\n<p>\u201cYou could really improve the efficiency of the taxi system that could happen at almost zero cost,\u201d Hofman noted.<\/p>\n<p>There\u2019s a reason it\u2019s called \u201cbig data.\u201d For their projects, students were faced with the tricky task of culling through millions of rows of data, discarding anomalies like multiple Airbnb listings or faulty taxi geolocation data, such as an erroneous trip to Antarctica.<\/p>\n<h2>Microsoft Research Data Science Summer School<\/h2>\n<p>The program was launched in 2014 as part of a commitment to boost diversity in computer science, encouraging \u201capplications from women, minorities, people with disabilities and students from resource-limited colleges.\u201d This year\u2019s class included a woman who immigrated from Senegal and another who moved from Morocco.<\/p>\n<p>In choosing applicants from more than 100 entries, Microsoft looks for candidates who have demonstrated a passion for computer science through their undergraduate coursework and related activities.<\/p>\n<p>Projects from earlier years drew on data from New York\u2019s public school system, subway and fleet of shared bicycles, as well as stats compiled from ongoing police practices.<\/p>\n<h2>Summer 2016 
Projects<\/h2>\n<p><strong>Airbnb: Predicting Loyalty<\/strong><\/p>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/video\/data-science-summer-school-2016-airbrb-predicting-loyalty\/\" target=\"_blank\">Watch the talk<\/a>\u00a0or\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/ds3.research.microsoft.com\/doc\/airbnb.pdf\" target=\"_blank\">read the paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0for more details. Source code is available\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/github.com\/msr-ds3\/airbnb\" target=\"_blank\">on GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n<p><strong>Fare Share: Flow and Efficiency in NYC&#8217;s Taxi System<\/strong><\/p>\n<p><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/video\/data-science-summer-school-2016-fare-share-flow-and-efficiency-in-nycs-taxi-system\/\" target=\"_blank\">Watch the talk<\/a>\u00a0or\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/ds3.research.microsoft.com\/doc\/nyctaxi.pdf\" target=\"_blank\">read the paper<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0for more details. 
Source code is available\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/github.com\/msr-ds3\/nyctaxi\" target=\"_blank\">on GitHub<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>\u00a0as well as an\u00a0<a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"http:\/\/bit.ly\/nyc_taxi\" target=\"_blank\">interactive map of travel patterns across neighborhoods<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By John Kaiser, Writer, Microsoft Research Microsoft Research hosted its third annual Data Science Summer School in New York City as a diverse group of undergraduate students deployed some of the latest data crunching techniques on millions of rows of anonymized data in an effort to uncover useful information. \u201cWe\u2019re really hoping to give them 
[&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[194466,194453,194475,194455],"tags":[210794,210779,210803,210797,210782,195266,210776,210806,210791,193659,210788,210785,210800],"research-area":[13561,13556,13563],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-283292","post","type-post","status-publish","format-standard","hentry","category-algorithms","category-data-science","category-database-data-analytics-platforms","category-machine-learning","tag-2013-yellow-taxi-driver-set","tag-airbnb","tag-airbnb-predicting-loyalty","tag-carpool","tag-customer-loyalty","tag-data-science-summer-school","tag-decision-tree","tag-fare-share-flow-and-efficiency-in-nycs-taxi-system","tag-insideairbnb","tag-microsoft-azure","tag-predictive-models","tag-sharing-economy","tag-taxi","msr-research-area-algorithms","msr-research-area-artificial-intelligence","msr-research-area-data-platform-analytics","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[204065],"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"August 26, 2016","formattedExcerpt":"By John Kaiser, Writer, Microsoft Research Microsoft Research hosted its third annual Data Science Summer School in New York City as a diverse group of undergraduate students deployed some of 
the latest data crunching techniques on millions of rows of anonymized data in an effort&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/283292","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=283292"}],"version-history":[{"count":7,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/283292\/revisions"}],"predecessor-version":[{"id":284654,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/283292\/revisions\/284654"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=283292"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=283292"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=283292"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=283292"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=283292"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=283292"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=283292"},{"taxonomy":"msr-po
st-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=283292"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=283292"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=283292"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=283292"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}