The Data Science Summer School (DS3) is an intensive, eight-week hands-on introduction to data science for college students in the New York City area.
DS3 includes both coursework in data science and group research projects. The summer school is taught by leading scientists at Microsoft Research, and is held at the new Microsoft Research office in the heart of New York City.
Each student receives a $5,000 stipend for participating in the program, as well as a laptop.
As we are committed to increasing diversity in computer science, we strongly encourage women, minorities, and individuals with disabilities to apply.
To learn more about DS3, please look at the program details and the frequently asked questions.
June 11, 2019 to August 2, 2019 (8 weeks)
Mondays, Wednesdays & Thursdays: 10:30am – 6:30pm
Tuesdays: 10:30am – 8:30pm
Fridays: 10:30am – 5:00pm
Dinner provided on Tuesdays, and lunch provided on Thursdays
Number of Participants
We typically have capacity for eight students.
Upper-level undergraduate students attending college in the New York City area who are interested in attending computer science graduate school, and who would benefit from an intensive introduction to data science. We seek to increase diversity in computer science, and so we especially encourage women, minorities, individuals with disabilities, and students from smaller colleges to apply to the program.
This introduction to data science will cover tools and techniques for acquiring, cleaning, and utilizing real-world data for research purposes. In contrast to traditional coursework, where one is often handed a prepackaged dataset obtained by a third party and prepared for a specific exercise, research projects often involve not only cleaning and preparing “messy” data, but also acquiring that data oneself (e.g., through an API). The initial phase of these projects involves a good deal of exploratory analysis to gain a preliminary understanding of the dataset. Students will be introduced to scripting (on the command line and with Python and R) for these purposes, and will gain direct experience in acquiring and modeling data from online sources.
The course also serves as an introduction to problems in applied statistics and machine learning. We will cover the theory behind simple but effective methods for supervised and unsupervised learning. Emphasis will be on formulating real-world modeling and prediction tasks as optimization problems and comparing methods in terms of practical efficacy and scalability. Students will learn to fit and evaluate such models, with applications including spam filtering and recommendation systems.
All course material from the 2017 program is available on GitHub.
Students will work on an original research project in groups, led by Microsoft research scientists.
MSR NYC Data Science Summer School 2018
All course material from the 2018 program is available on GitHub.
- Week 1: Git(hub), command line, exploratory data analysis in R
- Week 2: Statistics and machine learning
- Week 3: Causality and experiments
- Week 4: Python, APIs, etc.
Dan Goldstein works at the intersection of behavioral economics and computer science. Prior to joining Microsoft, Dan was a Principal Research Scientist at Yahoo Research and a marketing professor at London Business School. He received his Ph.D. at The University of Chicago and has taught and researched at Columbia, Harvard, Stanford and Max Planck Institute in Germany, where he was awarded the Otto Hahn Medal in 1997. His academic writings have appeared in journals from Science to Psychological Review. Dan is a member of the Academic Advisory Board of the UK’s Behavioral Insights Team (aka Britain’s “nudge unit”). He was elected President of the Society for Judgment and Decision Making for the year 2015-2016.
Jake Hofman is a Senior Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to large-scale social data. Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.
Siddhartha Sen is a Researcher at Microsoft Research in New York City, and previously at the MSR Silicon Valley lab. He creates distributed systems that use novel data structures and algorithms to deliver unprecedented functionality or performance. Some of his data structures have been incorporated into undergraduate textbooks and curricula. Recently, he has generalized this approach to use contextual machine learning to optimize decisions in distributed systems infrastructure. Siddhartha received his BS degrees in computer science and mathematics and his MEng degree in computer science from MIT. From 2004-2007 he worked as a developer at Microsoft and built a network load balancer for Windows Server. He completed his PhD from Princeton University in 2013. Siddhartha received the first Google Fellowship in Fault-Tolerant Computing in 2009, the best student paper award at PODC 2012, and the best paper award at ASPLOS 2017.
Forough Poursabzi Sangdeh
Forough Poursabzi Sangdeh is a postdoctoral researcher at Microsoft Research in New York City. Forough’s research interests lie at the intersection of machine learning, human-computer interaction, and social sciences. She studies machine learning with humans in the loop and works on methods to foster human-machine collaboration. Forough recently completed her Ph.D. in Computer Science at the University of Colorado-Boulder and she received her undergraduate degree in computer engineering from the University of Tehran.
Many individuals have contributed to the Data Science Summer School in the past, notably Sharad Goel and Justin Rao, who co-founded the program.
Brenda Fried is a senior at Brooklyn College majoring in computer science. She loves to tackle challenging problems and is enamored by the power of the command line. She’s been intrigued by the world of data science and machine learning ever since starting work at a computer vision startup in her sophomore year. She was excited to join DS3 to learn from an incredible group of dedicated researchers. When not coding, Brenda loves to swim, spray paint art and hike.
Harpreet Gaur is a rising senior majoring in Computer Systems Technology at CUNY City Tech, where she mentors and tutors students during the semester. She also works on software development research projects and has been a CUNY Research Scholar and an Honors Scholar at City Tech. Her passion for technology dates back to her gap year, when she taught computer science and its impact to students in under-served communities in India. After completing the DS3 program, she wants to use data science and machine learning to work on societal problems that affect people on a large scale. Harpreet is a dessert enthusiast and a film-maker on the side.
Adnan Hoq graduated as the Valedictorian from St. Joseph’s College with a double major in Mathematics & Computer Science. He has garnered experience in object-oriented programming and full-stack development under the tutelage of his mentor Dr. Callahan at NYU. Adnan created an automated tool for producing charts using R during his time as a software and data consultant at NYCDOHMH. Moreover, Adnan traveled to China to satiate his curiosity in Deep Learning when he was selected for Tsinghua University’s summer research program. Adnan possesses a unique proclivity for data science and machine learning research. He is currently trying his best to find an acceptable balance between industry and research in his career.
Emeka Samuel Mbazor is a rising senior at Lehman College, majoring in Computer Science and minoring in Data Science. Formerly a pre-medical student, he became interested in data and the stories contained within them after taking and enjoying a Biostatistics course. He hopes to further explore his interests and make a transition towards machine learning research in the coming year. Outside his academic and professional life, he also enjoys all things Star Wars.
Naomi graduated from St. Joseph’s College in Brooklyn in May 2019 with a double major in Mathematics and Computer Science. She discovered her passion for mathematics and computer science during her sophomore year. After that, she worked to be accepted into a research program in which she modeled the dynamics of recidivism in the state of Arizona. She enjoys listening to the Beatles and Norm Macdonald jokes.
Cindy Muso is a rising junior at St. John’s University where she is pursuing a double major in Mathematics and Computer Science. Her curiosity and love for computer science began in high school after taking a programming class. Through the DS3 program at Microsoft, Cindy has developed an interest in machine learning and looks forward to working in this field in the near future. One of her many goals is to pursue a PhD in Computer Science and Mathematics. She is a strong advocate for diversity and inclusion and believes that research, no matter the scope or topic, can play a huge role in advancing society. This is why Cindy wishes to devote her time to conducting research after her undergrad. She also likes to spend her time playing tennis, traveling, and having quality time with friends and family.
Etta Rapp is a junior at Stern College for Women (Yeshiva University), double majoring in Mathematics and Computer Science. She is passionate about both of these fields and the overlap between them, and was fascinated to learn about the many applications of math in computer science while at DS3. Outside of college, she enjoys reading, hiking, and spending time with her family. Etta is looking forward to attending the 2019 ACM Richard Tapia Celebration of Diversity in Computing Conference.
Roymil Terrero recently graduated from St. Joseph’s College, double majoring in Computer Science and Mathematics. He is passionate about machine learning, data analysis, and algorithms. Roymil joined Microsoft DS3 to enrich his knowledge of Machine Learning and Data analysis. He believes that today’s computational power should be used responsibly to extend human ability to learn and understand what could be impossible otherwise. He is intellectually curious about how things work and hopes to someday understand and modify the Linux kernel. In his spare time, he likes to cook, do 3D modeling, and play badminton with friends.
Renzhentaxi Baerde graduated from Adelphi University in May 2018 with a Bachelor’s degree in Computer Science. He dreams of being able to read a research paper and actually understand it. He also likes to downvote reddit posts and is fascinated by procedural generation. Currently, he is sending out thousands of applications, hoping to get a job before his family disowns him. He discovered his love for programming when he realized he could beat games by using a Cheat Engine. 1 out of 10 people find Taxi funny. 9 out of 10 people don’t talk to Taxi.
Peter Farquharson is a rising senior at Lehman College majoring in Computer Science with a minor in Mathematics. He is passionate about working with Data Structures and Algorithms while incorporating tools from Data Science. He was thrilled to be a part of DS3 to gain an understanding of Data Science and working in a research setting with diverse individuals. He is an avid gamer and enjoys playing cricket and soccer in his spare time.
Akbar Mirza recently graduated from the City College of New York, double majoring in Computer Science and User Experience Design. He is excited about product design, UX, and software engineering. Akbar is an interdisciplinary thinker and loves to push his limits. In the past year, he’s taught a course on iOS Development, used data science to explore the MTA subway system, and launched an app with Quadrant 2 called MigraCam. Designed to help immigrants reach loved ones in emergency situations, MigraCam is one of his best examples of using the power of civic tech to help others. Akbar is interested in expanding his knowledge in civic tech, user research, and software engineering and is open to new opportunities to grow and learn.
Brian Morte Hernandez
Brian Hernandez graduated from Hunter College in June of 2018 with a Bachelor’s degree in Computer Science. He has been absolutely enamored with Data Science and Machine Learning ever since he watched a neural network program overcome one of the hardest levels in his favorite video game. After completing a few courses and programs focused on data, he found his ultimate data-driven learning experience in DS3. Outside of technology, he loves dancing, singing, and doing handstands.
Phuong (Phoebe) Nguyen is a rising Senior at CUNY Baruch College, majoring in Computer Information Systems and specializing in Data Analytics. She found her passion for technology and education during her gap year when she filmed her first documentary about parenting methods in South East Asian countries. Her ambition is to work at the forefront of technology, finding innovative solutions to the world’s problems and serving as an advocate for underprivileged children and young women to get involved in tech.
Sasha Paulovich will be graduating from Fordham University in December 2018 with a Bachelor’s degree in Computer Science. Originally a dancer, her interest in computer science stems from her desire to apply her creative tendencies to analytical challenges. For her, DS3 presented an opportunity to share the patterns and stories hidden in mountains of data. In addition to DS3, Sasha also participated in the Interactive Telecommunications Program Camp at NYU this past June, where she was able to explore the world of interactive/immersive technology. Inspired by her experiences this summer, Sasha hopes to pursue the intersection of interactive technology and data science, and discover ways in which she can communicate data-driven narratives.
Amanda Rodriguez is a senior at Lehman College, majoring in Mathematics and Computer Science. She plans to apply for PhD programs in Theoretical Computer Science during Fall 2018. In addition to her research interests, she loves to play tennis, hike, and travel. She spent her previous summer doing research in Colombia and traveling throughout Europe. After her time with DS3, she will participate in a service trip to Barranquitas, Puerto Rico to help those affected by Hurricane Maria through the CUNY Service Corps – Puerto Rico initiative.
Ayliana Teitelbaum is a rising Junior at Stern College for Women (Yeshiva University) studying Computer Science and Biology. She was introduced to computer science in high school, and knew she wanted to go into coding. However, she is also intrigued by biology, and hopes to combine both disciplines. Last summer, she worked in a program teaching elementary school students how to code while getting them excited about technology and software engineering. She enjoys using data to answer practical questions that come up in her and others’ daily lives, and hopes to continue to use technology to improve the lives of those around her.
Anandini Chawla is a rising sophomore at NYU studying Computer Science and Mathematics. After high school, she worked at a nonprofit in rural India which got her interested in civic tech, a field she hopes to explore and dig deeper into.
David Futran is a 5th year student in Macaulay Honors at Queens College, majoring in computer science with a minor in mathematics. Among his large range of interests, he loves to read, cook, and hike. He just spent a semester in Japan, which also deepened his interest in learning Japanese. He wanted to be part of the DS3 program in order to delve into Data Science, a field of study that has always intrigued him and that he is considering pursuing in graduate school next year. He was also excited to be in a program run by Microsoft’s Data Science researchers and learn from them.
Rosemarie (Ro) Liriano
Rosemarie (Ro) Liriano is a rising Junior at CUNY Lehman College majoring in Computer Science with a minor in Sociology. Her interests range between social justice, artificial intelligence and machine learning, and public policy. She would like to work at the intersection of technology and social science in the future and for this reason she was excited to be accepted into the Microsoft DS3 program where she hopes to learn how to combine her love for technology and sociology in a way that could help impact change in the world.
Keri Mallari is a rising junior at Lehman College majoring in Math and Computer Science. She applied to this program because she took a data-based class called the Future of New York City, and it was the best Computer Science class she’s taken in college. She wants to explore more data sets and learn more about data science.
Francois Mertil is a senior at New York City College of Technology, CUNY, majoring in Applied Math with a focus in Information Science.
Ilana Radinsky is a rising Junior at Stern College for Women (Yeshiva University) studying Computer Science and Math. She applied to the Microsoft Research Data Science Summer School because she believes in the power of data science – as an emerging field that bridges the gap between so many disciplines and industries, data science is a discipline with a huge potential for positive impact on mankind.
Rivka Schuster is a student at Touro College and a rising senior majoring in Computer Programming. She was thrilled to join the DS3 class because she enjoys learning and is intrigued by data. She’s looking forward to collaborating on a project with other students and Microsoft researchers.
Thoa Ta is a rising senior at St. John’s University, majoring in Computer Science with a minor in Social Justice as part of being an Ozanam Scholar at St. John’s. Her passion is to leverage technology for social good and environmental sustainability. After a long time struggling to find her place in Computer Science, she finally feels at home in DS3, where she learns powerful tools to make an impact through insight discovery. On a personal level, she comes from Vietnam and sees spiritual enlightenment as her lifelong pursuit.
Fatima Chebchoub is a rising senior at New York City College of Technology. She is a Computer Systems Technology major and is interested in software engineering. Fatima was born and raised in Morocco, and she moved to the USA about five years ago. In her first semester, Fatima went from a student struggling with a new foreign language to winning first place for the best essay in New York at the Literary Arts Festival at New York City College of Technology. Fatima speaks four other foreign languages. She enjoys writing and dreams of one day writing a book. At the moment, she holds a part-time job as a software developer at her school.
Kaciny Calixte graduated from SUNY Old Westbury in May of 2016 with a Bachelor’s degree in Computer and Information Science. She discovered her love for coding mid-Sophomore year and never looked back. Her current interests include web development, data science and machine learning. Her long-term goal is to further her educational journey by attending graduate school. During the Fall 2016 semester, she is set to complete an internship focused on bioinformatics, plant genomics, and machine learning at a Department of Energy national laboratory.
Jacqueline Curran graduated in May 2016 from Manhattan College with a degree in Economics and Business Analytics. During her time at Manhattan, she was a member of Beta Gamma Sigma, Alpha Iota Delta, and Manhattan College’s Federal Reserve Challenge Team. She was the recipient of the Richard J. Carey Medal for Economics, which is awarded annually to the top student in the department. Following the DS3 program, she will begin her career at KPMG in their Global Mobility Services Division.
Louise Lai is a rising Junior at NYU Stern School of Business, double majoring in Business & Political Economy and Computer Science. She has many interests – startups, data science and politics, but is primarily fascinated with artificial intelligence. She hopes that technology will change the world to be a happier and more equal place, and aspires to work to make that happen in the future. She is an avid corgi fan, although she owns no pets because she lives in between Malaysia and Australia.
Abraham Neuwirth is an undergraduate student at Touro College where he is majoring in Computer Science and minoring in Mathematics. He is passionate about fields where these two disciplines intersect such as data science and machine learning. His favorite R package is dplyr and his operator of choice is the pipe. When he isn’t sitting in front of a computer screen, he daydreams about working out.
Jai Punjwani is a rising junior at Adelphi University studying computer science. He loves programming in Java (his “native” tongue) and has even developed an Android app that allows students to find each other and study at his university. In the future, he wishes to join the field of cryptography so that he can strengthen security in a world with more data than ever. Aside from coding, Jai loves reading the Game of Thrones series and also enjoys dancing on his school’s Bhangra team.
Erica Ram recently graduated with a Bachelor of Science degree in Computer Science with a Mathematics minor from Adelphi University. She has been writing code since high school, and finds working with data using code very interesting because of the variety of possibilities and applications. She plans to attend graduate school for Computer Science in Fall of 2017.
Marieme Toure is a rising senior at CUNY New York City College of Technology, starting to look for roles in the financial industry. After obtaining her high school diploma in Senegal, Marieme came back to the US for her college education. She is majoring in Applied Mathematics with a concentration in Information Science and minoring in Computer Science. As an undergrad, she is fascinated by the invisible forces that shape our world. Why does one company succeed and another fail? Is it possible to predict which idea will be the next big thing? Marieme enjoyed predicting yellow cab drivers’ efficiency this summer at Microsoft Research. She is planning to go to graduate school, get a Master’s degree in Quantitative Finance, and work at the most prestigious firms. As a big sports fan, Marieme is supporting the American team at the Olympic Games.
Eiman Ahmed is a rising Sophomore at Pace University where she is majoring in Computer Science and minoring in Mathematics and in Statistics. She first fell in love with coding when she took her first Java course in high school and now works as an app developer at her university’s technology consultancy. In her free time, she enjoys watching T.V. shows like Criminal Minds and going on long walks with her friends.
Glenda Ascencio is an undergraduate student living in New York City with entrepreneurial, mathematical, and software development skills. She’s majoring in mathematics and minoring in computer science at St. Joseph’s College. She loves challenging her mind by finding solutions to different programming problems. In her spare time, one of the things she loves to do is to program with Java, Python, HTML, CSS, and R. One of her fervent desires is to pursue a career in data analytics so she can inform and educate the citizenry of the USA and Honduras because she wants to stop violence, poverty, and ignorance.
Shannon Evans was born in St. Lucia and is currently an international student at New York City College of Technology. An Applied Mathematics major with a concentration in finance, his ultimate goal is to become a financial analyst. He lives for the opportunity to become a world changer; to design financial systems that improve our lives. He also loves sports, particularly soccer, table tennis, and cricket.
Thomas Patino is a senior at Skidmore College majoring in Business Management. Thomas has worked on various projects involving neighborhood improvement and urban planning. With his experience at Microsoft, Thomas hopes to bridge the gap between the technology industry and Latinos. Thomas anticipates going to graduate school in the computer science field to utilize data and find creative solutions to community development.
Nikki Hanson, known to friends as Riley, is a rising senior at Queens College interested in software engineering, gaming, Japanese and increasing diversity in tech. A bit of a latecomer to the game, they were writing code before they knew what it was, and that path inevitably led them to return to school for a Bachelor of Science in Computer Science and a Bachelor of Arts in Math. Their favorite subject by far is Computational Theory, and they hope to get into Cryptography next.
Anastassiya Neznanova is a current honor transfer student at Queens College. She recently completed her undergraduate research in mathematics and published her paper in the International Journal of Undergraduate Research and Creative Activities. Anastassiya aims to pursue her BS in Computer Science and sees her career in entrepreneurship.
Riva Tropp is from Teaneck, New Jersey. A Computer Science minor at Yeshiva University, she enjoys the opportunities for exploration Data Science provides. Co-president of her university’s computer science club, Riva occupies herself tutoring, scripting, and planning club activities. Her favorite subway station is at 14th street and Eighth avenue.
Steven Vasquez was born and raised in the Bronx where he currently resides. He attends Manhattan College, where he is studying Computer Science and minoring in mathematics. Steven is a brother of the fraternity Delta Kappa Epsilon and loves sports as much as he loves solving problems. He is excited to learn and grow this summer.
Jahaziel Guzman (Brooklyn College) was born in San Salvador, El Salvador and has been living in Brooklyn since 1996. He has had an interest in music and visual art since he was a child. In his freshman year of college, he developed an interest in math and programming, and also had the opportunity to work in a biology lab at Brooklyn College doing bioinformatics work.
Donald Hanson II
Donald Hanson II is from Laurelton, New York. He is a computer science major with a minor in music at Adelphi University. Right now, he is an IT student worker at his school and is currently in the process of creating his own website using HTML and CSS. People usually say that he is a guy who likes to stay positive and motivated, and he thinks that describes him very well; he always tries to make the best of every situation.
Afzal Hossain is a junior at New York City College of Technology. He has an associate degree in Computer Science and is now studying Applied Mathematics in Finance. His goal is to study data science in graduate school.
Khanna Pugach is a junior at Baruch College majoring in computer information systems with a math minor. She is an international student from Russia, and this is her third year in the US. She likes Nora Ephron’s books and tennis.
Franky Rodriguez was born in Mexico, grew up in Miami, and is now doing a double major in mathematics and computer information technology at St. Joseph’s College, Brooklyn. He loves challenging his mind and finding solutions and applications to many different problems. He has worked on various applications including writing a Java program that recognizes melodies by converting musical notes into relative semitones and durations. In his spare time he indulges in playing and composing music.
Derek Sanz was born in 1993 to Dominican parents in Brooklyn, New York. It was 2011 when he entered Brooklyn College, took the introductory computer science course, and entered a non-stop frenzy of hard work and love for learning. 2013 was a year of fun: 9 computer science courses, a one-month study abroad trip to China, one internship and one fellowship.
Briana Vecchione is a rising CS Junior at Pace University. Though relatively new to the field, Briana is a member of both the Pforzheimer Honors College and the Seidenberg Creative Lab on campus. Her background consists mostly of web design, game design, and app development. In addition to DS3, Briana is also in the process of developing educational applications for international implementation in Senegal. She anticipates getting her PhD and working to utilize technology in developing regions.
Siobhan Wilmot-Dunbar is a Junior at Pace University studying computer science and minoring in digital design. She is also a part of Seidenberg Creative Labs, a web development and research group at Pace, and has done coding in Java, HTML, and CSS. Besides that, Siobhan plays piano, acoustic guitar, and steel drums, and has high hopes of one day combining her ability in computing with her love for music and visual arts.
2014 Microsoft Research Data Science Summer School (DS3)
2015 Microsoft Research Data Science Summer School (DS3)
2016 Microsoft Research Data Science Summer School (DS3)
2017 Microsoft Research Data Science Summer School (DS3)
2018 Microsoft Research Data Science Summer School (DS3)
2019 Microsoft Research Data Science Summer School (DS3)
Exploring the Reliability of the NYC Subway System
Akbar Mirza, Brian Hernandez, Amanda Rodriguez, Renzhentaxi Baerde, Phoebe Nguyen, Peter Farquharson, Ayliana Teitelbaum, Sasha Paulovich
The New York City subway is the largest rapid transit system in the world, serving approximately 5.5 million riders each day. Recently there has been a growing concern over the state of the subway system due to aging equipment as reflected in system-wide metrics such as “on-time percentage”, or how often trains run according to schedule. While these metrics provide some insight into the performance of the subway system, they fail to capture how riders experience the system. In this project we use recently released countdown clock data that logs where each train is reported to be at each minute of the day to gain a better understanding of how riders experience the subway system. We examine rider wait times and trip times, considering not just average but also worst-case performance of the system. We also compare the subway to above ground travel, investigate how changes to the system affect rider options, and look at how commutes vary across demographic groups. We find that the subway is typically quite reliable, but that averages can be misleading: variance in subway performance can account for up to a 50% difference between average and worst-case travel times. We also find a correlation between income and commute times and that small changes to the system (e.g., adding or removing stops or lines) can have large effects on riders’ options.
Watch the talk for more details. Source code for this project is available on GitHub.
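The gap between average and worst-case rider experience described in the subway project above can be illustrated with a small sketch. This is not the project's code: the arrival times are made up, the helper names are hypothetical, and the 95th percentile is only one possible definition of "worst case".

```python
from bisect import bisect_left
from statistics import mean


def wait_times(train_arrivals, rider_arrivals):
    """Minutes each rider waits for the next train.

    Riders who show up after the last recorded train are skipped.
    """
    trains = sorted(train_arrivals)
    waits = []
    for t in rider_arrivals:
        i = bisect_left(trains, t)  # index of the first train at or after time t
        if i < len(trains):
            waits.append(trains[i] - t)
    return waits


def percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # ceil(q * n / 100) without floats
    return ordered[rank - 1]


# Hypothetical arrivals at one station (minutes): regular service with one long gap.
trains = [0, 5, 10, 15, 35, 40]
riders = range(0, 40)  # assumption: one rider shows up every minute

waits = wait_times(trains, riders)
avg_wait = mean(waits)
worst_wait = percentile(waits, 95)
print(f"average wait: {avg_wait:.2f} min, 95th percentile wait: {worst_wait} min")
```

A single 20-minute gap barely moves the average but dominates the tail, which is why the two numbers are worth reporting separately.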
Student Trajectories and School Choice in the NYC Public School System
Keri Mallari, David Futran, Francois Mertil, Ilana Radinsky, Anandini Chawla, Rivka Schuster, Ro Liriano, Thoa Ta
New York City serves over one million public school students each year, yet relatively little is understood about how students progress through the school system. In this talk we use individual-level student data over a ten-year period to explore how early test performance correlates with later success, to describe and predict which students leave the public school system, and to examine effects of the recently implemented high school choice system.
Watch the talk or read the paper for more details. Source code for this project is available on GitHub.
Airbnb: Predicting Loyalty
Louise Lai, Kaciny Calixte, Jacqueline Curran, and Erica Ram
The advent of the sharing economy has redefined the way firms do business. Airbnb has led this revolution. With a valuation of $25 billion, it has become the world’s third most valuable startup and has more rooms than the world’s largest hotel chain. Historically, customer loyalty was based on experience with a particular firm, but now it is based on experiences with many individuals. We chose to use the Inside Airbnb dataset to further investigate the evolving idea of loyalty. Airbnb has both hosts and guests as customers. Host loyalty is defined as a host renting consistently, and guest loyalty as guests returning frequently. We used decision trees to look at the loyalty of both hosts and guests. No matter the industry, market experts stand by measures of recency and frequency to predict loyalty. However, our model is able to improve upon this idea with added features, such as review text and amenities. The end result is a model that successfully predicts the return rate of hosts and guests to Airbnb with a high level of accuracy.
Watch the talk or read the paper for more details. Source code for this project is available on GitHub.
Fare Share: Flow and Efficiency in NYC’s Taxi System
Jai Punjwani, Abraham Neuwirth, Marieme Toure, and Fatima Chebchoub
New York City is home to millions of people who rely on its robust transportation system. The taxi system plays a critical role in helping people navigate the city. With access to information about every single trip that occurred in a yellow taxi in 2013, we were able to reveal patterns in how people move throughout the city. We also analyzed driver efficiency, showing that there is a substantial skill involved in driving a taxi, with some drivers consistently earning up to 30% more than average. Finally, we used the highly granular nature of this data to identify the locations of redundant trips, and showed that a simple carpooling strategy could reduce the amount of money spent on taxis and the number of taxi trips taken by upwards of 7%.
Watch the talk or read the paper for more details. Source code for this project is available on GitHub, as is an interactive map of travel patterns across neighborhoods.
The Cost of Public School
Thomas Patino, Anastassiya Neznanova, Nikki Hanson, and Glenda Ascencio
New York City is home to the largest public school system in the country, which contains some of the best and worst schools in the state. Given this diversity, which often occurs over small geographic regions, there is extremely high demand for homes zoned for the best public schools in the city. We investigate and quantify this demand by analyzing over 10,000 home sales in different school zones across the city and reveal the implicit cost of purchasing a home zoned for each elementary school in the city.
Watch the talk for more details. Source code for this project is available on GitHub, as is an interactive map of school zone prices.
The Ins and Outs of the New York City Subway System
Eiman Ahmed, Shannon Evans, Riva Tropp, and Steven Vazquez
Every day, the population of New York regions shrinks and swells as people travel into and around the city. With six million daily trips, the subway system is one of the main conduits for these travelers, but relatively little is known about the flow of subway passengers throughout the day. Using MTA’s public datasets, our team mapped the paths commuters take and, consequently, the substantial changes to the population in the city’s many regions.
Watch the talk for more details. Source code for this project is available on GitHub.
Briana Vecchione, Franky Rodriguez, Donald Hanson II, Jahaziel Guzman
Bike sharing is an internationally implemented system for reducing public transit congestion, minimizing carbon emissions, and encouraging a healthy lifestyle. Since New York City’s launch of the CitiBike program in May 2013, however, various issues have arisen due to overcrowding and uneven flow of bikes between stations. In response to these issues, CitiBike employees redistribute bicycles by vehicle throughout the New York City area. During the past year, over 500,000 bikes have been redistributed in this fashion. This solution is financially taxing, environmentally and economically inefficient, and often suffers from timing issues. What if CitiBike instead used its clientele to redistribute bicycles? In this talk, we describe the data analysis that we conducted in hopes of creating an incentive and rerouting scheme for riders to self-balance the system. We anticipate that we can reduce the number of vehicle trips needed for redistribution by offering financial incentives to take bikes from relatively full stations and return bikes to relatively empty stations (with rerouting advice provided via an app).
For more details, please see our paper and talk.
An Empirical Analysis of Stop-and-Frisk in New York City
Md. Afzal Hossain, Khanna Pugach, Derek Sanz, Siobhan Wilmot-Dunbar
Between 2006 and 2012, the New York City Police Department made roughly four million stops as part of the city’s controversial stop-and-frisk program. We empirically study two aspects of the program by analyzing a large public dataset released by the police department that records all documented stops in the city. First, by comparing to block-level census data, we estimate stop rates for various demographic subgroups of the population. We find that the average annual number of stops of young black men exceeds the number of such individuals in the general population. This disparity is even more pronounced when we account for geography, with the number of stops of young black men in certain neighborhoods several times greater than the number of such individuals in the local population. Second, we statistically analyze the reasons recorded in our data that officers state for making each stop (e.g., “furtive movements” or “sights and sounds of criminal activity”). By comparing which stated reasons best predict whether a suspect is ultimately arrested, we develop simple heuristics to aid officers in making better stop decisions.
For more details, please see our paper and talk.
Frequently Asked Questions
What types of students are you looking for?
One of the aims of DS3 is to help increase diversity, broadly defined, in computer science graduate programs. We are looking for college students in the New York City area who can help us meet this goal.
Are there specific requirements for participating in the program?
Applicants must be currently enrolled in an undergraduate program in the New York City area. Other than that, there are no specific course prerequisites, but a familiarity with computer programming and/or statistics is helpful.
Is housing provided?
No. The program is intended for students who already reside in the New York City area. However, a $5,000 stipend is provided.
Where is the program held?
In the New York City office of Microsoft Research: 641 Avenue of the Americas.
Can I receive college credit for participating?
We do not offer college credit, but you may be able to receive credit through your home institution.
Do I need to have my own computer?
No. We will provide a laptop computer for you to use during the program, and you will be able to keep it at the end of the summer.
I will be graduating this coming spring. Am I still eligible to participate in the program?
Yes. Graduating seniors are welcome to apply.
I am a graduate student. Am I eligible to participate in the program?
No. The program is for undergraduate students.
I am not a U.S. citizen. Am I still eligible to participate in the program?
Yes. The program is not restricted to U.S. citizens. However, you are responsible for obtaining appropriate approvals from your college regarding any immigration requirements. As mentioned above, you should be currently enrolled in an undergraduate program in the New York City area.
Who can I contact for further information?
You can reach us by email at firstname.lastname@example.org.
We are no longer accepting applications for Summer 2019. Decisions for those who applied will be announced via email in early May.
If you have any questions about the Data Science Summer School, please see the frequently asked questions.