The Data Science Summer School (DS3) is an intensive, eight-week hands-on introduction to data science for college students in the New York City area. As we are committed to increasing diversity in computer science, we strongly encourage women, minorities, and individuals with disabilities to apply.
Each student receives a $5,000 stipend for participating in the program, as well as a laptop.
DS3 includes both coursework in data science and group research projects. The summer school is taught by leading scientists at Microsoft Research, and is held at the new Microsoft Research office in the heart of New York City.
June 11, 2018 to August 3, 2018 (8 weeks)
Mondays, Wednesdays & Thursdays: 10:30am – 6:30pm
Tuesdays: 10:30am – 8:30pm
Fridays: 10:30am – 5:00pm
Dinner provided on Tuesdays, and lunch provided on Thursdays
Number of Participants
We typically have capacity for eight students.
The program is designed for upper-level undergraduate students attending college in the New York City area who are interested in attending computer science graduate school and who would benefit from an intensive introduction to data science. We seek to increase diversity in computer science, and so we especially encourage women, minorities, individuals with disabilities, and students from smaller colleges to apply to the program.
This introduction to data science will cover tools and techniques for acquiring, cleaning, and utilizing real-world data for research purposes. In contrast to traditional coursework, where one is often handed a prepackaged dataset obtained by a third party and prepared for a specific exercise, research projects often involve not only cleaning and preparing “messy” data but also acquiring that data oneself (e.g., through an API). The initial phase of these projects involves a good deal of exploratory analysis to gain a preliminary understanding of the dataset. Students will be introduced to scripting (on the command line and with Python and R) for these purposes, and will gain direct experience in acquiring and modeling data from online sources.
The course also serves as an introduction to problems in applied statistics and machine learning. We will cover the theory behind simple but effective methods for supervised and unsupervised learning. Emphasis will be on formulating real-world modeling and prediction tasks as optimization problems and comparing methods in terms of practical efficacy and scalability. Students will learn to fit and evaluate such models, with applications including spam filtering and recommendation systems.
All course material from the 2017 program is available on GitHub.
Students will work on an original research project in groups, led by Microsoft research scientists.
MSR NYC Data Science Summer School 2018
- Week 1: Git(hub), command line, exploratory data analysis in R
- Week 2: Statistics and machine learning
- Week 3: Causality and experiments
- Week 4: Python, APIs, etc.
Matt Goldman is an empirical microeconomist and a Researcher in Microsoft’s Chief Economist’s Office. His research interests include Digital Markets, Behavioral Economics, and Theoretical Econometrics. At Microsoft, he works on the economics of sponsored search auctions, on hardware pricing, and on using machine learning and data mining tools to automate economic models of causal inference as part of Microsoft Research’s Project Alice Expedition.
Dan Goldstein works at the intersection of behavioral economics and computer science. Prior to joining Microsoft, Dan was a Principal Research Scientist at Yahoo Research and a marketing professor at London Business School. He received his Ph.D. from the University of Chicago and has taught and researched at Columbia, Harvard, Stanford, and the Max Planck Institute in Germany, where he was awarded the Otto Hahn Medal in 1997. His academic writings have appeared in journals from Science to Psychological Review. Dan is a member of the Academic Advisory Board of the UK’s Behavioral Insights Team (aka Britain’s “nudge unit”). He was elected President of the Society for Judgment and Decision Making for 2015–2016.
Jake Hofman is a Senior Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to large-scale social data. Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.
Jacob LaRiviere is an economist at Microsoft in Preston McAfee’s group in the Office of the Chief Economist. His main research interests are Industrial Organization, Environmental & Public Economics, and Behavioral Economics. He uses applied theory to inform microeconometric and experimental empirical techniques. Jacob holds a PhD in Economics from UC San Diego and a BA in Economics from UC Berkeley. He is also an affiliate faculty member in the economics department at the University of Washington and an adjunct assistant professor of economics at the University of Tennessee, where he is also a Fellow for Energy and Environmental Policy at the Baker Center for Public Policy.
Siddhartha Sen is a Researcher at Microsoft Research in New York City and was previously at the MSR Silicon Valley lab. He creates distributed systems that use novel data structures and algorithms to deliver unprecedented functionality or performance. Some of his data structures have been incorporated into undergraduate textbooks and curricula. Recently, he has generalized this approach to use contextual machine learning to optimize decisions in distributed systems infrastructure. Siddhartha received his BS degrees in computer science and mathematics and his MEng degree in computer science from MIT. From 2004 to 2007 he worked as a developer at Microsoft and built a network load balancer for Windows Server. He completed his PhD at Princeton University in 2013. Siddhartha received the first Google Fellowship in Fault-Tolerant Computing in 2009, the best student paper award at PODC 2012, and the best paper award at ASPLOS 2017.
William Cai is a Research Assistant at Microsoft Research in New York City, where he works with the computational social science group. He is interested in applying techniques from machine learning and computer science to derive insights into political and social data. He holds a B.S. in Computer Science from Yale University.
Anandini Chawla is a rising sophomore at NYU studying Computer Science and Mathematics. After high school, she worked at a nonprofit in rural India, an experience that sparked her interest in civic tech, a field she hopes to explore more deeply.
David Futran is a 5th year student in Macaulay Honors at Queens College, majoring in computer science with a minor in mathematics. Among his many interests, he loves to read, cook, and hike. He just spent a semester in Japan, which deepened his interest in learning Japanese. He wanted to be part of the DS3 program in order to delve into data science, a field that has always intrigued him and that he is considering pursuing in graduate school next year. He was also excited to be in a program run by Microsoft’s data science researchers and to learn from them.
Rosemarie (Ro) Liriano is a rising junior at CUNY Lehman College majoring in Computer Science with a minor in Sociology. Her interests include social justice, artificial intelligence and machine learning, and public policy. She would like to work at the intersection of technology and social science in the future, and for this reason she was excited to be accepted into the Microsoft DS3 program, where she hopes to learn how to combine her love for technology and sociology in ways that help effect change in the world.
Keri Mallari is a rising junior at Lehman College majoring in Math and Computer Science. She applied to this program because she took a data-focused class called the Future of New York City, and it was the best computer science class she has taken in college. She wants to explore more datasets and learn more about data science.
Francois Mertil is a senior at New York City College of Technology, CUNY, majoring in Applied Math with a focus on Information Science.
Ilana Radinsky is a rising junior at Stern College for Women (Yeshiva University) studying Computer Science and Math. She applied to the Microsoft Research Data Science Summer School because she believes in the power of data science: as an emerging field that bridges so many disciplines and industries, it has huge potential for positive impact on mankind.
Rivka Schuster is a student at Touro College and a rising senior majoring in Computer Programming. She was thrilled to join the DS3 class because she enjoys learning and is intrigued by data. She’s looking forward to collaborating on a project with other students and Microsoft researchers.
Thoa Ta is a rising senior at St. John’s University, majoring in Computer Science with a minor in Social Justice as part of being an Ozanam Scholar at St. John’s. Her passion is to leverage technology for social good and environmental sustainability. After a long time struggling to find her place in computer science, she finally feels at home in DS3, where she is learning powerful tools for making an impact through data-driven insight. On a personal note, she comes from Vietnam and sees spiritual enlightenment as her lifelong pursuit.
Fatima Chebchoub is a rising senior at New York City College of Technology. She is a Computer Systems Technology major and is interested in software engineering. Fatima was born and raised in Morocco and moved to the USA about five years ago. In her first semester, Fatima went from struggling with a new language to winning first place for best essay at the Literary Arts Festival at New York City College of Technology. Fatima speaks four other languages. She enjoys writing and dreams of one day writing a book. At the moment, she holds a part-time job as a software developer at her school.
Kaciny Calixte graduated from SUNY Old Westbury in May of 2016 with a Bachelor’s degree in Computer and Information Science. She discovered her love for coding mid-Sophomore year and never looked back. Her current interests include web development, data science and machine learning. Her long-term goal is to further her educational journey by attending graduate school. During the Fall 2016 semester, she is set to complete an internship focused on bioinformatics, plant genomics, and machine learning at a Department of Energy national laboratory.
Jacqueline Curran graduated in May 2016 from Manhattan College with a degree in Economics and Business Analytics. During her time at Manhattan, she was a member of Beta Gamma Sigma, Alpha Iota Delta, and Manhattan College’s Federal Reserve Challenge Team. She was the recipient of the Richard J. Carey Medal for Economics, which is awarded annually to the top student in the department. Following the DS3 program, she will begin her career at KPMG in their Global Mobility Services Division.
Louise Lai is a rising junior at NYU Stern School of Business, double majoring in Business & Political Economy and Computer Science. She has many interests, including startups, data science, and politics, but she is primarily fascinated by artificial intelligence. She hopes that technology will change the world to be a happier and more equal place, and aspires to help make that happen in the future. She is an avid corgi fan, although she owns no pets because she splits her time between Malaysia and Australia.
Abraham Neuwirth is an undergraduate student at Touro College where he is majoring in Computer Science and minoring in Mathematics. He is passionate about fields where these two disciplines intersect such as data science and machine learning. His favorite R package is dplyr and his operator of choice is the pipe. When he isn’t sitting in front of a computer screen, he daydreams about working out.
Jai Punjwani is a rising junior at Adelphi University studying computer science. He loves programming in Java (his “native” tongue) and has even developed an Android app that allows students to find each other and study at his university. In the future, he wishes to join the field of cryptography so that he can strengthen security in a world with more data than ever. Aside from coding, Jai loves reading the Game of Thrones series and also enjoys dancing on his school’s Bhangra team.
Erica Ram recently graduated with a Bachelor of Science degree in Computer Science with a Mathematics minor from Adelphi University. She has been writing code since high school, and finds working with data using code very interesting because of the variety of possibilities and applications. She plans to attend graduate school for Computer Science in Fall of 2017.
Marieme Toure is a rising senior at CUNY New York City College of Technology and is starting to look for roles in the financial industry. After obtaining her high school diploma in Senegal, Marieme came back to the US for college. She is majoring in Applied Mathematics with a concentration in Information Science and minoring in Computer Science. As an undergraduate, she is fascinated by the invisible forces that shape our world. Why does one company succeed and another fail? Is it possible to predict which idea will be the next big thing? This summer at Microsoft Research, Marieme enjoyed predicting the efficiency of yellow cab drivers. She plans to go to graduate school for a master’s degree in Quantitative Finance and to work at the most prestigious firms. A big sports fan, Marieme is supporting the American team at the Olympic Games.
Eiman Ahmed is a rising sophomore at Pace University where she is majoring in Computer Science and minoring in Mathematics and in Statistics. She first fell in love with coding when she took her first Java course in high school and now works as an app developer at her university’s technology consultancy. In her free time, she enjoys watching T.V. shows like Criminal Minds and going on long walks with her friends.
Glenda Ascencio is an undergraduate student living in New York City with entrepreneurial, mathematical, and software development skills. She is majoring in mathematics and minoring in computer science at St. Joseph’s College. She loves challenging her mind by finding solutions to different programming problems. In her spare time, she loves to program in Java, Python, HTML, CSS, and R. One of her fervent desires is to pursue a career in data analytics so that she can inform and educate citizens of the USA and Honduras, because she wants to help end violence, poverty, and ignorance.
Shannon Evans was born in St. Lucia and is currently an international student at New York City College of Technology. An Applied Mathematics major with a concentration in finance, his ultimate goal is to become a financial analyst. He lives for the opportunity to become a world changer, designing financial systems that improve our lives. He also loves sports, particularly soccer, table tennis, and cricket.
Thomas Patino is a senior at Skidmore College majoring in Business Management. Thomas has worked on various projects involving neighborhood improvement and urban planning. With his experience at Microsoft, Thomas hopes to bridge the gap between the technology industry and Latinos. Thomas anticipates going to graduate school in computer science to use data to find creative solutions for community development.
Nikki Hanson, known to friends as Riley, is a rising senior at Queens College interested in software engineering, gaming, Japanese and increasing diversity in tech. A bit of a latecomer to the game, they were writing code before they knew what it was, and that path inevitably led them to return to school for a Bachelor of Science in Computer Science and a Bachelor of Arts in Math. Their favorite subject by far is Computational Theory, and they hope to get into Cryptography next.
Anastassiya Neznanova is an honors transfer student at Queens College. She recently completed undergraduate research in mathematics and published her paper in the International Journal of Undergraduate Research and Creative Activities. Anastassiya aims to pursue her BS in Computer Science and envisions a career in entrepreneurship.
Riva Tropp is from Teaneck, New Jersey. A Computer Science minor at Yeshiva University, she enjoys the opportunities for exploration that data science provides. Co-president of her university’s computer science club, Riva occupies herself tutoring, scripting, and planning club activities. Her favorite subway station is at 14th Street and Eighth Avenue.
Steven Vasquez was born and raised in the Bronx, where he currently resides. He attends Manhattan College, where he is studying Computer Science and minoring in mathematics. Steven is a brother of the Delta Kappa Epsilon fraternity and loves sports as much as he loves solving problems. He is excited to learn and grow this summer.
Jahaziel Guzman (Brooklyn College) was born in San Salvador, El Salvador and has been living in Brooklyn since 1996. He has had an interest in music and visual art since he was a child. In his freshman year of college, he developed an interest in math and programming, and also had the opportunity to work in a biology lab at Brooklyn College doing bioinformatics work.
Donald Hanson II is from Laurelton, New York. He is a computer science major with a minor in music at Adelphi University. Right now, he is an IT student worker at his school and is in the process of creating his own website using HTML and CSS. People usually say that he is a guy who likes to stay positive and motivated, and he thinks that describes him very well; he always tries to make the best of every situation.
Afzal Hossain is a junior at New York City College of Technology. He has an associate degree in Computer Science and is now studying Applied Mathematics in Finance. His goal is to study data science in graduate school.
Khanna Pugach is a junior at Baruch College majoring in computer information systems with a math minor. She is an international student from Russia, and this is her third year in the US. She likes Nora Ephron’s books and tennis.
Franky Rodriguez was born in Mexico, grew up in Miami, and is now double majoring in mathematics and computer information technology at St. Joseph’s College, Brooklyn. He loves challenging his mind and finding solutions and applications to many different problems. He has worked on various applications, including a Java program that recognizes melodies by converting musical notes into relative semitones and durations. In his spare time he indulges in playing and composing music.
Derek Sanz was born in 1993 to Dominican parents in Brooklyn, New York. It was 2011 when he entered Brooklyn College, took the introductory computer science course, and entered a non-stop frenzy of hard work and love for learning. 2013 was a year of fun: 9 computer science courses, a one-month study abroad trip to China, one internship and one fellowship.
Briana Vecchione is a rising CS Junior at Pace University. Though relatively new to the field, Briana is a member of both the Pforzheimer Honors College and the Seidenberg Creative Lab on campus. Her background consists mostly of web design, game design, and app development. In addition to DS3, Briana is also in the process of developing educational applications for international implementation in Senegal. She anticipates getting her PhD and working to utilize technology in developing regions.
Siobhan Wilmot-Dunbar is a Junior at Pace University studying computer science and minoring in digital design. She is also a part of Seidenberg Creative Labs, a web development and research group at Pace, and has done coding in Java, HTML, and CSS. Besides that, Siobhan plays piano, acoustic guitar, and steel drums, and has high hopes of one day combining her ability in computing with her love for music and visual arts.
Student Trajectories and School Choice in the NYC Public School System
Keri Mallari, David Futran, Francois Mertil, Ilana Radinsky, Anandini Chawla, Rivka Schuster, Ro Liriano, Thoa Ta
New York City serves over one million public school students each year, yet relatively little is understood in terms of how students progress through the school system. In this talk we use individual-level student data over a ten year time period to explore how early test performance correlates with later success, to describe and predict which students leave the public school system, and to examine effects of the recently implemented high school choice system.
Airbrb: Predicting Loyalty
Louise Lai, Kaciny Calixte, Jacqueline Curran, and Erica Ram
The advent of the sharing economy has redefined the way firms do business, and Airbnb has led this revolution. With a valuation of $25 billion, it has become the world’s third most valued startup and has more rooms than the world’s largest hotel chain. Historically, customer loyalty was based on experience with a particular firm, but now it is based on experiences with many individuals. We chose to use the Inside Airbnb dataset to further investigate this evolving idea of loyalty. Airbnb has both hosts and guests as customers: we define host loyalty as a host renting consistently, and guest loyalty as a guest returning frequently. We used decision trees to model the loyalty of both hosts and guests. No matter the industry, market experts stand by measures of recency and frequency to predict loyalty. However, our model improves upon this idea with added features, such as review text and amenities. The end result is a model that successfully predicts the return rate of hosts and guests to Airbnb with a high level of accuracy.
Fare Share: Flow and Efficiency in NYC’s Taxi System
Jai Punjwani, Abraham Neuwirth, Marieme Toure, and Fatima Chebchoub
New York City is home to millions of people who rely on its robust transportation system. The taxi system plays a critical role in helping people navigate the city. With access to information about every single trip that occurred in a yellow taxi in 2013, we were able to reveal patterns in how people move throughout the city. We also analyzed driver efficiency, showing that there is a substantial skill involved in driving a taxi, with some drivers consistently earning up to 30% more than average. Finally, we used the highly granular nature of this data to identify the locations of redundant trips, and showed that a simple carpooling strategy could reduce the amount of money spent on taxis and the number of taxi trips taken by upwards of 7%.
Watch the talk or read the paper for more details. Source code for this project is available on GitHub, as is an interactive map of travel patterns across neighborhoods.
The Cost of Public School
Thomas Patino, Anastassiya Neznanova, Nikki Hanson, and Glenda Ascencio
New York City is home to the largest public school system in the country, which contains some of the best and worst schools in the state. Given this diversity, which often occurs over small geographic regions, there is extremely high demand for homes zoned for the best public schools in the city. We investigate and quantify this demand by analyzing over 10,000 home sales in different school zones across the city and reveal the implicit cost of purchasing a home zoned for each elementary school in the city.
The Ins and Outs of the New York City Subway System
Eiman Ahmed, Shannon Evans, Riva Tropp, and Steven Vazquez
Every day, the populations of New York’s regions shrink and swell as people travel into and around the city. With six million daily trips, the subway system is one of the main conduits for these travelers, but relatively little is known about the flow of subway passengers throughout the day. Using MTA’s public datasets, our team mapped the paths commuters take and, consequently, the substantial changes to the population in the city’s many regions.
Briana Vecchione, Franky Rodriguez, Donald Hanson II, Jahaziel Guzman
Bike sharing is an internationally implemented system for reducing public transit congestion, minimizing carbon emissions, and encouraging a healthy lifestyle. Since New York City’s launch of the CitiBike program in May 2013, however, various issues have arisen from station overcrowding and the general flow of bikes. In response to these issues, CitiBike employees redistribute bicycles by vehicle throughout the New York City area. During the past year, over 500,000 bikes have been redistributed in this fashion. This solution is financially taxing, environmentally and economically inefficient, and often suffers from timing issues. What if CitiBike instead used its clientele to redistribute bicycles? In this talk, we describe the data analysis that we conducted in hopes of creating an incentive and rerouting scheme for riders to self-balance the system. We anticipate that we can reduce the number of bikes transported by vehicle by offering financial incentives to take bikes from relatively full stations and return bikes to relatively empty stations (with rerouting advice provided via an app).
An Empirical Analysis of Stop-and-Frisk in New York City
Md. Afzal Hossain, Khanna Pugach, Derek Sanz, Siobhan Wilmot-Dunbar
Between 2006 and 2012, the New York City Police Department made roughly four million stops as part of the city’s controversial stop-and-frisk program. We empirically study two aspects of the program by analyzing a large public dataset released by the police department that records all documented stops in the city. First, by comparing to block-level census data, we estimate stop rates for various demographic subgroups of the population. We find that the average annual number of stops of young black men exceeds the number of such individuals in the general population. This disparity is even more pronounced when we account for geography: in certain neighborhoods, the number of stops of young black men is several times greater than the number of young black men living there. Second, we statistically analyze the reasons officers state in our data for making each stop (e.g., “furtive movements” or “sights and sounds of criminal activity”). By comparing which stated reasons best predict whether a suspect is ultimately arrested, we develop simple heuristics to aid officers in making better stop decisions.
Frequently Asked Questions
What types of students are you looking for?
One of the aims of DS3 is to help increase diversity, broadly defined, in computer science graduate programs. We are looking for college students in the New York City area who can help us meet this goal.
Are there specific requirements for participating in the program?
Applicants must be currently enrolled in an undergraduate program in the New York City area. Other than that, there are no specific course prerequisites, but a familiarity with computer programming and/or statistics is helpful.
Is housing provided?
No. The program is intended for students who already reside in the New York City area. However, a $5,000 stipend is provided.
Where is the program held?
In the New York City office of Microsoft Research: 641 Avenue of the Americas.
Can I receive college credit for participating?
We do not offer college credit, but you may be able to receive credit through your home institution.
Do I need to have my own computer?
No. We will provide a laptop computer for you to use during the program, and you will be able to keep it at the end of the summer.
I will be graduating this coming spring. Am I still eligible to participate in the program?
Yes. Graduating seniors are welcome to apply.
I am a graduate student. Am I eligible to participate in the program?
No. The program is for undergraduate students.
I am not a U.S. citizen. Am I still eligible to participate in the program?
Yes. The program is not restricted to U.S. citizens. However, you are responsible for obtaining appropriate approvals from your college regarding any immigration requirements. As mentioned above, you should be currently enrolled in an undergraduate program in the New York City area.
Who can I contact for further information?
You can reach us by email at email@example.com.
We are no longer accepting applications for Summer 2018. Decisions for those who applied were announced via email in early May. Applications for next summer will open early in 2019.
If you have any questions about the Data Science Summer School, please see the frequently asked questions.