Microsoft Research Data Science Summer School



Important update for 2020 and 2021: Due to COVID-19, we will be holding this year’s summer school virtually and it will be shortened to 4 weeks (down from 8 weeks).

The Data Science Summer School (DS3) is an intensive, four-week hands-on introduction to data science for college students in the New York City area. As we are committed to increasing diversity in computer science, we strongly encourage women, minorities, and individuals with disabilities to apply.

Each student receives a stipend for participating in the program, as well as a laptop. The stipend for the four-week program being run during the pandemic is $3,000.

DS3 includes both coursework in data science and group research projects. The summer school is taught by leading scientists at Microsoft Research, and is held at the new Microsoft Research office in the heart of New York City.

To learn more about DS3, please look at the program details and the frequently asked questions.



Program Details

June 1, 2021 to June 25, 2021 (4 weeks)
Mondays to Fridays: 10:00am – 5:00pm

Number of Participants

We typically have capacity for eight students.

Target Audience

Upper-level undergraduate students attending college in the New York City area who are interested in attending computer science graduate school, and who would benefit from an intensive introduction to data science. We seek to increase diversity in computer science, and so we especially encourage women, minorities, individuals with disabilities, and students from smaller colleges to apply to the program.

Course Description

This introduction to data science will cover tools and techniques for acquiring, cleaning, and utilizing real-world data for research purposes. In contrast to traditional course work, where one is often handed a prepackaged dataset obtained by a third party and prepared for a specific exercise, research projects often involve not only cleaning and preparing “messy” data, but often also acquiring that data oneself (e.g., through an API). The initial phase of these projects involves a good deal of exploratory analysis to gain a preliminary understanding of the dataset. Students will be introduced to scripting (on the command line and with Python and R) for these purposes, and will gain direct experience in acquiring and modeling data from online sources. The course also serves as an introduction to problems in applied statistics and machine learning. We will cover the theory behind simple but effective methods for supervised and unsupervised learning. Emphasis will be on formulating real-world modeling and prediction tasks as optimization problems and comparing methods in terms of practical efficacy and scalability. Students will learn to fit and evaluate such models, with applications including spam filtering and recommendation systems. All course material from the 2019 program is available on GitHub.

Research Projects

Students will work on an original research project in groups, led by Microsoft research scientists.


Updated material and links for 2021 will be posted soon.

MSR NYC Virtual Data Science Summer School 2020

All course material from the program will be posted on GitHub.


  • Week 1: Pre-requisites and background
  • Week 2: Core curriculum
  • Week 3: Extended assignments
  • Week 4: Data replication analysis projects


Current Instructors

Jake Hofman

Jake Hofman is a Senior Principal Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to large-scale social data. Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.


Siddhartha Sen

Siddhartha Sen is a Principal Researcher at Microsoft Research in New York City, and previously at the MSR Silicon Valley lab. He creates distributed systems that use novel data structures and algorithms to deliver unprecedented functionality or performance. Some of his data structures have been incorporated into undergraduate textbooks and curricula. Recently, he has generalized this approach to use contextual machine learning to optimize decisions in distributed systems infrastructure. Siddhartha received his BS degrees in computer science and mathematics and his MEng degree in computer science from MIT. From 2004 to 2007 he worked as a developer at Microsoft and built a network load balancer for Windows Server. He completed his PhD at Princeton University in 2013. Siddhartha received the first Google Fellowship in Fault-Tolerant Computing in 2009, the best student paper award at PODC 2012, and the best paper award at ASPLOS 2017.

Dan Goldstein

Dan Goldstein works at the intersection of behavioral economics and computer science. Prior to joining Microsoft, Dan was a Principal Research Scientist at Yahoo Research and a marketing professor at London Business School. He received his Ph.D. at The University of Chicago and has taught and researched at Columbia, Harvard, Stanford, and the Max Planck Institute in Germany, where he was awarded the Otto Hahn Medal in 1997. His academic writings have appeared in journals from Science to Psychological Review. Dan is a member of the Academic Advisory Board of the UK’s Behavioral Insights Team (aka Britain’s “nudge unit”). He was elected President of the Society for Judgment and Decision Making for the year 2015-2016.

Jared Katzman

Jared Katzman is a research assistant at Microsoft Research, New York, where they explore how computational tools can identify and mitigate the negative consequences of socio-technical systems, such as platforms for artificial intelligence, search, and news media. In addition, Jared is a member of the art and technology incubator NEW INC and runs a mentorship program for LGBTQ+ students interested in technology through the non-profit Out in Tech. Prior to Microsoft, Jared graduated cum laude from Yale with a B.S. in Computer Science and worked in Amazon Web Services’ AI Labs building AI-powered services.

Past Instructors

Many individuals have contributed to the Data Science Summer School in the past, notably Sharad Goel and Justin Rao, who co-founded the program.



2020 Students

Iman Abakoyas

Iman Abakoyas is a sophomore at CUNY Lehman College majoring in Computer Science with a minor in Data Science. She has a great passion for problem solving and a love for math. She’s a self-taught programmer with a drive to challenge herself and learn new skills. She works as a tutor at her college, where she supports students in tackling different programming and math-related problems. She also serves as a secretary for The Society of Hispanic Professional Engineers. She is a fellow at Management Leadership for Tomorrow. She loves classical music and enjoys playing the piano.

Rajiv Basnet

Rajiv Basnet is a rising senior at St. Joseph’s College in Brooklyn, double majoring in Computer Science and Mathematics. He rejoices in the artistic use of math involved in computational problem-solving. He aspires to be a Software Engineer and eventually wants to pursue a PhD in Computer Science and Mathematics. Having worked as a Software Engineer Intern at Tarifica and as a Web Development Technician at Caldwell University, he has also enriched his skills in these fields. He hopes, through Microsoft’s DS3 program, to further explore modern data acquisition and statistical modeling techniques so as to broaden his academic horizons and bring his ambitions to fruition. Rajiv is also a great admirer of music and spends quite a lot of his time with his guitar.

Hasanat Jahan

Hasanat Jahan is a rising junior at CUNY Queens College who is currently pursuing a Computer Science major with a Computational Linguistics minor. Her enthusiasm for the possibilities in the world of data science and machine learning inspired her to apply to the DS3 program and ultimately pursue the field in her career. In her free time, she likes to read, draw scribbles, and watch video essays.

Gabrielle Martinez

Gabrielle Martinez is a rising senior double majoring in Computer Science and Economics at Pace University. She has a fascination with the real-world applications of both fields, especially in the case of public policy and economic research. Her interest in combining the two fields began with her first economics class in her freshman year, where she felt empowered to be a force for change. For the past two summers, she has been investigating the effects of economic policy and natural disasters on public services under the guidance of her professor. She looks forward to learning how to build better models in DS3 and applying her knowledge to her policy research. In her free time, she’s a writer, an amateur archer, and a history buff.

Krushang Shah

Krushang Shah will be graduating from the City College of New York in December 2020 with a Bachelor’s degree in Computer Engineering. He is passionate about Data Science and Machine Learning. He has worked with startup teams to design innovative products such as a collision avoidance system and motion-sensing workout wear. He enjoys reading, cooking, and watching sci-fi movies in his spare time. He was excited to be a part of DS3, to gain a deeper understanding of Data Science and learn from Microsoft Researchers. His dream job would be at the intersection of Artificial Intelligence and Robotics, researching and developing products that can have an impact on the advancement of society.

Basira Shirzad

Basira Shirzad is a student at Macaulay Honors College at Queens College, majoring in Computer Science and minoring in Math and Business and Liberal Arts. Her career goal is to become a data scientist and to work in the business world, modeling and analyzing various data streams to solve complex problems. She enjoys working with data because it merges creative and unconventional thinking with technology to solve an array of challenges. Basira hopes to grow in the data science field by joining the DS3 program to enhance her skills in machine learning and data modeling. Outside of college, Basira enjoys playing badminton, working on DIY projects, and baking.

Tamar Yastrab

Tamar Yastrab is a rising senior in Stern College for Women. A long-time lover of math, Tamar developed an affinity for computer science upon entering college. She has done research on how computers process different languages, especially with non-English alphabets, and she hopes to learn more about applications of data science and machine learning to this area in DS3. Tamar is very passionate about involving women in STEM (and computer science especially!) and is proud to have led initiatives that empower women in the field. When she isn’t programming at her computer, Tamar enjoys studying Talmud, going on hikes, and playing piano.

Xiaona Zhou

Xiaona Zhou is a rising senior majoring in Applied Mathematics at New York City College of Technology (CUNY). As an undergraduate, she worked on seven undergraduate research projects in pure and applied mathematics, and her passion for coding came through mathematics. She developed a Shiny App that implements a federal tax algorithm developed by Dr. Sam Ferguson, which helps self-employed people, who buy health insurance from a government exchange, calculate the appropriate premium tax credit they are entitled to. She is also a math tutor at her college and a budget analyst intern in the Office of the Brooklyn Borough President. She wants to go to graduate school and study data science or computer science. She hopes to become a data scientist and use data for good, for a better society and a better future for all.

2019 Students

Brenda Fried

Brenda Fried is a senior at Brooklyn College majoring in computer science. She loves to tackle challenging problems and is enamored by the power of the command line. She’s been intrigued by the world of data science and machine learning ever since starting work at a computer vision startup in her sophomore year. She was excited to join DS3 to learn from an incredible group of dedicated researchers. When not coding, Brenda loves to swim, spray paint art and hike.

Harpreet Gaur

Harpreet Gaur is a rising senior majoring in Computer Systems Technology at CUNY City Tech, where she mentors and tutors students during the semester. She also works on research projects based on software development and has been a CUNY Research Scholar and an Honors Scholar at City Tech in the past. Her passion for technology dates back to her gap year, when she taught computer science and its impact to students in under-served communities in India. After completing the DS3 program, she wants to use data science and machine learning to work on existing societal predicaments that impact people on a large scale. Harpreet is a dessert enthusiast and a film-maker on the side.

Adnan Hoq

Adnan Hoq graduated as the Valedictorian from St. Joseph’s College with a double major in Mathematics & Computer Science. He has garnered experience in object-oriented programming and full-stack development under the tutelage of his mentor Dr. Callahan at NYU. Adnan created an automated tool for producing charts using R during his time as a software and data consultant at NYCDOHMH. Moreover, Adnan traveled to China to satiate his curiosity in Deep Learning when he was selected for Tsinghua University’s summer research program. Adnan possesses a unique proclivity for data science and machine learning research. He is currently trying his best to find an acceptable balance between industry and research in his career.

Emeka Mbazor

Emeka Samuel Mbazor is a rising senior at Lehman College, majoring in Computer Science and minoring in Data Science. Formerly a pre-medical student, he became interested in data and the stories contained within them after taking and enjoying a Biostatistics course. He hopes to further explore his interests and make a transition towards machine learning research in the coming year. Outside his academic and professional life, he also enjoys all things Star Wars.

Naomi Moreira

Naomi graduated from St. Joseph’s College in Brooklyn in May 2019 with a double major in Mathematics and Computer Science. She discovered her passion for mathematics and computer science during her sophomore year. After that she worked to be accepted at a research program in which she modeled the dynamics of recidivism in the state of Arizona. She enjoys listening to the Beatles and Norm Macdonald jokes.

Cindy Muso

Cindy Muso is a rising junior at St. John’s University where she is pursuing a double major in Mathematics and Computer Science. Her curiosity and love for computer science began in high school after taking a programming class. Through the DS3 program at Microsoft, Cindy has developed an interest in machine learning and looks forward to working in this particular field in the near future. One of her many goals is to pursue a PhD in Computer Science and Mathematics. She is a strong advocate for diversity and inclusion and believes that research, no matter the scope or topic, can play a huge role in advancing society. This is why Cindy wishes to devote her time to conducting research after her undergraduate studies. In addition to this, she likes to spend her time playing tennis, traveling, and having quality time with friends and family.

Etta Rapp

Etta Rapp is a junior at Stern College for Women (Yeshiva University), double majoring in Mathematics and Computer Science. She is passionate about both of these fields and the overlap between them, and was fascinated to learn about the many applications of math in computer science while at DS3. Outside of college, she enjoys reading, hiking, and spending time with her family. Etta is looking forward to attending the 2019 ACM Richard Tapia Celebration of Diversity in Computing Conference.

Roymil Terrero

Roymil Terrero recently graduated from St. Joseph’s College, double majoring in Computer Science and Mathematics. He is passionate about machine learning, data analysis, and algorithms. Roymil joined Microsoft DS3 to enrich his knowledge of machine learning and data analysis. He believes that today’s computational power should be used responsibly to extend the human ability to learn and understand what could be impossible otherwise. He is intellectually curious about how things work and hopes to someday understand and modify the Linux kernel. In his spare time, he likes to cook, do 3D modeling, and play badminton with friends.

2018 Students

Renzhentaxi Baerde

Renzhentaxi Baerde graduated from Adelphi University in May 2018 with a Bachelor’s degree in Computer Science. He dreams of being able to read a research paper and actually understand it. He also likes to downvote reddit posts and is fascinated by procedural generation. Currently, he is sending out thousands of applications, hoping to get a job before his family disowns him. He discovered his love for programming when he realized he could beat games by using a Cheat Engine. 1 out of 10 people find Taxi funny. 9 out of 10 people don’t talk to Taxi.

Peter Farquharson

Peter Farquharson is a rising senior at Lehman College majoring in Computer Science with a minor in Mathematics. He is passionate about working with Data Structures and Algorithms while incorporating tools from Data Science. He was thrilled to be a part of DS3 to gain an understanding of Data Science and to work in a research setting with diverse individuals. He is an avid gamer and enjoys playing cricket and soccer in his spare time.

Akbar Mirza

Akbar Mirza recently graduated from the City College of New York, double majoring in Computer Science and User Experience Design. He is excited about product design, UX, and software engineering. Akbar is an interdisciplinary thinker and loves to push his limits. In the past year, he’s taught a course on iOS Development, used data science to explore the MTA subway system, and launched an app with Quadrant 2 called MigraCam. Designed to help immigrants reach loved ones in emergency situations, MigraCam is one of his best examples of using the power of civic tech to help others. Akbar is interested in expanding his knowledge in civic tech, user research, and software engineering and is open to new opportunities to grow and learn.

Brian Morte Hernandez

Brian Hernandez graduated from Hunter College in June of 2018 with a Bachelor’s degree in Computer Science. He has been absolutely enamored with Data Science and Machine Learning ever since he watched a neural network program overcome one of the hardest levels in his favorite video game. After completing a few courses and programs focused on data, he found his ultimate data-driven learning experience in DS3. Outside of technology, he loves dancing, singing, and doing handstands.

Phoebe Nguyen

Phuong (Phoebe) Nguyen is a rising Senior at CUNY Baruch College, majoring in Computer Information Systems and specializing in Data Analytics. She found her passion for technology and education during her gap year when she filmed her first documentary about parenting methods in South East Asian countries. Her ambition is to work at the forefront of technology, finding innovative solutions to the world’s problems and serving as an advocate for underprivileged children and young women to get involved in tech.

Sasha Paulovich

Sasha Paulovich will be graduating from Fordham University in December 2018 with a Bachelor’s degree in Computer Science. Originally a dancer, her interest in computer science stems from her desire to apply her creative tendencies to analytical challenges. For her, DS3 presented an opportunity to share the patterns and stories hidden in mountains of data. In addition to DS3, Sasha also participated in the Interactive Telecommunications Program Camp at NYU this past June, where she was able to explore the world of interactive/immersive technology. Inspired by her experiences this summer, Sasha hopes to pursue the intersection of interactive technology and data science, and discover ways in which she can communicate data-driven narratives.

Amanda Rodriguez

Amanda Rodriguez is a senior at Lehman College, majoring in Mathematics and Computer Science. She plans to apply for PhD programs in Theoretical Computer Science during Fall 2018. In addition to her research interests, she loves to play tennis, hike, and travel. She spent her previous summer doing research in Colombia and traveling throughout Europe. After her time with DS3, she will participate in a service trip to Barranquitas, Puerto Rico to help those affected by Hurricane Maria through the CUNY Service Corps – Puerto Rico initiative.

Ayliana Teitelbaum

Ayliana Teitelbaum is a rising Junior at Stern College for Women (Yeshiva University) studying Computer Science and Biology. She was introduced to computer science in high school, and knew she wanted to go into coding. However, she is also intrigued by biology, and hopes to combine both disciplines. Last summer, she worked in a program teaching elementary school students how to code while getting them excited about technology and software engineering. She enjoys using data to answer practical questions that come up in her and others’ daily lives, and hopes to continue to use technology to improve the lives of those around her.

2017 Students

Anandini Chawla

Anandini Chawla is a rising sophomore at NYU studying Computer Science and Mathematics. After high school, she worked at a nonprofit in rural India which got her interested in civic tech, a field she hopes to explore and dig deeper into.

David Futran

David Futran is a fifth-year student in Macaulay Honors at Queens College, majoring in computer science with a minor in mathematics. Among his large range of interests, he loves to read, cook, and hike. He just spent a semester in Japan, which also deepened his interest in learning Japanese. He wanted to be part of the DS3 program in order to delve into Data Science, a field of study that has always intrigued him and that he is considering pursuing in graduate school next year. He was also excited to be in a program run by Microsoft’s Data Science researchers and learn from them.

Rosemarie (Ro) Liriano

Rosemarie (Ro) Liriano is a rising Junior at CUNY Lehman College majoring in Computer Science with a minor in Sociology. Her interests include social justice, artificial intelligence and machine learning, and public policy. She would like to work at the intersection of technology and social science in the future. For this reason, she was excited to be accepted into the Microsoft DS3 program, where she hopes to learn how to combine her love for technology and sociology in a way that could help effect change in the world.

Keri Mallari

Keri Mallari is a rising junior at Lehman College majoring in Math and Computer Science. She applied to this program because she took a data-based class called the Future of New York City, and it was the best Computer Science class she’s taken in college. She wants to explore more data sets and learn more about data science.

Francois Mertil

Francois Mertil is a senior at New York City College of Technology, CUNY, majoring in Applied Math with a focus in Information Science.

Ilana Radinsky

Ilana Radinsky is a rising Junior at Stern College for Women (Yeshiva University) studying Computer Science and Math. She applied to the Microsoft Research Data Science Summer School because she believes in the power of data science – as an emerging field that bridges the gap between so many disciplines and industries, data science is a discipline with a huge potential for positive impact on mankind.

Rivka Schuster


Rivka Schuster is a student at Touro College and a rising senior majoring in Computer Programming. She was thrilled to join the DS3 class because she enjoys learning and is intrigued by data. She’s looking forward to collaborating on a project with other students and Microsoft researchers.

Thoa Ta

Thoa Ta is a rising senior at St. John’s University, majoring in Computer Science with a minor in Social Justice as part of being an Ozanam Scholar at St. John’s. Her passion is to leverage technology for social good and environmental sustainability. After a long time struggling to find her place in Computer Science, she finally feels at home in DS3, where she learns powerful tools to make an impact through insight discovery. On a personal level, she comes from Vietnam and sees spiritual enlightenment as her lifelong pursuit.

2016 Students

Fatima Chebchoub

Fatima Chebchoub is a rising senior at New York City College of Technology. She is a Computer Systems Technology major and is interested in software engineering. Fatima was born and raised in Morocco, and she moved to the USA about five years ago. In her first semester, Fatima went from being a student struggling with a new foreign language to winning first place for the best essay at the New York City College of Technology Literary Arts Festival. Fatima speaks four other foreign languages. She enjoys writing and dreams of one day writing a book. At the moment, she holds a part-time job as a software developer at her school.

Kaciny Calixte

Kaciny Calixte graduated from SUNY Old Westbury in May of 2016 with a Bachelor’s degree in Computer and Information Science. She discovered her love for coding mid-Sophomore year and never looked back. Her current interests include web development, data science and machine learning. Her long-term goal is to further her educational journey by attending graduate school. During the Fall 2016 semester, she is set to complete an internship focused on bioinformatics, plant genomics, and machine learning at a Department of Energy national laboratory.

Jacqueline Curran

Jacqueline Curran graduated in May 2016 from Manhattan College with a degree in Economics and Business Analytics. During her time at Manhattan, she was a member of Beta Gamma Sigma, Alpha Iota Delta, and Manhattan College’s Federal Reserve Challenge Team. She was the recipient of the Richard J. Carey Medal for Economics, which is awarded annually to the top student in the department. Following the DS3 program, she will begin her career at KPMG in their Global Mobility Services Division.

Louise Lai

Louise Lai is a rising Junior at NYU Stern School of Business, double majoring in Business & Political Economy and Computer Science. She has many interests, including startups, data science, and politics, but is primarily fascinated with artificial intelligence. She hopes that technology will change the world to be a happier and more equal place, and aspires to work to make that happen in the future. She is an avid corgi fan, although she owns no pets because she lives in between Malaysia and Australia.

Abraham Neuwirth

Abraham Neuwirth is an undergraduate student at Touro College where he is majoring in Computer Science and minoring in Mathematics. He is passionate about fields where these two disciplines intersect such as data science and machine learning. His favorite R package is dplyr and his operator of choice is the pipe. When he isn’t sitting in front of a computer screen, he daydreams about working out.

Jai Punjwani

Jai Punjwani is a rising junior at Adelphi University studying computer science. He loves programming in Java (his “native” tongue) and has even developed an Android app that allows students to find each other and study at his university. In the future, he wishes to join the field of cryptography so that he can strengthen security in a world with more data than ever. Aside from coding, Jai loves reading the Game of Thrones series and also enjoys dancing on his school’s Bhangra team.

Erica Ram

Erica Ram recently graduated with a Bachelor of Science degree in Computer Science with a Mathematics minor from Adelphi University. She has been writing code since high school, and finds working with data using code very interesting because of the variety of possibilities and applications. She plans to attend graduate school for Computer Science in Fall of 2017.

Marieme Toure

Marieme Toure is a rising senior at CUNY New York City College of Technology, starting to look for roles in the financial industry. After obtaining her high school diploma in Senegal, Marieme came back to the US for her college education. She is majoring in Applied Mathematics with a concentration in Information Science and minoring in Computer Science. As an undergrad, she is fascinated by the invisible forces that shape our world. Why does one company succeed and another fail? Is it possible to predict which idea will be the next big thing? Marieme enjoyed predicting yellow cab drivers’ efficiency this summer at Microsoft Research. She is planning to go to graduate school to get a Master’s degree in Quantitative Finance and to work at the most prestigious firms. As a big sports fan, Marieme is supporting the American team at the Olympic Games.

2015 Students

Eiman Ahmed

Eiman Ahmed is a rising Sophomore at Pace University where she is majoring in Computer Science and minoring in Mathematics and in Statistics. She first fell in love with coding when she took her first Java course in high school and now works as an app developer at her university’s technology consultancy. In her free time, she enjoys watching T.V. shows like Criminal Minds and going on long walks with her friends.

Glenda Ascencio

Glenda Ascencio is an undergraduate student living in New York City with entrepreneurial, mathematical, and software development skills. She’s majoring in mathematics and minoring in computer science at St. Joseph’s College. She loves challenging her mind by finding solutions to different programming problems. In her spare time, one of the things she loves to do is to program with Java, Python, HTML, CSS, and R. One of her fervent desires is to pursue a career in data analytics so she can inform and educate the citizenry of the USA/Honduras because she wants to stop violence, poverty, and ignorance.

Shannon Evans

Shannon Evans was born in St. Lucia and is currently an international student at New York City College of Technology. An Applied Mathematics major with a concentration in finance, his ultimate goal is to become a financial analyst. He lives for the opportunity to become a world changer and to design financial systems that improve our lives. He also loves sports, particularly soccer, table tennis, and cricket.

Thomas Patino

Thomas Patino is a senior at Skidmore College majoring in Business Management. Thomas has worked on various projects involving neighborhood improvement and urban planning. With his experience at Microsoft, Thomas hopes to bridge the gap between the technology industry and Latinos. Thomas anticipates going to graduate school in the computer science field to utilize data and find creative solutions to community development.

Nikki Hanson

Nikki Hanson, known to friends as Riley, is a rising senior at Queens College interested in software engineering, gaming, Japanese and increasing diversity in tech. A bit of a latecomer to the game, they were writing code before they knew what it was, and that path inevitably led them to return to school for a Bachelor of Science in Computer Science and a Bachelor of Arts in Math. Their favorite subject by far is Computational Theory, and they hope to get into Cryptography next.

Anastassiya Neznanova

Anastassiya Neznanova is a current honor transfer student at Queens College. She recently completed her undergraduate research in mathematics and published her paper in the International Journal of Undergraduate Research and Creative Activities. Anastassiya aims to pursue her BS in Computer Science and sees her career in entrepreneurship.

Riva Tropp

Riva Tropp is from Teaneck, New Jersey. A Computer Science minor at Yeshiva University, she enjoys the opportunities for exploration that data science provides. Co-president of her university’s computer science club, Riva occupies herself with tutoring, scripting, and planning club activities. Her favorite subway station is at 14th Street and Eighth Avenue.

Steven Vasquez

Steven Vasquez was born and raised in the Bronx, where he currently resides. He attends Manhattan College, where he is studying Computer Science and minoring in mathematics. Steven is a brother of the fraternity Delta Kappa Epsilon and loves sports as much as he loves solving problems. He is excited to learn and grow this summer.

2014 Students

Jahaziel Guzman

Jahaziel Guzman (Brooklyn College) was born in San Salvador, El Salvador and has been living in Brooklyn since 1996. He has had an interest in music and visual art since he was a child. In his freshman year of college, he developed an interest in math and programming, and also had the opportunity to work in a biology lab at Brooklyn College doing bioinformatics work.

Donald Hanson II

Donald Hanson II is from Laurelton, New York. He is a computer science major with a minor in music at Adelphi University. He is an IT student worker at his school and is currently creating his own website using HTML and CSS. People usually say that he is a guy who likes to stay positive and motivated, and he thinks that describes him very well; he always tries to make the best of every situation.

Afzal Hossain

Afzal Hossain is a junior at New York City College of Technology. He has an associate degree in computer science and is now studying Applied Mathematics in Finance. His goal is to study data science in graduate school.

Khanna Pugach

Khanna Pugach is a junior at Baruch College majoring in computer information systems with a math minor. She is an international student from Russia, now in her third year in the US. She likes Nora Ephron’s books and tennis.

Franky Rodriguez

Franky Rodriguez was born in Mexico, grew up in Miami, and is now doing a double major in mathematics and computer information technology at St. Joseph’s College, Brooklyn. He loves challenging his mind and finding solutions and applications to many different problems. He has worked on various applications, including writing a Java program that recognizes melodies by converting musical notes into relative semitones and durations. In his spare time he indulges in playing and composing music.

Derek Sanz

Derek Sanz was born in 1993 to Dominican parents in Brooklyn, New York. It was 2011 when he entered Brooklyn College, took the introductory computer science course, and entered a non-stop frenzy of hard work and love for learning. 2013 was a year of fun: 9 computer science courses, a one-month study abroad trip to China, one internship and one fellowship.

Briana Vecchione

Briana Vecchione is a rising CS Junior at Pace University. Though relatively new to the field, Briana is a member of both the Pforzheimer Honors College and the Seidenberg Creative Lab on campus. Her background consists mostly of web design, game design, and app development. In addition to DS3, Briana is also in the process of developing educational applications for international implementation in Senegal. She anticipates getting her PhD and working to utilize technology in developing regions.

Siobhan Wilmot-Dunbar

Siobhan Wilmot-Dunbar is a Junior at Pace University studying computer science and minoring in digital design. She is also a part of Seidenberg Creative Labs, a web development and research group at Pace, and has done coding in Java, HTML, and CSS. Besides that, Siobhan plays piano, acoustic guitar, and steel drums, and has high hopes of one day combining her ability in computing with her love for music and visual arts.




Replicating “Predicting the Present” with search data

Iman Abakoyas, Rajiv Basnet, Hasanat Jahan, Gabrielle Martinez, Krushang Shah, Basira Shirzad, Tamar Yastrab, Xiaona Zhou

This year’s project involved replicating and extending a widely read paper (Choi and Varian, 2011) on using search data to predict current and future economic outcomes. Students worked in groups of two and wrote their own original code with two goals in mind: first to reproduce the results published in the paper, and second to extend those results in a direction of their choosing. The students were able to exactly replicate the paper’s results when using data provided by the authors, but saw some small discrepancies when using versions of the source data currently available online. We suspect these differences are due to changes in the underlying datasets and to unspecified preprocessing done by the authors. The students extended the paper in several ways: examining alternative models, forecasting on longer time horizons, and evaluating the value of search data on a longer timescale. In investigating the latter, the students found that the utility of search data has decreased since the time of the original publication, and that in recent years a simple baseline model that omits search data is, on average, more accurate than one that includes it. All data and code for the projects are available on GitHub.

(Note: our 2020 program was modified due to COVID-19, shortened from 8 weeks to 4 weeks and run virtually.)



Replicating “An Empirical Analysis of Racial Differences in Police Use of Force”

Brenda Fried, Naomi Moreira, Harpreet Gaur, Cindy Muso, Adnan Hoq, Etta Rapp, Emeka Mbazor, Roymill Terrero

This project replicates and extends a recent paper on racial bias in police use of force. We selected this paper because it is both widely read and also an ideal candidate for a data analysis replication. It uses relatively simple methodology that seems straightforward to implement and check, relies on two publicly available datasets, and contains more than 100 pages between the main text and extensive appendices. Despite this nearly ideal setting, completing the data analysis replication turned out to be much more complicated than expected and took several weeks itself, mainly for reasons that centered around how the original data were cleaned and featurized. These challenges came despite the extensive documentation in the paper and its appendix, but they also helped uncover insights that might not have been obvious from simply reading the paper. We extended the paper’s results through the addition of map and census information as well as predictive checks of the underlying models used in the paper. In this talk we discuss the various challenges we faced in replicating the results and the insights that the replication revealed.

Watch the talk for more details. Source code for this project is available on GitHub.



Exploring the Reliability of the NYC Subway System

Akbar Mirza, Brian Hernandez, Amanda Rodriguez, Renzhentaxi Baerde, Phoebe Nguyen, Peter Farquharson, Ayliana Teitelbaum, Sasha Paulovich

The New York City subway is the largest rapid transit system in the world, serving approximately 5.5 million riders each day. Recently there has been a growing concern over the state of the subway system due to aging equipment as reflected in system-wide metrics such as “on-time percentage”, or how often trains run according to schedule. While these metrics provide some insight into the performance of the subway system, they fail to capture how riders experience the system. In this project we use recently released countdown clock data that logs where each train is reported to be at each minute of the day to gain a better understanding of how riders experience the subway system. We examine rider wait times and trip times, considering not just average but also worst-case performance of the system. We also compare the subway to above ground travel, investigate how changes to the system affect rider options, and look at how commutes vary across demographic groups. We find that the subway is typically quite reliable, but that averages can be misleading: variance in subway performance can account for up to a 50% difference between average and worst-case travel times. We also find a correlation between income and commute times and that small changes to the system (e.g., adding or removing stops or lines) can have large effects on riders’ options.

Watch the talk for more details. Source code for this project is available on GitHub.




Student Trajectories and School Choice in the NYC Public School System

Keri Mallari, David Futran, Francois Mertil, Ilana Radinsky, Anandini Chawla, Rivka Schuster, Ro Liriano, Thoa Ta

New York City serves over one million public school students each year, yet relatively little is understood about how students progress through the school system. In this talk we use individual-level student data over a ten-year period to explore how early test performance correlates with later success, to describe and predict which students leave the public school system, and to examine the effects of the recently implemented high school choice system.

Watch the talk or read the paper for more details. Source code for this project is available on GitHub.




Airbnb: Predicting Loyalty

Louise Lai, Kaciny Calixte, Jacqueline Curran, and Erica Ram

The advent of the sharing economy has redefined the way firms do business. Airbnb has led this revolution. With a valuation of $25 billion, it has become the world’s third most valuable startup and has more rooms than the world’s largest hotel chain. Historically, customer loyalty was based on experience with a particular firm, but now it is based on experiences with many individuals. We chose to use the Inside Airbnb dataset to further investigate the evolving idea of loyalty. Airbnb has both hosts and guests as customers. Host loyalty is defined as a host renting consistently, and guest loyalty as guests returning frequently. We used decision trees to look at the loyalty of both hosts and guests. No matter the industry, market experts stand by measures of recency and frequency to predict loyalty. However, our model is able to improve upon this idea with added features, such as review text and amenities. The end result is a model that successfully predicts the return rate of hosts and guests to Airbnb with a high level of accuracy.

Watch the talk or read the paper for more details. Source code for this project is available on GitHub.


Fare Share: Flow and Efficiency in NYC’s Taxi System

Jai Punjwani, Abraham Neuwirth, Marieme Toure, and Fatima Chebchoub

New York City is home to millions of people who rely on its robust transportation system. The taxi system plays a critical role in helping people navigate the city. With access to information about every single trip that occurred in a yellow taxi in 2013, we were able to reveal patterns in how people move throughout the city. We also analyzed driver efficiency, showing that there is substantial skill involved in driving a taxi, with some drivers consistently earning up to 30% more than average. Finally, we used the highly granular nature of this data to identify the locations of redundant trips, and showed that a simple carpooling strategy could reduce both the amount of money spent on taxis and the number of taxi trips taken by upwards of 7%.

Watch the talk or read the paper for more details. Source code for this project is available on GitHub, as is an interactive map of travel patterns across neighborhoods.




The Cost of Public School

Thomas Patino, Anastassiya Neznanova, Nikki Hanson, and Glenda Ascencio

New York City is home to the largest public school system in the country, which contains some of the best and worst schools in the state. Given this diversity, which often occurs over small geographic regions, there is extremely high demand for homes zoned for the best public schools in the city. We investigate and quantify this demand by analyzing over 10,000 home sales in different school zones across the city and reveal the implicit cost of purchasing a home zoned for each elementary school in the city.

Watch the talk for more details. Source code for this project is available on GitHub, as is an interactive map of school zone prices.


The Ins and Outs of the New York City Subway System

Eiman Ahmed, Shannon Evans, Riva Tropp, and Steven Vasquez

Every day, the population of New York regions shrinks and swells as people travel into and around the city. With six million daily trips, the subway system is one of the main conduits for these travelers, but relatively little is known about the flow of subway passengers throughout the day. Using MTA’s public datasets, our team mapped the paths commuters take and, consequently, the substantial changes to the population in the city’s many regions.

Watch the talk for more details. Source code for this project is available on GitHub.



Self-Balancing Bikes

Briana Vecchione, Franky Rodriguez, Donald Hanson II, Jahaziel Guzman 

Bike sharing is an internationally implemented system for reducing public transit congestion, minimizing carbon emissions, and encouraging a healthy lifestyle. Since New York City’s launch of the CitiBike program in May 2013, however, various issues have arisen due to overcrowding and general flow. In response to these issues, CitiBike employees redistribute bicycles by vehicle throughout the New York City area. During the past year, over 500,000 bikes have been redistributed in this fashion. This solution is financially taxing, environmentally and economically inefficient, and often suffers from timing issues. What if CitiBike instead used its clientele to redistribute bicycles? In this talk, we describe the data analysis that we conducted in hopes of creating an incentive and rerouting scheme for riders to self-balance the system. We anticipate that we can decrease vehicle transportations by offering financial incentives to take bikes from relatively full stations and return bikes to relatively empty stations (with rerouting advice provided via an app).

For more details, please see our paper and talk.


An Empirical Analysis of Stop-and-Frisk in New York City

Md. Afzal Hossain, Khanna Pugach, Derek Sanz, Siobhan Wilmot-Dunbar

Between 2006 and 2012, the New York City Police Department made roughly four million stops as part of the city’s controversial stop-and-frisk program. We empirically study two aspects of the program by analyzing a large public dataset released by the police department that records all documented stops in the city. First, by comparing to block-level census data, we estimate stop rates for various demographic subgroups of the population. We find that the average annual number of stops of young black men exceeds the number of such individuals in the general population. This disparity is even more pronounced when we account for geography, with the number of stops of young black men in certain neighborhoods several times greater than the number of such individuals in the local population. Second, we statistically analyze the reasons recorded in our data that officers state for making each stop (e.g., “furtive movements” or “sights and sounds of criminal activity”). By comparing which stated reasons best predict whether a suspect is ultimately arrested, we develop simple heuristics to aid officers in making better stop decisions.

For more details, please see our paper and talk.



Frequently Asked Questions

What types of students are you looking for?

One of the aims of DS3 is to help increase diversity, broadly defined, in computer science graduate programs. We are looking for college students in the New York City area who can help us meet this goal.

Are there specific requirements for participating in the program?

Applicants must be currently enrolled in an undergraduate program in the New York City area. Other than that there are no specific course prerequisites, but a familiarity with computer programming and/or statistics is helpful.

Is housing provided?

No. The program is intended for students who already reside in the New York City area. However, a stipend is provided. The stipend for the four-week program being run during the pandemic is $3,000.

Where is the program held?

The DS3 program will be held virtually in light of COVID-19.

Can I receive college credit for participating?

We do not offer college credit, but you may be able to receive credit through your home institution.

Do I need to have my own computer?

No. We will provide a laptop computer for you to use during the program, and you will be able to keep it at the end of the summer.

I will be graduating this coming spring. Am I still eligible to participate in the program?

Yes. Graduating seniors are welcome to apply.

I am a graduate student. Am I eligible to participate in the program?

No. The program is for undergraduate students.

I am not a U.S. citizen. Am I still eligible to participate in the program?

Yes. The program is not restricted to U.S. citizens. However, you are responsible for obtaining appropriate approvals from your college regarding any immigration requirements. As mentioned above, you should be currently enrolled in an undergraduate program in the New York City area.

Who can I contact for further information?

You can reach us by email at


Applicants must be currently enrolled in an undergraduate program in the New York City area.

Apply here for the summer 2021 program. Applications are open until April 26th, 2021.