In recent years, a growing number of students have turned to online resources, such as massive open online courses (MOOCs), for learning. But while these courses extend a teacher's reach, student-teacher ratios can be ten thousand to one or worse, and at such ratios students no longer receive the kind of feedback they need to truly understand the material. Codewebs is a system I have been developing that addresses the problem of scalable student feedback in online programming-intensive courses. Codewebs analyzes a massive corpus of historical student submissions and uses it to provide instant, detailed, and useful feedback to tens of thousands of students in the same course. Because the approach is statistical, the quality of feedback improves as the system sees more data, and the feedback is automatically tailored to each assignment. I will present a novel data-driven technique for discovering shared “parts” among student submissions, a problem complicated by the fact that there are always many ways to implement the same functionality in code. Throughout, I will demonstrate results on Coursera’s Machine Learning course, which received over 1 million code submissions in its first run.
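The abstract describes the shared-parts technique only at a high level. As a rough illustrative sketch (not the actual Codewebs algorithm, and in Python rather than the Octave of the course submissions), one way to surface candidate shared parts is to canonicalize identifiers in each submission's AST and index subtrees that recur across submissions; all names and data below are hypothetical:

```python
import ast
from collections import defaultdict

def canonical(node):
    """Serialize an AST node with identifiers normalized away, so that
    submissions differing only in variable names produce the same string."""
    if isinstance(node, ast.AST):
        fields = []
        for name, value in ast.iter_fields(node):
            if name in ("id", "arg", "name"):
                value = "_"  # normalize identifier names
            fields.append(f"{name}={canonical(value)}")
        return f"{type(node).__name__}({','.join(fields)})"
    if isinstance(node, list):
        return "[" + ",".join(canonical(v) for v in node) + "]"
    return repr(node)

def subtree_index(submissions):
    """Map each canonical statement/expression subtree to the set of
    submission ids that contain it."""
    index = defaultdict(set)
    for sid, code in submissions.items():
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, (ast.stmt, ast.expr)):
                index[canonical(node)].add(sid)
    return index

# Two toy submissions that sum a list with different variable names.
submissions = {
    "alice": "total = 0\nfor x in xs:\n    total = total + x\n",
    "bob":   "s = 0\nfor v in values:\n    s = s + v\n",
}
index = subtree_index(submissions)
# Subtrees appearing in every submission are candidate shared "parts".
shared = [t for t, sids in index.items() if len(sids) == len(submissions)]
```

Here the two accumulation loops canonicalize to the same string despite different variable names, so the loop is recovered as a shared part; handling deeper semantic variation (different algorithms for the same functionality) is exactly what makes the real problem hard.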
Finally, I will highlight the emerging issues of scalability and sustainability in education, explain why these issues require insight from computer scientists, and discuss specific problems in this domain that my future research program will address.