Reinforcement Learning Open Source Fest


The Reinforcement Learning (RL) Open Source Fest is a global online program focused on introducing students to open source reinforcement learning programs and software development while working alongside researchers, data scientists, and engineers on the Real World Reinforcement Learning team at Microsoft Research NYC. Students will work on a four-month research programming project during their break from university (May-August 2021). Accepted students will receive a $10,000 USD stipend.

Our goal is to bring together a diverse group of students from around the world to solve open source reinforcement learning problems together, advancing state-of-the-art research and development alongside the RL community and releasing the resulting code as open source for the benefit of all.

At the end of the program, each student will present their project online to the Microsoft Research Real World Reinforcement Learning team.

Open Source Projects

Vowpal Wabbit (VW) is an open source machine learning library created by John Langford and developed by Microsoft Research with the help of many contributors. It is a fast, flexible, online, and active learning solution that empowers people to solve complex interactive machine learning problems, with a large focus on contextual bandits and reinforcement learning. It is a vehicle for both research prototyping and driving bleeding edge algorithms to production. RL OS Fest is all about open source projects in the Vowpal Wabbit ecosystem.

Check out the list of open source projects here.


To be eligible for the program, students must be enrolled in or accepted into an accredited institution, including colleges, universities, master's programs, PhD programs, and undergraduate programs.

Student responsibilities during the program

  • Submit quality work: code compiles, has unit tests and documentation, and passes code review
  • Regularly communicate work completed, what you intend to do next, and blockers
  • Re-evaluate project tasks if you’re significantly ahead or behind schedule
  • Check in regularly with your mentor/collaborator
  • Listen and respond to feedback
  • Learn proactively

What makes a successful project?

Success looks different for every project. Challenging yourself and developing skills and knowledge are the most important parts. Producing some sort of deliverable is great, but not strictly required. We all know how development and experimentation go: unforeseen problems can come up and present new challenges, and that’s all part of the process. You’ll have a mentor and support along the way.

  • A successful engineering-oriented project might include pull requests merging your work, a design document, tests, and general documentation
  • A successful data science-oriented project might involve pull requests, reproducible experiments, datasets, a report, and visualized results
  • A successful prototyping-oriented project might include an MVP, tests, and documentation


Program Timeline

*The upcoming program dates are subject to change, and will be finalized and updated here by January 22, 2021.

February 1, 2021 | Application period opens
March 1, 2021 | Application period closes
April 5, 2021 | Selected applicants notified
May 3, 2021 | Projects begin
August 9-13, 2021 | Project presentations


How to submit an application

The following outlines the information you need to submit your application in our submission portal. The application has three sections.

  1. Application Details
    You will be asked to fill out the following questions:
    • Are you currently enrolled in an accredited university or college? (Please note, proof of enrollment will be required, if accepted)
    • Select your country
    • Upload your resume or a document containing a list of classes completed to date
    • Upload existing or past personal projects you’ve worked on or open source projects to which you’ve contributed
    • Choose your preferred project from the list of Open Source Projects
    • Upload your proposal for the selected Open Source Project. Include why you want to work on this problem specifically. Provide a rough outline of how you plan to execute on the project selected. This should include a week-by-week plan of what you’d need to learn and the challenges you foresee.
  2. Pre-Screening Exercise
    This step consists of a pre-screening exercise, which is a required part of your application. After completing the exercise, you will be asked to provide a link to the GitHub repository with your project.
  3. Additional Open Source Project
    This year, you will have the option to submit an additional Open Source Project proposal from the list of projects provided.

The following outlines important information pertaining to your application:

  • Proposals submitted to Microsoft will not be returned. Microsoft cannot assume responsibility for the confidentiality of information submitted in the proposal. Therefore, proposals should not contain information that is confidential, restricted, or sensitive.
  • Incomplete proposals will not be considered.
  • Due to the volume of submissions, Microsoft Research cannot provide individual feedback on proposals.

RL Team



2020 Alumni

Harish Kamath

Harish Kamath is a Computer Science/Math undergraduate at Georgia Tech. His passions focus on reinforcement learning, generalization in learning, and making newer technologies cheaper, faster, and more accessible. Outside of work, he loves playing and watching basketball, dancing, and running. Once he graduates, he hopes to end up somewhere he can make the biggest lasting impact for the most people.

Challenge: Conversion of VowpalWabbit models into ONNX format

Currently, ONNX is the leading standard for representing machine learning models across platforms and frameworks. It describes a model as a computational graph built from a set of standard operators, and this operator set is constantly evolving to accommodate new types of models and operations. Being able to describe a model in ONNX format is important because it allows (1) models to be optimized and run across different architectures using a single runtime, and (2) models created in different frameworks to interact with each other. Although other leading frameworks such as TensorFlow and PyTorch have mature tools for converting models into ONNX format, Vowpal Wabbit does not yet have this capability built into the framework. This project focuses on introducing this functionality to Vowpal Wabbit, so that we can combine the fast model training and inference speed of VW with the representational capacity of other frameworks. We introduce new sparse operators that are used to instantiate VW regression models efficiently in ONNX format, show that regression and contextual bandit models can be translated directly with these operators, and give an example of such models being run in RLClientLib to show that they can now be ported into any inference framework.

Cassandra Marcussen

Cassandra Marcussen is a junior at Columbia University studying Mathematics and Computer Science. Her interests lie in artificial intelligence, theoretical computer science, and contributing to technology within these fields through efficient computing and low-level optimizations. Cassandra is enthusiastic about open source code, and has loved working on an impactful open source system such as Vowpal Wabbit. In the future, she wishes to pursue graduate studies in Computer Science.

Challenge: Parallelized Parsing

Modern machines often utilize many threads to achieve good performance. Currently, VW uses a single thread to read in and parse input, and a single thread to learn. The parse thread presents a bottleneck, slowing down VW as a whole. By extracting the input reading into a separate thread and extending the parser to support many threads, VW can better utilize resources, achieve better performance, and have an improved design by separating logical components into independent modules. This project focuses on improving performance and design for the text input format, and also ensures compatibility with the cache input format.
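The pipeline described above (one thread reading input, several threads parsing, one consumer) can be sketched in a few lines of Python. This is only an illustration of the design, not VW's implementation: the function names, the queue layout, and the simplified `label | name:value` line format are all assumptions made for this example, and Python threads will not show the speedup that native threads give in VW itself.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream on each queue


def parse_line(line):
    """Parse a simplified 'label | name:value ...' line (an assumed format)."""
    label_part, _, feature_part = line.partition("|")
    features = {}
    for token in feature_part.split():
        name, _, value = token.partition(":")
        # A feature without an explicit value is given the value 1.0.
        features[name] = float(value) if value else 1.0
    return float(label_part.strip()), features


def reader(lines, raw_queue, num_parsers):
    """Reading stage: tag each line with its position, then signal the parsers."""
    for index, line in enumerate(lines):
        raw_queue.put((index, line))
    for _ in range(num_parsers):
        raw_queue.put(SENTINEL)


def parser_worker(raw_queue, parsed_queue):
    """Parsing stage: several of these run concurrently."""
    while True:
        item = raw_queue.get()
        if item is SENTINEL:
            parsed_queue.put(SENTINEL)
            return
        index, line = item
        parsed_queue.put((index, parse_line(line)))


def parse_in_parallel(lines, num_parsers=4):
    """Run the reader and parser threads, returning examples in input order."""
    raw_queue = queue.Queue(maxsize=1024)
    parsed_queue = queue.Queue()
    threads = [threading.Thread(target=reader, args=(lines, raw_queue, num_parsers))]
    threads += [
        threading.Thread(target=parser_worker, args=(raw_queue, parsed_queue))
        for _ in range(num_parsers)
    ]
    for thread in threads:
        thread.start()

    parsed = {}
    finished_parsers = 0
    while finished_parsers < num_parsers:
        item = parsed_queue.get()
        if item is SENTINEL:
            finished_parsers += 1
        else:
            index, example = item
            parsed[index] = example
    for thread in threads:
        thread.join()
    # Parsers may finish out of order, so restore the original input order.
    return [parsed[i] for i in range(len(parsed))]


examples = parse_in_parallel(["1 | a:0.5 b", "-1 | c:2"])
```

Tagging each line with its index lets the consumer restore input order even though the parser threads complete in an arbitrary order.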

Newton Mwai Kinyanjui

Newton Mwai Kinyanjui is from Nairobi, Kenya. Newton is currently pursuing a Ph.D. at Chalmers University of Technology in Sweden, working with Fredrik Johansson on causal inference and reinforcement learning, with the aim of using machine learning to improve decision making in healthcare. Newton graduated from Carnegie Mellon University Africa with a Master of Science in Electrical and Computer Engineering.

Challenge: Library of contextual bandit estimators

Estimators are used in off-policy evaluation, where the value of a new policy is estimated from data logged by a different policy. One common estimator is inverse propensity scoring (IPS); others include doubly robust (DR) and PseudoInverse. Each estimator performs better or worse depending on the setting. This project explores reference implementations of each and allows them to be compared, to aid understanding. We extend the estimators library and implement an interface to help researchers and data scientists test different estimators quickly and easily.
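To make the idea concrete, here is a minimal sketch of two of these estimators in Python. It is not the project's library or its interface: the function names, the `(context, action, reward, logging_prob)` record layout, and the `reward_model` helper are assumptions chosen for illustration. IPS reweights each logged reward by the ratio of the target policy's probability to the logging policy's probability; DR adds a reward-model baseline and uses the same weight to correct the model's error.

```python
def ips_estimate(logged, target_policy):
    """Inverse propensity scoring estimate of the target policy's value.

    logged: list of (context, action, reward, logging_prob) tuples
            (a hypothetical record layout for this example).
    target_policy(context, action): probability the target policy picks action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        weight = target_policy(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


def dr_estimate(logged, target_policy, reward_model, actions):
    """Doubly robust estimate: model-based baseline plus an IPS-style correction.

    reward_model(context, action): predicted reward (hypothetical helper).
    actions: the full list of actions available in every context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Expected reward under the target policy according to the model.
        baseline = sum(
            target_policy(context, a) * reward_model(context, a) for a in actions
        )
        # Importance-weighted correction for the model's error on the logged action.
        weight = target_policy(context, action) / logging_prob
        total += baseline + weight * (reward - reward_model(context, action))
    return total / len(logged)


# Toy data: a uniform logging policy over two actions; action 0 always pays 1.
logged = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]


def always_zero(context, action):
    """Target policy that deterministically picks action 0."""
    return 1.0 if action == 0 else 0.0


def perfect_model(context, action):
    """Reward model that happens to match the toy data exactly."""
    return 1.0 if action == 0 else 0.0


ips_value = ips_estimate(logged, always_zero)
dr_value = dr_estimate(logged, always_zero, perfect_model, actions=[0, 1])
```

On this toy data both estimators return 1.0, the true value of always choosing action 0. They generally differ when the reward model is imperfect or the logged propensities are small, which is the kind of comparison the project aims to make easy.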

Milind Agarwal

Milind Agarwal is a combined undergraduate and master’s student in Computer Science at Johns Hopkins University. His current research interests are natural language processing and machine translation for low-resource and endangered language settings. Before interning at Microsoft, he worked in several academic research labs at Johns Hopkins, gaining experience in a wide variety of fields including NLP, machine translation, computational biology, data visualization, and software development. After graduation in 2021, he hopes to join a Ph.D. program where he can continue to work on challenging NLP problems.

Challenge: Contextual Bandit Data Visualization with Jupyter Notebooks

Exploratory data analysis and data visualization have become an essential part of any data scientist’s toolkit. Visualizations not only let you kickstart your analysis by making the patterns in your data easy to understand, but also help you visually inspect your policies to understand their behaviour. We present cb_visualize, a Python-based visualization library specialized for contextual bandit feature and policy visualizations. The library offers robust visualizations for data exploration, training, feature importance, and action distributions, and supports common contextual bandit dataset formats used by Vowpal Wabbit, such as text, JSON, and DSJSON. We hope that this toolkit will be an asset for researchers and customers alike to better present and understand their data and analyses.

Mark Rucker

Mark Rucker is currently a second-year PhD student at the University of Virginia, following an eight-year career as an enterprise software engineer. Mark’s PhD research explores how reinforcement learning models can be used to encourage health behavior change in individuals managing chronic health conditions. This research combines state-of-the-art machine learning with web and mobile app development to support in-situ randomized controlled trials of behavior change interventions. After graduation, Mark hopes to return to industry to develop high-quality products that deeply impact people’s lives.

Challenge: COBA: A Modern Benchmarking Package for Reproducible Contextual Bandit Research

Performance benchmarking on well-defined problems is a pillar of modern machine learning research. With clear problems and metrics, benchmarking has allowed the research community to maintain a high level of independent effort while still making real and meaningful progress over time. The elegance of benchmarking (define, measure, repeat), however, belies real engineering challenges such as software maintenance, data distribution, statistical aggregation, and reproducibility, to name a few. These challenges are especially salient in contextual bandit research, where one needs not only a data set but also a harness to emulate interaction with the data. In an effort to reduce these burdens without losing any of benchmarking’s benefits, we present COBA, an ultra-lightweight Python package for benchmarking contextual bandit algorithms. COBA uses a small set of clean and consistent interfaces to satisfy four core use cases: (1) creating reproducible benchmarks, (2) sharing reproducible benchmarks, (3) evaluating custom algorithms, and (4) exploring evaluation results.

Sharad Chitlangia

Sharad Chitlangia is a senior-year undergraduate student at BITS Pilani Goa, studying Electronics and specializing in the field of Artificial Intelligence. Sharad has previously worked heavily at the intersection of Machine Learning and Systems and on Explainable AI. Aside from work, Sharad spends a lot of time in the open source community and works on improving accessibility, especially in AI research.

Challenge: Pushing the Limits of VW with Flatbuffers

Vowpal Wabbit is known for its ability to solve complex machine learning problems extremely fast. Through this project, we aim to take this ability even further by introducing FlatBuffers, an efficient cross-platform serialization library known for its memory access efficiency and speed. We develop FlatBuffers schemas for input examples so that they can be stored as binary buffers, and show a performance increase of 30% or more compared to traditional formats.


Frequently asked questions

Is RL Open Source Fest considered an internship, a job, or any form of employment?

No. RL Open Source Fest is an activity that the student performs as an independent developer in collaboration with the Real World Reinforcement Learning team at Microsoft Research, for which they are paid a stipend.

Can I work on this while doing an internship/working this summer?

Yes. Think of this as a fun project on the side.

What is the time commitment?

As much as you like!

Where does RL Open Source Fest occur?

The program occurs entirely online. There is no requirement to travel as part of the program.

What are the eligibility requirements for participation?

You must be currently enrolled in an undergraduate, master's, or PhD program at an accredited academic institution. Check out the application process here.

Additional questions? Feel free to send them to us at