Exponentially better than A/B testing. Multiworld Testing (MWT) is the capability to test and optimize over K policies (context-based decision rules) using an amount of data and computation that scales logarithmically in K, without necessarily knowing these policies before or during data collection. MWT can answer exponentially more detailed questions than traditional A/B testing. The underlying machine learning methodology draws on research on “contextual bandits” and “counterfactual evaluation”.
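One standard form of counterfactual evaluation is inverse propensity scoring (IPS). It is shown here as an illustration, not as the exact estimator the Decision Service uses. The notation is assumed: the log holds T events (x_t, a_t, r_t, p_t), where x_t is the context, a_t the action taken, r_t the reward, and p_t the probability with which the exploring policy chose a_t. Any policy π can then be scored from the same log, and a standard concentration argument plus a union bound over K policies shows why the data needed grows with ln K rather than K:

```latex
% Assumed log: T events (x_t, a_t, r_t, p_t), rewards r_t \in [0,1],
% logging probabilities p_t = \Pr[a_t \mid x_t] \ge p_{\min} > 0.
\hat{V}(\pi) = \frac{1}{T} \sum_{t=1}^{T} \frac{r_t \, \mathbf{1}\{\pi(x_t) = a_t\}}{p_t},
\qquad \mathbb{E}\big[\hat{V}(\pi)\big] = V(\pi).

% With probability at least 1 - \delta, simultaneously for all K policies \pi:
\big|\hat{V}(\pi) - V(\pi)\big|
  = O\!\left( \sqrt{\frac{\ln(K/\delta)}{p_{\min} T}} + \frac{\ln(K/\delta)}{p_{\min} T} \right).
```

By contrast, an A/B test deploys each policy to its own slice of traffic, so the data it needs grows linearly in K.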
A system for interactive learning. We implement MWT as MWT Decision Service, a machine learning system for making context-based decisions. The system supports the full cycle from exploration to logging to training policies to deploying them in production. Built as a cloud service, the system is widely applicable, modular, and easy to use. This is an ongoing project, released internally in June 2015 and announced externally in July 2016. The system has already been deployed successfully at MSN.
A typical example. Suppose one wants to optimize clicks on suggested news stories. To discover what works, one needs to explore over the possible news stories. Further, if the suggested news story can be chosen depending on the visitor’s profile, then one needs to explore over the possible “policies” that map profiles to news stories (and there are exponentially more “policies” than news stories!). Traditional machine learning fails at this because it does not explore, whereas the Decision Service explores continuously and optimizes decisions using the resulting exploration data.
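The news example can be sketched as a toy simulation. Everything in it is hypothetical: the two visitor profiles, the three stories, the click rates, and the helper names are made up for illustration. The exploration scheme (epsilon-greedy around a default story) and the estimator (inverse propensity scoring) are standard techniques chosen for this sketch, not necessarily what the Decision Service runs internally. The point is that one exploration log, collected without knowing which policies will be asked about, scores all 3² = 9 profile-to-story policies.

```python
import itertools
import random

# Hypothetical toy setting (not the Decision Service API).
PROFILES = ["sports_fan", "politics_fan"]
STORIES = ["sports", "politics", "weather"]
INTEREST = {"sports_fan": "sports", "politics_fan": "politics"}


def click_probability(profile: str, story: str) -> float:
    """Assumed ground truth, unknown to the learner: 60% CTR on a matching story, 10% otherwise."""
    return 0.6 if INTEREST[profile] == story else 0.1


def collect_log(num_events, epsilon, default_story, rng):
    """Epsilon-greedy exploration around a default story; logs (context, action, reward, propensity)."""
    k = len(STORIES)
    log = []
    for _ in range(num_events):
        profile = rng.choice(PROFILES)
        story = rng.choice(STORIES) if rng.random() < epsilon else default_story
        # Probability with which the logging policy picked this story for this visitor.
        prob = epsilon / k + ((1 - epsilon) if story == default_story else 0.0)
        reward = 1.0 if rng.random() < click_probability(profile, story) else 0.0
        log.append((profile, story, reward, prob))
    return log


def ips_value(policy, log):
    """Inverse propensity score estimate of a policy's click-through rate."""
    total = 0.0
    for profile, story, reward, prob in log:
        if policy[profile] == story:
            total += reward / prob
    return total / len(log)


def all_policies():
    """Every deterministic map from profile to story: len(STORIES) ** len(PROFILES) of them."""
    for choice in itertools.product(STORIES, repeat=len(PROFILES)):
        yield dict(zip(PROFILES, choice))


rng = random.Random(0)
log = collect_log(num_events=50_000, epsilon=0.2, default_story="weather", rng=rng)
estimates = [(ips_value(p, log), p) for p in all_policies()]
best_value, best_policy = max(estimates, key=lambda e: e[0])
print(best_policy, round(best_value, 3))
```

None of the nine policies is ever deployed; each is scored offline from the same log. The price is the exploration rate epsilon: a larger value gives smaller propensities' reciprocals (lower variance) but shows the default story less often while data is being collected.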
Team. We are a diverse group of researchers working on all aspects of MWT, spanning algorithms, machine learning, systems, and economics, and covering the entire range from theory to experiments to practical deployments. Most of us are located at Microsoft Research NYC. We can be contacted at Explore-Exploit@microsoft.com.