Jules Kruijswijk

The multi-armed bandit problem

Many statistical decision problems in the social sciences and beyond can be framed as (contextual) multi-armed bandit problems, a specific type of reinforcement learning problem. However, it is notoriously hard to develop and evaluate policies that tackle these problems, and to use such policies in applied studies. To address this, we have developed StreamingBandit, a Python web application for developing and testing bandit policies in field studies. StreamingBandit can sequentially select treatments in real time using (online) policies. Once StreamingBandit is deployed in an applied context, different policies can be tested, altered, nested, and compared. StreamingBandit makes it easy to apply a wide range of bandit policies for sequential allocation in field experiments, and allows novel policies to be developed and reused quickly.

In his talk, Jules will first introduce the multi-armed bandit problem and its hurdles, and show examples of policies, after which he will detail the implementation logic of StreamingBandit and provide several examples of its use.
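To give a flavor of what a bandit policy looks like, here is a minimal sketch of epsilon-greedy, one of the simplest policies for the multi-armed bandit problem. It is an illustration only and does not use StreamingBandit's API: the class `EpsilonGreedy`, the function `simulate`, and the success rates in the usage note are hypothetical names and values chosen for this example.

```python
import random


class EpsilonGreedy:
    """Illustrative epsilon-greedy policy (hypothetical; not StreamingBandit code)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon            # probability of exploring a random arm
        self.counts = [0] * n_arms        # number of times each arm was played
        self.means = [0.0] * n_arms       # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.means)), key=self.means.__getitem__)

    def update(self, arm, reward):
        # Incremental mean update: no need to store past rewards.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_rates, n_rounds=5000, epsilon=0.1, seed=0):
    """Run the policy against simulated Bernoulli arms (assumed success rates)."""
    policy = EpsilonGreedy(len(true_rates), epsilon, seed)
    env = random.Random(seed + 1)         # separate RNG for the simulated rewards
    for _ in range(n_rounds):
        arm = policy.select_arm()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        policy.update(arm, reward)
    return policy
```

Calling `simulate([0.2, 0.5, 0.8])` returns the policy after 5000 simulated rounds; its `counts` and `means` attributes show how often each treatment was allocated and the estimated reward of each. The select-then-update loop mirrors the sequential allocation described in the abstract, where a treatment is chosen, the outcome is observed, and the policy is updated before the next decision.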

About Jules Kruijswijk

Jules is a data scientist and researcher interested in all kinds of applications of machine learning. He received his BSc and MSc in Artificial Intelligence from Radboud University Nijmegen. He is currently in the second year of his PhD project at the Department of Methodology and Statistics in Tilburg, where he works on different applications of the multi-armed bandit problem. In addition to developing software for field applications, he is improving policies for computational personalization using hierarchical Bayesian models.