Abstract

Classification-based reinforcement learning (RL) methods have recently been proposed as an alternative to traditional value-function-based methods. These methods use a classifier to represent a policy: the input (features) to the classifier is the state, and the output (class label) for that state is the desired action. It is well known in the RL community that focusing on more important states can lead to improved performance. In this paper, we investigate the idea of focused learning in the context of classification-based RL. Specifically, we define a useful notion of state importance, which we use to prove rigorous bounds on policy loss. Furthermore, we show that a classification-based RL agent may behave arbitrarily poorly if it treats all states as equally important.