Safe Policy Improvement with Baseline Bootstrapping
In this umbrella project, we investigate a class of conservative offline RL algorithms that use uncertainty estimators to decide whether they can trust their predictions enough to optimize their policy, or whether they should instead reproduce the…
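
The trust-or-fall-back decision described above can be illustrated with a minimal sketch. It makes several assumptions that the text does not state: a tabular setting, state-action visit counts as the uncertainty estimator, and a baseline policy as the behavior to fall back on (suggested by the project title). The names `spibb_greedy_step` and `n_min`, and all the numbers, are hypothetical and chosen only for illustration.

```python
import numpy as np


def spibb_greedy_step(q, pi_b, counts, n_min):
    """One greedy improvement step that bootstraps on a baseline policy.

    q, pi_b, counts: arrays of shape (n_states, n_actions).
    n_min: hypothetical count threshold; pairs seen fewer times are "uncertain".
    """
    uncertain = counts < n_min

    # On uncertain pairs, copy the baseline probability unchanged.
    pi = np.where(uncertain, pi_b, 0.0)

    # Probability mass still free to reallocate in each state.
    free_mass = 1.0 - pi.sum(axis=1)

    # Among trusted actions only, pick the one with the highest value estimate.
    q_trusted = np.where(uncertain, -np.inf, q)
    best = q_trusted.argmax(axis=1)

    # States with no trusted action keep the baseline distribution as-is.
    rows = np.flatnonzero((~uncertain).any(axis=1))
    pi[rows, best[rows]] += free_mass[rows]
    return pi


# Toy data: 2 states, 3 actions (all values are made up).
q = np.array([[1.0, 2.0, 3.0], [0.5, 0.1, 0.2]])
pi_b = np.array([[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]])
counts = np.array([[10, 10, 1], [0, 2, 1]])

pi_new = spibb_greedy_step(q, pi_b, counts, n_min=5)
print(pi_new)
```

In state 0 the highest-valued action is rarely observed, so it keeps its baseline probability of 0.3 and the remaining 0.7 goes to the best well-observed action. In state 1 no action is observed often enough, so the baseline distribution is reproduced unchanged.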