Reward Copilot for RL-driven Systems Optimization
- Karan Tandon
- Manav Mishra
- Gagan Somashekar
- Mayukh Das
- Nagarajan Natarajan
Advances in Neural Information Processing Systems 2024 - Workshop on ML for Systems
Systems optimization problems arising in large-scale enterprise infrastructure, such as workload auto-scaling, kernel parameter tuning, and cluster management, are increasingly RL-driven. While effective, setting up the RL framework for such real-world problems is difficult: designing correct and useful reward functions or state spaces is highly challenging and demands substantial domain expertise. We propose REWARD COPILOT, a novel solution that helps design suitable and interpretable reward functions, guided by client-provided specifications, for any RL framework. Using experiments on standard benchmarks as well as systems-specific optimization problems, we show that our solution can return reward functions with a certain (informal) feasibility certificate, in addition to Pareto-optimality.
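To make the setting concrete, the sketch below shows the kind of interpretable, specification-guided reward function a tool like REWARD COPILOT might produce for a workload auto-scaling problem. Everything here is hypothetical: the function name, the SLO threshold, the weighting scheme, and the shaping of the penalty are illustrative assumptions, not the paper's actual synthesized rewards.

```python
# Illustrative only: a composite reward for workload auto-scaling,
# of the general kind a reward copilot might synthesize from a client
# specification such as "keep p99 latency under the SLO while
# minimizing replica count". All names and constants are hypothetical.

def autoscale_reward(p99_latency_ms: float,
                     replicas: int,
                     latency_slo_ms: float = 200.0,
                     cost_weight: float = 0.1) -> float:
    """Reward = SLO-compliance term minus a resource-cost penalty."""
    if p99_latency_ms <= latency_slo_ms:
        # Flat bonus when the latency SLO is met.
        slo_term = 1.0
    else:
        # Shaped penalty proportional to the violation magnitude,
        # so the agent still gets a gradient toward compliance.
        slo_term = -(p99_latency_ms - latency_slo_ms) / latency_slo_ms
    # Linear cost penalty for each provisioned replica.
    return slo_term - cost_weight * replicas
```

Weighted combinations of named terms like these keep the reward interpretable, since each term maps back to a clause of the client-provided specification.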