Reward Copilot for RL-driven Systems Optimization
- Karan Tandon
- Manav Mishra
- Gagan Somashekar
- Mayukh Das
- Nagarajan Natarajan
Advances in Neural Information Processing Systems 2024 - Workshop on ML for Systems
Systems optimization problems arising in large-scale enterprise infrastructure, such as workload auto-scaling, kernel parameter tuning, and cluster management, are increasingly RL-driven. While effective, setting up the RL framework for such real-world problems is difficult: designing correct and useful reward functions or state spaces is highly challenging and demands substantial domain expertise. We propose REWARD COPILOT, a novel solution that helps design suitable and interpretable reward functions, guided by client-provided specifications, for any RL framework. Using experiments on standard benchmarks as well as systems-specific optimization problems, we show that our solution can return reward functions with a certain (informal) feasibility certificate, in addition to Pareto-optimality.
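To make the setting concrete, the sketch below shows the kind of interpretable, specification-guided reward function a tool like REWARD COPILOT might produce for a workload auto-scaling problem. Everything here is hypothetical: the function name, the SLO threshold, the weighting scheme, and the shaping of the penalty are illustrative assumptions, not the paper's actual synthesized rewards.

```python
# Illustrative only: a composite reward for workload auto-scaling,
# of the general kind a reward copilot might synthesize from a client
# specification such as "keep p99 latency under the SLO while
# minimizing replica count". All names and constants are hypothetical.

def autoscale_reward(p99_latency_ms: float,
                     replicas: int,
                     latency_slo_ms: float = 200.0,
                     cost_weight: float = 0.1) -> float:
    """Reward = SLO-compliance term minus a resource-cost penalty."""
    if p99_latency_ms <= latency_slo_ms:
        # Flat bonus when the latency SLO is met.
        slo_term = 1.0
    else:
        # Shaped penalty proportional to the violation magnitude,
        # so the agent still gets a gradient toward compliance.
        slo_term = -(p99_latency_ms - latency_slo_ms) / latency_slo_ms
    # Linear cost penalty for each provisioned replica.
    return slo_term - cost_weight * replicas
```

Weighted combinations of named terms like these keep the reward interpretable, since each term maps back to a clause of the client-provided specification.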