Avoiding Negative Side Effects of Autonomous Systems in the Open World

  • Sandhya Saisubramanian
  • ,
  • Shlomo Zilberstein

Autonomous systems that operate in the open world often use incomplete models of their environment. Model incompleteness is inevitable due to the practical limitations in precise model specification and data collection about open-world environments. Due to the limited fidelity of the model, agent actions may produce negative side effects (NSEs) when deployed. Negative side effects are undesirable, unmodeled effects of agent actions on the environment. NSEs are inherently challenging to identify at design time and may affect the reliability, usability and safety of the system. We present two complementary approaches to mitigate NSEs via: (1) learning from feedback, and (2) environment shaping. The solution approaches target settings with different assumptions and agent responsibilities. In learning from feedback, the agent learns a penalty function associated with an NSE. We investigate the efficiency of different feedback mechanisms, including human feedback and autonomous exploration. The problem is formulated as a multi-objective Markov decision process such that optimizing the agent’s assigned task is prioritized over mitigating NSEs. A slack parameter denotes the maximum allowed deviation from the optimal expected reward for the agent’s task in order to mitigate NSEs. In environment shaping, we examine how a human can assist an agent, beyond providing feedback, and utilize their broader scope of knowledge to mitigate the impacts of NSEs. We formulate the problem as a human-agent collaboration with decoupled objectives. The agent optimizes its assigned task and may produce NSEs during its operation. The human assists the agent by performing modest reconfigurations of the environment so as to mitigate the impacts of NSEs, without affecting the agent’s ability to complete its assigned task. We present an algorithm for shaping and analyze its properties. Empirical evaluations demonstrate the trade-offs in the performance of different approaches in mitigating NSEs in different settings.
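
The role of the slack parameter can be illustrated with a small sketch. The abstract does not give the solution algorithm, so the code below is not the authors' method: it is a brute-force illustration on a hypothetical toy domain (the states, actions, rewards, and penalty values are all assumptions). It enumerates every deterministic policy, keeps those whose task value is within the slack of the optimum, and among them picks the one with the lowest NSE penalty.

```python
from itertools import product

# Hypothetical toy domain (an assumption for illustration only):
# deterministic, undiscounted, acyclic.
# TRANSITIONS[state][action] = (next_state, task_reward, nse_penalty)
TRANSITIONS = {
    "start": {"via_rug": ("rug", -1.0, 0.0), "via_floor": ("floor", -1.5, 0.0)},
    "rug": {"go": ("goal", -1.0, 5.0)},    # crossing the rug causes an NSE
    "floor": {"go": ("goal", -1.0, 0.0)},  # slightly longer route, no NSE
}
GOAL = "goal"


def evaluate(policy, start="start"):
    """Follow the policy to the goal; return (task value, total NSE penalty)."""
    state, task, nse = start, 0.0, 0.0
    while state != GOAL:
        state, reward, penalty = TRANSITIONS[state][policy[state]]
        task += reward
        nse += penalty
    return task, nse


def all_policies():
    """Yield every deterministic policy as a state -> action dict."""
    states = list(TRANSITIONS)
    for choice in product(*(TRANSITIONS[s] for s in states)):
        yield dict(zip(states, choice))


def select_policy(slack):
    """Minimize NSE penalty subject to task value >= optimum - slack."""
    scored = [(pi, *evaluate(pi)) for pi in all_policies()]
    best_task = max(task for _, task, _ in scored)
    feasible = [entry for entry in scored if entry[1] >= best_task - slack]
    # Lowest penalty first; ties broken in favor of higher task value.
    return min(feasible, key=lambda entry: (entry[2], -entry[1]))


if __name__ == "__main__":
    for slack in (0.0, 1.0):
        policy, task, nse = select_policy(slack)
        print(slack, policy["start"], task, nse)
```

With zero slack only the task-optimal route through the rug is admissible, so the penalty is incurred; with a slack of 1.0 the agent may give up 0.5 units of task reward and take the penalty-free route. Enumeration is feasible only for tiny domains and serves here just to make the prioritized, slack-bounded trade-off concrete.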