What Went Wrong and Why? Diagnosing Situated Interaction Failures in the Wild

Effective situated interaction hinges on the well-coordinated operation of a set of competencies, including computer vision, speech recognition, and natural language understanding, as well as higher-level inferences about turn-taking and engagement. Systems often rely on a mix of hand-coded and machine-learned components organized into several sensing and decision-making pipelines. Given this complexity and the interdependencies among components, developing and debugging such systems can be challenging. “In-the-wild” deployments outside of controlled lab conditions bring further challenges due to unanticipated phenomena, including unexpected forms of interaction such as playful engagements with the system. We present a methodology for assessing performance, identifying problems, and diagnosing the root causes of different types of failures and their influence on the overall performance of a situated interaction system operating in the wild. We apply the methodology to a dataset of interactions collected with a robot deployed in a public space inside an office building. The analyses identify and characterize multiple types of failures, their causes, and their relationship to overall performance. We employ models that predict overall interaction quality from various combinations of failures. Finally, we discuss lessons learned from applying such a diagnostic methodology to improve situated systems deployed in the wild.
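The abstract mentions models that predict overall interaction quality from combinations of failures. The sketch below illustrates one way such an analysis could look: a logistic regression over per-interaction failure indicators. The feature names, simulated data, and choice of model are illustrative assumptions for this sketch, not the paper's actual pipeline.

```python
# Minimal sketch (not the authors' implementation): relating per-interaction
# failure indicators to an overall interaction-quality label, using
# hypothetical column names and simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200  # hypothetical number of logged interactions

# Hypothetical binary failure indicators per interaction (1 = failure observed).
failure_types = ["speech_reco", "engagement", "turn_taking", "vision"]
X = rng.integers(0, 2, size=(n, len(failure_types)))

# Hypothetical label: 1 = interaction judged "good" overall, 0 = "poor".
# Simulated so that more failures lower the odds of a good interaction.
logits = 1.5 - X @ np.array([1.0, 0.8, 0.6, 0.4])
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# How well do failure indicators predict overall quality?
model = LogisticRegression()
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())

# Coefficients indicate how strongly each failure type is associated
# with degraded interaction quality in this simulated setting.
model.fit(X, y)
for name, coef in zip(failure_types, model.coef_[0]):
    print(f"{name}: {coef:+.2f}")
```

In practice, the same pattern extends to richer feature sets (e.g., failure counts or co-occurring combinations of failures) and to other model families, with the fitted coefficients or importances serving as a rough guide to which failure types most strongly degrade overall performance.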