Computer vision (CV) systems are increasingly finding new roles in collaborative settings in domains such as healthcare. Such settings pose a new challenge for CV systems and require the design of appropriate interaction paradigms. The provision of feedback, particularly about what the CV system can “see,” is a key aspect, yet this feedback may not always be possible to present visually. We explore the design space of audio feedback for a scenario of interest: the clinical assessment of Multiple Sclerosis using a CV system. We then present a mixed-methods experimental study aimed at providing first insights into the challenges and opportunities of designing audio feedback of this kind. Specifically, we compare audio feedback that differentiates which body parts the CV system can see with audio feedback that is undifferentiated. The findings reveal that it is not enough simply to convey that something might be out of the camera's view, because what the camera can “see” depends on the specific configuration of participants and on the peculiarities of the skeleton inference algorithms. The results highlight the importance, when developing CV systems for collaborative use, of providing feedback that conveys spatial information more naturally.