This paper presents a camera combo system for personal remote collaboration applications. The system consists of two different cameras. One camera has a wide field of view, and the other can pan/tilt/zoom (PTZ) based on analysis of the images captured by the wide angle camera. Unlike traditional approaches which usually drive the PTZ camera to follow the person or his/her head, our system is capable of capturing general objects of interest in remote collaboration. For instance, when the user raises something trying to show it to the remote person, our system will automatically position the PTZ camera to zoom in at the object. At the core of our system is a semantic saliency map that overcomes many limitations of low-level saliency maps computed from preliminary image features. We demonstrate how such a semantic saliency map can be computed through contextual analysis, sign analysis and transitional analysis, and how it can be used for PTZ camera control with a novel information loss optimization based virtual director. The effectiveness of the proposed method is demonstrated with real-world sequences.