Mobile devices are ubiquitous. People use their phones as personal concierges, not only to discover information but also to search for particular interests on the go and to make decisions. This opens a new horizon for multimedia retrieval on mobile devices. While existing efforts have predominantly focused on understanding textual or voice queries, this paper presents a new perspective: understanding visual queries captured by the built-in camera so that mobile-based social activities can be recommended for users to complete. In this work, a query image-based contextual model is proposed for visual search. A mobile user can take a photo and naturally indicate an object-of-interest within it via a circle-based gesture called the “O” gesture. Both the selected object-of-interest region and the surrounding visual context in the photo are used to achieve search-based recognition by retrieving similar images with a large-scale visual vocabulary tree. Consequently, social activities such as visiting contextually relevant entities (i.e., local businesses) are recommended to users based on their visual queries and GPS location. Along with the proposed method, an exemplary real application has been developed on Windows Phone 7 devices and evaluated in a wide variety of scenarios on a million-scale image database. To test the performance of the proposed mobile visual search model, extensive experiments have been conducted and the results compared with state-of-the-art algorithms in the content-based image retrieval (CBIR) domain.