Scalable and comprehensive testing of mobile apps is extremely challenging. Every test input needs to be run under a variety of contexts, such as heterogeneous devices, varying wireless network speeds, different locations, and unpredictable sensor inputs. The range of values for each context, e.g., location, can be very large. In this paper we present a one-of-a-kind cloud service, called ConVirt, to which app developers can submit their apps for testing. It leverages two key techniques to make app testing more tractable: (i) pruning the input space down to representative contexts, and (ii) prioritizing the context test space to quickly discover failure scenarios for each app. We have implemented ConVirt as a cluster of VMs, each of which can emulate various combinations of contexts for tablet and phone apps. We evaluate ConVirt by testing 230 commercially available mobile apps against a comprehensive library of real-world conditions. Our results show that ConVirt leads to an 11.4x improvement in the number of crashes discovered during testing compared to conventional UI automation (i.e., monkey testing).