Because app quality is a deciding factor in user base growth, many automated testing services are available to reduce the burden on app developers. However, we argue that these existing services do not sufficiently bring real-world contexts into app testing, which limits visibility into how an unreleased app would perform in the wild. This is a challenging problem that current emulator-based or device-based testing services cannot address properly or scalably. This paper envisions a split-execution model for building automated, contextual testing services for mobile apps. The model allows the service to evolve over time by adopting new algorithms and recruiting new physical devices. Preliminary results from a prototype demonstrate the potential and feasibility of the proposed architecture.