A large number of applications run on modern mobile operating systems, and inevitably some of them fail in the hands of users. Diagnosing such a failure to identify the culprit, or even merely reproducing it in the lab, is difficult. To gain insight into this problem, we interviewed developers of five mobile applications and analyzed hundreds of trouble tickets. We find that support for diagnosing unexpected application behavior is lacking across the major mobile platforms. Even when developers implement heavyweight logging during controlled trials, they fail to discover many dependencies that are later stressed in the wild. They are also not well equipped to understand how to monitor the large number of dependencies without taxing the phone's limited resources, such as CPU and battery. Based on these findings, we argue for three fundamental changes to failure reporting on mobile phones. The first is spatial spreading, which exploits the large number of phones in the field by distributing the monitoring work across them. The second is statistical inference, which builds a conditional distribution model relating application behavior to its dependencies in the presence of partial information. The third is adaptive sampling, which dynamically varies what each phone monitors, adapting both to the changing population of phones and to what has been learned about each failure. We propose a system called MobiBug that combines these three techniques to simplify the task of diagnosing mobile applications.
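To make the three ideas concrete, here is a toy sketch of how they could fit together. It is not MobiBug's implementation. The function names (`assign_monitors`, `failure_given_state`, `adapt_weights`), the report format (a dict mapping each dependency a phone monitored to whether it misbehaved, paired with a flag saying whether the application failed), and the weighting rule are all hypothetical assumptions made for illustration. "Statistical inference" is reduced here to simple conditional frequency estimates computed only from the phones that actually observed each dependency.

```python
import random
from collections import defaultdict


def assign_monitors(phone_ids, dependencies, k, weights=None, seed=0):
    """Spatial spreading (hypothetical): each phone monitors only k dependencies.

    If `weights` is given (adaptive sampling), dependencies with larger weight
    are more likely to be chosen. Sampling is without replacement per phone.
    """
    rng = random.Random(seed)
    plan = {}
    for pid in phone_ids:
        remaining = list(dependencies)
        chosen = []
        for _ in range(min(k, len(remaining))):
            if weights is None:
                pick = rng.choice(remaining)
            else:
                pick = rng.choices(remaining, weights=[weights[d] for d in remaining])[0]
            remaining.remove(pick)
            chosen.append(pick)
        plan[pid] = chosen
    return plan


def failure_given_state(reports):
    """Statistical inference (simplified): per dependency, estimate
    P(failure | misbehaved) and P(failure | ok) from partial reports.

    Each report is (observed, failed), where `observed` covers only the
    dependencies that phone monitored; unobserved ones are simply skipped.
    """
    # counts[dep] = [failures when bad, observations when bad,
    #                failures when ok,  observations when ok]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for observed, failed in reports:
        for dep, bad in observed.items():
            c = counts[dep]
            if bad:
                c[0] += int(failed)
                c[1] += 1
            else:
                c[2] += int(failed)
                c[3] += 1
    estimates = {}
    for dep, (fail_bad, n_bad, fail_ok, n_ok) in counts.items():
        p_bad = fail_bad / n_bad if n_bad else None
        p_ok = fail_ok / n_ok if n_ok else None
        estimates[dep] = (p_bad, p_ok)
    return estimates


def adapt_weights(dependencies, estimates, floor=0.05):
    """Adaptive sampling (hypothetical rule): favor dependencies whose
    misbehavior raises the failure rate, keeping a floor so none is dropped.
    Dependencies without enough data keep weight 1.0 to encourage exploration.
    """
    weights = {}
    for dep in dependencies:
        p_bad, p_ok = estimates.get(dep, (None, None))
        if p_bad is None or p_ok is None:
            weights[dep] = 1.0
        else:
            weights[dep] = max(p_bad - p_ok, 0.0) + floor
    return weights
```

In a deployment loop one would call `assign_monitors`, collect reports from the field, run `failure_given_state`, and feed the result into `adapt_weights` for the next round's assignment. The floor and the exploration weight are arbitrary choices here; they trade off focusing on suspected culprits against continuing to watch dependencies not yet implicated.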