Object localization is an image annotation task which consists of finding the location of a target object in an image. It is common to crowdsource annotation tasks and aggregate responses to estimate the true annotation. While for other kinds of annotations consensus is simple and powerful, it cannot be applied to object localization as effectively due to the task’s rich answer space and inherent noise in responses. We propose a probabilistic graphical model to localize objects in images based on responses from the crowd. We improve upon natural aggregation methods such as the mean and the median by simultaneously estimating the difficulty level of each question and skill level of every participant. We empirically evaluate our model on crowdsourced data and show that our method outperforms simple aggregators both in estimating the true locations and in ranking participants by their ability. We also propose a simple adaptive sourcing scheme that works well for very sparse datasets.