Many estimation tasks come in groups and hierarchies of related problems. In this paper we propose a hierarchical model and a scalable algorithm to perform inference for multitask learning. It infers task correlation and subtask structure in a joint sparse setting. Implementation is achieved by a distributed subgradient oracle and the successive application of prox-operators pertaining to groups and sub-groups of variables. We apply this algorithm to conversion optimization in display advertising. Experimental results on over 1TB data for up to 1 billion observations and 1 million attributes show that the algorithm provides significantly better prediction accuracy while simultaneously being efficiently scalable by distributed parameter synchronization.