This paper introduces a new formulation for discrete image labeling tasks, the Decision Tree Field (DTF), that combines and generalizes random forests and conditional random ﬁelds (CRF) which have been widely used in computer vision. In a typical CRF model the unary potentials are derived from sophisticated random forest or boosting based classiﬁers, however, the pairwise potentials are assumed to (1) have a simple parametric form with a pre-speciﬁed and ﬁxed dependence on the image data, and (2) to be deﬁned on the basis of a small and ﬁxed neighborhood. In contrast, in DTF, local interactions between multiple variables are determined by means of decision trees evaluated on the image data, allowing the interactions to be adapted to the image content. This results in powerful graphical models which are able to represent complex label structure. Our key technical contribution is to show that the DTF model can be trained efﬁciently and jointly using a convex approximate likelihood function, enabling us to learn over a million free model parameters. We show experimentally that for applications which have a rich and complex label structure, our model achieves excellent results.