Decision forests can be thought of as a flexible optimization toolbox with many avenues to alter or recombine the underlying architectural components to improve recognition accuracy and efficiency. In this chapter, we present two fundamental approaches for re-architecting the decision forest that yield higher prediction accuracy and shortened decision time. The first is entanglement, the sharing of information between the nodes in a decision forest such that the result of the binary tests applied at each tree node depends on the results of tests applied earlier during forest growth. Unlike a standard classifier which assumes all data points (even those neighboring in space or time) are IID, entanglement approach learns semantic correlation in non IID data. To demonstrate, we build an entangled decision forest (EDF) that exploits spatial correlation in human anatomy by simultaneously labeling voxels in CT scans from 12 anatomical structures. The second approach is the formulation of information gain as a function that is differentiable with respect to the parameters of the tree node data partitioning functional. This provides increased confidence and accuracy of maximum margin boundary localization and reduces classification time by using a few, shallow trees. We further extend the method to incorporate training label confidence, when available, into the information gain maximization. Due to bagging and random feature subset selection, we can retain decision forest virtues such as resiliency to overfitting. To demonstrate, we build a gradient ascent decision forest (GADF) that tracks visual objects in videos. For both approaches, superior accuracy and computational efficiency is shown in quantitative comparisons with state of the art algorithms.