In this paper, we present an algorithm for real-time tracking of articulated structures in dense disparity maps derived from stereo image sequences. A statistical image formation model that accounts for occlusions plays the central role in our tracking approach. This graphical model (a Bayesian network) assumes that the range image of each part of the structure is formed by drawing the depth candidates from a 3-D Gaussian distribution. The advantage over the classical mixture of Gaussians is that our model takes into account occlusions by picking the minimum depth (which could be regarded as a probabilistic version of z-buffering). The model also enforces articulation constraints among the parts of the structure. The tracking problem is formulated as an inference problem in the image formation model. This model can be extended and used for other tasks in addition to the one described in the paper and can also be used for estimating probability distribution functions instead of the ML estimates of the tracked parameters. For the purposes of real-time tracking, we used certain approximations in the inference process, which resulted in a real-time two-stage inference algorithm. We were able to successfully track upper human body motion in real time and in the presence of self-occlusions.