Abstract

We address the problem of tracking and reconstructing 3D human lip motions from a 2D view. This problem is challenging due to both the complex nature of lip motions and the limited information available from a raw video stream of the face. We counter both of these difficulties with statistical approaches. We first build a physically based 3D model of the lips and train it to cover only the subspace of lip motions. We then track this model in video by finding the shape within the subspace that maximizes the posterior probability of the model given the observed features. In this study, the features are the likelihoods of the lip and non-lip color classes. We iteratively derive forces from these likelihoods, apply them to the physical model, and converge to the final solution. Because the model is fully 3D, this framework allows us to track the lips from any head pose. In addition, because of the constraints imposed by the learned subspace of the model, we are able to accurately estimate the full 3D lip shape from the 2D view.
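
The tracking criterion stated above can be sketched in symbols. The notation here is an assumption introduced for illustration and does not come from the abstract: x denotes the model shape parameters restricted to the learned subspace, and O denotes the observed features (the lip and non-lip color class likelihoods). By Bayes' rule, and because p(O) does not depend on x, maximizing the posterior is equivalent to maximizing the product of the observation likelihood and the prior:

```latex
\hat{x} \;=\; \arg\max_{x}\; p(x \mid O)
        \;=\; \arg\max_{x}\; \frac{p(O \mid x)\, p(x)}{p(O)}
        \;=\; \arg\max_{x}\; p(O \mid x)\, p(x)
```

Under this reading, p(x) plays the role of the prior supplied by the learned subspace, and p(O | x) measures how well the posed model agrees with the observed color class likelihoods. This is an interpretation of the abstract, not necessarily the exact formulation used.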