Voice-Driven Animation

  • Matthew Brand,
  • Ken Shan

We introduce a model for learning a mapping between signals, and use it to drive facial animation directly from vocal cues. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system learns its own representation, which includes dynamical and contextual information. In principle, this allows the system to make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects. The output is a series of facial control parameters, suitable for driving many different kinds of animation, ranging from photo-realistic image warps to 3D cartoon characters.
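
To make the signal-to-signal setup concrete, the sketch below shows one hypothetical way to learn a mapping from a temporal window of vocal features to facial control parameters. It is not the authors' model (which learns its own dynamical and contextual representation); it is a plain ridge regression over a fixed context window, and all quantities here (feature dimensions, window size, synthetic training data) are illustrative assumptions.

    # Hypothetical illustration only: map windowed vocal features to facial
    # control parameters with ridge regression on synthetic data. The paper's
    # system instead learns its own representation with dynamics and context.
    import numpy as np

    rng = np.random.default_rng(0)
    T, n_audio, n_face, context = 500, 13, 8, 5   # frames, dims, window (assumed)

    def windows(x, c):
        """Stack each frame with its +/- c neighbors (edge-padded)."""
        padded = np.pad(x, ((c, c), (0, 0)), mode="edge")
        return np.hstack([padded[i:i + len(x)] for i in range(2 * c + 1)])

    # Synthetic synchronized training signals: vocal features and facial controls.
    audio = rng.standard_normal((T, n_audio))
    X = windows(audio, context)
    true_map = 0.1 * rng.standard_normal((X.shape[1], n_face))
    face = X @ true_map + 0.01 * rng.standard_normal((T, n_face))

    # Fit the mapping (normal equations with ridge penalty).
    lam = 1.0
    W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ face)

    # At synthesis time, new vocal input drives one control vector per frame.
    new_audio = rng.standard_normal((100, n_audio))
    controls = windows(new_audio, context) @ W
    print(controls.shape)   # (100, 8)

The fixed window stands in, crudely, for the contextual information the abstract describes; the learned representation in the paper handles longer-range co-articulation effects that a fixed window cannot.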