Abstract

Strings and string operations are widely used, particularly in applications that involve text, speech or sequences. Yet the vast majority of probabilistic models contain only numerical random variables, not strings. In this paper, we show how belief propagation can be applied to perform inference in models with string random variables that use common string operations such as concatenation, find/replace and formatting. Our approach is to use weighted finite-state automata to represent messages and transducers to perform message computations. Using belief propagation means that string variables can be mixed with numerical variables to create rich hybrid models. We illustrate this approach by showing inference results for hybrid models with string and numerical variables in the domains of information extraction and computational biology.