Abstract

The Hierarchical Mixture of Experts (HME) is a well-known tree-structured model for regression and classification, based on soft probabilistic splits of the input space. In its original formulation its parameters are determined by maximum likelihood, which is prone to severe over-fitting, including singularities in the likelihood function. Furthermore the maximum likelihood framework offers no natural metric for optimizing the complexity and structure of the tree. Previous attempts to provide a Bayesian treatment of the HME model have relied either on local Gaussian representations based on the Laplace approximation, or have modified the model so that it represents the joint distribution of both input and output variables, which can be wasteful of resources if the goal is prediction. In this paper we describe a fully Bayesian treatment of the original HME model based on variational inference. By combining `local’ and `global’ variational methods we obtain a rigorous lower bound on the marginal probability of the data under the model. This bound is optimized during the training phase, and its resulting value can be used for model order selection. We present results using this approach for data sets describing robot arm kinematics.