Building Better Questionnaires with Probabilistic Modelling


April 30, 2013


Ricardo Silva




With the advent of new technologies such as search engines and data-rich social networks, there has been a major increase in the availability of indirect sources of measurement for social behaviour. Still, questionnaires remain an important probe into the attitudes, preferences and other traits of populations of interest. This is particularly true in scientific contexts such as psychometrics, in surveys of specific populations such as NHS staff members, or as a complement to noisy data, for example when following up fMRI or social network studies with questionnaires. It is therefore important to provide means of improving the quality of both the data and the analysis. Although there is a rich literature on measuring latent traits through better questionnaire design, combining theory-driven approaches with data-driven methods opens up exciting new possibilities for improvement. Here we cover two aspects. First, it is of interest to keep questionnaires short, so that response rates are high and artefacts of question ordering become less problematic. Designing shorter questionnaires is closely related to machine learning approaches for active learning and sampling, as we will discuss. Second, in many studies questions are designed to target latent traits specified a priori, and this background knowledge can be exploited in a latent variable model within the family of “small rank plus sparse structure” models, for which algorithms based on composite likelihood approaches lead to scalable inference.