Abstract

This paper proposes employing a detailed tumor growth model to synthesize labelled images, which can then be used to train an efficient, data-driven machine learning tumor predictor. Our MR image synthesis step generates images containing both healthy tissues and various tumoral tissue types. Subsequently, a discriminative algorithm based on random regression forests is trained on the simulated ground truth to predict, for each voxel, the continuous latent tumor cell density and the discrete tissue class. The method uses a large synthetic dataset of 740 simulated cases for training and evaluation. A quantitative evaluation on 14 real clinical cases diagnosed with low-grade gliomas demonstrates tissue class accuracy comparable with the state of the art, with the added benefits of computational efficiency and the ability to estimate tumor cell density as a latent variable underlying the multimodal image observations. The idea of synthesizing training data for data-driven learning algorithms can be extended to other applications where expert annotation is lacking or expensive.