Abstract

Interactive image segmentation is an important computer vision problem with numerous real-world applications. Models for image segmentation are generally trained to minimize the Hamming error in pixel labeling. The Hamming loss does not ensure that the topology or structure of the segmented object is preserved, and it is therefore not a strong indicator of segmentation quality as perceived by users. However, it is still used almost universally for training because it decomposes over pixels and thus enables efficient learning. In this paper, we propose a novel family of higher-order loss functions that encourage segmentations whose layout is similar to that of the ground-truth segmentation. Unlike the Hamming loss, these loss functions do not decompose over pixels and therefore cannot be used directly for loss-augmented inference. We show how our loss functions can be transformed to allow efficient learning, demonstrate the effectiveness of our method on a challenging segmentation dataset, and validate the results with a user study. Our experimental results show that training with our layout-aware loss functions yields better segmentations, which users prefer over segmentations obtained with conventional loss functions.
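To illustrate why pixel-wise decomposition matters, here is a minimal sketch (not the method of this paper) of how the Hamming loss folds into loss-augmented inference. It assumes an energy-minimization convention in which `unary[i, k]` is the cost of giving label `k` to pixel `i`. The function names `hamming_loss`, `unary_energy` and `loss_augmented_unaries` are hypothetical. Because the loss is a sum of per-pixel terms, subtracting it from the energy only modifies the unary costs. Any pairwise terms of the model would be left unchanged, so the same inference algorithm can still be used. A higher-order, layout-aware loss couples many pixels and cannot be absorbed into unaries this way.

```python
import numpy as np

def hamming_loss(y_true, y_pred):
    """Number of pixels whose predicted label differs from the ground truth."""
    return int(np.sum(y_true != y_pred))

def unary_energy(unary, y):
    """Sum of per-pixel costs unary[i, y[i]] for an integer labeling y."""
    return float(unary[np.arange(len(y)), y].sum())

def loss_augmented_unaries(unary, y_true):
    """Fold the (decomposable) Hamming loss into the unary costs.

    Loss-augmented inference seeks argmin_y E(y) - loss(y_true, y).
    Since the Hamming loss is a sum over pixels, this amounts to lowering
    by 1 the cost of every label that disagrees with the ground truth.
    """
    n_pixels, n_labels = unary.shape
    mismatch = np.ones((n_pixels, n_labels))
    mismatch[np.arange(n_pixels), y_true] = 0.0  # correct label: no loss
    return unary - mismatch
```

For every labeling `y`, the augmented unary energy equals the original unary energy minus `hamming_loss(y_true, y)`, so loss-augmented inference costs no more than ordinary inference.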