A CNN-based Approach to Detecting Text from Images of Whiteboards and Handwritten Notes

2018 International Conference on Frontiers in Handwriting Recognition

Detecting handwritten text in images of whiteboards and handwritten notes is an important yet under-researched problem. In this paper, we propose a convolutional neural network (CNN) based approach to address it. First, to detect text instances of different scales, a feature pyramid network is adopted as the backbone to extract three feature maps of different scales from an input image, and a scale-specific detection module is attached to each feature map. Then, for each pixel on a feature map, the detection module predicts whether a text instance exists at the corresponding location in the input image. For each positive prediction, the bounding box of the detected text segment and the links between that pixel and its 8 neighbors on the feature map are predicted simultaneously. Based on this linkage information, the text segments extracted from each feature map are grouped into text-lines, and wrongly grouped text-lines are then separated by a graph-based text-line segmentation method. Finally, the detection results from the three feature maps are aggregated by a skewed non-maximum suppression algorithm. The proposed approach achieves superior results on a test set of 285 natural scene images of whiteboards and handwritten notes.
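The linkage-based grouping step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the per-pixel text/non-text decisions and the 8 link decisions have already been thresholded to booleans, it uses a hypothetical neighbor ordering (`NEIGHBOURS`) and a hypothetical function name (`group_text_segments`), and it treats a single positive link between two positive pixels as sufficient to connect them. Linked pixels are merged with union-find, so each resulting group is one candidate text-line on that feature map. The graph-based splitting of wrongly grouped text-lines and the skewed non-maximum suppression are not shown.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]

# Hypothetical ordering of the 8 neighbors (dy, dx); the channel order
# actually used by the detection module is not given in the abstract.
NEIGHBOURS: List[Pixel] = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def group_text_segments(
    text_mask: List[List[bool]],
    links: List[List[List[bool]]],
) -> List[List[Pixel]]:
    """Group positive pixels into candidate text-lines (hypothetical helper).

    text_mask[y][x] -- True if pixel (y, x) is predicted to contain text.
    links[y][x][k]  -- True if pixel (y, x) is predicted to be linked to its
                       k-th neighbor in NEIGHBOURS.
    Returns one sorted list of pixel coordinates per group.
    """
    h, w = len(text_mask), len(text_mask[0])
    parent: Dict[Pixel, Pixel] = {}

    def find(p: Pixel) -> Pixel:
        # Path halving keeps the union-find trees shallow.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a: Pixel, b: Pixel) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Only positive (text) pixels become graph nodes.
    for y in range(h):
        for x in range(w):
            if text_mask[y][x]:
                parent[(y, x)] = (y, x)

    for (y, x) in list(parent):
        for k, (dy, dx) in enumerate(NEIGHBOURS):
            q = (y + dy, x + dx)
            # Assumption: one positive link suffices; `q in parent` also
            # rejects out-of-bounds and non-text neighbors.
            if links[y][x][k] and q in parent:
                union((y, x), q)

    groups: Dict[Pixel, List[Pixel]] = {}
    for p in parent:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())
```

For example, with `text_mask = [[True, True, False, True]]` and only the rightward link of pixel (0, 0) set (index 4 in `NEIGHBOURS`), the function returns `[[(0, 0), (0, 1)], [(0, 3)]]`. Union-find is used here because it merges linked pixels in near-linear time; a breadth-first search over the same link graph would produce the same groups.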