The ability to manage personal photos intelligently is becoming crucial. In this work, we address the following pain points for mobile users: 1) intelligent photo tagging, best-photo selection, event segmentation, and album naming; 2) speech recognition and parsing of user intent into time, location, people attributes, and objects; 3) search by arbitrary queries.
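As a rough illustration of intent parsing, the sketch below splits a transcribed query into coarse slots (time, people, and remaining terms) using keyword lists. The vocabularies and function name are hypothetical; a production system would use trained taggers over the speech-recognition output rather than hand-written word lists.

```python
import re

# Hypothetical slot vocabularies for illustration only; a real system
# would learn these from data instead of enumerating keywords.
TIME_WORDS = {"yesterday", "today", "2016", "summer", "christmas"}
PEOPLE_WORDS = {"me", "baby", "family", "friends"}

def parse_intent(query: str) -> dict:
    """Split a free-form photo query into coarse slots: time, people,
    and all other content terms (objects, locations, etc.)."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    slots = {"time": [], "people": [], "terms": []}
    for tok in tokens:
        if tok in TIME_WORDS:
            slots["time"].append(tok)
        elif tok in PEOPLE_WORDS:
            slots["people"].append(tok)
        else:
            slots["terms"].append(tok)
    return slots
```

The structured slots can then be matched against photo metadata (timestamps, GPS, face clusters) and visual tags independently.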
We first automatically segment and categorize the unstructured photo stream into multiple semantically related albums. Second, we analyze photo content with image-tagging techniques, and further suggest the salient time, location, and semantic keywords for each user's albums, illustrated with visual samples, according to that user's photo distribution.
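A minimal sketch of event segmentation, assuming a simple time-gap heuristic: a new album starts whenever the interval between consecutive shots exceeds a threshold. This is illustrative only; the segmentation described here would also exploit location and visual content.

```python
from datetime import datetime, timedelta

def segment_events(timestamps, gap=timedelta(hours=6)):
    """Group a photo stream (list of capture datetimes) into events.
    A new event starts when the time between consecutive photos
    exceeds `gap`. Returns a list of events, each a list of timestamps."""
    events, current = [], []
    for ts in sorted(timestamps):
        # Start a new event if the gap from the previous photo is too large.
        if current and ts - current[-1] > gap:
            events.append(current)
            current = []
        current.append(ts)
    if current:
        events.append(current)
    return events
```

Each resulting event can then be categorized and named from its dominant tags, time span, and location.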
Third, we learn a retrieval model that simultaneously considers the relationship between the user's query and a pre-defined vocabulary set via concept mapping, and the relationship between that vocabulary set and the photos' visual content via deep learning. We also apply photo quality assessment to rank the images in the retrieval results. The whole system is shown as follows:
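The ranking step can be sketched as blending a relevance score with a quality score. Here, set overlap between query concepts and predicted tags stands in for the learned relevance model, and `alpha` (an assumed weighting parameter, not from the source) trades off relevance against quality.

```python
def rank_photos(query_concepts, photos, alpha=0.8):
    """Rank photos for a query.
    query_concepts: set of vocabulary concepts mapped from the query.
    photos: list of (tags, quality) pairs, where tags is a set of
            predicted concepts and quality is a score in [0, 1].
    Returns photo indices sorted best-first."""
    scored = []
    for tags, quality in photos:
        # Fraction of query concepts covered by the photo's tags,
        # a stand-in for the learned concept-to-visual relevance.
        relevance = len(query_concepts & set(tags)) / max(len(query_concepts), 1)
        scored.append(alpha * relevance + (1 - alpha) * quality)
    return sorted(range(len(photos)), key=lambda i: scored[i], reverse=True)
```

Among equally relevant photos, the quality term surfaces the best shot first, matching the best-photo-selection goal above.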