Abstract

With the ever-increasing use of smartphones and digital cameras, people can now take photos anywhere and at any time. Most of these photos simply end up stored in the cloud without further interaction, largely because intelligent services for organizing personal photos are lacking. There is therefore a pressing need for a system that lets people relive their memories by turning their photos into stories. This paper presents a storytelling system named Monet, which automatically creates engaging stories from personal photos by applying cinematic knowledge encoded in a set of predesigned editing styles. The system consists of two stages: photo summarization, which selects a subset of the “best” photos to represent a photo collection, and story remixing, which generates a stylish music video from the selected photos. During photo summarization, photos are grouped into events based on multimodal features (time and location). The “best” photos are then selected according to visual quality, event representativeness, and diversity. The second stage, story remixing, automatically selects an appropriate theme-dependent editing style based on the photo content. Each selected photo is converted into a video clip by applying a virtual camera with appropriate motions. A series of video effects, color filters, shapes, and transitions is then applied to the video clips according to cinematic rules. Finally, the resulting video is multiplexed with a music clip to produce the story. Evaluations show that our system outperforms state-of-the-art photo event detection and story generation systems.
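
To make the photo summarization stage more concrete, the following is a minimal illustrative sketch, not the Monet implementation. It assumes that a new event begins whenever the time gap or the geographic distance between consecutive photos exceeds a threshold, and that each photo carries a precomputed quality score. The names `Photo`, `group_into_events`, and `summarize`, the threshold values, and the `quality` field are all hypothetical, and the selection step considers visual quality only, leaving out the representativeness and diversity criteria.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Photo:
    name: str
    timestamp: float  # capture time, seconds since epoch
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    quality: float    # hypothetical visual-quality score in [0, 1]


def haversine_km(a: Photo, b: Photo) -> float:
    """Great-circle distance between two photos in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))


def group_into_events(photos, max_gap_s=3 * 3600, max_dist_km=5.0):
    """Split time-ordered photos into events at large time or location jumps.

    The thresholds are assumed values chosen for illustration only.
    """
    events = []
    for photo in sorted(photos, key=lambda p: p.timestamp):
        if events:
            prev = events[-1][-1]
            close_in_time = photo.timestamp - prev.timestamp <= max_gap_s
            close_in_space = haversine_km(prev, photo) <= max_dist_km
            if close_in_time and close_in_space:
                events[-1].append(photo)
                continue
        events.append([photo])
    return events


def summarize(photos, per_event=1):
    """Keep the highest-quality photos of each event, in chronological order."""
    summary = []
    for event in group_into_events(photos):
        best = sorted(event, key=lambda p: p.quality, reverse=True)[:per_event]
        summary.extend(sorted(best, key=lambda p: p.timestamp))
    return summary
```

Picking a fixed number of photos per event gives only a crude form of diversity; a fuller treatment would also score how representative each photo is of its event and penalize near-duplicates.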