In this talk, I will present techniques for organizing two types of photo collections downloaded from Flickr.com: (1) famous landmark sites such as the Statue of Liberty and (2) general visual concepts such as “love” and “beauty.”
In the first part of the talk, I will present a system that integrates 2D appearance and 3D geometric constraints to efficiently build 3D models of landmarks, extract scene summaries, and recognize the landmark in new test images. The system starts by clustering images using low-dimensional global “gist” descriptors and then performs geometric verification to retain only the clusters whose images share a common 3D structure. Each valid cluster is represented by a single iconic view, and geometric relationships between iconic views are captured in an iconic scene graph. In addition to serving as a compact scene summary, this graph is used to guide structure from motion to efficiently produce 3D models covering most of the scene content. The set of iconic images can also be used for recognition, i.e., determining whether new test images contain the landmark.
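As a rough illustration of the verification and iconic-selection steps, the following Python sketch takes clusters that have already been formed (for example, by k-means on gist descriptors) and a pairwise inlier-count function. Everything named here is an assumption made for illustration and is not taken from the talk: the function names `select_iconic` and `build_iconic_graph`, the `inliers` callback (which in practice might come from a robust two-view geometry fit on local feature matches), the thresholds `min_inliers` and `min_verified`, and the rule of choosing the image with the largest total inlier count as the iconic view.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback type: number of geometrically consistent feature
# matches between two images (assumed to be computed elsewhere).
InlierFn = Callable[[str, str], int]


def select_iconic(cluster: List[str], inliers: InlierFn,
                  min_inliers: int = 15, min_verified: int = 2) -> Optional[str]:
    """Return the iconic image of a cluster, or None if no image in the
    cluster is geometrically verified against enough other members.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    best_image, best_score = None, -1
    for candidate in cluster:
        counts = [inliers(candidate, other)
                  for other in cluster if other != candidate]
        verified = [c for c in counts if c >= min_inliers]
        # A candidate qualifies only if it matches enough other members.
        if len(verified) >= min_verified and sum(verified) > best_score:
            best_image, best_score = candidate, sum(verified)
    return best_image


def build_iconic_graph(clusters: List[List[str]], inliers: InlierFn,
                       min_inliers: int = 15
                       ) -> Tuple[List[str], List[Tuple[str, str, int]]]:
    """Keep one iconic per verified cluster and connect pairs of iconics
    whose inlier count passes the threshold."""
    candidates = (select_iconic(c, inliers, min_inliers) for c in clusters)
    iconics = [img for img in candidates if img is not None]
    edges = []
    for i, a in enumerate(iconics):
        for b in iconics[i + 1:]:
            weight = inliers(a, b)
            if weight >= min_inliers:
                edges.append((a, b, weight))
    return iconics, edges
```

In this sketch, a cluster with no qualifying image is simply discarded, which mirrors the idea of retaining only clusters with a shared 3D structure. The weighted edges between iconic views stand in for the geometric relationships recorded in the iconic scene graph.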
In the second part of the talk, I will discuss the problem of computing iconic summaries for more general visual concepts that are not characterized by rigid 3D geometry. In this case, it is more appropriate to define iconic images as representatives of subsets of the collection consistent in terms of global 2D appearance and semantics. Such subsets are found by jointly clustering images using “gist” descriptors and Flickr tags. For each joint cluster, a representative iconic image is selected using an automatic quality-based ranking scheme. To visualize the resulting summary, iconic images are grouped according to their semantic “theme” (tag-based cluster) and multidimensional scaling is used to compute a 2D layout reflecting the relationships between the themes.
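One simple way to picture the joint clustering and iconic selection is sketched below in Python. It assumes that each image already carries an appearance cluster label (for example, from clustering gist descriptors), a tag-based theme label, and a scalar quality score. It forms joint clusters as the intersection of the two labelings. That intersection rule, the `summarize` function, the dictionary keys, and the `min_size` cutoff are all assumptions made for illustration, not the method presented in the talk. The 2D layout step via multidimensional scaling is not shown.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(images: List[dict], min_size: int = 2) -> Dict[str, List[str]]:
    """Group images into joint (appearance, theme) clusters, pick the
    highest-quality image of each sufficiently large cluster as its
    iconic, and return the iconic image ids grouped by theme.

    Each image is a dict with the hypothetical keys
    'id', 'appearance', 'theme', and 'quality'.
    """
    # Joint cluster = images sharing both appearance label and theme label.
    joint = defaultdict(list)
    for img in images:
        joint[(img["appearance"], img["theme"])].append(img)

    by_theme = defaultdict(list)
    for (_, theme), members in joint.items():
        if len(members) < min_size:
            continue  # too small to count as a consistent subset
        iconic = max(members, key=lambda m: m["quality"])
        by_theme[theme].append(iconic["id"])
    return dict(by_theme)
```

Grouping the selected iconic images by theme corresponds to the visualization described above, where each theme collects the representatives of its joint clusters.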