Filtering Web Images Effectively

Posted by Rob Knies

You’re looking for a photo of a flower. Not just any photo—it needs to be horizontal in shape. And not just any flower—it needs to be a purple flower.

What do you do? You could perform a conventional image search on the web. There are lots of flowers out there—lots of shapes, lots of colors. Poke around for a while, and you just might find what you need.

Alternatively, you can use the filter bar in Bing Image Search, which has been augmented by work from Microsoft Research Asia. You type in a textual query, “flower,” then filter for “purple,” “photograph,” and “wide,” and voilà, a collection of horizontal shots of purple flowers pops up.

The color filter is thanks, in large part, to research by Jingdong Wang and Shipeng Li. They are in Providence, R.I., from June 16 to 21, attending the Institute of Electrical and Electronics Engineers’ 2012 Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2012), during which they are presenting their paper, written in collaboration with Peng Wang, Gang Zeng, Jie Feng, and Hongbin Zha of the Key Laboratory on Machine Perception at Peking University.

That paper is part of a substantial conference contribution from Microsoft Research, which had no fewer than 41 peer-reviewed papers accepted for CVPR 2012. Andrew Blake, Microsoft distinguished scientist and managing director of Microsoft Research Cambridge, is program chair for the event, and Jian Sun of Microsoft Research Asia and Richard Szeliski of Microsoft Research Redmond are serving as area chairs.

A “salient object” is the primary component in an image. Imagine a photo of a horse in a field under a blue sky. The grass might be green, and the sky might be gorgeous, but in all likelihood, the salient object in the shot is the horse.

“Our goal is to develop an effective and efficient technique to locate the salient object,” Wang explains. “Particularly, we also predict if an image contains a salient object, which has been rarely studied before.”

The detection of salient objects in images has been studied for a long time. Such an ability has broad applications: image cropping, adaptive image displays on mobile devices, extracting dominant colors within images, and removing images that lack an object of interest. But localizing salient objects remains a challenge.

The problem is that objects have a variety of visual characteristics, making it difficult to differentiate salient objects from an image background simply by appearance. And while the low-resolution thumbnail images that proliferate on the web are recognizable by humans, their low resolution makes it difficult to obtain the reliable image segmentation that previous detection methods require for success.

The Microsoft Research Asia researchers, though, use a learning approach called a “random forest,” an ensemble of decision trees that can be trained for both classification and regression, to predict the existence and the position of a salient object in an image.

“The key,” Wang explains, “is to describe an image using a global saliency description and to conduct a classification stage, to check the existence of the salient object, and a regression stage, to check the location.”
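The two stages Wang describes can be pictured as a gate followed by a locator. The sketch below shows only that control flow. It is not the researchers' implementation: the names `TwoStageDetector`, `toy_existence`, and `toy_location`, the 0.5 threshold, and the one-dimensional "saliency profile" are all assumptions made for illustration, and the toy functions stand in for the trained random forests.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]  # stand-in for a global saliency description
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in [0, 1]


@dataclass
class TwoStageDetector:
    # Stage 1 (classification): score for "a salient object exists".
    existence_model: Callable[[Features], float]
    # Stage 2 (regression): predicted location of the salient object.
    location_model: Callable[[Features], Box]
    threshold: float = 0.5  # assumed decision threshold, not from the article

    def predict(self, features: Features) -> Optional[Box]:
        """Return a box if a salient object is predicted, otherwise None."""
        if self.existence_model(features) < self.threshold:
            return None  # classification stage: no salient object
        return self.location_model(features)  # regression stage


# Toy stand-ins for trained models (hypothetical, for illustration only).
def toy_existence(features: Features) -> float:
    # Use the peak saliency value as the existence score.
    return max(features)


def toy_location(features: Features) -> Box:
    # Treat the features as a saliency profile across the image width and
    # box the columns whose saliency is at least half the peak value.
    peak = max(features)
    hot = [i for i, v in enumerate(features) if v >= 0.5 * peak]
    n = len(features)
    return (hot[0] / n, 0.0, (hot[-1] + 1) / n, 1.0)


detector = TwoStageDetector(toy_existence, toy_location)
print(detector.predict([0.1, 0.2, 0.9, 0.8, 0.1]))  # (0.4, 0.0, 0.8, 1.0)
print(detector.predict([0.1, 0.1, 0.2, 0.1, 0.1]))  # None
```

Keeping the existence check separate means an image with no clear subject, such as a landscape, never reaches the location stage at all.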

If a salient object can be identified, using the researchers’ techniques, its dominant color is extracted. If a salient object can’t be identified—which could be possible, for instance, in a landscape shot—the dominant color of the entire image is extracted.
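The fallback logic is easy to illustrate. The article does not say how the dominant color is computed, so the sketch below assumes a simple approach: quantize each pixel into coarse color bins and pick the most common bin. The function name `dominant_color`, the `levels` parameter, and the tiny hand-built image are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B), each 0-255
Image = List[List[Pixel]]        # rows of pixels
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def dominant_color(image: Image, box: Optional[Box] = None, levels: int = 4) -> Pixel:
    """Return the center of the most common quantized color bin.

    If box is None (no salient object found), the whole image is used.
    """
    if box is None:
        left, top, right, bottom = 0, 0, len(image[0]), len(image)
    else:
        left, top, right, bottom = box
    step = 256 // levels  # width of each color bin per channel
    counts = Counter(
        (r // step, g // step, b // step)
        for row in image[top:bottom]
        for (r, g, b) in row[left:right]
    )
    (rb, gb, bb), _ = counts.most_common(1)[0]
    half = step // 2
    return (rb * step + half, gb * step + half, bb * step + half)


GREEN = (40, 160, 40)
PURPLE = (150, 30, 200)
# A small purple "flower" in the top middle of a mostly green image.
image = [
    [GREEN, PURPLE, PURPLE, GREEN],
    [GREEN, PURPLE, PURPLE, GREEN],
    [GREEN, GREEN, GREEN, GREEN],
    [GREEN, GREEN, GREEN, GREEN],
]

print(dominant_color(image, box=(1, 0, 3, 2)))  # (160, 32, 224), a purple
print(dominant_color(image))                    # (32, 160, 32), a green
```

The example shows why locating the salient object matters for a color filter: measured over the whole image, this picture is green, but its subject is purple.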

To validate their results, Li and Wang constructed a large image database consisting of hundreds of thousands of manually labeled web images from Bing Image Search and deployed their algorithm to detect the existence, predict the location, and extract the dominant color of the salient object in thumbnail images.

Subsequent tests determined that the researchers’ algorithm significantly outperforms existing state-of-the-art methods.

“We are pleased to see that the results justified our hypothesis that salient objects essentially share common patterns,” Wang says, “even though the objects are generally different. That motivated us to investigate machine-learning tools to solve the problems.”