Microsoft Research Blog
The Microsoft Research blog provides in-depth views and perspectives from our researchers, scientists and engineers, plus information about noteworthy events and conferences, scholarships, and fellowships designed for academic and scientific communities.

AI with creative eyes amplifies the artistic sense of everyone

July 27, 2017 | By Microsoft blog editor

By Gang Hua, Principal Researcher, Research Manager

Recent advances in the branch of artificial intelligence (AI) known as machine learning are helping everyone, including artistically challenged people such as myself, transform images and videos into creative and shareable works of art.

AI-powered computer vision techniques pioneered by researchers from Microsoft’s Redmond and Beijing research labs, for example, provide new ways for people to transfer artistic styles to their photographs and videos as well as swap the visual style of two images, such as the face of a character from the movie Avatar and Mona Lisa.

The style transfer technique for photographs, known as StyleBank, shipped this June in an update to Microsoft Pix, a smartphone application that uses intelligent algorithms published in more than 20 research papers from Microsoft Research to help users get great photos with every tap of the shutter button.

The field of style transfer research explores ways to transfer an artistic style from one image to another, such as the style of post-impressionism onto a picture of your flower garden. For applications such as Microsoft Pix, a challenge is to offer users multiple styles to choose from and the ability to transfer styles to their images quickly and efficiently.

Our solution, StyleBank, explicitly represents visual styles as a set of convolutional filter banks, with each bank representing one style. To transfer a style onto an image, an auto-encoder first decomposes the input image into multi-layer feature maps that are independent of any style. The filter bank for the chosen style is then convolved with these feature maps, and the result passes through a decoder to render the image in that style.
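To make the pipeline concrete, here is a drastically simplified sketch of the idea in NumPy. The real StyleBank uses learned multi-layer convolutional networks; everything below (the single-channel "encoder," the hand-picked 3x3 filter banks, the identity "decoder," and all names) is illustrative, not the paper's actual architecture:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'same' 2D convolution with zero padding, for the sketch only."""
    k = kernel.shape[0] // 2
    padded = np.pad(image, k)
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(
                padded[i:i + kernel.shape[0], j:j + kernel.shape[1]] * kernel)
    return out

# Stand-ins for the learned encoder and decoder.
encoder_kernel = np.full((3, 3), 1 / 9.0)
decoder_kernel = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], float)

# One filter bank per style; adding a style means adding a bank,
# without touching the (shared) encoder or decoder.
style_banks = {
    "edges":  np.array([[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]], float),
    "smooth": np.full((3, 3), 1 / 9.0),
}

def stylize(image, style):
    features = conv2d(image, encoder_kernel)       # style-independent features
    styled = conv2d(features, style_banks[style])  # convolve with chosen bank
    return conv2d(styled, decoder_kernel)          # decode back to image space

img = np.random.rand(8, 8)
out_edges = stylize(img, "edges")
out_smooth = stylize(img, "smooth")
```

The key structural point the sketch preserves is that the content pathway (encoder and decoder) is shared across all styles, while each style lives entirely in its own filter bank.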

The network completely decouples styles from content. Because of this explicit representation, we can both train new styles and render stylized images more efficiently than existing offerings in this space.

The StyleBank research is a collaboration between Beijing lab researchers Lu Yuan and Jing Liao, intern Dongdong Chen and me. We collaborated closely with the broader Microsoft Pix team within Microsoft’s research organization to integrate the style transfer feature with the smartphone application. Our team presented the work at the 2017 Conference on Computer Vision and Pattern Recognition July 21-26 in Honolulu, Hawaii.

We are also extending the StyleBank technology to render stable stylized videos in an online fashion. Our technique is described in a paper to be presented at the 2017 International Conference on Computer Vision in Venice, Italy, October 22-29.

Our approach leverages temporal information about feature correspondences between consecutive frames to achieve consistent and stable stylized video sequences in near real time. The technique adaptively blends feature maps from the previous frame and the current frame to avoid ghosting artifacts, which are prevalent in techniques that render videos frame-by-frame.
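The blending step can be sketched as a per-pixel weighted average: where feature correspondences between frames are reliable, features carried over from the previous frame are reused for temporal stability; where they are not (occlusions, newly revealed content), the current frame's features take over, which is what suppresses ghosting. The function and variable names below are illustrative, assuming a precomputed correspondence-confidence map rather than the paper's actual flow estimation:

```python
import numpy as np

def blend_features(prev_feat, curr_feat, confidence):
    """Adaptively blend previous-frame features with current-frame features.

    `confidence` is a hypothetical per-pixel map in [0, 1]: high where the
    correspondence between consecutive frames is trustworthy, low where it
    is not. High confidence keeps the previous frame's features (stability);
    low confidence falls back to the current frame (no ghosting).
    """
    mask = np.clip(confidence, 0.0, 1.0)
    return mask * prev_feat + (1.0 - mask) * curr_feat

prev_feat = np.ones((4, 4))               # features warped from frame t-1
curr_feat = np.zeros((4, 4))              # features computed at frame t
conf = np.zeros((4, 4))
conf[:2, :] = 1.0                         # top half: reliable correspondence

blended = blend_features(prev_feat, curr_feat, conf)
```

In this toy example the top half of `blended` keeps the previous frame's features while the bottom half switches cleanly to the current frame's, which is the behavior that prevents stale features from bleeding into disoccluded regions.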

A third paper, which I co-authored with Jing Liao and Lu Yuan along with my Redmond colleague Sing Bing Kang for presentation at SIGGRAPH 2017, July 30 – August 2 in Los Angeles, describes a technique for visual attribute transfer between images with distinct appearances but perceptually similar semantic structure – that is, images that contain similar visual content.

For example, the technique can put the face of a character from the movie Avatar onto an image of Leonardo da Vinci’s famous painting of Mona Lisa and the face of Mona Lisa onto the character from Avatar. We call our technique deep image analogy. It works by finding dense semantic correspondences between two input images.
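At the heart of finding dense semantic correspondences is a nearest-neighbor search over deep feature maps: for every location in one image's features, find the most semantically similar location in the other's. The actual method uses an efficient PatchMatch-style search over features from a pretrained network; the brute-force toy version below only illustrates the idea, and all names in it are assumptions:

```python
import numpy as np

def nearest_neighbor_field(feat_a, feat_b):
    """Brute-force dense correspondence between two feature maps.

    For each spatial location in feat_a (shape H x W x C), return the
    (row, col) of the most similar feature vector in feat_b. A toy
    stand-in for the PatchMatch-style search used in practice.
    """
    h, w, c = feat_a.shape
    flat_b = feat_b.reshape(-1, c)
    nnf = np.zeros((h, w, 2), dtype=int)
    for i in range(h):
        for j in range(w):
            dists = np.sum((flat_b - feat_a[i, j]) ** 2, axis=1)
            idx = int(np.argmin(dists))
            nnf[i, j] = divmod(idx, feat_b.shape[1])
    return nnf

rng = np.random.default_rng(0)
feat_b = rng.random((5, 5, 3))
feat_a = np.roll(feat_b, shift=2, axis=1)  # "image A" is B shifted 2 columns

nnf = nearest_neighbor_field(feat_a, feat_b)
```

Once such a correspondence field is in hand, the appearance of one image can be mapped onto the structure of the other in both directions, which is how the Avatar character and the Mona Lisa can trade faces.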

We look forward to sharing more details about these techniques to transform images and videos into creative and shareable works of art at the premier computer vision conferences this summer and fall.