Microsoft Research Blog

AI with creative eyes amplifies the artistic sense of everyone

July 27, 2017 | By Microsoft blog editor

By Gang Hua, Principal Researcher, Research Manager

Recent advances in the branch of artificial intelligence (AI) known as machine learning are helping everyone, including artistically challenged people such as myself, transform images and videos into creative and shareable works of art.

AI-powered computer vision techniques pioneered by researchers from Microsoft’s Redmond and Beijing research labs, for example, provide new ways for people to transfer artistic styles to their photographs and videos, as well as to swap the visual style of two images, such as the face of a character from the movie Avatar and the face of Mona Lisa.

The style transfer technique for photographs, known as StyleBank, shipped this June in an update to Microsoft Pix, a smartphone application that uses intelligent algorithms published in more than 20 research papers from Microsoft Research to help users get great photos with every tap of the shutter button.

The field of style transfer research explores ways to transfer an artistic style from one image to another, such as the style of post-impressionism onto a picture of your flower garden. For applications such as Microsoft Pix, a challenge is to offer users multiple styles to choose from and the ability to transfer styles to their images quickly and efficiently.

Our solution, StyleBank, explicitly represents visual styles as a set of convolutional filter banks, with each bank representing one style. To render an image in a specific style, an auto-encoder first decomposes the input image into multi-layer feature maps that are independent of any style. The filter bank for the chosen style is then convolved with those feature maps, and the result passes through a decoder that renders the image in the chosen style.

The network completely decouples style from content. Because of this explicit representation, we can both train new styles and render stylized images more efficiently than existing offerings in this space.
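The data flow described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the network that ships in Microsoft Pix: the class name StyleBankSketch, the layer sizes and the random, untrained weights are all hypothetical, and the encoder and decoder are reduced to a single convolution each. What it is meant to show is that the encoder and decoder are shared, while each style lives in its own filter bank.

```python
import numpy as np


def conv2d(features, kernels):
    """Zero-padded, same-size 2-D convolution (cross-correlation form).

    features: array of shape (C_in, H, W)
    kernels:  array of shape (C_out, C_in, k, k), with k odd
    """
    c_out, c_in, k, _ = kernels.shape
    pad = k // 2
    _, h, w = features.shape
    padded = np.pad(features, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, w))
    for dy in range(k):
        for dx in range(k):
            window = padded[:, dy:dy + h, dx:dx + w]  # (C_in, H, W)
            out += np.einsum("oc,chw->ohw", kernels[:, :, dy, dx], window)
    return out


class StyleBankSketch:
    """Hypothetical toy model: shared encoder/decoder, one filter bank per style."""

    def __init__(self, style_names, channels=8, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: untrained random weights stand in for learned ones.
        self.encoder = 0.1 * rng.normal(size=(channels, 3, 3, 3))
        self.decoder = 0.1 * rng.normal(size=(3, channels, 3, 3))
        self.bank = {
            name: 0.1 * rng.normal(size=(channels, channels, 3, 3))
            for name in style_names
        }

    def encode(self, image):
        # Style-independent feature maps (ReLU after one convolution).
        return np.maximum(conv2d(image, self.encoder), 0.0)

    def decode(self, features):
        return conv2d(features, self.decoder)

    def stylize(self, image, style):
        features = self.encode(image)                # shared across styles
        styled = conv2d(features, self.bank[style])  # style-specific step
        return self.decode(styled)                   # shared across styles


model = StyleBankSketch(["post_impressionism", "sketch"])
photo = np.random.default_rng(1).random((3, 32, 32))  # stand-in for an RGB image
stylized = model.stylize(photo, "post_impressionism")
print(stylized.shape)  # (3, 32, 32)
```

In this sketch, adding a style only adds an entry to the bank; the encoder and decoder are untouched, which mirrors the decoupling of style from content described above.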

The StyleBank research is a collaboration between Beijing lab researchers Lu Yuan and Jing Liao, intern Dongdong Chen and me. We collaborated closely with the broader Microsoft Pix team within Microsoft’s research organization to integrate the style transfer feature with the smartphone application. Our team presented the work at the 2017 Conference on Computer Vision and Pattern Recognition, held July 21-26 in Honolulu, Hawaii.

We are also extending the StyleBank technology to render stable stylized videos in an online fashion. Our technique is described in a paper to be presented at the 2017 International Conference on Computer Vision in Venice, Italy, October 22-29.

Our approach leverages temporal information about feature correspondences between consecutive frames to achieve consistent and stable stylized video sequences in near real time. The technique adaptively blends feature maps from the previous frame and the current frame to avoid ghosting artifacts, which are prevalent in techniques that render videos frame-by-frame.
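To make the blending idea concrete, here is a minimal sketch, assuming the previous frame's feature maps have already been aligned to the current frame. The function name blend_features, the sharpness parameter and the exponential weighting rule are hypothetical choices for illustration, not the formulation from our paper. Where the two frames agree, the previous features are reused for stability; where they disagree, the current features take over so that stale content does not ghost into the new frame.

```python
import numpy as np


def blend_features(prev_feat, curr_feat, sharpness=4.0):
    """Blend two (C, H, W) feature maps with a per-pixel weight.

    Assumption: prev_feat is already aligned to the current frame.
    The weight is 1 where the features match and decays toward 0
    as they diverge (hypothetical rule, chosen for simplicity).
    """
    # Per-pixel distance between the two feature vectors, shape (H, W).
    diff = np.linalg.norm(curr_feat - prev_feat, axis=0)
    weight = np.exp(-sharpness * diff)
    # The (H, W) weight broadcasts across the channel axis.
    blended = weight * prev_feat + (1.0 - weight) * curr_feat
    return blended, weight


rng = np.random.default_rng(0)
prev_feat = rng.random((8, 16, 16))
curr_feat = prev_feat.copy()
curr_feat[:, :4, :4] += 5.0  # simulate a region that changed between frames

blended, weight = blend_features(prev_feat, curr_feat)
print(weight[10, 10])  # 1.0, because this pixel is unchanged
```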

A third paper, which I co-authored with Jing Liao, Lu Yuan and my Redmond colleague Sing Bing Kang, will be presented at SIGGRAPH 2017, July 30 – August 2 in Los Angeles. It describes a technique for visual attribute transfer across images that differ in appearance but share a perceptually similar semantic structure; that is, the images contain similar visual content.

For example, the technique can put the face of a character from the movie Avatar onto an image of Leonardo da Vinci’s famous painting of Mona Lisa and the face of Mona Lisa onto the character from Avatar. We call our technique deep image analogy. It works by finding dense semantic correspondences between two input images.
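As a rough illustration of what a dense correspondence is, the sketch below matches every location in one feature map to its most similar location in another by cosine similarity. It is a deliberate simplification: the brute-force nearest-neighbor search and the function name dense_correspondence are assumptions made for this example, and the feature maps are random stand-ins for features that a trained network would produce.

```python
import numpy as np


def dense_correspondence(feat_a, feat_b):
    """For each location in A, find the most similar location in B.

    feat_a: (C, Ha, Wa) feature map, feat_b: (C, Hb, Wb) feature map.
    Returns an integer array of shape (Ha, Wa, 2) holding (row, col) in B.
    Brute-force search; only practical for small feature maps.
    """
    c, ha, wa = feat_a.shape
    _, hb, wb = feat_b.shape
    a = feat_a.reshape(c, -1).T  # (Ha*Wa, C), one feature vector per location
    b = feat_b.reshape(c, -1).T  # (Hb*Wb, C)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    best = np.argmax(a @ b.T, axis=1)  # index of best match in flattened B
    rows, cols = np.divmod(best, wb)
    return np.stack([rows, cols], axis=-1).reshape(ha, wa, 2)


rng = np.random.default_rng(0)
feat_a = rng.normal(size=(16, 4, 5))
feat_b = feat_a[:, :, ::-1]  # B is A mirrored left to right

mapping = dense_correspondence(feat_a, feat_b)
print(mapping[0, 0])  # [0 4]: the top-left of A matches the top-right of B
```

Once such a mapping exists, attributes such as color or texture can be pulled from the matched location in the other image, which is the intuition behind swapping the two faces.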

We look forward to sharing more details about these techniques to transform images and videos into creative and shareable works of art at the premier computer vision conferences this summer and fall.
