The need

Natural Language Processing (NLP) is a field that is driving a revolution in human-computer interaction. Pix2Story is an experiment in teaching an AI system to be creative: to take inspiration from a picture and turn it into a story.

The idea

We wanted to see if we could create a natural and cohesive narrative that showcases NLP. We decided to build a web application on Azure that lets users upload a picture and get a machine-generated story in one of several literary genres.

The solution

A trained visual semantic embedding model analyses the image and generates captions. The Pix2Story application then becomes the storyteller by transforming the captions and generating a narrative.

Technical details of Pix2Story

We based our work on several papers: "Skip-Thought Vectors"; "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"; and "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books". We also drew on existing repositories such as neural-storyteller. The idea is to obtain captions from the uploaded picture and feed them to a recurrent neural network model, which generates a narrative based on the selected genre and the picture.

To make sense of the visual input, we trained a visual semantic embedding model on the MS COCO captions dataset of 300,000 images. This model analyses the uploaded image and generates the captions.
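As a rough illustration of how a visual semantic embedding can link an image to captions, the sketch below ranks candidate captions by cosine similarity to an image vector in a shared space. This is retrieval-style ranking in the spirit of the neural-storyteller repository, not necessarily how Pix2Story produces its captions. The function names and the three-dimensional toy vectors are hypothetical; a real model would produce much larger learned embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_u * norm_v)


def top_captions(image_vec, caption_vecs, k=2):
    """Return the k captions whose embeddings lie closest to the image embedding."""
    ranked = sorted(
        caption_vecs,
        key=lambda caption: cosine(image_vec, caption_vecs[caption]),
        reverse=True,
    )
    return ranked[:k]


# Toy embeddings (assumed values, for illustration only).
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "a dog runs on the beach": [1.0, 0.0, 0.0],
    "a plate of food on a table": [0.0, 1.0, 0.0],
    "a dog plays in the sand": [0.8, 0.2, 0.1],
}

best = top_captions(image_vec, caption_vecs, k=2)
print(best)
```

With these toy values, the two dog captions rank above the food caption because their vectors point in nearly the same direction as the image vector.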

We then transform the captions and generate a narrative in the selected genre: Adventure, Sci-Fi or Thriller. For this, we trained an encoder-decoder model on more than 2,000 novels for two weeks. This training allows each passage of the novels to be mapped to a skip-thought vector, a way of embedding thoughts in vector space.

As a result, the model captures not only words but also the meaning of those words in context, which lets it reconstruct the sentences surrounding an encoded passage.
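One way to picture the caption-to-narrative transformation is the "style shifting" step used in the neural-storyteller repository: the mean skip-thought vector of the captions has the average caption style subtracted and the average book-passage style added before it is decoded. The sketch below shows only that vector arithmetic, assuming Pix2Story follows the same approach. The function names and two-dimensional toy vectors are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def style_shift(caption_vecs, caption_style_mean, book_style_mean):
    """Compute F(x) = x - c + b.

    x is the mean of the caption vectors, c is the mean vector of the
    caption corpus, and b is the mean vector of passages in the target genre.
    """
    x = mean_vector(caption_vecs)
    return [xi - ci + bi for xi, ci, bi in zip(x, caption_style_mean, book_style_mean)]


# Toy vectors (assumed values, for illustration only).
caption_vecs = [[1.0, 2.0], [3.0, 4.0]]
caption_style_mean = [1.0, 1.0]
book_style_mean = [0.5, 2.0]

shifted = style_shift(caption_vecs, caption_style_mean, book_style_mean)
print(shifted)
```

The shifted vector would then condition the decoder, so the generated passage keeps the content of the captions but takes on the style of the chosen genre.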

We use the new Azure Machine Learning Service and the Azure model management SDK with Python 3 to build a Docker image containing these models, and then deploy it on AKS with GPU capability, which makes the project ready for production.
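Azure Machine Learning deployments typically wrap a model in an entry script that exposes an `init()` function, called once when the container starts, and a `run()` function, called per request. The sketch below shows that shape only. The placeholder model, the `image_id` and `genre` payload fields, and the default genre are all assumptions for illustration, not the actual Pix2Story service contract.

```python
import json

model = None  # populated once by init()


def init():
    """Load the model once at container start-up."""
    global model
    # Hypothetical placeholder: a real script would load the captioning
    # and decoder weights here instead of this stand-in function.
    model = lambda image_id, genre: f"[{genre}] story for {image_id}"


def run(raw_data):
    """Handle one scoring request whose body is a JSON string."""
    try:
        payload = json.loads(raw_data)
        story = model(payload["image_id"], payload.get("genre", "Adventure"))
        return {"story": story}
    except (KeyError, ValueError) as err:
        # Malformed JSON or a missing field is reported back to the caller.
        return {"error": str(err)}


init()
response = run('{"image_id": "img1", "genre": "Thriller"}')
print(response)
```

Keeping model loading in `init()` means the expensive step happens once per container rather than once per request, which matters when the models are served from GPU nodes.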

