AI lab stories
Get inspired with stories, new lab projects, and examples from developers and partners.
Storytelling is at the heart of human nature. Pix2Story teaches an AI system to be creative, turning an image into a story.
Natural Language Processing (NLP) is a field that is driving a revolution in human-computer interaction. Pix2Story is an experiment in teaching an AI system to be creative: to take inspiration from a picture and turn it into a story.
We wanted to see whether we could create a natural, cohesive narrative that showcases NLP. We built a web application on Azure that lets users upload a picture and get a machine-generated story in one of several literary genres.
A trained visual semantic embedding model analyzes the image and generates captions. The Pix2Story application then becomes the storyteller by transforming the captions and generating a narrative.
Neural AI storytelling with Pix2Story
Pix2Story teaches an AI to be creative by taking a picture and turning it into stories.
Technical details of Pix2Story
We based our work on several papers (Skip-Thought Vectors; Show, Attend and Tell: Neural Image Caption Generation with Visual Attention; and Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books) and on the neural storyteller repositories. The idea is to obtain captions from the uploaded picture and feed them to a recurrent neural network model, which generates a narrative based on the selected genre and the picture.
We trained a visual semantic embedding model on the MS COCO captions dataset of 300,000 images. This model makes sense of the visual input: it analyzes the uploaded image and generates captions.
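One common way to use a visual semantic embedding is retrieval: the image and the candidate captions live in the same vector space, and the captions closest to the image are selected. The sketch below illustrates that idea with cosine similarity in plain Python. It is an assumption that Pix2Story selects captions this way; the vectors, the `CAPTION_BANK` list, and the function names are hypothetical toy values, not output of the trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def nearest_captions(image_vec, caption_bank, k=2):
    """Return the k caption texts whose embeddings are closest to the image."""
    ranked = sorted(
        caption_bank,
        key=lambda item: cosine(image_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Hypothetical 3-dimensional embeddings; a real model would use far more dimensions.
CAPTION_BANK = [
    ("a dog runs on the beach", [0.9, 0.1, 0.0]),
    ("a plate of food on a table", [0.0, 0.2, 0.9]),
    ("a man surfing a wave", [0.7, 0.6, 0.1]),
]

image_vec = [0.8, 0.3, 0.0]  # hypothetical embedding of the uploaded image
top = nearest_captions(image_vec, CAPTION_BANK, k=2)
print(top)
```

With these toy vectors, the two beach-related captions rank above the food caption, which is the behavior a joint image-text embedding is meant to provide.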
The application then transforms the captions and generates a narrative in the selected genre: Adventure, SciFi, or Thriller. For this, we trained an encoder-decoder model for two weeks on more than 2,000 novels.
This training allows each passage of the novels to be mapped to a skip-thought vector, a way of embedding thoughts in vector space.
Because the model learns to reconstruct the sentences surrounding an encoded passage, it captures not only the words but also the meaning of those words in context.
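The neural storyteller repository referenced in this article describes a "style shifting" step that moves a caption's skip-thought vector toward the style of book passages: F(x) = x - c + b, where x is the caption vector, c is the mean vector of the caption corpus, and b is the mean vector of the book passages. The sketch below shows that arithmetic on toy vectors. Whether Pix2Story applies exactly this step is an assumption, and the function names and numbers are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def style_shift(caption_vecs, caption_style_mean, book_style_mean):
    """Apply F(x) = x - c + b, with x the mean of the caption vectors."""
    x = mean_vector(caption_vecs)
    return [
        xi - ci + bi
        for xi, ci, bi in zip(x, caption_style_mean, book_style_mean)
    ]

# Toy 2-dimensional vectors standing in for skip-thought vectors.
caption_vecs = [[1.0, 2.0], [3.0, 4.0]]  # vectors of the selected captions
caption_style = [1.0, 1.0]               # c: mean over the caption corpus
book_style = [0.5, 2.0]                  # b: mean over passages of one genre

shifted = style_shift(caption_vecs, caption_style, book_style)
print(shifted)  # [1.5, 4.0]
```

Under this scheme, supporting several genres would amount to keeping one book-style vector b per genre, and the shifted vector is what the decoder would condition on when generating the narrative.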
We use the new Azure Machine Learning service and the Azure model management SDK with Python 3 to build a Docker image containing these models, and we deploy it on AKS with GPU capability, which makes the project ready for production.
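Web services deployed through Azure Machine Learning are typically driven by an entry script that defines two functions: `init()`, called once when the container starts, and `run(raw_data)`, called for each request. The sketch below shows the shape such a script might take for this project. `StubStoryModel`, the `image` and `genre` request fields, and the response layout are all hypothetical placeholders; the actual Pix2Story scoring script and model-loading code are not shown in this article.

```python
import json

class StubStoryModel:
    """Hypothetical stand-in for the trained captioning and storytelling models."""

    def generate(self, image_b64, genre):
        # A real model would decode the image, obtain captions, and decode a story.
        return f"[{genre}] story for an image of {len(image_b64)} base64 characters"

model = None  # populated once by init()

def init():
    """Load the model once at container start-up."""
    global model
    model = StubStoryModel()

def run(raw_data):
    """Handle one request: parse the JSON payload and return a JSON-serializable dict."""
    try:
        payload = json.loads(raw_data)
        story = model.generate(payload["image"], payload.get("genre", "Adventure"))
        return {"story": story}
    except Exception as exc:
        # Report the failure to the caller instead of crashing the service.
        return {"error": str(exc)}

# Local smoke test, no Azure resources required.
init()
response = run(json.dumps({"image": "abcd", "genre": "SciFi"}))
print(response)
```

Keeping model loading in `init()` means the expensive step runs once per container rather than once per request, which matters for large models served on GPU nodes.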
- Download Pix2Story code at GitHub
- Learn about machine learning at AI School
- Learn about AI Services at AI School
- Review Skip-Thought Vectors
- Review Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Review Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
- Find repositories at GitHub for neural storyteller