Storytelling is at the heart of human nature. Pix2Story teaches an AI system to be creative, turning an image into a story.
Natural Language Processing (NLP) is a field that is driving a revolution in human-computer interaction. Pix2Story is an experiment in teaching an AI system to be creative: to be inspired by a picture and take it to another level.
We wanted to see if we could create a natural and cohesive narrative that showcases NLP. We decided to build a web application on Azure that lets users upload a picture and get a machine-generated story in one of several literary genres.
A trained visual semantic embedding model analyzes the image and generates captions. The Pix2Story application then becomes the storyteller by transforming the captions and generating a narrative.
Neural AI storytelling with Pix2Story
Pix2Story teaches an AI to be creative by taking a picture and turning it into stories.
Technical details of Pix2Story
We based our work on several papers: Skip-Thought Vectors; Show, Attend and Tell: Neural Image Caption Generation with Visual Attention; and Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. We also drew on the neural storyteller repositories on GitHub. The idea is to obtain captions from the uploaded picture and feed them to a recurrent neural network model, which generates the narrative based on the selected genre and the picture.
We trained a visual semantic embedding model on the MS COCO captions dataset of 300,000 images to make sense of the visual input by analyzing the uploaded image and generating the captions.
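One common way to use a visual semantic embedding is retrieval: the image and a bank of captions are embedded in the same vector space, and the captions closest to the image are selected. The following is a minimal sketch of that idea, not the Pix2Story implementation. The three-dimensional vectors and the caption texts are invented for illustration, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_captions(image_vec, caption_bank, k=2):
    """Return the k caption texts whose embeddings are closest to the image."""
    ranked = sorted(
        caption_bank,
        key=lambda item: cosine(image_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy embeddings (assumed values; a real model would produce
# high-dimensional vectors from a trained image and text encoder).
caption_bank = [
    ("a dog runs on the beach", [0.9, 0.1, 0.0]),
    ("a plate of pasta on a table", [0.0, 0.2, 0.9]),
    ("a puppy plays in the sand", [0.8, 0.3, 0.1]),
]
image_vec = [1.0, 0.2, 0.0]

print(top_captions(image_vec, caption_bank))
```

With these toy values, the two beach-related captions rank above the pasta caption because their vectors point in nearly the same direction as the image vector.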
We then transform the captions and generate a narrative based on the selected genre: Adventure, SciFi, or Thriller. For this, we spent two weeks training an encoder-decoder model on more than 2,000 novels.
This training allows each passage of the novels to be mapped to a skip-thought vector, a way of embedding thoughts in vector space.
This lets the model capture not only individual words but also the meaning of those words in context, because it is trained to reconstruct the sentences surrounding an encoded passage.
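The neural storyteller repository describes a "style shifting" step that moves caption vectors toward the style of book passages before decoding: F(x) = x - c + b, where x is the mean skip-thought vector of the captions, c is the mean vector of caption-style text, and b is the mean vector of book passages in the target genre. Whether Pix2Story applies exactly this formula is an assumption. The sketch below uses tiny made-up vectors and hypothetical function names.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def style_shift(caption_vecs, caption_style_mean, book_style_mean):
    """Apply F(x) = x - c + b to the mean of the caption vectors."""
    x = mean_vector(caption_vecs)
    return [
        xi - ci + bi
        for xi, ci, bi in zip(x, caption_style_mean, book_style_mean)
    ]

# Toy 2-d vectors (assumed values; real skip-thought vectors
# have thousands of dimensions).
caption_vecs = [[1.0, 2.0], [3.0, 4.0]]
caption_style_mean = [1.0, 1.0]   # c: mean of caption-style vectors
book_style_mean = [0.5, 2.0]      # b: mean of genre passage vectors

shifted = style_shift(caption_vecs, caption_style_mean, book_style_mean)
print(shifted)  # [1.5, 4.0]
```

The shifted vector would then be passed to the decoder, which generates text conditioned on it.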
We use the new Azure Machine Learning service and the Azure model management SDK with Python 3 to build a Docker image containing these models, and we deploy it on AKS with GPU capability, making the project ready for production.
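Models deployed through Azure Machine Learning are typically wrapped in an entry script that exposes an init() function, called once when the container starts, and a run() function, called for each request. The sketch below shows that shape only. StubStoryModel, the request fields image_id and genre, and the response format are hypothetical placeholders, not the actual Pix2Story service contract.

```python
import json

GENRES = ("Adventure", "SciFi", "Thriller")
model = None

class StubStoryModel:
    """Hypothetical stand-in for the captioning and story models."""

    def generate(self, image_id, genre):
        return f"[{genre}] story for image {image_id}"

def init():
    """Called once at container start; load models into memory here."""
    global model
    model = StubStoryModel()

def run(raw_data):
    """Called per request with a JSON string; returns a JSON-serializable dict."""
    request = json.loads(raw_data)
    genre = request.get("genre", "Adventure")
    if genre not in GENRES:
        return {"error": f"unknown genre: {genre}"}
    return {"story": model.generate(request["image_id"], genre)}

# Local smoke test of the request/response cycle.
init()
print(run(json.dumps({"image_id": "demo", "genre": "SciFi"})))
```

Loading the models in init() rather than in run() keeps the per-request cost low, which matters when the models are large and served from GPU nodes.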
- Get Pix2Story source code on GitHub
- Learn how to build Pix2Story in AI School
- Learn about AI Services at AI School
- Review Skip-Thought Vectors
- Review Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Review Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
- Find repositories at GitHub for neural storyteller