AI Lab stories

Get inspired with stories, new lab projects, and examples from developers and partners.


The need

Natural Language Processing (NLP) is a field that is driving a revolution in human-computer interaction. Pix2Story is an experiment in teaching an AI system to be creative: to take inspiration from a picture and turn it into a story.

The idea

We wanted to see whether we could generate a natural, cohesive narrative as a showcase for NLP. We built a web application on Azure that lets users upload a picture and get a machine-generated story in one of several literary genres.

The solution

A trained visual semantic embedding model analyzes the image and generates captions. The Pix2Story application then becomes the storyteller by transforming the captions and generating a narrative.

Technical details of Pix2Story

We based our work on several papers ("Skip-Thought Vectors"; "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"; and "Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books") and on existing repositories such as neural-storyteller. The idea is to obtain captions from the uploaded picture and feed them to a recurrent neural network (RNN) model, which generates a narrative based on the selected genre and the picture.
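The two-stage flow described above (picture to captions, captions to narrative) can be sketched as a small function. This is only an illustration: `generate_story`, `caption_image`, and `tell_story` are hypothetical names, and the two stand-in functions at the bottom replace the trained models, which are not reproduced here.

```python
from typing import Callable, List

# Genres named in the article.
GENRES = ("Adventure", "SciFi", "Thriller")


def generate_story(
    image_bytes: bytes,
    genre: str,
    caption_image: Callable[[bytes], List[str]],
    tell_story: Callable[[List[str], str], str],
) -> str:
    """Run the two stages in order: caption the image, then narrate."""
    if genre not in GENRES:
        raise ValueError(f"unsupported genre: {genre!r}")
    captions = caption_image(image_bytes)  # stage 1: picture -> captions
    if not captions:
        raise ValueError("the captioning model returned no captions")
    return tell_story(captions, genre)  # stage 2: captions -> narrative


# Hypothetical stand-ins for the trained models, for illustration only.
def fake_captioner(image_bytes: bytes) -> List[str]:
    return ["a dog runs on the beach"]


def fake_storyteller(captions: List[str], genre: str) -> str:
    return f"[{genre}] " + " ".join(captions)


story = generate_story(b"", "SciFi", fake_captioner, fake_storyteller)
```

Passing the two models in as callables keeps the orchestration independent of how each model is loaded or served.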

We trained a visual semantic embedding model on the MS COCO captions dataset of 300,000 images. This model makes sense of the visual input: it analyzes the uploaded image and generates the captions.
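A visual semantic embedding places images and sentences in one shared vector space, so a common way to use it is to score candidate captions by their similarity to the image vector. The sketch below assumes that retrieval-style approach; whether Pix2Story selects captions exactly this way is an assumption, and the three-dimensional vectors are made-up toy values rather than model output.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_captions(
    image_vec: Sequence[float],
    caption_vecs: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k captions whose embeddings lie closest to the image."""
    scored = [(text, cosine(image_vec, vec)) for text, vec in caption_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (assumed values, not produced by any trained model).
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "a dog runs on the beach": [1.0, 0.0, 0.0],
    "a plate of food on a table": [0.0, 1.0, 0.0],
    "a dog plays in the sand": [0.8, 0.2, 0.1],
}
top = rank_captions(image_vec, caption_vecs, k=2)
```

With these toy values the two dog captions rank above the food caption, because their vectors point in nearly the same direction as the image vector.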

The captions are then transformed into a narrative in the selected genre: Adventure, SciFi, or Thriller. For this, we trained an encoder-decoder model for two weeks on more than 2,000 novels.
This training maps each passage of the novels to a skip-thought vector, a way of embedding thoughts in vector space.

This allowed the model to capture not only the words but also their meaning in context, by learning to reconstruct the sentences surrounding an encoded passage.
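Because captions and novel passages live in the same skip-thought space, a caption can be moved toward a genre with simple vector arithmetic. The neural-storyteller repository describes a "style shift" of the form x - c + b, where x is the caption vector, c is the mean vector of captions, and b is the mean vector of book passages. The sketch below assumes Pix2Story applies the same idea per genre; the function names and two-dimensional vectors are illustrative only.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def style_shift(
    caption_vecs: Sequence[Sequence[float]],
    caption_style: Sequence[float],
    genre_style: Sequence[float],
) -> List[float]:
    """Compute x - c + b: remove the caption style, add the genre style."""
    x = mean_vector(caption_vecs)
    return [xi - ci + bi for xi, ci, bi in zip(x, caption_style, genre_style)]


# Toy values (assumed): two caption vectors, a caption-style mean,
# and a genre-style mean such as one computed over thriller passages.
shifted = style_shift(
    caption_vecs=[[1.0, 2.0], [3.0, 4.0]],
    caption_style=[1.0, 1.0],
    genre_style=[0.5, 2.0],
)
```

In a pipeline of this shape, the shifted vector would be what the decoder conditions on when it generates the passage.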

We use the new Azure Machine Learning service and the Azure model management SDK with Python 3 to create a Docker image containing these models, and deploy it on AKS with GPU capability, making the project production-ready.
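Models deployed through Azure Machine Learning are typically wrapped in an entry script that defines `init()`, called once when the container starts, and `run()`, called for each request. The sketch below follows that convention but is not the project's actual script: the placeholder model and the JSON field names `captions` and `genre` are assumptions.

```python
import json

model = None  # populated once by init()


def init():
    """Load the model when the container starts."""
    global model
    # Placeholder (assumption): a real script would load the trained
    # captioning and storytelling models from disk here.
    model = lambda captions, genre: f"[{genre}] " + " ".join(captions)


def run(raw_data):
    """Handle one scoring request whose body is a JSON string."""
    try:
        payload = json.loads(raw_data)
        story = model(payload["captions"], payload["genre"])
        return {"story": story}
    except Exception as exc:
        # Report the failure to the caller instead of crashing the service.
        return {"error": str(exc)}
```

Loading the models in `init()` rather than in `run()` means the cost of loading is paid once per container, not once per request.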
