The need

The Met collaborated with Microsoft and MIT to explore how AI could connect people to art. The goal was to imagine new ways for global audiences to discover, learn, and create with one of the world's foremost art collections.

The idea

The team started with two questions: Can we use generative adversarial networks (GANs) to recombine artwork in new, interactive ways? If so, can we combine this with visual search to let everyone explore the collection?

The solution

The result is Gen Studio, which lets users explore dreamlike images generated by a GAN. Gen Studio does not just create random works; it interpolates between real artworks in the collection.

Technical details for Gen Studio

Our first visualization lets users explore a two-dimensional slice of the vast “latent” space of the GAN. Users can move throughout this space and see how the GAN’s dreams change as they bump into real pieces in The Met’s collection. Our second visualization gives the user precise control over how to blend different works together into a larger work. Gen Studio shows the inferred visual structure underlying The Met’s collection, allowing explorers to create and recombine artworks that draw from a variety of styles, materials, and forms.
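As a rough illustration of the blending idea, the sketch below interpolates between two seed vectors and takes a weighted mix of several. It assumes seeds are plain lists of floats and that blending is a simple linear combination; the names `lerp` and `blend` are hypothetical, and Gen Studio itself may interpolate differently.

```python
from typing import List, Sequence

Seed = List[float]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Seed:
    """Linear interpolation between two seeds: t=0 gives a, t=1 gives b."""
    if len(a) != len(b):
        raise ValueError("seeds must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def blend(seeds: Sequence[Sequence[float]], weights: Sequence[float]) -> Seed:
    """Weighted average of several seeds; weights are normalized to sum to 1."""
    if not seeds or len(seeds) != len(weights):
        raise ValueError("need one weight per seed")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    dim = len(seeds[0])
    if any(len(s) != dim for s in seeds):
        raise ValueError("seeds must have the same length")
    return [
        sum(w * s[i] for s, w in zip(seeds, weights)) / total
        for i in range(dim)
    ]
```

Feeding each interpolated seed to the generator would produce the intermediate images a user sees while moving between two works.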

To create this experience, we used a microservice architecture of deep networks, Azure services, and blob storage. We used Visual Studio Code to develop a Flask API that serves the GAN from an Azure Kubernetes Service (AKS) cluster powered by NVIDIA GPUs. These services make it possible to generate new images in real time. Azure Kubernetes Service streamlines the path to production, making it possible to quickly deploy, host, and scale the solution.

Our GAN generates images from an initial ‘seed’, a vector of 140 numbers. A core challenge we faced was how to map each image from The Met to a seed that generates it. To overcome this, we used gradient-descent-based network inversion to learn a seed for each image. The key was instructing the network to match not just the pixels of the target image, but also its high-level characteristics and content.
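To make the inversion idea concrete, here is a minimal sketch on a toy problem. It assumes a linear stand-in generator (`image = W @ z`) and uses the image mean as a stand-in for "high-level characteristics"; both are illustrative simplifications, not the actual GAN or feature network, and every name below is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def generate(W: Matrix, z: Sequence[float]) -> List[float]:
    """Toy linear 'generator': each pixel is a weighted sum of the seed."""
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def invert(W: Matrix, target: Sequence[float], steps: int = 200,
           lr: float = 0.1, feature_weight: float = 1.0) -> List[float]:
    """Find a seed z whose generated image matches the target.

    Loss = sum of squared pixel errors
           + feature_weight * (mean(image) - mean(target)) ** 2
    The second term is a toy proxy for a perceptual or feature loss.
    """
    n, dim = len(target), len(W[0])
    z = [0.0] * dim
    target_feature = mean(target)
    for _ in range(steps):
        image = generate(W, z)
        feature_gap = mean(image) - target_feature
        # Gradient of the loss with respect to each pixel.
        grad_image = [
            2.0 * (p - t) + feature_weight * 2.0 * feature_gap / n
            for p, t in zip(image, target)
        ]
        # Chain rule through the linear generator: grad_z = W^T grad_image.
        grad_z = [
            sum(W[i][j] * grad_image[i] for i in range(n))
            for j in range(dim)
        ]
        z = [v - lr * g for v, g in zip(z, grad_z)]
    return z
```

With a real GAN, the gradients would come from automatic differentiation in a deep learning framework, and the feature term would compare activations of a pretrained network instead of a simple mean.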

We loaded the Open Access images into an Azure Databricks cluster. We used Microsoft Machine Learning for Apache Spark (MMLSpark) to enrich these images with annotations from the Azure Computer Vision API. We then built a fast visual-similarity search by featurizing all images with ResNet50 and constructing a locality-sensitive hash tree on these features for approximate nearest-neighbor lookup. We deployed this model onto AKS, and then used MMLSpark to add these nearest neighbors to our search index. We wrote the data from our Spark cluster to the Azure Search Service. The front end was built using React and hosted in Azure using App Service.
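The approximate nearest-neighbor step can be sketched as follows. This is a simplified stand-in: it uses random-hyperplane hashing into flat buckets in place of the hash tree described above, and it assumes feature vectors (for example, ResNet50 outputs) have already been computed. The class name `HyperplaneLSH` and its methods are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


class HyperplaneLSH:
    """Buckets vectors by which side of each random hyperplane they fall on."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)
        ]
        self.buckets: Dict[Tuple[bool, ...], List[Tuple[str, List[float]]]] = (
            defaultdict(list)
        )

    def _key(self, vec: Sequence[float]) -> Tuple[bool, ...]:
        # One bit per hyperplane: the sign of the dot product.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0
            for plane in self.planes
        )

    def add(self, item_id: str, vec: Sequence[float]) -> None:
        self.buckets[self._key(vec)].append((item_id, list(vec)))

    def query(self, vec: Sequence[float], k: int = 1) -> List[str]:
        # Only vectors in the same bucket are compared exactly, so
        # results are approximate: a true neighbor in another bucket is missed.
        candidates = self.buckets.get(self._key(vec), [])
        ranked = sorted(candidates, key=lambda c: cosine_distance(c[1], vec))
        return [item_id for item_id, _ in ranked[:k]]
```

Production systems typically use several hash tables or a tree of hashes to reduce missed neighbors; a single table is kept here for brevity.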
