AI Lab stories
Get inspired with stories, new lab projects, and examples from developers and partners.
Snow Leopard Trust
AI, machine learning, and cognitive services are helping researchers find and protect threatened snow leopard populations.
Snow leopards are apex predators in Central Asia, known as “ghosts of the mountains” due to their elusive nature. Their population is a key indicator for the health of the whole ecosystem, but they can be difficult to spot and track in the wild.
Scientists use camera traps to spot snow leopards in their natural habitats with minimal disruption. Camera traps capture hundreds of thousands of images that need identification and analysis, a labor-intensive task that can be streamlined with AI services.
Microsoft Machine Learning for Apache Spark (MMLSpark) and the Azure Cognitive Services are used to automate image classification, allowing researchers to find well-camouflaged snow leopards within image sets more quickly, saving over 300 hours per camera survey.
Identifying snow leopards with AI
Camera traps capture hundreds of thousands of photos of snow leopards in the wild. Spotting well-camouflaged leopards within these photos is a labor-intensive task that AI can accomplish in minutes. We use MMLSpark, the Azure Cognitive Services, and Microsoft Cognitive Toolkit to automate image classification at scale.
Technical details for Snow Leopard Trust
Snow leopards are a highly threatened species, native to the steppes and mountainous terrain of Asia. Despite their pivotal importance as this biome’s apex predator, we know very little about their numbers and behavior. Due to the cats’ remote habitat, expansive range, and extremely elusive nature, researchers use motion-triggered camera traps to observe snow leopards in the wild. Since the cameras trigger on any type of movement, most of the images are of goats, birds, and grass blowing in the wind. Only about 5 percent of the pictures actually contain a leopard, which can be hard to spot due to their camouflage. Over 1 million images have been gathered, and camera traps add 500,000 images each year. Manually reviewing all images to find a snow leopard could take thousands of hours.
The Snow Leopard Trust used Microsoft AI to build a scalable image recognition program that is roughly 95 percent accurate in identifying snow leopards in camera trap photos. The team additionally created a live dashboard that highlights snow leopard hot spots. These spots serve as social meeting points for leopards and play important roles in their communication.
Deep Unsupervised Object Detection with Microsoft ML for Apache Spark
To create a leopard classifier, we used a technique called transfer learning where we specialize a large general-purpose vision network for a more specific classification task. In our workflow, we leverage ResNet50, a 50-layer deep convolutional network with residual connections that has been trained on the ImageNet classification challenge. Using Microsoft ML for Apache Spark, we can combine the accuracy and flexibility of deep models with the elastic scalability of Apache Spark to quickly featurize all images in the dataset and learn a classifier based on these features.
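The pipeline itself isn’t reproduced on this page, but a minimal sketch of the idea in PySpark might look like the following. The ImageFeaturizer class and its parameters are assumptions based on MMLSpark’s documented featurization transformers, and the data path, column names, and labels are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from mmlspark import ImageFeaturizer  # assumed import path; varies by MMLSpark version

spark = SparkSession.builder.appName("snow-leopard-classifier").getOrCreate()

# Hypothetical labeled DataFrame: an "image" column plus a 0/1 "label"
# (leopard / no leopard). Building it from camera-trap files is omitted here.
train_df = spark.read.parquet("wasbs://traps@storage/labeled_images.parquet")

# Featurize each image with a pretrained ResNet50, dropping the final
# classification layer so the output is a generic feature vector.
featurizer = (ImageFeaturizer()
              .setInputCol("image")
              .setOutputCol("features")
              .setModel("ResNet50"))   # method and model name are assumptions

features_df = featurizer.transform(train_df)

# Fit a lightweight classifier on top of the frozen deep features.
model = LogisticRegression(labelCol="label", featuresCol="features").fit(features_df)
```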
We augment our basic pipeline with several additional features to improve performance. First, we use the Azure Cognitive Services on Spark to embed large scale Bing Image Searches directly into Apache Spark. We can use some of Bing’s collective intelligence by searching for images of leopards and images of empty hillsides to augment our dataset. Additionally, we add horizontal flips to our dataset to further improve robustness. Lastly, we aggregate results over camera trap photo bursts to give the algorithm additional chances to spot a leopard in a batch of photos.
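As an illustration of that last step, per-image leopard probabilities can be aggregated over a camera-trap burst with a plain Spark group-by; the scored_df DataFrame and the burst_id and leopard_prob columns here are hypothetical.

```python
from pyspark.sql import functions as F

# scored_df is assumed to hold one row per image with a model score
# ("leopard_prob") and an identifier grouping images shot in the same burst.
burst_scores = (scored_df
    .groupBy("burst_id")
    .agg(F.max("leopard_prob").alias("burst_leopard_prob")))

# A burst is flagged if any frame in it looks like a leopard.
flagged = burst_scores.filter(F.col("burst_leopard_prob") > 0.5)
```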
Simply classifying images of leopards is not enough to determine the number of leopards in the ecosystem. More specifically, it is tough to distinguish between an ecosystem with many shy leopards, and one with a few curious leopards that like to take selfies. To tackle this problem, we use tools like HotSpotter to identify individual leopards based on their spot patterns. However, these tools often require well-behaved, cropped images of the target animal. More explicitly, these methods require not just a leopard classifier, but a leopard detector. To transform our classifier into something that could highlight the patterns of the leopard, we created a distributed implementation of the black box model interpretability technique, LIME. Using LIME, we can refine our classifier into a model that can detect the actual patterns of the leopard, without requiring human-annotated bounding boxes.
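The distributed implementation lives in MMLSpark, but the underlying idea can be illustrated on a single machine with the open-source lime package: perturb the image, see which superpixels push the prediction toward “leopard,” and keep those regions as a rough detection mask. The leopard_model wrapper below is hypothetical.

```python
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def classifier_fn(batch: np.ndarray) -> np.ndarray:
    """Hypothetical wrapper: returns [P(no leopard), P(leopard)] per image."""
    return leopard_model.predict_proba(batch)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,               # H x W x 3 numpy array from a camera trap
    classifier_fn,
    top_labels=1,
    num_samples=1000)    # perturbed samples LIME uses to fit its local model

# Keep only the superpixels that support the "leopard" prediction; their
# union acts as a rough detection mask, no bounding boxes required.
masked_img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5)
highlighted = mark_boundaries(masked_img, mask)
```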
Microsoft AI for Earth invests in environmental science
Microsoft has devoted 50 million dollars in grants to fund wildlife conservation. The AI for Earth program connects researchers in environmental science with the AI and computing resources they need to accomplish their goals. The program has also developed open-source tools to accelerate camera trap image analysis.
- Machine Learning Blog: Saving Snow Leopards with Deep Learning and Computer Vision on Spark
- Learn about image services at AI School
- Learn about machine learning at AI School
- Deep Learning Without Labels: The Challenges of Snow Leopard Conservation
- Academic Paper on Unsupervised Snow Leopard Detection on Spark
- Academic Paper on Deep Learning on Spark
- Microsoft Machine Learning for Apache Spark at GitHub
- The Azure Cognitive Services on Spark
- AI for Earth Camera Trap Initiative
- Learn about AI on Azure
- Learn about The Microsoft Cognitive Toolkit
- ResNet at arXiv
Created during a two-day hackathon, Gen Studio uses Microsoft AI to visually and creatively navigate art collections at the Metropolitan Museum of Art (The Met).
The Met collaborated with Microsoft and MIT to explore how AI could connect people to art. The goal was to imagine new ways for global audiences to discover, learn, and create with one of the world's foremost art collections.
The team started with two questions: Can we leverage Generative Adversarial Networks (GANs) to recombine artwork in new, interactive ways? If so, can we combine this with visual search to allow everyone to explore the collection?
The result was Gen Studio, which lets users explore dreamlike images created by a GAN. Gen Studio allows us not just to create random works, but to interpolate between real artworks in the collection.
Immerse yourself in The Met collection
Gen Studio allows you to explore, search, and be immersed in The Met’s collection. Find an inspiring piece, then explore related works through immersive visual search—or recombine artwork into new experiences.
Technical details for Gen Studio
Our first visualization lets users explore a two-dimensional slice of the vast “latent” space of the GAN. Users can move throughout this space and see how the GAN’s dreams change as they bump into real pieces in The Met’s collection. Our second visualization gives the user precise control over how to blend different works together into a larger work. Gen Studio shows the inferred visual structure underlying The Met’s collection, allowing explorers to create and recombine artwork that draws from a variety of styles, materials, and forms.
To create this experience, we used a microservice architecture of deep networks, Azure services, and blob storage. We used Visual Studio Code to develop a Flask API to serve the GAN from an Azure Kubernetes Service (AKS) cluster powered by Nvidia GPUs. These services make it possible to generate new images in real-time. Azure Kubernetes Service streamlines the path to production, making it possible to quickly deploy, host, and scale the solution.
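A minimal sketch of such a service is shown below; the generate_image helper that wraps the trained GAN is hypothetical, and the real deployment runs behind a production WSGI server inside the AKS container.

```python
import io

import numpy as np
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/generate", methods=["POST"])
def generate():
    # The client posts a latent seed as a JSON list of floats.
    seed = np.asarray(request.get_json()["seed"], dtype=np.float32)

    # generate_image is a hypothetical helper that runs the GAN generator
    # on the GPU and returns a PNG-encoded image as bytes.
    png_bytes = generate_image(seed)
    return send_file(io.BytesIO(png_bytes), mimetype="image/png")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```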
Our GAN generates images from an initial ‘seed’ or vector of 140 numbers. A core challenge we faced was how to map images from The Met to the seeds that generate them. To overcome this, we used gradient-descent-based network inversion to learn the seed for each image. The key was instructing the network to match not just the pixels of the target image, but also its high-level characteristics and content.
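The original inversion code isn’t published here, but a rough sketch of the loop in PyTorch could look like this; the generator, the features network used for the perceptual term, the target_image tensor, and the loss weighting are all assumptions.

```python
import torch
import torch.nn.functional as F

# Assumed: `generator` maps a 140-dim seed to an image, and `features`
# is a pretrained network (e.g., a truncated ResNet) used to compare
# high-level content rather than raw pixels.
seed = torch.randn(1, 140, requires_grad=True)
optimizer = torch.optim.Adam([seed], lr=0.05)

for step in range(500):
    optimizer.zero_grad()
    fake = generator(seed)

    # Match both the raw pixels and the high-level features of the target artwork.
    pixel_loss = F.mse_loss(fake, target_image)
    perceptual_loss = F.mse_loss(features(fake), features(target_image))
    loss = pixel_loss + 0.1 * perceptual_loss   # weighting is an assumption

    loss.backward()
    optimizer.step()
```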
We loaded the Open Access images into an Azure Databricks cluster and used Microsoft Machine Learning for Apache Spark (MMLSpark) to enrich these images with annotations from the Azure Computer Vision API. We then built a fast visual-similarity search by featurizing all images with ResNet50 and constructing a locality-sensitive hash on these features for approximate nearest-neighbor lookup. We deployed this model onto AKS, and then used MMLSpark to add these nearest neighbors to our search index. We wrote the data from our Spark cluster to the Azure Search Service. The front end was built using React and hosted in Azure using App Service.
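As a sketch of the approximate nearest-neighbor step, Spark’s built-in locality-sensitive hashing could be used as follows; the features_df DataFrame, column names, and query_vector are assumptions, and writing the results out to Azure Search is omitted.

```python
from pyspark.ml.feature import BucketedRandomProjectionLSH

# features_df is assumed to hold one row per Open Access image, with an "id"
# column and a ResNet50 feature vector in a "features" column.
lsh = BucketedRandomProjectionLSH(
    inputCol="features", outputCol="hashes", bucketLength=2.0, numHashTables=3)
lsh_model = lsh.fit(features_df)

# Approximate k-nearest neighbors for one query artwork's feature vector.
neighbors = lsh_model.approxNearestNeighbors(
    features_df, query_vector, numNearestNeighbors=10)
neighbors.select("id", "distCol").show()
```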
Clean Water AI
Clean Water AI uses a deep learning neural network to detect dangerous bacteria and harmful particles in water. Drinking water is examined at a microscopic level, with contamination detected in real time.
Water safety can be difficult to maintain across the vast distribution of a municipal water system. Contamination by bacteria or dangerous particles is often difficult to detect before health issues occur.
AI detects water contamination issues, using trained models to recognize harmful particles and bacteria. Distributing devices that monitor water for problems will help cities detect contamination as quickly as possible.
Clean Water AI trains a neural network model, then deploys it to edge devices that classify and detect harmful bacteria and particles. Cities can install IoT devices across water sources to monitor quality in real time.
Monitoring water safety in real time
Clean Water AI uses AI and high definition cameras to detect bacteria and particles in a water source.
Technical details for Clean Water AI
Clean Water AI trains the convolutional neural network model in the cloud, then deploys it to edge devices. We used Caffe, a deep learning framework, which allows a higher frame rate when running on the Intel Movidius Neural Compute Stick.
An IoT device can then classify and detect dangerous bacteria and harmful particles, and the system can run continuously in real time. Cities can install IoT devices across different water sources to monitor water quality and contamination in real time.
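As an illustration only, a Caffe model could be run on an edge device with a Movidius accelerator through OpenCV’s DNN module as sketched below; the model files, input size, and alert threshold are hypothetical, and the project’s own inference code may differ.

```python
import cv2

# Hypothetical trained Caffe model files.
net = cv2.dnn.readNetFromCaffe("water_classifier.prototxt", "water_classifier.caffemodel")

# Ask OpenCV to execute the network on an Intel Movidius (Myriad) device.
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_INFERENCE_ENGINE)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_MYRIAD)

capture = cv2.VideoCapture(0)   # microscope camera
while True:
    ok, frame = capture.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0 / 255, size=(224, 224))
    net.setInput(blob)
    scores = net.forward()       # per-class contamination probabilities
    if scores.max() > 0.8:       # threshold is an assumption
        print("Possible contaminant detected:", scores.argmax())
```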
Currently, Clean Water AI has been built as a proof of concept using a microscope and an UP2 board. The entire prototype costs less than $500, and there are plans to scale up production to help reduce unit costs.
Storytelling is at the heart of human nature. Pix2Story teaches an AI system to be creative, turning an image into a story.
Natural Language Processing (NLP) is a field that is driving a revolution in computer-human interaction. Pix2Story is an experiment in teaching an AI system to be creative: to be inspired by a picture and take it to another level.
We wanted to see if we could create a natural and cohesive narrative showcasing NLP. We decided to create a web application on Azure that allows users to upload a picture and get a machine-generated story based on several literary genres.
A trained visual semantic embedding model analyzes the image and generates captions. The Pix2Story application then becomes the storyteller by transforming the captions and generating a narrative.
Neural AI storytelling with Pix2Story
Pix2Story teaches an AI to be creative by taking a picture and turning it into stories.
Technical details of Pix2Story
We based our work on several papers: Skip-Thought Vectors; Show, Attend and Tell: Neural Image Caption Generation with Visual Attention; and Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books; as well as the neural-storyteller repositories. The idea is to obtain captions from the uploaded picture and feed them to a recurrent neural network model that generates the narrative based on the genre and the picture.
We trained a visual semantic embedding model on the MS COCO captions dataset of 300,000 images to make sense of the visual input by analyzing the uploaded image and generating the captions.
We then transform the captions and generate a narrative based on the selected genre: Adventure, Sci-Fi, or Thriller. For this, we trained an encoder-decoder model for two weeks on more than 2,000 novels.
This training allows each passage of the novels to be mapped to a skip-thought vector, a way of embedding thoughts in vector space.
This allows the model to capture not only the words but their meaning in context, so that it can reconstruct the sentences surrounding an encoded passage.
We use the new Azure Machine Learning service, along with the Azure model management SDK and Python 3, to create a Docker image containing these models and deploy it to AKS with GPU support, making the project production-ready.
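A sketch of that deployment using the azureml-core SDK is shown below; the workspace configuration, registered model, entry script, environment file, and cluster name are all hypothetical, and the original project used the earlier model management SDK, so details may differ.

```python
from azureml.core import Environment, Workspace
from azureml.core.compute import AksCompute
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AksWebservice

ws = Workspace.from_config()                 # hypothetical workspace config file
model = Model(ws, name="pix2story")          # previously registered storyteller model

env = Environment.from_conda_specification("pix2story-env", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# GPU-enabled AKS cluster attached to the workspace (name is an assumption).
aks_target = AksCompute(ws, "gpu-aks")
deploy_config = AksWebservice.deploy_configuration(cpu_cores=2, memory_gb=8)

service = Model.deploy(ws, "pix2story-service", [model],
                       inference_config, deploy_config, aks_target)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```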
- Get Pix2Story source code on GitHub
- Learn how to build Pix2Story in AI School
- Learn about AI Services at AI School
- Review Skip-Thought Vectors
- Review Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Review Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
- Find repositories at GitHub for neural storyteller
Spektacom uses a mini sticker sensor on a cricket bat to collect data on the quality, speed, twist, and swing of the bat—to help professionals improve their game.
Cricket is an old sport with a dedicated following of fans across the globe. With a surge of interest in the game, professionals and amateurs are looking for ways to improve the quality of their game.
Players want data-driven assistance to improve the quality of their play. Spektacom uses sensor technology on cricket bats to harness data from real play, then creates insights from cloud-powered data analytics, machine learning, and AI.
The sensor sticker captures data and analyzes impact characteristics through wireless sensor technology and cloud analytics. The data is analyzed with AI models developed in Azure and transferred to the edge for continuous feedback.
Spektacom Power Bat
A tiny sensor sticker is attached to the bat to measure the quality, speed, and twist of the player’s swing—and the power transferred from the ball to bat at impact.
Technical details for Spektacom
Power Bats use an innovative sensor sticker that measures the quality of a player’s shot by capturing data and analyzing impact characteristics through wireless sensor technology and cloud analytics. This unique, non-intrusive sensor weighs less than five grams and is affixed to the back of the bat. Performance stats are used to return data-driven feedback to players and coaches.
The data from the Power Bats is analyzed with powerful AI models developed in Azure and transferred to the edge for continuous feedback to the player. In professional games, the sticker communicates over Bluetooth Low Energy (BLE) with an edge device called the stump box, which is buried behind the wicket. The data from the stump box is transferred to and analyzed in Azure, and shot characteristics are shared with broadcasters in real time.
Because cricket stadiums have wireless access restrictions and stringent security requirements, the stump box is built on a Microsoft Azure Sphere-based hardware platform to ensure secure communication between the bat, the edge device, and Azure.
For amateur players, the smart bat pairs with the Spektacom mobile app to transfer and analyze sticker data in Azure. The solution is powered by Azure Sphere (the stump box), Azure IoT Hub, Azure Event Hub, Azure Functions, Azure Cosmos DB, and Azure ML 2.0.
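As an illustration of the telemetry path, an edge device such as the stump box could forward shot data to Azure IoT Hub with the Python device SDK as sketched below; the connection string and payload fields are hypothetical.

```python
import json

from azure.iot.device import IoTHubDeviceClient, Message

# Device connection string provisioned in Azure IoT Hub (placeholder).
CONNECTION_STRING = "HostName=<hub>.azure-devices.net;DeviceId=stump-box-01;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(CONNECTION_STRING)
client.connect()

# Hypothetical shot telemetry decoded from the sticker's BLE packets.
shot = {"bat_id": "bat-17", "bat_speed_kmh": 92.4, "twist_deg": 3.1, "impact_quality": 0.87}

client.send_message(Message(json.dumps(shot)))
client.disconnect()
```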
Angel Eyes is an IoT device that monitors a baby’s sleeping position and environment. Caregivers can view a live stream from anywhere and receive notifications if the device detects any issues.
Accidental Suffocation and Strangulation in Bed (ASSB) is a prominent cause of death for newborns. Baby monitors could be smart enough to react to scenarios and data, and to proactively alert caregivers when a baby is at risk for ASSB.
A smart baby monitor uses AI to track the infant and their surrounding conditions. The system is always on, continuously monitoring and alerting caregivers. The device is non-intrusive, monitoring externally without any physical restrictions.
Angel Eyes uses Microsoft Azure, with a camera and sensors for temperature and humidity. When it detects a risk such as a high room temperature, an object in the crib, or an unsafe sleeping position, it alerts the caregiver to take corrective action.
Safer monitoring for newborns
Angel Eyes was created by parents who wanted to track their baby’s safety at all times, without losing sleep. Learn how Microsoft Azure’s cognitive and visual services, together with IoT monitoring devices, help ensure their baby’s well-being.
Technical details for Angel Eyes
Accidental Suffocation and Strangulation in Bed (ASSB) is a significant cause of infant mortality, with more than 85 percent of ASSB deaths occurring between birth and six months of age. Infants with low birth weight have an increased risk of ASSB and SIDS. In 2016, 900 deaths in the US were reported and attributed to accidental suffocation and strangulation in bed. There is an immediate need to improve upon the classic baby monitors in order to make them smart enough to adapt to various situations, to react to various scenarios and data points, and to take appropriate corrective action.
Research identifies some key requirements for a smart monitoring system:
- Always On: Continually watching over the baby and its surrounding conditions.
- Proactive: Proactively notifying the caregiver if the surrounding conditions change.
- Non-Intrusive: Externally observing the infant’s environment without poking and prodding, thereby allowing the infant the least restrictive environment.
Angel Eyes uses recent advances in deep learning to provide a safe, non-intrusive way of monitoring an infant that goes well beyond off-the-shelf baby monitor cameras. It is an efficient monitoring system that proactively alerts caregivers as the environment or the infant’s position changes.
The system is built on Microsoft Azure and a Raspberry Pi device with a Pi Camera and a DHT11 temperature and humidity sensor, using the Azure stack to relay communications back to the caregiver. This approach helps us solve for multiple use cases (see the sketch after this list):
a. When the infant is sleeping on their back with no objects detected, a safe light is visible.
b. When Angel Eyes visually detects that the infant has moved to their side, or that an object is in the crib, it sends an alert to the caregiver.
c. When Angel Eyes’ environmental sensors detect that the room temperature is too high, an alert is sent to the caregiver notifying them to take corrective action.
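A minimal sketch of the environmental check on the Raspberry Pi is shown below, assuming the commonly used Adafruit_DHT library and a hypothetical send_alert helper that relays the notification through Azure; the GPIO pin and temperature threshold are assumptions.

```python
import time

import Adafruit_DHT  # common library for DHT11 sensors on the Raspberry Pi

DHT_PIN = 4                 # GPIO pin the DHT11 data line is wired to (assumption)
MAX_SAFE_TEMP_C = 24.0      # alert threshold is an assumption

while True:
    humidity, temperature = Adafruit_DHT.read_retry(Adafruit_DHT.DHT11, DHT_PIN)
    if temperature is not None and temperature > MAX_SAFE_TEMP_C:
        # send_alert is a hypothetical helper that pushes a notification
        # to the caregiver through the Azure messaging pipeline.
        send_alert(f"Nursery temperature is {temperature:.1f} C - please check the room.")
    time.sleep(60)
```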
PoseTracker uses deep learning to track the position and orientation of objects. This solution uses your phone camera to measure and track the angle, orientation, and distance of an item in real time.
Correctly capturing an object’s position, orientation, and identity is a major challenge. Without prior information, stereo optics, or measurements, it can be hard to determine scale or distance, and object recognition requires a large labeled dataset.
Convolutional neural networks (CNNs) have made significant strides in object recognition, classification, and segmentation, as used in self-driving vehicles, for example. PoseTracker leverages the power of CNNs to recognize and track objects in 3D.
PoseTracker uses a patented optical marker approach to infer an object’s pose from 2D images, then tracks the position from one image to all subsequent images—based on comparisons to a predefined 3D orientation.
The complex problem of position
Tracking an object’s changing distance and position is an important challenge to solve in medical imaging, self-driving vehicles, manufacturing, drones, and many IoT applications. PoseTracker is a collaborative proof of concept to solve 3D positioning.
Technical details for PoseTracker
Convolutional neural networks, a class of deep neural networks, have made significant strides in recent years in object recognition, classification, and segmentation, driving major developments in self-driving vehicles and a wide variety of computer vision applications.
However, there have been very few practical implementations of these advanced approaches for 3D object pose estimation. The ability to recognize and track an object in 3D reference space is still a difficult problem due to several challenges:
- The 3D pose information is hard to capture, requiring complicated setups involving stereo optical or magnetic localization apparatus.
- The lack of prior information about the object of interest.
- A labeled dataset with the proper pose information is very hard to obtain in large quantities, and traditional image manipulations like axis scaling and other transformations inevitably corrupt the 3D pose information.
The idea is to leverage the power of CNNs and implement an application that recognizes and tracks the pose (position and orientation) of objects in 3D, using a patented optical marker that helps identify the rotation and estimate the pose of the object.
PoseTracker is a proof of concept for a simple object pose detection pipeline, integrated with rotation information based on a 3D pose tracking solution (an optical marker).
The application analyzes 2D images taken from a camera with the optical marker always visible. Using supervised training, the application detects the marker and infers its orientation from one image to all subsequent images, based on comparison to a predefined 3D orientation.
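The patented marker isn’t described publicly, but the general marker-based pose estimation step can be illustrated with OpenCV’s solvePnP, which recovers rotation and translation from known 3D marker points and their detected 2D image locations; every value below is a placeholder.

```python
import cv2
import numpy as np

# Known 3D coordinates of the marker's corner points, in the marker's own
# frame (millimetres), and their detected 2D pixel locations in the image.
object_points = np.array([[0, 0, 0], [50, 0, 0], [50, 50, 0], [0, 50, 0]], dtype=np.float32)
image_points = np.array([[312, 204], [418, 210], [412, 316], [306, 309]], dtype=np.float32)

# Camera intrinsics from a one-off calibration (values are placeholders).
camera_matrix = np.array([[800, 0, 320], [0, 800, 240], [0, 0, 1]], dtype=np.float32)
dist_coeffs = np.zeros(5)

# Recover the marker's rotation and translation relative to the camera.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
rotation_matrix, _ = cv2.Rodrigues(rvec)   # convert axis-angle to a 3x3 rotation
distance_mm = np.linalg.norm(tvec)         # straight-line distance to the marker
```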
This approach to solving the pose-tracking problem will eventually let you use your phone camera to get the angle, orientation, and distance of an object from you in real time.
Explore the possibilities of AI
Find demos to get more ideas or learn about AI technology to jumpstart your own development.
Start creating your own AI experiences with courses in AI technology. Learn about conversational AI, machine learning, AI for devices, and cognitive services.
Dive into interactive demos that showcase AI in simple examples that explain the various capabilities of the Microsoft AI platform.