Cameras are now everywhere. Large-scale video processing is a grand challenge and an important frontier for analytics, with video streaming in from factory floors, traffic intersections, police vehicles, and retail shops. This is a golden era for computer vision, AI, and machine learning, and a great time to extract value from video to impact science, society, and business!
Project Rocket’s goal is to democratize video analytics: build a system for real-time, low-cost, accurate analysis of live videos. This system will work across a geo-distributed hierarchy of intelligent edges and large clouds, with the ultimate goal of making it easy and affordable for anyone with a camera stream to benefit from video analytics. For information regarding our project, please see our publications.
“Safer Cities, Safer People” US Department of Transportation Award
Institute of Transportation Engineering 2017 Achievements Award – “Video Analytics for Vision Zero”
ACM MobiSys 2017 Best Demo
Microsoft 2017 Hackathon Grand Prize Winner
Rocket: Video analytics stack
Microsoft Rocket Video Analytics Platform is now available on GitHub!
Download from GitHub | Learn about the key features
Rocket is an extensible software stack for democratizing video analytics: making it easy and affordable for anyone with a camera stream to benefit from computer vision and machine learning algorithms. Rocket allows programmers to plug in their favorite vision algorithms while scaling across a hierarchy of intelligent edges and the cloud.
Video analytics for Vision Zero
One vertical this project focuses on is video streams from cameras at traffic intersections. Traffic-related accidents are among the top 10 causes of death worldwide. This project partners with jurisdictions to identify traffic details (vehicles, pedestrians, bikes) that impact traffic planning and safety.
We conducted a pilot study in Bellevue, Washington, for active, live 24×7 monitoring of traffic intersections. We hosted a live traffic dashboard, powered by Rocket’s video analytics, at Bellevue’s Traffic Management Center. The dashboard alerts the traffic authorities to abnormal traffic volumes. Read our case study report.
Resource-accuracy tradeoff for video queries
VideoStorm is a video analytics system that processes thousands of video analytics queries on live video streams over large clusters. Given the high costs of vision processing, resource management is crucial. We consider two key characteristics of video analytics: the resource-quality tradeoff with multi-dimensional configurations, and the variety in quality and lag goals. VideoStorm’s offline profiler generates a resource-quality profile for each query, while its online scheduler allocates resources to queries to maximize performance on quality and lag, in contrast to the commonly used fair sharing of resources in clusters.
Read the full NSDI 2017 paper >
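To make the contrast with fair sharing concrete, here is a minimal sketch of a quality-driven allocator. It is not VideoStorm’s scheduler: it ignores lag entirely, and the query names, the profiles, and the greedy_allocate helper are all hypothetical. It assumes each query comes with a profile that maps a number of cores to an expected quality, and hands each core to the query that gains the most from it.

```python
from typing import Dict, List

# Hypothetical resource-quality profiles: profiles[q][c] is the expected
# quality (0..1) of query q when it is given c cores. Numbers are made up.
Profile = List[float]


def greedy_allocate(profiles: Dict[str, Profile], total_cores: int) -> Dict[str, int]:
    """Hand out cores one at a time to the query with the largest marginal gain."""
    alloc = {q: 0 for q in profiles}
    for _ in range(total_cores):
        best_q, best_gain = None, 0.0
        for q, prof in profiles.items():
            c = alloc[q]
            if c + 1 < len(prof):
                gain = prof[c + 1] - prof[c]
                if gain > best_gain:
                    best_q, best_gain = q, gain
        if best_q is None:
            break  # no query benefits from another core
        alloc[best_q] += 1
    return alloc


def total_quality(profiles: Dict[str, Profile], alloc: Dict[str, int]) -> float:
    """Sum of per-query quality under a given allocation."""
    return sum(profiles[q][alloc[q]] for q in profiles)


profiles = {
    "license_plates": [0.0, 0.30, 0.60, 0.85, 0.95],  # keeps improving with cores
    "car_counter":    [0.0, 0.80, 0.90, 0.92, 0.93],  # saturates quickly
}
fair = {"license_plates": 2, "car_counter": 2}        # equal split of 4 cores
greedy = greedy_allocate(profiles, total_cores=4)

print("fair  :", fair, round(total_quality(profiles, fair), 2))
print("greedy:", greedy, round(total_quality(profiles, greedy), 2))
```

With these invented profiles, the greedy allocator moves a core from the quickly saturating counter to the license plate query and ends with a higher total quality than the equal split. Greedy allocation of this kind is only guaranteed to be optimal when every profile has diminishing returns, which is an assumption of the example.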
While it is promising to balance resource and quality (or accuracy) by selecting a suitable configuration (e.g., the resolution and frame rate of the input video), one must also address the significant dynamics of the configurations’ impact on video analytics accuracy. Chameleon periodically picks the best configurations for video analytics pipelines, while also efficiently searching the large space of configurations. Chameleon relies on the observation that the underlying characteristics (e.g., the velocity and sizes of objects) that affect the best configuration have enough temporal and spatial correlation to allow the search cost to be amortized over time and across multiple video feeds.
Read the full SIGCOMM 2018 paper >
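The idea of re-profiling periodically and reusing the result can be sketched in a few lines. This is an illustration under stated assumptions, not Chameleon’s algorithm: the configuration space, the cost model, and the accuracy table are invented, and accuracy is a simple lookup rather than something measured on video. The sketch profiles only on the first segment of each window and reuses the chosen configuration for the rest of the window, which is where the temporal correlation pays off.

```python
from typing import Callable, Dict, List, Tuple

Config = Tuple[int, int]  # (frame height in pixels, frames per second); illustrative


def cost(cfg: Config) -> int:
    """Assumed cost model: work grows with resolution times frame rate."""
    height, fps = cfg
    return height * fps


def pick_config(configs: List[Config],
                accuracy_of: Callable[[Config], float],
                target: float) -> Config:
    """Return the cheapest configuration whose accuracy meets the target."""
    good = [c for c in configs if accuracy_of(c) >= target]
    # Fall back to the most expensive configuration if none meets the target.
    return min(good, key=cost) if good else max(configs, key=cost)


def run(segments: List[str], configs: List[Config],
        accuracy_in: Callable[[str, Config], float],
        target: float, window: int) -> List[Config]:
    """Profile on the first segment of every window, then reuse the choice."""
    chosen: List[Config] = []
    current = configs[0]
    for i, seg in enumerate(segments):
        if i % window == 0:  # start of a new profiling window
            current = pick_config(configs, lambda c: accuracy_in(seg, c), target)
        chosen.append(current)
    return chosen


# Made-up accuracies: slow scenes tolerate low frame rates, fast scenes do not.
ACCURACY: Dict[str, Dict[Config, float]] = {
    "slow": {(480, 5): 0.92, (480, 30): 0.93, (960, 5): 0.97, (960, 30): 1.00},
    "fast": {(480, 5): 0.55, (480, 30): 0.91, (960, 5): 0.60, (960, 30): 1.00},
}
configs = list(ACCURACY["slow"])
segments = ["slow"] * 3 + ["fast"] * 3
chosen = run(segments, configs, lambda seg, cfg: ACCURACY[seg][cfg],
             target=0.9, window=3)
print(chosen)
```

In this toy run the search happens twice for six segments, and the chosen configuration switches from a low frame rate to a high one when the scene changes from slow to fast.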
Querying large video datasets with low latency and low cost
Large volumes of video are continuously recorded from cameras deployed for traffic control and surveillance, with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, these networks are too expensive and slow. We built Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query time. To reduce query-time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance, and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time.
Read the full OSDI 2018 paper >
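The split between cheap ingest and expensive query-time checks can be illustrated with a toy index. This is a sketch under assumptions, not Focus’s implementation: the object records, the precomputed cluster ids, and the two stand-in classifiers (cheap_top_k and expensive_label) are hypothetical. Each object is indexed under every class in the cheap model’s top guesses, and at query time the expensive model runs once per candidate cluster rather than once per object.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Obj = Dict[str, object]  # hypothetical record: frame, cluster, truth, cheap


def build_index(objects: List[Obj],
                cheap_top_k: Callable[[Obj], List[str]]
                ) -> Tuple[Dict[str, Set[str]], Dict[str, List[Obj]]]:
    """Ingest time: index each cluster under every class in the cheap top-K."""
    clusters: Dict[str, List[Obj]] = defaultdict(list)  # cluster id -> members
    index: Dict[str, Set[str]] = defaultdict(set)       # class -> cluster ids
    for obj in objects:
        clusters[obj["cluster"]].append(obj)
        for cls in cheap_top_k(obj):
            index[cls].add(obj["cluster"])
    return index, clusters


def query(cls: str, index: Dict[str, Set[str]], clusters: Dict[str, List[Obj]],
          expensive_label: Callable[[Obj], str]) -> Tuple[List[int], int]:
    """Query time: verify one representative per candidate cluster."""
    frames: Set[int] = set()
    calls = 0
    for cid in sorted(index.get(cls, ())):
        representative = clusters[cid][0]
        calls += 1  # one expensive inference per cluster, not per object
        if expensive_label(representative) == cls:
            frames.update(o["frame"] for o in clusters[cid])
    return sorted(frames), calls


# Invented data: "cheap" holds the cheap model's top-2 guesses, and "truth"
# stands in for what the expensive model would output.
objects = [
    {"frame": 1, "cluster": "a", "truth": "car",   "cheap": ["car", "truck"]},
    {"frame": 2, "cluster": "a", "truth": "car",   "cheap": ["truck", "car"]},
    {"frame": 3, "cluster": "b", "truth": "truck", "cheap": ["truck", "car"]},
    {"frame": 4, "cluster": "c", "truth": "bag",   "cheap": ["bag", "person"]},
]
index, clusters = build_index(objects, cheap_top_k=lambda o: o["cheap"])
car_frames, expensive_calls = query("car", index, clusters,
                                    expensive_label=lambda o: o["truth"])
print(car_frames, expensive_calls)
```

Here the query for cars touches two clusters, so the expensive model runs twice instead of four times, and the truck cluster that the cheap model wrongly suggested is filtered out at query time.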
Virtualizing steerable cameras
Cameras are often electronically steerable (pan, tilt, zoom, or “PTZ”) and have to support multiple applications simultaneously, such as AMBER Alert scanning based on license plate recognition and traffic volume monitoring. The primary challenge in supporting multiple such applications concurrently is that the view and image requirements of the applications differ. Allowing applications to directly steer the cameras inevitably leads to conflicts. Our solution virtualizes the camera hardware. With virtualization, we break the one-to-one binding between the camera and the application. The application binds itself to a virtual instance of the camera and specifies its view requirements, e.g., orientation, resolution, zoom. Our system does its best to provide the most recent view that meets the applications’ requirements.
Read the full IPSN 2017 paper >
Watch the video >
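The binding model described above can be sketched as a small class hierarchy. This is an illustration only, not the system’s API: the CameraHub, VirtualCamera, and ViewRequirement names are hypothetical, the requirement is reduced to a pan range and a minimum zoom, and how the physical camera is actually steered is left out. What the sketch shows is that applications never steer the camera; each asks its virtual instance for the most recent frame that satisfies its own requirement.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Frame:
    timestamp: float
    pan: float   # degrees
    zoom: float  # magnification factor


@dataclass
class ViewRequirement:
    pan_min: float
    pan_max: float
    min_zoom: float

    def satisfied_by(self, frame: Frame) -> bool:
        return (self.pan_min <= frame.pan <= self.pan_max
                and frame.zoom >= self.min_zoom)


class VirtualCamera:
    """What an application binds to instead of the physical camera."""

    def __init__(self, hub: "CameraHub", requirement: ViewRequirement) -> None:
        self.hub = hub
        self.requirement = requirement

    def latest_view(self) -> Optional[Frame]:
        return self.hub.most_recent(self.requirement)


class CameraHub:
    """Owns the physical camera; applications never steer it directly."""

    def __init__(self) -> None:
        self.frames: List[Frame] = []

    def bind(self, requirement: ViewRequirement) -> VirtualCamera:
        return VirtualCamera(self, requirement)

    def record(self, frame: Frame) -> None:
        self.frames.append(frame)  # frames arrive in timestamp order

    def most_recent(self, requirement: ViewRequirement) -> Optional[Frame]:
        for frame in reversed(self.frames):
            if requirement.satisfied_by(frame):
                return frame
        return None  # no captured view meets the requirement yet


hub = CameraHub()
plates = hub.bind(ViewRequirement(pan_min=0, pan_max=20, min_zoom=4.0))
volume = hub.bind(ViewRequirement(pan_min=-90, pan_max=90, min_zoom=1.0))
hub.record(Frame(timestamp=1.0, pan=10, zoom=5.0))  # zoomed in on a lane
hub.record(Frame(timestamp=2.0, pan=45, zoom=1.0))  # wide view of the junction
print(plates.latest_view(), volume.latest_view())
```

The license plate application receives the older zoomed-in frame, while the volume monitor receives the newest wide frame, even though both share one physical camera.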
In-vehicle video analytics
We developed ParkMaster, which runs on users’ smartphones mounted on the car’s dashboard to sample the presence of cars at roadside parking spots from the driver’s vehicle itself. It includes two main components: a smartphone app, which runs on the driver’s smartphone (the edge) and performs real-time visual analytics, and an Azure cloud service, which maintains a real-time database summarizing the number of available parking spaces and provides client support for location services. While the user is driving with the smartphone placed on the windshield, ParkMaster captures video with the phone’s camera and, processing frames locally in real time, estimates the availability of roadside parking spaces. On-road experiments in two major cities in the US and Europe (Los Angeles and Paris) and in a small European village show that ParkMaster achieves an overall end-to-end accuracy close to 90% with negligible overhead (less than one megabyte per hour) in mobile cellular data consumption.
Video analytics for wireless cameras
We built Vigil, a video surveillance system that leverages wireless cameras with edge computing capability to support real-time scene surveillance in enterprise campuses, retail stores, and across smart cities. Vigil intelligently partitions video processing between intelligent edges and the cloud to save wireless capacity, which can then be used to support additional cameras, thereby increasing the spatial coverage of the surveilled region. It incorporates video analytics with novel video frame prioritization and video stream scheduling algorithms to optimize bandwidth utilization. We have tested Vigil across three sites using both White-Space and Wi-Fi networks. Experimental results show that, depending on the level of activity in the scene, Vigil increases geographical coverage by 5 to 200 times compared with some state-of-the-art systems that simply upload video streams. For a fixed region of coverage and bandwidth, Vigil outperforms the default equal throughput allocation strategy of Wi-Fi by delivering up to 25% more objects relevant to a user’s query.
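A small sketch can show why prioritizing frames helps under a fixed uplink budget. It is not Vigil’s algorithm: the frame records, the byte sizes, and the simple “objects per byte” ranking are assumptions made for illustration. It compares a shared budget spent on the most useful frames across all cameras with an equal split of the same budget between cameras.

```python
from typing import Dict, List

Frame = Dict[str, int]  # hypothetical record: "objects" relevant to the query, "bytes"


def prioritize(frames: List[Frame], budget_bytes: int) -> List[Frame]:
    """Upload frames with the most query-relevant objects per byte first."""
    ranked = sorted(frames, key=lambda f: f["objects"] / f["bytes"], reverse=True)
    sent: List[Frame] = []
    used = 0
    for frame in ranked:
        if frame["objects"] == 0:
            continue  # nothing relevant in this frame, never upload it
        if used + frame["bytes"] <= budget_bytes:
            sent.append(frame)
            used += frame["bytes"]
    return sent


def equal_share(frames_by_camera: Dict[str, List[Frame]],
                budget_bytes: int) -> List[Frame]:
    """Baseline: split the budget evenly, each camera spends its own share."""
    share = budget_bytes // len(frames_by_camera)
    sent: List[Frame] = []
    for frames in frames_by_camera.values():
        sent.extend(prioritize(frames, share))
    return sent


def delivered(frames: List[Frame]) -> int:
    """Total number of query-relevant objects in the uploaded frames."""
    return sum(f["objects"] for f in frames)


# Invented scene: camera A watches a busy entrance, camera B a quiet corridor.
frames_by_camera = {
    "A": [{"objects": n, "bytes": 100_000} for n in (5, 4, 3)],
    "B": [{"objects": n, "bytes": 100_000} for n in (0, 0, 1)],
}
all_frames = [f for frames in frames_by_camera.values() for f in frames]
budget = 300_000

shared = delivered(prioritize(all_frames, budget))
split = delivered(equal_share(frames_by_camera, budget))
print("shared budget:", shared, "equal split:", split)
```

With this made-up workload, the shared budget delivers all twelve objects seen by the busy camera, while the equal split leaves half of each camera’s share unusable and delivers six. The size of the gap depends entirely on the invented numbers.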