Live Video Analytics

Microsoft Research blog


Cameras are now everywhere. Large-scale video processing is a grand challenge and an important frontier for analytics, with video arriving from factory floors, traffic intersections, police vehicles, and retail shops. It’s the golden era for computer vision, AI, and machine learning, and a great time to extract value from videos to impact science, society, and business!

Project Rocket’s goal is to democratize video analytics: build a system for real-time, low-cost, accurate analysis of live videos. This system will work across a geo-distributed hierarchy of intelligent edges and large clouds, with the ultimate goal of making it easy and affordable for anyone with a camera stream to benefit from video analytics. For information regarding our project, please see our publications.


“Safer Cities, Safer People” US Department of Transportation Award

Institute of Transportation Engineering 2017 Achievements Award – “Video Analytics for Vision Zero”

ACM MobiSys 2017 Best Demo

Microsoft 2017 Hackathon Grand Prize Winner

Rocket: Video analytics stack

Microsoft Rocket Video Analytics Platform is now available on GitHub!

Download from GitHub | Learn about the key features

Rocket is an extensible software stack for democratizing video analytics: making it easy and affordable for anyone with a camera stream to benefit from computer vision and machine learning algorithms. Rocket allows programmers to plug in their favorite vision algorithms while scaling across a hierarchy of intelligent edges and the cloud.

Video Analytics Stack graphic

Video analytics for Vision Zero

One of the verticals this project focuses on is video streams from cameras at traffic intersections. Traffic-related accidents are among the top 10 causes of death worldwide. This project partners with jurisdictions to identify traffic details (vehicles, pedestrians, bikes) that impact traffic planning and safety.

We conducted a pilot study in Bellevue, Washington, that monitors traffic intersections live, 24×7. We hosted a traffic dashboard powered by Rocket’s video analytics live at Bellevue’s Traffic Management Center. The dashboard alerts traffic authorities to abnormal traffic volumes. Read our case study report.

Screenshot: dashboard of traffic analysis in Bellevue, WA

Resource-accuracy tradeoff for video queries

VideoStorm is a video analytics system that processes thousands of video analytics queries on live video streams over large clusters. Given the high cost of vision processing, resource management is crucial. We consider two key characteristics of video analytics: the resource-quality tradeoff across multi-dimensional configurations, and the variety in queries’ quality and lag goals. VideoStorm’s offline profiler generates a resource-quality profile for each query, while its online scheduler allocates resources to queries to maximize performance on quality and lag, in contrast to the fair sharing of resources commonly used in clusters.
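To make the contrast with fair sharing concrete, here is a minimal sketch in Python. It is not VideoStorm's scheduler: the query names, configurations, CPU costs, and quality numbers are invented for illustration, lag goals are ignored, and the allocation rule is a simple greedy upgrade by quality gained per extra core. It assumes every query's cheapest configuration fits in the cluster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    cpu: float      # CPU cores needed (hypothetical units)
    quality: float  # accuracy in [0, 1], as an offline profile might report


def best_within(profile, cpu_limit):
    """Highest-quality configuration that fits in cpu_limit, or None."""
    feasible = [c for c in profile if c.cpu <= cpu_limit]
    return max(feasible, key=lambda c: c.quality) if feasible else None


def fair_share(profiles, total_cpu):
    """Baseline: every query gets the same slice of the cluster."""
    share = total_cpu / len(profiles)
    return {q: best_within(p, share) for q, p in profiles.items()}


def greedy_quality(profiles, total_cpu):
    """Upgrade whichever query gains the most quality per extra core."""
    # Start every query at its cheapest configuration.
    chosen = {q: min(p, key=lambda c: c.cpu) for q, p in profiles.items()}
    used = sum(c.cpu for c in chosen.values())
    while True:
        best = None  # (gain per extra core, query, config)
        for q, profile in profiles.items():
            cur = chosen[q]
            for c in profile:
                extra = c.cpu - cur.cpu
                gain = c.quality - cur.quality
                if extra > 0 and gain > 0 and used + extra <= total_cpu:
                    ratio = gain / extra
                    if best is None or ratio > best[0]:
                        best = (ratio, q, c)
        if best is None:
            return chosen
        _, q, c = best
        used += c.cpu - chosen[q].cpu
        chosen[q] = c


# Invented profiles: one query is sensitive to resolution, the other is not.
profiles = {
    "license_plates": [Config("360p/1fps", 1, 0.40),
                       Config("720p/5fps", 4, 0.85),
                       Config("1080p/30fps", 8, 0.95)],
    "car_counter": [Config("360p/1fps", 1, 0.80),
                    Config("720p/5fps", 4, 0.90),
                    Config("1080p/30fps", 8, 0.92)],
}
fair = fair_share(profiles, total_cpu=6)
greedy = greedy_quality(profiles, total_cpu=6)
for name in profiles:
    print(name, "fair:", fair[name].name, "greedy:", greedy[name].name)
```

With these made-up numbers, fair sharing leaves both queries on their cheapest configuration, while the profile-aware allocation spends the spare cores on the query whose quality improves the most.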

Read the full NSDI 2017 paper >

Graphic: VideoStorm Manager

Graph: quality vs. cost

While it is promising to balance resource use and quality (or accuracy) by selecting a suitable configuration (e.g., the resolution and frame rate of the input video), one must also account for how much a configuration’s impact on accuracy changes over time. Chameleon periodically picks the best configuration for each video analytics pipeline while efficiently searching the large space of configurations. It relies on the observation that the underlying characteristics that affect the best configuration (e.g., the velocity and sizes of objects) have enough temporal and spatial correlation to allow the search cost to be amortized over time and across multiple video feeds.
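The idea of re-profiling periodically and reusing the result can be sketched as follows. This is an illustration only, not Chameleon's algorithm: the configuration knobs, the cost model, the accuracy functions, and the window length are all assumptions, and the sketch shows only reuse over time, not sharing across video feeds.

```python
from itertools import product

RESOLUTIONS = [480, 720, 1080]   # vertical pixels
FRAME_RATES = [1, 5, 30]         # frames per second


def cost(config):
    """Relative compute cost: an assumed pixels-per-second model."""
    height, fps = config
    return height * height * fps


def pick_config(accuracy_of, target=0.8):
    """Cheapest configuration whose profiled accuracy meets the target.

    accuracy_of(config) stands in for running the pipeline on a short
    profiling segment and comparing against the most expensive config.
    """
    candidates = sorted(product(RESOLUTIONS, FRAME_RATES), key=cost)
    for config in candidates:
        if accuracy_of(config) >= target:
            return config
    return candidates[-1]  # fall back to the most expensive configuration


def run(segments, window=4, target=0.8):
    """Profile on the first segment of each window, then reuse the choice."""
    plan = []
    current = None
    for i, accuracy_of in enumerate(segments):
        if i % window == 0:
            current = pick_config(accuracy_of, target)
        plan.append(current)
    return plan


def slow_scene(config):
    """Toy model: slow objects, so only resolution matters."""
    height, _fps = config
    return min(1.0, height / 720)


def fast_scene(config):
    """Toy model: fast objects, so a low frame rate also hurts accuracy."""
    height, fps = config
    return min(1.0, height / 720) * min(1.0, fps / 5)


plan = run([slow_scene] * 4 + [fast_scene] * 4)
print(plan)
```

In this toy run the search happens twice for eight segments, and the chosen frame rate rises only when the scene content changes.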

Read the full SIGCOMM 2018 paper >

Querying large video datasets with low latency and low cost

Large volumes of video are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. While advancements in convolutional neural networks (CNNs) have enabled answering such queries with high accuracy, running them over this much video is too expensive and slow. We build Focus, a system for low-latency and low-cost querying on large video datasets. Focus uses cheap ingestion techniques to index the videos by the objects occurring in them. At ingest time, it uses compression and video-specific specialization of CNNs. Focus handles the lower accuracy of the cheap CNNs by judiciously leveraging expensive CNNs at query time. To reduce query-time latency, we cluster similar objects and hence avoid redundant processing. Using experiments on video streams from traffic, surveillance, and news channels, we see that Focus uses 58X fewer GPU cycles than running expensive ingest processors and is 37X faster than processing all the video at query time.
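The split between cheap ingest-time indexing and expensive query-time confirmation can be sketched like this. The sketch is a simplification under stated assumptions: the "models" are plain lookup tables, the object IDs, labels, and cluster assignments are invented, and the function names are hypothetical rather than taken from Focus.

```python
from collections import defaultdict


def build_index(objects, cheap_top_k):
    """Ingest time: file each object under every class in the cheap top-K."""
    index = defaultdict(set)
    for obj in objects:
        for label in cheap_top_k(obj):
            index[label].add(obj)
    return index


def query(index, label, cluster_of, expensive_label):
    """Query time: confirm candidates with the expensive model, once per cluster."""
    verdict = {}   # cluster id -> does this cluster match the label?
    matches = []
    calls = 0      # number of expensive-model invocations
    for obj in sorted(index.get(label, ())):
        cluster = cluster_of(obj)
        if cluster not in verdict:
            verdict[cluster] = expensive_label(obj) == label
            calls += 1
        if verdict[cluster]:
            matches.append(obj)
    return matches, calls


# Invented data standing in for detector and classifier outputs.
truth = {"o1": "car", "o2": "car", "o3": "truck", "o4": "bag", "o5": "car"}
cheap_guess = {
    "o1": ["car", "truck"], "o2": ["truck", "car"], "o3": ["car", "truck"],
    "o4": ["bag", "person"], "o5": ["car", "bus"],
}
clusters = {"o1": 0, "o2": 0, "o3": 1, "o4": 2, "o5": 0}

index = build_index(truth, cheap_guess.__getitem__)
matches, calls = query(index, "car", clusters.__getitem__, truth.__getitem__)
print(matches, calls)
```

Indexing under the top-K classes keeps recall high despite the cheap model's mistakes, and checking one object per cluster means the expensive model runs twice here instead of four times.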

Read the full OSDI 2018 paper >

Graphic: Ingest time vs. Quality time

Virtualizing steerable cameras

Cameras are often electronically steerable (pan, tilt, zoom, or “PTZ”) and have to support multiple applications simultaneously, such as AMBER Alert scanning based on license plate recognition and traffic volume monitoring. The primary challenge in supporting multiple such applications concurrently is that the view and image requirements of the applications differ. Allowing applications to directly steer the cameras inevitably leads to conflicts. Our solution virtualizes the camera hardware. With virtualization, we break the one-to-one binding between the camera and the application. The application binds itself to a virtual instance of the camera and specifies its view requirements, e.g., orientation, resolution, zoom. Our system does its best to provide the most recent view that meets the applications’ requirements.
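One way to picture the virtual-camera binding is the sketch below. The class names, fields, tolerance, and matching rule are all hypothetical and are not the system's API. The sketch also leaves out how the physical camera is scheduled to capture views, and it treats any zoom at or above the requested minimum as acceptable, which a real system would likely refine.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class View:
    pan: float          # degrees
    tilt: float         # degrees
    zoom: float         # magnification factor
    height: int         # vertical resolution in pixels
    captured_at: float  # seconds since start


@dataclass(frozen=True)
class Requirement:
    pan: float
    tilt: float
    min_zoom: float
    min_height: int
    tolerance: float = 5.0  # allowed pan/tilt error in degrees (assumed)

    def accepts(self, view: View) -> bool:
        return (abs(view.pan - self.pan) <= self.tolerance
                and abs(view.tilt - self.tilt) <= self.tolerance
                and view.zoom >= self.min_zoom
                and view.height >= self.min_height)


class VirtualCamera:
    """An application's handle: it states requirements and never steers."""

    def __init__(self, requirement: Requirement, history: List[View]):
        self.requirement = requirement
        self._history = history  # views captured by the one physical camera

    def latest(self) -> Optional[View]:
        """Most recent captured view that meets this application's needs."""
        ok = [v for v in self._history if self.requirement.accepts(v)]
        return max(ok, key=lambda v: v.captured_at) if ok else None


# One physical camera, two applications with conflicting view requirements.
history = [
    View(pan=0, tilt=0, zoom=1, height=720, captured_at=10.0),
    View(pan=90, tilt=-10, zoom=4, height=1080, captured_at=12.0),
    View(pan=2, tilt=1, zoom=1, height=720, captured_at=15.0),
]
traffic = VirtualCamera(Requirement(0, 0, min_zoom=1.0, min_height=720), history)
plates = VirtualCamera(Requirement(90, -10, min_zoom=4.0, min_height=1080), history)
print(traffic.latest().captured_at, plates.latest().captured_at)
```

Each application sees only its own virtual camera, so neither can steer the hardware away from the other.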

Read the full IPSN 2017 paper >

Watch the video >

Camera view showing tagged areas

In-vehicle video analytics

Camera view from within car

We developed ParkMaster, which uses drivers’ smartphones, mounted on the car’s dashboard, to sample the presence of cars at roadside parking spots from the driver’s vehicle itself. It includes two main components: a smartphone app, which runs on the driver’s smartphone (edge) and performs real-time visual analytics, and an Azure cloud service, which maintains a real-time database summarizing the number of available parking spaces and provides client support for location services. While the user is driving with the smartphone placed on the windshield, ParkMaster captures video with the phone’s camera and, processing frames locally in real time, estimates the availability of roadside parking spaces. On-road experiments in two major cities in the US and Europe (Los Angeles and Paris) and in a small European village show that ParkMaster achieves an overall end-to-end accuracy close to 90% with negligible overhead (less than one megabyte per hour) in mobile cellular data consumption.

Video analytics for wireless cameras

We built Vigil, a video surveillance system that leverages wireless cameras with edge computing capability to support real-time scene surveillance in enterprise campuses, retail stores, and across smart cities. Vigil intelligently partitions video processing between intelligent edges and the cloud to save wireless capacity, which can then be used to support additional cameras, thereby increasing the spatial coverage of the surveilled region. It combines video analytics with novel video frame prioritization and video stream scheduling algorithms to optimize bandwidth utilization. We have tested Vigil across three sites using both White-Space and Wi-Fi networks. Depending on the level of activity in the scene, experimental results show that Vigil increases geographical coverage by 5 to 200 times compared with some state-of-the-art systems that simply upload video streams. For a fixed region of coverage and bandwidth, Vigil outperforms Wi-Fi’s default equal-throughput allocation strategy by delivering up to 25% more objects relevant to a user’s query.
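As a rough illustration of why prioritizing frames can beat equal throughput, consider the sketch below. It is not Vigil's algorithm: the frames, object counts, sizes, and bandwidth budget are invented, and the priority rule (relevant objects per kilobyte, chosen greedily) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    camera: str
    objects: int   # query-relevant objects counted at the edge
    size_kb: int


def upload_by_priority(frames, budget_kb):
    """Send the frames with the most relevant objects per kilobyte first."""
    sent, used = [], 0
    ranked = sorted(frames, key=lambda f: f.objects / f.size_kb, reverse=True)
    for f in ranked:
        if f.objects > 0 and used + f.size_kb <= budget_kb:
            sent.append(f)
            used += f.size_kb
    return sent


def upload_equal_share(frames, budget_kb):
    """Baseline: split the budget evenly per camera, send in arrival order."""
    cameras = sorted({f.camera for f in frames})
    share = budget_kb / len(cameras)
    sent = []
    for cam in cameras:
        used = 0
        for f in frames:
            if f.camera == cam and used + f.size_kb <= share:
                sent.append(f)
                used += f.size_kb
    return sent


# A busy gate camera and a quiet lobby camera sharing one wireless link.
frames = [
    Frame("lobby", 0, 100), Frame("lobby", 1, 100), Frame("gate", 4, 100),
    Frame("gate", 3, 100), Frame("gate", 5, 100), Frame("lobby", 0, 100),
]
prioritized = upload_by_priority(frames, budget_kb=400)
equal = upload_equal_share(frames, budget_kb=400)
print(sum(f.objects for f in prioritized), sum(f.objects for f in equal))
```

With the same 400 KB budget, the prioritized upload skips empty lobby frames and delivers more relevant objects than the equal split.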

Graphic: Edge computing node detection

Graphic: set of 5 camera views






  • Haoyu Zhang, Intern (2015), Princeton University
  • Shubham Jain, Intern (2015, 2016), Rutgers University
  • Yao Lu, Intern (2015, 2016), University of Washington
  • Michael Hung, Intern (2016), University of Southern California
  • Giulio Grassi, Intern (2015, 2016), Sorbonne Université / LIP6
  • Kevin Hsieh, Intern (2017), Carnegie Mellon University
  • Enrique Saurez Apuy, Intern (2017), Georgia Tech

Public talks

Keynotes, seminars, conferences

Keynote talks

  • IEEE 14th International Conference on Mobile Ad Hoc and Sensor Systems (October 23rd, 2017)
    Victor Bahl, “Live Video Analytics”
  • 3rd IEEE International Conference on Collaboration and Internet Computing (October 15th, 2017)
    Victor Bahl, “Democratizing Video Analytics”
  • Emerging Topics in Computing Symposium, University of Buffalo Computer Systems Engineering Dept. 50th Anniversary (September 29th, 2017)
    Victor Bahl, “Live Video Analytics the Perfect Edge Computing Application”
  • 35th IEEE International Performance Computing and Communications Conference (December 10th, 2016)
    Victor Bahl, “Distributed Video Analytics”

University department seminars

  • ETH Zurich (Aug 2017)
    Ganesh Ananthanarayanan, “Taming the Video Star! Real-time Video Analytics at Scale”
  • University of California at Berkeley (May 2017)
    Ganesh Ananthanarayanan, “Taming the Video Star! Real-time Video Analytics at Scale”
  • Washington University in St. Louis (April 28, 2017)
    Victor Bahl, “Live Video Analytics the Perfect Edge Computing Application”
  • Cornell University (April 2017)
    Ganesh Ananthanarayanan, “Taming the Video Star! Real-time Video Analytics at Scale”

Miscellaneous invited talks

  • Ganesh Ananthanarayanan, “Video Analytics for Vision Zero”, Microsoft Office of the CTO Summit (February 2017)
  • Victor Bahl, “Distributed Video Analytics”, The First IEEE/ACM Symposium on Edge Computing, Washington DC, USA (October 28th 2016)
  • Peter Bodik, “Cameras everywhere! Video Analytics at Scale”, Microsoft Research Faculty Summit, Redmond, WA (July 13th, 2016)

Conference talks


  • Haoyu Zhang, “Live Video Analytics at Scale with Approximation and Delay-Tolerance”, USENIX NSDI, Boston, MA, 2017.
  • Aakanksha Chowdhery, “The Design and Implementation of a Wireless Video Surveillance System”, ACM MobiCom, Paris, France, 2015.


In the news