Microsoft Research Blog

Researchers develop new visual intelligence techniques to boost smart home security

June 16, 2017 | By Microsoft blog editor

By Kangping Liu, Senior Research Program Manager, Microsoft Research Asia

Imagine that when you leave your house or apartment, a smart home security system automatically “looks after” your home, sending you real-time notices about events there or providing a short video of all the events of interest that happened while you were gone. With such a system, parents could keep track of their older kids’ activities and get real-time alarms about potential dangers facing their children or elderly family members. It sounds great, but these functions place high demands on smart home systems because they require both accurate event understanding and real-time processing.

In response to Microsoft Research Asia’s “Big Video Data Analytics” collaborative research call for proposals, Dr. Weiyao Lin, an associate professor in the electronic engineering department of Shanghai Jiao Tong University in China, has tackled these challenges with deep learning techniques. Collaborating with Dr. Tao Mei, a senior researcher at Microsoft Research Asia, Lin and his students have designed a system that detects abnormal events in real time and adaptively creates “online” summarization videos for user-selected events of interest. The system also supports remote interaction and control through a smartphone.

Smart Home Security

Lin proposed a real-time event detection method based on deep learning, which integrates visual object detection, tracking, and event parsing into a single convolutional network-based framework. The method can reliably detect abnormal events, such as a person falling down, in different scenarios and in real time. Lin also developed an event-based video summarization method. Unlike most existing summarization approaches, this method performs online summarization: the summarization step is embedded in the video capturing process. This largely avoids the extra computation that traditional offline summarization methods require, because the recorded video does not have to be processed a second time. Lin’s summarization method also introduces an event-based scheme that automatically identifies event types and adaptively creates different summarization videos according to the events of interest a user selects.
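The idea of online, event-based summarization can be illustrated with a minimal Python sketch. This is not Lin’s implementation: the class name `OnlineSummarizer`, the function `summarize_stream`, and the `toy_detector` stand-in are all hypothetical, and the real system uses a convolutional network where this sketch uses a trivial lookup. The sketch only shows the control flow, in which each frame is classified as it is captured and kept only if its event type is one the user selected.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


def toy_detector(frame: str) -> Optional[str]:
    """Hypothetical stand-in for a learned event detector.

    Frames here are plain strings; "idle" means no event was detected.
    """
    return None if frame == "idle" else frame


@dataclass
class OnlineSummarizer:
    """Builds a summary while frames arrive, with no second pass over the video."""

    detect_event: Callable[[str], Optional[str]]
    events_of_interest: set
    summary: list = field(default_factory=list)

    def on_frame(self, index: int, frame: str) -> Optional[str]:
        # Classify the frame at capture time.
        event = self.detect_event(frame)
        # Keep the frame only if the user selected this event type.
        if event in self.events_of_interest:
            self.summary.append((index, event))
        return event


def summarize_stream(
    frames: Iterable[str],
    detect_event: Callable[[str], Optional[str]],
    events_of_interest: Iterable[str],
) -> list:
    """Feed a stream of frames through the summarizer and return the kept frames."""
    summarizer = OnlineSummarizer(detect_event, set(events_of_interest))
    for index, frame in enumerate(frames):
        summarizer.on_frame(index, frame)
    return summarizer.summary


frames = ["idle", "fall", "fall", "idle", "visitor"]
fall_summary = summarize_stream(frames, toy_detector, {"fall"})
```

Changing the set of selected events changes the resulting summary without touching the detector, which mirrors the adaptive, user-driven behavior described above.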

“This takes us one step further to realizing a fully automatic and highly intelligent home security system,” said Lin.

Beyond home security, this system could also be applied in other locations, including shopping malls, schools, and streets. For example, it could be deployed in classrooms to create “personalized” summary videos of a pupil’s daily school activities. It could also be used to automatically gather statistics about traffic violations at an intersection (e.g., the frequency of red-light violations) or teaching activities in a class (e.g., the frequency of Q&A activities).

Lin’s work was partially inspired by the video analysis research conducted by Mei’s team, as well as by the MSR Video to Text (MSR-VTT) dataset, a new large-scale benchmark for video understanding. The dataset comprises 10,000 web video clips totaling 41.2 hours, with 200,000 clip-sentence pairs covering diverse visual content and categories. Working with Mei, Lin constructed the initial learning models for event detection using the MSR-VTT dataset.

Among other publications from this collaborative research, Lin and Mei co-authored “A diffusion and clustering-based approach for finding coherent motions and understanding crowd scenes,” published in IEEE Transactions on Image Processing, vol. 25, 2016.

“Dr. Lin’s work is unique in that it can create a personalized event summary from a live video stream in real-time,” said Mei. “This is very useful for a wide variety of public and home security applications.” With video data growing at an unprecedented rate, intelligent video analysis has become an important emerging area of study within Microsoft Research. Mei hopes to collaborate further with Lin’s team as both continue their research in the video space.

This past May, Lin was invited to present this project at the Microsoft Research Asia Symposium on Collaborative Research. The live demo was well received by symposium attendees and Microsoft researchers, and it won the “Best Demo of The Year” award.

Best Demo of The Year
Left to right: Prof. Weiyao Lin, Shanghai Jiao Tong University; Dr. Tim Pan, senior director, Microsoft Research Asia
