Internet Media

Established: March 27, 2000

The Internet Media group aims to build seamless yet efficient media applications and systems through breakthroughs in fundamental theory and innovations in algorithm and system technology. We address the problems of media content sensing, processing, analysis, delivery, and format adaptation, as well as the generic scalability issues of media computing systems in terms of bandwidth, processing capability, screen resolution, memory, and battery power.

Areas of Focus:

Intelligent Video Analytics

Video is the biggest form of big data and contains an enormous amount of information. We are leveraging computer vision and deep learning to develop a cloud-based intelligence engine that turns raw video data into insights that facilitate a variety of applications and services. Target application scenarios include smart home surveillance, business (retail store, office) intelligence, public security, and video storytelling and sharing. We have taken a human-centric approach, focusing significant effort on understanding humans, human attributes, and human behaviors. Our research has contributed to a number of video APIs offered in Microsoft Cognitive Services.


Project Titanium

Project Titanium aims to bring new computing experiences through enriched cloud-client computing. While data and programs can be provided as services from the cloud, the screen, referring to the entire collection of data involved in the user interface, constitutes the missing third dimension. Titanium will address the problems of adaptive screen composition, representation, and processing, following the roadmap of Titanium Screen, Titanium Remote, Titanium Live, and Titanium Cloud. As the name "Titanium" suggests, it will provide a lightweight yet efficient solution toward ultimate computing experiences in the cloud-plus-service era.


Project Mira

Project Mira aims to enable multimedia representation and processing oriented toward perceptual quality rather than pixel-wise fidelity, through a joint effort of signal processing, computer vision, and machine learning. In particular, it seeks to build systems that not only incorporate newly developed vision and learning technologies into compression but also inspire new vision technologies by looking at the problem from the viewpoint of signal processing. By bridging vision and signal processing, this project is expected to offer a fresh frame of mind for multimedia representation and processing.


We envision that, with the development of sensing, networking, and storage technologies, the Internet will rapidly expand and grow into a universal network containing physical and virtual objects. This project will explore theoretical and engineering problems in such a network: at the edge, it considers massive data acquisition in wireless sensor networks and mobile networks; in the center, it addresses the interconnection between networks and data communications in their entire life cycle. This project will leverage and develop technologies in network coding, distributed compressive sensing, network optimization, and network protocols.
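Among the technologies listed above, network coding admits a very compact illustration: the classic butterfly topology, where a single XOR-coded packet on a bottleneck link lets two sinks each recover both source packets. The sketch below is a minimal, self-contained demonstration of that idea; the packet payloads are hypothetical, and real network-coding systems (including random linear coding over larger fields) are far more general.

```python
# Butterfly-network intuition: two sources emit packets a and b toward two
# sinks through one shared bottleneck link. Plain routing must pick either
# a or b for the bottleneck, starving one sink; coding sends a XOR b so
# each sink can combine it with the packet it already received directly.
a = b"\x01\x02\x03\x04"   # packet from source A (hypothetical payload)
b_ = b"\x10\x20\x30\x40"  # packet from source B (hypothetical payload)

coded = bytes(x ^ y for x, y in zip(a, b_))  # the bottleneck carries a XOR b

# Sink 1 already holds a; XORing with the coded packet recovers b.
b_rec = bytes(x ^ y for x, y in zip(a, coded))
# Sink 2 already holds b; XORing with the coded packet recovers a.
a_rec = bytes(x ^ y for x, y in zip(b_, coded))

print(b_rec == b_, a_rec == a)  # True True
```

The same principle, generalized to random linear combinations over a finite field, underlies the multicast-capacity results that motivate network coding research.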


LiquidSilver

LiquidSilver is a media framework that works seamlessly across platforms, devices, and even applications. In LiquidSilver, media is represented in a scalable, portable, adaptive, and self-contained way. Unlike traditional closed media systems, LiquidSilver consists of a set of media components and tools that facilitate capturing, editing, coding, delivery, and consumption throughout the media life cycle. This project advances multimedia technologies in broad Media 2.0 applications and helps users create and consume media on demand.

Fundamental Theory for Media Representation

Despite considerable evolution in digital media representation, the underlying theory has remained largely the same, leaving little room for further improvement. This project studies new representations of digital media based on recent progress in signal processing. We will investigate fundamental signal-processing theory for media representation that leverages local properties of the content; develop cutting-edge technologies to represent media efficiently for future standards; and build media representation systems that advance the state of the art.

Selected Publications

• Feng Wu, Honghui Sun, Guobin Shen, Shipeng Li, Ya-Qin Zhang, Bruce Lin, Ming-Chieh Lee, "SMART: an efficient, scalable and robust streaming video system", EURASIP Journal on Applied Signal Processing, vol. 2, pp. 192-206, 2004

Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks

February 2016

Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based…
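To make the abstract above concrete, the forward pass of one LSTM layer over a skeleton sequence can be sketched in plain NumPy. This is only an illustrative single-layer forward pass with made-up dimensions (15 joints, hidden size 8, 10 action classes) and random weights; the actual model in the paper is a deep, end-to-end trained network with co-occurrence regularization, which this sketch does not reproduce.

```python
import numpy as np

def lstm_forward(x_seq, W, U, b, hidden=8):
    """Run one LSTM layer over a sequence of skeleton frames.

    x_seq: (T, D) array, one D-dim joint-coordinate vector per frame.
    W: (4*hidden, D) input weights, U: (4*hidden, hidden) recurrent
    weights, b: (4*hidden,) biases, packed in i, f, o, g order.
    """
    h = np.zeros(hidden)  # hidden state
    c = np.zeros(hidden)  # cell state
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    for t in range(x_seq.shape[0]):
        z = W @ x_seq[t] + U @ h + b
        i = sigmoid(z[:hidden])            # input gate
        f = sigmoid(z[hidden:2 * hidden])  # forget gate
        o = sigmoid(z[2 * hidden:3 * hidden])  # output gate
        g = np.tanh(z[3 * hidden:])        # candidate cell update
        c = f * c + i * g                  # long-term memory update
        h = o * np.tanh(c)                 # short-term (hidden) output
    return h  # final hidden state summarizes the whole sequence

rng = np.random.default_rng(0)
T, J, hidden, n_classes = 30, 15, 8, 10   # illustrative sizes
x = rng.normal(size=(T, J * 3))           # 15 joints x (x, y, z) per frame
W = rng.normal(scale=0.1, size=(4 * hidden, J * 3))
U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

h_final = lstm_forward(x, W, U, b, hidden)
W_cls = rng.normal(scale=0.1, size=(n_classes, hidden))
logits = W_cls @ h_final                  # one score per action class
print(logits.shape)
```

Classifying from the final hidden state, as done here, is one common readout; the key property the abstract relies on is that the gated recurrence lets the network retain information across long frame spans.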
