This seminar consists of four mini talks to be presented at the IEEE International Workshop on Multimedia Signal Processing (MMSP), Hangzhou, China, October 17-19, 2011.
Mini Talk 1: Crowdsourcing Region of Interest Determination for Videos, by Flavio Ribeiro, Dinei Florencio
Abstract: The ability to identify and track visually interesting regions has many practical applications – for example, in image and video compression, visual marketing and foveal machine vision. Due to challenges in modeling the peculiarities of human physiological and psychological responses, automatic detection of fixation points is an open problem. Indeed, no objective methods are currently capable of fully modeling the human perception of regions of interest (ROIs). Thus, research often relies on user studies with eye tracking systems. In this paper we propose a cost-effective and convenient alternative, obtained by having internet workers annotate videos with ROI coordinates. The workers use an interactive video player with a simulated mouse-driven fovea, which models the fall-off in resolution of the human visual system. Since this approach is not supervised, we implement methods for identifying inaccurate or malicious results. Using this proposal, one can collect ROI data in an automated fashion, and at a much lower cost than laboratory studies.
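The abstract does not specify how the simulated fovea is implemented; one simple way to model the resolution fall-off is to blend a sharp frame with a low-pass-filtered copy, weighted by distance from the mouse-driven fixation point. A minimal illustrative sketch (the fall-off curve, the `half_res_radius` parameter, and the box blur are assumptions, not the paper's model):

```python
import numpy as np

def blur(img, k=4):
    """Crude low-pass filter: mean of shifted copies (box blur)."""
    acc = np.zeros_like(img, dtype=float)
    for dy in range(-k, k + 1):
        for dx in range(-k, k + 1):
            acc += np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return acc / (2 * k + 1) ** 2

def foveal_weight(h, w, cx, cy, half_res_radius=80.0):
    """Per-pixel blend weight in (0, 1]: 1 at the fixation point (cx, cy),
    halving every `half_res_radius` pixels of eccentricity (illustrative)."""
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - cx, ys - cy)
    return 0.5 ** (dist / half_res_radius)

def render_fovea(frame, cx, cy):
    """Blend the sharp frame with its blurred copy using the foveal weight."""
    wmap = foveal_weight(frame.shape[0], frame.shape[1], cx, cy)
    if frame.ndim == 3:
        wmap = wmap[..., None]
    return wmap * frame + (1.0 - wmap) * blur(frame)
```

In an interactive player, `(cx, cy)` would track the mouse each frame, and the recorded trajectory would serve as the worker's ROI annotation.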
Mini Talk 2: Interpolation of Combined Head and Room Impulse Response for Audio Spatialization, by Sanjeev Mehrotra, Wei-ge Chen, Zhengyou Zhang
Abstract: Audio spatialization is becoming an important part of creating the realistic experiences needed for immersive video conferencing and gaming. Using a combined head and room impulse response (CHRIR) has recently been proposed as an alternative to using separate head-related transfer functions (HRTFs) and room impulse responses (RIRs). Accurate measurements of the CHRIR at various source and listener locations and orientations are needed to perform good-quality audio spatialization. However, it is infeasible to accurately measure or model the CHRIR for all possible locations and orientations; therefore, low-complexity and accurate interpolation techniques are needed to perform audio spatialization in real time. In this talk, we present a novel frequency-domain interpolation technique which naturally interpolates the interaural level difference (ILD) and interaural time difference (ITD) for each frequency component in the spectrum. The proposed technique allows for accurate, low-complexity interpolation of the CHRIR, and it yields a low-complexity audio spatialization technique that can be used with both headphones and loudspeakers.
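The talk's exact method is not given in the abstract, but the idea of interpolating ILD and ITD per frequency component can be illustrated by interpolating the magnitude and unwrapped phase of two measured responses separately, rather than interpolating the complex values directly (which would distort both level and delay). A minimal sketch under that assumption:

```python
import numpy as np

def interp_response(H1, H2, alpha):
    """Interpolate two complex frequency responses, 0 <= alpha <= 1.
    Magnitude interpolation tracks level differences (ILD); unwrapped-phase
    interpolation tracks time differences (ITD) per frequency bin."""
    mag = (1 - alpha) * np.abs(H1) + alpha * np.abs(H2)
    phase = (1 - alpha) * np.unwrap(np.angle(H1)) + alpha * np.unwrap(np.angle(H2))
    return mag * np.exp(1j * phase)
```

For two pure delays, this interpolation produces an intermediate delay, which is exactly the behavior one wants when moving a virtual source between two measured positions.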
Mini Talk 3: Low-Complexity, Near-Lossless Coding of Depth Maps from Kinect-Like Depth Cameras, by Sanjeev Mehrotra, Zhengyou Zhang, Qin Cai, Cha Zhang, Philip A. Chou
Abstract: Depth cameras are rapidly gaining market interest as depth plus RGB is being used for a variety of applications, including foreground/background segmentation, face tracking, activity detection, and free-viewpoint video rendering. In this talk, we present a novel low-complexity, near-lossless codec for coding depth maps. The codec requires no buffering of video frames, is table-less, can encode or decode a frame in close to 5 ms with little code optimization, and achieves compression ratios between 7:1 and 16:1 for near-lossless coding of the 16-bit depth maps generated by the Kinect camera.
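"Near-lossless" means the per-pixel reconstruction error is bounded by a user-chosen tolerance. The paper's codec is not described in the abstract; the sketch below only illustrates the general idea with per-row prediction and residual quantization (step q = 2*delta + 1 guarantees error <= delta), using zlib as a stand-in entropy coder. The actual codec is table-less and far faster than this:

```python
import zlib
import numpy as np

def encode_depth(depth, delta=2):
    """Near-lossless encode of a 2-D depth map: horizontal prediction,
    residual quantization bounding error by `delta`, then zlib (stand-in)."""
    q = 2 * delta + 1
    h, w = depth.shape
    res = np.empty((h, w), dtype=np.int32)
    for y in range(h):
        pred = 0  # decoder tracks the same reconstructed predictor
        for x in range(w):
            r = int(depth[y, x]) - pred
            k = int(np.round(r / q))   # q odd, r integer: |r - k*q| <= delta
            res[y, x] = k
            pred += k * q
    return zlib.compress(res.tobytes())

def decode_depth(payload, shape, delta=2):
    """Reconstruction is the running sum of dequantized residuals per row."""
    q = 2 * delta + 1
    res = np.frombuffer(zlib.decompress(payload), dtype=np.int32)
    return np.cumsum(res.reshape(shape).astype(np.int64), axis=1) * q
```

Because depth maps are piecewise smooth, the quantized residuals are highly repetitive, which is what makes high compression ratios achievable at such low complexity.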
Mini Talk 4: ViewMark: An Interactive Videoconferencing System for Mobile Devices, by Shu Shi, Zhengyou Zhang
Abstract: ViewMark, a server-client based interactive mobile videoconferencing system, is proposed in this paper to enhance the remote meeting experience for mobile users. Compared with state-of-the-art mobile videoconferencing technology, ViewMark is novel in allowing a mobile user to interactively change the viewpoint of the remote video, create viewmarks, and listen with spatial audio. In addition, ViewMark also streams the screen of the presentation slides to mobile devices. In this paper, we introduce the system design of ViewMark in detail, compare the devices that can be used to implement interactive videoconferencing, and demonstrate the prototype system we have built on the Windows Mobile platform.