Virtual view synthesis from a camera array is an essential element of three-dimensional (3D) video broadcasting and conferencing. In this paper, we propose a scheme based on a hybrid camera array consisting of four regular video cameras and one time-of-flight depth camera. During rendering, we use the depth image from the depth camera as initialization and compute view-dependent scene geometry by constrained plane sweeping using the regular cameras. View-dependent texture mapping is then used to render the scene at the desired virtual viewpoint. Experimental results show that adding the time-of-flight depth camera greatly improves rendering quality compared with an array of regular cameras of similar sparsity. For 3D video broadcasting and conferencing, our hybrid camera system shows great potential for reducing the amount of data to be compressed and streamed while maintaining high rendering quality.
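
To illustrate what a depth-constrained plane sweep can look like, the following is a minimal sketch and not the implementation used in this work. It assumes a simplified rectified setup in which all cameras lie on one horizontal baseline, images are reduced to a single scanline, and photo-consistency is measured as the intensity variance across cameras. The function names (`project`, `sample`, `photo_cost`, `constrained_sweep`) and the parameters `radius` and `steps` are hypothetical and chosen only for this example.

```python
def project(x, depth, baseline, focal):
    """Column at which virtual-view column x, placed at `depth`, appears in a
    camera displaced horizontally by `baseline` (assumed rectified geometry)."""
    return x + focal * baseline / depth


def sample(row, u):
    """Nearest-neighbour intensity lookup, clamped to the scanline bounds."""
    i = min(max(int(round(u)), 0), len(row) - 1)
    return row[i]


def photo_cost(x, depth, rows, baselines, focal):
    """Variance of the intensities the cameras observe for this depth
    hypothesis; a low value means the cameras agree (photo-consistent)."""
    vals = [sample(r, project(x, depth, b, focal))
            for r, b in zip(rows, baselines)]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def constrained_sweep(x, init_depth, rows, baselines, focal, radius, steps):
    """Refine a depth-camera estimate by testing only depth planes within
    +/- radius of it, instead of sweeping the full depth range."""
    best_depth = init_depth
    best_cost = photo_cost(x, init_depth, rows, baselines, focal)
    for i in range(steps):
        d = init_depth - radius + 2.0 * radius * i / max(steps - 1, 1)
        if d <= 0:
            continue  # skip planes behind the virtual camera
        c = photo_cost(x, d, rows, baselines, focal)
        if c < best_cost:
            best_depth, best_cost = d, c
    return best_depth
```

Restricting the search to a band around the initial depth is what makes the sweep "constrained" in this sketch: fewer hypotheses are evaluated per pixel, and matches far from the measured depth are excluded. A full system would additionally handle general camera poses, occlusion, and blending of the selected camera colours at the refined depth.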