We present a technique for estimating intrinsic images from image+depth video, such as that acquired from a Kinect camera. Intrinsic image decomposition in this context has importance in applications like object modeling, in which surface colors need to be recovered without illumination effects. The proposed method is based on two new types of decomposition constraints derived from the multiple viewpoints and reconstructed 3D scene geometry of the video data. The first type provides shading constraints that enforce relationships among the shading components of different surface points according to their similarity in surface orientation. The second type imposes temporal constraints that favor consistency in the intrinsic color of a surface point seen in different video frames, which improves decomposition in cases of view-dependent non-Lambertian reflections. Local and non-local variants of the two constraints are employed in a manner complementary to local and non-local reflectance constraints used in previous works. Together they are formulated within a linear system that allows for efficient optimization. Experimental results demonstrate that each of the new constraints appreciably elevates the quality of intrinsic image estimation, and that they jointly yield decompositions that compare favorably to current techniques.