RGB-D Dataset 7-Scenes

Established: January 1, 2013

The 7-Scenes dataset is a collection of tracked RGB-D camera frames. It can be used to evaluate methods for applications such as dense tracking and mapping, and relocalization.


All scenes were recorded with a handheld Kinect RGB-D camera at 640×480 resolution. We use an implementation of the KinectFusion system to obtain the ‘ground truth’ camera tracks and a dense 3D model. Several sequences were recorded per scene by different users and split into distinct training and testing sequence sets. Details on how this data can be used, for example to evaluate relocalization methods, can be found in our papers listed under publications.

7-Scenes Overview

Data Description

For each scene, we provide one zip file which contains several sequences. Each sequence is a continuous stream of tracked RGB-D camera frames. Tracking has been performed using ICP and frame-to-model alignment with respect to a dense reconstruction represented by a truncated signed distance volume.

Each sequence (seq-XX.zip) consists of 500-1000 frames. Each frame consists of three files:

  • Color: frame-XXXXXX.color.png (RGB, 24-bit, PNG).
  • Depth: frame-XXXXXX.depth.png (depth in millimeters, 16-bit, PNG, invalid depth is set to 65535).
  • Pose: frame-XXXXXX.pose.txt (camera-to-world, 4×4 matrix in homogeneous coordinates).
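
The per-frame layout above could be handled along the following lines. This is a minimal sketch, not an official loader: the helper names (frame_files, parse_pose, camera_to_world) are hypothetical, the six-digit zero-padded index is inferred from the XXXXXX pattern, and the pose file is assumed to hold 16 whitespace-separated numbers in row-major order.

```python
from pathlib import Path


def frame_files(seq_dir, index):
    """Return the color, depth, and pose paths for one frame index.

    Assumes the XXXXXX placeholder is a zero-padded six-digit index.
    """
    stem = f"frame-{index:06d}"
    seq_dir = Path(seq_dir)
    return (
        seq_dir / f"{stem}.color.png",
        seq_dir / f"{stem}.depth.png",
        seq_dir / f"{stem}.pose.txt",
    )


def parse_pose(text):
    """Parse pose-file contents into a 4x4 nested list of floats.

    Assumes 16 whitespace-separated values in row-major order.
    """
    values = [float(tok) for tok in text.split()]
    if len(values) != 16:
        raise ValueError(f"expected 16 values, got {len(values)}")
    return [values[r * 4:(r + 1) * 4] for r in range(4)]


def camera_to_world(pose, point):
    """Apply the camera-to-world matrix to a 3D point in camera coordinates."""
    x, y, z = point
    # Only the top three rows are needed; the last row is (0, 0, 0, 1).
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose[:3])
```

A pose would then be read with something like parse_pose(pose_path.read_text()), where pose_path is the third entry returned by frame_files. Decoding the PNG files needs an image library and is left out of the sketch.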

For each scene, we further provide:

  • For Evaluation: TrainSplit.txt / TestSplit.txt (splits used in the papers listed under publications).
  • TSDF Volume: The dense reconstruction used for frame-to-model alignment. Volumes are stored in MetaImage format (a two-file format with a text header and binary data). The binary data stores the signed distance for each voxel as a 16-bit short. All volumes are of size 512×512×512 with varying element spacings and origin offsets. The spacings and offsets, in millimeters, are provided in the header file.
  • A screenshot of the raycasted dense reconstruction.
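
As an illustration of how the TSDF header might be used, the sketch below parses a MetaImage text header and maps a voxel index to a position in millimeters. It assumes the header uses the common MetaImage keys ElementSpacing and Offset (some MetaImage files use Position or Origin instead) and that a voxel position is offset + index × spacing. The function names are hypothetical, and the scaling of the stored 16-bit distance values is not specified in this description, so it is not handled here.

```python
def parse_mhd_header(text):
    """Parse 'Key = Value' lines of a MetaImage text header into a dict."""
    header = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        header[key.strip()] = value.strip()
    return header


def voxel_to_world_mm(header, index):
    """Return the position in millimeters of voxel (i, j, k).

    Assumes the keys 'ElementSpacing' and 'Offset' are present and that
    position = offset + index * spacing along each axis.
    """
    spacing = [float(v) for v in header["ElementSpacing"].split()]
    offset = [float(v) for v in header["Offset"].split()]
    return tuple(o + i * s for o, i, s in zip(offset, index, spacing))
```

With the spacing and offset read from a scene's header file, voxel_to_world_mm(header, (i, j, k)) gives the voxel location in the millimeter units stated above.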

Please note: The RGB and depth cameras have not been calibrated, and we can’t provide calibration parameters at the moment. The recorded frames correspond to the raw, uncalibrated camera images. In the KinectFusion pipeline we used the following default intrinsics for the depth camera: principal point (320, 240), focal length (585, 585).
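
To make the role of these default intrinsics concrete, here is a minimal sketch of standard pinhole back-projection of one depth pixel into camera coordinates. It uses the principal point, focal length, millimeter depth unit, and invalid-depth value of 65535 stated in this description. The function name and the conversion to meters are choices made for illustration, and because the cameras are uncalibrated the result is only approximate.

```python
# Default depth-camera intrinsics quoted in the dataset description.
FX, FY = 585.0, 585.0   # focal length in pixels
CX, CY = 320.0, 240.0   # principal point in pixels
INVALID_DEPTH = 65535   # marker for invalid depth in the depth PNGs


def backproject(u, v, depth_mm):
    """Back-project pixel (u, v) with depth in millimeters.

    Returns (x, y, z) in meters in the camera frame, or None if the
    depth value is the invalid marker.
    """
    if depth_mm == INVALID_DEPTH:
        return None
    z = depth_mm / 1000.0          # millimeters to meters (illustrative choice)
    x = (u - CX) * z / FX
    y = (v - CY) * z / FY
    return (x, y, z)
```

For example, the pixel at the principal point with a depth of 1000 mm maps to a point one meter straight ahead of the camera. Applying a frame's camera-to-world pose to such points requires the point units to match the units of the pose translation, which this description does not state.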

License Agreement

The data is provided for non-commercial use only. By downloading the data, you accept the license agreement which can be downloaded here.


If you report results based on the 7-Scenes dataset, please cite at least one of the papers listed under publications. You may choose the paper that is most relevant to your own publication.