Abstract

This paper considers person re-identification (re-id) in videos. We introduce a new video re-id dataset, named Motion Analysis and Re-identification Set (MARS), a video extension of the Market-1501 dataset. To our knowledge, MARS is the largest video re-id dataset to date. Containing 1,261 IDs and around 20,000 tracklets, it provides rich visual information compared to image-based datasets. Meanwhile, MARS reaches a step closer to practice. The tracklets are automatically generated by the Deformable Part Model (DPM) as pedestrian detector and the GMMCP tracker. A number of false detection/tracking results are also included as distractors which would exist predominantly in practical video databases. Extensive evaluation of the state-of-the-art methods including the space-time descriptors and CNN is presented. We show that CNN in classification mode can be trained from scratch using the consecutive bounding boxes of each identity. The learned CNN embedding outperforms other competing methods considerably and has good generalization ability on other video re-id datasets upon fine-tuning.