We present a multi-timescale collaborative tracker for single-object tracking. The tracker simultaneously exploits three types of "forces", namely attraction, repulsion, and support, to take advantage of their complementary strengths. We model the three forces with three components learned from sample sets at different timescales. The long-term descriptive component attracts the target sample, while the medium-term discriminative component repels the target from the background; the two collaborate within the appearance model and benefit each other. The short-term regressive component combines the votes of auxiliary samples to predict the target's position, forming a context-aware motion model. The appearance model and the motion model jointly determine the target state, and the optimal state is estimated with a novel coarse-to-fine search strategy. Extensive experiments on the standard 50-video benchmark confirm the effectiveness of each component and of their collaboration, and show that the tracker outperforms current state-of-the-art methods.
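The scoring collaboration described above can be sketched in minimal form as follows. This is an illustrative assumption of how the three components might be combined, not the paper's actual formulation: the feature representations, the additive fusion, the Gaussian vote weighting, and all function names (`appearance_score`, `motion_score`, `coarse_to_fine_search`) and parameter values are hypothetical.

```python
import numpy as np

def appearance_score(patch_feat, template, classifier_w):
    """Appearance model: the long-term descriptive term attracts the
    candidate toward a target template, while the medium-term
    discriminative term (a linear classifier) repels it from the
    background. Additive fusion here is an illustrative choice."""
    descriptive = -np.linalg.norm(patch_feat - template)  # attraction
    discriminative = float(classifier_w @ patch_feat)     # repulsion
    return descriptive + discriminative

def motion_score(pos, votes):
    """Context-aware motion model: auxiliary samples vote for the target
    position. `votes` is an (N, 3) array of (x, y, weight); a candidate
    is scored by the Gaussian-weighted sum of nearby votes (the kernel
    bandwidth is an assumed value)."""
    dists = np.linalg.norm(votes[:, :2] - pos, axis=1)
    return float(np.sum(votes[:, 2] * np.exp(-dists**2 / 50.0)))

def coarse_to_fine_search(score_fn, center, radius=16, step=8, levels=3):
    """Coarse-to-fine state estimation: evaluate candidates on a coarse
    grid, then refine around the current best with a halved radius and
    step at each level."""
    best_pos, best_score = np.asarray(center, dtype=float), -np.inf
    for _ in range(levels):
        xs = np.arange(best_pos[0] - radius, best_pos[0] + radius + 1, step)
        ys = np.arange(best_pos[1] - radius, best_pos[1] + radius + 1, step)
        for x in xs:
            for y in ys:
                s = score_fn(np.array([x, y]))
                if s > best_score:
                    best_score, best_pos = s, np.array([x, y])
        radius, step = radius // 2, max(step // 2, 1)
    return best_pos, best_score
```

In a full tracker, `score_fn` would evaluate the appearance and motion scores jointly for each candidate state; the sketch only shows the search skeleton over 2-D positions.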