We propose a new incremental motion estimation algorithm for long image sequences. It operates on a sliding window of image triplets; unlike previous approaches, which rely on point matches across three or more views, it also uses points shared by only two views. This is important because two-view matches are more common than matches across more views. The problem is formulated as a series of local bundle adjustments, carried out so that the camera motions estimated over the whole sequence are mutually consistent. Two implementations are described. The first is exact: exploiting the sparse structure of the adjustment network, it embeds the optimization of the 3D structure parameters within the optimization of the camera pose parameters. This optimization embedding considerably reduces the complexity of the minimization. The second is a mathematical procedure that transforms the original problem, involving both 3D structure and pose parameters, into a much smaller one: a minimization over the camera pose parameters alone. This yields an even greater computational gain. Because it makes full use of local image information, our technique is more accurate than previous incremental techniques, and its accuracy is very close to that of global bundle adjustment while being considerably faster. Experiments with both synthetic and real data compare the proposed technique with other techniques and show ours to be clearly superior.
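The complexity reduction from embedding the structure optimization can be made concrete with the standard sparse bundle-adjustment reduction (a sketch only; the notation below is ours, not necessarily the paper's): the Gauss-Newton normal equations are partitioned into pose and structure blocks, and the structure unknowns are eliminated via a Schur complement.

```latex
% Gauss-Newton normal equations of bundle adjustment, partitioned into
% camera pose updates \delta c and 3D structure updates \delta p:
\begin{pmatrix} U & W \\ W^{\top} & V \end{pmatrix}
\begin{pmatrix} \delta c \\ \delta p \end{pmatrix}
=
\begin{pmatrix} \epsilon_c \\ \epsilon_p \end{pmatrix}.
% V is block-diagonal (one small block per 3D point), hence cheap to
% invert; eliminating \delta p leaves a reduced system over poses only:
\left( U - W V^{-1} W^{\top} \right) \delta c
  = \epsilon_c - W V^{-1} \epsilon_p,
\qquad
\delta p = V^{-1} \left( \epsilon_p - W^{\top} \delta c \right).
```

Under these assumptions, each iteration solves a system whose size is governed by the pose parameters alone, with the structure updates recovered by cheap back-substitution; this is consistent with, though not identical to, the paper's second implementation, which reformulates the cost function itself over the pose parameters only.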