We present a method that enables multi-touch interaction on an arbitrary flat surface using a pair of cameras mounted above it. Most existing systems in this domain rely on special touch-sensitive hardware, require cameras mounted behind the display, or are based on infrared sensors in various configurations. The few systems that use ordinary overhead cameras do not detect touch accurately, because it is difficult to estimate the distance between a fingertip and the surface precisely enough to match the behaviour of a truly touch-sensitive surface. This paper describes a novel computer vision algorithm that robustly identifies fingertips and detects touch to within a few millimetres of the surface. The algorithm combines machine learning methods with a geometric finger model to achieve this precision, and can be ‘trained’ to work in different physical settings. We provide a quantitative evaluation of the method and demonstrate its use for gesture-based interaction with ordinary tablet displays, in both single-user and remote collaboration scenarios.