We investigate the problem of learning to rank on a cluster, using Web search data comprising 140,000 queries and approximately 14 million URLs, and LambdaMART, a boosted tree ranking algorithm. To evaluate the loss or gain incurred by the distributed algorithms we consider, we compare against a baseline algorithm that has been carefully engineered to allow training on the full dataset on a single machine. Our contributions are twofold: (1) we implement a method that improves training speed when the training data fits in the main memory of a single machine; (2) we develop a data-distributed training method for the case where the training data exceeds the main memory of a single machine, which scales easily to far larger datasets, i.e., billions of examples. Results of our methods on a real-world Web dataset indicate significant improvements in training speed.