Most learning to rank research has assumed that the utility of presenting different documents to users is independent, producing learned ranking functions that often return redundant results. The few approaches that avoid this repetition have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with a scalable algorithm that also explicitly takes document similarity and ranking context into account. We present theoretical justifications for this approach as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than the previous approaches.