Abstract

Farsite is a scalable, distributed file system that logically functions as a centralized file server but is physically implemented on a set of client desktop computers. Farsite provides high degrees of reliability and availability by storing replicas of files on multiple machines. Replicas are placed to maximize the effective system availability, using a distributed, iterative, randomized placement algorithm. We perform a large-scale simulation of three candidate algorithms using machine availability data collected from over 50,000 desktop computers. We find that algorithmic efficiency and placement efficacy run counter to each other. We fit analytic functions to the improvement rates and provide explanations for the fitted curves. We explore the algorithms’ properties by studying their dynamic behavior. We visualize algorithmic placements and compare them to theoretical worst cases. We quantify the degree of machine failure correlation and develop a formula to approximate its effect.