Current simulation technologies support at most hundreds of thousands of nodes and fall short of the needs of emerging large-scale networked systems, which often involve millions of nodes. We meet this challenge with a distributed simulation engine that runs millions of instances on commodity PC clusters and has been tested with a production P2P protocol. The engine is part of the WiDS toolkit, which takes a holistic approach to the research and development of distributed systems. We also propose a critical optimization, Slow Message Relaxation (SMR), that trades simulation accuracy for performance. Exploiting the fact that distributed protocols are resilient to network fluctuation, SMR executes events in a logical time window much wider than the conventional lookahead scheme allows. We analyze and bound the effect of the resulting distortion on application logic and on other general metrics. Our experiments demonstrate that the simulation engine achieves an order-of-magnitude speedup while producing statistically accurate simulation results.
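
To make the window idea concrete, the following is a minimal single-process sketch, not the WiDS implementation. It assumes a fixed link delay that also serves as the lookahead, a hypothetical ring of four nodes, and one particular relaxation rule: a message whose arrival time falls inside a window that has already been executed is re-stamped to the end of that window, as if the network had delayed it. The function name `simulate` and the event layout `(time, node, hops_left)` are illustrative assumptions.

```python
import heapq

def simulate(initial_events, delay, window):
    """Windowed event simulation with relaxed handling of late messages.

    initial_events: list of (time, node, hops_left) tuples (illustrative layout)
    delay:  fixed link latency; a window no wider than this is the
            conventional lookahead case
    window: width of each logical time window
    Returns (deliveries, relaxed_count, barrier_count).
    """
    pending = list(initial_events)
    heapq.heapify(pending)
    deliveries = []      # (time, node) for every event executed
    relaxed = 0          # messages re-stamped because they arrived "in the past"
    barriers = 0         # synchronization rounds performed
    window_start = 0

    while pending:
        window_end = window_start + window
        outbox = []
        # Execute every event that falls inside the current window.
        while pending and pending[0][0] < window_end:
            t, node, hops = heapq.heappop(pending)
            deliveries.append((t, node))
            if hops > 0:
                # Forward a message to the next node on a hypothetical 4-node ring.
                outbox.append((t + delay, (node + 1) % 4, hops - 1))
        # Barrier: exchange the messages produced during the window.
        for t, node, hops in outbox:
            if t < window_end:
                # Assumed relaxation rule: the receiver has already passed
                # time t, so treat the message as delayed until window_end.
                relaxed += 1
                t = window_end
            heapq.heappush(pending, (t, node, hops))
        barriers += 1
        window_start = window_end

    return deliveries, relaxed, barriers


if __name__ == "__main__":
    events = [(0, 0, 1), (5, 1, 1), (12, 2, 1), (33, 3, 1)]
    for width in (10, 40):
        _, relaxed, barriers = simulate(events, delay=10, window=width)
        print(f"window={width}: barriers={barriers}, relaxed={relaxed}")
```

With this four-event workload, a window equal to the delay needs five barriers and re-stamps no messages, while a window four times wider needs two barriers and re-stamps three messages. The sketch only illustrates the trade-off between synchronization cost and timing distortion; it says nothing about the bounds or speedups reported for the real engine.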