Atrax, a distributed web crawler
This talk describes Atrax, a distributed and very fast web crawler. Running Atrax on a cluster of four DS20E Alpha servers saturates our internet connection. During a recent crawl, we sustained a download rate of about 115 Mbits/sec, or about 50 million web pages per day. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
Marc Najork is a senior member of the research staff at Compaq Computer Corporation’s Systems Research Center. His current research focuses on high-performance web crawling and web characterization. He was a principal contributor to Mercator, the web crawler used by AltaVista. In the past, he has worked on 3D animation, information visualization, algorithm animation, visual programming languages, and tools for web surfing. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1994.
- Marc Najork
- Microsoft Research (Intern)