Atrax, a distributed web crawler
Date
August 22, 2001
Speaker
Marc Najork
Affiliation
Microsoft Research (Intern)
Overview
This talk describes Atrax, a very fast distributed web crawler. Running on a cluster of four DS20E Alpha servers, Atrax saturates our Internet connection. During a recent crawl, it sustained a download rate of about 115 Mbits/sec, or about 50 million web pages per day. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
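As a rough sanity check on the two throughput figures quoted above, the short sketch below works out the average page size and per-second fetch rate they imply. It is only back-of-the-envelope arithmetic: it assumes 1 Mbit = 10^6 bits and that all of the measured bandwidth is page content (no protocol overhead or failed fetches), neither of which is stated in the abstract. The function names are made up for illustration.

```python
# Back-of-the-envelope check of the quoted crawl figures.
# Assumptions (not from the talk): 1 Mbit = 10**6 bits, and all measured
# bandwidth is page content.
MBITS_PER_SEC = 115          # quoted download rate
PAGES_PER_DAY = 50_000_000   # quoted page rate
SECONDS_PER_DAY = 24 * 60 * 60


def implied_bytes_per_page(mbits_per_sec, pages_per_day):
    """Average bytes per page implied by a bandwidth and a daily page count."""
    bytes_per_day = mbits_per_sec * 1_000_000 / 8 * SECONDS_PER_DAY
    return bytes_per_day / pages_per_day


def pages_per_second(pages_per_day):
    """Average number of pages fetched per second."""
    return pages_per_day / SECONDS_PER_DAY


avg = implied_bytes_per_page(MBITS_PER_SEC, PAGES_PER_DAY)
print(f"{avg / 1000:.1f} kB per page")                              # 24.8 kB per page
print(f"{pages_per_second(PAGES_PER_DAY):.0f} pages per second")    # 579 pages per second
```

Under these assumptions the two figures are mutually consistent: they correspond to an average of roughly 25 kB per page, fetched at a little under 600 pages per second across the cluster.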
Speakers
Marc Najork
Marc Najork is a senior member of the research staff at Compaq Computer Corporation’s Systems Research Center. His current research focuses on high-performance web crawling and web characterization. He was a principal contributor to Mercator, the web crawler used by AltaVista. In the past, he has worked on 3D animation, information visualization, algorithm animation, visual programming languages, and tools for web surfing. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1994.