Atrax, a distributed web crawler

Date

August 22, 2001

Speaker

Marc Najork

Affiliation

Microsoft Research (Intern)

Overview

This talk describes Atrax, a distributed and very fast web crawler. Running Atrax on a cluster of four DS20E Alpha servers saturates our internet connection. During a recent crawl, we were able to download about 115 Mbits/sec, or about 50 million web pages per day, over a sustained period of time. Atrax has been used to collect the raw data for numerous web studies performed at Compaq Research.
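The two throughput figures quoted above can be cross-checked with simple arithmetic. The sketch below is only illustrative: it assumes decimal megabits (1 Mbit = 1,000,000 bits) and that the whole 115 Mbits/sec is page payload, neither of which the abstract states. The function names are hypothetical and are not part of Atrax.

```python
# Back-of-the-envelope check of the figures quoted in the overview.
# Assumptions (not stated in the abstract): decimal megabits, and all
# bandwidth counted as page payload (no protocol overhead, no retries).

RATE_MBITS_PER_SEC = 115        # sustained download rate from the abstract
PAGES_PER_DAY = 50_000_000      # pages per day from the abstract
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(rate_mbits_per_sec: int) -> int:
    """Total bytes downloaded in one day at the given sustained rate."""
    bits = rate_mbits_per_sec * 1_000_000 * SECONDS_PER_DAY
    return bits // 8


def implied_bytes_per_page(rate_mbits_per_sec: int, pages_per_day: int) -> float:
    """Average bytes per page implied by the rate and the daily page count."""
    return bytes_per_day(rate_mbits_per_sec) / pages_per_day


def pages_per_second(pages_per_day: int) -> float:
    """Average page fetch rate across the whole cluster."""
    return pages_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"bytes per day:    {bytes_per_day(RATE_MBITS_PER_SEC):,}")
    print(f"bytes per page:   {implied_bytes_per_page(RATE_MBITS_PER_SEC, PAGES_PER_DAY):,.0f}")
    print(f"pages per second: {pages_per_second(PAGES_PER_DAY):.0f}")
```

Under these assumptions the two figures are mutually consistent: they imply roughly 1.24 TB downloaded per day, an average of about 25 KB per page, and close to 580 pages fetched per second across the cluster.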

Speakers

Marc Najork

Marc Najork is a senior member of the research staff at Compaq Computer Corporation’s Systems Research Center. His current research focuses on high-performance web crawling and web characterization. He was a principal contributor to Mercator, the web crawler used by AltaVista. In the past, he has worked on 3D animation, information visualization, algorithm animation, visual programming languages, and tools for web surfing. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1994.
