In this paper, we study search bot traffic in search engine query
logs at large scale. Although bots that generate search traffic
aggressively can be easily detected, a large number of distributed,
low-rate search bots are difficult to identify and are often associated
with malicious attacks. We present SBotMiner, a system for
automatically identifying stealthy, low-rate search bot traffic from
query logs. Instead of detecting individual bots, our approach captures
groups of distributed, coordinated search bots. Using sampled
data from two different months, SBotMiner identifies over 123
million bot-related pageviews, accounting for 3.8% of total traffic.
Our in-depth analysis shows that a large fraction of the identified
bot traffic may be associated with various malicious activities such
as phishing attacks or vulnerability exploits. This finding suggests
that identifying search bot traffic holds great promise for detecting
and stopping attacks early.