Abstract

IP-based blacklisting is an effective way to filter spam
emails. However, building and maintaining a blacklist of
individual IP addresses is difficult, because new
malicious hosts continuously appear and their IP
addresses may change over time. To mitigate this problem,
researchers have proposed replacing individual IP
addresses in the blacklist with IP clusters, e.g., BGP
clusters. In this paper, we closely examine the accuracy of
IP-cluster-based approaches to understand their
effectiveness and fundamental limitations. Based on this
understanding, we propose and implement a new clustering
approach that considers both network origin and DNS
information, and integrate it with SpamAssassin, a
popular spam filtering system widely used today.
Applying our approach to a 7-month email trace
collected at a large university department, we reduce
the false negative rate by 50%, without increasing the
false positive rate, compared with directly applying
various public IP-based blacklists. Furthermore, using
honeypot email accounts and real user accounts, we show
that our approach can capture 30% to 50% of the spam
emails that slip through SpamAssassin today.
