Abstract

Malicious Web activities continue to be a major threat to the safety of online Web users. Despite the many forms of attacks and the diversity of their delivery channels, in the back end they are all orchestrated through malicious Web infrastructures, which enable miscreants to do business with one another and to utilize one another's resources. Identifying the linchpins of these dark infrastructures, and distinguishing the components that are valuable to adversaries from those that are disposable, is critical for gaining the upper hand in the battle against them. In this paper, using nearly 4 million malicious URL paths crawled from different attack channels, we perform a large-scale study of the topological relations among hosts in the malicious Web infrastructure. Our study reveals a set of topologically dedicated malicious hosts that play orchestrating roles in malicious activities: they are well connected to other malicious hosts and do not receive traffic from legitimate sites. Motivated by these distinctive topological features, we develop a graph-based approach that uses a small set of known malicious hosts as seeds to detect dedicated malicious hosts at a large scale. Our method generalizes across different types of seed data and achieves a detection expansion rate of over 12 times with a low false detection rate of 2%. Many of the detected hosts operate as redirectors, in particular Traffic Distribution Systems (TDSes), which are long-lived and receive traffic from new attack campaigns over time. These TDSes play critical roles in managing malicious traffic flows. Detecting and taking down these dedicated malicious hosts can therefore have more impact on the malicious Web infrastructures than targeting short-lived doorways or exploit sites.
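To make the topological intuition concrete, the following is a minimal, hypothetical sketch of seed-based expansion on a host graph. It is not the detection method evaluated in this paper. It assumes that the crawled URL paths have already been reduced to directed host-to-host redirection edges and that a set of known legitimate hosts is available. The function name `find_dedicated_candidates`, the threshold `min_malicious_neighbors`, and the round limit `rounds` are illustrative choices of ours. A host is flagged when it receives no traffic from a known legitimate host and is adjacent to at least the threshold number of already-flagged malicious hosts; newly flagged hosts then act as seeds in the next round.

```python
from collections import defaultdict


def find_dedicated_candidates(edges, seeds, legit, min_malicious_neighbors=2, rounds=3):
    """Hypothetical seed-expansion heuristic (illustrative only).

    edges: iterable of (src, dst) host pairs observed on redirection paths.
    seeds: hosts already known to be malicious.
    legit: hosts known to be legitimate (an assumed input).
    Returns hosts newly flagged as likely dedicated malicious hosts.
    """
    inbound = defaultdict(set)    # dst -> hosts that send traffic to it
    neighbors = defaultdict(set)  # undirected adjacency over all observed hosts
    for src, dst in edges:
        inbound[dst].add(src)
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    malicious = set(seeds)
    for _ in range(rounds):
        new = set()
        for host in neighbors:
            if host in malicious or host in legit:
                continue
            # Assumed "dedicated" criterion: no inbound traffic from a known legitimate site.
            if inbound[host] & legit:
                continue
            # Assumed connectivity criterion: enough links to already-flagged malicious hosts.
            if len(neighbors[host] & malicious) >= min_malicious_neighbors:
                new.add(host)
        if not new:
            break  # no further expansion possible
        malicious |= new
    return malicious - set(seeds)


if __name__ == "__main__":
    edges = [
        ("legit.com", "door.net"),
        ("door.net", "tds.biz"),
        ("door2.net", "tds.biz"),
        ("tds.biz", "exploit.ru"),
        ("tds.biz", "exploit2.ru"),
    ]
    print(find_dedicated_candidates(edges, {"door.net", "exploit.ru"}, {"legit.com"}))
```

On this toy graph the redirector `tds.biz`, which sits between a seed doorway and a seed exploit site, is flagged. If an edge from `legit.com` to `tds.biz` is added, it is no longer flagged, mirroring the "no traffic from legitimate sites" property described above. A real system would need a more robust scoring scheme than this fixed threshold.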