Many web services aim to track clients as a basis for analyzing their behavior and providing personalized services. Despite much debate regarding the collection of client information, few quantitative studies have analyzed the effectiveness of host-tracking and the associated privacy risks.
In this paper, we perform a large-scale study to quantify the amount of information revealed by common host identifiers. We analyze month-long anonymized datasets collected by the Hotmail web-mail service and the Bing search engine, which include millions of hosts across the global IP address space. In this setting, we compare the use of multiple identifiers, including browser information, IP addresses, cookies, and user login IDs.
We further demonstrate the privacy and security implications of host-tracking in two contexts. In the first, we study the causes of cookie churn in web services and show that many returning users can still be tracked even if they clear cookies or use private browsing. In the second, we show that host-tracking can be leveraged to improve security. Specifically, by aggregating information across hosts, we uncover a stealthy malicious attack associated with over 75,000 bot accounts that forward cookies to distributed locations.