Mining user similarity based on location history

Proceedings of the 16th ACM SIGSPATIAL Conference on Advances in Geographic Information Systems


The pervasiveness of location-acquisition technologies (GPS, GSM networks, etc.) enables people to conveniently log their location histories as spatio-temporal data. The increasing availability of large amounts of spatio-temporal data pertaining to individuals’ trajectories has given rise to a variety of geographic information systems, and also brings opportunities and challenges for automatically discovering valuable knowledge from these trajectories. In this paper, we move in this direction and aim to geographically mine the similarity between users based on their location histories. Such user similarity is significant to individuals, communities and businesses because it helps them effectively retrieve highly relevant information. A framework, referred to as hierarchical-graph-based similarity measurement (HGSM), is proposed for geographic information systems to consistently model each individual’s location history and effectively measure the similarity among users. In this framework, we take into account both the sequence property of people’s movement behaviors and the hierarchy property of geographic spaces. We evaluate this framework using GPS data collected by 65 volunteers over a period of 6 months in the real world. In this evaluation, HGSM outperforms related similarity measures, such as the cosine similarity and Pearson similarity measures.
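For readers unfamiliar with the two baseline measures named in the abstract, the following is a minimal Python sketch of cosine and Pearson similarity. It assumes each user is summarized as a vector of visit counts over the same ordered set of locations; that representation, the function names, and the zero-variance fallback are illustrative assumptions, not details taken from the paper, and the sketch does not implement HGSM itself.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: treat an all-zero vector as having no similarity.
        return 0.0
    return dot / (norm_u * norm_v)

def pearson_similarity(u, v):
    """Pearson correlation, computed as the cosine of mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centered_u = [a - mean_u for a in u]
    centered_v = [b - mean_v for b in v]
    # A constant vector has zero variance; this sketch returns 0.0 in that
    # case, although the correlation is strictly undefined.
    return cosine_similarity(centered_u, centered_v)

# Hypothetical visit counts for two users over four shared locations.
user_a = [5, 0, 2, 1]
user_b = [4, 1, 0, 3]
print(cosine_similarity(user_a, user_b))
print(pearson_similarity(user_a, user_b))
```

Both measures compare only how often each location was visited. Neither sees the order of visits or the spatial granularity of a location, which are the two properties the abstract says HGSM takes into account.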

Publication Downloads

GeoLife GPS Trajectories

August 9, 2012

This is a GPS trajectory dataset collected in the (Microsoft Research Asia) GeoLife project by 182 users over a period of more than three years (from April 2007 to August 2012). A GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of which contains latitude, longitude and altitude information. The dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. one point every 1 to 5 seconds or every 5 to 10 meters. The dataset records a broad range of users' outdoor movements, including not only life routines such as going home and going to work but also entertainment and sports activities, such as shopping, sightseeing, dining, hiking, and cycling. This trajectory dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
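The trajectory representation described above (a sequence of time-stamped points with latitude, longitude and altitude) can be sketched in Python as follows. The `GpsPoint` type, the function names, and the use of the haversine formula with a mean Earth radius of 6371 km are assumptions made for illustration. The dataset description does not say how its distance and duration totals were computed, and this sketch does not parse the dataset's files.

```python
import math
from datetime import datetime
from typing import List, NamedTuple

class GpsPoint(NamedTuple):
    """One time-stamped sample; field names are hypothetical."""
    time: datetime
    lat: float   # degrees
    lon: float   # degrees
    alt: float   # unit left unspecified in this sketch

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(p: GpsPoint, q: GpsPoint) -> float:
    """Great-circle distance between two points, ignoring altitude."""
    phi1, phi2 = math.radians(p.lat), math.radians(q.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(q.lon - p.lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def trajectory_length_km(points: List[GpsPoint]) -> float:
    """Sum of distances between consecutive points."""
    return sum(haversine_km(p, q) for p, q in zip(points, points[1:]))

def trajectory_duration_hours(points: List[GpsPoint]) -> float:
    """Elapsed time between the first and last point."""
    if len(points) < 2:
        return 0.0
    return (points[-1].time - points[0].time).total_seconds() / 3600.0

# Hypothetical three-point trajectory sampled every 5 seconds.
track = [
    GpsPoint(datetime(2008, 10, 23, 2, 53, 0), 39.9840, 116.3180, 50.0),
    GpsPoint(datetime(2008, 10, 23, 2, 53, 5), 39.9841, 116.3182, 50.0),
    GpsPoint(datetime(2008, 10, 23, 2, 53, 10), 39.9843, 116.3185, 51.0),
]
print(trajectory_length_km(track), trajectory_duration_hours(track))
```

Summing these two quantities over all trajectories would give dataset-level totals of the kind quoted above, under the stated assumptions.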