Outlier Detection for Information Networks
The study of networks has emerged in diverse disciplines as a means of analyzing complex relationship data. There has been a significant amount of work in network science which studies properties of networks, querying over networks, link analysis, influence propagation, network optimization, and many other forms of network analysis. Only recently has there been some work in the area of outlier detection for information network data.
Outlier (or anomaly) detection is a very broad field and has been studied in the context of a large number of application domains. Many algorithms have been proposed for outlier detection in high-dimensional data, uncertain data, stream data, and time series data. By its nature, network data poses very different challenges that need to be addressed in a special way. Network data is massive, contains nodes of different types, has nodes with rich associated attribute data, has noisy attributes and noisy links, and evolves dynamically in multiple ways. This thesis focuses on outlier detection for such networks from two perspectives: (1) community-based outliers and (2) query-based outliers.
The proposed concept of outlier detection from networks opens up a new direction of outlier detection research. The detected outliers, which cannot be found by traditional outlier detection techniques, provide new insights into the application area. The algorithms we developed can be applied to many areas, including social network analysis, cyber-security, distributed systems, health care, and bioinformatics. As both the amount of data and the degree of linkage increase across a variety of domains, such network-based techniques will find more applications and open more opportunities for research in various settings.