A Computational Framework for Finding Interestingness Hotspots in Spatial Datasets

Date

2016-12

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

The significant growth of spatial data increased the need for automated discovery of spatial knowledge. An important task when analyzing spatial data is hotspot discovery. In this dissertation, we propose a novel methodology for discovering interestingness hotspots in spatial datasets. We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert’s notion of interestingness captured by an interestingness function. We propose computational methods for finding interestingness hotspots in point-based and polygonal spatial datasets, and gridded spatial-temporal datasets. The proposed framework identifies hotspots maximizing an externally given interestingness function defined on any number of spatial or non-spatial attributes using a five-step methodology, which consists of: (1) identifying neighboring objects in the dataset, (2) generating hotspot seeds, (3) growing hotspots from identified hotspot seeds, (4) post-processing to remove highly overlapping neighboring redundant hotspots, and (5) finding the scope of hotspots. In particular, we introduce novel hotspot growing algorithms that grow hotspots from hotspot seeds. A novel growing algorithm for point-based datasets is introduced that operates on Gabriel Graphs, capturing the neighboring relationships of objects in a spatial dataset. Moreover, we present a novel graph-based post-processing algorithm, which removes highly overlapping hotspots and employs a graph simplification step that significantly improves the runtime of finding maximum weight independent set in the overlap graph of hotspots. The proposed post-processing algorithm is quite generic and can be used with any methods to cope with overlapping hotspots or clusters. Additionally, the employed graph simplification step can be adapted as a preprocessing step by algorithms that find maximum weight clique and maximum weight independent sets in graphs. Furthermore, we propose a computational framework for finding the scope of two-dimensional point-based hotspots. We evaluate our framework in case studies using a gridded air-pollution dataset, and point-based crime and taxicab datasets in which we find hotspots based on different interestingness functions and we give a comparison of our framework with a state of the art hotspot discovery technique. Experiments show that our methodology succeeds in accurately discovering interestingness hotspots and does well in comparison to traditional hotspot detection methods.

Description

Keywords

Hotspot discovery, Spatial data mining, Interestingness hotspot, Maximum weight clique, Polygon models, Polygon emptiness

Citation

Portions of this document appear in: Akdag, Fatih, Josh Urban Davis, and Christoph F. Eick. "A computational framework for finding interestingness hotspots in large spatio-temporal grids." In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on Analytics for Big Geospatial Data, pp. 21-29. ACM, 2014. And in: Akdag, Fatih, and Christoph F. Eick. "An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets." In 2015 IEEE International Conference on Big Data (Big Data), pp. 2010-2019. IEEE, 2015. And in: Akdag, Fatih, and Christoph F. Eick. "Interestingness Hotspot Discovery in Spatial Datasets Using a Graph-Based Approach." In International Conference on Machine Learning and Data Mining in Pattern Recognition, pp. 530-544. Springer, Cham, 2016. And in: Akdag, Fatih, Christoph F. Eick, and Guoning Chen. "Creating polygon models for spatial clusters." In International Symposium on Methodologies for Intelligent Systems, pp. 493-499. Springer, Cham, 2014.