Transforming Point Data into Proximity Graphs

dc.contributor.advisorEick, Christoph F.
dc.contributor.committeeMemberShi, Weidong
dc.contributor.committeeMemberYuan, Xiaojing
dc.creatorSubramanyam Varalakshmi, Arjun 1991-
dc.date.accessioned2018-11-30T17:23:21Z
dc.date.available2018-11-30T17:23:21Z
dc.date.createdMay 2018
dc.date.issued2018-05
dc.date.submittedMay 2018
dc.date.updated2018-11-30T17:23:21Z
dc.description.abstractEstablishing neighborhood relationships among data points is important for several data analysis applications, for which structural information is essential. Since the early 1980’s, proximity graphs have served as one of the classical approaches to characterize neighborhood relationships; the most popular proximity graphs include Delaunay Triangulation (DT), Gabriel graphs (GG), relative neighborhood graphs (RNG), and minimum spanning tree (MST). These graphs find their applications in geographical analysis, spatial analysis, pattern recognition, evolutionary biology, computer vision, cluster analysis, and visualization. The emergence of big data has created a need for scalable algorithms to generate proximity graphs for massive datasets. In this thesis, we propose a novel approach for creating DT, GG, and RNG by leveraging the cluster computing capabilities of Apache Spark. To compute DT, we rely on the divide and conquer paradigm. We first partition the spatial dataset into a grid; next, we compute the DT for each grid partition and separate the resulting triangles into the core and boundary triangles; next, we merge the adjacent partitions recursively and re-compute boundary triangles; finally, we return the union of obtained triangles as the final result. GG and RNG are implemented on top of a DT, by removing edges from the DT. In our GG algorithm, we compute the circumcenter for every triangle and associate it with edges of the triangle. The circumcenter coordinates associated with the non-boundary edges form the Voronoi edge. We filter those Delaunay edges, which do not intersect with their corresponding Voronoi edge, to obtain a GG. Relative neighborhood graphs are formed by removing those edges, for which the distance between the endpoints of an edge is less than the distance between any other point and one of the endpoints of the edge. We evaluate our algorithms to create DT, GG, and RNG for a benchmark of datasets having between one and eight million data records. Our benchmark tests showcase the significant performance improvement obtained with our algorithms and the impact of the grid partitioning technique on the runtime. We apply our algorithms to create clusters of Texas climate data and for boundary detection on a genetic dataset.
dc.description.departmentComputer Science, Department of
dc.format.digitalOriginborn digital
dc.format.mimetypeapplication/pdf
dc.identifier.urihttp://hdl.handle.net/10657/3478
dc.language.isoeng
dc.rightsThe author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subjectProximity graphs
dc.subjectApache Spark
dc.subjectGabriel graph
dc.subjectGrid partitioning
dc.subjectScalable
dc.subjectRelative neighbor graph
dc.titleTransforming Point Data into Proximity Graphs
dc.type.dcmiText
dc.type.genreThesis
local.embargo.lift2020-05-01
local.embargo.terms2020-05-01
thesis.degree.collegeCollege of Natural Sciences and Mathematics
thesis.degree.departmentComputer Science, Department of
thesis.degree.disciplineComputer Science
thesis.degree.grantorUniversity of Houston
thesis.degree.levelMasters
thesis.degree.nameMaster of Science

Files

Original bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
SUBRAMANYAMVARALAKSHMI-THESIS-2018.pdf
Size:
3.56 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 2 of 2
No Thumbnail Available
Name:
PROQUEST_LICENSE.txt
Size:
4.44 KB
Format:
Plain Text
Description:
No Thumbnail Available
Name:
LICENSE.txt
Size:
1.83 KB
Format:
Plain Text
Description: