PS-MWC: A Scalable, Distributed Maximum Weight Clique Algorithm Using Apache Spark

dc.contributor.advisorEick, Christoph F.
dc.contributor.committeeMemberShi, Weidong
dc.contributor.committeeMemberHan, Zhu
dc.creatorQiu, Qian 1987-
dc.date.accessioned2019-05-23T14:16:31Z
dc.date.createdAugust 2018
dc.date.issued2018-08
dc.date.submittedAugust 2018
dc.date.updated2019-05-23T14:16:31Z
dc.description.abstractThe maximum weight clique problem (MWCP), in which we search for a clique with the maximum sum of vertices’ weights, is a long-standing problem in graph theory. It has a wide range of applications in wireless network telecommunications, railroad dispatching, social-network analysis, and hotspot discovery. The MWCP is a challenging NP-Hard problem; therefore, sequential algorithms are not able to solve it for graphs with millions of nodes and edges. To alleviate this problem, we propose a scalable and distributed approach, PS-MWC, to solve the MWCP utilizing Apache Spark, an in-memory cluster computing framework. PS-MWC consists of two main phases: a partitioning phase and a solving phase. In the partitioning phase, we partition the original problem into smaller sub-problems in parallel, such that each sub-problem is small enough to be efficiently solved using a sequential MWCP algorithm. In the solving phase, we solve the sub-problems and get candidate solutions in parallel. We apply efficient pruning strategies in both phases, to discard sub-problems if they have no chance to beat the best candidate solution found so far. Finally, we output the best solution among all candidate solutions. We evaluated the performance of PS-MWC on a benchmark of MWCP. Experimental results showed that PS-MWC found the same solutions as state-of-the-art algorithms such as FastWClq and WLMC. We also examined the benefits of our partitioning strategy, the relationship between the number of vertices of graphs and the runtime of PS-MWC, as well as the pruning efficiency. The results showed that after partitioning, sizes of sub-problems were much less than that of the original graphs. Moreover, there was an almost linear relationship between sizes of original graphs and the runtime. High efficiency of the pruning strategy was reflected in the results that more than 99.80% of sub-problems had been pruned. For a graph with more than 3 million vertices and more than 117 million edges, PS-MWC took 1349 and 575 seconds on an EMR cluster with 64 and 128 cores, respectively.
dc.description.departmentComputer Science, Department of
dc.format.digitalOriginborn digital
dc.format.mimetypeapplication/pdf
dc.identifier.urihttps://hdl.handle.net/10657/3989
dc.language.isoeng
dc.rightsThe author of this work is the copyright owner. UH Libraries and the Texas Digital Library have their permission to store and provide access to this work. Further transmission, reproduction, or presentation of this work is prohibited except with permission of the author(s).
dc.subjectMaximum weight clique
dc.subjectApache Spark
dc.subjectDistributed algorithms
dc.titlePS-MWC: A Scalable, Distributed Maximum Weight Clique Algorithm Using Apache Spark
dc.type.dcmiText
dc.type.genreThesis
local.embargo.lift2020-08-01
local.embargo.terms2020-08-01
thesis.degree.collegeCollege of Natural Sciences and Mathematics
thesis.degree.departmentComputer Science, Department of
thesis.degree.disciplineComputer Science
thesis.degree.grantorUniversity of Houston
thesis.degree.levelMasters
thesis.degree.nameMaster of Science

Files

Original bundle

Now showing 1 - 1 of 1
No Thumbnail Available
Name:
QIU-THESIS-2018.pdf
Size:
1.56 MB
Format:
Adobe Portable Document Format

License bundle

Now showing 1 - 2 of 2
No Thumbnail Available
Name:
PROQUEST_LICENSE.txt
Size:
4.42 KB
Format:
Plain Text
Description:
No Thumbnail Available
Name:
LICENSE.txt
Size:
1.81 KB
Format:
Plain Text
Description: