PS-MWC: A Scalable, Distributed Maximum Weight Clique Algorithm Using Apache Spark

Date

2018-08

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

The maximum weight clique problem (MWCP), in which we search for a clique with the maximum sum of vertices’ weights, is a long-standing problem in graph theory. It has a wide range of applications in wireless network telecommunications, railroad dispatching, social-network analysis, and hotspot discovery. The MWCP is a challenging NP-Hard problem; therefore, sequential algorithms are not able to solve it for graphs with millions of nodes and edges. To alleviate this problem, we propose a scalable and distributed approach, PS-MWC, to solve the MWCP utilizing Apache Spark, an in-memory cluster computing framework. PS-MWC consists of two main phases: a partitioning phase and a solving phase. In the partitioning phase, we partition the original problem into smaller sub-problems in parallel, such that each sub-problem is small enough to be efficiently solved using a sequential MWCP algorithm. In the solving phase, we solve the sub-problems and get candidate solutions in parallel. We apply efficient pruning strategies in both phases, to discard sub-problems if they have no chance to beat the best candidate solution found so far. Finally, we output the best solution among all candidate solutions. We evaluated the performance of PS-MWC on a benchmark of MWCP. Experimental results showed that PS-MWC found the same solutions as state-of-the-art algorithms such as FastWClq and WLMC. We also examined the benefits of our partitioning strategy, the relationship between the number of vertices of graphs and the runtime of PS-MWC, as well as the pruning efficiency. The results showed that after partitioning, sizes of sub-problems were much less than that of the original graphs. Moreover, there was an almost linear relationship between sizes of original graphs and the runtime. High efficiency of the pruning strategy was reflected in the results that more than 99.80% of sub-problems had been pruned. For a graph with more than 3 million vertices and more than 117 million edges, PS-MWC took 1349 and 575 seconds on an EMR cluster with 64 and 128 cores, respectively.

Description

Keywords

Maximum weight clique, Apache Spark, Distributed algorithms

Citation