The Maximum Clique is a fundamental problem in graph theory and has numerous applications in various domains. The problem is known to be NP-hard, and even the most efficient algorithm requires significant computational resources when applied to medium or large graphs. To obtain substantial acceleration and improve scalability, we enable highly parallel computations by proposing a many-core graphical processing unit implementation targeting large and sparse real-world graphs. We developed our algorithm from CPU-based solvers, redesigned the graph preprocessing step, ntroduced an alternative parallelization scheme, and implemented block-level and warp-level parallelism. We show that the latter performs better when the amount of threads included in a block cannot be fully exploited. We analyze the advantages and disadvantages of the proposed strategy and its behavior on different graph topologies. Our approach, applied to sparse real-world graph instances, shows a geomean speed-up of 9x, an average speed-up of over 19x, and a peak speed-up of over 70x, compared to a parallel implementation of the BBMCSP algorithm. It also obtains a geometric mean speed-up of 1.21x and an average speed-up of over 2.0x on the same graph instances compared to the parallel implementation of the LMC algorithm.

Efficiently Computing Maximum Clique of Sparse Graphs with Many-Core Graphical Processing Units

Cardone, Lorenzo;Di Martino, Salvatore;Quer, Stefano
2024

Abstract

The Maximum Clique is a fundamental problem in graph theory and has numerous applications in various domains. The problem is known to be NP-hard, and even the most efficient algorithm requires significant computational resources when applied to medium or large graphs. To obtain substantial acceleration and improve scalability, we enable highly parallel computations by proposing a many-core graphical processing unit implementation targeting large and sparse real-world graphs. We developed our algorithm from CPU-based solvers, redesigned the graph preprocessing step, ntroduced an alternative parallelization scheme, and implemented block-level and warp-level parallelism. We show that the latter performs better when the amount of threads included in a block cannot be fully exploited. We analyze the advantages and disadvantages of the proposed strategy and its behavior on different graph topologies. Our approach, applied to sparse real-world graph instances, shows a geomean speed-up of 9x, an average speed-up of over 19x, and a peak speed-up of over 70x, compared to a parallel implementation of the BBMCSP algorithm. It also obtains a geometric mean speed-up of 1.21x and an average speed-up of over 2.0x on the same graph instances compared to the parallel implementation of the LMC algorithm.
2024
978-989-758-706-1
File in questo prodotto:
File Dimensione Formato  
20240603cameraReady.pdf

non disponibili

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 270.14 kB
Formato Adobe PDF
270.14 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2989447