Accelerating and pruning CNNs for semantic segmentation on FPGA / Mori, Pierpaolo; Vemparala, Manoj-Rohit; Fasfous, Nael; Mitra, Saptarshi; Sarkar, Sreetama; Frickenstein, Alexander; Frickenstein, Lukas; Helms, Domenik; Nagaraja, Naveen-Shankar; Stechele, Walter; Passerone, Claudio. - PRINT. - DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference (2022), pp. 145-150. (Paper presented at the 59th ACM/IEEE Design Automation Conference, held in San Francisco (USA), 10-14 July 2022) [10.1145/3489517.3530424].
Accelerating and pruning CNNs for semantic segmentation on FPGA
Mori, Pierpaolo; Passerone, Claudio
2022
Abstract
Semantic segmentation is one of the popular tasks in computer vision, providing pixel-wise annotations for scene understanding. However, segmentation-based convolutional neural networks require tremendous computational power. In this work, a fully-pipelined hardware accelerator with support for dilated convolution is introduced, which cuts down the redundant zero multiplications. Furthermore, we propose a genetic-algorithm-based automated channel pruning technique to jointly optimize computational complexity and model accuracy. Finally, hardware heuristics and an accurate model of the custom accelerator design enable a hardware-aware pruning framework. We achieve 2.44× lower latency with minimal degradation in semantic prediction quality (1.98 pp lower mean intersection over union) compared to the baseline DeepLabV3+ model, evaluated on an Arria-10 FPGA. The binary files of the FPGA design, baseline, and pruned models can be found at github.com/pierpaolomori/SemanticSegmentationFPGA.

File | Size | Format
---|---|---
3489517.3530424.pdf | 3.11 MB | Adobe PDF

Access: open access
Description: Accelerating and Pruning CNNs for Semantic Segmentation on FPGA
Type: 2a Post-print, publisher's version / Version of Record
License: Public - All rights reserved
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2970797