
GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform / Rizvi, SYED TAHIR HUSSAIN; Cabodi, Gianpiero; Patti, Denis; Francini, Gianluca. - In: ELECTRONICS. - ISSN 2079-9292. - 5:4(2016). [10.3390/electronics5040088]

GPGPU Accelerated Deep Object Classification on a Heterogeneous Mobile Platform

Rizvi, Syed Tahir Hussain; Cabodi, Gianpiero; Patti, Denis; Francini, Gianluca
2016

Abstract

Deep convolutional neural networks achieve state-of-the-art performance in image classification. However, their computational and memory requirements are substantial, which is a problem on resource-constrained embedded devices. Most of this complexity derives from the convolutional layers and, in particular, from the matrix multiplications they entail. This paper proposes a complete approach to image classification, providing the common layers used in neural networks. Specifically, the proposed approach relies on a heterogeneous CPU-GPU scheme for performing convolutions in the transform domain. The Compute Unified Device Architecture (CUDA)-based implementation of the proposed approach is evaluated over three different image classification networks on a Tegra K1 CPU-GPU mobile processor. Experiments show that the presented heterogeneous scheme achieves a 50× speedup over the CPU-only reference and outperforms a GPU-based reference by 2×, while reducing power consumption by nearly 30%.
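
The abstract refers to performing convolutions in the transform domain on a CPU-GPU pair. As an illustration only, and not the paper's actual implementation, the sketch below shows the general idea in CUDA with cuFFT: a single-channel image tile and a zero-padded filter are transformed, multiplied element-wise on the GPU, and transformed back. The tile size, kernel name, and omission of the CPU-side partitioning and host transfers are assumptions made for brevity.

// Hypothetical sketch of transform-domain (FFT-based) convolution with cuFFT.
// Sizes and names are illustrative; the paper's CPU-GPU partitioning is not shown.
#include <cufft.h>
#include <cuda_runtime.h>

// Pointwise complex multiplication in the frequency domain: the spatial
// sliding-window convolution becomes an element-wise product here.
__global__ void pointwiseMul(cufftComplex *a, const cufftComplex *b, int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cufftComplex va = a[i], vb = b[i];
        a[i].x = (va.x * vb.x - va.y * vb.y) * scale;
        a[i].y = (va.x * vb.y + va.y * vb.x) * scale;
    }
}

int main() {
    const int N = 64;                          // illustrative N x N tile
    const int nComplex = N * (N / 2 + 1);      // size of the R2C output

    float *dImg, *dKer;
    cufftComplex *dImgF, *dKerF;
    cudaMalloc(&dImg, sizeof(float) * N * N);
    cudaMalloc(&dKer, sizeof(float) * N * N);
    cudaMalloc(&dImgF, sizeof(cufftComplex) * nComplex);
    cudaMalloc(&dKerF, sizeof(cufftComplex) * nComplex);
    // (Host-side upload of the image tile and the zero-padded filter omitted.)

    cufftHandle fwd, inv;
    cufftPlan2d(&fwd, N, N, CUFFT_R2C);
    cufftPlan2d(&inv, N, N, CUFFT_C2R);

    // Forward transforms of the image tile and the filter.
    cufftExecR2C(fwd, dImg, dImgF);
    cufftExecR2C(fwd, dKer, dKerF);

    // Element-wise product in the transform domain; the 1/(N*N) factor
    // compensates for cuFFT's unnormalized transforms.
    int threads = 256, blocks = (nComplex + threads - 1) / threads;
    pointwiseMul<<<blocks, threads>>>(dImgF, dKerF, nComplex, 1.0f / (N * N));

    // Inverse transform returns the convolved tile in the spatial domain.
    cufftExecC2R(inv, dImgF, dImg);
    cudaDeviceSynchronize();

    cufftDestroy(fwd); cufftDestroy(inv);
    cudaFree(dImg); cudaFree(dKer); cudaFree(dImgF); cudaFree(dKerF);
    return 0;
}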
Files in this record:
File: electronics-05-00088.pdf
Access: open access
Type: 2a Post-print editorial version / Version of Record
License: Creative Commons
Size: 502.19 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2659082