Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks / Motetti, Beatrice Alessandra; Risso, Matteo; Burrello, Alessio; Macii, Enrico; Poncino, Massimo; Jahier Pagliari, Daniele. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - Electronic. - 73:11 (2024), pp. 2619-2633. [DOI: 10.1109/tc.2024.3449084]
Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Motetti, Beatrice Alessandra; Risso, Matteo; Burrello, Alessio; Macii, Enrico; Poncino, Massimo; Jahier Pagliari, Daniele
2024
Abstract
The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which improve latency and memory occupation. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly, via a lightweight gradient-based search and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When optimizing the memory footprint, we achieve size reductions of 47.50% and 69.54% at iso-accuracy with respect to baseline networks whose weights are all quantized at 8 and 2 bits, respectively. Our method surpasses a previous state-of-the-art approach with up to a 56.17% size reduction at iso-accuracy. Compared with the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results with a significantly lower training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
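The gradient-based joint search described in the abstract can be pictured as a differentiable architecture search in which every output channel of a layer learns its own precision, with a 0-bit option acting as pruning. Below is a minimal, illustrative PyTorch sketch of this general idea, not the authors' implementation: the names (`MixPrecConv2d`, `PRECISIONS`, `expected_size_bits`) and the simple per-channel memory cost model are assumptions made here for clarity.

```python
# Illustrative sketch only; names and cost model are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

PRECISIONS = (0, 2, 4, 8)  # candidate bit-widths per channel; 0 = pruned channel

def fake_quantize(w, bits):
    """Symmetric per-channel fake quantization with a straight-through estimator."""
    if bits == 0:
        return torch.zeros_like(w)  # pruned: the channel contributes nothing
    qmax = 2 ** (bits - 1) - 1
    # one scale per output channel, shape (out_ch, 1, 1, 1)
    scale = w.abs().amax(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return (q * scale - w).detach() + w  # forward: quantized; backward: identity

class MixPrecConv2d(nn.Module):
    """Conv layer whose output channels each learn their own precision."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)
        # one learnable logit per (output channel, candidate precision)
        self.alpha = nn.Parameter(torch.zeros(out_ch, len(PRECISIONS)))

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=1)   # (out_ch, n_precisions)
        w = self.conv.weight                   # (out_ch, in_ch, k, k)
        # softmax-weighted mixture of the weights fake-quantized at each precision
        w_mix = sum(probs[:, i].view(-1, 1, 1, 1) * fake_quantize(w, b)
                    for i, b in enumerate(PRECISIONS))
        return F.conv2d(x, w_mix, self.conv.bias, padding=self.conv.padding)

    def expected_size_bits(self):
        """Differentiable memory cost: expected bit-width summed over all weights."""
        weights_per_ch = self.conv.weight[0].numel()
        bits = torch.tensor(PRECISIONS, dtype=torch.float32,
                            device=self.alpha.device)
        return (F.softmax(self.alpha, dim=1) @ bits).sum() * weights_per_ch
```

During training, such a cost term would be scaled by a strength hyper-parameter (a hypothetical `lambda_cost`) and added to the task loss, e.g. `loss = task_loss + lambda_cost * sum(m.expected_size_bits() for m in model.modules() if isinstance(m, MixPrecConv2d))`, so that precisions and pruning decisions are learned together with the weights. Swapping the memory cost for a hardware latency model would give the hardware-aware variant of the trade-off the abstract refers to.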
File | Type | License | Access | Size | Format
---|---|---|---|---|---
Arxiv_Joint_Pruning_and_Channel-wise_Mixed-Precision_Quantization_for_Efficient_Deep_Neural_Networks.pdf | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | Open access | 3.44 MB | Adobe PDF
Joint_Pruning_and_Channel-Wise_Mixed-Precision_Quantization_for_Efficient_Deep_Neural_Networks.pdf | 2a. Post-print editorial version / Version of Record | Non-public - Private/restricted access | Restricted access (copy on request) | 1.43 MB | Adobe PDF
https://hdl.handle.net/11583/2992757