Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes / Risso, M.; Burrello, A.; Benini, L.; Macii, E.; Poncino, M.; Jahier Pagliari, D. - ELECTRONIC. - (2022), pp. 1-6. (Paper presented at the 13th IEEE International Green and Sustainable Computing Conference, IGSC 2022, held in Pittsburgh, PA, USA, in 2022) [10.1109/IGSC55832.2022.9969373].
Channel-wise Mixed-precision Assignment for DNN Inference on Constrained Edge Nodes
Risso M.; Burrello A.; Macii E.; Poncino M.; Jahier Pagliari D.
2022
Abstract
Quantization is widely employed in both cloud and edge systems to reduce the memory occupation, latency, and energy consumption of deep neural networks. In particular, mixed-precision quantization, i.e., the use of different bit-widths for different portions of the network, has been shown to provide excellent efficiency gains with limited accuracy drops, especially when the bit-width assignment is optimized by automated Neural Architecture Search (NAS) tools. State-of-the-art mixed-precision schemes work layer-wise, i.e., they use different bit-widths for the weight and activation tensors of each network layer. In this work, we widen the search space, proposing a novel NAS that selects the bit-width of each weight tensor channel independently. This gives the tool the additional flexibility of assigning higher precision only to the weights associated with the most informative features. Testing on the MLPerf Tiny benchmark suite, we obtain a rich collection of Pareto-optimal models in the accuracy vs. model size and accuracy vs. energy spaces. When deployed on the MPIC RISC-V edge processor, our networks reduce inference memory and energy by up to 63% and 27%, respectively, compared to a layer-wise approach at the same accuracy.
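To make the channel-wise idea concrete: a common differentiable-NAS formulation keeps one architecture parameter per (channel, candidate precision) pair and, during the search, forwards a softmax-weighted mix of fake-quantized copies of each channel's weights, adding the expected model size to the loss so that training trades accuracy against cost. The PyTorch sketch below illustrates this approach; the class name `ChannelWiseMixedPrecConv2d`, the candidate bit-widths, and the symmetric min-max fake quantization are assumptions made for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelWiseMixedPrecConv2d(nn.Module):
    """Sketch (illustrative, not the paper's code) of a DNAS-style conv
    layer whose weights can take a different bit-width per output channel.
    During the search, each channel's effective weight is a softmax-weighted
    mix of its fake-quantized versions at every candidate precision."""

    def __init__(self, in_ch, out_ch, kernel_size, bitwidths=(2, 4, 8)):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)
        self.bitwidths = bitwidths
        # One architecture parameter per (channel, candidate bit-width) pair.
        self.alpha = nn.Parameter(torch.zeros(out_ch, len(bitwidths)))

    @staticmethod
    def fake_quantize(w, bits):
        # Symmetric per-channel min-max fake quantization with a
        # straight-through estimator (STE) for the rounding step.
        scale = w.abs().amax(dim=(1, 2, 3), keepdim=True).clamp(min=1e-8)
        q_levels = 2 ** (bits - 1) - 1
        w_q = torch.round(w / scale * q_levels) / q_levels * scale
        return w + (w_q - w).detach()  # forward: w_q, backward: identity

    def size_cost(self):
        # Expected model-size contribution: each channel's expected
        # bit-width times the number of weights in that channel.
        bits = torch.tensor(self.bitwidths, dtype=torch.float32,
                            device=self.alpha.device)
        n_weights = self.conv.weight[0].numel()
        return (F.softmax(self.alpha, dim=1) @ bits).sum() * n_weights

    def forward(self, x):
        probs = F.softmax(self.alpha, dim=1)  # (out_ch, n_bitwidths)
        w = self.conv.weight
        # Softmax-weighted sum of each channel's quantized variants.
        w_eff = sum(
            probs[:, i].view(-1, 1, 1, 1) * self.fake_quantize(w, b)
            for i, b in enumerate(self.bitwidths)
        )
        return F.conv2d(x, w_eff, self.conv.bias,
                        padding=self.conv.padding)

# Usage (illustrative):
#   layer = ChannelWiseMixedPrecConv2d(16, 32, 3)
#   out = layer(torch.randn(1, 16, 8, 8))
#   loss = out.pow(2).mean() + 1e-6 * layer.size_cost()
#   loss.backward()  # gradients reach both the weights and alpha
```

At the end of such a search, each channel would keep only its highest-scoring bit-width (e.g., via `self.alpha.argmax(dim=1)`), yielding a per-channel precision assignment of the kind described in the abstract.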
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Multi_precision_NAS__Camera_Ready_.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.07 MB | Adobe PDF |
| Risso_et_al_2022_Channel-wise_Mixed-precision_Assignment_for_DNN_Inference_on_Constrained_Edge.pdf | Restricted access (copy on request) | 2a. Post-print, editorial version / Version of Record | Non-public - Private/restricted access | 1.13 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2974504