MATAR: Multi-Quantization-Aware Training for Accurate and Fast Hardware Retargeting

Pierpaolo Mori; Claudio Passerone
2024

Abstract

Quantization of deep neural networks (DNNs) reduces their memory footprint and simplifies their hardware arithmetic logic, enabling efficient inference on edge devices. Different hardware targets can support different forms of quantization, e.g., full 8-bit, 8/4/2-bit mixed-precision combinations, or fully flexible bit-serial solutions. This makes standard quantization-aware training (QAT) of a DNN for different targets challenging, as the quantization levels supported by each target must be carefully considered at training time. In this paper, we propose a generalized QAT solution that results in a DNN which can be retargeted to different hardware without any retraining or prior knowledge of the hardware's supported quantization policy. First, we present a novel training scheme that makes the model aware of multiple quantization strategies. Then, we demonstrate the retargeting capabilities of the resulting DNN by using a genetic algorithm to search for layer-wise, mixed-precision solutions that maximize performance and/or accuracy on the hardware target, without the need for fine-tuning. By making the DNN agnostic of the final hardware target, our method allows DNNs to be distributed to many users on different hardware platforms, without requiring the DNN developers to share their training loop or dataset, and without requiring end-users to detail their hardware capabilities ahead of time. Models trained with our approach generalize to multiple quantization policies with minimal accuracy degradation compared to their target-specific quantization counterparts.
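To make the idea of a model "aware of multiple quantization strategies" concrete, the sketch below shows one common way to train under several bit-widths at once: fake-quantize the weights at each supported precision in every step and average the resulting losses, using a straight-through estimator for gradients. This is a minimal illustration under assumed choices (symmetric uniform quantization, the bit set {8, 4, 2}, and the hypothetical helper `fake_quantize`); the paper's actual training scheme may differ.

```python
# Minimal multi-bit-width QAT sketch (assumptions: symmetric uniform fake
# quantization, bit set {8, 4, 2}; not the paper's exact training scheme).
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()  # forward uses w_q, backward sees identity

class QuantLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.bits = 8  # active bit-width, switchable at any time after training

    def forward(self, x):
        return F.linear(x, fake_quantize(self.weight, self.bits), self.bias)

model = nn.Sequential(QuantLinear(16, 32), nn.ReLU(), QuantLinear(32, 4))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
bit_choices = [8, 4, 2]  # assumed set of supported precisions

x = torch.randn(64, 16)
y = torch.randint(0, 4, (64,))
for step in range(100):
    optimizer.zero_grad()
    loss = 0.0
    for bits in bit_choices:            # expose the model to every precision
        for m in model.modules():
            if isinstance(m, QuantLinear):
                m.bits = bits
        loss = loss + F.cross_entropy(model(x), y)
    (loss / len(bit_choices)).backward()
    optimizer.step()
```

After such training, the per-layer `bits` attribute can be switched at deployment time without retraining, which is what makes a post-hoc search over quantization policies possible.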
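The abstract also mentions a genetic algorithm that searches layer-wise, mixed-precision policies for a given target. The sketch below shows the general shape of such a search; `estimate_accuracy` and `hardware_cost` are hypothetical placeholders standing in for validation-set evaluation of the multi-QAT model and a target-specific latency/energy model, and the fitness weighting is an assumption, not the paper's objective.

```python
# Generic genetic-algorithm sketch over per-layer bit-widths (placeholder
# fitness; a real search would evaluate the trained model on the target).
import random

BIT_CHOICES = [8, 4, 2]   # assumed precisions supported by the target hardware
NUM_LAYERS = 10           # assumed network depth

def estimate_accuracy(policy):
    # Placeholder: in practice, run the multi-QAT model on a validation set
    # with this layer-wise bit-width policy and report its accuracy.
    return 1.0 - 0.02 * sum(8 - b for b in policy) / NUM_LAYERS

def hardware_cost(policy):
    # Placeholder latency/energy proxy: lower bit-widths are assumed cheaper.
    return sum(policy) / (8 * NUM_LAYERS)

def fitness(policy, alpha=0.5):
    # Trade off accuracy against hardware cost; alpha tunes the balance.
    return alpha * estimate_accuracy(policy) - (1 - alpha) * hardware_cost(policy)

def mutate(policy, rate=0.1):
    return [random.choice(BIT_CHOICES) if random.random() < rate else b for b in policy]

def crossover(a, b):
    cut = random.randrange(1, NUM_LAYERS)
    return a[:cut] + b[cut:]

def search(pop_size=32, generations=50):
    population = [[random.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]              # keep the fittest half
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(search())  # best layer-wise bit-width policy found under this toy fitness
```

Because the fitness only queries a frozen, already-trained model and a cost model of the target, the search itself needs neither the training dataset nor any fine-tuning, consistent with the retargeting claim in the abstract.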
Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2987511