HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Fasfous, Nael; Vemparala, Manoj Rohit; Frickenstein, Alexander; Valpreda, Emanuele; Salihu, Driton; Doan, Nguyen Anh Vu; Unger, Christian; Nagaraja, Naveen Shankar; Martina, Maurizio; Stechele, Walter

doi:10.1145/3476997

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute- and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Together with the hardware (HW) design space, this makes finding the globally optimal HW-CNN combination for a given application a daunting challenge. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing the solution space to be narrowed down iteratively before reaching a high-fidelity CNN hardware modeling tool. This tool is capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and on two types of computation engines (bit-parallel vectorized, bit-serial). To combine both design spaces, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. At equivalent prediction accuracy, we improve energy and latency by 20% and 45%, respectively, for ResNet56 compared to existing mixed-precision search methods.
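
The abstract relies on NSGA-II's non-dominated sorting to keep a Pareto-optimal set of quantization strategies. The following Python code is a minimal sketch of only that ranking step, the fast non-dominated sort from NSGA-II. It does not cover crowding distance, selection or the genetic operators, and it is not HW-FlowQ's implementation.

Everything specific in the example is a hypothetical assumption for illustration:
- `LAYER_MACS`;
- the per-layer (weight bits, activation bits) strategies and their error values;
- the `bit_ops` cost proxy (MACs × weight bits × activation bits).

These stand in for the accuracy and HW-cost estimates that the methodology obtains from training and from its hardware models.

```python
from typing import List, Sequence, Tuple

# All objectives are minimized, e.g. (classification error, hardware cost).
Objectives = Tuple[float, ...]
# One (weight_bits, activation_bits) pair per layer: a mixed-precision strategy.
Strategy = List[Tuple[int, int]]

# Hypothetical per-layer MAC counts for a toy 3-layer CNN (illustrative only).
LAYER_MACS = [1.0e6, 2.0e6, 0.5e6]


def bit_ops(strategy: Strategy) -> float:
    """Hypothetical HW-cost proxy: sum over layers of MACs * weight bits * activation bits."""
    return sum(m * wb * ab for m, (wb, ab) in zip(LAYER_MACS, strategy))


def dominates(a: Objectives, b: Objectives) -> bool:
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points: Sequence[Objectives]) -> List[List[int]]:
    """Group point indices into Pareto fronts; front 0 is the non-dominated set."""
    n = len(points)
    dominated: List[List[int]] = [[] for _ in range(n)]  # indices that p dominates
    dom_count = [0] * n  # number of points that dominate p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated[p].append(q)
            elif dominates(points[q], points[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front: List[int] = []
        for p in fronts[i]:
            for q in dominated[p]:
                dom_count[q] -= 1  # p's front is ranked, so it no longer counts against q
                if dom_count[q] == 0:
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # the last front is always empty


# Hypothetical candidates: (strategy, validation error). Error values are made up.
candidates = [
    ([(8, 8), (8, 8), (8, 8)], 0.070),
    ([(4, 4), (4, 4), (8, 8)], 0.075),
    ([(2, 4), (4, 4), (8, 8)], 0.090),
    ([(4, 4), (8, 8), (8, 8)], 0.080),
    ([(2, 2), (2, 2), (2, 2)], 0.200),
]
points = [(err, bit_ops(s)) for s, err in candidates]
fronts = fast_non_dominated_sort(points)
print(fronts)  # → [[0, 1, 2, 4], [3]]
```

Here candidate 3 is dominated by candidate 1 (lower error and fewer bit operations), so it falls into the second front. The other four candidates trade accuracy against cost and form the first front. In a flow like the one the abstract describes, this ranking would be repeated every generation, and the surviving non-dominated candidates would be refined at the next abstraction level.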

HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology / Fasfous, Nael; Vemparala, Manoj Rohit; Frickenstein, Alexander; Valpreda, Emanuele; Salihu, Driton; Doan, Nguyen Anh Vu; Unger, Christian; Nagaraja, Naveen Shankar; Martina, Maurizio; Stechele, Walter. - In: ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS. - ISSN 1539-9087. - Electronic. - 20:5s (2021), pp. 1-25. [10.1145/3476997]

Year: 2021
DOI: https://dx.doi.org/10.1145/3476997
Journal: ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS
Publication type: 1.1 Journal article

Files in this item:

File	Size	Format
3476997.pdf (restricted access). Description: Article published in ACM TECS. Type: 2a Post-print, publisher's version / Version of Record. License: Not public - private/restricted access.	3.45 MB	Adobe PDF
CODES2021___HW_FlowQ_accepted.pdf (open access). Description: Accepted version. Type: 2. Post-print / Author's Accepted Manuscript. License: Public - all rights reserved.	4.67 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2927387

PORTO @ Institutional Research Archive