Accelerating Mixed-Precision QNN Inference on RISC-V MCUs With the STAR-MAC Unit / Manca, Edward; Urbinati, Luca; Casu, Mario R. - In: IEEE ACCESS. - ISSN 2169-3536. - vol. 13 (2025), pp. 208533-208548. [DOI: 10.1109/ACCESS.2025.3641382]
Accelerating Mixed-Precision QNN Inference on RISC-V MCUs With the STAR-MAC Unit
Edward Manca; Luca Urbinati; Mario R. Casu
2025
Abstract
The demand for efficient edge machine learning (ML) in Internet of Things (IoT) applications has driven interest in Microcontroller Unit (MCU)-based TinyML solutions, especially with the rise of RISC-V. Although MCUs are power-efficient, their limited resources make deploying complex ML models challenging. Mixed-Precision Quantization (MPQ) offers the best trade-off between model size, energy consumption, and accuracy by using different precisions for weights and activations across model layers. Recently, interest in hardware support for Mixed-Precision Quantized Neural Networks (MP-QNNs) in MCU-class processors has grown. In this paper, we present a novel precision-scalable Multiply-and-Accumulate (MAC) unit, named STAR-MAC, that supports MPQ on 16-, 8-, and 4-bit integers. By combining Sum-Together and Sum-Apart subword-parallel multiplications in a reconfigurable architecture, STAR-MAC accelerates Fully-Connected, 2D-Convolution, and Depthwise layers. We integrate STAR-MAC into the low-power RISC-V Ibex core and validate it on the four MLPerf Tiny MP-QNN models, trained with QKeras and run with a modified TensorFlow Lite for Microcontrollers (TFLM) runtime enhanced with MPQ kernels, which we open-source. Compared to the standard 8-bit TFLM runtime, this combined hardware-software approach yields a 27% smaller FlatBuffer, a 68% average latency reduction across the four MLPerf Tiny models (measured on a Field-Programmable Gate Array (FPGA)-based System-on-Chip setup), and a negligible accuracy loss on the corresponding test sets. Synthesis on 28-nm CMOS technology shows that our STAR-based Ibex has the highest energy efficiency per unit of silicon area (175.7 GOPS/W/mm² at 4-bit) among various MPQ processors, with limited overhead over the original Ibex (+12.2% area, +7.3% power). Our solution enables low-power, low-latency inference of MP-QNNs on IoT nodes.
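The two subword-parallel modes named in the abstract can be made concrete with a short behavioral sketch. The C model below is illustrative only (function names such as st_mac4 and sa_mac4 are ours, and it captures the input/output semantics of each mode rather than the paper's RTL or its operand packing): in Sum-Together (ST) mode the low-precision products are reduced into a single accumulator, matching the dot-product inner loop of Fully-Connected and 2D-Convolution layers, while in Sum-Apart (SA) mode each product keeps its own accumulator, matching the independent per-channel MACs of a Depthwise layer.

#include <stdint.h>
#include <stdio.h>

/* Behavioral model of a 4-lane MAC at 4-bit precision (inputs assumed
 * already sign-extended into int8_t; 4-bit signed range is [-8, 7]).
 * Illustrative software model, not the STAR-MAC hardware itself. */

/* Sum-Together (ST): the four products are summed into ONE accumulator,
 * as in the inner loop of a Fully-Connected or 2D-Convolution layer. */
static inline int32_t st_mac4(int32_t acc, const int8_t a[4], const int8_t w[4]) {
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * (int32_t)w[i];
    return acc;
}

/* Sum-Apart (SA): the four products go to SEPARATE accumulators,
 * as in four independent channels of a Depthwise layer. */
static inline void sa_mac4(int32_t acc[4], const int8_t a[4], const int8_t w[4]) {
    for (int i = 0; i < 4; i++)
        acc[i] += (int32_t)a[i] * (int32_t)w[i];
}

int main(void) {
    const int8_t a[4] = {1, -2, 3, -4}, w[4] = {5, 6, -7, 8};
    int32_t st = st_mac4(0, a, w);   /* 5 - 12 - 21 - 32 = -60 */
    int32_t sa[4] = {0};
    sa_mac4(sa, a, w);               /* {5, -12, -21, -32} */
    printf("ST=%d SA={%d,%d,%d,%d}\n", st, sa[0], sa[1], sa[2], sa[3]);
    return 0;
}

In a real precision-scalable unit, both behaviors would come from a single reconfigurable multiplier array whose subword carries are gated per mode; the sketch above only pins down what each mode computes at 4-bit precision.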
https://hdl.handle.net/11583/3006065
