Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow

Wiese, Philip; İslamoğlu, Gamze; Scherer, Moritz; Macan, Luka; Jung, Victor J. B.; Burrello, Alessio; Conti, Francesco; Benini, Luca

doi:10.1109/mdat.2025.3527371

One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22 nm FD-SOI technology).

Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow / Wiese, Philip; İslamoğlu, Gamze; Scherer, Moritz; Macan, Luka; Jung, Victor J. B.; Burrello, Alessio; Conti, Francesco; Benini, Luca. - In: IEEE DESIGN & TEST. - ISSN 2168-2356. - 42:5(2025), pp. 63-72. [10.1109/mdat.2025.3527371]

Publication year: 2025
DOI: https://dx.doi.org/10.1109/mdat.2025.3527371
Journal title: IEEE DESIGN & TEST
Publication type: 1.1 Journal article

Files in this record:

File	Access	Type	License	Size	Format
DT_TinyML_Paper__V2_.pdf	Open access	2. Post-print / Author's Accepted Manuscript	Public - All rights reserved	343.36 kB	Adobe PDF
Toward_Attention-Based_TinyML_A_Heterogeneous_Accelerated_Architecture_and_Automated_Deployment_Flow.pdf	Restricted access	2a. Post-print, publisher's version / Version of Record	Non-public - Private/restricted access	693.29 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2996569

PORTO @ Archivio Istituzionale della Ricerca
