Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow / Wiese, Philip; İslamoğlu, Gamze; Scherer, Moritz; Macan, Luka; Jung, Victor J. B.; Burrello, Alessio; Conti, Francesco; Benini, Luca. - In: IEEE DESIGN & TEST. - ISSN 2168-2356. - 42:5(2025), pp. 63-72. [10.1109/mdat.2025.3527371]
Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Burrello, Alessio
2025
Abstract
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template that couples RISC-V processors with hardwired accelerators and is supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22 nm FD-SOI technology).

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| DT_TinyML_Paper__V2_.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 343.36 kB | Adobe PDF |
| Toward_Attention-Based_TinyML_A_Heterogeneous_Accelerated_Architecture_and_Automated_Deployment_Flow.pdf | Restricted access | 2a. Post-print, publisher's version / Version of Record | Not public - Private/restricted access | 693.29 kB | Adobe PDF |
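
The two headline figures in the abstract can be related by simple unit arithmetic: dividing throughput (GOp/s) by energy efficiency (GOp/J) gives power in watts, and inverting the efficiency gives energy per operation. The sketch below works this out. It assumes that both figures describe the same operating point (0.65 V), which the abstract suggests but does not state explicitly; the function and constant names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (0.65 V, 22 nm FD-SOI).
ENERGY_EFFICIENCY_GOP_PER_J = 2960.0
THROUGHPUT_GOP_PER_S = 154.0


def implied_power_mw(throughput_gop_s: float, efficiency_gop_j: float) -> float:
    """Power in mW implied by throughput divided by energy efficiency.

    (GOp/s) / (GOp/J) = J/s = W. Assumption: both figures were measured
    at the same operating point.
    """
    return throughput_gop_s / efficiency_gop_j * 1e3


def energy_per_op_pj(efficiency_gop_j: float) -> float:
    """Energy per operation in pJ: 1 / (GOp/J) is nJ/Op, times 1e3 gives pJ/Op."""
    return 1.0 / efficiency_gop_j * 1e3


power_mw = implied_power_mw(THROUGHPUT_GOP_PER_S, ENERGY_EFFICIENCY_GOP_PER_J)
energy_pj = energy_per_op_pj(ENERGY_EFFICIENCY_GOP_PER_J)
print(f"implied power: {power_mw:.1f} mW")
print(f"energy per operation: {energy_pj:.3f} pJ")
```

Under that assumption the figures imply roughly 52 mW of power and about 0.34 pJ per operation, which is consistent with the "tinyML power envelope" the abstract refers to.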
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2996569