Potocnik, Viviane; Colagrande, Luca; Fischer, Tim; Bertaccini, Luca; Pagliari, Daniele Jahier; Burrello, Alessio; Benini, Luca. "Optimizing Foundation Model Inference on a Many-Tiny-Core Open-Source RISC-V Platform." IEEE Transactions on Circuits and Systems for Artificial Intelligence (ISSN 2996-6647), vol. 1, no. 1, 2024, pp. 37-52. DOI: 10.1109/tcasai.2024.3459412

Optimizing Foundation Model Inference on a Many-Tiny-Core Open-Source RISC-V Platform

Pagliari, Daniele Jahier; Burrello, Alessio
2024

Abstract

Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) and computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we present the first inference results of transformer models on an open-source many-tiny-core RISC-V platform, implementing distributed Softmax primitives and leveraging ISA extensions for SIMD floating-point operand streaming and instruction repetition, as well as specialized DMA engines to minimize costly main memory accesses and to tolerate their latency. We focus on two foundational transformer topologies: encoder-only and decoder-only models. For encoder-only models, we demonstrate a speedup of up to 12.8× of the most optimized implementation over the baseline version. We reach over 79% FPU utilization and 294 GFLOPS/W, outperforming State-of-the-Art (SoA) accelerators by more than 2× in hardware utilization while achieving comparable throughput per computational unit. For decoder-only topologies, we achieve a 16.1× speedup in the Non-Autoregressive (NAR) mode and up to a 35.6× speedup in the Autoregressive (AR) mode compared to the baseline implementation. Compared to the best SoA dedicated accelerator, we achieve 2.04× higher FPU utilization.
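The distributed Softmax primitive mentioned in the abstract can be made concrete with a short sketch. The C program below is a minimal, purely illustrative rendition of a numerically stable softmax computed over per-core chunks of one attention row, simulated sequentially here; the core count, the chunking scheme, and the name softmax_row are assumptions made for illustration and do not reproduce the paper's actual implementation.

#include <math.h>
#include <stdio.h>

#define NUM_CORES 8  /* hypothetical number of tiny cores sharing one row */

/* Numerically stable softmax over one attention row, computed in
 * NUM_CORES chunks. On a real many-core cluster each chunk would run
 * on its own core, with max/sum reductions between the phases; here
 * the chunks are iterated sequentially for clarity. */
static void softmax_row(const float *x, float *y, int n) {
    /* Phase 1: per-chunk maxima, reduced into a global maximum. */
    float gmax = x[0];
    for (int c = 0; c < NUM_CORES; c++) {
        int lo = c * n / NUM_CORES, hi = (c + 1) * n / NUM_CORES;
        for (int i = lo; i < hi; i++)
            if (x[i] > gmax) gmax = x[i];
    }
    /* Phase 2: per-chunk exponentials and partial sums, reduced into
     * a global sum. Subtracting gmax keeps expf from overflowing. */
    float gsum = 0.0f;
    for (int c = 0; c < NUM_CORES; c++) {
        int lo = c * n / NUM_CORES, hi = (c + 1) * n / NUM_CORES;
        for (int i = lo; i < hi; i++) {
            y[i] = expf(x[i] - gmax);
            gsum += y[i];
        }
    }
    /* Phase 3: each chunk normalizes its own slice of the row. */
    for (int i = 0; i < n; i++)
        y[i] /= gsum;
}

int main(void) {
    float x[4] = {1.0f, 2.0f, 3.0f, 4.0f}, y[4];
    softmax_row(x, y, 4);
    for (int i = 0; i < 4; i++)
        printf("%.4f ", y[i]);
    printf("\n");  /* expected: 0.0321 0.0871 0.2369 0.6439 */
    return 0;
}

Splitting the row into chunks lets each core keep its slice in local scratchpad memory; only the scalar max and sum cross core boundaries, which is what makes the primitive cheap to distribute.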
Files in this record:

TCASAI24___OccamyLLM.pdf
  Access: open access
  Type: 2. Post-print / Author's Accepted Manuscript
  License: Public - All rights reserved
  Size: 2.74 MB
  Format: Adobe PDF

Optimizing_Foundation_Model_Inference_on_a_Many-Tiny-Core_Open-Source_RISC-V_Platform.pdf
  Access: restricted access
  Type: 2a. Post-print, editorial version / Version of Record
  License: Non-public - Private/restricted access
  Size: 1.89 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2996573