This paper addresses the problem of deployment of LLMs on RISC-V-based CPU systems by optimizing LLM inference on the Sophon SG2042. We evaluate the inference performance of two state-of-the-art LLMs optimised for reasoning: DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B. Thanks to our optimizations on top of the llama.cpp inference library, we achieve token generation speeds of 4.32/2.29 tokens per second and prompt processing speeds of 6.54/3.68 tokens per second, with a significant speedup of up to 2.9 × /3.0 × compared to a direct porting of the same library.
POSTER: V-Seek: Optimizing LLM Reasoning on A Server-Class General-Purpose RISC-V Platform / Poveda Rodrigo, Javier Jesus; Hamdi, Mohamed Amine; Koenig, Cyril; Burrello, Alessio; Jahier Pagliari, Daniele; Benini, Luca. - 1:(2025), pp. 224-225. (Intervento presentato al convegno CF '25: 22nd ACM International Conference on Computing Frontiers tenutosi a Cagliari (ITA) nel May 28 - 30, 2025) [10.1145/3719276.3727954].
POSTER: V-Seek: Optimizing LLM Reasoning on A Server-Class General-Purpose RISC-V Platform
Javier Jesus Poveda Rodrigo;Mohamed Amine Hamdi;Alessio Burrello;Daniele Jahier Pagliari;Luca Benini
2025
Abstract
This paper addresses the problem of deployment of LLMs on RISC-V-based CPU systems by optimizing LLM inference on the Sophon SG2042. We evaluate the inference performance of two state-of-the-art LLMs optimised for reasoning: DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B. Thanks to our optimizations on top of the llama.cpp inference library, we achieve token generation speeds of 4.32/2.29 tokens per second and prompt processing speeds of 6.54/3.68 tokens per second, with a significant speedup of up to 2.9 × /3.0 × compared to a direct porting of the same library.File | Dimensione | Formato | |
---|---|---|---|
CF25___Milk_V_LLM.pdf
accesso aperto
Tipologia:
2. Post-print / Author's Accepted Manuscript
Licenza:
Pubblico - Tutti i diritti riservati
Dimensione
709.08 kB
Formato
Adobe PDF
|
709.08 kB | Adobe PDF | Visualizza/Apri |
3719276.3727954.pdf
accesso riservato
Tipologia:
2a Post-print versione editoriale / Version of Record
Licenza:
Non Pubblico - Accesso privato/ristretto
Dimensione
683.2 kB
Formato
Adobe PDF
|
683.2 kB | Adobe PDF | Visualizza/Apri Richiedi una copia |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.
https://hdl.handle.net/11583/3003742