POSTER: V-Seek: Optimizing LLM Reasoning on A Server-Class General-Purpose RISC-V Platform / Poveda Rodrigo, Javier Jesus; Hamdi, Mohamed Amine; Koenig, Cyril; Burrello, Alessio; Jahier Pagliari, Daniele; Benini, Luca. - 1:(2025), pp. 224-225. (Paper presented at CF '25: 22nd ACM International Conference on Computing Frontiers, held in Cagliari (ITA), May 28-30, 2025) [10.1145/3719276.3727954].

POSTER: V-Seek: Optimizing LLM Reasoning on A Server-Class General-Purpose RISC-V Platform

Javier Jesus Poveda Rodrigo; Mohamed Amine Hamdi; Alessio Burrello; Daniele Jahier Pagliari; Luca Benini
2025

Abstract

This paper addresses the deployment of LLMs on RISC-V-based CPU systems by optimizing LLM inference on the Sophon SG2042. We evaluate the inference performance of two state-of-the-art LLMs optimized for reasoning: DeepSeek R1 Distill Llama 8B and DeepSeek R1 Distill QWEN 14B. With our optimizations on top of the llama.cpp inference library, we achieve token generation speeds of 4.32/2.29 tokens per second and prompt processing speeds of 6.54/3.68 tokens per second for the two models, respectively, a speedup of up to 2.9×/3.0× over a direct port of the same library.
ISBN: 979-8-4007-1528-0
Files in this item:

CF25___Milk_V_LLM.pdf
Access: open access
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 709.08 kB
Format: Adobe PDF

3719276.3727954.pdf
Access: restricted access
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 683.2 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003742