Platforms that combine RISC-V processors with deep-learning accelerators have recently gained popularity, even in high-reliability domains such as avionics and space. In high-performance safety-critical systems, however, a high-performance architecture must be coupled with reliable mechanisms for coping with errors and faults. We propose the first FPGA-based architecture that combines a RISC-V processor with a systolic array-based accelerator and with fault detection, fault correction, and execution recovery mechanisms. The proposed solution corrects faults in the systolic array datapath by exploiting partial reconfiguration. When an error is detected, the RISC-V processor can trigger reconfiguration of the accelerator, correcting the fault. Furthermore, the approach allows inference to resume from the last correctly executed step, significantly reducing the availability overhead. The result is a high-performance, highly reliable platform that can autonomously detect and correct faults, providing execution continuity and minimal system downtime.
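The detect, reconfigure, and resume flow described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: `ToyAccelerator`, `run_inference`, and `inject_fault_at` are hypothetical names, the fault is modelled as a single flipped bit, and detection is modelled as a comparison against a recomputed reference, since the abstract does not say how errors are detected.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyAccelerator:
    """Hypothetical stand-in for the systolic-array accelerator."""
    faulty: bool = False
    reconfigurations: int = 0

    def run_step(self, step: Callable[[int], int], x: int) -> int:
        y = step(x)
        # Assumption: a datapath fault corrupts the result (flips the lowest bit).
        return y ^ 1 if self.faulty else y

    def reconfigure(self) -> None:
        # Models partial reconfiguration restoring a fault-free datapath.
        self.faulty = False
        self.reconfigurations += 1


def run_inference(steps: List[Callable[[int], int]], x: int,
                  acc: ToyAccelerator,
                  inject_fault_at: Optional[int] = None) -> int:
    checkpoint = x      # output of the last correctly executed step
    i = 0
    injected = False
    while i < len(steps):
        if i == inject_fault_at and not injected:
            acc.faulty = True       # inject one fault, for illustration only
            injected = True
        result = acc.run_step(steps[i], checkpoint)
        # Assumption: detection compares against a reference computation.
        if result != steps[i](checkpoint):
            acc.reconfigure()       # the processor triggers reconfiguration
            continue                # retry this step from the checkpoint
        checkpoint = result         # commit the step and move on
        i += 1
    return checkpoint
```

In this model only the failing step is re-executed after reconfiguration; earlier steps are not repeated, which is the property the abstract credits with reducing the availability overhead.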

RePAIR: Reconfigurable Platform for AI Resilience within RISC-V Ecosystem / Cora, Giorgio; Vacca, Eleonora; De Sio, Corrado; Azimi, Sarah; Sterpone, Luca. - 15594:(2025), pp. 71-87. (Paper presented at the 21st International Symposium on Applied Reconfigurable Computing, ARC 2025, held in Seville (ESP), April 9–11, 2025) [10.1007/978-3-031-87995-1_5].

RePAIR: Reconfigurable Platform for AI Resilience within RISC-V Ecosystem

Giorgio Cora; Eleonora Vacca; Corrado De Sio; Sarah Azimi; Luca Sterpone
2025

Files in this item:

ARC_Camera_Ready.pdf
Embargoed until 04/04/2026
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved
Size: 1.44 MB
Format: Adobe PDF

978-3-031-87995-1_5.pdf
Restricted access
Type: 2a. Post-print, publisher's version / Version of Record
License: Non-public - Private/restricted access
Size: 749.98 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2998583