RePAIR: Reconfigurable Platform for AI Resilience within RISC-V Ecosystem / Cora, Giorgio; Vacca, Eleonora; De Sio, Corrado; Azimi, Sarah; Sterpone, Luca. - 15594:(2025), pp. 71-87. (Paper presented at the 21st International Symposium on Applied Reconfigurable Computing, ARC 2025, held in Seville, Spain, April 9–11, 2025) [10.1007/978-3-031-87995-1_5].
RePAIR: Reconfigurable Platform for AI Resilience within RISC-V Ecosystem
Giorgio Cora; Eleonora Vacca; Corrado De Sio; Sarah Azimi; Luca Sterpone
2025
Abstract
Platforms that combine RISC-V processors with deep-learning accelerators have recently gained popularity, even in high-reliability domains such as avionics and space. However, high-performance safety-critical systems must pair a high-performance architecture with reliable mechanisms for coping with errors and faults. We propose the first FPGA-based architecture that combines a RISC-V processor and a systolic-array-based accelerator with fault detection, fault correction, and execution recovery mechanisms. The proposed solution corrects faults in the systolic array datapath by means of partial reconfiguration: when an error is detected, the RISC-V processor can trigger reconfiguration of the accelerator, correcting the fault. Furthermore, inference can resume from the last correctly executed step, significantly reducing the availability overhead. The result is a high-performance, highly reliable platform that can autonomously detect and correct faults, ensuring execution continuity and minimal system downtime.

File | Type | License | Size | Format | Access
---|---|---|---|---|---
ARC_Camera_Ready.pdf | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 1.44 MB | Adobe PDF | Embargoed until 04/04/2026
978-3-031-87995-1_5.pdf | 2a. Post-print, publisher's version / Version of Record | Not public - Private/restricted access | 749.98 kB | Adobe PDF | Restricted access
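The detect, reconfigure, and resume flow summarized in the abstract can be illustrated with a small host-side simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how errors are detected, so an ABFT-style column-checksum test for matrix multiplication is swapped in as a stand-in, and `SimulatedAccelerator`, `run_layer`, `reconfigure`, `checksum_ok`, and `run_inference` are hypothetical names. The checkpoint is simply the output of the last layer that passed the check, so a fault costs one reconfiguration plus re-execution of only the failing layer.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain integer matrix product (stand-in for the systolic array datapath)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


@dataclass
class SimulatedAccelerator:
    """Hypothetical accelerator model: once a fault appears, it corrupts one
    output cell until reconfigure() is called (stand-in for partial reconfiguration)."""
    inject_at_layer: Optional[int] = None  # hypothetical layer where a fault first appears
    faulty: bool = False
    reconfigurations: int = 0
    executions: int = 0

    def run_layer(self, idx: int, a: Matrix, b: Matrix) -> Matrix:
        self.executions += 1
        # Inject the fault only once, before any reconfiguration has happened.
        if idx == self.inject_at_layer and self.reconfigurations == 0:
            self.faulty = True
        c = matmul(a, b)
        if self.faulty:
            c[0][0] += 1  # hypothetical single-cell corruption
        return c

    def reconfigure(self) -> None:
        # Models reloading the accelerator's partial bitstream, clearing the fault.
        self.faulty = False
        self.reconfigurations += 1


def checksum_ok(a: Matrix, b: Matrix, c: Matrix) -> bool:
    """ABFT-style check: column sums of C must equal (column sums of A) times B.
    It catches single-cell errors but not errors that cancel within a column."""
    col_sum_a = [sum(row[k] for row in a) for k in range(len(a[0]))]
    expected = [sum(col_sum_a[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
    actual = [sum(row[j] for row in c) for j in range(len(c[0]))]
    return expected == actual


def run_inference(acc: SimulatedAccelerator, x: Matrix, weights: List[Matrix],
                  max_retries: int = 3) -> Matrix:
    """Run layers in order; on a checksum mismatch, reconfigure the accelerator
    and re-execute only the failing layer from the last checkpoint."""
    checkpoint = x  # output of the last correctly executed layer
    layer = 0
    retries = 0
    while layer < len(weights):
        out = acc.run_layer(layer, checkpoint, weights[layer])
        if checksum_ok(checkpoint, weights[layer], out):
            checkpoint = out  # advance the checkpoint
            layer += 1
            retries = 0
        else:
            if retries >= max_retries:
                raise RuntimeError(f"layer {layer} still failing after {max_retries} reconfigurations")
            acc.reconfigure()
            retries += 1
    return checkpoint


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    weights = [[[1, 0], [0, 1]], [[2, 1], [1, 3]], [[1, 1], [0, 2]]]
    acc = SimulatedAccelerator(inject_at_layer=1)
    print(run_inference(acc, x, weights), acc.reconfigurations, acc.executions)
```

With a fault injected at layer 1 of a three-layer chain, the sketch performs one reconfiguration and four layer executions instead of restarting from layer 0. A real system would replace `reconfigure()` with a request to the FPGA's partial-reconfiguration controller, whose interface is not described in the abstract.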
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2998583