This research investigates the crucial integration of Neural Network (NN) models with the architecture of the hardware (HW) accelerator. Unlike existing approaches overlooking this interaction, we emphasize understanding the accelerator Datapath for reliability-focused algorithmic solutions. Focusing on Systolic Arrays Datapath, we theoretically evaluate the fault propagation from the HW layer to the NN. This analysis identifies variations in fault effects linked to various data mapping strategies. Considering the fault propagation model, we propose a novel reliability-oriented mapping strategy to mitigate fault effects based on resource rotation. Validation through HW fault injection demonstrates that an architecture-aware NN implementation reduces the impact of faults by up to 40%. Moreover, experimental results indicate that our proposed solution enhances the NN resilience, resulting in up to a 30% reduction in the error rate. Most importantly, these enhancements are attained without introducing performance or hardware overhead.

ZOR: Zero Overhead Reliability Strategies for AI Accelerators / Vacca, Eleonora; Azimi, Sarah; Sterpone, Luca. - ELETTRONICO. - (2024), pp. 248-252. (Intervento presentato al convegno 22nd IEEE International NEWCAS Conference 2024 tenutosi a Sherbrooke (CAN) nel 16-19 June 2024) [10.1109/NewCAS58973.2024.10666350].

ZOR: Zero Overhead Reliability Strategies for AI Accelerators

Vacca, Eleonora;Azimi, Sarah;Sterpone, Luca
2024

Abstract

This research investigates the crucial integration of Neural Network (NN) models with the architecture of the hardware (HW) accelerator. Unlike existing approaches overlooking this interaction, we emphasize understanding the accelerator Datapath for reliability-focused algorithmic solutions. Focusing on Systolic Arrays Datapath, we theoretically evaluate the fault propagation from the HW layer to the NN. This analysis identifies variations in fault effects linked to various data mapping strategies. Considering the fault propagation model, we propose a novel reliability-oriented mapping strategy to mitigate fault effects based on resource rotation. Validation through HW fault injection demonstrates that an architecture-aware NN implementation reduces the impact of faults by up to 40%. Moreover, experimental results indicate that our proposed solution enhances the NN resilience, resulting in up to a 30% reduction in the error rate. Most importantly, these enhancements are attained without introducing performance or hardware overhead.
2024
979-8-3503-6175-9
File in questo prodotto:
File Dimensione Formato  
newcas_24_pdfXpress.pdf

accesso aperto

Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 337.9 kB
Formato Adobe PDF
337.9 kB Adobe PDF Visualizza/Apri
ZOR_Zero_Overhead_Reliability_Strategies_for_AI_Accelerators.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 420.06 kB
Formato Adobe PDF
420.06 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2990347