DOC: Detection of On-Line Failures in CNNs / Turco, Vittorio; Bellarmino, Nicolò; Ruospo, Annachiara; Cantoro, Riccardo; Sanchez, Ernesto. - ELECTRONIC. - (2025). (Paper presented at the Latin American Test Workshop, LATW, held in San Andrés (COL), 11-14 March 2025) [10.1109/LATS65346.2025.10963935].
DOC: Detection of On-Line Failures in CNNs
Vittorio Turco; Nicolò Bellarmino; Annachiara Ruospo; Riccardo Cantoro; Ernesto Sanchez
2025
Abstract
The integration of Artificial Intelligence (AI) in safety-critical systems raises concerns about reliability, particularly due to the inherent uncertainty of AI algorithms and the complexity of modern hardware, comprising billions of transistors. Existing solutions, such as Algorithm-Based Fault Tolerance, often focus on running detection algorithms after every inference, introducing a non-negligible overhead in the detection phase. This paper introduces a two-phase fault detection technique for Convolutional Neural Networks (CNNs) with floating-point precision. The first phase identifies easily detectable faults (such as those stemming from a bit-flip on the 30th bit of a floating-point representation), while the second targets hard-to-detect critical faults, that is, those producing a wrong prediction but having no visible effect during fault propagation. Although these faults constitute only 1.4% of all critical faults, their detection is crucial for ensuring system reliability. Validated on the CIFAR-10 dataset with a ResNet-20 model, the proposed method achieves up to 99.67% coverage of critical inferences while maintaining moderate computational overhead. This lightweight, real-time solution enhances the robustness of CNNs in safety-critical applications.

File | Access | Type | License | Size | Format |
---|---|---|---|---|---|
LATS2025_On_line_detection (3).pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 940.29 kB | Adobe PDF |
DOC_Detection_of_On-Line_Failures_in_CNNs.pdf | Restricted access | 2a Post-print publisher's version / Version of Record | Non-public - Private/restricted access | 1 MB | Adobe PDF |
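
The abstract names a bit-flip on the 30th bit of a floating-point value as an example of an easily detectable fault. The sketch below illustrates why that may be so, under assumptions the abstract does not state: IEEE 754 single precision (float32), with bits numbered from 0 at the least significant end, so that bit 30 is the most significant exponent bit. The helper names `flip_bit` and `looks_faulty` and the bound `1e3` are hypothetical and are not the authors' detection method.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` (rounded to float32) with one bit of its encoding flipped."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31] for a 32-bit float")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly one bit; reinterpret the result as float32 again.
    (faulty,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return faulty


def looks_faulty(value: float, bound: float = 1e3) -> bool:
    """Hypothetical range check: flag non-finite or implausibly large values."""
    return not math.isfinite(value) or abs(value) > bound


weight = 0.5
high_flip = flip_bit(weight, 30)  # exponent MSB set: 0.5 becomes 2**127
low_flip = flip_bit(weight, 0)    # mantissa LSB toggled: change of 2**-24

print(weight, "->", high_flip, "flagged:", looks_faulty(high_flip))
print(weight, "->", low_flip, "flagged:", looks_faulty(low_flip))
```

Under these assumptions, a flip of bit 30 moves a value of magnitude below 2 to an enormous magnitude or to infinity, which a simple range check can catch, whereas a flip in the low mantissa bits leaves the value almost unchanged and passes the same check.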
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2999030