
DOC: Detection of On-Line Failures in CNNs / Turco, Vittorio; Bellarmino, Nicolò; Ruospo, Annachiara; Cantoro, Riccardo; Sanchez, Ernesto. - ELECTRONIC. - (In press). (Paper presented at the 26th IEEE Latin American Test Symposium 2025, held in San Andrés (Colombia), 11-14 March 2025).

DOC: Detection of On-Line Failures in CNNs

Vittorio Turco; Nicolò Bellarmino; Annachiara Ruospo; Riccardo Cantoro; Ernesto Sanchez
In press

Abstract

The integration of Artificial Intelligence (AI) in safety-critical systems raises reliability concerns, owing both to the inherent uncertainty of AI algorithms and to the complexity of modern hardware, which comprises billions of transistors. Existing solutions, such as Algorithm-Based Fault Tolerance (ABFT), often run detection algorithms after every inference, introducing non-negligible overhead in the detection phase. This paper introduces a two-phase fault detection technique for floating-point Convolutional Neural Networks (CNNs). The first phase identifies easily detectable faults (such as those stemming from a bit-flip in the 30th bit of a floating-point representation), while the second targets hard-to-detect critical faults, i.e., those that produce a wrong prediction but have no visible effect during fault propagation. Although these faults constitute only 1.4% of all critical faults, detecting them is crucial for ensuring system reliability. Validated on the CIFAR-10 dataset with a ResNet-20 model, the proposed method achieves up to 99.67% coverage of critical inferences while maintaining moderate computational overhead. This lightweight, real-time solution enhances the robustness of CNNs in safety-critical applications.
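The following minimal sketch illustrates why a bit-flip in the 30th bit is "easily detectable". It assumes IEEE 754 single-precision (binary32) values with bits counted from 0 at the least significant end, so bit 30 is the most significant exponent bit. For a value with magnitude below 2, flipping that bit multiplies it by 2^128, yielding a huge number, infinity, or NaN. The helper names `flip_bit` and `exceeds_range`, and the bound used below, are hypothetical. The magnitude check is only a generic stand-in for a first-phase detector, not the authors' method. A low-order mantissa flip stays in range and slips past such a check, which is the kind of subtle corruption a second phase would have to address.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Return x, stored as IEEE 754 binary32, with one bit inverted (bit 0 = LSB)."""
    # Reinterpret the 32-bit float as an unsigned integer (little-endian).
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    bits ^= 1 << bit  # inject the single-bit fault
    (faulty,) = struct.unpack("<f", struct.pack("<I", bits))
    return faulty


def exceeds_range(values, bound: float) -> bool:
    """Hypothetical phase-1-style check: flag inf/NaN or any |v| above bound."""
    return any(not math.isfinite(v) or abs(v) > bound for v in values)


if __name__ == "__main__":
    w = 0.5
    print(flip_bit(w, 30))  # exponent jumps by 128: 0.5 * 2**128 == 2**127
    print(flip_bit(1.0, 30))  # exponent field saturates: inf
    print(flip_bit(w, 0))  # low mantissa bit: only 2**-24 away from 0.5

    activations = [0.1, -0.3]
    print(exceeds_range(activations + [flip_bit(w, 30)], bound=1e3))  # True
    print(exceeds_range(activations + [flip_bit(w, 0)], bound=1e3))  # False
```

Note that the effect depends on the original value. For |x| >= 2, bit 30 is already set, so the same flip clears it and shrinks the value toward zero instead.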
Files in this record:
LATS2025_On_line_detection (3).pdf (open access; 940.29 kB, Adobe PDF)
Type: 2. Post-print / Author's Accepted Manuscript
License: Public - All rights reserved

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2999030