GPD: Predictive Control Flow Error Detection Leveraging Data Flow Error Detection Methods / Raghunandana K, K; Yogesh Prasad, K R; Sonza Reorda, M.; Virendra, Singh. - (2025), pp. 1-5. (Paper presented at the 31st IEEE International Symposium on On-Line Testing and Robust System Design, IOLTS 2025, held in Ischia (Italy), 7-9 July 2025) [10.1109/iolts65288.2025.11116971].

GPD: Predictive Control Flow Error Detection Leveraging Data Flow Error Detection Methods

M. Sonza Reorda
2025

Abstract

Soft errors in General Purpose Graphics Processing Units (GPGPUs) result in data flow or control flow errors. Error detection and correction methods for data flow and control flow errors are orthogonal, and each incurs its own area, power, and performance overheads. This paper proposes a low-overhead predictive control flow error detection method called GPGPU Predictive Detector (GPD), which leverages data flow error detection and correction methods to detect control flow errors. GPD is non-intrusive to application software and transparent to users. GPD builds on two earlier data flow error detection and correction methods, DDSR and TREFU. In GPD, the combined DDSR and TREFU architecture protects all non-control-flow instructions. A control flow error is detected by calculating in advance the address of the instruction that succeeds a control flow instruction and comparing it with the address that is actually accessed. The effectiveness of GPD is shown on a set of ISPASS-2009 and RODINIA benchmarks. Relative to a non-fault-tolerant GPGPU architecture, GPD has a performance overhead of 5% and average and peak power overheads of 4% and 3%, respectively. We prove by induction that GPD provides fault coverage against GPGPU control flow and data flow errors.
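
The detection principle stated in the abstract (compute the successor address of a control flow instruction in advance, then compare it with the address actually accessed) can be illustrated with a minimal software sketch. This is not the GPD hardware design: the `Branch` type, the 8-byte instruction width, the function names, and the single-bit-flip fault model are all assumptions made for illustration, and the sketch simply assumes the predicted address itself is computed correctly.

```python
from dataclasses import dataclass

INSTR_SIZE = 8  # assumed instruction width in bytes (illustrative only)


@dataclass(frozen=True)
class Branch:
    """Hypothetical control flow instruction: go to `target` if taken, else fall through."""
    target: int
    taken: bool


def predict_next_pc(pc: int, instr: Branch) -> int:
    """Compute, ahead of the fetch, the address that should follow `instr` at `pc`."""
    return instr.target if instr.taken else pc + INSTR_SIZE


def control_flow_error(pc: int, instr: Branch, fetched_pc: int) -> bool:
    """Return True when the address actually fetched differs from the predicted one."""
    return predict_next_pc(pc, instr) != fetched_pc


def flip_bit(value: int, bit: int) -> int:
    """Model a soft error as a single bit flip (an assumed fault model)."""
    return value ^ (1 << bit)
```

For example, with `branch = Branch(target=0x140, taken=True)` at address `0x100`, calling `control_flow_error(0x100, branch, 0x140)` reports no error, while passing `flip_bit(0x140, 3)` as the fetched address reports one, because the corrupted address no longer matches the prediction.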
ISBN: 979-8-3315-3334-2
Files in this record:
File: GPD_Predictive_Control_Flow_Error_Detection_Leveraging_Data_Flow_Error_Detection_Methods.pdf
Access: restricted
Type: 2a Post-print, publisher's version / Version of Record
License: Not public - private/restricted access
Size: 506.65 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3004367