Graphics processing units are available solutions for high-performance safety-critical applications, such as self-driving cars. In this application domain, functional-safety and reliability are major concerns. Thus, the adoption of fault tolerance techniques is mandatory to detect or correct faults, since these devices must work properly, even when faults are present. GPUs are designed and implemented with cutting-edge technologies, which makes them sensitive to faults caused by radiation interference, such as single event upsets. These effects can lead the system to a failure, which is unacceptable in safety-critical applications. Therefore, effective detection and mitigation strategies must be adopted to harden the GPU operation. In this paper, we analyze transient effects in the pipeline registers of a GPU architecture. We run four applications at three GPU configurations, considering the source of the fault, its effect on the GPU, and the use of software-based hardening techniques. The evaluation was performed using a general-purpose soft-core GPU based on the NVIDIA G80 architecture. Results can guide designers in building more resilient GPU architectures.

Analyzing the Sensitivity of GPU Pipeline Registers to Single Events Upsets / Rodriguez Condia, Josie E.; Goncalves, Marcio M.; Azambuja, Jose Rodrigo; Sonza Reorda, Matteo; Sterpone, Luca. - ELETTRONICO. - (2020), pp. 380-385. (Intervento presentato al convegno 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) tenutosi a Limassol, Cyprus nel 6-8 July 2020) [10.1109/ISVLSI49217.2020.00076].

Analyzing the Sensitivity of GPU Pipeline Registers to Single Events Upsets

Rodriguez Condia, Josie E.;Sonza Reorda, Matteo;Sterpone, Luca
2020

Abstract

Graphics processing units are available solutions for high-performance safety-critical applications, such as self-driving cars. In this application domain, functional-safety and reliability are major concerns. Thus, the adoption of fault tolerance techniques is mandatory to detect or correct faults, since these devices must work properly, even when faults are present. GPUs are designed and implemented with cutting-edge technologies, which makes them sensitive to faults caused by radiation interference, such as single event upsets. These effects can lead the system to a failure, which is unacceptable in safety-critical applications. Therefore, effective detection and mitigation strategies must be adopted to harden the GPU operation. In this paper, we analyze transient effects in the pipeline registers of a GPU architecture. We run four applications at three GPU configurations, considering the source of the fault, its effect on the GPU, and the use of software-based hardening techniques. The evaluation was performed using a general-purpose soft-core GPU based on the NVIDIA G80 architecture. Results can guide designers in building more resilient GPU architectures.
2020
978-1-7281-5775-7
File in questo prodotto:
File Dimensione Formato  
PID6470519.pdf

accesso aperto

Descrizione: post-print version of the manuscript
Tipologia: 2. Post-print / Author's Accepted Manuscript
Licenza: PUBBLICO - Tutti i diritti riservati
Dimensione 1.68 MB
Formato Adobe PDF
1.68 MB Adobe PDF Visualizza/Apri
09155054.pdf

non disponibili

Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 887.44 kB
Formato Adobe PDF
887.44 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2843260