Graphics Processing Units (GPUs) are now used in several domains where reliability is fundamental, such as self-driving cars and autonomous machines. Unfortunately, GPUs have been shown to have a high error rate, while the constraints imposed by real-time safety-critical applications make traditional, costly, replication-based hardening solutions inadequate. This paper proposes an effective microarchitectural selective hardening of GPU modules to mitigate the faults that affect the correct execution of instructions. We first characterize, through Register-Transfer Level (RTL) fault injection, the architectural vulnerabilities of a GPU model (FlexGripPlus), specifically targeting transient faults in the functional units and pipeline registers of a GPU core. We then apply selective hardening by triplicating the locations in each module that we found to be more critical. The results show that selective hardening using Triple Modular Redundancy (TMR) can correct 85% to 99% of faults in the pipeline registers and 50% to 100% of faults in the functional units. The proposed selective TMR strategy reduces the hardware overhead by up to 65% compared with traditional TMR.
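As a rough illustration of the triplication idea mentioned in the abstract, the following Python sketch models a bitwise majority voter over three redundant copies of a register value. It is a software analogy only, not the paper's RTL design: the names `tmr_vote` and `flip_bit`, the 8-bit example value, and the chosen bit positions are all assumptions made for this example.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a transient fault by inverting a single bit (hypothetical helper)."""
    return value ^ (1 << bit)


# Hypothetical 8-bit pipeline register content (illustrative value only).
golden = 0b10110010

# A single transient fault hits one of the three copies.
copies = [golden, flip_bit(golden, 5), golden]
voted_single = tmr_vote(*copies)

# The same bit is corrupted in two copies: the voter can no longer recover.
voted_double = tmr_vote(flip_bit(golden, 3), flip_bit(golden, 3), golden)

print(voted_single == golden)   # single fault is masked by the vote
print(voted_double == golden)   # double fault on the same bit is not masked
```

In this sketch a fault in one copy is outvoted by the other two, whereas identical faults in two copies defeat the voter. Selective hardening, as described in the abstract, would apply such triplication only to the locations found to be more critical rather than to every register.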

Protecting GPU's Microarchitectural Vulnerabilities via Effective Selective Hardening / Rodriguez Condia, Josie Esteban; Rech, Paolo; Fernandes dos Santos, Fernando; Carro, Luigi; Sonza Reorda, Matteo. - ELECTRONIC. - (2021), pp. 1-7. (Paper presented at the 2021 IEEE 27th International Symposium on On-Line Testing and Robust System Design (IOLTS), held in Torino, Italy, 28-30 June 2021) [10.1109/IOLTS52814.2021.9486703].

Protecting GPU's Microarchitectural Vulnerabilities via Effective Selective Hardening

Rodriguez Condia, Josie Esteban; Rech, Paolo; Fernandes dos Santos, Fernando; Carro, Luigi; Sonza Reorda, Matteo
2021

978-1-6654-3370-9
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2915678