In the past, in most General-Purpose Graphic Processing Units (GPGPUs) application fields (e.g., multimedia and gaming), the reliability features were not so relevant. Nowadays, GPGPUs are used in new domains, such as the automotive one, where reliability plays a significant role. In this work, we describe a dynamic duplication with a comparison (DDWC) mechanism intended to harden the Scalar Processor (SP) units located in the Streaming multiprocessors (SM) of a GPGPU. The proposed mechanism targets the permanent faults that may arise inside the SPs. One additional SP unit is included in the system to compute redundantly the same operations of a selected SP. Results are compared, and possible failures detected. A custom reconfiguration instruction allows the dynamic selection of the target SP to be monitored. Experimental results show that the proposed mechanism introduces a limited area overhead while it provides a significant increase in the in-field fault detection capabilities of the GPGPU. Its flexibility allows selecting the best trade-off between fault detection latency and performance overhead.

A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs / Rodriguez Condia, Josie E.; Narducci, Pierpaolo; Reorda, M. Sonza; Sterpone, L.. - ELETTRONICO. - (2020), pp. 1-6. (Intervento presentato al convegno 2020 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS) tenutosi a Novi Sad, Serbia, Serbia nel 22-24 April 2020) [10.1109/DDECS50862.2020.9095665].

A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs

Rodriguez Condia, Josie E.;Reorda, M. Sonza;Sterpone, L.
2020

Abstract

In the past, in most General-Purpose Graphic Processing Units (GPGPUs) application fields (e.g., multimedia and gaming), the reliability features were not so relevant. Nowadays, GPGPUs are used in new domains, such as the automotive one, where reliability plays a significant role. In this work, we describe a dynamic duplication with a comparison (DDWC) mechanism intended to harden the Scalar Processor (SP) units located in the Streaming multiprocessors (SM) of a GPGPU. The proposed mechanism targets the permanent faults that may arise inside the SPs. One additional SP unit is included in the system to compute redundantly the same operations of a selected SP. Results are compared, and possible failures detected. A custom reconfiguration instruction allows the dynamic selection of the target SP to be monitored. Experimental results show that the proposed mechanism introduces a limited area overhead while it provides a significant increase in the in-field fault detection capabilities of the GPGPU. Its flexibility allows selecting the best trade-off between fault detection latency and performance overhead.
2020
978-1-7281-9938-2
File in questo prodotto:
File Dimensione Formato  
09095665.pdf

non disponibili

Descrizione: post-print version of the manuscript
Tipologia: 2a Post-print versione editoriale / Version of Record
Licenza: Non Pubblico - Accesso privato/ristretto
Dimensione 952.96 kB
Formato Adobe PDF
952.96 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11583/2827573