A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs / Rodriguez Condia, Josie E.; Narducci, Pierpaolo; Reorda, M. Sonza; Sterpone, L. - Electronic. - (2020), pp. 1-6. (Paper presented at the 2020 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), held in Novi Sad, Serbia, 22-24 April 2020) [10.1109/DDECS50862.2020.9095665].
A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs
Rodriguez Condia, Josie E.;Reorda, M. Sonza;Sterpone, L.
2020
Abstract
In the past, reliability was of little concern in most application fields of General-Purpose Graphics Processing Units (GPGPUs), such as multimedia and gaming. Nowadays, GPGPUs are used in new domains, such as automotive, where reliability plays a significant role. In this work, we describe a Dynamic Duplication With Comparison (DDWC) mechanism intended to harden the Scalar Processor (SP) units located in the Streaming Multiprocessors (SMs) of a GPGPU. The proposed mechanism targets the permanent faults that may arise inside the SPs. One additional SP unit is included in the system to redundantly compute the same operations as a selected SP. The two results are compared to detect possible failures. A custom reconfiguration instruction allows the target SP to be monitored to be selected dynamically. Experimental results show that the proposed mechanism introduces a limited area overhead while providing a significant increase in the in-field fault detection capabilities of the GPGPU. Its flexibility allows selecting the best trade-off between fault detection latency and performance overhead.

File | Size | Format | |
---|---|---|---|
09095665.pdf (not available). Description: post-print version of the manuscript. Type: 2a Post-print, publisher's version / Version of Record. License: Non-public, private/restricted access | 952.96 kB | Adobe PDF | |
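
The duplication-with-comparison idea summarized in the abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class names, the `select_target` method (standing in for the custom reconfiguration instruction), the stuck-at-zero fault model, and the assumption that the spare SP is itself fault-free are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

WORD_MASK = 0xFFFFFFFF  # assumed 32-bit datapath


def add_op(a: int, b: int) -> int:
    """Example SP operation: 32-bit wrapping addition."""
    return (a + b) & WORD_MASK


@dataclass
class ScalarProcessor:
    # Bits set here are forced to 0 in every result, modelling a
    # permanent stuck-at-zero fault (hypothetical fault model).
    stuck_at_zero_mask: int = 0

    def execute(self, op: Callable[[int, int], int], a: int, b: int) -> int:
        return op(a, b) & ~self.stuck_at_zero_mask & WORD_MASK


class StreamingMultiprocessor:
    def __init__(self, num_sps: int = 8) -> None:
        self.sps = [ScalarProcessor() for _ in range(num_sps)]
        self.spare = ScalarProcessor()  # the one additional SP, assumed fault-free
        self.monitored = 0              # index of the SP currently duplicated

    def select_target(self, sp_index: int) -> None:
        """Stand-in for the custom reconfiguration instruction."""
        self.monitored = sp_index

    def issue(
        self, op: Callable[[int, int], int], operands: List[Tuple[int, int]]
    ) -> Tuple[List[int], bool]:
        """Run one operation on every SP; the spare repeats the monitored SP's work."""
        results = [sp.execute(op, a, b) for sp, (a, b) in zip(self.sps, operands)]
        a, b = operands[self.monitored]
        mismatch = self.spare.execute(op, a, b) != results[self.monitored]
        return results, mismatch


def scan_for_faults(
    sm: StreamingMultiprocessor,
    op: Callable[[int, int], int],
    operands: List[Tuple[int, int]],
) -> List[int]:
    """Monitor each SP in turn and collect the indices that disagree with the spare."""
    faulty = []
    for idx in range(len(sm.sps)):
        sm.select_target(idx)
        _, mismatch = sm.issue(op, operands)
        if mismatch:
            faulty.append(idx)
    return faulty


if __name__ == "__main__":
    sm = StreamingMultiprocessor(num_sps=4)
    sm.sps[2].stuck_at_zero_mask = 0x1  # inject a permanent fault into SP 2
    print(scan_for_faults(sm, add_op, [(1, 2), (3, 4), (5, 6), (7, 8)]))
```

In this model a fault is only observed while the faulty SP is the monitored one and the operands happen to exercise the faulty bit. Rotating the target more often would shorten the detection latency at the price of more reconfiguration steps, which is one way to read the trade-off between detection latency and performance overhead mentioned in the abstract.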
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.
https://hdl.handle.net/11583/2827573