In the past, reliability was a minor concern in most application fields of General-Purpose Graphics Processing Units (GPGPUs), such as multimedia and gaming. Nowadays, GPGPUs are used in new domains, such as automotive, where reliability plays a significant role. In this work, we describe a dynamic duplication with comparison (DDWC) mechanism intended to harden the Scalar Processor (SP) units located in the Streaming Multiprocessors (SMs) of a GPGPU. The proposed mechanism targets permanent faults that may arise inside the SPs. One additional SP unit is included in the system to redundantly compute the same operations as a selected SP. The results are compared, and possible failures are detected. A custom reconfiguration instruction allows the dynamic selection of the target SP to be monitored. Experimental results show that the proposed mechanism introduces a limited area overhead while significantly increasing the in-field fault detection capabilities of the GPGPU. Its flexibility allows selecting the best trade-off between fault detection latency and performance overhead.
A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs / Rodriguez Condia, Josie E.; Narducci, Pierpaolo; Reorda, M. Sonza; Sterpone, L. - ELECTRONIC. - (2020), pp. 1-6. (Paper presented at the 2020 23rd International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), held in Novi Sad, Serbia, 22-24 April 2020.)
Title: | A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs |
Authors: | Rodriguez Condia, Josie E.; Narducci, Pierpaolo; Reorda, M. Sonza; Sterpone, L. |
Publication date: | 2020 |
ISBN: | 978-1-7281-9938-2 |
Appears in types: | 4.1 Contribution in conference proceedings |
Files in this item:
File | Description | Type | License |
---|---|---|---|
09095665.pdf | post-print version of the manuscript | 2a Post-print, publisher's version / Version of Record | Not public - Private/restricted access |
http://hdl.handle.net/11583/2827573