A dynamic reconfiguration mechanism to increase the reliability of GPGPUs / Rodriguez Condia, Josie E.; Narducci, Pierpaolo; Reorda, M. Sonza; Sterpone, L. - Electronic. - (2020), pp. 1-6. (Paper presented at the 2020 IEEE 38th VLSI Test Symposium (VTS), held in San Diego, USA, 5-8 April 2020) [10.1109/VTS48691.2020.9107572].

A dynamic reconfiguration mechanism to increase the reliability of GPGPUs

Rodriguez Condia, Josie E.;Reorda, M. Sonza;Sterpone, L.
2020

Abstract

General-Purpose Graphics Processing Units (GPGPUs) are effective solutions for highly demanding data processing applications. Recently, they have also started to be used in safety-critical applications, such as autonomous driving systems. GPGPUs are implemented using the latest semiconductor technologies, which are more prone to faults arising during their operational lifetime. However, until now fault mitigation solutions have not been extensively included in GPGPUs, owing to the limited reliability requirements of the applications they were originally intended for (e.g., gaming or multimedia). This work proposes a dynamically configurable self-repairing mechanism aimed at mitigating the impact of permanent faults in the Scalar Processor (SP) cores of GPGPUs. The mechanism is based on spare modules that can replace faulty SPs when a fault is detected. A configuration instruction allows software to dynamically select the set of active SPs in the Streaming Multiprocessor (SM). The method is highly flexible, since it does not require any change to the application software. Experimental results show that the solution introduces a moderate area overhead while allowing the GPGPU to continue working even in the presence of any permanent fault affecting the SPs.
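The spare-based selection described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `select_active_sps`, the lane and spare counts, and the policy of picking the lowest-numbered healthy SPs are all hypothetical and are not taken from the paper, which implements the mechanism in hardware and controls it through a configuration instruction whose encoding is not given here.

```python
def select_active_sps(num_lanes, num_spares, faulty):
    """Return the physical SP indices that serve the logical lanes.

    Hypothetical model: an SM has `num_lanes` regular SPs plus
    `num_spares` spare SPs. SPs listed in `faulty` are skipped, and the
    first `num_lanes` healthy SPs (in index order) become the active set.
    """
    total = num_lanes + num_spares
    healthy = [sp for sp in range(total) if sp not in faulty]
    if len(healthy) < num_lanes:
        # More faults than spares: the SM cannot be fully repaired.
        raise RuntimeError("not enough healthy SPs to fill all lanes")
    return healthy[:num_lanes]


# Example: 8 lanes, 2 spares, physical SP 3 is permanently faulty.
# Lanes 0-2 keep their SPs; lanes 3-7 shift onto SPs 4-8.
active = select_active_sps(8, 2, {3})
print(active)  # [0, 1, 2, 4, 5, 6, 7, 8]
```

In this model the application never sees the remapping, which mirrors the claim that the application software does not need to change; only the selection step is aware of which physical SPs are in use.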
ISBN: 978-1-7281-5359-9
Files in this record:
File: 09107572.pdf (restricted access)
Description: post-print version of the manuscript
Type: 2a Post-print, publisher's version / Version of Record
License: Non-public - private/restricted access
Size: 1.28 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2833232