Microarchitectural Reliability Evaluation of a Block Scheduling Controller in GPUs

Rodriguez Condia, Josie E.; Faggiano, R.; Reorda, M.S.

doi:10.1109/ISVLSI54635.2022.00018

Graphics Processing Units (GPUs) are currently adopted in several domains with substantial reliability requirements, such as automotive and robotics. Thus, evaluating the impact of possible faults affecting the internal components of a device is a crucial step towards developing products certified according to industrial standards (e.g., ISO 26262). The block scheduling controllers play an important role in resource management and task operation in GPUs. Hence, understanding the sensitivity of such modules to faults is crucial for developing mitigation mechanisms and effective countermeasures. This work evaluates the impact of transient faults on the block scheduling controller of a GPU. For this purpose, we extended a low-level microarchitectural GPU model (FlexGripPlus) to support the management of the different execution cores (i.e., the Streaming Multiprocessors or SIMD Engines) and allow the analysis of fault effects. A set of typical workloads was employed in the reliability evaluation. The experimental results show that the scheduler is most vulnerable to faults arising during the device's configuration and during the exchange of tasks from an application. Moreover, when considering faults in the controller, multi-core GPUs appear to be less sensitive than single-core GPUs. Finally, the parallel distribution of tasks (in blocks) also plays a significant role in the scheduler's vulnerability to faults.

Microarchitectural Reliability Evaluation of a Block Scheduling Controller in GPUs / Rodriguez Condia, J.E.; Faggiano, R.; Reorda, M.S. - (2022), pp. 26-31. (IEEE Computer Society Annual Symposium on VLSI (ISVLSI 2022), Nicosia, Cyprus, 04-06 July 2022) [10.1109/ISVLSI54635.2022.00018].

Year of publication

2022

Appears in categories

4.1 Contribution in conference proceedings

Files in this record:

File	Size	Format
Microarchitectural_Reliability_Evaluation_of_a_Block_Scheduling_Controller_in_GPUs.pdf (restricted access; type: 2a Post-print, publisher's version / Version of Record; license: non-public, private/restricted access)	857.35 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2978951

PORTO @ Archivio Istituzionale della Ricerca
