FlexGripPlus: An improved GPGPU model to support reliability analysis / Rodriguez Condia, Josie E.; Du, Boyang; Sonza Reorda, Matteo; Sterpone, Luca. - In: MICROELECTRONICS RELIABILITY. - ISSN 0026-2714. - 109:(2020), pp. 1-14. [10.1016/j.microrel.2020.113660]

FlexGripPlus: An improved GPGPU model to support reliability analysis

Rodriguez Condia, Josie E.; Du, Boyang; Sonza Reorda, Matteo; Sterpone, Luca
2020

Abstract

General Purpose Graphics Processing Units (GPGPUs) have been used extensively over the last decade as accelerators in highly demanding applications, such as multimedia processing and high-performance computing. These devices are now becoming popular even in safety-critical applications, such as autonomous and semi-autonomous vehicles. However, they can suffer from transient faults, such as those produced by radiation. Among these faults, Single Event Upsets (SEUs), which are the focus of this paper, can cause application misbehavior that may lead to catastrophic consequences. In this work, we first describe how we extended the capabilities of an open-source VHDL GPGPU model (FlexGrip) and developed a new version, named FlexGripPlus, to study and analyze the effects of SEUs in a GPGPU in much greater detail. We also performed extensive fault injection campaigns using FlexGripPlus, which allowed us to identify the most critical effects within the GPGPU architecture. Finally, we focused on the scheduler controller, since it is a module specific to the GPGPU architecture, and showed that its SEU sensitivity varies with the affected location. Moreover, we present the results of additional analyses that vary the number of parallel execution units in the system, demonstrating a correlation between the number of execution units in a GPGPU and system reliability.
Files in this record:

journal-version-V20.pdf
Access: open access
Description: Pre-print version of manuscript (without format of the Journal)
Type: 1. Preprint / submitted version [pre-review]
License: Public - All rights reserved
Size: 1.47 MB
Format: Adobe PDF

1-s2.0-S0026271419307978-main.pdf
Access: open access
Type: 2a Post-print publisher's version / Version of Record
License: Creative Commons
Size: 1.91 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/2820716