This work proposes a comprehensive ISA extension to improve GPU reliability against transient effects. Three additional instructions are proposed, implemented, and combined with software-based datapath duplication. The modified programs are compared with state-of-the-art software-based fault tolerance techniques in terms of execution time. The circuit area is evaluated against the original GPU architecture, and a fault injection campaign is performed to assess reliability. Results show that the ISA extension improves the performance and fault detection capabilities of software-based approaches at a negligible cost in circuit area. This work can help engineers design more efficient and resilient GPU architectures.
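As a rough illustration of the software-based datapath duplication mentioned in the abstract, the sketch below computes an operation twice on independent copies of its operands and compares the results. It is not the paper's ISA extension, whose three instructions are not described in this record. The names `flip_bit` and `duplicated_add`, the 32-bit register width, and the single bit-flip fault model are all assumptions made for the example.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFFFFFF  # assumed 32-bit register width (illustrative only)


def flip_bit(value: int, bit: int) -> int:
    """Model a transient fault by flipping one bit of a 32-bit value."""
    return (value ^ (1 << bit)) & MASK32


def duplicated_add(a: int, b: int, fault_bit: Optional[int] = None) -> Tuple[int, bool]:
    """Add on a primary and a shadow copy, then compare the two results.

    If fault_bit is given, that bit of the primary copy of `a` is flipped
    before the primary addition (hypothetical fault injection).
    Returns (primary_result, fault_detected).
    """
    a_primary, a_shadow = a & MASK32, a & MASK32
    b_primary, b_shadow = b & MASK32, b & MASK32

    if fault_bit is not None:
        a_primary = flip_bit(a_primary, fault_bit)

    r_primary = (a_primary + b_primary) & MASK32
    r_shadow = (a_shadow + b_shadow) & MASK32

    # A mismatch between the two copies signals a detected fault.
    return r_primary, r_primary != r_shadow
```

Calling `duplicated_add(3, 4)` reports no fault, while `duplicated_add(3, 4, fault_bit=0)` corrupts the primary copy and the comparison flags the mismatch. The duplicated computation and the comparison are the kind of overhead that dedicated instructions would aim to reduce.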
Improving GPU register file reliability with a comprehensive ISA extension / Gonçalves, M. M.; Rodriguez Condia, Josie E.; Reorda, M. Sonza; Sterpone, L.; Azambuja, J. R.. - In: MICROELECTRONICS RELIABILITY. - ISSN 0026-2714. - ELECTRONIC. - (2020), pp. 113768-113776.
Title: | Improving GPU register file reliability with a comprehensive ISA extension |
Authors: | Gonçalves, M. M.; Rodriguez Condia, Josie E.; Reorda, M. Sonza; Sterpone, L.; Azambuja, J. R. |
Publication date: | 2020 |
Journal: | MICROELECTRONICS RELIABILITY |
Digital Object Identifier (DOI): | http://dx.doi.org/10.1016/j.microrel.2020.113768 |
Appears in types: | 1.1 Journal article |
Files in this item:
File | Description | Type | License | |
---|---|---|---|---|
1-s2.0-S0026271420305631-main.pdf | post-print version of the manuscript | 2a Post-print publisher's version / Version of Record | Not public - Private/restricted access | |
http://hdl.handle.net/11583/2851141