This work proposes a comprehensive ISA extension to improve GPU reliability against transient effects. Three additional instructions are proposed, implemented, and combined with software-based datapath duplication. The modified programs are compared with state-of-the-art software-based fault tolerance techniques in terms of execution time. The circuit area is evaluated against that of the original GPU architecture, and a fault injection campaign is performed to assess reliability. Results show that this ISA extension improves both the performance and the fault detection capabilities of software-based approaches at a negligible cost in circuit area. This work can help engineers design more efficient and resilient GPU architectures.
Improving GPU register file reliability with a comprehensive ISA extension / Gonçalves, M. M.; Rodriguez Condia, Josie E.; Reorda, M. Sonza; Sterpone, L.; Azambuja, J. R. - In: MICROELECTRONICS RELIABILITY. - ISSN 0026-2714. - ELECTRONIC. - (2020), pp. 113768-113776.
|Title:||Improving GPU register file reliability with a comprehensive ISA extension|
|Publication date:||2020|
|Digital Object Identifier (DOI):||http://dx.doi.org/10.1016/j.microrel.2020.113768|
|Appears in categories:||1.1 Journal article|