Experimental Findings on the Sources of Detected Unrecoverable Errors in GPUs / Fernandes dos Santos, Fernando; Malde, Sujit; Cazzaniga, Carlo; Frost, Cris; Carro, Luigi; Rech, Paolo. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 69:3(2022), pp. 436-443. [10.1109/TNS.2022.3141341]
Experimental Findings on the Sources of Detected Unrecoverable Errors in GPUs
Carro, Luigi; Rech, Paolo
2022
Abstract
We investigate the sources of detected unrecoverable errors (DUEs) in graphics processing units (GPUs) exposed to a neutron beam. Illegal memory accesses and interface errors are among the most likely sources of DUEs. Enabling error-correcting code (ECC) increases the number of launch failure events. Our test procedure shows that ECC can reduce the DUEs caused by illegal address accesses by up to 92% for Kepler and by up to 98% for Volta. In addition, we analyze whether compiler optimizations affect the distribution of DUE sources for matrix multiplication. We found that the machine code generated at different optimization levels changes the DUE sources by no more than 24% on average.

| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| FINAL_VERSION.pdf | Open access | 2. Post-print / Author's Accepted Manuscript | Public - All rights reserved | 2.59 MB | Adobe PDF |
| Experimental_Findings_on_the_Sources_of_Detected_Unrecoverable_Errors_in_GPUs.pdf | Restricted access | 2a Post-print publisher's version / Version of Record | Not public - Private/restricted access | 824.03 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/2948379