Investigating and Mitigating Critical Faults in Floating-Point and Posit Arithmetic Hardware / Rodriguez Condia, Josie Esteban; Guerrero-Balaguera, Juan-David; Sierra, Robert Limas; Reorda, Matteo Sonza. - In: IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING. - ISSN 2168-6750. - ELECTRONIC. - (2025), pp. 1-12. [10.1109/tetc.2025.3615827]

Investigating and Mitigating Critical Faults in Floating-Point and Posit Arithmetic Hardware

Rodriguez Condia, Josie Esteban; Guerrero-Balaguera, Juan-David; Sierra, Robert Limas; Reorda, Matteo Sonza
2025

Abstract

Mature computing formats, such as Floating-Point (FP), provide optimal accuracy for processing real values and are essential in most scientific domains. However, the massive market adoption, across several domains, of highly parallel systems built on advanced technology nodes heightens the need for highly reliable systems. To date, most reliability evaluations have targeted FP hardware. Fine-grain assessments of cores that use recent alternative arithmetic formats, such as Posit (particularly suited for Artificial Intelligence), remain only partially explored. Similarly, the way faulty hardware corrupts operations is not well understood, which may hinder the proposal of effective mitigation mechanisms. This work exhaustively evaluates the fine-grain effects of permanent faults in the hardware of arithmetic cores, implemented in Posit and FP, for the three most extensively used operations in modern applications, including machine learning: Add, Multiply, and Multiply-and-Add. Our results indicate that Posit cores are less fault-vulnerable than FP ones. However, Posit cores are more prone than FP ones to inducing significant operational corruption (5.2% to 7.5%). We also found that absolute errors in faulty FP cores are up to two orders of magnitude higher than in Posit ones. Finally, we applied and evaluated three mitigation mechanisms (Self-Check and repair, Dual Modular Redundancy, and Triple Modular Redundancy), which effectively reduce the most critical errors at moderate area costs (20% to 110%).
Files in this item:
Investigating_and_Mitigating_Critical_Faults_in_Floating-Point_and_Posit_Arithmetic_Hardware.pdf
Access: restricted
Type: 2. Post-print / Author's Accepted Manuscript
License: Non-public - Private/restricted access
Size: 4.69 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3003818