FT-Sparse: Algorithm-Based Fault Tolerance for Sparse CNNs Using Structured Sparsity in GPUs

Rodriguez Condia, Josie Esteban; Ahmadilivani, Mohammad Hasan; Raik, Jaan; Jenihhin, Maksim; Reorda, Matteo Sonza

doi:10.1109/vts69484.2026.11563359

The widespread adoption of Convolutional Neural Networks (CNNs) and their integration into safety-critical applications have been driven by sophisticated optimizations, including advanced pruning techniques that preserve accuracy and performance, and by specialized hardware accelerators, such as GPU Tensor Cores (TCs) with support for structured sparsity. However, the overall impact on sparse CNNs of soft errors that corrupt the sparsity mechanisms in GPUs, and the development of corresponding mitigation strategies, have not yet been sufficiently investigated. This work evaluates how soft errors in the structured sparsity mechanism of GPU TCs affect the operation of CNN models. The effect of transient faults is evaluated at two levels: i) the micro-architecture of the sparsity mechanism in TCs, and ii) sparse CNN applications. In particular, several fault injection campaigns targeting the sparsity mechanism (errors in sparse indices and compressed weights) enable the characterization of errors in sparse workloads. Moreover, we introduce FT-Sparse, an algorithm-based fault tolerance (ABFT) solution that enhances the resilience of sparse CNNs against soft errors in the structured sparsity mechanism of GPUs. The results indicate that FT-Sparse can reduce critical Silent Data Corruptions (SDCs) by up to 4.87×, with an average kernel execution overhead of up to 7.59% on GPUs.
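The abstract does not describe how FT-Sparse works internally, so the following is only an illustrative sketch of the general idea behind checksum-based ABFT applied to a compressed sparse layer. It is not the authors' scheme. It assumes a 2:4 pattern (two non-zero weights kept per group of four, each with a 2-bit position index) and a classic column-sum checksum. All function names are hypothetical, and row lengths are assumed to be multiples of four.

```python
def compress_2to4(row):
    """Compress a dense row: per group of 4, keep the 2 largest-magnitude weights."""
    values, indices = [], []
    for g in range(0, len(row), 4):
        group = row[g:g + 4]
        # Positions of the two largest magnitudes, stored in ascending order.
        keep = sorted(sorted(range(4), key=lambda i: -abs(group[i]))[:2])
        for i in keep:
            values.append(group[i])
            indices.append(i)  # 2-bit position inside the group (assumed layout)
    return values, indices


def sparse_dot(values, indices, x):
    """Dot product of one compressed row with a dense vector, selecting inputs via indices."""
    total = 0.0
    for k, (v, idx) in enumerate(zip(values, indices)):
        group_base = (k // 2) * 4  # two kept weights per group of four
        total += v * x[group_base + idx]
    return total


def checksum_vector(compressed_rows, n_cols):
    """Offline step: column-wise sum of the pruned weights (generic ABFT checksum)."""
    c = [0.0] * n_cols
    for values, indices in compressed_rows:
        for k, (v, idx) in enumerate(zip(values, indices)):
            c[(k // 2) * 4 + idx] += v
    return c


def checked_matvec(compressed_rows, checksum, x, tol=1e-6):
    """Runtime step: compute y = W x and compare sum(y) against checksum . x."""
    y = [sparse_dot(v, i, x) for v, i in compressed_rows]
    expected = sum(c * xi for c, xi in zip(checksum, x))
    ok = abs(sum(y) - expected) <= tol
    return y, ok
```

In this sketch, the checksum is computed once from fault-free compressed weights. A later bit flip in a stored index or weight changes the output sum, so the comparison fails unless the error happens to cancel out (for example, when the wrongly selected input equals the correct one). A single output checksum like this only detects errors; it does not locate or correct them.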

FT-Sparse: Algorithm-Based Fault Tolerance for Sparse CNNs Using Structured Sparsity in GPUs / Rodriguez Condia, J.E., Ahmadilivani, M.H., Raik, J., Jenihhin, M., Reorda, M.S. - ELECTRONIC. - (2026), pp. 1-7. (44th VLSI Test Symposium (VTS), Napa, CA (USA), 27-29 April 2026) [10.1109/vts69484.2026.11563359].

Year of publication: 2026
ISBN: 979-8-3315-6337-0
Publication type: 4.1 Contribution in conference proceedings

Files in this record:

File	Type	License	Size	Format
_VTS_26__checksum_for_sparsity_in_DNNs_old.pdf	1. Preprint / submitted version [pre-review]	Non-public - private/restricted access	578.13 kB	Adobe PDF
FT-Sparse_Algorithm-Based_Fault_Tolerance_for_Sparse_CNNs_Using_Structured_Sparsity_in_GPUs.pdf	2a. Post-print, publisher's version / Version of Record	Non-public - private/restricted access	476.24 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3012344

PORTO @ Archivio Istituzionale della Ricerca
