From Structural Test Escapes to Silent Data Errors: A preliminary analysis / Angione, Francesco; Bernardi, Paolo; Sinha, Arani. - ELECTRONIC. - (2025). (Paper presented at the 2025 IEEE 9th International Test Conference India (ITC India), held in Bangalore (IND), 20-22 July 2025) [10.1109/ITCIndia66078.2025.11141623].

From Structural Test Escapes to Silent Data Errors: A preliminary analysis

Francesco Angione; Paolo Bernardi; Arani Sinha
2025

Abstract

In recent years, the Failure-in-Time (FIT) rates that cloud service providers observe for semiconductor devices in their data center fleets have increased drastically, indicating that defective devices are escaping manufacturing tests. Although semiconductor manufacturers minimize the FIT rates of their devices, exponentially growing deployment at scale, combined with growing design complexity and increasing transistor density, results in a non-negligible number of defective devices in server fleets. High-end processors undergo several manufacturing test phases to keep the defective parts per million (DPPM) among shipped devices low. However, it is difficult, if not impossible, to cover the full spectrum of possible defects in a device. This paper proposes a grading methodology for manufacturing test escapes of a permanent nature, such as Stuck-At faults, Transition Delay faults, and Small Delay faults, that are very likely to produce Silent Data Errors (SDEs). The proposed methodology combines several structural measurements that describe a fault's likelihood of creating a silent data error. If the faults at high risk of causing SDEs can be identified early in the product lifecycle, design decisions can be made to prevent them from actually creating silent data errors once the device is deployed. Experiments are carried out on arithmetic modules frequently used in High Performance Computing.
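To illustrate why low DPPM and FIT rates still matter at fleet scale, the following minimal sketch applies the standard definitions: DPPM as defective parts per million shipped, and FIT as failures per 10^9 device-hours. The function names and all numeric inputs are hypothetical and are not taken from the paper.

```python
def expected_defective(shipped_devices: int, dppm: float) -> float:
    """Expected number of defective devices among those shipped.

    DPPM is the number of defective parts per one million shipped parts.
    """
    return shipped_devices * dppm / 1_000_000


def expected_failures(devices: int, fit: float, hours: float) -> float:
    """Expected number of failures over a given operating time.

    One FIT is one failure per 1e9 device-hours of operation.
    """
    return devices * hours * fit / 1e9


if __name__ == "__main__":
    fleet_size = 2_000_000      # hypothetical fleet size
    hours_per_year = 24 * 365   # one year of continuous operation

    # Hypothetical DPPM and FIT values, chosen only for illustration.
    print(expected_defective(fleet_size, dppm=50))
    print(expected_failures(fleet_size, fit=100, hours=hours_per_year))
```

Both quantities scale linearly with fleet size, so a rate that looks negligible for a single device can still correspond to many affected devices across a large deployment.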
ISBN: 979-8-3315-0129-7
Files in this item:
From_Structural_Test_Escapes_to_Silent_Data_Errors_A_Preliminary_Analysis.pdf

restricted access

Type: 2a Post-print publisher's version / Version of Record
License: Not public - Private/restricted access
Size: 1.15 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11583/3002461