In recent years, the Failure-in-Time (FIT) rates that cloud service providers observe for semiconductor devices in their data center fleets have increased drastically, indicating that defective devices are escaping manufacturing tests. Although semiconductor manufacturers minimize the FIT rates of their devices, exponential growth in deployment at scale, combined with growing design complexity and increasing transistor density, results in a non-negligible number of defective devices in server fleets. High-end processors undergo several manufacturing test phases to ensure a low defective-parts-per-million (DPPM) rate among shipped devices. However, it is difficult, if not impossible, to cover the full spectrum of possible defects in a device. This paper proposes a grading methodology for manufacturing test escapes of a permanent nature, such as Stuck-At faults, Transition Delay faults, and Small Delay faults, that are very likely to produce Silent Data Errors (SDEs). The proposed methodology combines several structural measurements that describe a fault's likelihood of creating an SDE. If the faults at high risk of causing SDEs can be identified early in the product lifecycle, design decisions can be made to prevent them from actually creating SDEs once the device is deployed. Experiments are carried out on arithmetic modules frequently used in High Performance Computing.
From Structural Test Escapes to Silent Data Errors: A preliminary analysis / Angione, Francesco; Bernardi, Paolo; Sinha, Arani. - ELECTRONIC. - (2025). (Paper presented at the 2025 IEEE 9th International Test Conference India (ITC India), held in Bangalore (IND), 20-22 July 2025) [10.1109/ITCIndia66078.2025.11141623].
From Structural Test Escapes to Silent Data Errors: A preliminary analysis
Francesco Angione; Paolo Bernardi; Arani Sinha
2025
File | Size | Format |
---|---|---|
From_Structural_Test_Escapes_to_Silent_Data_Errors_A_Preliminary_Analysis.pdf (restricted access; Type: 2a Post-print, publisher's version / Version of Record; License: Non-public, private/restricted access) | 1.15 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/11583/3002461