# **Research on Reliability of Nanoscale System on Chips**

A dissertation submitted to Xi'an Jiaotong University and Politecnico di Torino in partial fulfillment of the requirements for the degree of Doctor of Philosophy

> Weitao Yang
>
> Supervisors: Prof. Chaohui He, Prof. Luca Sterpone
>
> Nuclear Science and Technology
>
> March 2022

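## Abstract (Chinese)

Nanoscale systems on chip offer many advantages, such as small size, light weight, low power consumption, and high integration, which make them increasingly favored in applications such as aerospace and high-energy physics. However, an important challenge for electronic systems used in such environments is the reliability problem induced by energetic particles striking the electronics. In particular, as semiconductor manufacturing processes keep advancing, single event effects caused by various energetic particles become increasingly significant.

To investigate the reliability of nanoscale systems on chip in different particle radiation environments, two devices were selected as research objects: the Xilinx Zynq-7000 All Programmable System on Chip (Xilinx 28nm CMOS SoC) and the Xilinx Ultrascale+ Multi-Processor Programmable System on Chip (Xilinx 16nm FinFET MPSoC). The former is manufactured in a 28nm complementary metal-oxide-semiconductor (CMOS) process and the latter in a 16nm FinFET process. For both chips, single event effects induced by energetic particles were studied by several means, mainly accelerator irradiation experiments, GEANT4 Monte Carlo simulation, software fault injection, and probabilistic safety analysis. For the Xilinx 28nm CMOS SoC, accelerator irradiation tests and Monte Carlo simulations of single event effects were carried out for protons, atmospheric neutrons, and heavy ions. In the proton irradiation, 70 and 90 MeV protons were used to test the on-chip memory block in unhardened and hardened states; the reason why 70 and 90 MeV protons have a similar capability to induce single event effects was identified, and the hardening capability of a design based on the asymmetric dual-core mode was verified. In the atmospheric neutron irradiation, a comparison of the single event effects caused by neutrons in different energy ranges showed that the contribution of 1-10 MeV neutrons to single event effects in the Xilinx 28nm CMOS SoC cannot be ignored and that single event effects caused by thermal neutrons must also be considered. In the heavy-ion accelerator tests, single event effects were measured under different processor modes; the processor mode was found to have no influence on single event upsets, and ions with high linear energy transfer (LET) were found to induce a stepwise increase in the supply current of the Xilinx Zynq-7000 SoC.

For the Xilinx 16nm FinFET MPSoC, several image processing algorithms, mainly image stretching, edge processing, and deep neural network (DNN) processing, were taken as examples to develop different single event effect test systems and software fault injection systems. Fault injection systems based on the soft error mitigation (SEM) IP core, dynamic partial reconfiguration (DPR), and dynamic reconfiguration (DR) were designed, and fault injection was performed on the different image processing algorithms. In combination with probabilistic safety analysis, fault tree analysis was applied to the SEM IP fault injection results to identify the sensitive modules. Failure mode and effects analysis was applied to the DPR fault injection results to determine how severely different modules and errors threaten system reliability. In addition, based on DNN fault injection, a method was proposed to improve the identification accuracy of DNNs on SRAM-based FPGAs.

By testing single event effects in the two systems on chip, reliability in different application environments was evaluated, providing reference and support for the use of this family of chips in strong radiation environments.

**KEY WORDS**: System on chip; Single event effect; Reliability; Accelerator; Monte Carlo simulation; Fault injection

TYPE OF DISSERTATION: Application Fundamentals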

# ABSTRACT

Nanoscale systems on chip (SoCs) have many advantages, such as small size, light weight, low power consumption, and high integration, which make them increasingly popular in applications such as aerospace and high-energy physics. However, electronic systems used in these environments face an important challenge: reliability under strong radiation. The problem becomes more serious as the technology node shrinks.

To explore the reliability of nanoscale SoCs in different particle radiation environments, two SoCs were used as devices under test (DUTs): the Xilinx Zynq-7000 All Programmable SoC (Xilinx 28nm CMOS SoC) and the Xilinx Ultrascale+ Multi-Processor Programmable SoC (Xilinx 16nm FinFET MPSoC). The former is manufactured in a 28nm complementary metal-oxide-semiconductor (CMOS) process, and the latter in 16nm FinFET technology. For the two SoCs, various methods were employed to evaluate single event effects (SEEs): accelerator irradiation, GEANT4 Monte Carlo simulation, software fault injection, and probabilistic safety analysis. For the Xilinx 28nm CMOS SoC, accelerator irradiations and Monte Carlo simulations were carried out for protons, atmospheric neutrons, and heavy ions. In the proton irradiation, 70 and 90 MeV protons were used to perform SEE tests on the on-chip memory (OCM) block under unhardened and hardened conditions. The reason why 70 and 90 MeV protons are similar in their ability to induce SEEs was identified, and the SEE hardening capability of a design based on the asymmetric dual-core mode was verified. In the atmospheric neutron irradiation tests, SEEs caused by neutrons in different energy ranges were investigated. The results indicate that the contribution of 1 to 10 MeV neutrons to SEEs in the Xilinx 28nm CMOS SoC cannot be ignored and that SEEs caused by thermal neutrons should also be considered. In the heavy-ion accelerator irradiation tests, SEEs were measured under different processor modes. The processor mode was found not to affect the single event upset (SEU), and ions with high linear energy transfer (LET) were found to induce a stepwise increase in the supply current of the Xilinx 28nm CMOS SoC.

For the Xilinx 16nm FinFET MPSoC, a variety of image processing algorithms were used as test objects, mainly image stretching, edge processing, and deep neural network (DNN) processing. For the different algorithms, various SEE test systems and software fault injection (FI) systems were developed. Specifically, the FI systems were based on the soft error mitigation (SEM) IP, dynamic partial reconfiguration (DPR), and dynamic reconfiguration (DR). The FI results were analyzed with different probabilistic safety analysis methods. The fault tree analysis (FTA) method was used to analyze the SEM IP FI results and to identify the sensitive modules. The failure modes and effects analysis (FMEA) method was employed to analyze the DPR fault injection results and to assess how severely different modules and errors threaten system reliability. In addition, through FI on a DNN implemented on an SRAM-based FPGA, a method was proposed to improve the DNN identification accuracy.

By assessing SEEs on the two SoCs, reliability issues in different application environments were evaluated, providing reference and support for their application in strong radiation environments.

**KEY WORDS**: System on chip; Single event effect; Reliability; Accelerator; Monte Carlo simulation; Fault injection

TYPE OF DISSERTATION: Application Fundamentals

# CONTENTS

- ABSTRACT (Chinese)
- ABSTRACT (English)
- 1 Preface
  - 1.1 Research Background
    - 1.1.1 SoC Development and Application
    - 1.1.2 Radiation Environment
    - 1.1.3 Radiation Effects
  - 1.2 Research Status of Nanoscale SoC SEE
    - 1.2.1 SEE Research on Xilinx 28nm CMOS SoC
    - 1.2.2 SEE Research on Xilinx 16nm FinFET MPSoC
  - 1.3 Layout of the Dissertation
- 2 SEE Evaluation on SoCs
  - 2.1 SEE Evaluation on Xilinx 28nm CMOS SoC
    - 2.1.1 Xilinx Zynq-7000 SoC
    - 2.1.2 Tested Block
    - 2.1.3 Test System
    - 2.1.4 SEE Test Facilities in China
    - 2.1.5 Monte Carlo Simulation
    - 2.1.6 SEE Hardening
  - 2.2 SEE Evaluation on Xilinx 16nm FinFET MPSoC
    - 2.2.1 Test Benchmarks
    - 2.2.2 Fault Injection Implementations
    - 2.2.3 PSA Analysis
  - 2.3 Summary
- 3 Proton SEE on Xilinx 28nm CMOS SoC
  - 3.1 SEE Induced by Proton
  - 3.2 Proton Irradiation Setup
    - 3.2.1 Proton Beam Terminal
    - 3.2.2 Test Layout
    - 3.2.3 Test Implementation
  - 3.3 Irradiation Results and Analysis
    - 3.3.1 Proton Irradiation Results
    - 3.3.2 Monte Carlo Simulation Analysis
  - 3.4 Summary
- 4 Atmospheric Neutron SEE on Xilinx 28nm CMOS SoC
  - 4.1 Atmospheric Neutron SEE
  - 4.2 Irradiation Examination
    - 4.2.1 CSNS Spectrum
    - 4.2.2 Test Implementation
  - 4.3 Irradiation Results
  - 4.4 Test Results Analysis
    - 4.4.1 E>1 and E>10 MeV Neutron Contribution
    - 4.4.2 Mono-energy Neutron Geant4 Simulation
  - 4.5 Thermal Neutron Influence Evaluation
    - 4.5.1 BL09-2 Irradiation Results
    - 4.5.2 Thermal Neutron Contribution
    - 4.5.3 Elements Interaction
  - 4.6 Equivalence with Medium-energy Proton
  - 4.7 Summary
- 5 Multi Patterns SEE on Xilinx 28nm CMOS SoC
  - 5.1 Patterns Examination and Irradiation Setup
    - 5.1.1 Patterns Examination
    - 5.1.2 Irradiation Setup
  - 5.2 Irradiation Results
    - 5.2.1 SEE of HI-13 Irradiation
    - 5.2.2 SEE of HIRFL Irradiation
  - 5.3 Different Test Modes Influence
  - 5.4 Summary
- 6 Single Event Effect Hardening by Multi-Layer Design
  - 6.1 System-Level SEE Hardening
    - 6.1.1 Redundancy
    - 6.1.2 Watchdog
    - 6.1.3 Checkpoint Rollback Recovery
  - 6.2 Multi-Layer Hardening Design
    - 6.2.1 Redundancy Layer
    - 6.2.2 Watchdog Monitor Layer
    - 6.2.3 AMP Layer
  - 6.3 Irradiation Tests
    - 6.3.1 Test Setup
    - 6.3.2 Proton Beam
  - 6.4 Irradiation Results and Discussions
    - 6.4.1 Irradiation Results
    - 6.4.2 Results Analysis
  - 6.5 Summary
- 7 SEM-based FI and FTA on Xilinx 16nm FinFET MPSoC
  - 7.1 Overall Framework of SEM-based FI
  - 7.2 FI Design
    - 7.2.1 Test Design in SEM-based FI
    - 7.2.2 FI and Outcome Terminal Design
  - 7.3 FI Implementation
  - 7.4 Detected FI Results
  - 7.5 FTA on the Detected Errors
    - 7.5.1 Events in Fault Trees
    - 7.5.2 Failure Rates of Events
  - 7.6 Summary
- 8 DPR-based FI and FMEA on Xilinx 16nm FinFET MPSoC
  - 8.1 DPR-based FI Overall Structure
  - 8.2 DPR Design
  - 8.3 FI in DPR
  - 8.4 Detected Errors in FB and PB Injections
    - 8.4.1 Detected Errors
    - 8.4.2 SES of Errors
  - 8.5 FMEA on the FI Results
    - 8.5.1 FMEA Construction
    - 8.5.2 System Risk Assessment
  - 8.6 Summary
- 9 DR-based FI on DNN in Xilinx 16nm FinFET MPSoC
  - 9.1 DR-based FI on DNN Realization Diagram
  - 9.2 ZyNet DNN Implementation on Ultrascale+ MPSoC
    - 9.2.1 Tested DNN
    - 9.2.2 DNN Training and Implementation on MPSoC
  - 9.3 FI on DNN
  - 9.4 DNN FI Results
    - 9.4.1 EIA on DNN
    - 9.4.2 Optimal EIA on DNN
  - 9.5 DNN Enhancement Based on DR
  - 9.6 Summary
- 10 Conclusions and Suggestions
  - 10.1 Conclusions
  - 10.2 Innovations
  - 10.3 Suggestions
- Acknowledgments
- References
- Achievements

# Abbreviation

| Abbreviation | Full name |
|--------------|-----------|
| AXI | Advanced extensible interface |
| APU | Application processing unit |
| AMP | Asymmetric multiprocessing |
| BL | Beamline |
| BLF | Bitstream load failure |
| BRAM | Block random access memory |
| BPSG | Boro-phospho-silicate glass |
| CRE | Calculation-result error |
| CPU | Central processing unit |
| CYCIAE-100 | China Institute of Atomic Energy 100 MeV proton cyclotron |
| CSNS | China Spallation Neutron Source |
| COTS | Commercial-off-the-shelf |
| CMOS | Complementary metal oxide semiconductor |
| CLB | Configurable logic block |
| CRAM | Configuration random access memory |
| CSU | Configuration security unit |
| CRC | Cyclic redundancy check |
| DPHM | Decoupled and poisoned hydrogen moderator |
| DNN | Deep neural network |
| DIA | Degradation of the identification accuracy |
| DCP | Design checkpoints |
| DRC | Design rule check |
| DUT | Device under test |
| DSP | Digital signal processor |
| DMA | Direct memory access |
| DD | Displacement damage |
| DFI | DMA failed at initialization |
| DDR | Double data rate |
| DCL | Dual-core lockstep |
| DPR | Dynamic partial reconfiguration |
| DR | Dynamic reconfiguration |
| EIAS | EIA sensitivity |
| ERI | Electronics resurgence initiative |
| EDF | Energy-depletion film |
| EIA | Enhancement of the identification accuracy |
| EDC | Error detection and correction |
| ECC | Error-correcting code |
| EBC | Essential bits configuration |
| EBD | Essential bits data |
| ETA | Event tree analysis |
| FIT | Failure in time |
| FITH | Fault injection terminal hang |
| FMEA | Failure mode and effect analysis |
| FT | Faraday tube |
| FI | Fault injection |
| FTA | Fault tree analysis |
| FPGA | Field-programmable gate array |
| FinFET | Fin field effect transistor |
| FF | Flip flop |
| FB | Full bitstream |
| BUFG | Global clock buffer |
| HI-13 | Heavy ion 13 |
| HIRFL | Heavy Ion Research Facility in Lanzhou |
| HKMG | High-k metal gate |
| IAC | Identification accuracy changed |
| IO | Input and output ports |
| ICAP | Internal configuration access port |
| IRDS | International Roadmap for Devices and Systems |
| IOB | IO block |
| JTAG | Joint test action group |
| LGAA | Lateral gate all around |
| LET | Linear energy transfer |
| LFA | Linear frame addresses |
| LUT | Look up table |
| MN | Misidentification numbers |
| MBU | Multi-bit upset |
| MCU | Multi-cell upset |
| MP | Multi-processor |
| MPSoC | Multi-processor system on chip |
| NICRA | National Innovation Center of Radiation Application |
| OCM | On chip memory |
| OEIA | Optimal EIA |
| OTH | Outcome terminal hang |
| PB | Partial bitstream |
| PSA | Probabilistic safety analysis |
| PS | Processing system |
| PCAP | Processor configuration access port |
| PL | Programmable logic |
| RHBP | Radiation hardening by process |
| RAM | Random access memory |
| ROM | Read-only memory |
| RM | Reconfiguration modules |
| ReLU | Rectified linear unit |
| RISC | Reduced instruction set computer |
| RR | Reprogrammable region |
| RPN | Risk priority number |
| SEEM | Secondary-electron emission monitor |
| SD | Secure digital |
| SDC | Silent data corruption |
| SBU | Single bit upset |
| SEB | Single event burnout |
| SEE | Single event effect |
| SEFI | Single event functional interruption |
| SEGR | Single event gate rupture |
| SHE | Single event hard error |
| SEL | Single event latch-up |
| SET | Single event transient |
| SEU | Single event upset |
| SCU | Snoop control unit |
| SEM | Soft error mitigation |
| SER | Soft error rate |
| SES | Soft error sensitivity |
| SEP | Solar energetic particle |
| SP | Sole-processor |
| SAA | South Atlantic Anomaly |
| SRIM | Stopping and range of ions in matter |
| SB | Switch box |
| SMP | Symmetric multiprocessing |
| SH | System halt |
| SoC | System on chip |
| SVT | System validation tool |
| TID | Total ionizing dose |
| TMR | Triple modular redundancy |
| TCL | Triple-core lockstep |
| UART | Universal asynchronous receiver-transmitter |
| USB | Universal serial bus |

| Symbol | Meaning / unit |
|--------|----------------|
| $N_b$ | Total tested bits |
| $N_i$ | Number of injected faults |
| $SER_b$ | Soft error rate per Mbit / FIT·Mbit<sup>-1</sup> |
| $SER_d$ | Soft error rate of the device / FIT |
| ξ | Error |
| $\sigma_b$ | Bit cross section / cm<sup>2</sup>·bit<sup>-1</sup> |
| $\sigma_d$ | Device cross section / cm<sup>2</sup> |
| $\sigma_{sat}$ | Saturated cross section / cm<sup>2</sup> or cm<sup>2</sup>·bit<sup>-1</sup> |
| $L_{th}$ | LET threshold / MeV·cm<sup>2</sup>·mg<sup>-1</sup> |
| $\Phi$ | Cumulative fluence / cm<sup>-2</sup> |

# 1 Preface

A system on chip (SoC) integrates various electronic components on a single chip. As a highly integrated, advanced electronic system, the SoC is finding ever wider application. However, when an SoC is deployed in a harsh environment, radiation effects such as the single event effect (SEE) caused by energetic particles cannot be ignored. This chapter briefly introduces the development and application of commercial-off-the-shelf (COTS) SoCs, the radiation environments and radiation effects they face, and the status of SEE research on nanoscale SoCs.

## 1.1 Research Background

### 1.1.1 SoC Development and Application

The International Roadmap for Devices and Systems (IRDS) 2020 released a logic core device technology roadmap for the coming generations <sup>[1]</sup>. Table 1-1 and Figure 1-1 are extracted from that report. They indicate the future trend of semiconductor technology and of nanoscale COTS SoC integration, and they suggest that advanced COTS SoCs will continue to be widely applied. They also underline the necessity and urgency of continued study of related issues, such as reliability, for COTS SoCs built on advanced processes.

| Year              | 2020   | 2022   | 2025  | 2028  | 2031     | 2034     |
|-------------------|--------|--------|-------|-------|----------|----------|
| "Node range"/nm   | "5"    | "3"    | "2.1" | "1.5" | "1.0 eq" | "0.7 eq" |
| -                 | finFET | finFET | LGAA  | LGAA  | LGAA-3D  | LGAA-3D  |
| Mainstream device | Oxide  | Oxide  | Oxide | Oxide | Oxide    | Oxide    |
| Vdd/V             | 0.70   | 0.70   | 0.65  | 0.65  | 0.60     | 0.60     |
| Gate length/nm    | 18     | 16     | 14    | 12    | 12       | 12       |

Table 1-1 IRDS 2020 logic core device technology roadmap <sup>[1]</sup>

Note: finFET: fin field-effect transistor; LGAA: lateral gate-all-around device.

As a highly integrated electronic system, the COTS SoC has kept pace with advancing technology ever since it was first released in the 1970s <sup>[2]</sup>. In particular, since entering the ultra-deep sub-micron era, SoC paradigms have shifted rapidly and been updated constantly <sup>[3]</sup>. The older micron-technology SoCs, which integrated reduced instruction set computer (RISC) processors, digital signal processors (DSPs), and other blocks, have evolved into nanoscale, hybrid, all-programmable multi-processor (MP) SoCs. Figures 1-2 (a) and 1-2 (b) present the schematics of the traditional SoC and the newer COTS FinFET Ultrascale+ MPSoC, respectively.



Figure 1-1 Core in SoC reported in IRDS 2020<sup>[1]</sup>





(b) Newer FinFET Ultrascale+ MPSoC <sup>[4-6]</sup>

Figure 1-2 Schematics of the micron SoC and FinFET Ultrascale+ MPSoC

Nanoscale COTS SoCs offer excellent performance in manufacturing, integration, and power dissipation, which is why they continue to attract attention and are widely used. Besides conventional applications such as multimedia processing, communication, and biomedicine, nanoscale COTS SoCs are now also used in aerospace vehicles, artificial intelligence, self-driving, high-energy physics equipment, and so on <sup>[7-15]</sup>. Moreover, nanoscale COTS SoCs are gaining popularity in the radiation community over comparable rad-hard parts because of the trade-off between cost and performance, especially for CubeSats and NanoSats <sup>[16]</sup>. The electronics resurgence initiative (ERI) (2017) includes the nanoscale COTS 3DSoC in its 2025-2030 research plan, and the flight avionics hardware roadmap (2014) covers nanoscale avionics COTS SoCs from 2017 to 2026 <sup>[17-18]</sup>. These facts reflect the development and application of nanoscale COTS SoCs and demonstrate the urgency and necessity of reliability research.

However, when a nanoscale COTS SoC is employed on these platforms and in these scenarios, it encounters different radiation environments.

### 1.1.2 Radiation Environment

Spacecraft electronics may suffer from energetic particles from outer space, i.e., protons, heavy ions, and electrons. These particles come from the Van Allen belts, solar cosmic rays, or galactic cosmic rays <sup>[19-21]</sup>. Figure 1-3 shows a diagram of the Earth's radiation environment <sup>[22]</sup>.



Figure 1-3 The diagram of the earth's radiation environment <sup>[22]</sup>

#### 1) Van Allen Radiation Belt

The Van Allen radiation belts were discovered in 1958 <sup>[23]</sup>. They are dynamic regions in which the Earth's magnetic field traps charged particles, and they consist of an inner belt and an outer belt. The inner belt lies at about 1.2R to 2R (R is the Earth's radius) and is dominated by energetic protons with energies up to hundreds of MeV. The outer belt extends from about 3R to 10R and consists mostly of electrons, with a maximum energy of about 7 MeV <sup>[24-25]</sup>. Within the inner belt there is a region named the South Atlantic Anomaly (SAA), where the magnetic field is weaker and the proton flux is considerably higher than in other regions at the same altitude <sup>[26]</sup>. Figure 1-4 presents the trapped proton differential flux spectrum of the AP8 max model in OMERE <sup>[27]</sup>.



Figure 1-4 The differential flux of trapped protons for the AP8 max model at 800 km altitude and 98° inclination <sup>[27]</sup>

#### 2) Solar Cosmic Rays

Solar cosmic rays, also called solar energetic particles (SEPs), were first reported in the 1940s <sup>[28]</sup>. They are associated with solar flares. Most of the particles are protons, with energies up to the GeV range; alpha particles, heavy ions, and electrons make up a small part. SEP events are episodic and follow a cycle of about 11 years, which comprises four years of low solar activity and seven years of high solar activity. In the high-activity years in particular, the frequency of solar flares rises sharply and poses serious risks to space and terrestrial electronic systems. Figure 1-5 shows the SEP differential fluence spectra for the 1989/10 Tylka model <sup>[29]</sup>.



Figure 1-5 SEP differential fluence spectra for the 1989/10 Tylka model <sup>[29]</sup>

#### 3) Galactic Cosmic Rays

Galactic cosmic rays originate outside the solar system <sup>[30]</sup>. Their intensity is inversely correlated with solar activity, so they are most intense at solar minimum <sup>[31]</sup>. They consist mainly of protons, which account for about 87% and have energies ranging from MeV to GeV. Alpha particles and heavy ions make up approximately 12% and 1%, respectively <sup>[31]</sup>. Figure 1-6 (a) and (b) show the relative contribution and flux of different elements in galactic cosmic rays (Z=1 to Z=28), respectively.

In addition, with the aggressive scaling of nanoscale electronics, SEEs induced by atmospheric neutrons have also become significant for terrestrial electronic systems <sup>[32-34]</sup>. At the same time, SEEs caused by high-energy electrons are gaining attention <sup>[35]</sup>.



Figure 1-6 Relative contribution and flux of different elements in galactic cosmic rays (Z=1 to Z=28)

Cosmic rays interact with atoms in the atmosphere, such as <sup>14</sup>N or <sup>16</sup>O, and these interactions generate many secondary particles. Most of the generated particles are neutrons, whose flux depends on altitude, latitude, and other factors <sup>[36]</sup>. Protons, electrons, muons, pions, and other particles are also produced. Figure 1-7 displays a schematic of the atmospheric neutron environment.



Figure 1-7 Schematic of the atmospheric neutron environment<sup>[36]</sup>

From aerospace to ground level, different energetic particles are present in different environments. When they strike nanoscale COTS SoCs, they may degrade reliability and cause various radiation effects.

### 1.1.3 Radiation Effects

Energetic particles striking electronics cause both transient and cumulative radiation effects. Cumulative effects result from long-term irradiation and include total ionizing dose (TID) and displacement damage (DD) <sup>[37-40]</sup>. The transient effect is the single event effect (SEE), which is induced by a single energetic particle <sup>[41-43]</sup>.

#### 1) TID

The TID effect comes from the energy deposited by ionizing particles <sup>[44]</sup>. This energy creates electron-hole pairs, which result in trapped charges in the oxides and at the interfaces of semiconductor devices. The process can be summarized in the following steps <sup>[45-46]</sup>.

- Generation of electron-hole pairs
- Recombination of a fraction of the electron-hole pairs
- Transport of carriers in the oxide
- Formation of traps

#### 2) DD

DD is a non-ionizing effect caused by energetic particles. An incident particle collides with a lattice atom and displaces it from its original position, creating vacancy and interstitial defects. These defects can further combine into defect clusters <sup>[39]</sup>.

As semiconductors scale down and the oxide dielectric layers continue to shrink, complementary metal-oxide-semiconductor (CMOS) technologies become more resilient to cumulative effects. In contrast, the reliability problems caused by SEE become more severe <sup>[47]</sup>.

#### 3) SEE

When an energetic particle passes through a semiconductor, it can deposit energy directly or indirectly and generate electron-hole pairs along its trajectory <sup>[48]</sup>. For example, Figure 1-8 depicts the direct and indirect generation of electron-hole pairs in a MOS device by a heavy ion and a neutron, respectively.



(a) Direct generation by a heavy ion (b) Indirect generation by a neutron

Figure 1-8 The mechanisms of direct and indirect generation of electron-hole pairs

After generation, the carriers undergo processes such as recombination, drift, and diffusion. Under the intense electric field, they drift or diffuse toward the node of opposite polarity, where the charge is collected and a current pulse appears. An SEE emerges if the collected charge exceeds the critical charge, which is the minimum amount of charge needed to change the node state <sup>[49]</sup>.
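The threshold condition can be illustrated numerically. The sketch below is not the model used in this work: it assumes a constant LET over a hypothetical collection depth in silicon, a mean energy of 3.6 eV per electron-hole pair, and a silicon density of 2.33 g/cm<sup>3</sup>. The function names and the example critical charge are illustrative assumptions, and the generated charge is only an upper bound on the charge actually collected at the node.

```python
# Illustrative sketch only: constant LET along an assumed collection depth.
E_PAIR_EV = 3.6          # mean energy per electron-hole pair in silicon (eV)
RHO_SI_MG_CM3 = 2330.0   # silicon density (mg/cm^3)
Q_E = 1.602e-19          # elementary charge (C)


def deposited_charge_fc(let_mev_cm2_mg: float, depth_um: float) -> float:
    """Charge (fC) generated along depth_um of silicon for a given LET."""
    # LET [MeV*cm^2/mg] * density [mg/cm^3] * depth [cm] = deposited energy [MeV]
    energy_mev = let_mev_cm2_mg * RHO_SI_MG_CM3 * depth_um * 1e-4
    pairs = energy_mev * 1e6 / E_PAIR_EV
    return pairs * Q_E * 1e15


def upset_expected(let_mev_cm2_mg: float, depth_um: float, q_crit_fc: float) -> bool:
    """True if the generated charge exceeds the assumed critical charge."""
    return deposited_charge_fc(let_mev_cm2_mg, depth_um) > q_crit_fc
```

Under these assumptions, an LET of 1 MeV·cm<sup>2</sup>·mg<sup>-1</sup> corresponds to roughly 10 fC per micrometre of track in silicon.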

Figure 1-9 shows a brief schematic of SEE in a complicated SoC. When an energetic particle strikes a cell node, it deposits energy and generates electron-hole pairs along its track, which can cause a glitch at the node, namely a single event transient (SET). If the glitch pulse is wide enough or is captured by a memory cell, it may change the stored datum. In this case, a single event upset (SEU) occurs. Subsequently, if the processor uses the changed datum, the results may be erroneous, or the SoC function may even fail.



Figure 1-9 Schematic of SEE in MPSoC

SEE may be non-destructive or destructive to the target electronics. Non-destructive and destructive effects are also called soft and hard errors, respectively. Soft errors are transient and can be recovered from or handled. In comparison, hard errors are permanent and leave the device unusable. Figure 1-10 details the different kinds of SEEs. For example, the SEU and single event functional interruption (SEFI) are soft errors, while the single event burnout (SEB) and single event gate rupture (SEGR) are hard errors.

According to the upset bit information, an SEU can be a single bit upset (SBU), a multi-bit upset (MBU), or a multi-cell upset (MCU). More importantly, MBUs and MCUs have turned out to be more serious as semiconductor technology scales down.

This work is dedicated to the SEE reliability evaluation of nanoscale COTS SoCs. To date, other researchers have also made efforts on this issue.





Figure 1-10 Details of different kinds of SEEs [50]

## 1.2 Research Status of Nanoscale SoC SEE

The integration of a field-programmable gate array (FPGA) and an ARM processor in a nanoscale SoC was a tremendous evolution <sup>[51]</sup>. Since the related products were released, this evolution has attracted industrial and academic interest, especially for harsh-environment applications <sup>[52]</sup>. For this reason, the SEE reliability assessments of the newer Xilinx COTS 28nm CMOS SoC and 16nm FinFET MPSoC continue to be updated.

## 1.2.1 SEE Research on Xilinx 28nm CMOS SoC

For the Xilinx 28nm CMOS SoC, the SEE evaluation efforts involve SEE tests and mitigation techniques.

Austin set out to quantify, for the first time, the soft error rate (SER) of a COTS multi-core microprocessor SoC produced by Xilinx, using a 64 MeV proton beam to measure the SEU susceptibility of the Xilinx Zynq processor sub-system <sup>[53]</sup>. This laid the foundation for later SEE tests on nanoscale COTS SoCs. Giovanni discussed the influence of temperature on the atmospheric-neutron-induced SER of the Xilinx Zynq programmable logic sub-system <sup>[54]</sup>. Other researchers also conducted various SEE tests and analyses on the Xilinx 28nm Zynq-7000 SoC. For example, Lucas performed multiple SEE tests based on various particles and designs <sup>[55-57]</sup>. Specifically, heavy ions and protons were adopted to examine the SEE sensitivity under supply voltage and temperature variations <sup>[55]</sup>, the SEE sensitivities of different memory organizations were compared <sup>[56]</sup>, and the trade-offs between performance and reliability of different HLS-based designs were analyzed <sup>[57]</sup>. Gennaro presented an analysis of traditional fault tolerance on parallel and Linux systems <sup>[58]</sup>. The reliability of a convolutional neural network implementation was discussed in [59]. Fault injections on 13 benchmarks were executed on Gem5 <sup>[60]</sup>. Mehran measured the heavy-ion-induced SEE cross sections of multiple blocks <sup>[61]</sup>. Libano proposed to distinguish critical and tolerable errors in artificial neural networks <sup>[62]</sup>. Mostafa investigated the delay changes of a routing network under heavy ion irradiation <sup>[63]</sup>. Vasileios characterized the SEE vulnerability using very/ultra high-energy heavy ions <sup>[64]</sup>. David estimated the space SER based on proton irradiation <sup>[65]</sup>.

Eduardo proposed software-implemented hardware fault tolerance techniques and verified their performance through simulation and heavy ion irradiation <sup>[66]</sup>. A generic model was presented in [67] to compute the SEU sensitivity of an implementation. A compiler-assisted software fault tolerance tool was developed, and its hardening performance was examined <sup>[68-69]</sup>. A hybrid scrubber was built in software to scrub the configuration in [70]. Adria applied a dual-core lockstep design to mitigate soft errors <sup>[71]</sup>. Igor updated the bitstream-based SEU emulators and proposed a mathematical model <sup>[72-73]</sup>. Aaron presented a novel high-speed configuration scrubbing strategy through the internal processor configuration access port (PCAP) <sup>[74]</sup>. Farah designed a lightweight and fully testable SEU mitigation system to repair flips in the configuration <sup>[75]</sup>. Ludovica reported a self-rerouting and dynamic reconfiguration technique <sup>[76]</sup>.

Apart from the Xilinx 28nm CMOS SoC, SEEs on similar devices from other vendors have also been examined. For instance, SEEs on a Microsemi SoC were evaluated in a neutron beam <sup>[77]</sup>. In [78], the authors investigated how the configuration of the processing system influences the reliability of the SmartFusion2 SoC.

In China, efforts have also been made to study SEE on the Xilinx 28nm CMOS SoC. Besides alpha and proton radiation tests, Du also analyzed the SoC reliability with the probability safety analysis (PSA) method <sup>[79-82]</sup>. In [79], the SEE susceptibilities of seven hardware blocks of the SoC were investigated. In [80], low-energy proton beams were utilized to measure the blocks' SEE vulnerability. At the same time, fault injection and PSA were applied to the SoC sensitivity analysis based on the obtained irradiation results <sup>[81-82]</sup>. Liu observed the SEE sensitivity based on laser irradiation <sup>[83]</sup>. Microbeam irradiation was applied to investigate the SEE-sensitive locations in [84]. Wu analyzed the SEE vulnerability using the Soft Error Mitigation (SEM) IP <sup>[85]</sup>. Cui hardened the SoC against SEU through dual-core mutual-check and recovery mechanisms <sup>[86]</sup>. A direct memory access (DMA) channel-redundant hardening method was proposed to enhance the reliability of DMA against soft errors <sup>[87]</sup>.

In general, this research falls into five categories. The first is the direct SEE sensitivity test on blocks or elements of the SoC under normal conditions using different accelerator irradiations. The second is the SEE test under different operating conditions, for example, different supply voltages or temperatures. The third examines the SEE vulnerability under different application workloads, such as convolutional neural networks. The fourth is software-based fault injection. The last covers SEE mitigation techniques relying on various strategies. Even though these efforts have produced results, they are not comprehensive. For example, the particle energy in the SEE tests is limited, and some proposed measures have only been examined in software. Further system-level SEE research on the 28nm CMOS SoC is therefore necessary.

## 1.2.2 SEE Research on Xilinx 16nm FinFET MPSoC

Compared with the Xilinx 28nm CMOS SoC, the 16nm FinFET MPSoC integrates more components and offers higher performance. The FinFET process differs from the planar CMOS process, and researchers are also interested in how its SEE vulnerability differs from that of the 28nm SoC.

In [88], the first SEE results for the Xilinx 16nm FinFET processor were presented, with SEEs examined using neutrons, 64 MeV protons, and thermal neutrons. In addition, Christian implemented a fault-tolerant MPSoC for small satellites <sup>[89]</sup>. In [90], the SEU reliability of neural networks was investigated, together with mitigation techniques against upsets, for two case studies. Oscar presented a methodology to quantify multiple metrics related to SEE <sup>[91]</sup>. Additionally, three neutron beam tests were performed to characterize the SEE in [92]. David investigated SEE cross sections in proton beams and estimated the SER in space radiation <sup>[93]</sup>. Heavy-ion and neutron induced single event latch-up (SEL) and SEU events were investigated in [94]. Maximilien observed the SEU induced by ultra-high energy heavy ion irradiation <sup>[95]</sup>. Pierre presented a test methodology using the Xilinx system validation tool (SVT) design suite to characterize SEE <sup>[96]</sup>. Philip examined the SEL and SEU susceptibility under proton irradiation <sup>[97]</sup>. The SEU response with the SEM IP was investigated using 64 MeV mono-energetic proton irradiation <sup>[98]</sup>.

Nanoscale COTS SoCs are rather complicated, and they can be applied in diverse circumstances and encounter various SEEs. Although some studies on their SEEs have been performed, many questions remain unsolved, and further efforts are needed.

This study mainly focuses on SEE evaluations on two nanoscale COTS SoCs: the Xilinx 28nm CMOS SoC and 16nm FinFET Ultrascale+ MPSoC. Various irradiation tests, software simulations, fault injections, and analysis methods are adopted.

## 1.3 Layout of the Dissertation

Based on the introduction and the research efforts mentioned above, this dissertation takes two nanoscale COTS SoCs as its study objects and presents SEE evaluations on them using various approaches. According to the research objects and assessment methods, this dissertation is divided into ten chapters, and the main research contents of each chapter are as follows:

The 1<sup>st</sup> chapter is the preface. It introduces the COTS SoC development and application, harsh environment and radiation effects, and SEE research status on nanoscale SoCs.

The 2<sup>nd</sup> chapter is the SoC SEE test methodology. It briefly introduces the two target nanoscale COTS SoCs and the test methodologies used in this dissertation. For the SEE on the Xilinx 28nm CMOS SoC, the adopted methodologies are mainly irradiation tests and Monte Carlo simulations, while for the SEE on the Xilinx 16nm FinFET MPSoC, the research methods primarily involve fault injections and probability safety analyses.

The 3<sup>rd</sup> chapter is the proton SEE on Xilinx 28nm CMOS SoC. It introduces the 70 and 90 MeV proton beams' SEE irradiation tests and the Monte Carlo simulations on the chip.

The 4<sup>th</sup> chapter is the atmospheric neutron SEE on Xilinx 28nm CMOS SoC. It describes multiple SEE irradiation tests on the SoC using the China Spallation Neutron Source and points out the SEE contributions from neutrons in different energy ranges, especially the contributions from 1-10 MeV neutrons and thermal neutrons.

The 5<sup>th</sup> chapter is the multi patterns SEE on Xilinx 28nm CMOS SoC. It implements multiple patterns in the SoC and examines the SEE sensitivities of the different patterns using heavy ion irradiation.

The 6<sup>th</sup> chapter is the single event effect hardening by multi-layer design. It proposes a multi-layer design to mitigate SEE on the Xilinx 28nm CMOS SoC and verifies the performance of the design using proton irradiation.

The 7<sup>th</sup> chapter is the SEM-based FI and FTA on Xilinx 16nm FinFET MPSoC. It covers the fault injections on the MPSoC based on the SEM IP. It also analyzes the fault injection results using fault tree analysis (FTA) and identifies the SEE sensitivity of each tested algorithm and of the SEM subsystem.

The 8<sup>th</sup> chapter is the DPR-based FI and FMEA on Xilinx 16nm FinFET MPSoC. It implements two DPR designs on the MPSoC and performs fault injections in the full and partial bitstreams. The failure modes and effects analysis (FMEA) method is then employed to analyze the results of the DPR fault injection, and the SEE severity levels of modules and errors are analyzed.

The 9<sup>th</sup> chapter is the DR-based FI on DNN in Xilinx 16nm FinFET MPSoC. It implements an open-source DNN on the MPSoC. Then, fault injection based on DR is executed to observe the performance of the DNN. A solution is proposed to improve the performance of DNNs implemented on SRAM-based MPSoCs.

The 10<sup>th</sup> chapter is the conclusions and suggestions. It summarizes the research findings of this dissertation and provides some suggestions for future studies.

# 2 SEE Evaluation on SoCs

As mentioned above, two typical nanoscale COTS SoCs were tested in this study. One is manufactured with 28nm CMOS technology, and the other with 16nm FinFET technology. Targeting SEEs on the two SoCs, various irradiation tests, hardening designs, fault injections, and analysis methodologies are carried out and verified.

# 2.1 SEE Evaluation on Xilinx 28nm CMOS SoC

## 2.1.1 Xilinx Zynq-7000 SoC

The Xilinx Zynq-7000 SoC is an all-programmable SoC architecture. It is built on a state-of-the-art, low-power, high-performance 28nm high-k metal gate (HKMG) CMOS technology. This product series embeds a dual-core ARM® Cortex<sup>TM</sup>-A9 based processing system (PS) and programmable logic (PL) in a single die. Besides the processor cores, the PS also includes the on-chip memory (OCM), data and instruction caches, other memory interfaces, and plenty of peripherals. The PL contains a flexible and scalable FPGA. Various buses provide communication between the PS and PL. The SoC serves applications such as automotive driving, industrial control, smart cameras, medical imaging, and others <sup>[99]</sup>.

Figure 2-1 shows the diagram of the Xilinx 28nm CMOS SoC <sup>[100]</sup>.



Figure 2-1 Diagram of the Xilinx 28nm CMOS SoC<sup>[100]</sup>

Memory blocks, for example, the OCM and Cache in the PS and the block random access memory (BRAM) in the PL, are critical components of the SoC. Their vulnerabilities in different radiation environments significantly influence the reliability of the SoC. For the Xilinx 28nm CMOS SoC, SEEs on memory blocks were evaluated and analyzed under multiple irradiation sources.

## 2.1.2 Tested Block

SEEs on the OCM, D-Cache, BRAM, and other memory blocks were studied under different conditions. When other blocks are tested, the D-Cache is disabled. In some irradiations, several blocks are tested with one irradiation source. In others, limited by the available accelerator hours, only a single block is tested.

#### 1) OCM Block

The OCM block contains 256 KB of random access memory (RAM) and 128 KB of read-only memory (ROM), the BootROM. It supports two 64-bit advanced extensible interface (AXI) slave ports. One is dedicated to central processing unit (CPU) access through the application processing unit (APU) snoop control unit (SCU), and the other is shared by the other masters within the PS and PL. The BootROM is used by the boot process and is not visible to users <sup>[99]</sup>.

During the irradiation, the SEU examination on the OCM is executed as follows. First, a data pattern, for instance, 0xA5A5A5A5, 0x5A5A5A5A, 0xFFFFFFFF, or others, is written into the target range, and then the data are read back. Finally, the processor compares the read-back data with the expected pattern to determine whether an SEU has occurred.
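The write, read-back, and compare procedure can be sketched as follows. This is a host-side illustration under stated assumptions, not the firmware used in the tests: the function name, the 32-bit word width, and the per-word single/multi-bit bookkeeping are hypothetical choices made for the example.

```python
from typing import Dict, Iterable


def compare_readback(expected: int, readback: Iterable[int]) -> Dict[str, int]:
    """Compare 32-bit read-back words with the written data pattern."""
    stats = {"flipped_bits": 0, "single_bit_words": 0, "multi_bit_words": 0}
    for word in readback:
        # XOR leaves a 1 at every bit position that differs from the pattern.
        diff = (word ^ expected) & 0xFFFFFFFF
        flips = bin(diff).count("1")
        stats["flipped_bits"] += flips
        if flips == 1:
            stats["single_bit_words"] += 1
        elif flips > 1:
            stats["multi_bit_words"] += 1
    return stats


# Example: one clean word, one word with one flip, one word with two flips.
result = compare_readback(0xA5A5A5A5, [0xA5A5A5A5, 0xA5A5A5A4, 0xA5A5A5A6])
```

Counting flips per word gives a first indication of whether an upset affected one bit or several bits of the same word. Attributing upsets to physically adjacent cells would additionally require the device's physical address mapping, which this sketch does not model.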

#### 2) D-Cache Block

Each Cortex-A9 processor has separate 32 KB L1 instruction and data caches. The L1 data cache (D-Cache) holds the data that the Cortex-A9 processor uses. Key features of the L1 D-Cache include physical indexing and tagging, two 32-byte line-fill buffers, and one 32-byte eviction buffer <sup>[99]</sup>.

A data pattern, for instance, 0xA5A5A5A5, 0x5A5A5A5A, 0xFFFFFFFF, or others, is written into the target range of the D-Cache. After operations such as flushing, writing, and invalidating the corresponding ranges, the host determines whether SEEs emerged during the irradiation.

#### 3) BRAM Block

The BRAM is an important part of the PL. Each block RAM stores up to 36 Kb of data and can be configured as either two independent 18 Kb RAMs or a single 36 Kb RAM. Writing and reading are synchronous operations for the BRAM <sup>[99]</sup>.

The PS is responsible for writing and reading the BRAM data via the AXI interface. During the irradiation, a data pattern, for instance, 0xA5A5A5A5, 0x5A5A5A5A, 0xFFFFFFFF, or others, is written into the target range, and then the data are read back. Finally, the processor compares the read-back data with the expected pattern to determine whether an SEU has occurred.

## 2.1.3 Test System

The test system is composed of the host and the tested device. The host is in charge of sending instructions and recording messages in real time in a terminal. The tested device is the Xilinx 28nm CMOS SoC, which is irradiated by particles during the test, and the messages displayed on the terminal indicate the SEE occurrences on the SoC. A fiber universal serial bus (USB) cable provides communication between the host and the SoC through the device's universal asynchronous receiver-transmitter (UART) interface. UART communication usually requires the following settings: communication port, baud rate, parity bit, data width, and stop bit. Figure 2-2 shows the simplified architecture of the test system.



Figure 2-2 The architecture of the test system. Tested blocks are visible in the device block

## 2.1.4 SEE Test Facilities in China

Irradiation testing is highly effective in checking the SoC's SEE vulnerabilities. To date, diverse advanced particle accelerators are available in China, providing heavy ion, proton, atmospheric neutron, and electron beams. Some of them have served in irradiation tests of electronic systems for decades. For example, the heavy ion 13 (HI-13) accelerator at the National Innovation Center of Radiation Application (NICRA), China Institute of Atomic Energy (CIAE), was commissioned in the 1980s <sup>[101]</sup>. Several particle parameters of the HI-13 are shown in Table 2-1. The Heavy Ion Research Facility in Lanzhou (HIRFL) is another crucial accelerator from the same period <sup>[102]</sup>. In addition, some facilities have been put into service in recent years, such as the China Spallation Neutron Source (CSNS) and the China Institute of Atomic Energy 100 MeV proton cyclotron (CYCIAE-100) <sup>[103-104]</sup>. These facilities provide accelerator beams for SEE tests in China, and they are utilized in the SEE evaluations of the Xilinx 28nm CMOS SoC.

| Particle         | Energy/MeV | Surface LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Range in silicon/µm |
|------------------|------------|------------------------------------------------|---------------------|
| <sup>1</sup>H    | 23         | 0.018                                          | 3060                |
| <sup>12</sup>C   | 80         | 1.73                                           | 127.1               |
| <sup>16</sup>O   | 104        | 3.03                                           | 100.8               |
| <sup>19</sup>F   | 104        | 4.33                                           | 76.6                |
| <sup>28</sup>Si  | 126        | 9.6                                            | 46.6                |
| <sup>35</sup>Cl  | 138        | 13.9                                           | 38.9                |
| <sup>48</sup>Ti  | 149        | 22.6                                           | 30.8                |
| <sup>63</sup>Cu  | 161        | 33.4                                           | 26.4                |
| <sup>79</sup>Br  | 210        | 42                                             | 29.4                |
| <sup>127</sup>I  | 238        | 62.8                                           | 27                  |

Table 2-1 The frequently used particles in the HI-13<sup>[105]</sup>

## 2.1.5 Monte Carlo Simulation

Monte Carlo simulation helps to further analyze the SEEs observed during irradiation tests. In particular, physics-based Monte Carlo simulation tools have gained much interest and popularity in radiation effects simulation. Currently, software tools such as Geant4, CREME96, and others are widely used in SEE simulations and evaluations of devices. These simulations usually involve the devices' sensitive volumes and critical charges <sup>[106-108]</sup>. Figure 2-3 shows the SEE simulation workflow in these tools.



Figure 2-3 Monte Carlo simulation workflow

The vertical structure information of the Xilinx 28nm CMOS SoC is extracted to construct the simulation model. Figure 2-4 displays a cross-section photo of its vertical structure and the detail of each layer.



Figure 2-4 Photo of the extracted vertical cross section of the Xilinx Zynq-7000 SoC

## 2.1.6 SEE Hardening

The SEE vulnerability of each tested block is obtained from the irradiation tests. In some missions, the raw device's reliability cannot meet the requirements, and SEE hardening is necessary depending on the mission duration and operating environment. For an already manufactured nanoscale COTS SoC, it is infeasible to mitigate SEE via radiation hardening by process (RHBP) <sup>[109]</sup>. That is why more efforts focus on system-level hardening techniques for SoC SEEs.

SEE hardening usually depends on the concept of redundancy, which can be achieved in hardware, software, information, or time <sup>[110]</sup>. Hardware redundancy incorporates replicated hardware or designs. Software redundancy is reached via checkpoint and recovery or other techniques. Information redundancy is implemented by operations such as error detection and correction (EDAC) and cyclic redundancy check (CRC). Time redundancy repeats the execution of the same program <sup>[111]</sup>. Figure 2-5 presents the common triple modular redundancy (TMR), taking the OCM in the SoC as an example. The TMR technique consumes a large amount of resources, so some optimized TMR designs have also been introduced. In [112], Luis proposed an automated design flow for implementing TMR. A partial TMR was designed in [113].



Figure 2-5 The architecture of the triple modular redundancy implementation on OCM
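The voting step at the heart of TMR can be sketched in a few lines. The example below is a generic bitwise two-out-of-three majority voter over three redundant copies of a word. It is an illustration only and does not reproduce the OCM implementation of Figure 2-5. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_read(copies: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the voted value and the indices of copies that disagree with it."""
    if len(copies) != 3:
        raise ValueError("TMR expects exactly three redundant copies")
    voted = majority_vote(*copies)
    faulty = [i for i, value in enumerate(copies) if value != voted]
    return voted, faulty


# Example: the second copy holds a single flipped bit and is outvoted.
voted, faulty = tmr_read([0xA5A5A5A5, 0xA5A5A5A4, 0xA5A5A5A5])
```

Because the vote is taken bit by bit, upsets in different bit positions of different copies are still masked. The vote fails only when two copies are corrupted in the same bit position.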

The Xilinx 28nm CMOS SoC integrates dual ARM cores and an FPGA inside one chip, and this characteristic provides more solutions for SoC hardening. For example, hardening solutions can rely on dual-core lockstep (DCLS) or on the symmetric or asymmetric multiprocessing (SMP or AMP) mode. In [114], the DCLS was applied in the SoC, and heavy ion irradiation and fault injection were conducted to check the hardening performance. In [115], a novel triple-core lockstep (TCLS) approach was presented, incorporating software-level mitigation measures.

In general, in this dissertation, multiple irradiation tests involving protons, neutrons, and heavy ions are performed on the Xilinx 28nm CMOS SoC to evaluate SEEs in different radiation environments. Monte Carlo simulations then provide more detail for analyzing the phenomena observed in the irradiations. In addition, an AMP-based hardening solution is proposed and verified. Figure 2-6 outlines the SEE studies on the Xilinx 28nm CMOS SoC in this work. For each kind of particle, both an accelerator irradiation test and a Monte Carlo simulation are performed. Moreover, the SEE hardening performance is examined using medium-energy proton irradiation, and the failure rate in space is estimated from the heavy ion irradiation results.



Figure 2-6 The SEE study schematic of this work on Xilinx 28nm CMOS SoC

The SEE evaluation on the Xilinx 28nm CMOS SoC mainly focuses on the PS part and on irradiation tests. In contrast, for the Xilinx 16nm FinFET MPSoC, SEEs are evaluated and analyzed on several specific applications primarily implemented in the PL section, using fault injection and probability safety analysis methods.

## 2.2 SEE Evaluation on Xilinx 16nm FinFET MPSoC

The Xilinx Zynq Ultrascale+ MPSoC is the first application-specific integrated circuit class programmable architecture that enables multi-hundred gigabit per second levels of system performance with smart processing while efficiently routing and processing data on chip <sup>[116]</sup>. The MPSoC is manufactured with the high-performance 16nm FinFET+ technology, which delivers a 2X improvement in performance per watt compared with planar devices <sup>[117]</sup>. Moreover, the Xilinx 16nm FinFET MPSoC extends the processor scalability from 32 to 64 bits. A 64-bit quad-core ARM® Cortex®-A53 processor, a 32-bit dual-core ARM Cortex-R5 real-time processor, and an ARM® Mali<sup>TM</sup>-400MP are integrated in the processing system. It strongly supports hardware virtualization, graphics acceleration, video processing, waveform and packet processing, advanced power management, etc. <sup>[117]</sup>. Figure 2-7 displays the diagram of the Xilinx Zynq Ultrascale+ MPSoC.



Figure 2-7 Diagram of the Xilinx 16nm FinFET MPSoC<sup>[100]</sup>

## 2.2.1 Test Benchmarks

Image processing algorithms, such as edge processing and deep neural networks (DNN), are widely used in advanced SoC applications <sup>[118-120]</sup>. Their reliability is critical in some applications, for instance, self-driving. To explore the SEE sensitivities of the Xilinx 16nm FinFET MPSoC in these applications, multiple image processing test benchmarks are designed, and fault injections are performed.

The image processing involves multiple algorithms, such as Histogram, Stretch, Sobel, Gaussian, etc. The DNN algorithm performs handwritten digit identification. All of them are implemented in the PL part of the Xilinx 16nm FinFET MPSoC. Each algorithm design can either be packaged as an IP and introduced into the block design or be added directly to the block design as a separate block. Vivado and Vitis 2019.2 are employed in the designs <sup>[121]</sup>.

In Vivado 2019.2, the block design is built first. After operations such as synthesis and implementation, the bitstream files are generated. The bitstreams and related files are the basis of the fault injections. In Vitis 2019.2, the programs are created, and the necessary settings are made. For instance, in the SEM-based fault injection, the PL configuration logic interface must be transferred to the internal configuration access port (ICAP), which is achieved by clearing the configuration security unit (CSU) pcap_ctrl register.

## 2.2.2 Fault Injection Implementations

Fault injection (FI) is an effective and feasible way to explore the MPSoC's reliability. FI platforms can be built in several ways, ranging from hardware-based to software-based <sup>[122]</sup>. It should be pointed out that the FI-based vulnerability analysis of a design comes before the expensive accelerator irradiation, both in SEE assessments and in checking the performance of hardening solutions.

The application and configuration layers are two abstractions of an FPGA, as displayed in Figure 2-8. The IO block (IOB), BRAM, DSP, configurable logic block (CLB), switch box (SB), and others are the hardware resources in the application layer <sup>[122]</sup>. The configuration layer contains the configuration memory, in which different frames correspond to different hardware resources. If an energetic particle strikes the bitstream stored in the configuration memory, the bit information may be flipped, and the function of the design may fail. Modifying the bit information in the configuration random access memory (CRAM) mimics an SEE in the bitstream and makes it possible to investigate the consequences of SEEs in the CRAM.



Figure 2-8 Abstracted layers in FPGA, including the application and configuration layers
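The idea of mimicking an upset by modifying a configuration bit can be sketched on a software model of one frame. The example below is a minimal illustration, not the injection mechanism used in this work: it models a frame as a list of words, takes the 93 words per frame listed for the target device family, and assumes 32-bit words. The function name is hypothetical.

```python
from typing import List

WORDS_PER_FRAME = 93   # words per frame for the target device family
BITS_PER_WORD = 32     # assumed word width


def flip_bit(frame: List[int], word_index: int, bit_index: int) -> int:
    """Invert one bit of a frame in place and return the modified word."""
    if not 0 <= word_index < len(frame):
        raise IndexError("word index outside the frame")
    if not 0 <= bit_index < BITS_PER_WORD:
        raise ValueError("bit index outside the word")
    frame[word_index] ^= 1 << bit_index
    return frame[word_index]


# Example: inject an upset into bit 3 of word 5, then repair it by flipping again.
frame = [0] * WORDS_PER_FRAME
corrupted = flip_bit(frame, 5, 3)
repaired = flip_bit(frame, 5, 3)
```

Since the XOR is its own inverse, the same call that injects the fault also restores the original bit, which is convenient when a campaign must return the design to a known state between injections.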

Bitstream lengths vary among devices of different series. Table 2-2 lists the CRAM configuration parameters of several devices. The shortest bitstream length is 44,549,344 bits, which means that FI on the entire bitstream is extremely time-consuming. Therefore, fault injection is usually executed at random locations.

| Device | Configuration<br>Bitstream Length/bit | Configuration<br>Frames | Words per Frame |
|--------|---------------------------------------|-------------------------|-----------------|
| XCZU2  | 44,549,344                            | 14,964                  | 93              |
| XCZU3  | 44,549,344                            | 14,964                  | 93              |
| XCZU4  | 61,269,888                            | 20,956                  | 93              |
| XCZU5  | 61,269,888                            | 20,956                  | 93              |
| XCZU6  | 212,086,240                           | 71,260                  | 93              |
| XCZU7  | 154,488,736                           | 51,906                  | 93              |
| XCZU9  | 212,086,240                           | 71,260                  | 93              |

Table 2-2 Configuration bitstream parameters of different devices [123]
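A rough estimate shows why exhaustive injection is impractical and how a random campaign can be drawn instead. The sketch below treats the bitstream as a flat range of bit offsets and assumes a hypothetical cost of one second per injection. Both are simplifying assumptions for illustration, and the function names are not part of any tool.

```python
import random
from typing import List

SECONDS_PER_DAY = 86400.0


def exhaustive_campaign_days(bitstream_bits: int, seconds_per_injection: float) -> float:
    """Days needed to inject a fault into every bit once."""
    return bitstream_bits * seconds_per_injection / SECONDS_PER_DAY


def sample_bit_offsets(bitstream_bits: int, count: int, seed: int = 0) -> List[int]:
    """Draw `count` distinct bit offsets uniformly at random (reproducible via seed)."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(bitstream_bits), count))


# Shortest bitstream in Table 2-2 at an assumed 1 s per injection: about 516 days.
days = exhaustive_campaign_days(44_549_344, 1.0)
targets = sample_bit_offsets(44_549_344, 1000)
```

Fixing the seed makes a random campaign repeatable, so that the same set of injection targets can be replayed on a hardened version of the design.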

To keep random fault injection efficient, the essential bits of the target design are needed. Essential bits are the bits that are essential to the proper operation of a specific design loaded into the device <sup>[124]</sup>. If an energetic particle corrupts an essential bit, the design on the FPGA may malfunction. The essential bit files can be generated using the following property setting in the Vivado design <sup>[124]</sup>.

set\_property bitstream.seu.essentialbits yes [current\_design]

The essential bits data (EBD) and essential bits configuration (EBC) files are created with the above property setting, and they are the files related to the essential bits. A '1' in the EBD file signifies that the corresponding location is essential, and the bit at the same location in the EBC file gives the essential bit information. The fault injection script can be created from the EBD file in different ways. For the target device in this work, excluding the header messages, the pad length of the EBD and EBC files is 118 words. Figure 2-9 (a) and (b) show screenshots of the EBD and EBC files of a design.

| Line | xx.ebd                                  | Line | xx.ebc                                  |
|------|-----------------------------------------|------|-----------------------------------------|
| 206  | 010010001010010000000000000000000000000 | 206  | 000000000000000000000000000000000000000 |
| 207  | 000000100001000000000000000000000000000 | 207  | 000000000000000000000000000000000000000 |
| 208  | 000000000010000100100010100100          | 208  | 000000000000000000000000000000000000000 |
| 209  | 010010001010010000000000000000000000000 | 209  | 000000000000000000000000000000000000000 |
| 210  | 000000100001000000000000000000000000000 | 210  | 000000000000000000000000000000000000000 |
| 211  | 0000000000100000100100010100100         | 211  | 000000000000000000000000000000000000000 |
| 212  | 000000000000000000000000000000000000000 | 212  | 000000000000000000000000000000000000000 |
| 213  | 000000000000000000000000000000000000000 | 213  | 000000000000000000000000000000000000000 |
| 214  | 000000000000000000000000000000000000000 | 214  | 000000000000000000000000000000000000000 |
| 215  | 10001000001001000010001100001000        | 215  | 000000000100000000000000000000000000000 |
| 216  | 10100011001010000000010000000000        | 216  | 000000000000000000000000000000000000000 |
| 217  | 00000100001000101000100000101100        | 217  | 000000000000000000000000000000000000000 |

(a) EBD file screenshot (b) EBC file screenshot

Figure 2-9 Screenshot from the EBD and EBC file
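How an injection list could be derived from an EBD file can be illustrated with a minimal Python sketch. It assumes that data rows consist only of the characters '0' and '1' and that every other line is header text; the function name `essential_bit_positions` and the header string in the sample are hypothetical. The sketch does not skip the pad words and does not decide whether the first column is the most or least significant bit, both of which depend on the target device.

```python
from typing import Iterable, Iterator, Tuple


def essential_bit_positions(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (word_number, column) for every '1' found in EBD-style text rows."""
    word_number = 0
    for line in lines:
        text = line.strip()
        # Assumption: a data row contains only '0' and '1'; anything else is header text.
        if not text or set(text) - {"0", "1"}:
            continue
        for column, char in enumerate(text):
            if char == "1":
                yield word_number, column
        word_number += 1


sample = [
    "hypothetical header line",              # placeholder, not the real EBD header
    "00000000000000000000000000000000",      # a row without essential bits
    "10001000001001000010001100001000",      # row 215 of Figure 2-9 (a)
]
positions = list(essential_bit_positions(sample))
print(len(positions), positions[:3])
```

Each returned pair can then be translated into an injection address by whatever addressing scheme the chosen injection method uses.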

The SEM controller and dynamic partial reconfiguration (DPR) are two solutions among various injection ways. They are employed in this study to execute FI on Xilinx 16nm FinFET MPSoC.

a) SEM-based Fault Injection

Fault injection using the SEM IP is a common approach. The SEM IP provided by Xilinx is capable of injecting and mitigating errors in bitstreams <sup>[125]</sup>. Figure 2-10 depicts the SEM block in Vivado. Based on the essential bit script, the SEM IP can inject single-bit or multi-bit errors into a frame via the ICAP. It can also mitigate SEE in cooperation with techniques such as error-correcting code (ECC) or cyclic redundancy check (CRC). In total, the SEM controller supports the following six modes <sup>[124]</sup>. Since the current study requires both injection and mitigation, the mitigation and testing mode is chosen.

- Mitigation and Testing
- Mitigation only
- Detect and Testing
- Detect only
- Emulation
- Monitoring only



Figure 2-10 The SEM block in Vivado

During SEM fault injection, each essential bit is located in its frame, the smallest addressable unit of the bitstream, by a linear frame address (LFA). The LFA is 44 bits long for the target device, and its format is shown in Table 2-3.

Table 2-3 The format of LFA in Xilinx Zynq Ultrascale+ MPSoC

| Bit     | 43-40 | 39-36 | 35-32 | 31-30 | 29-12              | 11-5       | 4-0       |
|---------|-------|-------|-------|-------|--------------------|------------|-----------|
| Content | C     | 0     | 0     | 0     | Linear frame index | Word index | Bit index |
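The field layout of Table 2-3 can be illustrated with a short Python sketch that packs a frame, word, and bit index into a 44-bit LFA. The field widths follow the table; reading the entry "C" as the hexadecimal nibble 0xC is an assumption, and the names `pack_lfa` and `unpack_lfa` are hypothetical helpers rather than part of any Xilinx tool.

```python
# Field layout from Table 2-3: (name, most significant bit, least significant bit).
LFA_FIELDS = [
    ("prefix", 43, 40),      # "C" in the table, read here as hex 0xC (assumption)
    ("zero_a", 39, 36),
    ("zero_b", 35, 32),
    ("zero_c", 31, 30),
    ("frame", 29, 12),       # linear frame index
    ("word", 11, 5),         # word index inside the frame
    ("bit", 4, 0),           # bit index inside the word
]


def pack_lfa(frame: int, word: int, bit: int) -> int:
    """Assemble a 44-bit linear frame address from its fields."""
    values = {"prefix": 0xC, "zero_a": 0, "zero_b": 0, "zero_c": 0,
              "frame": frame, "word": word, "bit": bit}
    address = 0
    for name, msb, lsb in LFA_FIELDS:
        width = msb - lsb + 1
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name}={values[name]} does not fit in {width} bits")
        address |= values[name] << lsb
    return address


def unpack_lfa(address: int) -> dict:
    """Split a 44-bit linear frame address back into its fields."""
    return {name: (address >> lsb) & ((1 << (msb - lsb + 1)) - 1)
            for name, msb, lsb in LFA_FIELDS}


lfa = pack_lfa(frame=1, word=2, bit=3)
print(f"{lfa:011X}")  # 44 bits correspond to 11 hexadecimal digits
```

Checking each field against its width before packing prevents an out-of-range word or bit index from silently spilling into the neighbouring field.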

Even though the SEM IP is a low-cost solution to emulate and mitigate SEE in a design, it fails if the fault is injected into a frame that corresponds to the SEM IP itself. Hence, other methods are also adopted for fault injection, for instance, DPR and dynamic reconfiguration (DR).

b) DPR-based Fault Injection

The SRAM-based FPGA can be reprogrammed fully or partially many times. In particular, with DPR a region inside the FPGA can be flexibly reconfigured without disturbing the operation of the rest of the design <sup>[126-127]</sup>. This provides a way to perform fault injection based on DPR.

In DPR, a reconfigurable block is designed in a reprogrammable region (RR), and the full and partial bitstreams are then generated. The generated bitstreams are stored on a secure digital (SD) card and transferred to the double data rate (DDR) memory; via the PCAP, they are loaded into the CRAM to implement the functions. Before a bitstream is loaded from DDR to CRAM, a fault can be injected by modifying the target bit.
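The modification step described above amounts to inverting one bit of the bitstream buffer before it is transferred. A minimal Python sketch of that step is given below; it only models the buffer held in memory, the function name `flip_bit` is hypothetical, and the four sample bytes are placeholders rather than real bitstream content.

```python
def flip_bit(bitstream: bytearray, byte_offset: int, bit_in_byte: int) -> None:
    """Invert a single bit of the buffer in place (emulating one CRAM upset)."""
    if not 0 <= bit_in_byte < 8:
        raise ValueError("bit_in_byte must be between 0 and 7")
    bitstream[byte_offset] ^= 1 << bit_in_byte


golden = bytes([0x00, 0xFF, 0xA5, 0x5A])   # placeholder bytes, not a real bitstream
faulty = bytearray(golden)                 # work on a copy so the golden image survives
flip_bit(faulty, byte_offset=2, bit_in_byte=0)
print(golden.hex(), faulty.hex())
```

Keeping the golden copy untouched makes it possible to restore the design after each injection by applying the same flip again or by reloading the original buffer.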

Unlike SEM-based fault injection, DPR fault injection does not require an extra block design. More importantly, it can reveal failures caused by upsets in either the full or the partial bitstream. Moreover, Xilinx provides library functions for transferring bitstreams through the PCAP.

Compared with DPR, DR does not need to create the RR or generate partial bitstreams, so it resembles a subset of the DPR procedure. Fault injection based on DR is similar to injecting faults into the full bitstream in DPR.

## 2.2.3 PSA Analysis

The fault injection campaign investigates the SEEs that arise when CRAM bits flip. In SEM-based fault injection, the injected bit may cause a failure of the SEM IP or of the customized design. In DPR-based fault injection, the effects of injections into the full or partial bitstreams can be observed. Similarly, in DR-based fault injection, the SEE influence on the design can be observed directly. To analyze the obtained results further, PSA is adopted in some fault injections.

Probabilistic safety analysis (PSA) is an effective approach to evaluating the reliability and safety of complex systems. It is extensively applied to nuclear power plants and spacecraft <sup>[128-129]</sup>. Event tree analysis (ETA), failure mode and effects analysis (FMEA), and fault tree analysis (FTA) are standard, commonly employed methods in PSA <sup>[130]</sup>. In this work, FTA and FMEA are used to analyze the obtained results.

For the Xilinx 16nm FinFET MPSoC, several test benchmarks are designed in this work. Fault injection is then executed with SEM, DPR, and DR, respectively. The probabilistic safety analysis method is applied to analyze some of the fault injection results. In addition, a solution to improve DNN performance based on DR fault injection is proposed. Figure 2-11 presents the research architecture for the Xilinx 16nm FinFET MPSoC.



Figure 2-11 The research architecture on Xilinx 16nm FinFET MPSoC

On the whole, this study focuses on SEE assessments of the Xilinx 28nm CMOS SoC and 16nm FinFET MPSoC. Proton, neutron, and heavy ion irradiations, as well as different kinds of fault injection, are employed to evaluate SEE vulnerability under different conditions. Figure 2-12 describes the key research roadmap of the dissertation.




Figure 2-12 The key research roadmap of the dissertation

# 2.3 Summary

This chapter briefly introduced the research outline of SEE evaluation on two series of SoCs: Xilinx Zynq-7000 SoC (28nm CMOS SoC) and Xilinx Zynq Ultrascale+ MPSoC (16nm FinFET MPSoC). For each SoC, the test and evaluation methodologies were introduced. The SEE evaluations for the Xilinx 28nm CMOS SoC mainly involve irradiation tests and Monte Carlo simulation, while those for the Xilinx 16nm FinFET MPSoC primarily involve fault injection and PSA.

# 3 Proton SEE on Xilinx 28nm CMOS SoC

Energetic protons dominate cosmic rays. As semiconductor technology scales, nanoscale aerospace electronic systems inevitably suffer SEE caused by energetic protons. Therefore, evaluating SEE induced by energetic protons becomes necessary. This chapter investigates SEEs induced by 70 and 90 MeV protons on the Xilinx 28nm CMOS SoC at the CY CIAE-100 platform, the first medium energy proton SEE test terminal in China <sup>[104]</sup>. Geant4 and CREME-MC Monte Carlo simulations are also employed to further analyze the mechanisms of the observed SEE.

## 3.1 SEE Induced by Proton

Protons induce SEE through two mechanisms: direct ionization and nuclear reaction. Recent research indicates that low energy protons can induce SEE in nanoscale devices by direct ionization <sup>[131-134]</sup>, as the linear energy transfer (LET) of low energy protons is greater than the devices' LET thresholds. Figure 3-1 shows the proton LET obtained from the Stopping and Range of Ions in Matter (SRIM) code <sup>[135]</sup>. As Figure 3-1 shows, the LET of medium energy (20-100 MeV) protons is approximately two orders of magnitude lower than that of low energy protons. This means that medium energy protons cannot cause SEE through direct ionization and do so only through nuclear reactions.



Figure 3-1 The LET of different energy protons

## 3.2 Proton Irradiation Setup

## 3.2.1 Proton Beam Terminal

The proton beam is produced by the cyclotron accelerator and then extracted from the tube. Before the beam hits the device under test (DUT), the particle energy is adjusted by the energy-depletion film (EDF), and the collimator regulates the beam collimation and spot area. The proton flux is monitored by the secondary-electron emission monitor (SEEM) and the Faraday tube (FT). Figure 3-2 (a) shows the terminal layout of the CY CIAE-100 platform <sup>[104]</sup>, and Figure 3-2 (b) is a photo of the test site.



Figure 3-2 Terminal layout of the CY CIAE-100 platform

The DUT is mounted on the platform sample holder, and the holder can move in different directions: up, down, left, and right. During irradiation, different regions of the DUT can be irradiated by moving the holder directly without disturbing the beam spot. The available beam size spans from  $1 \times 1$  to  $10 \times 10$  cm<sup>2</sup>. The proton energy varies from 30 to 100 MeV, and the flux is adjustable in the range of  $10^5$  to  $10^{12}$  p·cm<sup>-2</sup>·s<sup>-1</sup>.

## 3.2.2 Test Layout

The irradiation campaign involves three rooms: the control room, the SEE room, and the irradiation room. The programmable power supply in the control room remotely powers the test board, and the board current can be read from it. The host computer in the control room also communicates remotely with the DUT. The platform holding the DUT stands in the irradiation room, and the SEE room between them provides the transit connection. Figure 3-3 presents the schematic of the test layout.



Figure 3-3 The schematic of the irradiation connection

## 3.2.3 Test Implementation

The DUT is the Xilinx 28nm CMOS SoC, and the tested block is the OCM. The chip size is about 1.8 cm×1.8 cm, and the beam covers the chip during the irradiation. Two test boards are mounted on the holder: one is irradiated with the 90 MeV proton beam and the other with the 70 MeV proton beam. 64 KB of OCM is tested dynamically with the check pattern 0xA5A5A5A5. Information including the flipped addresses and bits is recorded in real time once a SEE event is detected.

Table 3-1 The parameters of two irradiations

| Proton Energy/MeV | Flux/p·cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/p·cm<sup>-2</sup> |
|-------------------|---------------------------------------|---------------------------|
| 90                | $1.3 \times 10^{8}$                   | $1.0 \times 10^{11}$      |
| 70                | $2.3 \times 10^{8}$                   | $1.0 \times 10^{11}$      |

Table 3-1 lists the flux and fluence of both irradiations. The cumulative fluence is $1.0 \times 10^{11}$ cm<sup>-2</sup> in both tests, although the fluxes differ slightly.

## 3.3 Irradiation Results and Analysis

#### 3.3.1 Proton Irradiation Results

During both irradiations, SEU and SEFI events are detected. In the 90 MeV irradiation test, 143 SEU and 7 SEFI events are observed; in the 70 MeV irradiation test, 118 SEU and 6 SEFI events are observed. The SEUs comprise SBU, DCU, and MCU (more than two upset cells) events.

Besides the SBU and DCU events, the MCUs in the 90 MeV irradiation test include 3, 4, 5, and 9 cell upsets. Table 3-2 presents the details. SBU dominates the events. Although the 4, 5, and 9 cell upsets each occur only once, their occurrence, especially that of the nine-cell upset, illustrates that the device is vulnerable to medium energy protons.

Table 3-2 Upset events in the 90 MeV proton irradiation

|       | SBU | DCU | 3 cell upsets | 4 cell upsets | 5 cell upsets | 9 cell upsets |
|-------|-----|-----|---------------|---------------|---------------|---------------|
| Total | 102 | 27  | 11            | 1             | 1             | 1             |
| 0→1   | 54  | 12  | 4             | 0             | 1             | 1             |
| 1→0   | 48  | 15  | 7             | 1             | 0             | 0             |

The SEEs of the 70 MeV irradiation test are shown in Table 3-3. Again, the majority of the events are SBU. Unlike the 90 MeV irradiation test, the largest MCU in the 70 MeV test is the 4 cell upset, which occurs more than once.

Table 3-3 SEE in the 70 MeV proton irradiation

|       | SBU | DCU | 3 cell upsets | 4 cell upsets |
|-------|-----|-----|---------------|---------------|
| Total | 88  | 18  | 8             | 3             |
| 0→1   | 46  | 10  | 4             | 1             |
| 1→0   | 42  | 8   | 4             | 2             |

Both tables also present the numbers of $0 \rightarrow 1$ and $1 \rightarrow 0$ upsets for each kind of SEE. Apart from SBU, DCU events are more numerous than the other MCUs in both irradiations. Therefore, the $0 \rightarrow 1$ and $1 \rightarrow 0$ upset ratios of SBU and DCU in both irradiations are compared in Table 3-4. Since SBU events are more numerous than DCU events, the SBU ratios of $0 \rightarrow 1$ and $1 \rightarrow 0$ upsets are closer to 50%. This is consistent with the expectation that the two upset directions are almost equally likely for the unhardened nanoscale COTS SoC, since the SRAM cell is a symmetric six-transistor structure <sup>[136]</sup>.

Table 3-4 Ratios of $0 \rightarrow 1$ and $1 \rightarrow 0$ upset for SBU and DCU

| Upset event | $0 \rightarrow 1$, 90 MeV | $0 \rightarrow 1$, 70 MeV | $1 \rightarrow 0$, 90 MeV | $1 \rightarrow 0$, 70 MeV |
|-------------|---------------------------|---------------------------|---------------------------|---------------------------|
| SBU         | 52.90%                    | 52.20%                    | 47.10%                    | 47.80%                    |
| DCU         | 44.40%                    | 55.60%                    | 55.60%                    | 44.40%                    |
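The ratios in Table 3-4 follow directly from the event counts in Tables 3-2 and 3-3. The short Python sketch below recomputes them; the dictionary layout and the function name `upset_ratios` are illustrative choices, and the results agree with the tabulated percentages to within rounding.

```python
# (0->1 count, 1->0 count) taken from Tables 3-2 and 3-3.
COUNTS = {
    "90 MeV": {"SBU": (54, 48), "DCU": (12, 15)},
    "70 MeV": {"SBU": (46, 42), "DCU": (10, 8)},
}


def upset_ratios(zero_to_one: int, one_to_zero: int) -> tuple:
    """Return the fractions of 0->1 and 1->0 upsets among all upsets of one type."""
    total = zero_to_one + one_to_zero
    return zero_to_one / total, one_to_zero / total


ratios = {
    (energy, event): upset_ratios(*pair)
    for energy, events in COUNTS.items()
    for event, pair in events.items()
}

for (energy, event), (up, down) in ratios.items():
    print(f"{energy} {event}: 0->1 {up:.1%}, 1->0 {down:.1%}")
```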

The cross section is an important metric for assessing the SEE vulnerability of the target device. It is expressed per bit or per device: the bit cross section is calculated with formula (3-1), and the device cross section with formula (3-2).

$$\sigma_b = \frac{n}{\Phi \times N_b} \tag{3-1}$$

$$\sigma_d = \frac{n}{\Phi} \tag{3-2}$$

where $\sigma_b$--bit cross section/cm<sup>2</sup>·bit<sup>-1</sup>; *n*--SEE number; $\Phi$--cumulative fluence/cm<sup>-2</sup>; $N_b$--total tested bits; $\sigma_d$--device cross section/cm<sup>2</sup>.

The SBU event numbers of the 70 and 90 MeV proton irradiation tests are 88 and 102, respectively, and the number of tested bits is 524288. Since the number of SEE is small compared with the number of tested bits, formula (3-3) is adopted to calculate the error of the bit cross section <sup>[50]</sup>. Likewise, the SEE number is small compared with the fluence in terms of the device cross section, and its error is obtained with formula (3-4) <sup>[137-138]</sup>. Based on (3-4), the bit cross section deviation can alternatively be obtained with (3-5).

$$\xi_b = \phi^{-1}\left(\frac{\alpha}{2}\right) \times \frac{1}{\sqrt{n}} \times \sqrt{\frac{N_b - n}{N_b - 1}} \tag{3-3}$$

where $\xi_b$--error in bit cross section; $\phi^{-1}(\frac{\alpha}{2})$--inverse cumulative standard normal distribution function; *n*--the number of detected SEE; $N_b$--the number of tested bits.

$$\xi_d = \frac{\sqrt{n}}{\Phi} \tag{3-4}$$

$$\xi_{db} = \frac{\sqrt{n}}{\Phi \times N_b} \tag{3-5}$$

where $\xi_d$--error in device cross section; *n*--number of detected SEE; $\Phi$--fluence value/cm<sup>-2</sup>; $\xi_{db}$--bit cross section error; $N_b$--the number of tested bits.
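Formulas (3-1) to (3-5) can be evaluated numerically with the following Python sketch, using the 70 MeV SBU count, the fluence of Table 3-1, and the 524288 tested bits. Two points are assumptions rather than statements from the text: the confidence level is taken as 95% ($\alpha$ = 0.05), and formula (3-3), being dimensionless, is treated as a relative error that is multiplied by $\sigma_b$. Under these assumptions the sketch reproduces the values quoted for the 70 MeV test in this section.

```python
from math import sqrt
from statistics import NormalDist


def cross_sections(n: int, fluence: float, n_bits: int, alpha: float = 0.05) -> dict:
    """Evaluate formulas (3-1) to (3-5) for n events at a given fluence (cm^-2)."""
    sigma_b = n / (fluence * n_bits)          # (3-1) bit cross section
    sigma_d = n / fluence                     # (3-2) device cross section
    # (3-3): magnitude of the inverse normal CDF at alpha/2 (assumed alpha = 0.05).
    z = abs(NormalDist().inv_cdf(alpha / 2))
    rel_b = z / sqrt(n) * sqrt((n_bits - n) / (n_bits - 1))
    xi_d = sqrt(n) / fluence                  # (3-4) device cross section error
    xi_db = sqrt(n) / (fluence * n_bits)      # (3-5) alternative bit cross section error
    return {
        "sigma_b": sigma_b,
        "sigma_d": sigma_d,
        "xi_b": rel_b * sigma_b,              # assumption: (3-3) is a relative error
        "xi_d": xi_d,
        "xi_db": xi_db,
    }


result_70 = cross_sections(n=88, fluence=1.0e11, n_bits=524288)
for name, value in result_70.items():
    print(f"{name} = {value:.3e}")
```

The same call with `n=102` gives the 90 MeV values. Note that (3-5) yields a smaller bit error than (3-3) because it corresponds to one standard deviation rather than the assumed 95% interval.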

Figures 3-4 and 3-5 show the device and bit cross sections of the SBU events in both irradiations. For the 70 MeV irradiation test, the device and bit cross sections are $(8.80\pm0.94)\times10^{-10}$ cm<sup>2</sup> and $(1.67\pm0.35)\times10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup>, respectively. For the 90 MeV irradiation test, they are $(1.02\pm0.10)\times10^{-9}$ cm<sup>2</sup> and $(1.95\pm0.38)\times10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup>, respectively. In [134], SEE induced by 50 and 90 MeV protons on 28nm CMOS BRAM was also examined at the same facility, and the measured SEE cross sections are about $(2\sim3)\times10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup> for the two proton beams. Those results are similar to the ones obtained here; the remaining discrepancy is attributed to the different tested blocks, the OCM in this work and the BRAM in [134]. At 70 and 90 MeV, the SEE events are mainly produced by secondary particles from nuclear reactions depositing energy. The probabilities of 70 and 90 MeV protons reacting with silicon nuclei are close, so it is plausible that the cross sections are similar.



Figure 3-4 SBU device cross section of proton irradiation tests



Figure 3-5 SBU bit cross section of proton irradiation tests

In addition, the cumulative dose during proton irradiation can be obtained with formula (3-6) <sup>[139]</sup>. Table 3-5 shows the deposited dose in the irradiations. For the 70 and 90 MeV irradiation tests, the cumulative doses are 12.16 and 10.11 krad, respectively. Meanwhile, the current of the test board during irradiation stays at 0.35-0.37 A without any obvious fluctuation. This suggests that the tolerable dose of the Xilinx 28nm CMOS SoC is higher than 12.16 krad.

$$D = \Phi \times L_l \times 1.6 \times 10^{-5} \tag{3-6}$$

where *D*--deposited dose/rad; $\Phi$--fluence/cm<sup>-2</sup>; $L_l$--LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup>.

Table 3-5 The deposited dose in two irradiations

| Test case | $\Phi$/cm<sup>-2</sup> | $L_l$/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | D/krad     |
|-----------|------------------------|------------------------------------------|------------|
| 70 MeV    | $1.0 \times 10^{11}$   | 0.0076                                   | 12.16±0.12 |
| 90 MeV    | $1.0 \times 10^{11}$   | 0.00632                                  | 10.11±0.10 |
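As a quick check of formula (3-6), the doses in Table 3-5 can be recomputed from the fluence and LET values listed there. The sketch below is only an arithmetic illustration, and the function name `deposited_dose_rad` is hypothetical.

```python
RAD_CONVERSION = 1.6e-5  # constant of formula (3-6), converting MeV/mg to rad


def deposited_dose_rad(fluence: float, let: float) -> float:
    """Dose in rad from fluence (cm^-2) and LET (MeV·cm^2·mg^-1), formula (3-6)."""
    return fluence * let * RAD_CONVERSION


doses_krad = {
    "70 MeV": deposited_dose_rad(1.0e11, 0.0076) / 1e3,
    "90 MeV": deposited_dose_rad(1.0e11, 0.00632) / 1e3,
}
for case, dose in doses_krad.items():
    print(f"{case}: {dose:.2f} krad")
```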

The cross sections of the DCU and MCU events are drawn in Figure 3-6. The DCU cross sections are $(1.80\pm0.42)\times10^{-10}$ cm<sup>2</sup> and $(2.70\pm0.52)\times10^{-10}$ cm<sup>2</sup> for the 70 and 90 MeV irradiation tests, respectively, and the 3 cell upset cross sections are $(8.00\pm2.83)\times10^{-11}$ cm<sup>2</sup> and $(1.10\pm0.33)\times10^{-10}$ cm<sup>2</sup>, respectively. The other MCUs are too few to be compared further. In addition, with 6 and 7 SEFI events observed in the 70 and 90 MeV irradiation tests, the SEFI cross sections are $(6.00\pm2.45)\times10^{-11}$ cm<sup>2</sup> and $(7.00\pm2.65)\times10^{-11}$ cm<sup>2</sup>, respectively.

From the obtained SEEs, it can be preliminarily concluded that 70 and 90 MeV protons cause similar SEEs on the Xilinx 28nm CMOS SoC.



Figure 3-6 The cross sections of multi-cell upset events

In [140], SEE in the OCM was examined using an 18 MeV proton beam, and no MCUs were detected. The obtained SBU bit cross section is $8.00 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup>, which is of the same order of magnitude as the 70 and 90 MeV results from this work. However, further effort is required to analyze the discrepancy between [140] and this work.

#### 3.3.2 Monte Carlo Simulation Analysis

a) Geant4 Simulation

As a simulation toolkit, Geant4 simulates the passage of particles through matter. It has a hierarchical and modular structure, which is presented in more detail in Figure 3-7. The key modules in Geant4 include geometry, particle, processes, track and event management, and others <sup>[141-143]</sup>.

Geant4 (version Geant4.9.05) simulations are performed to study the secondary particles produced by 18, 70, and 90 MeV protons interacting with silicon. A 7 $\mu$m×3 $\mu$m×20 $\mu$m target silicon detector is constructed, a total of 5×10<sup>6</sup> particles hit the target, and the elastic and inelastic processes are considered.

Table 3-6 summarizes the main secondary heavy ions produced by 18, 70, and 90 MeV protons interacting with silicon. <sup>27</sup>Al and <sup>28</sup>Si are the main heavy ions in the 18 MeV simulation, whereas six dominant secondary ions are detected in the 70 and 90 MeV simulations.



Figure 3-7 The hierarchical and modular structure of Geant4<sup>[141]</sup>

Table 3-6 Secondary particles of 18, 70, and 90 MeV protons interacting with silicon

| Proton energy/MeV | Secondary particles                                                                                           |
|-------------------|---------------------------------------------------------------------------------------------------------------|
| 18                | <sup>28</sup>Si, <sup>27</sup>Al                                                                              |
| 70/90             | <sup>4</sup>He, <sup>23</sup>Na, <sup>24</sup>Mg, <sup>27</sup>Al, <sup>27</sup>Si, <sup>20</sup>Ne            |

In addition, the LET and range intervals in silicon are obtained from the Geant4 simulations and drawn in Figures 3-8 and 3-9, respectively. In the figures, 70-Mg stands for the Mg obtained in the 70 MeV simulation, and the other labels follow the same convention.



Figure 3-8 LET intervals of secondary heavy ions in simulations



Figure 3-9 Range intervals in silicon of secondary heavy ions in Geant4 simulations

Figure 3-8 shows that the LET intervals are close for the 70 and 90 MeV proton simulations. In Figure 3-9, the maximum ranges for the 90 MeV simulation are larger than those of the 70 MeV simulation, except for the Al ion. These details explain why the SEEs obtained in the irradiation tests are similar but show a few discrepancies.

b) CREME-MC Simulation

CREME-MC, developed by Vanderbilt University, is a Geant4-based application. In a CREME-MC simulation, the multi-layer structure of the device can be constructed, and the sensitive volume and critical charge must also be specified <sup>[144-146]</sup>.

In the CREME-MC simulations, the built multi-layer structure is shown in Figure 3-10, and the thickness of each layer in the vertical direction is extracted from reverse engineering, as mentioned in Figure 2-4. The feature size of the constructed model is 0.7 $\mu$m×0.3 $\mu$m×19.78 $\mu$m, and the size of the sensitive volume is 160 nm×160 nm×160 nm. The critical charge is 0.18 fC, and 10<sup>7</sup> particles are simulated.



Figure 3-10 The built structure of the SoC in CREME-MC simulation

The CREME-MC simulation results for the 70 and 90 MeV protons are presented in Figure 3-11. The obtained cross sections for the 70 and 90 MeV simulations are $1.41 \times 10^{-15}$ and $1.68 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup>, respectively, which are consistent with the cross sections measured in irradiation. The 18 MeV simulation result is also compared with that in [140]. These comparisons indicate that the built structure and parameters are reliable, so further simulations can be performed with the same model. Simulations for protons from several MeV to hundreds of MeV are therefore also executed.



Figure 3-11 CREME-MC simulation results

Figure 3-11 shows that the cross sections for low energy protons are 4-5 orders of magnitude higher than those for high energy protons. The peak cross section is reached at 2 MeV for the SoC. The simulation implies that the proton direct ionization energy threshold is about 1.4 MeV for the SoC. The SEE threshold range and LET are 27.22 $\mu$m and 0.142 MeV·cm<sup>2</sup>·mg<sup>-1</sup> for the Xilinx 28nm CMOS SoC <sup>[147]</sup>.

#### 3.4 Summary

The 70 and 90 MeV proton irradiation tests on the Xilinx 28nm CMOS SoC are performed to evaluate its SEE vulnerability in proton-rich environments, such as cosmic and solar rays. The SEE events obtained during irradiation demonstrate that the two proton beams have a similar capability of inducing SEE on the 28nm SoC. Geant4 and CREME-MC Monte Carlo simulations are also performed to analyze the observed SEE events. The irradiation tests and Monte Carlo simulation results indicate that the secondary heavy ions generated by 70 and 90 MeV protons are similar. The SEE LET threshold for the Xilinx 28nm CMOS SoC is also predicted.

# 4 Atmospheric Neutron SEE on Xilinx 28nm CMOS SoC

SEE caused by atmospheric neutrons gains much attention as semiconductor devices are continually miniaturized. To explore atmospheric neutron SEE accurately, the ideal test facility is a spallation neutron source <sup>[148-149]</sup>. Recently, the CSNS has been built and commissioned, making atmospheric neutron SEE evaluation more convenient in China. Using the CSNS, atmospheric neutron SEE evaluation of the Xilinx 28nm CMOS SoC is performed <sup>[150-151]</sup>. The contributions of neutrons in different energy ranges to SEE vulnerability are analyzed in conjunction with Monte Carlo simulations.

#### 4.1 Atmospheric Neutron SEE

SEE induced by atmospheric neutrons has become a challenge for nanoscale electronics, although some measures have been taken, such as removing the boro-phospho-silicate glass (BPSG) package <sup>[152-153]</sup>.

The atmospheric neutron spectrum is broad, covering meV to GeV. Besides energy, the atmospheric neutron flux also depends on altitude, latitude, and other factors <sup>[154]</sup>. Neutrons are uncharged; they lead to SEE by colliding with nuclei in the semiconductor and producing secondary charged particles <sup>[155]</sup>. Figure 4-1 (a) and (b) show the cross sections of neutron interactions with various nuclei, including <sup>10</sup>B, <sup>11</sup>B, <sup>14</sup>N, <sup>16</sup>O, <sup>28</sup>Si, <sup>27</sup>Al, and <sup>184</sup>W <sup>[156]</sup>.



Figure 4-1 Cross sections of neutrons interacting with different nuclei

Figure 4-1 demonstrates that neutrons of different energies undergo nuclear interactions with different probabilities, and the discrepancy may exceed three orders of magnitude <sup>[156]</sup>. Hence, it is necessary to discuss the SEE contributions of neutrons at different energies. The CSNS provides a valuable platform for this goal. Relying on this platform, the atmospheric neutron SEE vulnerability of the Xilinx 28nm CMOS SoC, including the contributions of neutrons above 1 MeV, neutrons above 10 MeV, and thermal neutrons, is explored and analyzed with various methods.

# 4.2 Irradiation Examination

# 4.2.1 CSNS Spectrum

At CSNS, a 1.6 GeV proton beam is extracted from the accelerator to bombard a tungsten target and produce spallation neutrons. The generated neutrons are then processed by various moderators to meet the requirements of different beamlines (BLs), and the irradiation of the Xilinx 28nm CMOS SoC is conducted at BL-09. Figure 4-2 plots the differential fluxes of the Peking terrestrial spectrum (×10<sup>9</sup>) and CSNS-BL09. The CSNS-BL09 spectrum is rather close to the real one.



Figure 4-2 Spectra of CSNS-BL09 and Peking terrestrial

## 4.2.2 Test Implementation

As Figure 4-3 shows, the neutron beam is extracted from the target station and shielded with various materials, such as the decoupled and poisoned hydrogen moderator (DPHM), concrete, etc. Only a 20 mm opening is visible to users in the irradiation room.

The test board carries the Xilinx 28nm CMOS SoC. The chip is aligned with the 20 mm opening before irradiation. The board is powered by the programmable power supply, and the host communicates with it via UART-USB fiber.



Figure 4-3 The layout of the neutron irradiation test

The OCM, D-Cache, and BRAM blocks are examined separately. Table 4-1 lists the details of each block. During irradiation, the upset information is updated and recorded on the host computer, and the programmable power supply also monitors possible SEL on the test board.


Table 4-1 Tested blocks in neutron irradiations

| Block   | Tested volume/KB | Data pattern |
|---------|------------------|--------------|
| OCM     | 64               | 0xA5A5A5A5   |
| D-Cache | 32               | 0xA5A5A5A5   |
| BRAM    | 8                | 0xA5A5A5A5   |

Before irradiation, the entire system is operated for 45 hours to rule out any influence of the irradiation room environment and to check the status of the test system. No error is detected, which demonstrates that the errors detected in the experiments are induced by the neutrons from the terminal.

# 4.3 Irradiation Results

The detected SEE includes multiple types: SBU, DCU, MCU, SEFI, and others. The SEE types and numbers vary from block to block. Notably, one cluster error and one unknown-character output on the UART-USB link are observed during the BRAM test. The chip temperature ranges from 46.98 to 48.86 °C during the irradiation. Table 4-2 shows the SEE numbers of each tested block.

Table 4-2 SEE of different blocks

| SEE type | OCM | D-Cache | BRAM |
|----------|-----|---------|------|
| SBU      | 21  | 5       | 3    |
| DCU      | 4   | /       | /    |
| MCU      | 2   | /       | /    |
| SEFI     | 5   | 5       | 12   |

Conventionally, only high energy neutrons (E>10 MeV) are considered in evaluating devices' SEE sensitivity [32, 50]. However, as device technology scales down, the contribution of 1-10 MeV neutrons can no longer be ignored [49, 157-158].

# 4.4 Test Results Analysis

The status of the neutron beam is monitored in real time during the irradiations. Hence, the beam flux and fluence in different energy ranges can be obtained for each block irradiation test.

## 4.4.1 E>1 and E>10 MeV Neutron Contribution

The average fluxes and fluences of E>10 MeV neutrons during the block irradiation tests are listed in Table 4-3, and Table 4-4 lists the fluxes and fluences of E>1 MeV neutrons. The corresponding SEU cross sections under both conditions are calculated with (3-1) and (3-2). Figure 4-4 displays the device and bit cross sections for E>10 MeV and E>1 MeV.

|                                          | OCM                  | D-Cache              | BRAM                 |
|------------------------------------------|----------------------|----------------------|----------------------|
| Flux/n·cm <sup>-2</sup> ·s <sup>-1</sup> | 5.33×10 <sup>4</sup> | 5.06×10 <sup>4</sup> | $5.31 \times 10^{4}$ |
| Fluence/n·cm <sup>-2</sup>               | 1.63×10 <sup>9</sup> | 1.73×10 <sup>9</sup> | $1.91 \times 10^{9}$ |

Table 4-3 Neutron flux and fluence at E>10 MeV

|                                       | OCM                  | D-Cache              | BRAM                 |
|---------------------------------------|----------------------|----------------------|----------------------|
| Flux/n·cm<sup>-2</sup>·s<sup>-1</sup> | 7.24×10<sup>5</sup>  | 6.86×10<sup>5</sup>  | 7.21×10<sup>5</sup>  |
| Fluence/n·cm<sup>-2</sup>             | 2.22×10<sup>10</sup> | 2.35×10<sup>10</sup> | 2.60×10<sup>10</sup> |

Table 4-4 Neutron flux and fluence at E>1 MeV

Figure 4-4 The cross sections of test blocks at E>10 MeV and E>1 MeV

Figure 4-4 shows that the cross sections for E>10 MeV and E>1 MeV differ by about an order of magnitude. For example, the OCM bit cross section is 2.46×10<sup>-14</sup> cm<sup>2</sup>·bit<sup>-1</sup> for E>10 MeV and 1.80×10<sup>-15</sup> cm<sup>2</sup>·bit<sup>-1</sup> for E>1 MeV, a discrepancy of more than 12 times. The D-Cache and BRAM behave similarly. In [159], Xilinx released the atmospheric neutron SEE test results of different device series from Rosetta and the Los Alamos Neutron Science Center <sup>[53]</sup>. The atmospheric neutron SEU cross section of 28nm CMOS SRAM is about 6.32×10<sup>-15</sup> cm<sup>2</sup>·bit<sup>-1</sup>. Figure 4-2 shows that the utilized neutron spectrum is a little softer than the Peking terrestrial spectrum, so the observed SEE cross sections should be a little smaller than the released value. The E>1 MeV result satisfies this expectation, whereas the E>10 MeV result exceeds the released value. Therefore, taking the Xilinx data and the spectrum into account, it is more reasonable to consider neutrons at E>1 MeV in assessing the atmospheric neutron SEE vulnerability.
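The OCM bit cross sections quoted above can be reproduced from the SBU count in Table 4-2 and the fluences in Tables 4-3 and 4-4. The following is a minimal sketch, assuming that (3-1) and (3-2) take the usual form of events divided by fluence (device) and events divided by fluence times the number of tested bits (bit), and that the tested OCM region is 64 KB. The function and variable names are illustrative only.

```python
# Sketch: SBU bit cross section of the OCM test for the two energy thresholds.
# Assumed forms: sigma_device = N / fluence, sigma_bit = N / (fluence * bits).

OCM_BITS = 64 * 1024 * 8  # tested OCM region of 64 KB (assumption)


def device_cross_section(events, fluence):
    """Device cross section in cm^2."""
    return events / fluence


def bit_cross_section(events, fluence, bits):
    """Bit cross section in cm^2 per bit."""
    return device_cross_section(events, fluence) / bits


sbu_ocm = 21           # OCM SBU events, Table 4-2
fluence_gt10 = 1.63e9  # n/cm^2 for E>10 MeV, Table 4-3
fluence_gt1 = 2.22e10  # n/cm^2 for E>1 MeV, Table 4-4

sigma_gt10 = bit_cross_section(sbu_ocm, fluence_gt10, OCM_BITS)
sigma_gt1 = bit_cross_section(sbu_ocm, fluence_gt1, OCM_BITS)

print(f"E>10 MeV: {sigma_gt10:.2e} cm^2/bit")  # 2.46e-14
print(f"E>1 MeV:  {sigma_gt1:.2e} cm^2/bit")   # 1.80e-15
print(f"ratio:    {sigma_gt10 / sigma_gt1:.1f}")
```

Because the same 21 events are divided by two different fluences, the ratio of the two cross sections is simply the ratio of the E>1 MeV fluence to the E>10 MeV fluence.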

Moreover, the SER in failures in time (FIT) of each block can be calculated with formulas (4-1) and (4-2). The atmospheric neutron flux at the Peking terrestrial level is 9.50 and 14.80 cm<sup>-2</sup>·h<sup>-1</sup> for E>10 MeV and E>1 MeV neutrons, respectively. Figure 4-5 draws the SERs of SEU under the two cases.

$$SER_b = \sigma_b \times \nu \times 10^9 \times 10^6 \tag{4-1}$$

$$SER_d = \sigma_d \times \nu \times 10^9 \tag{4-2}$$

where $SER_b$ is the soft error rate per Mbit/FIT·Mbit<sup>-1</sup>; $\sigma_b$ is the bit cross section/cm<sup>2</sup>·bit<sup>-1</sup>; $\nu$ is the atmospheric neutron flux/cm<sup>-2</sup>·h<sup>-1</sup>; $SER_d$ is the soft error rate of the device/FIT·device<sup>-1</sup>; $\sigma_d$ is the device cross section/cm<sup>2</sup>. The factor $10^9$ converts the rate to FIT (failures per $10^9$ hours), and the factor $10^6$ converts from bit to Mbit.

Figure 4-5 (a) and (b) depict the SERs for E>10 MeV and E>1 MeV. Figure 4-5 (b) shows that the SER is 233.45 FIT·Mbit<sup>-1</sup> for neutrons at E>10 MeV and 26.70 FIT·Mbit<sup>-1</sup> at E>1 MeV. In [27], the recommended SER for New York City, derived from real-time measurements at various altitudes, is 72 FIT·Mbit<sup>-1</sup>, which is about 52 FIT·Mbit<sup>-1</sup> when converted to the Peking terrestrial level. The E>1 MeV estimate is closer to this value, which again indicates that it is more plausible to consider E>1 MeV neutrons.
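Formulas (4-1) and (4-2) can be checked numerically with the OCM bit cross sections and the Peking terrestrial fluxes given above. The sketch below is illustrative: the function names are not from the original work, and the rounded cross sections are used as inputs, so the results differ slightly from the quoted 233.45 and 26.70 FIT·Mbit<sup>-1</sup>.

```python
# Sketch of (4-1) and (4-2): soft error rate in FIT from cross section and flux.

HOURS_PER_FIT = 1e9  # 1 FIT is one failure per 1e9 hours
BITS_PER_MBIT = 1e6  # Mbit taken as 1e6 bits, as in (4-1)


def ser_per_mbit(sigma_bit, flux_per_hour):
    """SER_b in FIT/Mbit; sigma_bit in cm^2/bit, flux in cm^-2 h^-1."""
    return sigma_bit * flux_per_hour * HOURS_PER_FIT * BITS_PER_MBIT


def ser_device(sigma_device, flux_per_hour):
    """SER_d in FIT/device; sigma_device in cm^2, flux in cm^-2 h^-1."""
    return sigma_device * flux_per_hour * HOURS_PER_FIT


# OCM bit cross sections (rounded) and Peking terrestrial fluxes from the text.
ser_gt10 = ser_per_mbit(2.46e-14, 9.50)   # E>10 MeV
ser_gt1 = ser_per_mbit(1.80e-15, 14.80)   # E>1 MeV

print(f"E>10 MeV: {ser_gt10:.1f} FIT/Mbit")
print(f"E>1 MeV:  {ser_gt1:.1f} FIT/Mbit")
```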



Figure 4-5 SERs of SEU under E>10 and E>1 MeV condition

The irradiation results indicate that the 1-10 MeV neutron contribution should be considered in estimating the atmospheric neutron SEE of the Xilinx 28nm CMOS SoC, rather than only neutrons at E>10 MeV as is conventional. A Geant4 simulation is also performed to verify this finding.

## 4.4.2 Mono-energy Neutron Geant4 Simulation

The mechanism of SEE induced by atmospheric neutrons is that nuclear reactions generate secondary particles, which deposit energy in the sensitive volume <sup>[160]</sup>. Following the vertical architecture shown in Figure 2-4, a Geant4 simulation model is constructed, and the simulation is performed. The sensitive volume is 130 nm×130 nm×130 nm, and the critical charge is 0.21 fC <sup>[161]</sup>. Several mono-energy neutrons are simulated separately. As Table 4-5 shows, 13 mono-energy neutrons are simulated, and the number of incident particles for each energy is 10<sup>7</sup>.

| Energy/MeV | Particle Number |
|------------|-----------------|
| 1          | 10<sup>7</sup>  |
| 2          | 10<sup>7</sup>  |
| 5          | 10<sup>7</sup>  |
| 8          | 10<sup>7</sup>  |
| 10         | 10<sup>7</sup>  |
| 20         | 10<sup>7</sup>  |
| 50         | 10<sup>7</sup>  |
| 100        | 10<sup>7</sup>  |
| 200        | 10<sup>7</sup>  |
| 500        | 10<sup>7</sup>  |
| 600        | 10<sup>7</sup>  |
| 700        | 10<sup>7</sup>  |
| 1000       | 10<sup>7</sup>  |

Table 4-5 The simulated mono-energy neutron in Geant4

Figure 4-6 displays the mono-energy neutron simulation results. Compared with the CSNS test results, the simulated cross sections at E>10 MeV are smaller, although they are a bit larger than those simulated for 1-10 MeV neutrons. The irradiation spectrum covers neutrons of many energies, but if only E>10 MeV is considered, the simulation results fall below the irradiation test results. This again demonstrates that the neutron contribution from 1-10 MeV cannot be ignored.



Figure 4-6 Mono-energy neutron simulation results

Combining the results of the spallation neutron irradiation and the Geant4 simulation, it can be concluded that it is more plausible to consider neutrons with energy E>1 MeV in estimating the atmospheric neutron SEE on the Xilinx 28nm CMOS SoC.

In the OCM irradiation test, DCU and MCU events are detected. The two MCU events involve 3 and 4 upset cells, respectively. This illustrates that energetic secondary particles pass through multiple cells and deposit energy in them simultaneously. Figure 4-7 shows the cross sections of DCU and MCU in the OCM irradiation test for E>1 MeV, which are $1.80 \times 10^{-10}$ and $9.01 \times 10^{-11}$ cm<sup>2</sup>, respectively.



Figure 4-7 DCU and MCU cross sections of OCM irradiation test for E>1 MeV

Si-28, Si-29, Si-30, Mg-25, Mg-26, Al-27, alpha particles, and protons are the secondary particles detected in the simulations. The ranges of the secondary heavy ions reach the micrometer level, so they are able to pass through multiple cells and deposit energy. Figure 4-8 shows the schematic of a secondary particle passing through multiple sensitive volumes.



Figure 4-8 Schematic of secondary particle depositing energy in multiple sensitive volumes

As with the SBU events, SEFI events are detected in each block irradiation test. Figure 4-9 draws the SEFI cross sections of the blocks. For OCM, D-Cache, and BRAM, they are $2.25 \times 10^{-10}$, $2.13 \times 10^{-10}$, and $4.62 \times 10^{-10}$ cm<sup>2</sup> for E>1 MeV, respectively. The processor reads and writes the block during the irradiation, and the UART is responsible for communication. This process involves diverse components and registers, and if any one of them is corrupted, a SEFI may occur.



Figure 4-9 SEFI cross section of tested blocks

In addition, the atmospheric neutron spectrum includes thermal neutrons. Even though BPSG has been removed from the package, boron still exists in the semiconductor contact and doping manufacturing processes. Therefore, researchers are still concerned about the thermal neutron contribution to nanoscale SRAM SEE vulnerability <sup>[162-166]</sup>.

# 4.5 Thermal Neutron Influence Evaluation

To examine the influence of thermal neutrons, the irradiation test is executed once again using the same facility and setup as described in 4.2. The most significant difference is a 2 mm cadmium slice placed between the opening and the irradiated chip. Figure 4-10 draws the neutron fluences under the two conditions. One corresponds to the original spectrum (referred to as BL09-0), and the other to the spectrum filtered by the 2 mm cadmium slice (referred to as BL09-2). The thermal neutrons are clearly absorbed.



Figure 4-10 Fluence spectrum with and without 2mm cadmium slice

Compared with other blocks, most SEE events are observed in OCM in the BL09-0 irradiation test, so the comparative test is mainly carried out on the OCM block. The 64 KB OCM written with 0xA5A5A5A5 is examined again.

## 4.5.1 BL09-2 Irradiation Results

As in the BL09-0 irradiation test, SEE events including SBU, DCU, MCU, and SEFI are observed in the irradiation with the 2 mm cadmium slice. Table 4-6 summarizes the detected SEEs in the BL09-2 test, and Table 4-7 lists the neutron flux and fluence for this test at E>1 MeV.

| SBU | DCU | MCU | SEFI |
|-----|-----|-----|------|
| 13  | 2   | 2   | 2    |

Table 4-6 Detected SEEs in the BL09-2

| Flux/n·cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/n·cm<sup>-2</sup> |
|---------------------------------------|---------------------------|
| 6.85×10<sup>5</sup>                   | 2.47×10<sup>10</sup>      |

Table 4-7 Neutron flux and fluence in the BL09-2 for E>1 MeV

Compared with the cumulative fluence of the OCM test in Table 4-4, which is $2.22 \times 10^{10}$ cm<sup>-2</sup>, the fluence of BL09-2 is 11.26% higher. Nevertheless, compared with the SEEs observed in Table 4-2 (21 SBU, 4 DCU, 2 MCU, and 5 SEFI), the counts in Table 4-6 are lower for SBU, DCU, and SEFI and equal for MCU.

## 4.5.2 Thermal Neutron Contribution

The SBU cross sections of BL09-2 are calculated with (3-1) and (3-2). Figure 4-11 depicts the SBU cross sections of BL09-0 and BL09-2 for E>1 MeV. Taking the bit cross section in Figure 4-11 (b) as an example, it is $1.80 \times 10^{-15}$ and $1.00 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup> for the BL09-0 and BL09-2 irradiation tests, respectively. The discrepancy is $0.80 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup>, which preliminarily indicates that the thermal neutron contribution reaches 44.4% in BL09-0. Reference [164] points out that the thermal neutron contribution approaches almost 50% of the atmospheric neutron SEE for 45nm CMOS SRAM. These facts indicate that thermal neutrons cannot be ignored even for devices whose packages contain no BPSG.
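The 44.4% figure follows directly from the two bit cross sections. As a worked step, assuming that the cadmium slice removes only the thermal component so that the whole difference can be attributed to thermal neutrons (the subscripted symbols are introduced here for illustration):

$$\frac{\sigma_{b,\text{BL09-0}} - \sigma_{b,\text{BL09-2}}}{\sigma_{b,\text{BL09-0}}} = \frac{(1.80 - 1.00) \times 10^{-15}}{1.80 \times 10^{-15}} \approx 44.4\%$$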




Figure 4-11 Cross section comparison of BL09-0 and BL09-2

Figure 4-12 displays the cross sections of DCU, MCU, and SEFI events in the two irradiation tests. The BL09-0 to BL09-2 cross section ratios for DCU, MCU, and SEFI are 2.23:1, 1.11:1, and 2.78:1, respectively. These ratios signify the contribution of thermal neutrons once more.
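These ratios can be reproduced from the event counts and fluences of the two tests. The sketch below assumes the device cross section is events divided by fluence and derives the BL09-2 fluence from the statement in 4.5.1 that it is 11.26% higher than the BL09-0 OCM fluence. Variable names are illustrative.

```python
# Sketch: BL09-0 to BL09-2 cross section ratios for DCU, MCU, and SEFI.

fluence_bl09_0 = 2.22e10                   # n/cm^2, OCM test at E>1 MeV (Table 4-4)
fluence_bl09_2 = fluence_bl09_0 * 1.1126   # 11.26% higher, as stated in 4.5.1

events_bl09_0 = {"DCU": 4, "MCU": 2, "SEFI": 5}  # OCM column of Table 4-2
events_bl09_2 = {"DCU": 2, "MCU": 2, "SEFI": 2}  # Table 4-6

ratios = {}
for kind in events_bl09_0:
    sigma_0 = events_bl09_0[kind] / fluence_bl09_0  # cm^2
    sigma_2 = events_bl09_2[kind] / fluence_bl09_2  # cm^2
    ratios[kind] = sigma_0 / sigma_2
    print(f"{kind}: {sigma_0:.2e} vs {sigma_2:.2e} cm^2, ratio {ratios[kind]:.2f}:1")
```

The MCU counts are identical in the two tests, so the MCU ratio of 1.11:1 comes entirely from the fluence difference. With so few events per category, the statistical uncertainty of each ratio is large.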



Figure 4-12 Cross sections of non-single-bit upsets in the two irradiation tests

The two tests demonstrate that the hazards caused by thermal neutrons still require attention in the 28nm CMOS SoC, although there is no boron in the package. Besides boron, another element with a high cross section, Hf, also exists in the SoC. Figure 4-13 shows that the thermal neutron cross section of Hf is higher than that of Si by about two orders of magnitude. Further analysis is therefore required to understand the thermal neutron contribution.



Figure 4-13 Cross section of various energies neutron with Hf, B, and Si

## 4.5.3 Elements Interaction

a) Interaction with B

The thermal neutron (n<sub>th</sub>) interacts with <sup>10</sup>B, generating <sup>7</sup>Li and alpha particles, as formulas (4-3) and (4-4) show. Reaction (4-4) dominates, since its probability approaches 93.7% <sup>[166]</sup>. Therefore, it can be speculated that the 0.84 MeV <sup>7</sup>Li and 1.47 MeV $\alpha$ particles deposit energy in the sensitive volumes and lead to the discrepancy between the two irradiation tests.

$${}^{10}\text{B} + \text{n}_{\text{th}} \rightarrow {}^{7}\text{Li}(1.01 \text{ MeV}) + \alpha(1.78 \text{ MeV}) \tag{4-3}$$

$${}^{10}\text{B} + \text{n}_{\text{th}} \rightarrow {}^{7}\text{Li}(0.84 \text{ MeV}) + \alpha(1.47 \text{ MeV}) + \gamma(0.48 \text{ MeV}) \tag{4-4}$$

Table 4-8 shows the ranges and LETs of the 0.84 MeV <sup>7</sup>Li and 1.47 MeV $\alpha$ particles. Their ranges in silicon are about 2.46 and 5.00 µm, respectively <sup>[135]</sup>. These are much less than the chip's thickness from the top passive layers to the substrate's surface, as shown in Figure 2-4, so particles produced outside the chip could not reach the sensitive volumes. This indicates that the B element exists in the device itself. The LETs are 2.10 and 1.15 MeV·cm<sup>2</sup>·mg<sup>-1</sup>, which are higher than the direct ionization LET<sub>th</sub> predicted in 3.3.2 and can therefore induce SEE in the Xilinx 28nm CMOS SoC.

Table 4-8 Range and LET of majority secondary particles of <sup>10</sup>B with thermal neutron

| Secondary Particle | Energy/MeV | Range in silicon/µm | LET/MeV·cm <sup>2</sup> ·mg <sup>-1</sup> |
|--------------------|------------|---------------------|-------------------------------------------|
| <sup>7</sup> Li    | 0.84       | 2.46                | 2.10                                      |
| α                  | 1.47       | 5.00                | 1.15                                      |

b) Interaction with Hf

The key mechanism of the thermal neutron interaction with <sup>10</sup>B is the nuclear reaction depicted in (4-3) and (4-4). For Hf, however, the (n, $\gamma$) reaction is the main process, as displayed in Figure 4-14. The range of most of the produced <sup>179</sup>Hf is only tens of Angstroms, which is too short to pass through the sensitive volume. The generated $\gamma$ ray cannot lead to SEE directly; it can only cause effects by colliding with other atoms, and the probability of this is rather low.



Figure 4-14 Cross section of Hf interacting with thermal neutron

Figure 4-14 shows that the elastic interaction cross section is about 10<sup>5</sup> barns for thermal neutrons with Hf. The maximum energy transferred in an elastic interaction can be calculated with formula (4-5). Figure 4-15 draws $E_n$ and the corresponding $E_t$ in the elastic interaction. The maximum $E_t$ is about 0.03 keV, corresponding to an LET of 0.03 MeV·cm<sup>2</sup>·mg<sup>-1</sup>, which is lower than the LET<sub>th</sub> predicted in 3.3.2.

$$E_t = \frac{4M_n M_t}{(M_n + M_t)^2} E_n \tag{4-5}$$

where $E_t$ is the maximum energy transferred to Hf/keV; $M_n$ is the mass of a neutron, $1.67 \times 10^{-27}$ kg; $M_t$ is the mass of a Hf atom, $2.96 \times 10^{-25}$ kg; $E_n$ is the energy of the neutron/keV.
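Formula (4-5) is easy to evaluate with the masses given above. The following is a minimal sketch with illustrative function names. The sample neutron energies are chosen here only to show the scaling and are not taken from Figure 4-15.

```python
# Sketch of (4-5): maximum energy transferred to a Hf atom in an elastic collision.

M_N = 1.67e-27   # neutron mass, kg
M_HF = 2.96e-25  # Hf atom mass, kg


def max_transfer_fraction(m_n, m_t):
    """Largest fraction of the neutron energy that the target can receive."""
    return 4.0 * m_n * m_t / (m_n + m_t) ** 2


def max_transferred_energy(e_n_kev, m_n=M_N, m_t=M_HF):
    """E_t in keV for a neutron of energy E_n in keV."""
    return max_transfer_fraction(m_n, m_t) * e_n_kev


fraction = max_transfer_fraction(M_N, M_HF)
print(f"maximum fraction transferred to Hf: {fraction:.4f}")

# Illustrative neutron energies in keV (0.0253 eV is the conventional thermal energy).
for e_n in (2.53e-5, 1.0e-2, 1.0):
    print(f"E_n = {e_n:.2e} keV -> E_t = {max_transferred_energy(e_n):.2e} keV")
```

Because Hf is far heavier than the neutron, at most about 2.2% of the neutron energy can be handed to the recoil atom, which is why the recoil energy and its LET stay small.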



Figure 4-15 The maximum energy transferred in elastic interaction

Xilinx Zynq-7000 SoC is manufactured with 28nm HKMG technology. The gate stack consists of TiN (8nm), HfO<sub>2</sub> (10nm), and SiON (1.2nm) <sup>[167-168]</sup>. To check the Hf influence further, neutron spectrum simulations are conducted on two updated Geant4 simulation models <sup>[141-143]</sup>. The first model introduces only the TiN and ultra-thin SiO<sub>2</sub> layers, as drawn in Figure 4-16, while the second model includes the TiN, HfO<sub>2</sub>, and ultra-thin SiO<sub>2</sub> layers, as drawn in Figure 4-17. Other layers are the same for the two models.



Figure 4-16 The TiN and ultra-thin SiO<sub>2</sub> layers are included



Figure 4-17 The TiN, HfO<sub>2</sub>, and ultra-thin SiO<sub>2</sub> layers are included

In the simulation, the neutron spectrum is the same as BL09-0, covering thermal and high energy ranges. $10^7$ particles are emitted from a 10 µm×10 µm source, and 32×32 sensitive volumes are placed, each of size 130 nm×130 nm×130 nm with a critical charge of 0.21 fC.

In both simulations, the SEE and the dose deposited in the thin $SiO_2$ layer are calculated. Table 4-9 summarizes the simulation results of both models. The results indicate that the Hf element does not influence the SEU. However, they also reveal a risk: the deposited dose increases about fivefold when HfO<sub>2</sub> is included.

|              | SEU | Cross section/cm<sup>2</sup>·bit<sup>-1</sup> | Deposited Dose/rad |
|--------------|-----|-----------------------------------------------|--------------------|
| First Model  | 5   | 5×10<sup>-16</sup>                            | 12.6               |
| Second Model | 5   | 5×10<sup>-16</sup>                            | 63.3               |

Table 4-9 Simulation results of both models
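The cross section in Table 4-9 is consistent with the simulation settings described above. A minimal sketch of the arithmetic, assuming that each sensitive volume represents one bit and that the fluence is the number of emitted particles divided by the source area (both are assumptions made for this illustration):

```python
# Sketch: bit cross section and dose ratio implied by the Geant4 settings.

particles = 1e7                      # emitted neutrons
source_area_cm2 = (10 * 1e-4) ** 2   # 10 um x 10 um source, 1 um = 1e-4 cm
n_bits = 32 * 32                     # one sensitive volume per bit (assumption)
seu = 5                              # SEU count of either model, Table 4-9

fluence = particles / source_area_cm2   # particles per cm^2
sigma_bit = seu / (fluence * n_bits)    # cm^2 per bit
dose_ratio = 63.3 / 12.6                # second model over first model

print(f"fluence:    {fluence:.2e} cm^-2")
print(f"sigma_bit:  {sigma_bit:.2e} cm^2/bit")
print(f"dose ratio: {dose_ratio:.2f}")
```

The result of about 4.9×10<sup>-16</sup> cm<sup>2</sup>·bit<sup>-1</sup> rounds to the tabulated 5×10<sup>-16</sup>, and the dose ratio of about 5 matches the fivefold increase noted in the text.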

In summary, the two spallation neutron irradiation tests and the simulations demonstrate that thermal neutrons can contribute about 44% of the SoC atmospheric neutron SEE, mainly through the secondary <sup>7</sup>Li and $\alpha$ particles.

# 4.6 Equivalence with Medium-energy Proton

The 64 MeV proton is considered a credible surrogate for atmospheric neutron SEE <sup>[53]</sup>, and the equivalence of spectrum neutron and proton SEE is also discussed in [169]. For the Xilinx 28nm CMOS SoC, SEEs induced by medium-energy protons are discussed in Chapter 3, and SEEs caused by atmospheric neutrons are observed in this chapter. Together, these irradiation results allow the equivalence to be discussed.

| Irradiation        | Cross section/cm<sup>2</sup>·bit<sup>-1</sup> |
|--------------------|-----------------------------------------------|
| 70 MeV proton      | 1.67×10<sup>-15</sup>                         |
| BL09-0 for E>1 MeV | 1.80×10<sup>-15</sup>                         |

Table 4-10 SEE cross sections for 70 MeV proton and BL09-0 for E>1MeV

Table 4-10 shows the SEE cross sections of the 70 MeV proton and BL09-0 (E>1 MeV) irradiation tests. The cross section is $1.67 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup> for the 70 MeV proton test and $1.80 \times 10^{-15}$ cm<sup>2</sup>·bit<sup>-1</sup> for BL09-0 at E>1 MeV, a ratio of 1:1.08. This result suggests that 70 MeV proton SEE results can predict atmospheric neutron SEEs to a degree.

# 4.7 Summary

Atmospheric neutron induced SEE on the Xilinx 28nm CMOS SoC is examined at CSNS. SEEs induced by neutrons in different energy ranges are observed and analyzed in combination with Geant4 simulation. The results illustrate that the atmospheric neutron SEE evaluation should consider the contribution of neutrons at E>1 MeV for advanced nanoscale COTS SoCs. Meanwhile, thermal neutrons contribute about 44.4% to the atmospheric neutron SEE, although no boron exists in the package.

# 5 Multi Patterns SEE on Xilinx 28nm CMOS SoC

The PS part in Xilinx 28nm CMOS SoC integrates the dual-ARM core. This feature makes it possible to execute multiple different processor pattern designs on the SoC. Specifically, the processor patterns can be sole-processor (SP), asymmetric multiprocessing (AMP), and others <sup>[170]</sup>. Meanwhile, data in memory blocks can be accessed statically or dynamically in different processor patterns. SEE sensitivities in these cases might be different. This chapter explores and discusses them, taking advantage of two heavy ion irradiation facilities in China.

# 5.1 Patterns Examination and Irradiation Setup

## 5.1.1 Patterns Examination

The 32 KB OCM is examined in this section. Data are checked statically and dynamically under the AMP and SP processor patterns during heavy ion irradiations. Table 5-1 describes each examined pattern in more detail. In the SP pattern, operations are executed on one ARM core only, usually Core0. The AMP pattern requires the two ARM cores to cooperate as master and slave.

| Core Pattern | Detail                                                                                                  |
|--------------|---------------------------------------------------------------------------------------------------------|
| SP           | The SoC executes applications relying on only one ARM processor, and the processor is usually Core0.    |
| AMP          | Applications are executed by the cooperation of the dual-ARM cores, which act as the master and slave.  |

Table 5-1 The core patterns examined during the heavy ion irradiation

In AMP, the master and slave cores are Core0 and Core1, respectively. The master wakes the slave at the beginning of the program. OCM is accessed by both cores and is first written by the master core. Core1 then acts as the SEU detector: it reads the data and compares them with the expected values to decide whether an SEU has occurred in OCM. The reading and comparison are performed in static and dynamic modes. In the static examination, the 32 KB of test data, filled with 0xA5A5A5A5 in OCM, are read and compared without rewriting or refreshing. In the dynamic examination, the test data are read and written repeatedly.
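The difference between the static and dynamic examinations can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the firmware that runs on the ARM cores: all function names are hypothetical, the OCM is modeled as a Python list of 32-bit words, and the rule that a persistent upset is reported only once in static mode is an assumption about the bookkeeping.

```python
# Host-side model of the OCM check; every name here is hypothetical.

PATTERN = 0xA5A5A5A5      # test word written to OCM
WORDS = 32 * 1024 // 4    # 32 KB of 32-bit words


def write_pattern(memory):
    """Master core step: fill the test region with the pattern."""
    for i in range(len(memory)):
        memory[i] = PATTERN


def flipped_bits(word):
    """Number of bits that differ from the expected pattern."""
    return bin(word ^ PATTERN).count("1")


def static_scan(memory, reported):
    """Static mode: read and compare only, never rewrite.

    Words already reported are skipped so that a persistent upset is
    counted once (an assumption about the bookkeeping).
    """
    new_upsets = []
    for i, word in enumerate(memory):
        n = flipped_bits(word)
        if n and i not in reported:
            reported.add(i)
            new_upsets.append((i, n))
    return new_upsets


def dynamic_scan(memory):
    """Dynamic mode: read, compare, then rewrite every word."""
    upsets = []
    for i, word in enumerate(memory):
        n = flipped_bits(word)
        if n:
            upsets.append((i, n))
        memory[i] = PATTERN
    return upsets


# Emulate two upsets: one flipped bit in word 10, two flipped bits in word 20.
ocm = [0] * WORDS
write_pattern(ocm)
ocm[10] ^= 1 << 3
ocm[20] ^= 0b11

reported = set()
first_static = static_scan(ocm, reported)    # finds both upsets
second_static = static_scan(ocm, reported)   # nothing new, data not refreshed
first_dynamic = dynamic_scan(ocm)            # finds both, then refreshes
second_dynamic = dynamic_scan(ocm)           # memory is clean again
print(first_static, second_static, first_dynamic, second_dynamic)
```

In this model the static scan leaves corrupted words in place, so upsets accumulate over the run, while the dynamic scan restores the pattern on every pass.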

## 5.1.2 Irradiation Setup

The irradiation is performed using two heavy ion accelerators in China. One is the HI-13 at CIAE, and the other is the HIRFL. Before irradiation, the SoC chip is de-capped so that it can be irradiated directly by the ion beams. The board is supplied by a 12 V programmable power supply and communicates with the host via a fiber USB link. The SEE messages are recorded in real-time. Figure 5-1 shows photos of the irradiation worksites at the two heavy ion accelerators. As the ion's LET is much higher, the irradiation at HIRFL is operated in the air. In contrast, the irradiation at HI-13 is executed in a vacuum.

(a) Photo of HI-13 irradiation

(b) Photo of the HIRFL irradiation

Figure 5-1 Photo of the heavy ion irradiations

The heavy ions used during the irradiation tests are presented in Table 5-2, which lists the ions and their energies, LETs, and ranges. The LET of the ion at HIRFL is significantly higher, about 78.3 MeV·cm<sup>2</sup>·mg<sup>-1</sup>, approximately six times the LET of Cl at HI-13. The ARM processor patterns and the OCM data test modes for the two irradiation tests are summarized in Tables 5-3 and 5-4, respectively. Specifically, during the HI-13 irradiation test, the dual-ARM core is tested in the AMP pattern with OCM data accessed statically, and a single ARM core is tested with data accessed dynamically. During the HIRFL irradiation test, the dual-ARM core is examined with OCM data accessed dynamically, and both static and dynamic accesses are examined in the SP pattern.

| Facility | Ion | Energy/MeV | LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Range in Silicon/µm |
|----------|-----|------------|----------------------------------------|---------------------|
| HI-13    | Cl  | 160        | 13.1                                   | 46                  |
| HI-13    | Si  | 135        | 9.3                                    | 50.7                |
| HI-13    | C   | 80         | 1.7                                    | 127.1               |
| HIRFL    | Ta  | 1697.4     | 78.3                                   | 99.3                |

Table 5-2 Ions used in heavy ion irradiation

| LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Core pattern | Data mode |
|----------------------------------------|--------------|-----------|
| 13.1                                   | AMP          | Static    |
| 13.1                                   | SP           | Dynamic   |
| 9.3                                    | AMP          | Static    |
| 9.3                                    | SP           | Dynamic   |
| 1.7                                    | AMP          | Static    |
| 1.7                                    | SP           | Dynamic   |

Table 5-3 Test pattern in HI-13 irradiation

| LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Core pattern | Data mode |
|----------------------------------------|--------------|-----------|
| 78.3                                   | AMP          | Dynamic   |
| 78.3                                   | SP           | Static    |
| 78.3                                   | SP           | Dynamic   |

Table 5-4 Test pattern in HIRFL irradiation

# 5.2 Irradiation Results

During both irradiations, SEU and SEFI events are observed. The SEFI events appear as UART communication exceptions. More importantly, abnormal step-increment currents are detected during the HIRFL irradiation.

## 5.2.1 SEE of HI-13 Irradiation

In the HI-13 irradiation test, the SEE sensitivity of the different patterns is investigated under different ions. Table 5-5 shows the ions' fluxes and fluences at HI-13. Taking the Cl ion as an example, the AMP and SP modes are each tested until the accumulated fluence reaches $1.0 \times 10^6$ cm<sup>-2</sup>, and the other ions are treated similarly. The fluence is the same for all ions, $10^6$ cm<sup>-2</sup>, although the flux differs slightly.

| Ion | LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Flux/cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/cm<sup>-2</sup> |
|-----|----------------------------------------|-------------------------------------|-------------------------|
| Cl  | 13.1                                   | $1.5 \times 10^{3}$                 | $1.0 \times 10^{6}$     |
| Si  | 9.3                                    | $1.0 \times 10^{3}$                 | $1.0 \times 10^{6}$     |
| C   | 1.7                                    | $2.0 \times 10^{3}$                 | $1.0 \times 10^{6}$     |

Table 5-5 Fluxes and fluences of different ions at HI-13 irradiation

In total, 1186 SEU and 152 SEFI events are detected during the HI-13 irradiation across the different test cases. Because the cumulative fluence is the same for each mode, a higher SEE count indicates higher sensitivity. Table 5-6 lists the details of the observed SEE events of each case.

| LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | Core pattern | Data mode | SEU | SEFI |
|----------------------------------------|--------------|-----------|-----|------|
| 13.1                                   | AMP          | Static    | 504 | 44   |
| 13.1                                   | SP           | Dynamic   | 175 | 33   |
| 9.3                                    | AMP          | Static    | 252 | 38   |
| 9.3                                    | SP           | Dynamic   | 124 | 26   |
| 1.7                                    | AMP          | Static    | 91  | 7    |
| 1.7                                    | SP           | Dynamic   | 40  | 4    |

Table 5-6 Detected SEE events during HI-13 irradiation

The SEU and SEFI cross sections for the different mode tests are obtained with (3-1) and (3-2) and are drawn in Figures 5-2 and 5-3, respectively. The reported heavy ion irradiation results on the SoC from [140] and [171] are also presented in Figure 5-2. They are more consistent with the trend of the static mode test than with that of the dynamic one. In [140], the SoC was irradiated in the SP pattern with static mode using heavy ions with an LET of 24.3 MeV·cm<sup>2</sup>·mg<sup>-1</sup>. In [171], under the same condition as [140], the SoC was examined with heavy ions at 6.4 and 17 MeV·cm<sup>2</sup>·mg<sup>-1</sup>. Since those SP static cross sections follow the trend of the AMP static results obtained here, it can preliminarily be pointed out that the SEU cross section is influenced more by the data access mode than by the processor pattern.

For the Cl, Si, and C ions, the SEU cross sections of the static and dynamic modes are $1.92 \times 10^{-9}$ and $6.68 \times 10^{-10}$ cm<sup>2</sup>·bit<sup>-1</sup>, $9.61 \times 10^{-10}$ and $4.73 \times 10^{-10}$ cm<sup>2</sup>·bit<sup>-1</sup>, and $3.47 \times 10^{-10}$ and $1.53 \times 10^{-10}$ cm<sup>2</sup>·bit<sup>-1</sup>, respectively. The static SEU cross sections are thus roughly two to three times the dynamic cross sections.
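The SEU cross sections above follow from the counts in Table 5-6, the common fluence of 10<sup>6</sup> cm<sup>-2</sup>, and the 32 KB of tested OCM. A minimal sketch, assuming the bit cross section is events divided by fluence times tested bits (variable names are illustrative):

```python
# Sketch: static and dynamic SEU bit cross sections for the HI-13 test.

OCM_BITS = 32 * 1024 * 8   # 32 KB of tested OCM
FLUENCE = 1.0e6            # ions/cm^2 for every case, Table 5-5

# (static AMP count, dynamic SP count) per ion, Table 5-6
seu_counts = {"Cl": (504, 175), "Si": (252, 124), "C": (91, 40)}

results = {}
for ion, (static, dynamic) in seu_counts.items():
    sigma_static = static / (FLUENCE * OCM_BITS)
    sigma_dynamic = dynamic / (FLUENCE * OCM_BITS)
    results[ion] = (sigma_static, sigma_dynamic, sigma_static / sigma_dynamic)
    print(f"{ion}: static {sigma_static:.2e}, dynamic {sigma_dynamic:.2e} "
          f"cm^2/bit, ratio {sigma_static / sigma_dynamic:.2f}")
```

Since the fluence and bit count are common to all cases, the static to dynamic ratio reduces to the ratio of the event counts, which ranges from about 2.0 for Si to about 2.9 for Cl.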

Figure 5-3 displays the SEFI cross sections of the HI-13 irradiation. The cross sections of the AMP pattern are a bit higher than those of the SP pattern, which suggests that the processor pattern affects the SoC SEFI vulnerability.



Figure 5-2 SEU cross sections in HI-13 irradiation



Figure 5-3 SEFI cross sections in HI-13 irradiation

## 5.2.2 SEE of HIRFL Irradiation

During the HIRFL irradiation, the LET is 78.3 MeV·cm<sup>2</sup>·mg<sup>-1</sup>, considerably higher than that of the ions in the HI-13 irradiation tests. As mentioned above, abnormal currents are detected in addition to the SEE events. Moreover, MCU events of up to 20 cells are detected. The ion fluxes and fluences for the different patterns are listed in Table 5-7. The flux is the same for AMP+Dynamic (AMP+D), SP+Static (SP+S), and SP+Dynamic (SP+D), although the fluences differ slightly.

| Core Pattern | Data mode | Flux/cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/cm<sup>-2</sup> |
|--------------|-----------|-------------------------------------|-------------------------|
| AMP          | Dynamic   | $1.0 \times 10^{3}$                 | $2.1 \times 10^{5}$     |
| SP           | Static    | $1.0 \times 10^{3}$                 | $3.0 \times 10^{5}$     |
| SP           | Dynamic   | $1.0 \times 10^{3}$                 | $2.5 \times 10^{5}$     |

Table 5-7 The flux and fluence of each pattern in HIRFL irradiation

Table 5-8 summarizes the detected SEU and SEFI events in the HIRFL irradiation. The detected SEU events are mainly composed of different MCUs, and Figure 5-4 presents the numbers of MCUs in each mode test. In the AMP test, 284 SEU and 47 SEFI events are observed. In the SP pattern, 1277 and 254 SEU events, as well as 33 and 41 SEFI events, are observed during the static and dynamic tests, respectively.

Table 5-8 The detected SEE of different patterns in HIRFL irradiation

| Core Pattern | Data mode | SEU  | SEFI |
|--------------|-----------|------|------|
| AMP          | Dynamic   | 284  | 47   |
| SP           | Static    | 1277 | 33   |
| SP           | Dynamic   | 254  | 41   |

Figure 5-4 Detected MCU events in HIRFL irradiation

The SEU and SEFI cross sections are obtained with (3-1) and (3-2) and are depicted in Figures 5-5 and 5-6, respectively. In Figure 5-5, the SEU cross sections of the static and dynamic modes in the SP pattern are  $1.62 \times 10^{-8}$  and  $3.88 \times 10^{-9}$  cm<sup>2</sup>·bit<sup>-1</sup>. The static value is more than four times the dynamic one, which again confirms that the data access mode influences SEU sensitivity. The SEU cross section of the AMP+D mode is  $5.15 \times 10^{-9}$  cm<sup>2</sup>·bit<sup>-1</sup>, a ratio of about 1.32:1 relative to the SP+D mode. SEU cross sections are therefore influenced more by the data access mode than by the processor pattern.



Figure 5-5 SEU cross sections of different patterns in HIRFL irradiation

Figure 5-6 shows the SEFI cross sections of the SP and AMP patterns. They are  $1.10 \times 10^{-4}$,  $1.64 \times 10^{-4}$, and  $2.24 \times 10^{-4}$  cm<sup>2</sup> for the SP+S, SP+D, and AMP+D tests, respectively. The SP+D and AMP+D tests involve different processor patterns under the same data access mode, and their SEFI cross section ratio is 1:1.37. The AMP pattern is thus again more vulnerable to SEFI.



Figure 5-6 SEFI cross sections of different patterns in HIRFL irradiation
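As a cross-check of the values quoted above, the following sketch recomputes the HIRFL cross sections from Tables 5-7 and 5-8. It assumes that (3-1) and (3-2) take the usual forms, namely the SEU count divided by fluence times tested bits and the SEFI count divided by fluence, and that the tested memory region is 32 KB. Both points are assumptions, since neither is restated in this section, and the names `TESTED_BITS`, `RUNS`, and `results` are illustrative only.

```python
# Hypothetical re-computation of the HIRFL cross sections.
# Assumptions (not restated in this section):
#   sigma_SEU  = N_SEU  / (fluence * tested bits)
#   sigma_SEFI = N_SEFI / fluence
#   tested memory region = 32 KB

TESTED_BITS = 32 * 1024 * 8  # assumed 32 KB region -> 262144 bits

# (pattern, mode): (fluence in cm^-2, SEU count, SEFI count), from Tables 5-7 and 5-8
RUNS = {
    ("AMP", "Dynamic"): (2.1e5, 284, 47),
    ("SP", "Static"): (3.0e5, 1277, 33),
    ("SP", "Dynamic"): (2.5e5, 254, 41),
}


def seu_cross_section(events, fluence, bits=TESTED_BITS):
    """Per-bit SEU cross section in cm^2/bit."""
    return events / (fluence * bits)


def sefi_cross_section(events, fluence):
    """Device-level SEFI cross section in cm^2."""
    return events / fluence


# Map each test to its (SEU cross section, SEFI cross section) pair.
results = {
    key: (seu_cross_section(seu, flu), sefi_cross_section(sefi, flu))
    for key, (flu, seu, sefi) in RUNS.items()
}

if __name__ == "__main__":
    for (pattern, mode), (s_seu, s_sefi) in results.items():
        print(f"{pattern}+{mode}: SEU {s_seu:.2e} cm^2/bit, SEFI {s_sefi:.2e} cm^2")
```

Under these assumptions the computed values stay close to those quoted in the text, which suggests that the assumed 32 KB tested region is consistent with the reported data.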

For the Xilinx 28nm CMOS SoC, the normal current is about 330 mA. During the HIRFL irradiation test, however, the programmable power supply detects step-increment currents. The abnormal currents are depicted in Figures 5-7 (a) and (b). A lower current is also observed in Figure 5-7 (b).



Figure 5-7 Detected abnormal current steps in HIRFL irradiation

## 5.3 Different Test Modes Influence

Figure 5-2 shows that the SEU cross sections increase with the LET in both static and dynamic access modes. In the HI-13 irradiation, the static mode is used with the AMP pattern and the dynamic mode with the SP pattern. To further separate the contributions of the processor pattern and the access mode to SEU sensitivity, both static and dynamic access tests are performed in the same SP pattern in the HIRFL irradiation. Table 5-9 summarizes the cross section ratios between static and dynamic access in both heavy ion irradiation tests. For the HI-13 irradiation, the ratio is in the range of 2 to 3, while it is 4.19 in the HIRFL irradiation. The ratios suggest that dynamic access can reduce SEU vulnerability to a certain extent.

Table 5-9 SEU cross section ratios in different ion irradiations

| LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup> | 1.7  | 9.3  | 13.1 | 78.3 |
|----------------------------------------|------|------|------|------|
| Cross section ratio (Static/Dynamic)   | 2.28 | 2.03 | 2.88 | 4.19 |

In addition, Weibull curves are fitted to the static and dynamic SEU cross sections with formula (5-1) <sup>[172]</sup>. The curves are drawn in Figure 5-8, and Table 5-10 lists the parameters of the fitted curves.

$$\sigma(L) = \sigma_{sat} \left(1 - e^{-\left(\frac{L-L_{th}}{W}\right)^{S}}\right) \tag{5-1}$$

where  $\sigma_{sat}$  is the saturation cross section/cm<sup>2</sup>·bit<sup>-1</sup> or cm<sup>2</sup>,  $L_{th}$  is the LET threshold/MeV·cm<sup>2</sup>·mg<sup>-1</sup>, and  $W$  and  $S$  are fitting parameters.

Table 5-10 The fitting parameters of static and dynamic cross section curves

|                               | $\sigma_{sat}$       | $L_{th}$ | W  | S    |
|-------------------------------|----------------------|----------|----|------|
| Static cross section fitting  | 1.9×10<sup>-8</sup>  | 0.55     | 35 | 1.98 |
| Dynamic cross section fitting | 3.7×10<sup>-9</sup>  | 0.55     | 29 | 1.87 |

Figure 5-8 Weibull curves for the static and dynamic accessing (horizontal axis: LET/MeV·cm<sup>2</sup>·mg<sup>-1</sup>)
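To make formula (5-1) concrete, the sketch below evaluates the Weibull function with the parameters listed in Table 5-10. Returning zero at or below the LET threshold is the usual convention and is assumed here, and the function and dictionary names are illustrative rather than taken from the original analysis.

```python
import math

# Fitted parameters copied from Table 5-10.
FITS = {
    "static": {"sigma_sat": 1.9e-8, "l_th": 0.55, "w": 35.0, "s": 1.98},
    "dynamic": {"sigma_sat": 3.7e-9, "l_th": 0.55, "w": 29.0, "s": 1.87},
}


def weibull_cross_section(let, sigma_sat, l_th, w, s):
    """Evaluate formula (5-1) at the given LET.

    Assumption: the cross section is taken as zero at or below the threshold.
    """
    if let <= l_th:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-(((let - l_th) / w) ** s)))


if __name__ == "__main__":
    # LET values of the ions used in the two heavy ion tests (Table 5-9).
    for let in (1.7, 9.3, 13.1, 78.3):
        static = weibull_cross_section(let, **FITS["static"])
        dynamic = weibull_cross_section(let, **FITS["dynamic"])
        print(f"LET {let:5.1f}: static {static:.2e}, dynamic {dynamic:.2e}")
```

With these parameters both curves are already close to their saturation values at the HIRFL LET of 78.3 MeV·cm<sup>2</sup>·mg<sup>-1</sup>.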

Based on the irradiation tests and the fitting results, the SEU soft error rates under quiet solar conditions are predicted for the static and dynamic access modes using CREME96. The assumed orbit has an altitude of 450 km and an inclination of 51.6 degrees, with 100 mil aluminum shielding [144-145]. Table 5-11 lists the predicted SERs for the memory and the device. Each memory bit is expected to experience  $2.46 \times 10^{-8}$  and  $1.35 \times 10^{-8}$  errors per day in static and dynamic accessing, respectively.

Table 5-11 Predicted soft error rates in CREME96

|         | Bit error/bit<sup>-1</sup>·day<sup>-1</sup> | Device error/device<sup>-1</sup>·day<sup>-1</sup> |
|---------|---------------------------------------------|---------------------------------------------------|
| Static  | 2.46×10<sup>-8</sup>                        | 5.16×10<sup>-2</sup>                              |
| Dynamic | 1.35×10<sup>-8</sup>                        | 2.83×10<sup>-2</sup>                              |
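The relation between the two columns of Table 5-11 can be illustrated with a short calculation. The sketch assumes that the device-level rate is the per-bit rate multiplied by the number of bits in the full 256 KB OCM. This relation is inferred from the tabulated numbers and is not stated explicitly in the text.

```python
# Assumption: device rate = per-bit rate x number of bits in the 256 KB OCM.
OCM_BITS = 256 * 1024 * 8  # 2097152 bits

# Per-bit soft error rates (errors per bit per day) copied from Table 5-11.
BIT_RATES = {"static": 2.46e-8, "dynamic": 1.35e-8}


def device_rate(bit_rate, bits=OCM_BITS):
    """Scale a per-bit soft error rate to the whole memory (errors per day)."""
    return bit_rate * bits


if __name__ == "__main__":
    for mode, rate in BIT_RATES.items():
        print(f"{mode}: {device_rate(rate):.2e} errors per device per day")
```

Under this assumption the scaled values agree with the device column of Table 5-11 to within rounding.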

Figures 5-3 and 5-6 depict the SEFI cross sections in the HI-13 and HIRFL irradiation tests, respectively. In Figure 5-3, the cross sections of the AMP pattern are higher than those of the SP pattern. Specifically, they are  $7.0 \times 10^{-6}$,  $3.8 \times 10^{-5}$, and  $4.5 \times 10^{-5}$  cm<sup>2</sup> for the AMP pattern and  $4.0 \times 10^{-6}$,  $2.6 \times 10^{-5}$, and  $3.3 \times 10^{-5}$  cm<sup>2</sup> for the SP pattern under C, Si, and Cl ion irradiation, respectively. Figure 5-6 presents the cross sections of the SP and AMP patterns in the same dynamic access mode, where the SEFI cross section ratio is about 1:1.37. These results indicate that the processor core pattern can influence SEFI events. In the AMP pattern, more interfaces and registers are utilized, which may increase the SEFI occurrence.

In the HIRFL irradiation tests, different numbers of MCU events are detected in the three pattern tests. For the 1697.4 MeV Ta ion, the corresponding LET is 78.3 MeV·cm<sup>2</sup>·mg<sup>-1</sup>, and the range in silicon is about 99.3  $\mu$m. Such a highly ionizing particle deposits energy along its track and can lead to MCUs, which also poses challenges to SEE hardening of the SoC.

Apart from the detected SEE events, step-increased currents are observed in the HIRFL irradiation test: after 500 ms in Figure 5-7 (a) and after 600 ms in Figure 5-7 (b). Following the step increments, the currents finally drop sharply in both figures. Although the UART communication keeps working without disturbance, the power is cut off to protect the device when the current is abnormal. These abnormal currents are detected directly at the power supply interface rather than in the OCM block. During irradiation, the entire chip is irradiated by a broad beam, so it is difficult to track the root causes. The following factors can be considered. First, SEL may emerge in separate circuits inside the chip. The SoC is manufactured with 28nm HKMG CMOS technology, and its parasitic structures can trigger SEL currents under high-LET ion strikes. The SEL current can behave as a step-increment current <sup>[173-174]</sup>. Second, the circuits spread over the entire chip have diverse SEL vulnerability, so the abnormal currents may originate from the same circuit or from different ones. Circuits such as power management, input and output, and amplifier circuits could be triggered into SEL. Last but not least, different circuit blocks are supplied by different power rails inside the chip, and it is not easy to identify and record the current trace of each circuit during irradiation in such a complex system. For SoCs used in critical applications such as aerospace, a dedicated current management unit can be developed to monitor the real-time currents of the different power rails in the SoC. If possible, it should monitor the current of every power supply rail continuously to catch abnormal currents and avoid more serious consequences.

## 5.4 Summary

The Xilinx 28nm CMOS SoC is irradiated with different heavy ions at the HI-13 and HIRFL facilities in China. The SEE vulnerability of different ARM processor patterns and data access modes is evaluated. The results demonstrate that the SEU cross sections are affected more by the data access mode than by the processor pattern. Additionally, the soft error rates under different modes are predicted from the irradiation results. The Xilinx 28nm CMOS SoC memory suffers from  $2.46 \times 10^{-8}$  and  $1.35 \times 10^{-8}$  soft errors per bit per day in static and dynamic access modes, respectively. Moreover, multiple MCU events and step-increment currents are detected in the high-LET ion irradiation.

# 6 Single Event Effect Hardening by Multi-Layer Design

The above chapters evaluated the SEE sensitivities of the Xilinx 28nm CMOS SoC in different irradiation environments using various accelerators, exploring SEE in multiple blocks and different modes. The results indicate that effective hardening measures are necessary for the SoC. This chapter proposes and applies a multi-layer design to mitigate SEE in the Xilinx 28nm CMOS SoC. Proton irradiation is also performed to verify the hardening performance.

## 6.1 System-Level SEE Hardening

SEE hardening at the system level can be software-based, hardware-based, or a hybrid of both <sup>[175]</sup>. Nevertheless, no hardening design or system can mitigate 100% of the possible SEEs in a device <sup>[176]</sup>. Redundancy, watchdog monitoring, and checkpoint rollback recovery are frequently adopted measures against SEE in SoCs.

#### 6.1.1 Redundancy

Redundancy is generally classified into four types: hardware, software, information, and time <sup>[177-178]</sup>. For example, DMR and TMR are common forms of hardware redundancy. Multiple version programming (MVP) is a typical software redundancy design. Error detecting and correcting (EDAC) codes are a well-known form of information redundancy. Temporal TMR is an effective time redundancy against transient errors. In a design, these redundancies can be applied individually or in combination. Each kind of redundancy is summarized in Table 6-1.

| Redundancy  | Description                              | Example      |
|-------------|------------------------------------------|--------------|
| Hardware    | Hardware Replication                     | DMR, TMR     |
| Software    | Multi code snippets or software versions | MVP          |
| Information | Extra information added in raw data      | EDAC         |
| Time        | Re-executing the same programs           | Temporal TMR |

Table 6-1 Different kinds of redundancies

#### 6.1.2 Watchdog

A watchdog monitor can reset program execution and restart the system from an unknown or hung state to achieve system recovery <sup>[179-180]</sup>. For example, in the Xilinx Zynq-7000 SoC, each Cortex-A9 processor has its own private 32-bit watchdog timer, and there is also a 24-bit system watchdog timer <sup>[99]</sup>.

#### 6.1.3 Checkpoint Rollback Recovery

The checkpoint provides snapshots of the system state <sup>[181]</sup>. Specifically, all information and values in the relevant registers are saved in checkpoint memory. Once an error is detected in the interval between two checkpoints, the program execution can return to the state saved at the previous checkpoint <sup>[86]</sup>. Figure 6-1 draws the schematic of the checkpoint rollback recovery technique. Unlike the watchdog monitor, it avoids relaunching the entire program from the beginning. Moreover, the overhead of checkpoint rollback recovery is low compared with hardware or software redundancy.



Figure 6-1 Schematic of the checkpoint rollback recovery
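The idea in Figure 6-1 can be sketched in a few lines. The following toy model is only an illustration under simplifying assumptions: the system state is a dictionary, a checkpoint is a deep copy of it, and `error_detected` is a hypothetical detection hook supplied by the caller. It is not the implementation used in the cited works.

```python
import copy


class CheckpointedTask:
    """Toy model: the state is a dict and a checkpoint is a deep copy of it."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def save_checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._checkpoint)


def run_with_rollback(task, steps, error_detected, max_retries=3):
    """Run each step; checkpoint after a clean step, roll back and retry after an error."""
    for step in steps:
        for _ in range(max_retries):
            step(task.state)
            if not error_detected(task.state):
                task.save_checkpoint()
                break
            # Only the failed interval is repeated, not the whole program.
            task.rollback()
        else:
            raise RuntimeError("step kept failing after rollback")
    return task.state
```

Only the interval since the last checkpoint is re-executed after an error, which is the source of the low overhead mentioned above.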

For SEE hardening on Xilinx 28nm CMOS SoC, researchers proposed DCL and TCL implementations relying on the dual-ARM core system and these common techniques <sup>[114-115, 182]</sup>. Figures 6-2 and 6-3 draw the schematic of the proposed DCL and TCL implementations on the SoC, respectively.



Figure 6-3 Schematic of the proposed TCL in [115]

In Figure 6-2, two identical cores execute the same program simultaneously <sup>[182]</sup>. As an improvement and extension of Figure 6-2, a third core is implemented in the PL part in Figure 6-3. For the Xilinx 28nm CMOS SoC, the critical flaw of the proposed DCL and TCL is that both ARM cores are completely occupied while mitigating SEE. Unlike these efforts, a multi-layer design to harden SEE on the Xilinx 28nm CMOS SoC is proposed here. In addition, proton irradiation is performed to examine its hardening efficiency.

## 6.2 Multi-Layer Hardening Design

For a COTS SoC, it is impossible to mitigate SEE by modifying the hardware layout or architecture. Full duplication of hardware resources, as shown in Figure 6-2, incurs 100% extra overhead in CPUs and memories. Therefore, a software implementation is more feasible.

In this work, a three-layer hardening implementation is designed. It includes the redundancy layer, the watchdog layer, and the AMP layer. Redundancy and the watchdog monitor are well-known traditional hardening measures against SEE in SoCs. The main contribution is the AMP layer, which prevents both processors from being fully occupied.

The Xilinx 28nm CMOS SoC embeds a dual-core ARM processor. It can work symmetrically, as in DCL or TCL, or asymmetrically, as in the AMP pattern. In the AMP hardening implementation, the slave core (Core1) is dedicated to mitigating SEE and guarantees data correctness for the master core (Core0). This is the most significant difference from the traditional approaches and other works.

#### 6.2.1 Redundancy Layer

As described in 6.1.1, redundancy means replication, repeated operations, and similar measures. In most cases, SEU and SET are mitigated with either spatial or temporal redundancy. In this study, both are employed. For spatial redundancy, the data in OCM are replicated in two different DDR memory spaces. During checking, a datum is read from OCM and from the two separate DDR addresses, and the three values are compared by a majority voter.

Besides spatial redundancy, temporal redundancy is used to guarantee data correctness in OCM. For each datum in OCM, the core reads it three times consecutively in three cycles to decide its status. In this layer, temporal redundancy takes precedence over spatial redundancy; that is, temporal redundancy is used first to determine data correctness. If the datum is deemed incorrect during the temporal redundancy examination, it does not enter the spatial redundancy check. This saves time in detecting corrupted data, because the redundant data are stored in DDR and accessing them lengthens the read routine. Reporting a datum as incorrect by temporal redundancy alone takes about 1  $\mu$s, whereas the temporal and spatial redundancy checks together take about 2  $\mu$s. Data considered correct by temporal redundancy then proceed to the spatial redundancy examination, which catches data that were already corrupted before reading.
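The two-stage check described above can be sketched as follows. This is a simplified software model, not the code running on Core1: `read_ocm` is a hypothetical callable standing in for one OCM read, `ddr_a` and `ddr_b` stand for the two DDR copies, and restoring from the first DDR copy after a temporal mismatch is an assumption made for illustration.

```python
from enum import Enum


class Verdict(Enum):
    OK = "consistent"
    TEMPORAL_MISMATCH = "reads disagree"
    CORRECTED = "restored from DDR copies"
    UNRESOLVED = "all three copies differ"


def check_datum(read_ocm, ddr_a, ddr_b):
    """Return (verdict, value to keep in OCM) for one datum.

    read_ocm: hypothetical zero-argument callable returning the current OCM word.
    ddr_a, ddr_b: the two redundant copies held in DDR.
    """
    # Temporal redundancy first: three consecutive reads of the same word.
    reads = [read_ocm() for _ in range(3)]
    if not (reads[0] == reads[1] == reads[2]):
        # Flagged early, so the slower spatial check is skipped.
        # Assumption: the first DDR copy is used as the redundant value.
        return Verdict.TEMPORAL_MISMATCH, ddr_a

    # Spatial redundancy: word-level vote among OCM and the two DDR copies.
    ocm = reads[0]
    if ocm == ddr_a or ocm == ddr_b:
        return Verdict.OK, ocm           # OCM agrees with at least one copy
    if ddr_a == ddr_b:
        return Verdict.CORRECTED, ddr_a  # OCM is the odd one out
    return Verdict.UNRESOLVED, ocm       # no majority: keep the OCM value
```

The early return models why a temporal mismatch is cheaper to report than a full check: the DDR copies are never compared in that branch.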

#### 6.2.2 Watchdog Monitor Layer

For the SoC, SEE events can disturb the proper running or function of the processor, leading to a program exception or an unknown state. As mentioned in 6.1.2, the watchdog monitor can restart the system from these unexpected states. Both cores run in the AMP pattern, and either of them can reset the watchdog timer.

#### 6.2.3 AMP Layer

Core0 is the master processor in the AMP pattern, and Core1 is the slave, which is awakened by the master at the initialization stage. Core1 then starts executing instructions to detect and mitigate SEU in OCM, cooperating with the redundancy layer, whenever it recognizes the effective flag. Figure 6-4 displays the workflow, and more details are described as follows.



Figure 6-4 Workflow of the AMP layer

OCM is used as data memory, and 32 KB of the 256 KB OCM is tested. First, Core0 writes the data check pattern 0xA5A5A5A5 to the entire 32 KB memory space; this pattern allows  $0\rightarrow 1$ and  $1\rightarrow 0$  upsets to be investigated at the same time. A flag variable is stored at another location outside the 32 KB range. The two cores set the flag alternately at the end of their examinations. Core0 starts its examination when the flag is 0xF0, which Core1 sets when it has finished checking the data. Core1 begins operation when the flag is 0x0F, which Core0 sets when it finishes its examination.

When Core0 has finished writing the 32 KB OCM and has set the flag for the first time, Core1 launches its check. It copies the OCM data to two DDR spaces and enters the redundancy layer. Core1 first reads each OCM datum consecutively in three cycles. If Core1 detects that one read differs from the others in the temporal redundancy examination, it corrects the error directly using the redundant data. Otherwise, if the temporal redundancy check detects nothing incorrect, the spatial redundancy check is performed. If the OCM datum differs from the two redundant copies in DDR, it is deemed corrupted, and Core1 tries to correct it by copying the redundant value. If all three copies are different, Core1 keeps the datum in OCM unchanged. During its own check, Core0 examines the data again by XORing them with the expected 0xA5A5A5A5. If a datum is indeed corrupted, Core0 reports the upset data and address.
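The XOR examination performed by Core0 can be illustrated with a small model. The sketch shows why the 0xA5A5A5A5 pattern exposes both upset directions: half of its bits are 0 and half are 1, so a flipped bit can be classified by its expected value. The function names and the `base_address` parameter are hypothetical, and 4-byte word addressing is assumed.

```python
PATTERN = 0xA5A5A5A5  # data check pattern written by Core0


def classify_upsets(word, expected=PATTERN):
    """Return (n_0_to_1, n_1_to_0) for a 32-bit word read back from memory."""
    diff = (word ^ expected) & 0xFFFFFFFF
    zero_to_one = bin(diff & word).count("1")      # flipped bits that now read 1
    one_to_zero = bin(diff & expected).count("1")  # flipped bits that were written as 1
    return zero_to_one, one_to_zero


def scan(memory, base_address=0):
    """Yield (address, word, n_0_to_1, n_1_to_0) for every corrupted word.

    memory: sequence of 32-bit words; base_address is a hypothetical start address.
    """
    for index, word in enumerate(memory):
        up, down = classify_upsets(word)
        if up or down:
            yield base_address + 4 * index, word, up, down
```

A nonzero XOR result marks a corrupted word, and masking that result with the read-back word or with the expected pattern separates the two upset directions.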

While Core1 is checking, Core0 is available for other workloads, such as logic or algorithm applications. Core1 can be regarded as dedicated to detecting SEU and SET in OCM. Compared with SEE hardening in a single-processor system or with DCL and TCL, this improves the efficiency and flexibility of the entire SoC system. In [114], [115], and [182], the proposed techniques occupy double or even more resources during execution.

Meanwhile, when each core finishes its examination, it reloads the watchdog in addition to setting the flag. If an SEE causes either core to hang or crash, the watchdog is activated and the system is relaunched. The cooperation of these layers maintains the correctness of the data in OCM.
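The alternating flag protocol and the watchdog reload can be modelled as a simple turn-taking loop. The sketch is a sequential simulation for illustration only: the real cores run concurrently, and the class and function names are hypothetical. Only the flag values 0x0F and 0xF0 come from the description above.

```python
FLAG_CORE1_TURN = 0x0F  # set by Core0 when it finishes; Core1 may run
FLAG_CORE0_TURN = 0xF0  # set by Core1 when it finishes; Core0 may run


class SharedState:
    """Stand-in for the flag word and the watchdog shared by both cores."""

    def __init__(self):
        self.flag = FLAG_CORE1_TURN  # Core0 has just written the pattern
        self.watchdog_reloads = 0
        self.log = []                # order in which the cores ran


def core_step(shared, core_id):
    """Run one examination if it is this core's turn; return True if it ran."""
    my_turn = FLAG_CORE0_TURN if core_id == 0 else FLAG_CORE1_TURN
    if shared.flag != my_turn:
        return False
    shared.log.append(core_id)    # placeholder for the actual memory check
    shared.watchdog_reloads += 1  # reload the watchdog at the end of the pass
    shared.flag = FLAG_CORE1_TURN if core_id == 0 else FLAG_CORE0_TURN
    return True


def simulate(rounds):
    """Poll both cores for a number of rounds; only the core whose turn it is works."""
    shared = SharedState()
    for _ in range(rounds):
        for core_id in (0, 1):
            core_step(shared, core_id)
    return shared
```

In this model a core that hangs never hands the flag over and stops reloading the watchdog, which is the condition under which the watchdog relaunches the system.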

To verify the performance of this multi-layer design, proton irradiation tests are performed. In Section 3.2, SEE in OCM was examined under 90 and 70 MeV proton irradiation. For comparison, the OCM block is examined again in this irradiation test. It should be noted, however, that the multi-layer design is applicable to all resources shared by the two cores, even though this work tests only the OCM block.

#### 6.3 Irradiation Tests

As described in Section 3.2, the proton irradiation test is again conducted at NICRA. The difference is that OCM was tested without any mitigation measures in Section 3.2, whereas it adopts the proposed multi-layer SEE hardening design in this irradiation test. To facilitate comparison and analysis in the following descriptions, the proton irradiation in Section 3.2 is called the first irradiation test, and the irradiation test in this chapter is called the second irradiation test.

#### 6.3.1 Test Setup

The test setup for the second irradiation test is similar to that in Section 3.2. The irradiation test facility is located in a shielded room more than 10 meters away from the main hall, where the monitor and the programmable power supply are placed. The host computer and the programmable power supply connect remotely to the test board, which is mounted on the facility holder. The host computer communicates with the device through a fiber USB cable. Once a program exception appears during irradiation, the particle beam is halted immediately. The programmable power supply powers the test board through a cable during the irradiations and is also used to detect current abnormalities. The running messages are logged from the UART interface in real time. The SoC components, such as the processors, OCM, and other interfaces, run under nominal conditions in both irradiation tests.

#### 6.3.2 Proton Beam

The proton beam is extracted from the accelerator and undergoes a series of processing steps before hitting the DUT, including homogenization, energy adjustment, and collimation. As described in Section 3.2, the beam spot size ranges from  $1 \text{ cm} \times 1 \text{ cm}$  to  $10 \text{ cm} \times 10 \text{ cm}$ . The adopted beam spot covered the entire SoC chip and the DDR memory regions during the irradiation. Figure 6-5 is a photo of the endpoint of the second irradiation test.



Figure 6-5 Photo of the endpoint of the second irradiation test

The 90 and 70 MeV proton beams are used in the two irradiation tests at different times. Beam fluxes and fluences of the two irradiation tests are listed in Table 6-2. The table shows that the flux and fluence of the second irradiation test are smaller than those of the first irradiation test.

| Test                    | Energy/MeV | Flux/10<sup>8</sup> p·cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/10<sup>11</sup> p·cm<sup>-2</sup> |
|-------------------------|------------|------|------|
| First irradiation test  | 90         | 1.30 | 1.00 |
| First irradiation test  | 70         | 2.30 | 1.00 |
| Second irradiation test | 90         | 0.28 | 0.50 |
| Second irradiation test | 70         | 0.20 | 0.50 |

Table 6-2 Beam fluxes and fluences in the two irradiation tests

During the second irradiation test, the irradiation starts when the beam switches on, and the host computer displays the real-time information. The AMP dual-core layer, temporal redundancy, spatial redundancy, and watchdog cooperate to process the detected SEE, as introduced in Section 6.2.

# 6.4 Irradiation Results and Discussions

## 6.4.1 Irradiation Results

Both SEU and SEFI were detected in the 90 and 70 MeV irradiations without mitigation in the first irradiation test. However, no SEU was observed in the second irradiation test, in which OCM adopted the proposed multi-layer design. Only SEFI events emerged in the second 90 and 70 MeV proton irradiations. No abnormal currents were detected in either irradiation test. The details of the SEU and SEFI events in both irradiation tests are presented in Table 6-3.

| Test                    | Energy/MeV | SEU: SBU | SEU: 2-cell upset | SEU: 3-cell upset | SEU: 4-cell upset | SEU: >4-cell upset | SEFI |
|-------------------------|------------|----------|-------------------|-------------------|-------------------|--------------------|------|
| First irradiation test  | 90         | 102      | 27                | 11                | 1                 | 2                  | 7    |
| First irradiation test  | 70         | 88       | 18                | 8                 | 3                 | 0                  | 6    |
| Second irradiation test | 90         | 0        | 0                 | 0                 | 0                 | 0                  | 6    |
| Second irradiation test | 70         | 0        | 0                 | 0                 | 0                 | 0                  | 4    |

Table 6-3 Detected SEE events in both irradiation tests

#### 6.4.2 Results Analysis

During both tests, phenomena such as system hang or output exception were regarded as SEFI. Table 6-4 shows the details of the SEFI events in both irradiations. The detected SEFI events were resolved by a manual power cycle in the first irradiation test, whereas the watchdog recovered them in the second irradiation test.

| Test                    | Energy/MeV | SEFI           | Number | Recovery method |
|-------------------------|------------|----------------|--------|-----------------|
| First irradiation test  | 90         | Hang           | 7      | Repower         |
| First irradiation test  | 70         | Hang           | 6      | Repower         |
| Second irradiation test | 90         | Hang           | 5      | Watchdog        |
| Second irradiation test | 90         | Output garbled | 1      | Watchdog        |
| Second irradiation test | 70         | Hang           | 4      | Watchdog        |

Table 6-4 The SEFI details of both irradiation tests

Of the SEFI events observed in the second irradiation test, as outlined in Table 6-4, the majority appeared as hangs, characterized by the message output stopping and program execution halting. The single output-garbled event manifested as a continuous stream of unintelligible messages. The SEFI cross sections of the two irradiation tests are displayed in Figure 6-6. In the first irradiation test, the SEFI cross sections are  $6 \times 10^{-11}$  and  $7 \times 10^{-11}$  cm<sup>2</sup> for the 70 and 90 MeV irradiations, respectively. In the second irradiation test, they are  $8 \times 10^{-11}$  and  $1.2 \times 10^{-10}$  cm<sup>2</sup>, respectively. Relative to the first irradiation test, the cross sections are higher by factors of 1.3 and 1.7 for the 70 and 90 MeV irradiations. This is because two ARM cores are active in the AMP pattern, using more registers and other resources. It also illustrates that the AMP pattern suffers more SEFI events.
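The quoted cross sections are consistent with the usual definition of the event count divided by the fluence. The short Python check below uses the SEFI counts of Table 6-3 and the fluences of Table 6-2. The function and variable names are illustrative only.

```python
def cross_section(events, fluence):
    """Cross section in cm^2: event count divided by fluence in p/cm^2."""
    return events / fluence


# (SEFI count from Table 6-3, fluence in p/cm^2 from Table 6-2)
TESTS = {
    ("first", 70): (6, 1.00e11),
    ("first", 90): (7, 1.00e11),
    ("second", 70): (4, 0.50e11),
    ("second", 90): (6, 0.50e11),
}

sigma = {key: cross_section(n, phi) for key, (n, phi) in TESTS.items()}
ratio_70 = sigma[("second", 70)] / sigma[("first", 70)]
ratio_90 = sigma[("second", 90)] / sigma[("first", 90)]
```

Rounded to one decimal place, `ratio_70` and `ratio_90` give the factors 1.3 and 1.7 stated in the text.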



Figure 6-6 SEFI cross section of the two irradiation tests

Additionally, since the proton fluxes of the two irradiation tests differ, the influence of flux is also checked in the second irradiation test. Two proton irradiations with different fluxes are accumulated to the same fluence during the check, as presented in Table 6-5. The detected SEE cross sections are displayed in Figure 6-7. The SEE cross sections are the same for the two checks, which indicates that flux does not influence the SEE cross sections.

| Check ID | Flux/10<sup>8</sup> p·cm<sup>-2</sup>·s<sup>-1</sup> | Fluence/10<sup>11</sup> p·cm<sup>-2</sup> |
|----------|------|------|
| A        | 0.2  | 0.5  |
| B        | 0.1  | 0.5  |

Table 6-5 Parameters in the proton flux check

Figure 6-7 SEE cross section in the flux check

Two ARM processors are utilized in this multi-layer design, which increases the number of registers in use. SEE in registers can induce SEFI, so the more registers are utilized, the higher the SEFI probability. Unlike the manual power cycle in the first irradiation, the system is recovered automatically by the watchdog. Register refreshing can be adopted in this multi-layer design to improve system resilience against SEFI events.

Multiple factors can cause SEFI, although it appears only as a hang or garbled output. For example, it may be caused by data corruption in processor registers or interface registers. It is difficult to identify the cause immediately during irradiation, and the watchdog provides a solution. As mentioned above, the watchdog is reloaded by both ARM cores alternately in the multi-layer design. It handles SEFI events in time without repowering the test board.

The irradiation tests illustrate that the proposed multi-layer design can mitigate SEU events in the Xilinx 28nm CMOS SoC. In the redundancy layer, temporal redundancy is used first to report SEU, rather than spatial redundancy directly. This improves the efficiency of upset detection, since the DDR is outside the chip and has a longer read path. To examine the improvement as a whole, the cycle times of two cases are compared. In the first case, the 32 KB OCM is read three times in three cycles to determine whether an SEU has occurred, and the cycle time is 23.88 ms. In the second case, which uses spatial TMR directly, it is 33.30 ms. The cycle is thus shortened by 28.3%. As the memory capacity increases, this difference can be expected to become more prominent. Another reason for introducing temporal redundancy is to detect and process SET in OCM effectively, which to a certain extent avoids possible SET propagation from OCM to other blocks.
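The reduction of the detection cycle follows directly from the two measured cycle times:

$$\frac{33.30 - 23.88}{33.30} = \frac{9.42}{33.30} \approx 28.3\%$$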

In a single-processor system, the ongoing applications are affected or halted if the processor has to perform the error detection and recovery operations. This can be avoided in the AMP pattern: the slave ARM core dedicated to the AMP layer lets the master processor run the ongoing applications without disturbance. Moreover, for the 32 KB OCM, the cycle for examining an SEU is about 15.6 ms with a single processor, whereas it is about 8.3 ms in the AMP pattern. The time is shortened by 46.7%. Although only the OCM block was examined, this multi-layer design is also applicable to other shared resources. For an SoC that requires hardening, several specific multi-layer designs can be developed for multiple critical blocks, such as the OCM, cache, and even some important I/O ports.

### 6.5 Summary

The dual-core ARM SoC can run in the AMP pattern. A multi-layer design based on the AMP pattern, cooperating with redundancy and a watchdog monitor, is applied to mitigate SEE in the Xilinx 28nm CMOS SoC. Two proton irradiation tests were performed to examine the performance of the multi-layer design. The test results demonstrate that the multi-layer design can mitigate SEU effectively.

# 7 SEM-based FI and FTA on Xilinx 16nm FinFET MPSoC

The SEM IP provided by Xilinx is an effective way to perform FI on the Xilinx Ultrascale+ MPSoC. As introduced in Section 2.2.1, several image processing applications serve as the test benchmarks, and the algorithms involve image Histogram, Stretch, and Sobel processing. The SEM and algorithm application subsystems are two independent subsystems in the FI design. The fault tree analysis (FTA) method is also employed, in combination with the obtained FI results, to further investigate and analyze the subsystem contributions.

## 7.1 Overall Framework of SEM-based FI

SEM IP can perform FI and mitigate SEE on an FPGA design. It can be introduced into the target design as an individual subsystem. Meanwhile, the FTA method is employed to quantify the influence of the individual subsystems on the MPSoC reliability. This implementation therefore helps discuss the SEE contributions from the SEM and application subsystems separately.

In this chapter, three image processing algorithm applications are designed as test benchmarks. SEM IP is introduced into each block design to execute FI. Based on the obtained FI results, the FTA is performed for the algorithm applications one by one. The overall framework is depicted in Figure 7-1. The SEM-based FI is composed of five key parts.

- a) Algorithm design: the customized IP for each image processing algorithm is generated.
- b) Block design: the customized IP and the SEM IP module are added into block design. After processing, such as synthesis and implementation, bitstream and essential bit files are generated.
- c) FI script creation: 10000 essential bits are randomly extracted and converted into LFA format.
- d) FI execution: 10000 FIs are executed for each algorithm application.
- e) FTA: SEE contributions from different subsystems and events are quantitatively analyzed.



Figure 7-1 Schematic of the overall design of SEM-based FI

The key parts are described in more detail in the following sections.

### 7.2 FI Design

The DUT is the Xilinx 16nm FinFET MPSoC. Although some fault injections or radiation tests have been executed on designs combined with SEM IP since the DUT was released, they primarily focused on SEE examination <sup>[89, 183]</sup>. This section performs FI on the Xilinx 16nm FinFET MPSoC using SEM IP and additionally uses the FTA method to analyze the impact of SEE. This is the key difference between this study and others.

#### 7.2.1 Test Design in SEM-based FI

Image processing algorithm applications on advanced SoCs are broadly developed and applied. Among them, the following three are regarded as basic ones: image histogram extraction (Histogram), original image stretch (Stretch), and image edge detection with the Sobel algorithm (Sobel) <sup>[184-185]</sup>.

In this study, the above three algorithm applications are designed and tested. Each algorithm is briefly introduced as follows<sup>(1)</sup>.

Histogram: count the number of each pixel value

Stretch: adjust the value of each pixel with (7-1)

Sobel: the adopted Sobel operator is (7-2)

$$P_s = 255 \frac{P_i - P_{min}}{P_{max} - P_{min}} \tag{7-1}$$

where  $P_s$ --the stretched pixel value;  $P_i$ --the pixel value to be processed;  $P_{min}$ --the minimum value in all raw pixel values;  $P_{max}$ --the maximum pixel value in all raw ones

$$\begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix} \tag{7-2}$$
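A software reference for the three operations can be sketched in plain Python. This is an illustrative model rather than the HLS source of the customized IPs. The function names are hypothetical, and several details are assumptions not stated in the text: floor division in (7-1), the handling of a flat image, zeroed borders, and taking the absolute Sobel response clipped to 255.

```python
SOBEL_KERNEL = ((-1, -2, -1),
                (0, 0, 0),
                (1, 2, 1))  # operator (7-2)


def histogram(image):
    """Count the occurrences of each 8-bit pixel value."""
    counts = [0] * 256
    for row in image:
        for p in row:
            counts[p] += 1
    return counts


def stretch(image):
    """Apply (7-1) to every pixel; floor division is an assumption."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:  # flat image: (7-1) is undefined, return a copy (assumption)
        return [list(row) for row in image]
    return [[255 * (p - lo) // (hi - lo) for p in row] for row in image]


def sobel(image):
    """Correlate interior pixels with (7-2); border pixels are left at 0.

    Using the absolute response clipped to 255 is an assumption.
    """
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(SOBEL_KERNEL[j][i] * image[y + j - 1][x + i - 1]
                      for j in range(3) for i in range(3))
            out[y][x] = min(abs(acc), 255)
    return out
```

Here `image` is a list of rows of integers in the range 0 to 255.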

The image to be processed is a 320×240 pixel 2D Lena grayscale image (a total of 76800 pixel values). First, the customized IP for each algorithm is designed in Vivado HLS 2019.2. Figures 7-2 (a) to (c) show the generated customized IPs for Histogram, Stretch, and Sobel, respectively. Afterward, these IPs are added to the corresponding block designs in Vivado 2019.2. In addition, the SEM module, including the SEM IP and UART blocks, is added to each block design. The simplified diagram of the design is presented in Figure 7-3. Finally, fault injections are performed for the algorithms one by one.



Figure 7-2 Generated customized IPs for each algorithm

<sup>(1)</sup> The algorithm design also referred to the YouTube channel "The Development Channel," which introduced the basic image processing algorithm implementations on Xilinx Zynq-7000 SoC.

In Figure 7-3, the SEM and the application are two subsystems, and the SEM subsystem is the same for each algorithm. Separate UART interfaces provide the communication for the SEM and application subsystems, respectively. Accordingly, separate terminals are designed for the SEM and the applications.



Figure 7-3 Simplified block diagram for the design

## 7.2.2 FI and Outcome Terminal Design

The terminals are designed in Python for FI and Outcome, respectively, displaying the real-time messages about SEM and application subsystems. Figure 7-4 depicts the FI terminal, and Figure 7-5 presents the Outcome terminal. It can be seen that the FI terminal is composed of four areas.



Figure 7-4 The FI terminal

- a) Port area: sets the port number, baud rate, data width, parity, and stop bits, and controls the opening and closing of the UART connection. In the figure, the UART port is COM10, the baud rate is 115200 b/s, the data width is 8 bits, there is no parity bit, and there is 1 stop bit.
- b) Send area: sends instructions to the UART.
- c) FI area: creates the injection script and sets the starting location of the injection. More importantly, it can reset the initial injection position according to the aborted ones.
- d) Receive area: the SEM injection information is updated in real-time in this area.

[Screenshot of the Outcome terminal: port area (COM9, 115200 b/s, 8 data bits, no parity, 1 stop bit), send area, and receive area]

Figure 7-5 The Outcome terminal

The Outcome terminal can be regarded as the FI terminal without the FI-ebc area.

## 7.3 FI Implementation

As described in Section 2.2.2, SEM IP has six modes. In this study, the mitigation and testing mode is selected, in which FI as well as error detection and correction can be executed. These operations are achieved by sending different commands over the FI UART. The main commands used are listed in Table 7-1 <sup>[124]</sup>. For each injection, the controller is shifted to the Idle state by entering 'I' from the SEM UART so that a fault can be injected into one frame. It is then shifted to the Observation state by entering 'O', in which the injected error can be corrected. After that, the 'I' command is sent to the SEM UART again to execute a new injection. The two commands alternate to achieve 10000 injections in each algorithm application test.

| Function              |
|-----------------------|
| Enter Observation     |
| Enter Idle            |
| Enter Detect only     |
| Enter Diagnostic Scan |

Table 7-1 SEM subsystem UART commands
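The alternation of commands for a batch of injections can be sketched as a small Python generator. The 'I' and 'O' commands are those described above, and the `N <address>` form follows the script lines shown in Figure 7-6. The function name is hypothetical. Line terminators, waiting for the SEM prompt, and the serial transport itself are deliberately left out.

```python
def injection_commands(addresses):
    """Yield the UART command strings for a list of LFA hex addresses."""
    for lfa in addresses:
        yield "I"           # enter Idle so that a fault can be injected
        yield "N " + lfa    # inject at the given linear frame address
        yield "O"           # enter Observation so the error is corrected


commands = list(injection_commands(["C000000010B", "C00000010F5"]))
```

A terminal program would write each string to the SEM UART in turn and log the response before sending the next one.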

As depicted in Figure 7-1, the bitstream and essential bit files are generated at the end of the block design step. The generated EBD file is the source file for creating the FI script, and a '1' in the EBD file stands for an essential bit. Because the algorithm application designs have different resource utilization, their total numbers of essential bits also differ. Specifically, Table 7-2 shows some available resources in the PL part of the Xilinx Ultrascale+ MPSoC. Table 7-3 lists the main resource utilization, such as look-up tables (LUT), flip-flops (FF), input and output (IO) ports, and global clock buffers (BUFG). Table 7-4 summarizes the total number of essential bits of each algorithm application design.

| Resource  | LUT   | LUTRAM | FF     | BRAM | DSP | IO  | BUFG |
|-----------|-------|--------|--------|------|-----|-----|------|
| Available | 70560 | 28800  | 141120 | 216  | 360 | 180 | 196  |

Table 7-2 Some available resources in the PL part of Xilinx Ultrascale+ MPSoC

| Algorithm | LUT          | LUTRAM      | FF           | BRAM         | DSP        | IO        | BUFG      |
|-----------|--------------|-------------|--------------|--------------|------------|-----------|-----------|
| Histogram | 4714 (6.68%) | 433 (1.50%) | 5731 (4.06%) | 21 (9.72%)   | 1 (0.28%)  | 4 (2.22%) | 2 (1.02%) |
| Stretch   | 6320 (8.96%) | 927 (3.22%) | 8498 (6.02%) | 9 (4.17%)    | 4 (1.11%)  | 4 (2.22%) | 2 (1.02%) |
| Sobel     | 5754 (8.15%) | 893 (3.10%) | 8203 (5.81%) | 11.5 (5.32%) | 15 (4.17%) | 4 (2.22%) | 2 (1.02%) |

Table 7-3 Resource utilization of each algorithm design

| Algorithm       | Histogram | Stretch  | Sobel   |
|-----------------|-----------|----------|---------|
| # Essential bit | 2058461   | 26502158 | 2512355 |

Table 7-4 Essential bit length in EBD files of each algorithm design

Table 7-4 shows that the essential bit length of each algorithm design is more than 2000000 bits. Performing FI on all of these bits would be rather time-consuming. Therefore, 10000 essential bits are randomly extracted from each essential bit file. Since the SEM fault injection targets a frame, every extracted essential bit is first converted into the LFA format presented in Table 2-3. For example, Figure 7-6 shows a small part of the converted essential bits for the Histogram design.

| 1  | N | C000000010B |
|----|---|-------------|
| 2  | N | C00000055B  |
| 3  | N | C00000010F5 |
| 4  | N | C0000001365 |
| 5  | N | C00000013E5 |
| 6  | N | C000000150B |
| 7  | N | C00000027E5 |
| 8  | N | C00000338D  |
| 9  | N | C0000033BD  |
| 10 | N | C00000364D  |
| 11 | N | C000003AD7  |

Figure 7-6 A small section snapshot of the converted essential bits of Histogram
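The random extraction step can be sketched in Python as below. The sketch assumes that the header lines of the EBD file have already been stripped and that each remaining line is a string of '0' and '1' characters. The function name and the `(row, column)` representation are hypothetical. The device-specific conversion of a position into an LFA address is not shown.

```python
import random


def sample_essential_bits(bit_rows, k, seed=None):
    """Pick k distinct essential-bit positions as (row, column) pairs.

    bit_rows: iterable of '0'/'1' strings, one per EBD data line.
    Raises ValueError if fewer than k essential bits are available.
    """
    positions = [(r, c)
                 for r, row in enumerate(bit_rows)
                 for c, ch in enumerate(row)
                 if ch == "1"]          # '1' marks an essential bit
    rng = random.Random(seed)           # seeded for a repeatable script
    return sorted(rng.sample(positions, k))
```

Sorting the sampled positions is a design choice in this sketch: it makes it easy to resume from the position of an aborted injection.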

Then, for each algorithm application design, the faults are injected one by one while the test program is running. The software program is launched in Joint Test Action Group (JTAG) mode and runs until all 10000 extracted bits have been injected. If an injected fault halts a UART, the FI is suspended, regardless of whether it is the FI UART or the Outcome UART that stops. The position of the corresponding frame then becomes the next starting injection position. The workflow of each algorithm design's FI operation is drawn in Figure 7-7.



Figure 7-7 Workflow of each algorithm design's FI operation

When power is supplied to the Xilinx 16nm FinFET MPSoC, the default PS boot process is executed through the PCAP. After this process has finished, the configuration logic interface is still the PCAP. However, the SEM IP operations must be executed over the ICAP interface. This setting is achieved by clearing the pcap\_pr bit of the pcap\_ctrl register (bit[0] at register address 0xFFCA3008). Moreover, the SEM controller icap\_grant port must be connected and set, which enables the controller initialization operations.

Each algorithm application test program is mainly composed of three segments. The first configures the SEM controller registers. The second produces the golden results in software. These first two segments are each executed once. The last segment is the execution of the algorithm application on hardware, which runs continuously. At the end of each execution, the generated results are compared with the golden ones to detect whether the injected faults induce errors.

Furthermore, signatures are inserted in the algorithm program. Figure 7-8 displays a small section snapshot of these signatures. They help to determine the exact locations of Outcome UART halt errors. As depicted in Figure 7-8, if the Outcome terminal output stops at '&&', the injected fault has led to a DMA error. If it stops at '##', the injected fault has caused a failure of the customized IP module <sup>[186]</sup>.

Figure 7-8 Snapshot of a part of signatures
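Locating a halt from the captured terminal output can be sketched as follows. The two signatures and their meanings come from the text. The function name is hypothetical, and the sketch assumes the log of a halted run is available as plain text and that the last signature printed marks where the output stopped.

```python
SIGNATURES = {"&&": "DMA error", "##": "customized IP failure"}


def locate_halt(log_text):
    """Return the failure implied by the last signature in a halted log."""
    last_pos, last_sig = -1, None
    for sig in SIGNATURES:
        pos = log_text.rfind(sig)       # position of the latest occurrence
        if pos > last_pos:
            last_pos, last_sig = pos, sig
    return SIGNATURES.get(last_sig, "unknown")
```

The function is meant to be called only on logs of runs that actually stopped. A log without any signature is reported as "unknown".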

### 7.4 Detected FI Results

Although the algorithm applications are different, the types of detected results are the same for the Histogram, Stretch, and Sobel designs. The results fall into four types: normal, silent data corruption (SDC), Outcome terminal hang (OTH), and FI terminal hang (FITH), as summarized in Table 7-5. The last three types are errors. An SDC error means the detected results differ from the golden ones. An OTH error means the application subsystem UART stops outputting, and an FITH error means the FI operation cannot continue. The corresponding injected essential bits are therefore critical, since they induce errors in the tested designs.

| Detected Result              | Description                                                    |
|------------------------------|----------------------------------------------------------------|
| Normal                       | The application and SEM subsystem normally run without error   |
| Silent data corruption (SDC) | The calculated results are different from the golden ones      |
| Outcome terminal hang (OTH)  | Outcome terminal stops outputting messages                     |
| FI terminal hang (FITH)      | SEM subsystem fails and cannot run the fault injection anymore |

Table 7-5 Obtained results during fault injection of each algorithm

The detected error numbers of each algorithm application test are presented in Table 7-6. SDC errors dominate the results. Figure 7-9 displays the SDC ratio of each algorithm application test. The SDC ratios are 55.14%, 57.44%, and 54.98% for the Histogram, Stretch, and Sobel tests, respectively. Nevertheless, these errors can be corrected by SEM IP in cooperation with the ECC circuit, and they disappear before the next injection.

| Algorithm | Total | SDC | OTH | FITH |
|-----------|-------|-----|-----|------|
| Histogram | 214                     | 118               | 80        | 16   |
| Stretch   | 289                     | 166               | 102       | 21   |
| Sobel     | 271                     | 149               | 100       | 22   |

Table 7-6 Detected error numbers of each algorithm



Figure 7-9 SDC ratio in detected errors of each algorithm test

Another error type that deserves attention is the OTH error. Figure 7-10 shows the OTH error percentage of each algorithm application test: 37.38%, 35.29%, and 36.90%, respectively. Moreover, the locations where OTH errors occur can be confirmed from the inserted signatures in each test. Table 7-7 shows that DMA and customized IP failures are the two root causes of OTH errors. Unlike SDC errors, an OTH error remains even after the SEM IP recovers the injected fault. A power cycle is necessary to clear these OTH errors.



Figure 7-10 OTH percentages in detected errors of each algorithm test

Table 7-7 Roots of OTH errors for each algorithm test

| Algorithm | Total | DMA IP failure | Customized IP failure |
|-----------|-------|----------------|-----------------------|
| Histogram | 80    | 65             | 15                    |
| Stretch   | 102   | 89             | 13                    |
| Sobel     | 100   | 83             | 17                    |

The SDC and OTH errors concern the application subsystems affected by injected faults, whereas the FITH errors show that injected faults can also lead to SEM subsystem failure. Figure 7-11 shows the ratio of FITH errors in each algorithm application test. FITH errors account for less than 10% of the detected errors in every test. These errors, however, cannot be ignored because the SEM IP no longer works once they occur: the fault injection is forced to stop, and the upset bits in the configuration memory cannot be recovered.



Figure 7-11 FITH percentages in detected errors of each algorithm test
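The SDC, OTH, and FITH percentages quoted above follow directly from the counts in Table 7-6. The short Python sketch below reproduces them; the function name `error_shares` is introduced here for illustration only.

```python
# Error counts per algorithm, copied from Table 7-6.
ERRORS = {
    "Histogram": {"SDC": 118, "OTH": 80, "FITH": 16},
    "Stretch": {"SDC": 166, "OTH": 102, "FITH": 21},
    "Sobel": {"SDC": 149, "OTH": 100, "FITH": 22},
}


def error_shares(counts):
    """Return the percentage of each error type among all detected errors."""
    total = sum(counts.values())
    return {kind: round(100.0 * n / total, 2) for kind, n in counts.items()}


for name, counts in ERRORS.items():
    print(name, error_shares(counts))
```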

The obtained errors involve both the SEM and application subsystems, even though they are two separate subsystems. The errors show that SEUs in the configuration memory can induce data errors and malfunctions in the Xilinx 16nm FinFET MPSoC. To analyze the obtained results more comprehensively, the FTA method is employed to discuss the detected errors quantitatively.

## 7.5 FTA on the Detected Errors

FTA is a quantitative analysis method. It can identify the corrupted components and assess their contributions to the failure of a system. The constructed fault trees indicate how critical each component or part is to the reliability of the MPSoC <sup>[187]</sup>. In particular, each of these designs contains two independent subsystems.

## 7.5.1 Events in Fault Trees

To build a fault tree, the various events and their probabilities must be determined according to the research object or system, such as the top event (T), basic events (X), and intermediate events (S). A fault tree is constructed for each of the three tests. System failure is the top event, the detected errors in different components are the basic events, and the events between them are intermediate events. The structure is the same for each fault tree; only some event names and probabilities differ. Table 7-8 lists the events in these fault trees. Each fault tree is composed of one top event, two intermediate events, and four basic events. The application subsystem failure and the customized IP failure are the two intermediate events. The SEM subsystem failure, the DMA IP OTH, the customized IP OTH, and the customized IP SDC are the four basic events <sup>[186]</sup>.

| Symbol | Failure event                 |
|--------|-------------------------------|
| T      | System failure                |
| S1     | Application subsystem failure |
| S11    | Customized IP failure         |
| X1     | SEM subsystem failure         |
| X2     | DMA IP OTH                    |
| X3     | Customized IP OTH             |
| X4     | Customized IP SDC             |

Table 7-8 The events in each fault tree

#### 7.5.2 Failure Rates of Events

The failure rates of the basic events in each fault tree are derived from the SEM-based fault injection results in 7.4. The soft error sensitivity (SES) is an important metric of how vulnerable a target is in fault injection <sup>[186]</sup>. It is calculated with (7-3). Table 7-9 summarizes the SES values of the events in the three fault trees.

$$SES = \frac{n_e}{N_i} \tag{7-3}$$

where SES is the soft error sensitivity, $n_e$ is the number of detected errors, and $N_i$ is the number of injected faults.
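As a worked instance of (7-3), consider the top event of the Histogram test. Table 7-6 reports 214 detected errors, and 10000 essential bits are injected in each campaign (see 7.6), which reproduces the corresponding entry of Table 7-9:

$$SES_{T,\,\mathrm{Histogram}} = \frac{214}{10000} = 0.0214$$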

| Symbol | Histogram | Stretch | Sobel  |
|--------|-----------|---------|--------|
| T      | 0.0214    | 0.0289  | 0.0271 |
| S1     | 0.0198    | 0.0268  | 0.0249 |
| S11    | 0.0133    | 0.0179  | 0.0166 |
| X1     | 0.0016    | 0.0021  | 0.0022 |
| X2     | 0.0065    | 0.0089  | 0.0083 |
| X3     | 0.0015    | 0.0013  | 0.0017 |
| X4     | 0.0118    | 0.0166  | 0.0149 |

Table 7-9 SES values in each fault tree

The constructed fault trees for the Histogram, Stretch, and Sobel FIs are presented in Figures 7-11, 7-12, and 7-13, respectively. Each fault tree mainly contains two parts: the SEM subsystem failure, which has no further branches, and the application subsystem failure, which has more branches. In each fault tree, the application subsystem branches contribute between about 92% and 93% of all failures. Even though the SDC branch makes a larger SES contribution, more attention should be paid to the OTH branches because the SEM IP cannot recover the errors in these branches.

In each fault tree, the OTH branches, comprising the customized IP OTH and the DMA IP OTH, contribute 40.40%, 38.06%, and 40.16% of the SES values in the application subsystem part for the Histogram, Stretch, and Sobel FIs, respectively. Further, the DMA IP OTH branches account for 81.25%, 87.25%, and 83.00% of the OTH SES values, respectively.
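The roll-up from basic events to the top event can be sketched in Python as below. The sketch rests on two assumptions that are not stated explicitly in the text but agree with the values in Table 7-9: every gate is an OR gate, and each detected error is counted under exactly one basic event, so the count of a gate is the sum of the counts of its children. The names `TREE`, `count`, and `ses_table` are introduced for illustration only.

```python
# Basic-event error counts per test, from Tables 7-6 and 7-7.
# X1: SEM subsystem failure (FITH), X2: DMA IP OTH,
# X3: customized IP OTH, X4: customized IP SDC.
BASIC_COUNTS = {
    "Histogram": {"X1": 16, "X2": 65, "X3": 15, "X4": 118},
    "Stretch": {"X1": 21, "X2": 89, "X3": 13, "X4": 166},
    "Sobel": {"X1": 22, "X2": 83, "X3": 17, "X4": 149},
}
INJECTED = 10000  # essential bits injected in each campaign

# Assumed structure: each intermediate or top event is an OR of its children.
TREE = {"T": ["X1", "S1"], "S1": ["X2", "S11"], "S11": ["X3", "X4"]}


def count(event, basics):
    """Number of detected errors rolled up to the given event."""
    if event in basics:
        return basics[event]
    return sum(count(child, basics) for child in TREE[event])


def ses_table(basics):
    """SES of every event, computed as in (7-3)."""
    events = ["T", "S1", "S11", "X1", "X2", "X3", "X4"]
    return {e: count(e, basics) / INJECTED for e in events}


def oth_share_of_application(basics):
    """Percentage of the application subsystem SES caused by OTH errors."""
    return round(100.0 * (basics["X2"] + basics["X3"]) / count("S1", basics), 2)


def dma_share_of_oth(basics):
    """Percentage of the OTH SES caused by the DMA IP."""
    return round(100.0 * basics["X2"] / (basics["X2"] + basics["X3"]), 2)


for name, basics in BASIC_COUNTS.items():
    print(name, ses_table(basics), oth_share_of_application(basics), dma_share_of_oth(basics))
```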

This study illustrates that SEM fault injection combined with FTA can quantitatively analyze the SEE impacts on the Xilinx 16nm FinFET MPSoC. It also indicates that more mitigation measures should be taken, especially for designs that use the DMA IP to transmit large amounts of data.



Figure 7-11 The built tree for Histogram FI



Figure 7-12 The built tree for Stretch FI



Figure 7-13 The built tree for Sobel FI

# 7.6 Summary

SEM-based fault injection is performed on the Xilinx 16nm FinFET MPSoC. SEEs in three algorithm application benchmarks, namely Histogram, Stretch, and Sobel, are examined one by one. For each benchmark, 10000 essential bits are extracted from the essential bit file as the fault injection script and then injected into the frames in CRAM. Three kinds of errors are detected in each fault injection campaign: silent data corruption, Outcome terminal hang, and FI terminal hang. According to the obtained results, fault trees are constructed for each test, and the contributions of the branches are analyzed.

# 8 DPR-based FI and FMEA on Xilinx 16nm FinFET MPSoC

DPR is a unique feature of SRAM-based all programmable MPSoCs. DPR-based fault injection on the Xilinx 16nm FinFET MPSoC is more convenient because it can inject into any bit or word without restriction. More importantly, extra resource utilization and the UART interface are not required. A DPR design is implemented in this chapter, in which two reconfiguration modules (RMs) implement image processing algorithms: Sobel and Gaussian edge detection. Fault injection is performed on the full and partial bitstreams, respectively. Finally, the FMEA method is employed to analyze the injection results quantitatively, and the severity of different components and soft errors is discussed.

## 8.1 DPR-based FI Overall Structure

To perform DPR-based fault injection on the Xilinx 16nm FinFET MPSoC, the DPR design must be implemented first. After that, the FI can be executed. Figure 8-1 describes the overall structure of DPR-based FI in this section. It is mainly composed of four parts: DPR design, FI script creation, FI execution, and FMEA.



Figure 8-1 The overall structure of DPR-based FI

- a) DPR design: the DPR design is carried out with the tool command language (Tcl) in Vivado 2019.2. The two RMs are implemented separately. At the same time, the full bitstream (FB), accompanied by essential bit files, and the partial bitstreams (PBs), which have no specific essential bit files, are generated.
- b) FI script creation: the FI script includes two parts: the word offsets and the bit offsets. For the FB, the FI script is generated from the EBD file. For the PBs, the FI script is produced from the PBs directly.
- c) FI execution: 100000 essential bits extracted from the EBD file of the FB are injected, and all '1' bits in the two PBs are injected.
- d) FMEA: based on the injection results, the FMEA method is applied to further analyze the severity of different components and errors.

More details of DPR-based FI are described in the following sections.

### 8.2 DPR Design

The DPR design is implemented on the Xilinx 16nm FinFET MPSoC. A 2-D grayscale image of Lena with $512 \times 512$ pixels is processed by the Sobel and Gaussian edge detection algorithms in the two RMs<sup>①</sup> <sup>[189-190]</sup>. The Sobel operators in the horizontal and vertical directions are presented in (8-1) and (8-2). The 2-D Gaussian operation is executed as two 1-D operations in the X and Y directions, respectively, with the same operator, as shown in (8-3) <sup>[189]</sup>.

$$\begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} \tag{8-1}$$

$$\begin{pmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{pmatrix} \tag{8-2}$$
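To illustrate how the operators in (8-1) and (8-2) act on an image, a minimal pure-Python sketch is given below. It is a software illustration only and is not the hardware filter IP of the RM. Several choices are assumptions made for the sketch: the kernels are applied without flipping, the two responses are combined as $|G_x| + |G_y|$, border pixels are left at 0, and the result is clipped to 8 bits.

```python
# Sobel operators from (8-1) and (8-2).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_kernel(image, kernel, row, col):
    """Weighted sum of the 3x3 neighbourhood centred on (row, col)."""
    return sum(
        kernel[i][j] * image[row - 1 + i][col - 1 + j]
        for i in range(3)
        for j in range(3)
    )


def sobel_magnitude(image):
    """Approximate gradient magnitude |Gx| + |Gy| for interior pixels.

    Border pixels stay at 0 and values are clipped to 255 (assumptions).
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = apply_kernel(image, SOBEL_X, r, c)
            gy = apply_kernel(image, SOBEL_Y, r, c)
            out[r][c] = min(255, abs(gx) + abs(gy))
    return out


# A 4x4 test image with a vertical edge between columns 1 and 2.
edge = [[0, 0, 255, 255] for _ in range(4)]
print(sobel_magnitude(edge))
```

On the vertical-edge image only the horizontal operator responds, which is the expected behaviour of (8-1).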





The procedure of DPR design is summarized in Figure 8-2 [127, 189]. Firstly, the filter IP, which only includes the module name and necessary ports, is generated as a black box. The module name and ports must be the same for both RMs. At the same time, design checkpoints (DCPs) for the two RMs are generated. Then, the Sobel configuration module is loaded as the first RM, the pblock region is drawn, and the design rule check (DRC) is executed. After that, the first configuration is implemented by running instructions such as opt\_design, place\_design, and route\_design. Figure 8-3 shows the implemented Sobel configuration. Next, the Static module, which does not include the Sobel implementation, is locked, and the Gaussian configuration is obtained by loading its DCP into the locked static design. Implementation is then executed with the same instructions as used for the Sobel configuration. Finally, after pr\_verify, the bitstream-related files are generated.

<sup>①</sup> The page at http://ivpcl.unm.edu/ivpclpages/Research/drastic/PRWebPage/PR\_Sub.php introduces DPR implementation on the Xilinx ZedBoard and served as a reference for the design.



Figure 8-3 Implemented Sobel configuration

The DPR design is carried out by entering commands in the Tcl Console of Vivado 2019.2. For example, Figure 8-4 is a snapshot of the Sobel configuration implementation. To generate the essential bit file in DPR, the command in (8-4) is entered in the Tcl Console. The DPR-based FI is achieved by modifying bits in the bitstream. Loading the fault-injected bitstream into CRAM requires disabling the CRC, which the command in (8-5) does. Note that both commands must be executed before generating the bitstreams.



Figure 8-4 Snapshot of the Sobel configuration implementation


set\_property bitstream.seu.essentialbits yes [current\_design] (8-4)

set\_property bitstream.general.crc disable [current\_design] (8-5)

Finally, the FB and PBs are created. Figure 8-5 displays a snapshot of the main generated bitstream-related files, and Table 8-1 gives more detail about those used in the software program design. Moreover, the key resource utilization of the two RM implementations is listed in Table 8-2.

| Date modified       | Type     | Size      |
|---------------------|----------|-----------|
| 2020/11/27 9:21 PM  | BIT file | 5,439 KB  |
| 2020/11/27 9:21 PM  | EBC file | 32,039 KB |
| 2020/11/27 9:21 PM  | EBD file | 32,039 KB |
| 2020/11/13 5:11 PM  | BIT file | 5,439 KB  |
| 2020/12/10 11:30 AM | BIT file | 5,439 KB  |
| 2020/11/13 5:11 PM  | BIN file | 466 KB    |
| 2020/11/13 5:13 PM  | BIN file | 466 KB    |
| 2020/11/13 5:14 PM  | BIN file | 5,439 KB  |

Figure 8-5 Snapshot of generated bitstream-related files in DPR

Table 8-1 Description of some generated bitstream-related files in DPR

| Name                | Description                | Corresponding module |
|---------------------|----------------------------|----------------------|
| config_sobel.bit    | Sobel full bitstream       | Static+Sobel         |
| sobel.bin           | Sobel partial bitstream    | Sobel                |
| config_gaussian.bit | Gaussian full bitstream    | Static+Gaussian      |
| gaussian.bin        | Gaussian partial bitstream | Gaussian             |
| blank.bit           | Blank full bitstream       | Static               |
| static.bin          | Blank full bitstream       | Static               |
| blank.ebd           | EBD file for static.bin    | Static               |

Table 8-2 Resource utilization in DPR

| Resource | Sobel        | Gaussian    |
|----------|--------------|-------------|
| CLB      | 256 (2.90%)  | 239 (2.71%) |
| LUT      | 1017 (1.44%) | 937 (1.33%) |
| LUTRAM   | 82 (0.28%)   | 82 (0.28%)  |
| CARRY8   | 14 (0.16%)   | 5 (0.06%)   |
| DSP      | /            | 5 (1.39%)   |
| BUFG     | 1 (0.51%)    | 1 (0.51%)   |

The fault injection is mainly executed on static.bin, sobel.bin, and gaussian.bin. Table 8-1 shows that they correspond to the Static module and the Sobel and Gaussian reconfiguration modules, respectively.

## 8.3 FI in DPR

The diagram of DPR-based FI is shown in Figure 8-6. In the Xilinx 16nm FinFET MPSoC, bitstreams are flexibly loaded into CRAM over the PCAP using the functions provided in xil\_library. A terminal for DPR-based FI, presented in Figure 8-7, is developed in Python and communicates with the MPSoC via UART. Clicking the different items in its FI-DPR area starts FI in the FB or a PB.



Figure 8-6 Diagram of DPR-based FI


Figure 8-7 The terminal of DPR-based FI

Since essential bit files are generated only for the Static module (static.bin), there are no specific essential bit files for the two RMs. The fault injection is therefore conducted in two stages, as described in Figure 8-8.



Figure 8-8 Two stages of fault injection in DPR

In the 1<sup>st</sup> stage, fault injection in the Static module is executed. The corresponding bitstream is the blank FB (static.bin). Because the two RMs share the Static module, this FI runs only once. In the 2<sup>nd</sup> stage, fault injection on the Sobel and Gaussian modules (sobel.bin and gaussian.bin) is performed separately. For the FI in the blank full bitstream (FI-FB) at the 1<sup>st</sup> stage, 100000 essential bits are extracted from the EBD file, which contains 609658 essential bits in total. For the FI in the Sobel and Gaussian PBs (FI-SPB and FI-GPB) at the 2<sup>nd</sup> stage, the '1' bits in sobel.bin and gaussian.bin are taken as the injection locations. The numbers of '1' bits in sobel.bin and gaussian.bin are 276089 and 273343, respectively. For both the FB and the PBs, the injection locations are converted into word and bit offsets in the FI script.
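The conversion of '1' bits in a partial bitstream into word and bit offsets, and the injection of one fault at such a location, can be sketched in Python as follows. This is an illustrative sketch, not the software used in this work. The 32-bit word size, the big-endian byte order, the convention that bit offset 0 is the least significant bit of a word, and the function names `set_bit_offsets` and `inject_fault` are all assumptions. Trailing bytes that do not fill a whole word are ignored.

```python
WORD_BYTES = 4  # assumed 32-bit configuration words


def set_bit_offsets(bitstream: bytes):
    """Yield (word_offset, bit_offset) for every '1' bit in the data.

    Assumptions: big-endian words, bit offset 0 is the least significant bit.
    """
    for word_offset in range(len(bitstream) // WORD_BYTES):
        start = word_offset * WORD_BYTES
        word = int.from_bytes(bitstream[start:start + WORD_BYTES], "big")
        for bit_offset in range(8 * WORD_BYTES):
            if (word >> bit_offset) & 1:
                yield word_offset, bit_offset


def inject_fault(bitstream: bytes, word_offset: int, bit_offset: int) -> bytes:
    """Return a copy of the bitstream with the addressed bit flipped."""
    data = bytearray(bitstream)
    start = word_offset * WORD_BYTES
    word = int.from_bytes(data[start:start + WORD_BYTES], "big")
    word ^= 1 << bit_offset  # flip exactly one bit
    data[start:start + WORD_BYTES] = word.to_bytes(WORD_BYTES, "big")
    return bytes(data)


# Tiny stand-in for a partial bitstream: two words with one '1' bit each.
sample = bytes([0x00, 0x00, 0x00, 0x01, 0x80, 0x00, 0x00, 0x00])
script = list(set_bit_offsets(sample))
print(script)
print(inject_fault(sample, *script[0]).hex())
```

Because the input is copied before the bit is flipped, the golden bitstream is left untouched and can be reused for the next injection.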

Figure 8-9 (a) and (b) display the key procedure flows of FI in the FB and PB, respectively. Fault injection in the FB is performed before the FB is loaded from DDR into CRAM. Likewise, for a PB, injection is conducted before the PB is loaded from DDR into CRAM. Since both RMs share the FB, either RM configuration can be executed during fault injection in the FB.



Figure 8-9 The key procedure flows of FI in FB and PB

## 8.4 Detected Errors in FB and PB injections

## 8.4.1 Detected Errors

In FI-FB, 100000 faults are injected, while 276089 and 273343 faults are injected in FI-SPB and FI-GPB, respectively. The observed results fall into four types: Normal, bitstream load failure (BLF), calculation-result error (CRE), and SH. The latter three types are errors induced by injected faults. They are described in Table 8-3.

Table 8-3 Obtained results in DPR-based FI

| Obtained result | Description                                                         |
|-----------------|---------------------------------------------------------------------|
| Normal          | Operation and results not impacted by injected faults               |
| BLF             | Bitstream can't be loaded to CRAM from DDR                          |
| CRE             | At least one calculation result is different from the expected ones |
| SH              | The program stops running, and no new messages are output           |

In total, 6822 errors are detected in FI-FB, FI-SPB, and FI-GPB. Table 8-4 lists the number of errors detected in each FI. Only 87 errors are detected in the 100000 injections for the FB, whereas 3905 and 2830 errors are detected in FI-SPB and FI-GPB, respectively.

Table 8-4 Number of errors in each FI

| Injection | Number of errors |
|-----------|------------------|
| FI-FB     | 87               |
| FI-SPB    | 3905             |
| FI-GPB    | 2830             |

Table 8-5 shows the number of each kind of error during the fault injections. All 87 errors in FI-FB are SH. In FI-SPB and FI-GPB, three kinds of errors are observed, and the majority are CRE: 97.00% and 95.87% of the errors in the two partial bitstream injections, respectively. This shows that an energetic particle strike is more likely to cause a result error in DPR than the other error types. A BLF error means that the bitstream fails to load because of the injected fault. CRE and BLF errors can be recovered by retransferring and reloading the bitstreams from the SD card. An SH error, however, requires repowering.

| Injection | BLF | CRE  | SH |
|-----------|-----|------|----|
| FI-FB     | /   | /    | 87 |
| FI-SPB    | 113 | 3788 | 4  |
| FI-GPB    | 113 | 2713 | 4  |

Table 8-5 Numbers of each kind of error in FIs

| Injection | Word   | Bit |
|-----------|--------|-----|
| FI-SPB    | 367032 | 4   |
|           | 367032 | 5   |
|           | 367034 | 5   |
|           | 367035 | 0   |
| FI-GPB    | 367032 | 4   |
|           | 367032 | 5   |
|           | 367034 | 5   |
|           | 367035 | 0   |

Table 8-6 Word and bit offsets of SH errors in partial bitstreams' fault injections

Each FI location is taken from the word and bit offset scripts described in 8.1, so the word and bit offsets of each detected error are easily obtained. A notable observation concerns the SH and BLF errors in the PB fault injections: the word and bit offsets of these errors are duplicated in FI-SPB and FI-GPB. For instance, as shown in Table 8-6, the word and bit offsets of the 4 SH errors are exactly the same in FI-SPB and FI-GPB.

## 8.4.2 SES of Errors

The SES values of the detected errors in each FI are calculated with (7-3). For FI-FB, the SES value of the detected errors is $8.70 \times 10^{-4}$. For FI-SPB and FI-GPB, three kinds of errors are detected, and the majority are CRE. Figure 8-10 shows the SES values of CRE errors in the two partial bitstreams' fault injections. They are $1.37 \times 10^{-2}$ and $9.92 \times 10^{-3}$ for FI-SPB and FI-GPB, respectively.



Figure 8-10 SES values of CRE errors in two partial bitstream fault injections

The Sobel and Gaussian partial bitstreams are each 476320 bytes, namely 3810560 bits. Although the numbers of detected errors differ between FI-SPB and FI-GPB, mainly because of the different numbers of CRE errors, the two FIs have some detected errors in common. Table 8-5 shows that the numbers of BLF and SH errors are the same for FI-SPB and FI-GPB. More importantly, their offsets are duplicated, as mentioned above. This is reasonable because the two RMs differ only in their algorithms; everything else is identical.



Figure 8-11 SES values of BLF and SH errors in two partial bitstreams' fault injections

To analyze the detected errors further, the entire 476320 bytes of the Sobel and Gaussian partial bitstreams are compared. 19263 bytes are found to differ between the two partial bitstreams. Within these 19263 bytes, the numbers of '1' bits are 34000 and 31254 for the Sobel and Gaussian bitstreams, respectively, a difference of 2746. This difference exactly equals the difference in the numbers of injected faults between FI-SPB and FI-GPB (276089 and 273343, a difference of 2746). After subtracting the '1' bits in these 19263 bytes, the remaining number of '1' bits is the same for the Sobel and Gaussian partial bitstreams, namely 242089. Hence, the identical errors detected in FI-SPB and FI-GPB can be attributed to FI in these 242089 bits. Figure 8-11 displays the SES values of BLF and SH errors in FI-SPB and FI-GPB. They are $4.67 \times 10^{-4}$ for BLF and $1.65 \times 10^{-5}$ for SH.
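The bookkeeping in the paragraph above can be checked with a few lines of Python. The sketch uses only the figures quoted in the text and in Table 8-5. Taking the 242089 common '1' bits as the number of injected faults for the BLF and SH SES values is an inference from the quoted results, not a statement made explicitly in the text. The names `common_ones` and `ses` are introduced for illustration.

```python
# Figures quoted in the text for the two partial bitstreams.
ONES_TOTAL = {"Sobel": 276089, "Gaussian": 273343}  # injected '1' bits
ONES_IN_DIFFERING_BYTES = {"Sobel": 34000, "Gaussian": 31254}
COMMON_ERRORS = {"BLF": 113, "SH": 4}  # identical in FI-SPB and FI-GPB (Table 8-5)


def common_ones(name):
    """Number of '1' bits located in bytes shared by both partial bitstreams."""
    return ONES_TOTAL[name] - ONES_IN_DIFFERING_BYTES[name]


def ses(detected, injected):
    """Soft error sensitivity as defined in (7-3)."""
    return detected / injected


shared = common_ones("Sobel")
print("common '1' bits:", shared, common_ones("Gaussian"))
for kind, detected in COMMON_ERRORS.items():
    print(kind, f"{ses(detected, shared):.2e}")
```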

## 8.5 FMEA on the FI results

Fault injection mimics SEUs occurring in the CRAM of the Xilinx 16nm FinFET MPSoC, and the detected soft errors are manifestations of system failures. The situation differs from the FTA in 7.5, where the SEM and application subsystems are independent. The DPR design is composed of the static and reconfiguration modules, and the two modules must be executed in a specific operation sequence. Thus, FMEA is better suited to analyzing the results in the current project. FMEA is an effective way to quantitatively assess the severity of component errors in a system. This chapter executes fault injections in the full and partial bitstreams, and three kinds of errors are detected. Moreover, these fault injections yield the SES values of each kind of error. These provide the basis for FMEA.

#### 8.5.1 FMEA Construction

To perform FMEA, the following information is necessary: the top event, the modules, the failure modes, the failure rates, the severities, and the risk priority number (RPN). For the Xilinx 16nm FinFET MPSoC, the top event is a system malfunction in the DPR design, and the modules are the static and reconfiguration modules. Based on the fault injection results, the modules, failure modes, failure rates, and processing methods used in the FMEA are presented in Table 8-7 <sup>[191]</sup>.

| Module        | Failure mode | Failure rate          | Processing methods  |
|---------------|--------------|-----------------------|---------------------|
| Static Module | SH           | 8.70×10<sup>-4</sup> | Repowering          |
| Sobel RM      | BLF          | 4.67×10<sup>-4</sup> | Reloading bitstream |
|               | CRE          | 1.37×10<sup>-2</sup> | Reloading bitstream |
|               | SH           | 1.65×10<sup>-5</sup> | Repowering          |
| Gaussian RM   | BLF          | 4.67×10<sup>-4</sup> | Reloading bitstream |
|               | CRE          | 9.92×10<sup>-3</sup> | Reloading bitstream |
|               | SH           | 1.65×10<sup>-5</sup> | Repowering          |

Table 8-7 Parameters in the FMEA.

In the DPR design, the modules are the Static module and the Sobel and Gaussian RMs. The only failure mode of the Static module is SH, with a failure rate of $8.70 \times 10^{-4}$; it is resolved by repowering the device. The Sobel and Gaussian RMs share the same three failure modes: BLF, CRE, and SH. The failure rates of BLF and SH are identical for the two RMs, namely $4.67 \times 10^{-4}$ and $1.65 \times 10^{-5}$, respectively. The BLF failure mode is handled by reloading the bitstream, whereas the SH failure mode requires repowering. The CRE failure rates are $1.37 \times 10^{-2}$ and $9.92 \times 10^{-3}$ for the Sobel and Gaussian RMs, respectively, and the CRE failure mode is handled by reloading the bitstream.

#### 8.5.2 System Risk Assessment

The RPN of a failure mode represents its impact on system outcomes in a system-level risk evaluation. The larger the RPN value, the greater the influence of the failure mode on system vulnerability and the higher its severity. A larger RPN also indicates that the corresponding component or failure mode should be addressed with higher priority <sup>[192-193]</sup>. Since Xilinx uses ultra-low alpha material, the average alpha flux from package impurities is estimated at about 0.001 cm<sup>-2</sup>·h<sup>-1</sup> <sup>[159]</sup>, and the upset rate from package impurities is approximately 0.1 FIT/Mb for the 16nm FinFET CRAM <sup>[88]</sup>. The static and reconfiguration modules contain 44549344 and 3810560 bits, respectively. Thus, the failure rates of the static and reconfiguration modules caused by package alpha impurities are about 4.45 and 0.38 FIT for the DPR design <sup>[191]</sup>.
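The arithmetic behind the two module failure rates can be reproduced with the short sketch below. It assumes that 1 Mb is taken as $10^6$ bits, which is the convention that matches the quoted 4.45 and 0.38 FIT; the function name `alpha_fit` is hypothetical.

```python
FIT_PER_MBIT = 0.1  # upset rate from package impurities quoted in the text, in FIT/Mb


def alpha_fit(n_bits: int, fit_per_mbit: float = FIT_PER_MBIT) -> float:
    """Failure rate in FIT of a configuration memory region of n_bits bits."""
    # Assumption: 1 Mb = 10**6 bits (this reproduces the quoted values)
    return n_bits / 1e6 * fit_per_mbit


static_fit = alpha_fit(44_549_344)  # full bitstream, Static module
rm_fit = alpha_fit(3_810_560)       # partial bitstream, reconfiguration module
print(f"{static_fit:.2f} FIT, {rm_fit:.2f} FIT")
```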

For a DPR design, the loaded configuration memory is composed of two components: the full bitstream corresponding to the Static module, and the partial bitstream corresponding to the RM. At the same time, three kinds of failure modes exist. In the FMEA, they are represented as $\{C(1), C(2)\} = \{\text{Static module}, \text{Reconfiguration module}\}$ and $\{FM(1), FM(2), FM(3)\} = \{SH, BLF, CRE\}$. The RPN of the *i*<sup>th</sup> component (*RPN\_C(i)*) and the RPN of the *k*<sup>th</sup> failure mode (*RPN\_FM(k)*) are calculated using (8-6) and (8-7), respectively <sup>[191]</sup>.

$$RPN\_C(i) = FR\_C(i) \times \sum_{k=1}^{z} P(i, FM(k)) \times S\_FM(k), \quad 1 \le k \le z \tag{8-6}$$

$$RPN\_FM(k) = S\_FM(k) \times \sum_{i=1}^{n} FR\_C(i) \times P(i, FM(k)), \quad 1 \le i \le n \tag{8-7}$$

where $RPN\_C(i)$--RPN of the component; $RPN\_FM(k)$--RPN of the failure mode; $FR\_C(i)$--SEU rate of the component; $P(i, FM(k))$--the probability of $FM(k)$ if a failure occurs in the component; $S\_FM(k)$--severity level of the failure mode.

For the current DPR design,  $\{FR\_C(1), FR\_C(2)\}$  value is  $\{4.45 \ FIT, 0.38 \ FIT\}$ . Considering the impact and processing methods on these failure modes, as shown in Table 8-8,  $\{S\_FM(1), S\_FM(2), S\_FM(3)\}$  value is  $\{10, 6, 4\}$ .

Table 8-8 Severity level consideration in DPR-based FI

| Severity level | Failure mode | Description                                                              |
|----------------|--------------|--------------------------------------------------------------------------|
| 10             | SH           | Program execution cannot continue, and it requires manual recovery       |
| 6              | BLF          | The test algorithm cannot be executed, and bitstream reloading is required |
| 4              | CRE          | The calculation result is incorrect, and bitstream reloading is required |

Thus, the detailed values of $RPN\_C(i)$ and $RPN\_FM(k)$ are calculated as follows.

$RPN\_C(1) = 4.45 \times (8.70 \times 10^{-4} \times 10 + 0 \times 6 + 0 \times 4) = 3.87 \times 10^{-2}$ FIT

$RPN\_C(2) = 0.38 \times (1.65 \times 10^{-5} \times 10 + 4.67 \times 10^{-4} \times 6 + 1.18 \times 10^{-2} \times 4) = 1.91 \times 10^{-2}$ FIT

($1.18 \times 10^{-2}$ is the average CRE rate of SBPI and GBPI; the other failure rates are also averages.)

$RPN\_FM(1) = 10 \times (8.70 \times 10^{-4} \times 4.45 + 1.65 \times 10^{-5} \times 0.38) = 3.88 \times 10^{-2}$ FIT

$RPN\_FM(2) = 6 \times (0 \times 4.45 + 4.67 \times 10^{-4} \times 0.38) = 1.06 \times 10^{-3}$ FIT

$RPN\_FM(3) = 4 \times (0 \times 4.45 + 1.18 \times 10^{-2} \times 0.38) = 1.79 \times 10^{-2}$ FIT

RPN\_C(1) is larger than RPN\_C(2), which demonstrates that the Static module has a greater impact on system failure. RPN\_FM(1) is larger than RPN\_FM(3) and RPN\_FM(2), which signifies that system halt contributes most to system failure. These results show that the Static module and the system halt error must be prioritized when mitigating SEE in the DPR design.
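As a cross-check, (8-6) and (8-7) can be evaluated numerically with the values quoted in this section. The sketch below is illustrative only; the dictionary layout and the names `rpn_component` and `rpn_failure_mode` are hypothetical, and the reconfiguration module uses the averaged CRE rate of $1.18 \times 10^{-2}$ as in the text.

```python
# Component failure rates FR_C(i) in FIT
FR = {"static": 4.45, "rm": 0.38}

# Severity levels S_FM(k) from Table 8-8
S = {"SH": 10, "BLF": 6, "CRE": 4}

# P(i, FM(k)): SES values per component and failure mode
P = {
    "static": {"SH": 8.70e-4, "BLF": 0.0, "CRE": 0.0},
    "rm": {"SH": 1.65e-5, "BLF": 4.67e-4, "CRE": 1.18e-2},
}


def rpn_component(c: str) -> float:
    """Equation (8-6): RPN of component c, in FIT."""
    return FR[c] * sum(P[c][m] * S[m] for m in S)


def rpn_failure_mode(m: str) -> float:
    """Equation (8-7): RPN of failure mode m, in FIT."""
    return S[m] * sum(FR[c] * P[c][m] for c in FR)


for c in FR:
    print(c, f"{rpn_component(c):.2e}")
for m in S:
    print(m, f"{rpn_failure_mode(m):.2e}")
```

Ranking the printed values gives the same ordering as discussed above: the Static module ahead of the reconfiguration module, and SH ahead of CRE and BLF.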

In the current research, faults are injected into every '1' bit of the reconfiguration module's configuration memory. If the essential bits of these RMs can be identified precisely, partial bitstream fault injection can be sped up significantly in the future.

## 8.6 Summary

The Xilinx 16nm FinFET MPSoC embeds an SRAM-based FPGA in the PL part, making DPR-based fault injection feasible on the device. The Sobel and Gaussian reconfiguration modules are implemented in DPR. Fault injection into the Static module's full bitstream and the RMs' partial bitstreams is carried out relying on DPR. For the full bitstream, fault injection is executed on 100000 bits, whose locations are extracted from the essential bit file. For the partial bitstreams, the locations of '1' bits are injected. Only the system halt error is detected in the full bitstream fault injection, whereas three kinds of errors, namely bitstream load failure, calculation result error, and system halt, are observed in the partial bitstream fault injections. Based on the obtained errors, the failure modes and effects analysis method is adopted to assess system failure, and the severities of modules and failures are evaluated quantitatively. The RPN values show that the Static module and the system halt error have the greatest impact on system failure.

# 9 DR-based FI on DNN in Xilinx 16nm FinFET MPSoC

Nowadays, the advanced nanoscale COTS MPSoC, integrating the PS and PL parts, is considered an excellent platform for machine learning because of its architecture and outstanding features <sup>[194]</sup>. However, SEUs caused by energetic particles striking the configuration memory can affect the performance of a neural network implemented on the MPSoC. The influence of SEU must be considered especially for scaled MPSoCs. To investigate the impact of SEE on DNNs implemented on the Xilinx 16nm FinFET MPSoC, an open-source DNN is implemented on the device. Fault injection based on DR is performed, and the impacts of SEU on the DNN implementation are discussed.

## 9.1 DR-based FI on DNN Realization Diagram

DNNs have developed rapidly in recent years <sup>[195]</sup>. Owing to low power consumption, high integration, and other merits, FPGA-based DNN accelerators have appeared steadily <sup>[196]</sup>. In particular, since vendors released advanced nanoscale COTS MPSoCs that embed an ARM processor and an FPGA in one chip, machine learning studies on these MPSoCs have progressed quickly <sup>[197-204]</sup>.

K. Vipin developed an open-source DNN implementation named ZyNet and examined its performance on a 28nm SoC <sup>[205-206]</sup>. ZyNet is a Python package for implementing DNNs on SoCs, and it supports both pre-trained and on-board trained networks. In this chapter, ZyNet is implemented on the Xilinx 16nm FinFET MPSoC, and the ZyNet DNN processes the MNIST dataset to identify handwritten digits from 0 to 9 <sup>[207]</sup>. In addition, DR-based fault injection in CRAM is performed to evaluate the impact of SEU on the DNN. Finally, the influence of SEU on the DNN implementation is analyzed according to the obtained results. In this work, five sets of DNNs are examined. Although the numbers of neurons of these networks differ, the processing and study flows are the same. The entire realization diagram of each DNN study is presented in Figure 9-1.



Figure 9-1 The entire realization diagram for DR-based FI on DNN

The DR-based FI on DNN is mainly composed of six stages: network pre-train, ZyNet RTL generation, block design, FI script creation, FI in CRAM, and results analysis.

- a) Network pre-train: ZyNet is pre-trained in Python, and the weight and bias values are generated in this stage. These values are then used in RTL generation.
- b) ZyNet RTL generation: ZyNet Verilog RTL code is produced, which can be imported directly in the block design stage.
- c) Block design: a block design is created, and the necessary IPs, such as the MPSoC IP, the DMA, and others, are added to it. After synthesis and implementation, the bitstream and the essential bit files are generated. They are used to create the FI scripts.
- d) FI script creation: word and bit offsets are extracted from the essential bit files to form the FI script, which lists the intended injection locations.
- e) FI in CRAM: before the bitstream is loaded from DDR to CRAM, the DR fault injection is achieved by directly flipping the information in the target bit.
- f) Results analysis: the soft errors are observed and analyzed, and the positive and negative impacts of SEU on the DNN implementation are discussed.

Details of the DNN fault injection are described in the following sections.

## 9.2 ZyNet DNN Implementation on Ultrascale+ MPSoC

#### 9.2.1 Tested DNN

The MNIST dataset is used for identifying 28×28 pixel grayscale images of handwritten digits. It contains 50000 images as training data, 10000 images as validation data, and another 10000 images as test data. For a DNN processing MNIST, the input and output layers have 784 and 10 neurons, respectively. The numbers of neurons in the hidden layers are variable.

Figure 9-2 shows a schematic of a fully connected NN, whose architecture is similar to that of ZyNet. Each neuron's output is computed from the outputs of the previous layer through a non-linear activation function, which can be the sigmoid, the Rectified Linear Unit (ReLU), or others.



Figure 9-2 Schematic of a fully connected NN

In this study, ZyNet is implemented as a seven-layer DNN structure, and a total of five different sets of ZyNet DNNs are produced. Figure 9-3 shows a snapshot of the golden one, in which the numbers of neurons in the five hidden layers are 30, 30, 30, 30, and 10. As can be seen, the input layer is of the flatten type, while the five hidden layers and the output layer are of the dense type. The activation functions of the hidden layers are sigmoid functions, whose expression is shown in (9-1) and whose curve is displayed in Figure 9-4. The output layer neurons are processed with a hardmax module to obtain the maximum output value <sup>[205]</sup>. The data type of the network is 8-bit fixed point, in which 4 bits represent the integer portion of the weight value.

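```
# Assumes the ZyNet package is installed and importable as shown
from zynet import zynet

model = zynet.model()
model.add(zynet.layer("flatten",784))         # input layer: 28x28 image flattened to 784 values
model.add(zynet.layer("Dense",30,"sigmoid"))  # hidden layer H(1)
model.add(zynet.layer("Dense",30,"sigmoid"))  # hidden layer H(2)
model.add(zynet.layer("Dense",30,"sigmoid"))  # hidden layer H(3)
model.add(zynet.layer("Dense",30,"sigmoid"))  # hidden layer H(4)
model.add(zynet.layer("Dense",10,"sigmoid"))  # hidden layer H(5)
model.add(zynet.layer("Dense",10,"sigmoid"))  # output layer: one neuron per digit
```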





Figure 9-4 The figure of the sigmoid function
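For reference, the sigmoid activation is conventionally defined as below. This is the textbook form and is assumed, not confirmed, to be the expression referred to as (9-1).

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

Its output lies strictly between 0 and 1, which is consistent with the S-shaped curve that Figure 9-4 is described as showing.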

For the five different sets of DNNs, the numbers of neurons in the input and output layers are the same, while those of the hidden layers vary. The numbers of neurons in each hidden layer of the five DNNs are listed in Table 9-1. In the table, 30(G) stands for the golden DNN, which is the benchmark for the other networks; 31(i) means that one neuron is added to the i<sup>th</sup> hidden layer, and H(i) denotes the i<sup>th</sup> hidden layer.

Table 9-1 Neuron numbers of hidden layers in five DNNs

| Network | H(1) | H(2) | H(3) | H(4) | H(5) |
|---------|------|------|------|------|------|
| 30(G)   | 30   | 30   | 30   | 30   | 10   |
| 31(1)   | 31   | 30   | 30   | 30   | 10   |
| 31(2)   | 30   | 31   | 30   | 30   | 10   |
| 31(3)   | 30   | 30   | 31   | 30   | 10   |
| 31(4)   | 30   | 30   | 30   | 31   | 10   |

#### 9.2.2 DNN Training and Implementation on MPSoC

Although the DNNs have different numbers of neurons, their research flows are identical. First, the five DNNs are trained in Python one by one. This process is performed entirely on the computer and does not involve the Xilinx 16nm FinFET MPSoC board. Each DNN is trained for 30 epochs. The mini-batch size and the learning rate are the same for each DNN, namely 10 and 0.1, respectively. For each DNN, the 10000 validation images are used to check the accuracy of the trained network. Figure 9-5 displays the identification rates of the trained DNNs. They are 0.9621, 0.9607, 0.9569, 0.9614, and 0.9635 for 30(G), 31(1), 31(2), 31(3), and 31(4), respectively.



Figure 9-5 Identification rate of trained DNNs

After each training, the generated weight and bias values are transferred to the ZyNet RTL code. The layout and structure of the block design, shown in Figure 9-6, apply to all five DNNs. The DMA IP connects to the ZyNet block in the block design, and only the read channel is active. The resource utilization of the five DNNs is listed in Table 9-2. The utilized resources mainly involve LUT, FF, BRAM, and BUFG.



Figure 9-6 The layout of the block design of 30(G) DNN

For each implemented DNN, the 10000 test images are checked, and the identification rate is obtained without fault injection. The on-board results are shown in Figure 9-7. They are 0.9555, 0.9568, 0.9587, 0.9604, and 0.9616 for 30(G), 31(1), 31(2), 31(3), and 31(4), respectively. For each DNN implemented on the MPSoC, the weight and bias values used correspond to those obtained from the software training.

Table 9-2 The resource utilization of five DNNs

| Network | LUT            | LUTRAM      | FF            | BRAM           | BUFG      |
|---------|----------------|-------------|---------------|----------------|-----------|
| 30(G)   | 16576 (23.49%) | 257 (0.89%) | 9995 (7.08%)  | 42.50 (19.68%) | 1 (0.51%) |
| 31(1)   | 16223 (22.99%) | 257 (0.89%) | 9998 (7.08%)  | 45.50 (21.06%) | 1 (0.51%) |
| 31(2)   | 15738 (22.30%) | 257 (0.89%) | 9957 (7.06%)  | 47.50 (21.99%) | 1 (0.51%) |
| 31(3)   | 16075 (22.78%) | 257 (0.89%) | 9975 (7.07%)  | 45 (20.83%)    | 1 (0.51%) |
| 31(4)   | 16475 (23.35%) | 257 (0.89%) | 10025 (7.10%) | 43.50 (20.14%) | 1 (0.51%) |

Figure 9-7 Identification rates of DNNs on the MPSoC

Compared with the identification rates of the software implementations, the values of the DNNs implemented on the MPSoC differ slightly but remain close. This indicates that the trained DNNs are credible. Additionally, for the on-board DNNs, the identification rate increases gradually as the added neuron moves closer to the output layer. This trend does not appear in the software-trained networks.

After the performance of the trained DNN on the MPSoC is verified, fault injection on it can be launched.

## 9.3 FI on DNN

The DNN maps to different resources on the FPGA, such as LUT, RAM, FF, and others, all of which are sensitive to SEE. Meanwhile, their configuration information is kept in CRAM, which is also vulnerable. Suppose an energetic particle hits the FPGA and induces an SEU. It can change the trained weight or bias values, or it may change the routing between cells. Occasionally, these changes affect the DNN performance. Fault injection allows this influence on the DNN implementation to be investigated directly.

Fault injection on the ZyNet DNN relies on dynamic reconfiguration, and the procedure flow for DR fault injection is similar to that based on DPR. Figure 9-8 shows the layout of the DR-based FI, which is similar to the layout of DPR. The FI terminal communicates with the test board through UART, and the terminal of Figure 8-7 is also used in the current study.



Figure 9-8 Layout of the FI on DNN

The MPSoC boots in SD card mode. Figure 9-9 is a snapshot of the files stored on the SD card. Of the two files, BOOT.bin is required for SD card boot mode, and cnn.bin is the original bitstream of each DNN. Each DNN has its own pair of BOOT.bin and cnn.bin, so switching to a different DNN is achieved by replacing BOOT.bin and the corresponding cnn.bin.



Figure 9-9 Files stored in SD card

The FI is executed on the DNNs one by one. As mentioned above, a certain number of essential bits are extracted from each DNN's EBD file to create an FI script. Table 9-3 shows the total number of essential bits of each DNN. In this study, 50000 bits are extracted for each DNN.

| Network | Number of essential bits |
|---------|--------------------------|
| 30(G)   | 5084661                  |
| 31(1)   | 5097678                  |
| 31(2)   | 4949596                  |
| 31(3)   | 4983812                  |
| 31(4)   | 5093162                  |

Table 9-3 The essential bit length of each DNN
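The creation of an FI script from an essential bit file can be sketched as follows. The sketch makes several assumptions that are not stated in the text: the file is treated as ASCII lines of 32 characters ('0' or '1'), one configuration word per line, with '1' marking an essential bit; any other line is treated as a header and skipped; the leftmost character is taken as bit 31; and the 50000 locations are drawn at random. The function names are hypothetical.

```python
import random


def essential_locations(lines):
    """Yield (word_offset, bit_offset) for every essential bit.

    Assumed layout: each data line holds 32 characters of '0'/'1',
    one configuration word per line; other lines are headers.
    """
    word = 0
    for line in lines:
        s = line.strip()
        if len(s) != 32 or set(s) - {"0", "1"}:
            continue  # skip header or malformed lines
        for pos, ch in enumerate(s):
            if ch == "1":
                yield word, 31 - pos  # assumption: leftmost character is bit 31
        word += 1


def make_fi_script(lines, n_locations, seed=0):
    """Select up to n_locations essential bits as injection targets."""
    locations = list(essential_locations(lines))
    rng = random.Random(seed)  # fixed seed keeps the script reproducible
    chosen = rng.sample(locations, min(n_locations, len(locations)))
    return sorted(chosen)


# Tiny hypothetical file content used only to exercise the functions
demo = ["Header line", "0" * 31 + "1", "1" + "0" * 31, "0" * 32]
print(make_fi_script(demo, 50_000))
```

A fixed seed is used so that the same injection list can be regenerated when a campaign has to be repeated.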

Figure 9-10 shows the key fault injection flow for each DNN. First, cnn.bin is loaded into DDR from the SD card. Then, one fault is injected at the target location with an 'XOR' operation. After that, the fault-injected bitstream is loaded into CRAM over the PCAP, and the program is executed. For each fault injection, the 10000 test images are checked, and the total misidentification number (MN) among them is reported at the end of each examination. The results are recorded as the test runs. Before a new fault is injected, the current fault is removed by applying the 'XOR' operation again. Once all 50000 injections for a network have been tested, FI on the next DNN starts.



Figure 9-10 Fault injection flow on DNN
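The inject, run, and recover loop of this flow can be illustrated with the host-side sketch below. It is not the on-board implementation: the bitstream is modelled as a `bytearray`, the byte order inside a 32-bit word is assumed to be big-endian, and `run_test` is a hypothetical callback standing in for loading the bitstream over the PCAP and returning the MN.

```python
def flip_bit(bitstream: bytearray, word: int, bit: int) -> None:
    """XOR one bit in place; applying it twice restores the original value."""
    # Assumption: 32-bit words stored big-endian, bit 0 is the least significant bit
    byte_index = word * 4 + (3 - bit // 8)
    bitstream[byte_index] ^= 1 << (bit % 8)


def run_campaign(bitstream: bytearray, locations, run_test):
    """Inject one fault at a time and collect (word, bit, result) records."""
    records = []
    for word, bit in locations:
        flip_bit(bitstream, word, bit)          # inject the fault
        try:
            records.append((word, bit, run_test(bytes(bitstream))))
        finally:
            flip_bit(bitstream, word, bit)      # recover before the next injection
    return records


def count_ones(data: bytes) -> int:
    """Stand-in for the real test: counts '1' bits instead of returning an MN."""
    return sum(bin(b).count("1") for b in data)


golden = bytearray(8)  # hypothetical 2-word bitstream of zeros
print(run_campaign(golden, [(0, 31), (1, 0)], count_ones))
```

Because the recovery step is the same XOR as the injection, the golden bitstream in memory is unchanged after every iteration, so each run contains exactly one fault.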

## 9.4 DNN FI Results

Four types of results are detected during each DNN's fault injection: identification accuracy changed (IAC), DMA failed at initialization (DFI), SH, and normal. The first three are errors. Table 9-4 describes each result in more detail. The detected errors demonstrate that an upset in CRAM can lead to multiple unexpected results in the DNN.

Table 9-4 Detail of the detected results

| Obtained result | Description                                      |
|-----------------|--------------------------------------------------|
| IAC             | The MN is different from the original            |
| DFI             | DMA fails at the initialization stage            |
| SH              | The program stops running, and no fresh messages |
| Normal          | The MN is the same as the original               |

It should be noted that if the MN differs from the original during any fault injection, whether it is larger or smaller, the result is counted as an IAC. Table 9-5 shows the original MN of each DNN. These values correspond to the on-board identification rates shown in Figure 9-7.

Table 9-5 The original MN of each DNN on MPSoC

| Network | MN  |
|---------|-----|
| 30(G)   | 445 |
| 31(1)   | 432 |
| 31(2)   | 413 |
| 31(3)   | 396 |
| 31(4)   | 384 |
The numbers of errors detected during the 50000 fault injections of each DNN are listed in Table 9-6. IAC errors account for the large majority. Compared with the DFI errors, the SH errors are approximately one order of magnitude more numerous.

Table 9-6 Detected error numbers during each DNN's fault injection

| Network | Total | IAC  | SH  | DFI |
|---------|-------|------|-----|-----|
| 30(G)   | 5500  | 5239 | 246 | 15  |
| 31(1)   | 4620  | 4385 | 215 | 20  |
| 31(2)   | 3971  | 3768 | 183 | 20  |
| 31(3)   | 4768  | 4529 | 223 | 16  |
| 31(4)   | 4830  | 4502 | 304 | 24  |

To analyze these errors quantitatively, the SES value of each kind of error is calculated with (7-3). The SH and DFI errors are negative effects on the DNNs when SEE occurs in CRAM. They are recovered by repowering the board and reloading the bitstream, respectively. The SES values for each DNN are presented in Figure 9-11. For SH errors, the SES values are $4.92 \times 10^{-3}$, $4.30 \times 10^{-3}$, $3.66 \times 10^{-3}$, $4.46 \times 10^{-3}$, and $6.08 \times 10^{-3}$ for the five DNNs. The maximum and minimum come from the 31(4) and 31(2) networks, respectively. For DFI errors, the SES values are $3.00 \times 10^{-4}$, $4.00 \times 10^{-4}$, $4.00 \times 10^{-4}$, $3.20 \times 10^{-4}$, and $4.80 \times 10^{-4}$ for the five DNNs. As with the SH errors, the maximum SES value comes from the 31(4) network. It can be speculated that the closer the added neuron is to the output layer, the higher the probability that the network suffers SH and DFI errors.



Figure 9-11 SES values for SH and DFI
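The SES values quoted above can be reproduced from the counts in Table 9-6. The sketch assumes that (7-3), which is defined in an earlier chapter, amounts to the ratio of detected errors to injected faults; this assumption matches all ten quoted values. The names used are hypothetical.

```python
N_INJECTED = 50_000  # faults injected per DNN

# SH and DFI counts per network, taken from Table 9-6
ERRORS = {
    "30(G)": {"SH": 246, "DFI": 15},
    "31(1)": {"SH": 215, "DFI": 20},
    "31(2)": {"SH": 183, "DFI": 20},
    "31(3)": {"SH": 223, "DFI": 16},
    "31(4)": {"SH": 304, "DFI": 24},
}


def ses(n_errors: int, n_injected: int = N_INJECTED) -> float:
    """Assumed form of (7-3): detected errors divided by injected faults."""
    return n_errors / n_injected


for network, counts in ERRORS.items():
    print(network, f"SH {ses(counts['SH']):.2e}", f"DFI {ses(counts['DFI']):.2e}")
```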

The purpose of training a DNN is to obtain higher identification accuracy, that is, a smaller MN. Although all changed MNs are counted as IAC in Table 9-6, it must be clarified that a part of the IAC is positive, because the corresponding MNs are smaller than the original ones. Hence, it is necessary to discuss the two cases separately. An MN smaller than the original can be considered an enhancement of the identification accuracy (EIA), and an MN larger than the original can be regarded as a degradation of the identification accuracy (DIA). For the DNN, the EIA can be deemed a positive impact of SEE in CRAM because it reduces the MN of the network. In contrast, the DIA is a negative influence of SEE in CRAM. Table 9-7 lists the numbers of EIA and DIA within the IAC for each DNN. The ratios of EIA in the IAC are about 0.26, 0.33, 0.30, 0.31, and 0.24 for the five DNNs, respectively.

Table 9-7 Numbers of EIA and DIA

| Network | EIA number | DIA number |
|---------|------------|------------|
| 30(G)   | 1359       | 3880       |
| 31(1)   | 1455       | 2930       |
| 31(2)   | 1114       | 2654       |
| 31(3)   | 1402       | 3127       |
| 31(4)   | 1084       | 3418       |
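The result categories used in this section can be expressed as a small classification routine. This is an illustrative sketch only: the flags `halted` and `dma_ok` are hypothetical stand-ins for how the FI terminal would recognise SH and DFI, and IAC is obtained as the sum of EIA and DIA.

```python
from collections import Counter


def classify(mn, original_mn, halted=False, dma_ok=True):
    """Map one fault injection outcome to SH, DFI, Normal, EIA, or DIA."""
    if halted:
        return "SH"        # program stopped, no fresh messages
    if not dma_ok:
        return "DFI"       # DMA failed at initialization
    if mn == original_mn:
        return "Normal"
    # Any changed MN is an IAC; split it by direction
    return "EIA" if mn < original_mn else "DIA"


def summarize(outcomes):
    """Count the categories and return (counts, share of EIA within IAC)."""
    counts = Counter(outcomes)
    iac = counts["EIA"] + counts["DIA"]
    counts["IAC"] = iac
    return counts, (counts["EIA"] / iac if iac else 0.0)


# Hypothetical outcomes for a network whose original MN is 445
outcomes = [
    classify(440, 445),
    classify(450, 445),
    classify(445, 445),
    classify(None, 445, halted=True),
    classify(None, 445, dma_ok=False),
]
print(summarize(outcomes))
```

With the real counts of the 30(G) network, the EIA share is 1359 / (1359 + 3880), which rounds to the 0.26 quoted above.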

#### 9.4.1 EIA on DNN

Traditionally, SEEs are considered undesirable outcomes of energetic particles striking electronic systems, and researchers expend considerable effort to mitigate them. Nevertheless, the EIA observed in this study can be viewed differently.

The weights and biases are two important parts of the neurons in the network. They are obtained in the software training stage and mapped to FFs or LUTs during synthesis and implementation of the block design in Vivado. In this study, the weights and biases are 8-bit fixed-point values with a 4-bit integer portion. Figure 9-12 shows examples of an SEU occurring in the fraction portion of a weight value: (a) shows an upset in the first fraction bit, and (b) shows an upset in the second fraction bit. The output of each neuron is computed from the outputs of the previous layer combined with the weight and bias values. If an SEE changes a weight value slightly and this small change prevents a misidentification that would otherwise occur, the performance of the DNN is enhanced.

| Panel                                | Sign | Integer portion | Fraction portion |
|--------------------------------------|------|-----------------|------------------|
| (a) Upset in the first fraction bit  | 0    | 0 0 0 1         | × 1 1            |
| (b) Upset in the second fraction bit | 0    | 0 0 0 1         | 0 × 1            |

Figure 9-12 SEU in fraction portion
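The effect of a single upset on such a weight can be illustrated numerically. The sketch assumes a two's-complement layout with 1 sign bit, 4 integer bits, and 3 fraction bits, as suggested by Figure 9-12; the actual encoding in the generated RTL may differ, and the example weight is hypothetical.

```python
FRAC_BITS = 3  # assumed: 1 sign bit + 4 integer bits + 3 fraction bits


def q_decode(raw: int, frac_bits: int = FRAC_BITS) -> float:
    """Decode an 8-bit two's-complement fixed-point value."""
    raw &= 0xFF
    if raw & 0x80:          # sign bit set: negative value
        raw -= 0x100
    return raw / (1 << frac_bits)


def flip_bit(raw: int, bit: int) -> int:
    """Return raw with one bit inverted (bit 0 is the least significant)."""
    return (raw ^ (1 << bit)) & 0xFF


weight = 0b0_0001_011            # hypothetical weight: 1.375
first_fraction_bit = FRAC_BITS - 1   # carries a value of 0.5
second_fraction_bit = FRAC_BITS - 2  # carries a value of 0.25

print(q_decode(weight))
print(q_decode(flip_bit(weight, first_fraction_bit)))
print(q_decode(flip_bit(weight, second_fraction_bit)))
```

Under this assumption, an upset in the first fraction bit shifts the weight by 0.5 and an upset in the second fraction bit shifts it by 0.25, with the sign of the shift determined by the original bit value.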

In the current study, 50000 faults are injected in CRAM for each DNN, and the numbers of EIA observed are listed in Table 9-7. As mentioned above, EIA accounts for roughly one quarter to one third of all IAC for the five DNNs. Furthermore, the EIA sensitivity (EIAS) of each DNN is calculated with (9-2). Figure 9-13 displays the EIAS of each DNN.

$$EIAS = \frac{n_{eia}}{N_i} \tag{9-2}$$

where EIAS--EIA sensitivity;  $n_{eia}$ --number of EIA;  $N_i$ --the number of injected faults.



Figure 9-13 EIAS of each DNN

The EIAS values are about $2.72 \times 10^{-2}$, $2.91 \times 10^{-2}$, $2.23 \times 10^{-2}$, $2.80 \times 10^{-2}$, and $2.17 \times 10^{-2}$ for the five DNNs. These EIAS values are obtained from fault injection on 50000 essential bits, while Table 9-3 gives the total number of essential bits of each DNN. Based on these EIAS values, the numbers of EIA over all essential bits of each DNN are predicted in Table 9-8. The five DNNs are predicted to have about 138201, 148342, 110277, 139746, and 110419 CRAM bit locations, respectively, at which an upset would improve the identification rate.

Table 9-8 Predicted EIA number of each DNN

| Network | Essential bit | Predicted EIA number |
|---------|---------------|----------------------|
| 30(G)   | 5084661       | 138201               |
| 31(1)   | 5097678       | 148342               |
| 31(2)   | 4949596       | 110277               |
| 31(3)   | 4983812       | 139746               |
| 31(4)   | 5093162       | 110419               |
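The prediction in Table 9-8 follows from scaling the EIAS of (9-2) to the full number of essential bits. The sketch below uses the counts from Tables 9-3 and 9-7; it assumes a simple linear extrapolation, and the results agree with the tabulated predictions to within one count, the residual being a rounding effect. The function names are hypothetical.

```python
N_INJECTED = 50_000  # injected faults per DNN, N_i in (9-2)

# (EIA number from Table 9-7, essential bits from Table 9-3)
DATA = {
    "30(G)": (1359, 5084661),
    "31(1)": (1455, 5097678),
    "31(2)": (1114, 4949596),
    "31(3)": (1402, 4983812),
    "31(4)": (1084, 5093162),
}


def eias(n_eia: int, n_injected: int = N_INJECTED) -> float:
    """Equation (9-2): EIA sensitivity."""
    return n_eia / n_injected


def predicted_eia(n_eia: int, essential_bits: int, n_injected: int = N_INJECTED) -> float:
    """Linear extrapolation of the observed EIAS to all essential bits."""
    return eias(n_eia, n_injected) * essential_bits


for network, (n_eia, bits) in DATA.items():
    print(network, f"EIAS {eias(n_eia):.2e}", f"predicted {predicted_eia(n_eia, bits):.0f}")
```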

#### 9.4.2 Optimal EIA on DNN

The original MNs of the five DNNs are presented in Table 9-5. Although EIA means that the MN is smaller than the original, the size of the reduction varies among different EIAs. Take the 30(G) DNN as an example. Its original MN is 445, and 1359 EIAs are observed during the 50000 FIs. Among these 1359 EIAs, the MN can be 440, 430, or any other number smaller than 445. The smallest of these MNs is easily found, and the corresponding FI location can be regarded as the optimal EIA (OEIA). Table 9-9 shows the MN at the OEIA of each DNN: 419, 399, 377, 380, and 363 for the five DNNs, respectively. The enhancement at the OEIA location relative to the original MN is 5.84%, 7.64%, 8.72%, 4.04%, and 5.47%, respectively. The maximum enhancement is observed in the 31(2) DNN.

Table 9-9 MN at OEIA

| Network | MN at original | MN at OEIA | Enhancement |
|---------|----------------|------------|-------------|
| 30(G)   | 445            | 419        | 5.84%       |
| 31(1)   | 432            | 399        | 7.64%       |
| 31(2)   | 413            | 377        | 8.72%       |
| 31(3)   | 396            | 380        | 4.04%       |
| 31(4)   | 384            | 363        | 5.47%       |

Figure 9-14 OEIA fault injection location

Moreover, the fault injection depends on DR with the 'XOR' operation, so the word and bit offsets of each FI are known exactly, and the OEIA injection locations are easily obtained. Figure 9-14 shows the word and bit offsets of the OEIA for each DNN. Two OEIA injection locations are detected for the 30(G) DNN, with word and bit offset coordinates of (365603, 9) and (377606, 24). For the other DNNs, there is only one OEIA injection location, with coordinates (51026, 30), (347689, 29), (127953, 20), and (362146, 22) for 31(1), 31(2), 31(3), and 31(4), respectively. These results suggest that a DNN may have one or several OEIA injection locations. The OEIA also provides a method to improve DNN performance, and more detail is discussed in 9.5.
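Selecting the OEIA from the recorded injection results, including the case of several locations sharing the same minimum MN, can be sketched as follows. The record format `(word, bit, mn)` and the function name are hypothetical; in the example, only the two 30(G) OEIA locations and their MN of 419 come from this chapter, and the remaining records are made up for illustration.

```python
def find_oeia(records, original_mn):
    """Return (best MN, list of OEIA locations, relative enhancement).

    records: iterable of (word, bit, mn); mn is None when no MN was
    reported (for example after an SH or DFI error).
    """
    eia = [(w, b, mn) for w, b, mn in records if mn is not None and mn < original_mn]
    if not eia:
        return None
    best_mn = min(mn for _, _, mn in eia)
    # Keep every location that reaches the minimum, since ties are possible
    locations = [(w, b) for w, b, mn in eia if mn == best_mn]
    enhancement = (original_mn - best_mn) / original_mn
    return best_mn, locations, enhancement


records_30g = [
    (365603, 9, 419),    # OEIA location reported for 30(G)
    (377606, 24, 419),   # second OEIA location reported for 30(G)
    (1000, 1, 440),      # made-up EIA record
    (2000, 2, 450),      # made-up DIA record
    (3000, 3, None),     # made-up SH or DFI record
]
best_mn, locations, enhancement = find_oeia(records_30g, original_mn=445)
print(best_mn, locations, f"{enhancement:.2%}")
```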

The original content of CRAM at the OEIA locations is also examined for each DNN, as shown in Table 9-10. The original value is '0' at all OEIA locations, which suggests that 0-to-1 upsets help to enhance the DNN performance. As a hypothesis, suppose these OEIA injections indeed change the fraction portion of a neuron's weight value. In that case, the corresponding weight increases by at most 0.5, which occurs when the 0-to-1 upset emerges at the first bit of the fraction portion.

| Network | OEIA location | Original information |
|---------|---------------|----------------------|
| 30(G)   | (365603, 9)   | 0                    |
|         | (377606, 24)  | 0                    |
| 31(1)   | (51026, 30)   | 0                    |
| 31(2)   | (347689, 29)  | 0                    |
| 31(3)   | (127953, 20)  | 0                    |
| 31(4)   | (362146, 22)  | 0                    |

Table 9-10 The original information at the OEIA locations
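The weight-change hypothesis can be made concrete with a small fixed-point sketch. The number of fractional bits (`FRAC_BITS = 8`), the unsigned representation, and the example weight are assumptions chosen for illustration; the actual weight format of the implemented DNNs is not restated here.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits


def to_real(raw: int, frac_bits: int = FRAC_BITS) -> float:
    """Interpret an unsigned fixed-point integer as a real value."""
    return raw / (1 << frac_bits)


def upset_0_to_1(raw: int, bit: int) -> int:
    """Apply a 0-to-1 upset at `bit` (bit 0 is the least significant)."""
    if raw & (1 << bit):
        raise ValueError("bit is already 1; not a 0-to-1 upset")
    return raw | (1 << bit)


weight_raw = 0b1_0010_0000      # 1.125 with 8 fractional bits
first_frac_bit = FRAC_BITS - 1  # most significant fractional bit
upset_raw = upset_0_to_1(weight_raw, first_frac_bit)

# The first fractional bit carries a weight of 2**-1.
delta = to_real(upset_raw) - to_real(weight_raw)
```

In this representation a 0-to-1 upset at the k-th fractional bit adds 2^-k to the weight, so the largest possible increase from a single fractional-bit upset is 2^-1 = 0.5, in line with the hypothesis above.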

#### 9.5 DNN enhancement based on DR

To enhance the identification rate of DNNs, a lot of time and cost is usually spent on developing complicated algorithms <sup>[208-209]</sup>. Based on this work, another way can be proposed to enhance the performance of DNNs implemented on advanced nanoscale COTS MPSoCs. Figure 9-15 summarizes the procedure flow for enhancing DNN performance on advanced MPSoCs.



Figure 9-15 Procedure flow for DNN enhancement

The procedure flow comprises five parts for an established DNN on SRAM-based MPSoCs. First, the fault injection script is generated from the essential bit file. Second, FI is performed on the DNN and the injection results are recorded. These results may include various errors, so the third step keeps only the EIA injections. Fourth, the OEIA injection location is identified among all EIA injections. Finally, the OEIA fault-injected bitstream is loaded into CRAM and the program is run, after which an obvious enhancement can be observed. For instance, the maximum enhancement reaches 8.72% in the current research. This again shows that SEE on DNNs can also make positive contributions.
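The selection steps of this flow (keeping the EIA injections and then picking the OEIA) can be sketched as below. This is a minimal illustration under assumptions, not the tooling used in this work: the record layout, the function names, and three of the four records are hypothetical. The values 384 and 363 are taken from the 31(4) row of the results table, on the assumption that they are the misidentification numbers before and after the OEIA injection, and the enhancement is assumed to be their relative reduction.

```python
from typing import List, NamedTuple


class Injection(NamedTuple):
    word: int           # word offset of the injected bit
    bit: int            # bit offset within the word
    misidentified: int  # misidentification number measured after injection


def enhancement(baseline: int, misidentified: int) -> float:
    """Relative reduction of the misidentification number (assumed definition)."""
    return (baseline - misidentified) / baseline


def find_oeia(results: List[Injection], baseline: int) -> List[Injection]:
    """Keep EIA injections (fewer misidentifications than the baseline)
    and return every injection that reaches the minimum."""
    eia = [r for r in results if r.misidentified < baseline]
    if not eia:
        return []
    best = min(r.misidentified for r in eia)
    return [r for r in eia if r.misidentified == best]


BASELINE = 384  # assumed fault-free misidentification number for 31(4)
results = [
    Injection(1000, 3, 384),     # hypothetical: no change
    Injection(2000, 7, 390),     # hypothetical: accuracy degraded
    Injection(5000, 12, 380),    # hypothetical: EIA, but not optimal
    Injection(362146, 22, 363),  # location reported for 31(4); count assumed
]
oeia = find_oeia(results, BASELINE)
gain = enhancement(BASELINE, oeia[0].misidentified)
print(oeia, f"{gain:.2%}")
```

Returning a list rather than a single record allows for cases such as 30(G), where two injection locations reach the same optimum.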

Currently, this study is carried out on the Xilinx Ultrascale+ MPSoC, but the procedure flow is applicable to DNNs implemented on other advanced SRAM-based MPSoCs. If other kinds of FPGAs feature dynamic reconfiguration, it can be speculated that this solution also applies to them.

#### 9.6 Summary

An open-source DNN, ZyNet, is ported to and implemented on the Xilinx Ultrascale+ MPSoC. In total, five different sets of DNNs are developed. These DNNs share the same architecture, although their neuron numbers differ slightly. Fault injections on these DNNs are executed based on dynamic reconfiguration. Three kinds of errors, IAC, SH, and DFI, are detected in each DNN's FI. Moreover, EIA is observed among the IAC errors, and the OEIA is obtained for each DNN. The EIA and OEIA demonstrate that SEE can also make positive contributions to DNNs implemented on advanced MPSoCs, namely by reducing the misidentification number of the neural network. This leads to a proposed solution for enhancing the performance of DNNs implemented on advanced SRAM-based MPSoCs.

## 10 Conclusions and Suggestions

### 10.1 Conclusions

The current research studied soft errors in two nanoscale COTS SoCs: the Xilinx 28nm CMOS SoC and the Xilinx 16nm FinFET MPSoC. Research on the two devices was conducted separately. The methodologies involved include irradiation tests, Monte Carlo simulation, software-based fault injection, and probabilistic safety analysis.

In particular, irradiation tests and Monte Carlo simulations are performed to study soft errors on the Xilinx 28nm CMOS SoC. Proton, atmospheric neutron, and heavy ion irradiations are performed on different test benchmarks at China-made accelerators to examine SEE sensitivities, and Monte Carlo simulations are performed to further analyze the SEEs observed in some of the irradiations. Regarding the SEE tests on the Xilinx 28nm CMOS SoC, the conclusions are as follows:

1) Various SEU events, including SBU and MCU, are observed in the 70 and 90 MeV proton beam irradiations of the Xilinx 28nm CMOS SoC, but the SEE sensitivities measured in the 70 and 90 MeV proton irradiation tests are close. The vertical structure of the SoC is extracted, and Geant4 and CREME-MC Monte Carlo simulation models are constructed to analyze the detected SEEs further. The simulations indicate that the secondary particles generated in the sensitive volume are similar for 70 and 90 MeV proton irradiations, and that the corresponding ranges in silicon and the LETs of these secondary particles are also close.

2) Combining spallation neutron source irradiation and Geant4 Monte Carlo simulation results, the SEEs induced by atmospheric neutrons in multiple energy ranges are examined and analyzed. The results demonstrate that the contribution of 1 to 10 MeV neutrons should be considered. With the help of a 2 mm Cd slice that absorbs thermal neutrons, the contribution of thermal neutrons is observed. It accounts for about 44% of the atmospheric neutron SEE of the SoC, and the impact is mainly caused by the secondary particles <sup>7</sup>Li and $\alpha$. At the same time, based on the proton and neutron irradiation tests, the SEE equivalence between 70 MeV protons and atmospheric neutrons is discussed. The SEE cross section ratio of 70 MeV protons to atmospheric neutrons is about 1:1.08, and 70 MeV proton irradiation can mimic atmospheric neutron SEE on the Xilinx 28nm CMOS SoC with 92.59% confidence.

3) SEE under multi-processor patterns combined with different on-chip memory data access modes is investigated using heavy ion irradiations. The SEU cross sections are found to be primarily affected by the data access mode, while the processor patterns affect the SEFI cross sections. The static and dynamic SEU cross sections are fitted, and the soft errors in the two modes are also predicted. Additionally, step-increasing and even lower currents are detected in the high-LET heavy ion irradiation test.

4) A multi-layer SEE hardening design is proposed, comprising the redundancy, watchdog, and AMP layers. 70 and 90 MeV proton irradiation tests are performed again to examine its performance. The results demonstrate that the multi-layer design can mitigate SEU effectively.

Meanwhile, the multi-layer design can save processing time and make the master core available to other workloads during SEE examination and hardening.

Unlike the SEE test methods used on the Xilinx 28nm CMOS SoC, fault injection and probabilistic safety analysis are mainly used in the SEE evaluations of the Xilinx 16nm FinFET MPSoC. The conclusions are as follows:

1) SEM-based fault injections are performed on the Xilinx 16nm FinFET MPSoC. SEE on three image processing algorithm applications, namely Histogram, Stretch, and Sobel processing, is examined. Three kinds of errors, including silent data corruption, Outcome, and FI terminal hang, are observed. The fault tree analysis method is adopted to assess the detected errors. The resulting fault trees demonstrate that more attention should be paid to the OTH branches because the SEM IP cannot recover from these errors.

2) For the SRAM-based MPSoC, dynamic partial reconfiguration is an important feature. Two reconfiguration modules, Sobel and Gaussian, are implemented in a reconfiguration region on the Xilinx 16nm FinFET MPSoC. DPR-based fault injections are performed on the full and partial bitstreams. Three kinds of errors, including bitstream load failure, calculation result error, and system halt, are detected in the partial bitstream fault injections. In contrast, only the system halt error is observed in the full bitstream injection. The failure modes and effects analysis method is employed to quantitatively evaluate the severity of components and soft errors. The analysis reveals that the Static module and the system halt error should be prioritized when mitigating SEE in a DPR design because they have higher severities.

3) Five sets of deep learning networks are implemented on the Xilinx 16nm FinFET MPSoC, and fault injections based on dynamic reconfiguration are executed on each of the five DNNs. Three kinds of errors, including identification accuracy changed, DMA failed at initialization, and system halt, are detected in each DNN's fault injection. More importantly, enhancements of the network identification are observed, and the optimal enhancement fault injection location is obtained for each DNN. The enhancement phenomena demonstrate that SEE can also make positive contributions to DNNs implemented on advanced MPSoCs, namely by reducing the misidentification number. In addition, a solution relying on dynamic reconfiguration is proposed to enhance the performance of DNNs implemented on advanced SRAM-based MPSoCs.

### 10.2 Innovations

1) The aerospace and terrestrial soft error rates of the Xilinx 28nm CMOS SoC under various conditions are assessed together using China-made accelerators. In addition, step-increased and even lower currents induced by high-LET, long-range heavy ions are reported on the SoC.

2) It is pointed out that neutrons above 1 MeV, rather than only those above the traditional 10 MeV threshold, should be considered in atmospheric neutron SEE assessment for the 28nm SoC, and that the thermal neutron contribution to SEE in the Xilinx 28nm CMOS SoC should not be ignored. The SEE equivalence between 70 MeV protons and atmospheric neutrons is examined using China-made irradiation facilities for the first time. In addition, a multi-layer SEE hardening design based on the AMP pattern is proposed, and its performance is examined through irradiation tests. Meanwhile, the SEEs of multiple patterns on the Xilinx 28nm CMOS SoC are obtained.

3) Two SEE evaluation solutions are proposed for the Xilinx 16nm FinFET MPSoC. The first relies on SEM-based fault injection and fault tree analysis to evaluate the SEE influences on the SEM and application subsystems. The second relies on DPR and FMEA to assess the severity of components and soft errors.

4) Contrary to the conventional view that SEU has a negative impact on a design, positive contributions from single event upsets in CRAM are found for a deep neural network implemented on the Xilinx 16nm FinFET MPSoC. It is discovered for the first time that some SEUs in CRAM can reduce the misidentification number of neural networks. A solution based on dynamic reconfiguration is proposed to enhance the performance of deep neural networks implemented on advanced SRAM-based MPSoCs.

### 10.3 Suggestions

1) The single event effect tests on the Xilinx 28nm SoC are mainly performed on memory blocks. In the future, for a specific application environment, dedicated applications should be developed on the SoC, their single event effect vulnerability examined, and targeted hardening measures proposed.

2) For Xilinx 16nm FinFET MPSoC, hybrid hardening measures can be designed. The hardening measures can include asymmetric multiprocessing, soft error mitigation, and dynamic partial reconfiguration methods.

3) More complicated neural networks can be trained on the Xilinx 16nm FinFET MPSoC, so that the study of the positive contribution of single event effects to neural networks can be pushed further.

## Acknowledgments

I feel both excited and lucky now that my doctoral research has reached this point. There is no doubt that this excitement and luck come from the countless help and support I have received, for which I would like to express my sincere thanks.

First of all, I want to thank my motherland. It is because of her development and support that we can devote ourselves to the research we are interested in. I was also able to obtain a valuable overseas study opportunity with the support of the China Scholarship Council.

When it comes to my doctoral research, the greatest support and help undoubtedly came from my supervisor, He Chaohui, at Xi'an Jiaotong University. His selfless support and help motivated me to move forward at all times and helped me out of every confusion. Meanwhile, great thanks go to my co-supervisor, Luca Sterpone. Because of your support and guidance, my two years of study at Politecnico di Torino went smoothly.

Secondly, I would like to thank the professors and partners in the two groups at XJTU and Polito. In the XJTU group, instructors such as Zang Hang, Li Yonghong, Li Pei, Liu Shuhuan, Zhao Yaolin, Zhang Qingmin, Liu Wenbo, and others helped me solve many problems and gave me many useful suggestions. Partners such as Li Yang, Wei Jianan, Huang Zhisheng, Guo Yaxin, Zhao Haoyu, He Huan, Liao Wenlong, Bai Yurong, Guo Yuhang, Deng Bangjie, Sang Yaodong, and others assisted me on many occasions and gave me a hand in the experiments and research. At Polito Lab7 and Lab3, Du Boyang, Sarah Azimi, Corrado De Sio, and Ludovica Bozzoli gave me a lot of help with study and life in Italy.

Thirdly, sincere thanks go to the research institutions in China for their help. For example, the researchers Guo Gang at CIAE, Liu Jie and Du Guanghua at HIRFL, and Liang Tianjiao at CSNS provided precious accelerator hours for the irradiation tests. Members of these institutions, including Yin Qian, Zhang Yanwen, Han Jinhua, Luo Jie, Ai Wensi, Hu Zhiliang, Zhou Bin, and others, assisted in performing the experiments.

Meanwhile, thanks to my good friends during my studies in Xi'an and Torino.

In Xi'an, thanks to the help of friends such as Liu Jingang, Xia Hui, Zhang Tianwen, Liu Shuyou, Zhang Hailong, Zhang Yanqiang, Zhou Jingzhi, Cao Huasong, Rong Xinlei, Shen Panfeng, Zhao Run, and others, I was able to overcome difficulties in life.

In Torino, with the help of friends such as Wu Haosheng, Sun Yue, Hao Wenmei, Chen Yukai, Wang Wenlong, Liang Liyi, Li Linwei, and others, I got through the difficult times.

Finally, special thanks go to my fiancée and family. My fiancée, Su Shuai, has given me firm support and help all the time. Without it, I could not have come this far. My father, mother, sisters, and brothers in Shaanxi and Shanxi provinces have always helped and supported me wholeheartedly, allowing me to focus on my research. Thanks. I love you all.

In addition, thanks to every experience that I had in my life.

Thanks a lot to those who helped me grow up.

Thanks a lot to each reviewer for the effort spent on my dissertation. Thanks a lot to the experts attending my final defense. I wish everyone all the best.

# References

- [1] IEEE IRDS. The international roadmap for devices and systems: 2020 [R]. IEEE, 2020.
- [2] Computer History Museum. 1974-Digital watch is first system-on-chip integrated circuit [OL]. 2010-10-14 [2021-04-14]. http://www.computerhistory.org/semiconductor/timeline/1974-digital-watch-isfirst-system-on-chip-integrated-circuit-52.html.
- [3] Wael B, Graham J. System-on-chip for real-time applications [M]. New York: Springer Science Business Media, 2003.
- [4] DS890. Ultrascale architecture and product data sheet: overview [R]. San Jose: Xilinx, 2021.
- [5] Shen Z X, Venkata D. Real-time MPSoC-based electrothermal transient simulation of fault-tolerant MMC topology [J]. IEEE Transactions on Power Delivery, 2019, 34(1): 260-270.
- [6] Jose A B, German L, Jose M B, et al. Evaluating the computational performance of the Xilinx Ultrascale+ EG heterogeneous MPSoC [J]. The Journal of Supercomputing, 2021, 77: 2124-2137.
- [7] Hans-Joachim S, Mladen B, Soren M, et al. HiBRID-SoC: A multi-core SoC architecture for multimedia signal processing [J]. Journal of VLSI Signal Processing, 2005, 41: 9-20.
- [8] Pasricha S, Dutt N. On-chip communication architectures [M]. Burlington: Elsevier, 2008.
- [9] Helleputte N V, Tomasik J M, Galjan W, et al. Full impedance cardiography measurement device using raspberry PI3 and system-on-chip biomedical instrumentation solutions [J]. IEEE Journal of Biomedical and Health Informatics, 2018, 22(6): 1883-1894.
- [10] Wayne W, Ahmed A J, Grant M. Multiprocessor system-on-chip (MPSoC) technology [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2008, 27(10): 1701-1713.
- [11] Gianluca F, Gabriele M, Aubrey D, et al. Towards the use of artificial intelligence on the edge in space systems: challenges and opportunities [J]. IEEE Aerospace and Electronic Systems Magazine, 2020, 35(12): 44-56.
- [12] Ignacio B M, Alfredo G V, Jose L L G. New applications and architectures based on FPGA/SoC [J]. Electronics, 2020, 9: 1789.
- [13] Didier K, Simon S, Jason R, et al. High performance space computing with system-on-chip instrument avionics for space-based next generation imaging spectrometers (NGIS) [C]. 2018 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Edinburgh, UK, 2018.
- [14] Yamada Y, Kimura K. NANO-CHIPS 2030: efficient System-on-Chip (SOC) for automated driving with high safety [M]. Gewerbestrasse: Springer, 2020.
- [15] Cieri D, Abovyan S, Danielyan V, et al. Hardware demonstrator of a compact first-level muon track trigger for future hadron collider experiments [J]. Journal of Instrumentation, 2019, 14: 1-15.
- [16] Andrea C, Ruben G A, Jan B, et al. Radiation hardness assurance through system-level testing: risk acceptance, facility requirements, test methodology and data exploitation [J]. IEEE Transactions on Nuclear Science, 2021, 68(5): 958-969.
- [17] DARPA rolls out electronics resurgence initiative [OL]. 2017-09-13. [2021-04-22]. https://www.darpa.mil/news-events/2017-09-13
- [18] Avionics Steering Committee. Flight avionics hardware roadmap [R]. Hampton: National Aeronautics and Space Administration, 2013.
- [19] Schimmerling W. The space radiation environment: an introduction [OL]. 2011-05-02. [2021-04-22]. https://three-jsc.nasa.gov/concepts/SpaceRadiationEnviron.pdf
- [20] Stassinopoulos E G, Raymond J P. The space radiation environment for electronics [J]. Proceedings of the IEEE, 1988, 76(11): 1423-1442.
- [21] Christian W F, Herwig S. Particle physics reference library volume 2: detectors for particles and radiation [M]. Gewerbestrasse: Springer, 2020.

- [22] Herve C, Julia M K, Daniela B, et al. Space as a tool for astrobiology: review and recommendations for experimentations in earth orbit and beyond [J]. Space Science Reviews, 2017, 209: 83-181.
- [23] Reeves G D, Spence H E, Henderson M G, et al. Electron acceleration in the heart of the van allen radiation belts [J]. Science, 2013, 341(6149): 991-994.
- [24] Viyas G. Analysis of single event radiation effects and fault mechanisms in SRAM, FRAM and NAND Flash: application to the MTCube nanosatellite project [D]. Montpellier: Université Montpellier, 2017.
- [25] Harriet G, Emilia K, Adnane O, et al. Outer van allen belt trapped and precipitating electron flux responses to two interplanetary magnetic clouds of opposite polarity [J]. Annales Geophysicae, 2020, 38: 931-951.
- [26] Anderson P C, Rich F J, Borisov S. Mapping the south atlantic anomaly continuously over 27 years [J]. Journal of Atmospheric and Solar-Terrestrial Physics, 2018, 177: 237-246.
- [27] Nicolas S, Nathalie C, Athina V, et al. Statistical estimation of uncertainty for single event effect rate in OMERE [C]. 2011 12th European Conference on Radiation and Its Effects on Components and Systems, Seville, Spain, 2011.
- [28] Bazilevskay G A. Solar cosmic rays in the near earth space and the atmosphere [J]. Advances in Space Research, 2005, 35: 458-464.
- [29] Townsend L W, Adams J H, Blattnig S R, et al. Solar particle event storm shelter requirements for missions beyond low earth orbit [J]. Life Sciences in Space Research, 2018, 17: 32-39.
- [30] Lisa C S, Tony C S, Peter G, et al. NASA's first ground-based galactic cosmic ray simulator: enabling a new era in space radiobiology research [J]. Plos Biology, 2020, 18(5): e3000669.
- [31] John W N, Tony C S, Sukesh A, et al. Advances in space radiation physics and transport at NASA [J]. Life Sciences in Space Research, 2019, 22: 98-124.
- [32] Clive D, Alex H, Keith R, et al. Extreme atmospheric radiation environments and single event effects [J]. IEEE Transactions on Nuclear Science, 2018, 65(1): 432-438.
- [33] Alex H, Paul M, Keith R, et al. Single event effects in power MOSFETs due to atmospheric and thermal neutrons [J]. IEEE Transactions on Nuclear Science, 2011, 58(6): 2687-2694.
- [34] Alexander D, Alex H, Keith R, et al. Single-event effects in ground-level infrastructure during extreme ground-level enhancements [J]. IEEE Transactions on Nuclear Science, 2020, 67(6): 1139-1143.
- [35] Daniel S, Lucas M L, Heikki K, et al. Electron-induced upsets and stuck bits in SDRAMs in the jovian environment [J]. IEEE Transactions on Nuclear Science, 2021, 68(5): 716-723.
- [36] Renu S. Ground-level atmospheric neutron flux measurements in the 10-170 MeV range [D]. Durham: University of New Hampshire, 1990.
- [37] Kenneth A L, Martha V O, Dakai C, et al. Compendium of single event effects, total ionizing dose, and displacement damage for candidate spacecraft electronics for NASA [C]. 2014 IEEE Radiation Effects Data Workshop (REDW), Paris, France, 2014.
- [38] Spiezia G, Peronnard P, Masi A, et al. A new radmon version for the LHC and its injection lines [J]. IEEE Transactions on Nuclear Science, 2014, 61(6): 3424-3431.
- [39] Jeffrey P, Firman M S, Paul L, et al. Low-power electronic technologies for harsh radiation environments [J]. Nature Electronics, 2021, 4: 243-253.
- [40] Federico R. Dosimetry techniques and radiation test facilities for total ionizing dose testing [J]. IEEE Transactions on Nuclear Science, 2018, 65(8): 1440-1464.
- [41] Pablo F R V. Evaluation of the SEE sensitivity and methodology for error rate prediction of applications implemented in multi-core and many-core processors [D]. Grenoble: Grenoble Alpes University, 2017.
- [42] Fred W S. Destructive single-event effects in semiconductor devices and ICs [J]. IEEE Transactions on Nuclear Science, 2003, 50(3): 603-621.

- [43] Paul E D, Lloyd W M. Basic mechanisms and modeling of single-event upset in digital microelectronics [J]. IEEE Transactions on Nuclear Science, 2003, 50(3): 583-602.
- [44] Henri K. Effects of ultra-high total ionizing dose in nanoscale Bulk CMOS technologies [D]. Mons: Université de Mons, 2018.
- [45] Gaillardin M, Paillet P, Ferlet-Cavrois V, et al. Total ionizing dose effects on triple-gate FETs [J]. IEEE Transactions on Nuclear Science, 2006, 53(6): 3158-3165.
- [46] Barnaby H J. Total-ionizing-dose effects in modern CMOS technologies [J]. IEEE Transactions on Nuclear Science, 2006, 53(6): 3103-3121.
- [47] Fleetwood D M. Evolution of total ionizing dose effects in MOS devices with Moore's law scaling [J]. IEEE Transactions on Nuclear Science, 2018, 65(8): 1465-1481.
- [48] Richard H M, Martin E F, Mark N M, et al. Harsh environments: space radiation environment, effects, and mitigation [J]. Johns Hopkins APL Technical Digest, 2008, 28(1): 17-29.
- [49] Wang F, Vishwani D A. Single event upset: an embedded tutorial [C]. 21<sup>st</sup> International Conference on VLSI Design (VLSID 2008), Hyderabad, India, 2008.
- [50] JESD89A. Measurement and reporting of alpha particle and terrestrial cosmic ray-induced soft errors in semiconductor devices [R]. JEDEC Solid State Technology Association, 2012.
- [51] Juan J R A, Eduardo T A, María D V, et al. Embedded processors in FPGA architectures from: FPGAs, fundamentals, advanced features, and applications in industrial electronics [M]. London: CRC Press, 2017.
- [52] Arturo P, Leonardo S, Andrés O, et al. Dynamic reconfiguration under RTEMS for fault mitigation and functional adaptation in SRAM-based SoPCs for space systems [C]. 2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Pasadena: CA, USA, 2017.
- [53] Austin L, Wojciech K, Glenn S, et al. Soft error study of ARM SoC at 28 nanometers [C]. IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE 2014), Palo Alto, USA. 2014.
- [54] Giovanni B. Temperature effects on soft error rate due to atmospheric neutrons on 28 nm FPGAs [D]. Padova, Italy, 2014.
- [55] Lucas A T, Alexey A, Dmitriy V. On the characterization of embedded memories of Zynq-7000 all programmable SoC under single event upsets induced by heavy ions and protons [C]. 2015 15<sup>th</sup> European Conference on Radiation and Its Effects on Components and Systems (RADECS), Moscow, Russia, 2015.
- [56] Lucas A T, Paolo R, Eduardo C, et al. Analyzing the impact of radiation-induced failures in programmable SoCs [J]. IEEE Transactions on Nuclear Science, 2016, 63(4): 2217-2224.
- [57] Lucas A T, Jorge T, André S, et al. Analyzing reliability and performance trade-offs of HLS-based designs in SRAM-based FPGAs under soft errors [J]. IEEE Transactions on Nuclear Science, 2016, 63(4): 2217-2224.
- [58] Gennaro S R, Felipe R, Adria B de O, et al. Analyzing the impact of fault tolerance methods in ARM processors under soft errors running linux and parallelization APIs [J]. IEEE Transactions on Nuclear Science, 2017, 64(8): 2196-2203.
- [59] Israel D C. Convolutional neural network reliability on an APSoC platform a traffic-sign recognition case study [D]. Porto Alegre: Federal University of Rio Grande do Sul, 2017.
- [60] Athanasios C, Pablo B, George P, et al. Demystifying soft error assessment strategies on ARM CPUs: microarchitectural fault injection vs. neutron beam experiments [C]. 49<sup>th</sup> Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA. 2019.
- [61] Mehran A, Farokh I, Steven M G, et al. Heavy ion single event effects measurements of Xilinx Zynq-7000 FPGA [C]. 2015 IEEE Radiation Effects Data Workshop (REDW), Boston, MA, USA, 2015.
- [62] Libano F, Rech P, Tambara L, et al. On the reliability of linear regression and pattern recognition feedforward artificial neural networks in FPGAs [J]. IEEE Transactions on Nuclear Science, 2018, 65(1): 288-295.

- [63] Mostafa D, Yves A, Yves B, et al. On the susceptibility of SRAM-based FPGA routing network to delay changes induced by ionizing radiation [J]. IEEE Transactions on Nuclear Science, 2019, 66(3): 643-654.
- [64] Vasileios V, Aitzan S, John V, et al. Single event effects characterization of the programmable logic of Xilinx Zynq-7000 FPGA using very/ultra high-energy heavy ions [J]. IEEE Transactions on Nuclear Science, 2021, 68(1): 36-45.
- [65] David M H, Valeri K. Single event upset characterization of the Zynq-7000 ARM® Cortex<sup>TM</sup>-A9 processor unit using proton irradiation [C]. 2015 IEEE Radiation Effects Data Workshop (REDW), Boston, MA, USA. 2015.
- [66] Eduardo C, Felipe R, Gennaro S R, et al. Reliability on ARM processors against soft errors through SIHFT techniques [J]. IEEE Transactions on Nuclear Science, 2016, 63(4): 2208-2216.
- [67] Mahsa M, Mohammad T, Roel J, et al. A generic methodology to compute design sensitivity to SEU in SRAM-based FPGA [C]. 21<sup>st</sup> Euromicro Conference on Digital System Design (DSD), Prague, Czech Republic, 2018.
- [68] Benjamin J, Heather Q, Michael W, et al. Applying compiler-automated software fault tolerance to multiple processor platforms [J]. IEEE Transactions on Nuclear Science, 2020, 67(1): 321-327.
- [69] Benjamin J, Michael W, Jeffrey G. Investigating how software characteristics impact the effectiveness of automated software fault tolerance [J]. IEEE Transactions on Nuclear Science, 2021, 68(5): 1014-1022.
- [70] Aaron G S. Configuration scrubbing architectures for high-reliability FPGA systems [D]. Provo: Brigham Young University, 2015.
- [71] Adria B D O. Applying dual-core lockstep in embedded processors to mitigate radiation-induced soft errors [D]. Porto Alegre: Federal University of Rio Grande do Sul, 2017.
- [72] Igor V, Unai B, Uli K, et al. Fast and accurate SEU-tolerance characterization method for Zynq SoCs [C]. 24<sup>th</sup> International Conference on Field Programmable Logic and Applications (FPL), Munich, Germany, 2014.
- [73] Igor V, Unai B, Julen G C, et al. Estimating the SEU failure rate of designs implemented in FPGAs in presence of MCUs [J]. Microelectronics Reliability, 2017, 78: 85-92.
- [74] Aaron S, Ammon G, Peter Z, et al. High-Speed PCAP configuration scrubbing on Zynq-7000 all programmable SoCs [C]. 26<sup>th</sup> International Conference on Field Programmable Logic and Applications (FPL), Lausanne, Switzerland, 2016.
- [75] Farah A, Darshana J, Sompasong S, et al. LFTSM: lightweight and fully testable SEU mitigation system for Xilinx processor-based SoCs [C]. 30<sup>th</sup> International Conference on Field-Programmable Logic and Applications (FPL), Gothenburg, Sweden. 2020.
- [76] Ludovica B, Luca S. Self rerouting of dynamically reconfigurable SRAM-based FPGAs [C]. 2017 NASA/ESA Conference on Adaptive Hardware and Systems (AHS), Pasadena, CA, USA. 2017.
- [77] TR0020 Test Report. SmartFusion2 and IGLOO2 Neutron Single Event Effects (SEE) [R]. Microchip company, 2020.
- [78] Tambara L A, Chielle A E, Kastensmidt F L, et al. Analyzing the impact of radiation-induced failures in flash-based APSoC with and without fault tolerance techniques at CERN environment [J]. Microelectronics Reliability, 2017, 76-77: 640-643.
- [79] Du X C, He C H, Liu S H, et al. Measurement of single event effects induced by alpha particles in the Xilinx Zynq-7010 System-on-Chip [J]. Journal of Nuclear Science and Technology, 2017, 54(3): 287-292.
- [80] Du X C, Liu S H, Luo D Y, et al. Single event effects sensitivity of low energy proton in Xilinx Zynq-7010 System-on-Chip [J]. Microelectronics Reliability, 2017, 71: 65-70.

- [81] Du X C, Liu S H, He C H, et al. Analysis of sensitive blocks of soft errors in the Xilinx Zynq-7000 system-on-chip [J]. Nuclear Instruments and Methods in Physics Research A, 2019, 940: 125-128.
- [82] Du X C, He C H, Liu S H, et al. Soft error evaluation and vulnerability analysis in Xilinx Zynq-7010 system-on chip [J]. Nuclear Instruments and Methods in Physics Research A, 2016, 831: 344-348.
- [83] Liu S H, Du X C, Du X Z, et al. Primary investigation the impact of external memory(DDR3) failure on the performance of Xilinx Zynq-7010 SoC based system(Microzed) using laser irradiation [J]. Nuclear Instruments and Methods in Physics Research B, 2017, 406: 449-455.
- [84] Yang W T, Du X C, He C H, et al. Microbeam heavy-ion single-event effect on Xilinx 28-nm system on chip [J]. IEEE Transactions on Nuclear Science, 2018, 65(1): 545-549.
- [85] Wu J, Meng X K, Zhang N. Fault-tolerant technology based on FPGA: a research of logiCORE<sup>™</sup> IP soft error mitigation controller [J]. Journal of Physics: Conference Series, 2020, 1486: 052030.
- [86] Cui X H, Gao Q, Wang R C, et al. Fault-tolerant method for anti-SEU of embedded system based on dual-core processor [J]. The Journal of Engineering, 2019, 23: 8755-8759.
- [87] Du X Z, Luo D Y, He C H, et al. A fine-grained software-implemented DMA fault tolerance for SoC against soft error [J]. Journal of Electronic Testing, 2018, 34: 717-733.
- [88] Pierre M, Michael H, Jeff B, et al. Neutron, 64 MeV proton & alpha single-event characterization of Xilinx 16nm FinFET Zynq<sup>®</sup> Ultrascale+<sup>™</sup> MPSoC [C]. 2017 IEEE Radiation Effects Data Workshop (REDW), New Orleans, LA, USA, 2017.
- [89] Christian M F, Pai C, Wen X Q, et al. A fault-tolerant MPSoC for CubeSats [C]. 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Noordwijk, Netherlands. 2019.
- [90] Brittany M W. Evaluating and improving the SEU reliability of artificial neural networks implemented in SRAM-Based FPGAs with TMR [D]. Provo: Brigham Young University, 2020.
- [91] Oscar B, Pierre M, Jue A, et al. Evaluation of ISO 26262 and IEC 61508 metrics for transient faults of a multi-processor system-on-chip through radiation testing [J]. Microelectronics Reliability, 2020, 107: 113601.
- [92] Jordan D A. Neutron beam testing methodology and results for a complex programmable multiprocessor SoC [D]. Provo: Brigham Young University, 2019.
- [93] David M H, Valeri K, Jakub B. Single event upset characterization of the Zynq ultrascale+ MPSoC using proton irradiation [C]. IEEE Radiation Effects Data Workshop (REDW), New Orleans, LA, USA, 2017.
- [94] David S L, Michael K, William E, et al. Single-event characterization of 16 nm FinFET Xilinx ultrascale+ devices with heavy ion and neutron irradiation [C]. IEEE Radiation Effects Data Workshop (REDW), Waikoloa, HI, USA. 2018.
- [95] Maximilien G, Adrian E, Thomas L, et al. Single-event characterization of Xilinx ultrascale+ MPSOC under standard and ultra-high energy heavy-ion irradiation [C]. IEEE Radiation Effects Data Workshop (REDW), Waikoloa, HI, USA. 2018.
- [96] Pierre M, Jue A, Christina S, et al. Test methodology & neutron characterization of Xilinx 16nm Zynq<sup>®</sup> Ultrascale+<sup>™</sup> multi-processor system-on-Chip (MPSoC) [C]. IEEE Radiation Effects Data Workshop (REDW), Waikoloa, HI, USA, 2018.
- [97] Philip D, David S L, Mark L, et al. Single-event characterization of the 16 nm FinFET Xilinx ultrascale+<sup>TM</sup> RFSoC field programmable gate array under proton irradiation [C]. IEEE Radiation Effects Data Workshop, San Antonio, TX, USA. 2019.
- [98] Pierre M, Michael J H, Paula C, et al. Single-event evaluation of Xilinx 16nm ultrascale+<sup>™</sup> Single Event Mitigation IP [C]. IEEE Radiation Effects Data Workshop (REDW), Waikoloa, HI, USA. 2018.
- [99] UG585. Zynq-7000 All programmable SoC technical reference manual (v1.7) [R]. San Jose: Xilinx, 2014.

- [100] CERN. Zynq<sup>®</sup> ultrascale+<sup>TM</sup> MPSoC and road to Versal [OL]. 2019-06-12 [2021-05-02]. https://indico.cern.ch/event/799275/contributions/3413723/attachments/1860639/3057774/Zynq\_Ultrascale MPSoC Product Overview - CERN - INDICO.pdf
- [101] Qin J C, Yu J X, Yang S C, et al. Testing and commissioning of the HI-13 tandem accelerator [J]. Nuclear Instruments and Methods in Physics Research B, 1988, 268 (2-3): 316-318.
- [102] Xia J W, Zhan W L, Wei B W, et al. Heavy ions research facility in Lanzhou (HIRFL) [J]. Chinese Science Bulletin, 2016, (4-5): 467-477.
- [103] Wei J, Chen H S, Chen Y W, et al. China spallation neutron source: design, R&D, and outlook [J]. Nuclear Instruments and Methods in Physics Research A, 2009, 600: 10-13.
- [104] Zhang F Q, Guo G, Liu J C, et al. Study on experimental ability of 100 MeV proton single event effect test facility in China institute of atomic energy [J]. Atomic Energy Science and Technology, 2018, 52(11): 2101-2105.
- [105] Guo G, Xu J C, Li Z C, et al. Irradiation facility for single-event effect studies at Beijing HI-13 tandem accelerators [J]. High Energy Physics and Nuclear Physics, 2006, 30: 268-270.
- [106] Geng C, Liu J, Zhang Z G, et al. Monte Carlo simulation based on Geant4 of single event upset induced by heavy ions [J]. Science China: Physics, Mechanics & Astronomy, 2013, 56 (6): 1120-1125.
- [107] Gennady I Z, Artur M G. Compact modeling and simulation of heavy ion induced soft error rate in space environment: principles and validation [J]. IEEE Transactions on Nuclear Science, 2017, 64(8): 2129-2135.
- [108] Angelo I, Rubén García A, Markus B. Monte Carlo evaluation of single event effects in a deep-submicron bulk technology: comparison between atmospheric and accelerator environment [J]. IEEE Transactions on Nuclear Science, 2017, 64(1): 596-604.
- [109] Aguiar Y Q, Frédéric W, Guagliardo S, et al. Radiation hardening efficiency of gate sizing and transistor stacking based on standard cells [J]. Microelectronics Reliability, 2019, 100-101: 113457.
- [110] Paulo R C V, Rodrigo T, Roger C G, et al. Fault tolerant soft-core processor architecture based on temporal redundancy [J]. Journal of Electronic Testing, 2019, 35: 9-27.
- [111] Christopher M W. Modeling and mitigation for hybrid space computers [D]. Gainesville: University of Florida, 2018.
- [112] Luis A C B. Automated design flow for applying triple modular redundancy in complex semi-custom digital integrated circuits [D]. Porto Alegre: Federal University of Rio Grande do Sul, 2018.
- [113] Sánchez-Clemente A J, Entrena L, García-Valderas M. Partial TMR in FPGAs using approximate logic circuits [J]. IEEE Transactions on Nuclear Science, 2016, 63(4): 2233-2240.
- [114] Ádria B de O, Gennaro S R, Fernanda L K, et al. Lockstep dual-core ARM A9: implementation and resilience analysis under heavy ion-induced soft errors [J]. IEEE Transactions on Nuclear Science, 2018, 65(8): 1783-1790.
- [115] Server K, Eduardo W W, Zhai X J, et al. Novel lockstep-based approach with roll-back and roll-forward recovery to mitigate radiation-induced soft errors [C]. 2020 IEEE Nordic Circuits and Systems Conference (NorCAS), Oslo, Norway, 2020.
- [116] UG570. Ultrascale architecture configuration user guide (v1.13) [R]. San Jose: Xilinx, 2020.
- [117] Xilinx all programmable<sup>TM</sup>. Ultrascale+ 16nm technology and portfolio announcement backgrounder [R]. San Jose: Xilinx, 2015.
- [118] Marcin K, Dominika P, Tomasz K. Real-time implementation of contextual image processing operations for 4K video stream in Zynq Ultrascale+ MPSoC [C]. 2018 Conference on Design and Architectures for Signal and Image Processing (DASIP). Porto, Portugal, 2018.
- [119] Yang X C, Sumanta C, Lirida N, et al. Quad-Approx CNNs for embedded object detection systems [C]. 2020 27<sup>th</sup> IEEE International Conference on Electronics, Circuits and Systems (ICECS), Glasgow, UK, 2020.

- [120] Hiroki N, Haruyoshi Y, Shimpei S. An object detector based on multiscale sliding window search using a fully pipelined binarized CNN on an FPGA [C]. 2017 International Conference on Field Programmable Technology (ICFPT), Melbourne, VIC, Australia, 2017.
- [121] UG973. Vivado design suite user guide release notes, installation, and licensing (v2019.2) [R]. San Jose: Xilinx, 2019.
- [122] Oscar R, Francisco G H, Luis A A, et al. Fault injection emulation for systems in FPGAs: tools, techniques and methodology, a tutorial [J]. Sensors, 2021, 21: 1392.
- [123] UG1085. Zynq ultrascale+ device technical reference manual (v1.9) [R]. San Jose: Xilinx, 2019.
- [124] PG187. Ultrascale architecture soft error mitigation controller v3.1 LogiCORE IP product guide [R]. San Jose: Xilinx, 2019.
- [125] Mohamed E K, Kenichi A. Fault injection in dynamic partial reconfiguration design based on essential bits [J]. Journal of Aeronautics and Space Technologies, 2018, 11(2): 25-33.
- [126] Andrew E W. Dynamic reconfigurable real-time video processing pipelines on SRAM-based FPGAs [D]. Provo: Brigham Young University, 2020.
- [127] UG909. Vivado design suite user guide partial reconfiguration (v2018.1) [R]. San Jose: Xilinx Inc, 2019.
- [128] Aldemir T. A survey of dynamic methodologies for probabilistic safety assessment of nuclear power plants [J]. Annals of Nuclear Energy, 2013, 52(2): 113-124.
- [129] Durga R K, Gopika V, Sanyasi R V V S, et al. Dynamic fault tree analysis using Monte Carlo simulation in probabilistic safety assessment [J]. Reliability Engineering and System Safety, 2009, 94: 872-883.
- [130] Wu Y C. Development of reliability and probabilistic safety assessment program RiskA [J]. Annals of Nuclear Energy, 2015, 83: 316-321.
- [131] Kenneth P R, David F H, Henry H K T, et al. Low-energy proton-induced single-event-upsets in 65 nm node, silicon-on-insulator, latches and memory cells [J]. IEEE Transactions on Nuclear Science, 2007, 54(6): 2474-2479.
- [132] Wang J L, Jeffrey P, Andrea C, et al. Study of SEU sensitivity of SRAM-Based radiation monitors in 65 nm CMOS [J]. IEEE Transactions on Nuclear Science, 2021, 68(5): 913-920.
- [133] Dodds N A, Martinez M J, Dodd P E, et al. The contribution of low-energy protons to the total on-orbit SEU rate [J]. IEEE Transactions on Nuclear Science, 2015, 62(6): 2440-2451.
- [134] Wang Z B, Chen W, Yao Z B, et al. Proton-induced single-event effects on 28 nm Kintex-7 FPGA [J]. Microelectronics Reliability, 2020, 107: 113594.
- [135] James Ziegler. SRIM-2003 [J]. Nuclear Instruments and Methods in Physics Research B, 2004, 219-220: 1027-1036.
- [136] Wang B, Zeng C B, Geng C, et al. A comparison of heavy ion induced single event upset susceptibility in unhardened 6T/SRAM and hardened ADE/SRAM [J]. Nuclear Instruments and Methods in Physics Research B, 2017, 406: 437-442.
- [137] Kimbrough J R, Colella N J, Denton S M, et al. Single event effects and performance predictions for space applications of RISC processors [J]. IEEE Transactions on Nuclear Science, 1994, 41(6): 2706-2714.
- [138] Petersen E L. Single-event data analysis [J]. IEEE Transactions on Nuclear Science, 2008, 55(6): 2819-2841.
- [139] ESCC Basic Specification No. 25100. Single Event effects test method and guidelines [R]. ESCC: ESA, 2014.
- [140] Allen G, Irom F, Amrbar F. Zynq SoC radiation test results and plans for the Altera MAX10 [C]. NASA Electronic Parts and Packaging Program (NEPP) Electronics Technology Workshop (ETW), Greenbelt, MD, USA, 2015.

- [141] Agostinelli S, Allison J, Amako K, et al. GEANT4: a simulation toolkit [J]. Nuclear Instruments and Methods in Physics Research A, 2003, 506: 250-303.
- [142] Allison J, Amako K, Apostolakis J, et al. Geant4 developments and applications [J]. IEEE Transactions on Nuclear Science, 2006, 53(1): 270-278.
- [143] Allison J, Amako K, Apostolakis J, et al. Recent developments in GEANT4 [J]. Nuclear Instruments and Methods in Physics Research A, 2016, 835: 186-225.
- [144] Weller R A, Mendenhall M H, Reed R A, et al. Monte Carlo simulation of single event effects [J]. IEEE Transactions on Nuclear Science, 2010, 57(4): 1726-1746.
- [145] Brian D S, Mendenhall M H, Robert A W, et al. CREME-MC: A physics-based single event effects tool [C]. IEEE Nuclear Science Symposium & Medical Imaging Conference, Knoxville, TN, USA, 2010.
- [146] Warren K M, Weller R A, Mendenhall M H, et al. The contribution of nuclear reactions to heavy ion single event upset cross-section measurements in a high-density SEU hardened SRAM [J]. IEEE Transactions on Nuclear Science, 2005, 52(6): 2125-2131.
- [147] Ziegler J F, Ziegler M D, Biersack J P. SRIM: the stopping and range of ions in matter [J]. Nuclear Instruments and Methods in Physics Research B, 2010, 268: 1818-1823.
- [148] Bernard W R, Franz X G, Laura J D. Single event effects test facility at Oak Ridge National Laboratory [C]. 2015 IEEE/AIAA 34th Digital Avionics Systems Conference (DASC), Prague, Czech Republic, 2015.
- [149] Ni W J, Jing H T, Zhang L Y, et al. Possible atmospheric-like neutron beams at CSNS [J]. Radiation Physics and Chemistry, 2018, 152: 43-48.
- [150] Wang S, Fang S X, Fu S N, et al. Introduction to the overall physics design of CSNS accelerators [J]. Chinese Physics C, 2009, 33 (2): 1-3.
- [151] Chen H S, Chen Y B, Wang F W, et al. Target station status of China spallation neutron source [J]. Neutron News, 2018, 29(2): 2-6.
- [152] Nilesh B. Mitigating single-event upsets using Cypress's 65-nm asynchronous SRAM [R]. San Jose, 2016.
- [153] Kumar S, Agarwal S, Jung J P. Soft error issue and importance of low alpha solders for microelectronics packaging [J]. Reviews on Advanced Materials Science, 2013, 34: 185-202.
- [154] Normand E, Baker T J. Altitude and latitude variations in avionics SEU and atmospheric neutron flux [J]. IEEE Transactions on Nuclear Science, 1993, 40(6): 1484-1490.
- [155] Gordon M S, Goldhagen P, Rodbell K P, et al. Measurement of the flux and energy spectrum of cosmic-ray induced neutrons on the ground [J]. IEEE Transactions on Nuclear Science, 2004, 51(6): 3427-3434.
- [156] Brookhaven National Laboratory National Nuclear Data Center (NNDC). Evaluated nuclear data file (ENDF) [OL]. 2018-2-2 [2021-05-13]. https://www.nndc.bnl.gov/exfor/endf00.jsp
- [157] Dario B, Andrea C, Natalia D, et al. Neutron production targets for a new single event effects facility at the 70 MeV Cyclotron of LNL-INFN [J]. Physics Procedia, 2012, 26: 284-293.
- [158] Acosta-Urdaneta G C, Bisello D, Esposito J, et al. ANEM: the future neutron production target for single event effect studies at LNL [J]. Il Nuovo Cimento, 2015, 38C: 184.
- [159] UG116. Device Reliability Report (v10.8.2) [R]. San Jose: Xilinx, 2018.
- [160] Leray J L. Effects of atmospheric neutrons on devices, at sea level and in avionics embedded systems [J]. Microelectronics Reliability, 2007, 47: 1827-1835.
- [161] Markus P, Per-Erik T. Measurements and simulations of single-event upsets in a 28-nm FPGA [C]. Topical Workshop on Electronics for Particle Physics, Santa Cruz, California, USA, 2017.
- [162] Chen W, Yang H L, Guo X Q, et al. The research status and challenge of space radiation physics and application [J]. Chinese Science Bulletin, 2017, 62(10): 978-989.

- [163] Yamazaki T, Kato T, Uemura T, et al. Origin analysis of thermal neutron soft error rate at nanometer scale [J]. Journal of Vacuum Science & Technology B, 2015, 33(2): 020604.
- [164] Weulersse C, Houssany S, Guibbaud N, et al. Contribution of thermal neutrons to soft error rate [J]. IEEE Transactions on Nuclear Science, 2018, 65(8): 1851-1857.
- [165] Fang Y P, Oates A S. Thermal neutron-induced soft errors in advanced memory and logic devices [J]. IEEE Transactions on Device and Materials Reliability, 2014, 14(1): 583-586.
- [166] Tian Y S, Hu Z L, Tong J F, et al. Design of beam shaping assembly based on 3.5 MeV radiofrequency quadrupole proton accelerator for boron neutron capture therapy [J]. Acta Physica Sinica, 2018, 67: 142801.
- [167] Uwe S, Hwang C S, Funakubo H. Ferroelectricity in doped hafnium oxide: materials, properties and devices [M]. Burlington: Woodhead Publishing, 2019.
- [168] Czernohorsky M, Seidel K, Kühnel K, et al. High-K metal gate stacks with ultra-thin interfacial layers formed by low temperature microwave-based plasma oxidation [J]. Microelectronic Engineering, 2017, 178: 262-265.
- [169] Chiang Y, Tan C M, Chao T C, et al. Investigate the equivalence of neutrons and protons in single event effects testing: a Geant4 study [J]. Applied Sciences, 2020, 10: 3234.
- [170] Pablo F R V. Evaluation of the SEE sensitivity and methodology for error rate prediction of applications implemented in Multi-core and Many-core processors [D]. Grenoble: Grenoble Alpes University, 2017.
- [171] Tambara L A, Kastensmidt F L, Medina N H, et al. Heavy ions induced single event upsets testing of the 28 nm Xilinx Zynq-7000 all programmable SoC [C]. IEEE Nuclear and Space Radiation Effects Conference, 2015, Boston, MA, USA.
- [172] Ladbury R. Statistical properties of SEE rate calculation in the limits of large and small event counts [J]. IEEE Transactions on Nuclear Science, 2007, 54(6): 2113-2119.
- [173] Chen R, Han J W, Zheng H S, et al. Comparative research on "high currents" induced by single event latch-up and transient-induced latch-up [J]. Chinese Physics B, 2015, 24(4): 046103.
- [174] Joplin M. A method for characterization of single-event latchup technologies as a function of geometric variation [D]. Chattanooga: University of Tennessee at Chattanooga, 2018.
- [175] Kasap S, Wachter E W, Zhai X J, et al. Survey of soft error mitigation techniques applied to LEON3 soft processors on SRAM-Based FPGAs [J]. IEEE Access, 2020, 8: 28646-28658.
- [176] Goloubeva O, Rebaudengo M, Reorda M, et al. Software-implemented hardware fault tolerance [M]. Berlin: Springer, 2006.
- [177] Koren I, Krishna C. Fault-tolerant systems [M]. San Francisco: Elsevier, 2007.
- [178] Algirdas A. Fault-tolerant systems [J]. IEEE Transactions on Computers, 1976, C-25(12): 1304-1312.
- [179] Alfredo B, Stefano D C, Giorgio D N, et al. A watchdog processor to detect data and control flow errors [C]. 2003 9th IEEE On-Line Testing Symposium, Kos, Greece, 2003.
- [180] Jacob B. A review of watchdog architectures and their application to CubeSats [OL]. 2010-04-28. https://www.beningo.com/wp-content/uploads/images/Papers/WatchdogArchitectureReview.pdf
- [181] Ramakrishna P V. Approaches to multiprocessor error recovery using an on-chip interconnect subsystem [D]. Amherst: University of Massachusetts Amherst, 2010.
- [182] Adria B d O, Lucas A T, Fernanda L K. Applying lockstep in dual-core ARM cortex-A9 to mitigate radiation-induced soft errors [C]. IEEE 8<sup>th</sup> Latin American Symposium on Circuits & Systems (LASCAS), Bariloche, Argentina, 2017.
- [183] Jordan D A, Jennings C L, Michael J W. Neutron radiation beam results for the Xilinx ultrascale+ MPSoC [C]. 2018 IEEE Radiation Effects Data Workshop (REDW), Waikoloa, HI, USA, 2018.
- [184] Cadenas J O, Sherratt R S, Pablo H, et al. Parallel pipelined array architectures for real-time histogram computation in consumer devices [J]. IEEE Transactions on Consumer Electronics, 2011, 57(4): 1460-1464.

- [185] Archana H R, Vasundara P K S. An investigation towards effectiveness in image enhancement process in MPSoC [J]. International Journal of Electrical and Computer Engineering, 2018, 8(2): 963-970.
- [186] Yang W T, Du B Y, He C H, et al. Reliability assessment on 16 nm Ultrascale+ MPSoC using fault injection and fault tree analysis [J]. Microelectronics Reliability, 2021, 120: 114122.
- [187] Sohag K. An overview of fault tree analysis and its application in model based dependability analysis [J]. Expert Systems With Applications, 2017, 77: 114-135.
- [188] Kim S, Somani A K. Soft error sensitivity characterization for microprocessor dependability enhancement strategy [C]. IEEE International Conference on Dependable Systems and Networks, Washington, D.C., USA, 2002.
- [189] Nishmitha N K. Tutorial on partial reconfiguration of image processing blocks using Vivado and SDK [OL]. 2016-04-09 [2021-07-10]. http://ivpcl.unm.edu/ivpclpages/Research/drastic/PRWebPage/PRLabReport.pdf
- [190] Llamocca D, Pattichis M. Dynamic energy, performance, and accuracy optimization and management for separable 2-D filtering for digital video processing [J]. ACM Transactions on Reconfigurable Technology and Systems, 2015, 7(4): 1-30.
- [191] Yang W T, Li Y H, He C H. Fault injection and failure analysis on Xilinx 16 nm FinFET Ultrascale+ MPSoC [J]. Nuclear Engineering and Technology, 2022, in press, DOI: https://doi.org/10.1016/j.net.2021.12.022
- [192] Chen Y Y, Wang Y C, Peng J. SoC-level fault injection methodology in SystemC design platform [C]. Asia Simulation Conference-International Conference on System Simulation and Scientific Computing, Beijing, China, 2008.
- [193] Chen Y Y, Hsu C H, Leu K L. SoC-level risk assessment using FMEA approach in system design with SystemC [C]. IEEE International Symposium on Industrial Embedded Systems, Lausanne, Switzerland, 2009.
- [194] Antonio R N, Daniel G G, Juan P D M, et al. Efficient memory organization for DNN hardware accelerator implementation on PSoC [J]. Electronics, 2021, 10(94): 1-10.
- [195] Sawaguchi S, Nishi H. Slightly-slacked dropout for improving neural network learning on FPGA [J]. ICT Express, 2018, 4: 75-80.
- [196] Hao C, Zhang X, Li Y, et al. FPGA/DNN Co-Design: An efficient design methodology for IoT intelligence on the edge [C]. The 56<sup>th</sup> Annual Design Automation Conference, Las Vegas, USA, 2019.
- [197] WP521. Convolutional neural network with INT4 optimization on Xilinx devices (v1.0.1) [R]. San Jose: Xilinx, 2020.
- [198] Venieris S I, Kouris A, Bouganis C S. Toolflows for mapping convolutional neural networks on FPGAs: a survey and future directions [J]. ACM Computing Surveys, 2018, 51(3): 1-39.
- [199] Myrgård M R. Acceleration of deep convolutional neural networks on multiprocessor system-on-chip [D]. Uppsala: Uppsala University, 2019.
- [200] Nakahara H, Shimoda M, Sato S. A tri-state weight convolutional neural network for an FPGA: applied to YOLOv2 object detector [C]. 2018 International Conference on Field-Programmable Technology (FPT), Naha, Japan, 2018.
- [201] Libano F, Wilson B, Wirthlin M, et al. Understanding the impact of quantization, accuracy, and radiation on the reliability of convolutional neural networks on FPGAs [J]. IEEE Transactions on Nuclear Science, 2020, 67(7): 1478-1484.
- [202] Wang H B, Wang Y S, Xiao J H, et al. Impact of single-event upsets on convolutional neural networks in Xilinx Zynq FPGAs [J]. IEEE Transactions on Nuclear Science, 2021, 68 (4): 394-401.
- [203] Sabogal S, George A D, Crum G A. ReCoN: A reconfigurable CNN acceleration framework for hybrid semantic segmentation on hybrid SoCs for space applications [C]. 2019 IEEE Space Computing Conference (SCC), Pasadena, CA, USA, 2019.

- [204] Du B Y, Azimi S, Sio C D, et al. On the reliability of convolutional neural network implementation on SRAM-based FPGA [C]. 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT), Noordwijk, Netherlands, 2019.
- [205] Vipin K. ZyNet: automating deep neural network implementation on low-cost reconfigurable edge computing platforms [C]. 2019 International Conference on Field-Programmable Technology (ICFPT), Tianjin, China, 2019.
- [206] ZyNet git repository [OL]. 2020-06-10 [2021-05-10]. https://github.com/dsdnu/zynet
- [207] Lecun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition [J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
- [208] Vorgelegt V. Studying deep neural network (DNN) architectures for improving sensitivity in ttH(bb) single lepton channel At the CMS Experiment [D]. Hamburg: University of Hamburg, 2019.
- [209] Androsov K. Identification of tau lepton using deep learning techniques at CMS [C]. 27<sup>th</sup> International Symposium on Nuclear Electronics and Computing, Budva, Montenegro, 2019.

# Achievements

- [1] Weitao Yang, Xuecheng Du, Jinlong Guo, et al. Preliminary single event effect distribution investigation on 28 nm SoC using heavy ion microbeam [J]. Nuclear Instruments and Methods in Physics Research Section B: Beam Interactions with Materials and Atoms, 2019(450): 323-326. (SCI: 000474501400066)
- [2] Weitao Yang, Yonghong Li, Yang Li, et al. Atmospheric neutron single event effect test on Xilinx 28 nm system on chip at CSNS-BL09 [J]. Microelectronics Reliability, 2019(99): 119-124. (SCI: 000496833600013)
- [3] Weitao Yang, Qian Yin, Yang Li, et al. Single-event effects induced by medium-energy protons in 28 nm system-on-chip [J]. Nuclear Science and Techniques, 2019(30): 151. (SCI: 000488225200007)
- [4] Weitao Yang, Yonghong Li, Weidong Zhang, et al. Electron inducing soft errors in 28 nm system-on-chip [J]. Radiation Effects and Defects in Solids, 2020(175): 745-754. (SCI: 000532149100001)
- [5] Weitao Yang, Yonghong Li, Yaxin Guo, et al. Investigation of Single Event Effect in 28nm System-on-Chip with Multi Patterns [J]. Chinese Physics B, 2020(29): 108504. (SCI: 000575330300001)
- [6] Weitao Yang, Boyang Du, Chaohui He, et al. Reliability assessment on 16 nm ultrascale+ MPSoC using fault injection and fault tree analysis [J]. Microelectronics Reliability, 2021(120): 114122. (SCI: 000652343900008)
- [7] Weitao Yang, Xuecheng Du, Yonghong Li, et al. Single-event-effect propagation investigation on nanoscale system on chip by applying heavy-ion microbeam and event tree analysis [J]. Nuclear Science and Techniques, 2021(32): 106.
- [8] Weitao Yang, Yang Li, Yonghong Li, et al. Geant4 simulation of proton & neutron single event effect in 28nm system-on-chip [C]. 3<sup>rd</sup> International Conference on Radiation Effects of Electronic Devices, Chongqing, China, 2019.
- [9] Weitao Yang, Xuecheng Du, Yonghong Li, et al. Microbeam heavy ion and event tree analysis investigating single event effect propagation in system-on-chip [C]. 2020 IEEE Nuclear and Space Radiation Effects Conference, Santa Fe, USA, 2020.
- [10] Weitao Yang, Yonghong Li, Chaohui He. Fault injection on 16nm FinFET ultrascale+ MPSoC [C]. 4<sup>th</sup> International Conference on Radiation Effects of Electronic Device, Xi'an, China, 2021.
- [11] Zhiliang Hu, Weitao Yang, Yonghong Li, et al. Atmospheric neutron single event effect in 65 nm microcontroller units by using CSNS-BL09 [J]. ACTA PHYSICA SINICA, 2019(68): 238502. (SCI: 000501344000033)
- [12] Tingting Wang, Weitao Yang, Bo Li, et al. Radiation-Resistant CsPbBr3 Nanoplate-Based Lasers [J]. ACS Applied Nano Materials, 2020(3): 12017-12024. (SCI: 000603402500037)