# POLITECNICO DI TORINO Repository ISTITUZIONALE

On Assessing the Robustness of RISC-V Soft Cores for Space Systems by Mission-Tailored SEU Analysis

**Original** 

On Assessing the Robustness of RISC-V Soft Cores for Space Systems by Mission-Tailored SEU Analysis / Vacca, Eleonora; De Sio, Corrado; Azimi, Sarah; Sterpone, Luca. - ELECTRONIC. - (In press). (Paper presented at the 31st IEEE International Conference on Electronics Circuits and Systems, held in Nancy (FRA), 18-20 November 2024).

Availability: This version is available at: 11583/2992722 since: 2024-09-24T07:46:30Z

Publisher: IEEE

Published DOI:

Terms of use:

This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository

IEEE postprint/Author's Accepted Manuscript Publisher copyright

©9999 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collecting works, for resale or lists, or reuse of any copyrighted component of this work in other works.


# On Assessing the Robustness of RISC-V Soft Cores for Space Systems by Mission-Tailored SEU Analysis

Eleonora Vacca, Corrado De Sio, Sarah Azimi, Luca Sterpone Politecnico di Torino, Turin, Italy

*Abstract*: **We propose a methodology for evaluating the suitability of RISC-V for space missions, which combines radiation testing and mission environment characterization with SEU hardware emulation to estimate the system mean time to failure accurately.**

*Index Terms*: **FPGA, RISC-V, SEU**

## I. INTRODUCTION

Over the past ten years, RISC-V has attracted growing interest in the space domain. While SPARC-based systems are still widely used, transitioning to more modern technologies is becoming increasingly urgent [\[1\]](#page-4-0). The RISC-V ISA is one of the primary candidates for the succession: it is open and royalty-free, with active standardization. These characteristics foster a rapidly growing ecosystem, leading to the proliferation of both licensed and free IP solutions and attracting the interest of industry and academia. They also enable many appealing approaches to adapting RISC-V solutions for space applications, such as low-level architectural modifications for radiation mitigation and ISA extensions for high-performance purposes.

Interest in Commercial off-the-shelf (COTS) solutions for space applications is also rising. Since radiation-hardened devices introduce substantial limitations in terms of cost, availability, and technological advancement, relying on COTS devices is becoming a common strategy [\[2\]](#page-4-1)-[\[5\]](#page-4-2). In particular, the high customization and performance offered by Field Programmable Gate Arrays (FPGAs) facilitate the deployment of customized design solutions and mitigation approaches. As proven by their use in missions such as NASA's DART [\[3\]](#page-4-3) and ESA's HERA [5], FPGAs are currently adopted in spacecraft on-board computing systems, spanning from algorithm accelerators [\[1\]](#page-4-0) to main processing systems [\[2\]](#page-4-1)-[\[5\]](#page-4-2). The synergy between the RISC-V open standard and the high flexibility and customization of FPGA systems has produced different solutions for implementing soft-core processors in the FPGA logic, including reliability-oriented solutions [\[6\]](#page-4-4).

However, one of the main concerns in using COTS FPGAs is their sensitivity to Single Event Effects (SEEs). For this reason, radiation testing is required to evaluate the system's robustness. Many components must be tested within a limited schedule and a constrained budget, especially for space missions. As a result, radiation testing is often limited to investigating SEE cross-sections, such as those of Single Event Upsets (SEUs), and disruptive events like Single Event Functional Interrupt (SEFI) or Single Event Latchup (SEL) [\[7\]](#page-4-5), [\[8\]](#page-4-6). Due to the high complexity of modern systems, however, traditional SEU characterization is no longer sufficient for assessing their robustness.

The topology of the electronic circuit implemented in the configurable hardware of the FPGA is defined by the Configuration Data (CDATA) stored in the Configuration Memory (CRAM). Data corruption in CRAM due to radiation may compromise the system's integrity, affecting the overall mission success [\[7\]](#page-4-5). Yet the analysis methodology for such complex systems is still in its early stages and not standardized. The SEU cross-section of the device alone is insufficient for determining the system's robustness, since other elements must be considered, such as the design's sensitivity to SEUs, its area, and, in the case of soft processors, the software stack. Moreover, the efficiency of specific mitigation solutions, such as scrubbing or redundancy, can depend on the SEU rate expected for the particular mission. In a preliminary mission design phase, a comprehensive evaluation of a flight computer module architecture combined with the sensitivity of the target technology is crucial for developing effective mitigation strategies and avoiding excessive spacecraft shielding.

### *Main Contribution*

We propose an efficient methodology to estimate the mean time to failure of complex systems, such as RISC-V soft processors, related to different space mission scenarios by combining radiation testing for technology sensitivity assessment, radiation environment simulation, and hardware fault emulation campaigns. The methodology allows the evaluation of systems implemented in FPGA and mitigation strategies against SEUs tailored to specific mission analysis. It accounts for how technology and mission profile will affect the SEU rate and how radiation-induced CRAM modification propagates from the CRAM level to the system level, providing a realistic and result-oriented estimation of the system sensitivity.

The adaptability of the methodology is demonstrated by applying it to an open-source RISC-V soft processor. The processor has been hardened with Triple Modular Redundancy (TMR) and evaluated for space applications in Geostationary Earth Orbit (GEO) and Low Earth Orbit (LEO) environments. An AMD Artix-7 FPGA in 28 nm CMOS technology has been characterized by radiation testing at different energies. Using the OMERE [\[10\]](#page-4-7) radiation environment characterization tool, we evaluated the SEU rate affecting the technology in the GEO and LEO environments, taking the orbits of two currently operational satellites as a case study. The RISC-V system implemented on the target FPGA part has been evaluated to estimate its tolerance to CRAM SEUs while running different software benchmarks. The methodology comprehensively estimates the system's expected mean time to failure, considering technology characterization, radiation environment, and application, allowing for an estimation of robustness and mitigation efficiency.

## II. EXPERIMENTAL ANALYSIS

This section presents the proposed methodology and demonstrates its adaptability by applying it to a hardened version of an open-source architecture, evaluated for different space mission profiles.

### *A. Methodology Overview*

The proposed methodology is schematized in Figure 1. It enables the evaluation of different solutions for space missions. Radiation testing is conducted once to characterize the sensitivity of the target FPGA device. Subsequently, the resulting SEU cross-section can be used to assess various radiation environment scenarios with space radiation environment tools. In parallel, specific soft-processor architectures and mitigation approaches can be evaluated against SEUs. By combining the system error rate under SEUs with the SEU rate estimated for the mission, we can predict the mean time to failure (MTTF) for the target mission profiles. The methodology enables easy comparison of various technologies and solutions. Given SEU cross-section data, which can be sourced from the literature or measured on the actual device, it is possible to determine whether a solution meets the mission constraints and to select the optimal solution for the target space mission.



Figure 1. Overview of the proposed methodology.

### *B. Platform under Evaluation and Device Characterization*

The RISC-V ISA has given rise to a vast proliferation of commercial and open-source IP cores. As a case study, we evaluate a hardened version of the open-source NEORV32 processor [\[9\]](#page-4-8). The NEORV32 processor is a tiny, highly customizable, microcontroller-like RISC-V solution. We enhanced the system with global TMR to obtain a version of NEORV32 suitable for space applications. We included floating-point and multiplication-division units to support the Zfinx and M ISA extensions.

To test the functionality and SEU sensitivity of the processor system, we selected a set of general-purpose software benchmarks: *Whetstone*, *Dhrystone*, *Linpack*, and *Matmul*. The *Whetstone* and *Dhrystone* benchmarks are performance benchmarks widely used for systems evaluation, including the suitability of the LEON3 processor for deep space missions in

As a COTS device, we selected a Zynq XC7Z020 SoC. The SoC includes a programmable AMD Artix-7 FPGA in 28 nm high-k metal gate (HKMG) CMOS technology, where the NEORV32 system has been implemented. The power consumption and resource utilization of the system are reported in Table 1.





To evaluate the SEU rate expected for the COTS device in a specific mission profile, we must characterize the SEU cross-section of the device's memories at different proton and heavy-ion energies. Due to the number of sensitive elements, their criticality, and their radiation sensitivity, SEUs in CRAM are the primary source of failure for such devices. Proton cross-sections for the CRAM have been obtained from a proton radiation test that we performed to characterize the target FPGA technology. The radiation test was carried out at the PSI facility using proton beams with energies ranging from 16 MeV up to 200 MeV [\[7\]](#page-4-5). Cross-section data for CRAM against protons is reported in Figure 2.



Figure 2. Cross-section of the 28 nm Zynq CRAM against protons.

The heavy-ion cross-section has been retrieved from the heavy-ion radiation test results published in [\[11\]](#page-4-10), obtained at CERN on the same technology. The authors considered heavy ions with LET values of 2.9, 8.8, and 12.45.

### *C. Radiation Environments Characterization*

The space mission feasibility assessment for the target RISC-V processor requires considering the radiation environment where it is expected to operate. As case studies, we selected two benchmark satellite orbits to cover the characteristics of the GEO and LEO environments. To focus on real-case scenarios and prove the generality of the approach, we retrieved the orbit parameters of two currently operational satellites, presented in Table 2.





EUTELSAT 10B is a telecommunication satellite operating in GEO; it was launched on November 23, 2022, and entered service in July 2023. SENTINEL-3B, part of the Copernicus program, is dedicated to studying Earth's oceans and vegetation; it was launched on April 25, 2018, and operates in LEO. The orbit parameters have been used to configure the OMERE radiation environment tool [\[10\]](#page-4-7) for a target mission duration of 1 year. For each orbit, two distinct analyses are carried out to evaluate how the radiation sources are affected by solar activity and, consequently, how the TMR processor will react to environmental changes. By simulating two launch dates, one in 2020 and one in 2024, we reproduce the radiation sources during the solar minimum and the solar maximum of the current solar cycle, respectively.

The trapped particle fluxes are derived from the AP8MIN and AP8MAX models, while the GCR ISO 15390 model is selected to evaluate the galactic cosmic ray (GCR) contribution. Finally, the solar particle contribution is estimated with the ESP model, which is compliant with the ECSS 10-04 standard. Once all the particle fluxes in the target environment are obtained, the SEE rate can be calculated. This step is strongly technology-dependent. Hence, we provided the tool with both the proton and heavy-ion SEU cross-sections per bit to achieve a consistent estimation. To reflect a real-world use case, we assessed aluminum shielding thicknesses ranging from 1.5 to 5 mm. The resulting SEU rate per bit per day for the different orbits, solar activity conditions, and shielding thicknesses is reported in Figure 3.

Given the same technology, the SEU sensitivity strongly depends on the environmental conditions. Results show that the worst-case working condition could be the LEO environment during solar minimum, as the contribution of trapped protons increases. Both GEO conditions, on the other hand, behave better than LEO: although the GEO orbit is almost fully exposed to GCRs, the trapped-proton contribution is missing, leading to a lower SEU rate per day. The estimated SEU rates serve as inputs to link the application error rate of the RISC-V processor with the mission's environmental conditions when computing the mean time to failure.
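For protons, the rate calculation performed by an environment tool can be pictured as folding a differential flux spectrum with an energy-dependent cross-section. The following Python sketch illustrates that principle only and is not OMERE's actual algorithm. The function name `seu_rate_per_bit_day` is hypothetical, and every numerical input is a made-up placeholder rather than data measured or simulated in this work.

```python
def seu_rate_per_bit_day(energies_mev, flux_per_cm2_day_mev, sigma_cm2_per_bit):
    """Integrate sigma(E) * dPhi/dE over energy with the trapezoidal rule.

    energies_mev          -- increasing energy grid [MeV]
    flux_per_cm2_day_mev  -- differential flux at each energy [1/(cm^2 day MeV)]
    sigma_cm2_per_bit     -- SEU cross-section at each energy [cm^2/bit]
    """
    n = len(energies_mev)
    if not (n == len(flux_per_cm2_day_mev) == len(sigma_cm2_per_bit)):
        raise ValueError("all input lists must have the same length")
    rate = 0.0
    for i in range(n - 1):
        width = energies_mev[i + 1] - energies_mev[i]
        left = flux_per_cm2_day_mev[i] * sigma_cm2_per_bit[i]
        right = flux_per_cm2_day_mev[i + 1] * sigma_cm2_per_bit[i + 1]
        rate += 0.5 * (left + right) * width
    return rate


# Hypothetical placeholder inputs, for illustration only.
energies = [16.0, 50.0, 100.0, 200.0]          # MeV
flux = [1.0e4, 5.0e3, 2.0e3, 5.0e2]            # protons / (cm^2 day MeV)
sigma = [5.0e-15, 8.0e-15, 1.0e-14, 1.0e-14]   # cm^2 / bit

rate = seu_rate_per_bit_day(energies, flux, sigma)
print(f"{rate:.3e} SEU/bit/day")
```

A heavy-ion contribution would be folded in the same way, with the LET spectrum taking the place of the energy spectrum, and the two contributions summed.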



Figure 3. Estimated SEU rate in XC7Z020 for different environmental conditions and Al shielding thickness.

### *D. Application Error Rate Prediction*

For FPGA-based designs, characterizing the device without considering the characteristics of the implemented circuit can result in an inaccurate evaluation of the system's sensitivity. Although corruption of the CRAM content can modify the implemented circuit and eventually lead to system failure, not all CRAM bits are equally sensitive, since a given design uses only part of the resources available on the FPGA. Additionally, mitigation approaches such as TMR can significantly reduce the impact of SEUs affecting logic and CRAM on the system's robustness. Relying only on the SEU cross-section of the device can therefore produce an inaccurate estimation; the application and the mitigation approaches must also be considered to estimate the system's actual sensitivity to SEUs. Due to the programmable nature of FPGAs, such analysis is crucial to link the system's reliability with the mission profile, since the SEU sensitivity can vary significantly depending on the configuration and the solution adopted.

We propose an application error rate prediction based on SEU emulation in the configuration memory. SEUs in CRAM are known to be the dominant mechanism of soft errors in SRAM-based FPGAs. One of the advantages of the proposed solution is that it evaluates the actual system without resorting to a simulation or a model of the device, which can lead to inaccurate estimation, especially since the device architecture is usually vendor-confidential. Additionally, hardware emulation is much faster than simulation and easily parallelizable. We used AMD's essential bits and SEM-IP features to emulate SEUs in CRAM. SEM-IP is a core that can be implemented in the FPGA logic to access and manipulate the CRAM content. Essential bits are the subset of the CRAM that may induce a failure in the system if corrupted. They allow the analysis to focus only on the sensitive portion of the memory. The SEU emulation campaign accumulated SEUs in the essential bits until the system generated a faulty output or halted. After each emulated SEU, the system was evaluated by running the software application. The failure rate was evaluated from 1,000 experiments, with a total number of accumulated emulated SEUs ranging from 20,000 to 30,000 depending on the application reliability. The exponential reliability functions of the different applications are reported in Figure 4, while the classification of the failures into System Halt or Silent Data Corruption (SDC) is summarized in Table 3.
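The accumulation procedure and the exponential fit can be sketched in Python as follows. This is a software stand-in for the hardware campaign, not the authors' tooling: the per-SEU failure probability `p_fail_per_seu` is a hypothetical placeholder, the function names are illustrative, and the estimator assumes that every experiment ends with a failure and that reliability follows R(n) = exp(-λn), with n the number of accumulated SEUs.

```python
import math
import random


def run_campaign(p_fail_per_seu, n_experiments, rng):
    """Return, for each experiment, the number of SEUs accumulated until failure."""
    if not 0.0 < p_fail_per_seu <= 1.0:
        raise ValueError("p_fail_per_seu must be in (0, 1]")
    counts = []
    for _ in range(n_experiments):
        injected = 0
        while True:
            injected += 1                       # emulate one more SEU in the essential bits
            if rng.random() < p_fail_per_seu:   # benchmark output wrong or system halted
                break
        counts.append(injected)
    return counts


def estimate_lambda(counts):
    """Failures per injected SEU (one failure per experiment is assumed)."""
    return len(counts) / sum(counts)


def reliability(lam, n_seus):
    """Exponential reliability: probability of surviving n accumulated SEUs."""
    return math.exp(-lam * n_seus)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = run_campaign(p_fail_per_seu=0.04, n_experiments=1000, rng=rng)
lam = estimate_lambda(counts)
print(f"total SEUs: {sum(counts)}, lambda: {lam:.4f} per SEU, R(10): {reliability(lam, 10):.3f}")
```

Under the same assumptions, 1,000 experiments totalling 20,000 to 30,000 accumulated SEUs would correspond to roughly 0.03 to 0.05 failures per SEU.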



Figure 4. Application Reliability estimation from fault emulation

Table 3. System faulty behavior classification (fraction of observed failures).

| Fault Type | Whetstone | Linpack | Dhrystone | Matmul |
|------------|-----------|---------|-----------|--------|
| Halt       | 0.96      | 0.96    | 0.99      | 0.92   |
| SDC        | 0.04      | 0.04    | 0.01      | 0.08   |

### *E. MTTF Estimation*

Up to this point, the evaluations of the device's SEU sensitivity and of the processor system have remained disjoint. The last step of the methodology evaluates the expected robustness of the system within the framework of the specific space mission. To compute the Mean Time to Failure (MTTF) for the various mission scenarios, we need to relate the sensitivity of the technology to the application system. The application error rate prediction focused the analysis on the essential bits of the design. The system under test has 1,816,228 essential bits, equal to about 5% of the total CDATA. Notably, SEUs occurring in other CRAM bits are unlikely to impact the processor's functionality. We derive the expected number of SEUs per day affecting the essential bits of the TMR NEORV32 by multiplying the per-bit SEU rate by the number of essential bits. The results are reported in Table 4.
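As a worked check of this step, the relation can be written as below. The symbols $N_{\mathrm{SEU}}$ (SEUs per day in the essential bits), $\lambda_{\mathrm{bit}}$ (SEU rate per bit per day), and $N_{\mathrm{ess}}$ (number of essential bits) are introduced here for illustration only. The per-bit rate shown is back-computed from the LEO, solar-minimum, 1.5 mm entry of Table 4 rather than read from Figure 3, so it should be taken as an implied value.

$$
N_{\mathrm{SEU}} = \lambda_{\mathrm{bit}} \cdot N_{\mathrm{ess}},
\qquad
\lambda_{\mathrm{bit}} = \frac{5.25 \times 10^{-2}\ \text{SEU/day}}{1{,}816{,}228\ \text{bits}} \approx 2.89 \times 10^{-8}\ \text{SEU/bit/day}
$$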

Table 4. Essential-bit SEUs per day per design.

| Al thickness [mm] | LEO, Sun Min | LEO, Sun Max | GEO, Sun Min | GEO, Sun Max |
|-------------------|--------------|--------------|--------------|--------------|
| 1.5               | 5.25E-02     | 3.79E-02     | 2.07E-02     | 3.71E-02     |
| 2.0               | 5.13E-02     | 3.62E-02     | 2.02E-02     | 3.30E-02     |
| 2.5               | 5.03E-02     | 3.50E-02     | 1.97E-02     | 2.98E-02     |
| 3                 | 4.93E-02     | 3.39E-02     | 1.93E-02     | 2.75E-02     |
| 4                 | 4.75E-02     | 3.21E-02     | 1.84E-02     | 2.40E-02     |
| 5                 | 4.54E-02     | 3.02E-02     | 1.77E-02     | 2.16E-02     |

The MTTF is obtained by combining the application error rate with the expected number of essential-bit SEUs per day per design. Note that the expected SEU rate in memory is the same for all the applications, since both the target device and the soft processor implementation are the same, whereas the application error rate varies among applications. For analysis purposes, we divided the set of benchmarks into two groups based on task behavior: *Floating-Point* (FP) based and *Memory-intensive* (MEM). The FP group approximates the behavior of *Whetstone* and *Linpack*, while the MEM group approximates the behavior of *Dhrystone* and *Matmul*. Since the application error rates resulting from the evaluation step are similar within each group, the average error rate of the two applications in the group has been used. The MTTF for different combinations of shielding thickness, LEO and GEO environments, and solar activity is reported in Figure 5.
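The combination step can be sketched in Python as below. It is a minimal illustration that assumes failures follow the exponential model, so that the MTTF is the reciprocal of the failure rate. The function name `mttf_years` is hypothetical, and `FAILURES_PER_SEU` is a placeholder value, not a result of this work; the SEU rates are taken from the 1.5 mm row of Table 4.

```python
DAYS_PER_YEAR = 365.25


def mttf_years(seus_per_day, failures_per_seu):
    """MTTF in years = 1 / (essential-bit SEUs per day * failures per SEU)."""
    if seus_per_day <= 0 or failures_per_seu <= 0:
        raise ValueError("rates must be positive")
    failures_per_day = seus_per_day * failures_per_seu
    return 1.0 / failures_per_day / DAYS_PER_YEAR


# Essential-bit SEUs per day for 1.5 mm of Al shielding (Table 4).
seus_per_day = {
    "LEO, solar min": 5.25e-2,
    "LEO, solar max": 3.79e-2,
    "GEO, solar min": 2.07e-2,
    "GEO, solar max": 3.71e-2,
}

FAILURES_PER_SEU = 0.05  # hypothetical application error rate, placeholder only

for scenario, rate in seus_per_day.items():
    print(f"{scenario}: {mttf_years(rate, FAILURES_PER_SEU):.2f} years")
```

Because the MTTF scales inversely with both factors, halving either the environmental SEU rate or the application error rate doubles the estimate, which is why the same design can rank very differently across orbits and workloads.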



Figure 5. MTTF due to different combinations of shielding thickness, LEO and GEO environments, and solar activity.

The worst-case scenario is the execution of FP applications in LEO during minimum solar activity, where increasing the shielding thickness does not effectively improve the processor's MTTF, which remains almost stable at 1.0 years. In general, GEO appears to benefit from shielding more significantly than LEO. The estimated MTTFs suggest that the robustness of the NEORV32 processor, despite the TMR hardening, suffers when employed in LEO, while the system survives up to 5 years in GEO when executing MEM-intensive workloads. However, our evaluation does not consider the natural shielding provided by the spacecraft and its geometry, which also influences the device exposure. Finally, additional mitigation techniques, such as external scrubbing [\[12\]](#page-4-11), could increase the system's robustness, although at a cost.

## III. CONCLUSION

The paper introduces a methodology for the reliability assessment of FPGA-based systems in both the early and advanced stages of space mission planning, considering technology, mission profile, and application. The methodology has been applied to a hardened version of the NEORV32 RISC-V processor. Although the architecture and technology are the same, our analysis highlighted that the choice of a LEO or GEO mission can influence the MTTF by up to a factor of 5. These findings highlight the flexibility of the methodology in assessing the suitability of complex systems for specific space missions, especially during early development phases, with reduced radiation testing.

## <span id="page-4-3"></span>**REFERENCES**

- <span id="page-4-0"></span>[1] G. Furano et al., "A European Roadmap to Leverage RISC-V in Space Applications," 2022 IEEE Aerospace Conference (AERO), Big Sky, MT, USA, 2022, pp. 1-7, doi: 10.1109/AERO53065.2022.9843361.
- <span id="page-4-1"></span>[2] C. Guerra et al., "Geostationary, Multi-Carrier Mesh Network Enabled by On Orbit Reprogrammable Commercial FPGAs," 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 2023, pp. 1-10.
- <span id="page-4-9"></span>[3] D. L. Bekker et al., "Performance Analysis of Standalone and In-FPGA LEON3 Processors for Use in Deep Space Missions," 2019 IEEE Aerospace Conference, Big Sky, MT, USA, 2019, pp. 1-17, doi: 10.1109/AERO.2019.8742194.
- [4] S. Zhan et al., "The Design and Verification of the DART Single Board Computer FPGA," 2021 IEEE Aerospace Conference (50100), Big Sky, MT, USA, 2021, pp. 1-11, doi: 10.1109/AERO50100.2021.9438523.
- <span id="page-4-2"></span>[5] D. M. Marcos and A. Valverde Carretero, "DHS architecture for HERA Deep Space Mission," 2023 European Data Handling & Data Processing Conference (EDHPC), Juan Les Pins, France, 2023, pp. 1-6, doi: 10.23919/EDHPC59100.2023.10396073.
- <span id="page-4-4"></span>[6] M. Barbirotta et al., "Evaluation of Dynamic Triple Modular Redundancy in an Interleaved-Multi-Threading RISC-V Core," J. Low Power Electron. Appl., 2023, 13, 2.
- <span id="page-4-5"></span>[7] E. Vacca et al., "Failure rate analysis of radiation tolerant design techniques on SRAM-based FPGAs," Microelectronics Reliability, vol. 138, 2022, 114778, ISSN 0026-2714, doi: 10.1016/j.microrel.2022.114778.
- <span id="page-4-6"></span>[8] D. Rizzieri et al., "Programmable SEL Test Monitoring System for Radiation Hardness Assurance," 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks Supplemental Volume (DSN-S), Porto, Portugal, 2023, pp. 217-223.
- <span id="page-4-8"></span>[9] S. Nolting, "The NEORV32 RISC-V Processor," doi: 10.5281/zenodo.8260609.
- <span id="page-4-7"></span>[10] N. Sukhaseum et al., "Statistical estimation of uncertainty for single event effect rate in OMERE," 12th European Conference on Radiation and Its Effects on Components and Systems, 2011, pp. 401-407.
- <span id="page-4-10"></span>[11] V. Vlagkoulis et al., "Single Event Effects Characterization of the Programmable Logic of Xilinx Zynq-7000 FPGA Using Very/Ultra High-Energy Heavy Ions," in IEEE Transactions on Nuclear Science, vol. 68, no. 1, pp. 36-45, Jan. 2021, doi: 10.1109/TNS.2020.3033188.
- <span id="page-4-11"></span>[12] M. Berg et al., "Effectiveness of Internal Versus External SEU Scrubbing Mitigation Strategies in a Xilinx FPGA: Design, Test, and Analysis," in IEEE Transactions on Nuclear Science, vol. 55, no. 4, pp. 2259-2266, Aug. 2008, doi: 10.1109/TNS.2008.2001422.