Development Flow for On-Line Core Self-Test of Automotive Microcontrollers

Paolo Bernardi, Member, IEEE, Riccardo Cantoro, Student Member, IEEE, Sergio De Luca, Ernesto Sánchez, Senior Member, IEEE, Alessandro Sansonetti

Abstract—Software-Based Self-Test (SBST) is an effective methodology for devising the on-line testing of Systems-on-Chip (SoCs). In the automotive field, a set of SBST programs to be run during mission mode is also called Core Self-Test (CST) library. This paper introduces many new contributions: (1) it illustrates the several issues that need to be taken into account when generating test programs for on-line execution; (2) it proposed an overall development flow based on ordered generation of test programs that is minimizing the computational efforts; (3) it is providing guidelines for allowing the coexistence of the CST library with the mission application while guaranteeing execution robustness. The proposed methodology has been experimented on a large industrial case study. The coverage level reached along 1 year of team work is over 87% of Stuck-At fault coverage and execution time is compliant with the ISO26262 specification. Experimental results show that alternative approaches may request excessive evaluation time thus making the generation flow unfeasible for large designs.

Index Terms—Microprocessors and microcomputers, Reliability and Testing, Software-Based Self-Test

1 INTRODUCTION

The diffusion of electronic systems in the automotive field is increasing at a fast pace, and car makers constantly demand from electronic manufacturers for faster, less expensive, less power-consuming and more reliable devices. Microprocessor-based systems are employed in cars for a great variety of applications, ranging from information to engine and vehicle dynamics control, including safety-related systems such as airbag and braking control.

The use of such devices in safety- and mission-critical applications raises the need for total dependability. This requirement translates in a series of system audit processes that need to be applied throughout the product lifecycle. Some of these processes are common in today’s industrial design and manufacturing flows, and include risk analysis, design verification and validation, performed since the early phases of product development, as well as various test operations during and at the end of manufacturing and assembly steps. Increasingly often, additional test operations need to be applied also during the product mission life, and may include periodic on-line testing and/or concurrent error detection. The reliability requirements need to be met by trading off fault/ error coverage capabilities with admissible implementation costs of the selected solutions.

Within the scope of microprocessor-based integrated systems, the Software-Based Self-Test (SBST) approach has been addressed for a long time by different teams in the research community [1][2][3]. SBST techniques basically consists in letting the CPU running a sequence of code words dedicated to excite and propagate to error the largest set of faults possibly affecting the circuit [4]. Compared to hardware-based test solutions, such as Built-In Self-Test (BIST), it presents many advantages, including the possibility of autonomously testing [5] and diagnosing [6] both the microprocessor and the controllable peripherals in normal mode of operation, without introducing any hardware modifications, and at-speed test application (i.e., at the circuit nominal frequency). Nonetheless, SBST methodologies raise some issues that have been limiting their application in industry throughout the years: those issues regard writing efficient and effective test programs and devising suitable methodologies for test application.

While in the manufacturing test arena BIST solution are often preferred because achieving high coverage in a short time and with a simple development flow, regarding on-line test application, SBST is standing up as the preferable solution for periodically monitor the system health without inferring the normal mission behavior [7][8]. A recognized solution adopted by the industry relies in periodic test application of a Core Self-Test (CST) library composed of SBST test programs. As depicted in figure 1, the microprocessor is periodically forced to execute a self-test code able to detect the possible occurrence of permanent faults in the processor core itself and the peripherals connected to it. Such procedures are specifically tailored to activate possible faults and then compress and store the self-test results in an available memory space or raising a signal when the test has not ended correctly.

As far as test program generation is concerned, many approaches can be found in the literature, employing manual or automated approaches, which are suited to target different processor architectures and fault models as described in [4]. However, setting up an efficient CST development
flow implicates a number of additional issues to be solved regarding test program generation, organization and grading [9]. Coping with such on-line requirements means introducing additional rules to be respected along test program generation and additional code parts need to be added, that may impact on the CST development time.

The requirement for a robust on-line execution of self-test programs to be run on-line;

1. The constraints to be taken into account when generating self-test programs to be run on-line;

2. The requirement for a robust on-line execution of self-test programs in coexistence with an embedded operating system;

3. An effective development flow organization aiming at minimizing the computational efforts.

The technical content of the paper deals with the most relevant aspects of on-line test programs characteristics and development flow. The paper progresses the state-of-the-art by describing and discussing:

1. Identifying possible memory constraints from the point of view of the OS restrictions, and indispensable locations to be reserved for test purposes.

2. Software Self-Test Generation Constraints

Software-Based Self-Test is widely perceived as proper method for an accurate and non-invasive autonomous test. In a few words, a test program is made running and signaling misbehavior by simply exercising the processor functionalities. This process intrinsically respects power constraints since the test programs make the processor to work under the same conditions available in the mission mode; they do not ask for additional test circuits, and are quite cheap in terms of features and commodities required to the test equipment.

When dealing with Core Self-Test, which has to be applied on-line, the test programs have to share processor resources with the mission application, i.e., the Operating System (OS) who is managing mission tasks; this coexistence introduces very strong limitations compared to manufacturing tests through SBST:

1. Cores Self-Test programs need to be compliant with a standard interface, enabling the Operating System to handle them as normal processes. This interface must guarantee processor status preserving and restoration, even in case of higher priority requests (e.g., preemption);

2. The CST programs need to be generated following execution time constraints, due to the resources occupation that can be afforded by the mission environment. In particular, this is strictly required when a test cannot be interrupted because using critical resources (i.e., special purpose registers);

3. There is a strong limitation in terms of memory resources usage, due to the mission code and data characterizing the OS. To face front this issue, it is recommended to

- Provide the CST as a set of precompiled programs stored in binary images to be run along mission mode, possibly scheduled and loaded by the operating system;

- Not to refer to any absolute addresses when branching, meaning that the test code can be stored anywhere in the memory for being eventually copied and launched from other locations without any functional or coverage drawback;

- Not to refer to absolute addresses when accessing to the data memory;

- Identify possible memory constraints from the point of view of the OS restrictions, and indispensable locations to be reserved for test purposes.

The rest of the paper is organized as following: Section 2 describes the on-line constraints that are encountered while generating test programs. Section 3 details the characteristics a test program should own for guaranteeing robustness and full compliance with the OS. Section 4 is illustrating an effective development flow suitable for large microcontrollers. Section 5 is showing experimental results and section 6 is drawing conclusions.
3 CORE SELF-TEST EXECUTION MANAGEMENT

As briefly described in the introductory section, the inclusion of SBST routines in the mission environment is a critical issue. To face front the problematic aspects of this integrations, we propose to consider three major points beyond generation, which are related to test program execution:

- Cooperation with other software modules, usually related to the mission environment such as the OS
- Context switching and result monitoring
- Robustness in case of faulty behavior, which is strictly related to interruption management.

3.1 Test encapsulation

Considering the cooperation with other software modules, such as the threads launched by the OS, the test program suite needs to be constructed by including key features enabling the test to be launched, monitored and eventually interrupted by higher priority processes of the mission management system.

Figure 2 is graphically depicting how the test program is structured and which memory and peripheral resources need to be configured for test purposes. The test programs are normally stored and executed in the Flash memory.

First of all, in order to be compliant with the mission software environment, a viable and strongly suggested solution is the adoption of the Embedded-Application Binary Interface (EABI) [11], which specifies standard conventions for file formats, data types, register usage, stack frame organization, and function parameter passing of a software program. Thus, every test program includes an EABI prologue and epilogue, in charge of saving and restoring the mission status.

Having the EABI frame created by the test code at its beginning, any scheduler can launch the test execution, e.g., the scheduler available in the OS hosting the test routine. Moreover, we propose the inclusion of extra information needed to setup a proper running environment by a specific test scheduler.

Additional test information encompasses:

- Stack frame size
- Special purpose register setup,
- Memory protection setup and
- Test duration.

These metadata are used by the test program for the setup

1) Duration time (i.e., watchdog setup)
2) Stack frame size (i.e., space available for mission configurations to be saved and local variables of the test program)
3) Processor setup (i.e., special purpose register ad-hoc values)
4) Memory configuration (i.e., virtual memory initialization)
5) Memory protection (i.e., to manage wrong memory accesses through exceptions)

and at the test program execution end

6) Signature check.

Such a memory structure can be also stored in the mass memory until it is loaded to be run from any portion of the available memory, according to the features already described in section 2 (i.e., relocation).
3.2 Context switching to test procedure

Test programs structured as described section 3.1 are prone to be integrated in any Operating System as normal system tasks. Proper context switching is mainly afforded by the EABI interface; additional setup may vary according to the characteristics of test program and it is managed by the test programs exploiting metadata.

We identify three general cases, each one demanding for proper metadata to be used in setup procedures:

- **Run-time tests**: usually devised to cover computational modules such as arithmetic modules, can be interrupted by mission requests.
- **Non-exceptive tests**: require the manipulation of SPR register for testing sakes, such as for testing the Register File
- **Critical tests**: are intentionally raising interrupts and make use of peripheral cores for testing sakes.

Run-time tests are the easiest to manage: they only require creating a stack frame according to EABI compliancy; stack frame size is minimal. It is suggested to execute this kind of tests with low privileges, i.e., user mode, because they will never request interruptions or privileged instruction execution in the good (not faulty) scenario. No other special setup is required. EABI compliancy can be satisfied during the overall execution, meaning that another OS thread can preempt the test execution.

Non-exceptive tests are less easy to manage because they use resources that are not allowed to be directly used in the EABI context, e.g., special purpose registers. For this category, additional setup steps have to be executed before running the test

- To disable the external interrupts in order to avoid preemption
- To save all special and general purpose registers in a larger stack frame memory area and
- To modify their content according to the processor setup information.

As well, some closing operations are needed at the conclusion of the test execution to restore the initial configuration. Along the execution of these tests, no preemption is allowed because the compliancy with the EABI standard cannot be guaranteed.

When considering Critical tests, more restrictive requests have to be accomplished. Other than saving-restore all registers and disable external interrupt sources, more information need to be saved, such as

- The Interrupt Vector Table (IVT) and the related registers in case an alternative IVT is required for testing purposes
- The current status and control registers of the used peripheral modules, such as the interrupt controller configuration and the MMU.

3.3 Interruption management and robustness

Interrupt mechanisms, which are managing synchronous and asynchronous exceptions, need to be handled with extreme care, because they are not only intentionally raised for testing purposes. There are three types of exception in our view:

- Intentionally provoked exceptions, i.e., to test processor exceptions
- Unexpected, induced by an erroneous execution that is provoked by a faulty configurations
- Mission mode interruptions.
Intentionally provoked interruptions can be synchronous or asynchronous. Situations like system calls, illegal memory access, illegal instructions and privilege related operations are synchronous, since they have to be forced by the code itself. Contrarily, the asynchronous category is raised by means of peripheral cores.

To test exceptions it is therefore necessary both to induce exceptions to rise and to manage them. The mechanism is graphically shown in figure 3. If the circuitries managing the interrupt have not been corrupted by a fault, each single forced exception is correctly managed, meaning that a test specific Interrupt Service Routine (ISR) is accessed. Such an ISR is configured along the scheduling execution and it is replacing the mission one.

The code included in the ISR is also responsible for accumulating significant contents into the signature, e.g., the status registers. In presence of a fault, this standard execution flow may be diverted in such a manner that an exception was intentionally scheduled but it is not raised. In this case, the signature update is not performed and the test, at the end, is not producing the right signature value.

Furthermore, the exception management is also crucial for facing flow deviations due to any kind of fault leading to an unexpected processor internal status and bringing to unexpected synchronous interruptions. Typical cases are legal to illegal instruction format, illegal memory access protected by memory protection unit mechanism. If this situation is occurring during the execution of any test program, the test ISR should ideally be able to recover such a deviation and to record the wrong behavior observed. Some counter measurement can be adopted to identify unexpected interrupt requests, such as performing an assertion in the ISR prologue to check a password stored into a GPR before the interrupt is intentionally raised. A similar method is implemented for checking the correct return from interrupt, e.g., by completing the test execution with an assertion.

This technique is making the test code quite robust, but more work is needed if the processor status become unstable, resulting in spurious and repeated exceptions as well as infinite loops. In the latter case, an external mechanism have to be implemented in order to move the system into a safe status, i.e., by watchdog timer.

These cases are shown in figure 5, where solid lines are showing expected interrupts while dashed are showing the effect in case the processor status is unstable. Along runtime test programs, mission interrupts need to be identified and served as soon as possible, i.e., passed to the OS. By following the EABI standard, it permits to easily manage this case.

### 4 Core Self-Test Development Flow

The major cost and issue in the development flow of a Core Self-Test is constituted by the computational effort required to proceed in a quick generation of the test program suite. In particular, the fault grading process [9], which has to be performed to evaluate the goodness of a test program in fault detection, represents a severe bottleneck. This cost is weigh down by test program infrastructure described in the previous paragraphs and required by the on-line execution.

For instance, just to give the reader an idea about this cost, for a medium sized embedded processor with about 200k stuck-at faults, the required time for fault simulating a 1ms program may ask up to 3 days by using a 2GHz quad-core workstation running 4 fault simulation processes in parallel.

This cost becomes unsustainable if the generation process is iterative [15] and produces many programs to be graded before achieving a good coverage.

Therefore, we propose a methodology for achieving a development time reduction and resources optimization based on the following principles:

1. The embedded processor cannot be tackled as a unique module, but it is better to consider its sub-modules separately (e.g., ALU, CTRL Unit, etc.) meaning that the processor fault universe is selectively divided into several smaller fault lists for being effectively attacked while generating the test programs;

2. By facing modules separately, it facilitates parallelization of the development process whether many workstations/ test-engineers are available (see more details in 4.1);

3. By developing a test for a specific sub-module, it is likely to have a side-effect that is the coverage of faults belonging to other sub-modules;

The strategy we propose is based on these principles and it consists on the iterated execution of two steps until all sub-modules are considered:

- To generate (possibly in parallel) the test programs for a set of carefully selected sub-modules (see more details in 4.2) until these are sufficiently covered
- To perform synchronization among the different test programs to evaluate their effectiveness over a larger fault list; to synchronize means to grade test programs generated for a specific sub-module over a different (larger) sub-module list.

A simplified illustration of the proposed flow is shown in figure 4, which considers a microprocessor (CPU) composed of four modules (s1 to s4). In the first step shown by figure 2,B, two modules (s3 and s4) are considered in parallel and graded separately, i.e., during the generation process for module s3 only its own faults are considered. As soon as the coverage of these sub-modules is satisfactory, the generated test programs are graded over the other parts of the CPU as shown in figure 2.C; this synchronization step brings to observe a positive side-effect on the coverage of modules s1 and s2, as well as on s3 when grading the test program for s4, and vice versa. A new generation step is then started on s1 and s2, as depicted in figure 2.D; it is worth to mention that the previous steps were beneficial because the starting fault lists of s1 and s2 have been lightened before facing their generation process.
The advantage of using this approach is mainly resulting in a faster development of the test suite because
- The fault lists to be considered are significantly smaller than the complete fault list, resulting in a faster fault grading that usually takes in the order of minutes to complete;
- After each synchronization step, the number of active faults in new sub-modules is reduced, again leading to a speed-up of the fault simulation process.

4.1 Resources partitioning
As briefly stated in the previous paragraph, the processor division into sub-modules permits to consider many independent fault lists in parallel; their selection is the major issues in the preliminary phase preceding the generation effort.

For being independent, fault lists need to be
- non-overlapping: one fault has to belong to only one fault list
- functionally orthogonal: faults in the same independent fault list need to belong to modules related to one specific functionality

The non-overlapping criteria requires that the same fault have to be considered only one time in the process; on the other hand, when dealing with orthogonality, a fault needs to be included in the most relevant fault list from the point of view of the functionality of the related gate.

The process of selecting the set of fault lists is not trivial. In our flow, this process is accounting on:
- Manuals and documentation of the microcontroller with specific indications about the micro-architecture
- Hierarchy of the microcontroller netlist
- Test engineer expertise.

To identify independent fault lists
- Analysis of the processor functionalities
- Mapping of the functionalities over the microcontroller hierarchical netlist.

It is likely that most of the fault lists derive from specific modules, but it is also frequent that many sub-modules need to be squeezed into a single fault list when related to the same processing functionality. As an example, the faults of a multiplication unit usually constitute an independent fault list. On the contrary, there are several multiplexers that seem to be independent netlist modules, but these ones actually compose the feed-forwarding logic in the processor pipeline; thus, faults belonging to these multiplexers have to be grouped into the same fault list, which is functionally orthogonal and non-overlapping with other modules.

Concerning computational resources allocation, once the independent fault lists have been identified, for maximizing the number of fault lists to be considered in parallel:
- A single sub-modules coverage calculation on a single or many fault grading threads according to
  - Number of available threads per CPU;
  - Number of EDA tool licenses;
  - Fault list size
- Result synchronization by using many threads.

4.2 Optimized test programs generation order
Based on the side effect principle described above, it is crucial to select the most promising order to proceed in the test program generation. The decision needs to be tailored on the specific architecture under analysis.

In the following, we propose some general guidelines for determining the test program generation order considering the most common and widely used microprocessor architectures for automotive.

In our development flow, we organize the generation order according to horizontal and vertical flow rules.

Vertical flow rules demand to split the flow into consecutive levels, such that by testing all the modules into a given level, a large positive side-effect in terms of fault coverage is observed when moving to the next level. We are currently proposing to divide the flow into levels according to the following rules:
1) to consider first those units that can be mapped on specific assembly instructions or specific architectural programming mechanisms;
2) to continue with memorization and control flow resources;
3) to conclude with modules which functionalities are transparent to the programmer.
According to the presented strategy, a synchronization step is needed after completion of the currently considered level before moving to the next; this synchronization step involves all sub-modules of the next level, i.e., to reduce the size of the fault lists to be considered successively.

By looking at the problem in a horizontal manner, it is also possible to identify many parallel branches which are still complying with vertical requirements. This horizontal view consists in identifying branches, so that a negligible side-effect crosses branches belonging to different horizontal views.

Based on the aforementioned rules, for a typical automotive-oriented architecture, we individuate a development flow based on 3 levels and 2 branches. However, additional branches can be added when considering microprocessors equipped with special features, e.g., caches [12], shared memory schemes [13], and Floating-Point unit [14].

We suggest considering two categories of sub-module by first:

level 1 - branch A) ALU sub-modules: easy to test, they ask for the execution of specific arithmetic and logical instructions. Side effect is maximized towards the REGISTER FILE by an accurate selection of registers to be used as operands and in the control flow management.

level 1 - branch B) SPECIAL sub-modules: they encompass Exceptions Management, Branch Prediction and Virtual Memory related modules, e.g., the Memory Management Unit (MMU) module. These sub-modules are hard to cover, requiring specific instructions and sequences of instructions. They will produce a very large positive side-effect on ADDRESS related modules.

As shown in figure 5, once a sufficient coverage on these sub-modules is reached, it is suggested to proceed in a synchronization process. The set of programs developed for level 1A) are evaluated on the REGISTER FILE fault lists, while level 1B) is graded on the ADDRESS related modules. As a result, the number of active faults to be then considered is greatly reduced.

level 2 – branch A) REGISTER FILE: the test of the register file is straightforward, being many papers describing effective sequences to test. In the proposed generation method, it is suggested to reorder instructions and operands in order to induce the usage of DATA DEPENDENCY structures in the PIPELINE.

level 2 – branch B) ADDRESSING modules: by having completed 1B), the most of the faults included in the ADDRESS related modules, such as Branch unit, Effective Address calculation, and Program Counter, are resulting as already covered. This step is therefore a completion of the previous one, which is done mainly by adding memory operation and branches to specific addresses.

A synchronization step is then operated on the PIPELINE and CONTROL UNIT modules, followed by level 3 with no more branches, which consists on an ad-hoc generation step for these level modules.

To complete the process, the entire test suite obtained along this process is evaluated on the whole processor fault universe, eventually by adding refinement programs to cover corner cases and specific configurations not considered along the previous steps.
5 RESULTS ON A 32-BIT AUTOMOTIVE DEVICE

In order to assess its effectiveness, the methodology herein introduced has been applied to a SoC including a 32-bit pipelined microprocessor based on the Power Architecture™. The SoC is employed in safety-critical automotive embedded systems, such as airbag, ABS, and EPS controllers and is currently being manufactured by STMicroelectronics.

5.1 Case of study

The microcontroller’s cost-efficient processor core is built on the Power Architecture technology and designed specifically for embedded applications. The processor integrates a pair of integer execution units, a branch control unit, instruction fetch unit and load/store unit, and a multi-ported register file capable of sustaining six read and three write operations per clock. Most integer instructions execute in a single clock cycle. Branch target prefetching is performed by the branch unit to allow single-cycle branches in many cases. It contains a Memory Management Unit and A Nexus Class 3 module is also integrated.

The 32-bit processor utilizes a five-stage pipeline for instruction execution. These stages are:
- Instruction Fetch (stage 1)
- Instruction Decode/ Register file Read/ Effective Address Calculation (stage 2)
- Execute 0/ Memory Access 0 (stage 3)
- Execute 1/ Memory Access 1 (stage 4)
- Register Write-Back (stage 5)

The stages operate in an overlapped fashion, allowing single clock instruction execution for most of the available instructions.

The integer execution unit consists of a 32-bit Arithmetic Unit (AU), a Logic Unit (LU), a 32-bit Barrel shifter (Shifter), a Mask-Insertion Unit (MIU), a Condition Register manipulation Unit (CRU), a Count-Leading-Zeros unit (CLZ), a 32x32 Hardware Multiplier array, and result feedforward hardware. Integer EU1 also supports hardware division. Most arithmetic and logical operations are executed in a single cycle with the exception of multiply, which is implemented with a 2-cycle pipelined hardware array, and the divide instructions. A Count-Leading-Zeros unit operates in a single clock cycle.

Two execution units are provided to allow dual issue of most instructions. Only a single load/store unit is provided, and only a single integer divide unit is provided, thus a pair of divide instructions cannot issue simultaneously.

5.2 Experimental results

Along one year of team work, we collected results of a wide number of modules. To test processor through software is a deeply explored field; therefore in many cases the technique utilized has been borrowed from the literature and adapted to cover the specific modules of the considered processor core. Table 1 reports the list of generation techniques employed to achieve the high fault coverage of each processor sub-module. On selecting these techniques, we resort to some of the most important proposals regarding test program generation available in today’s literature.

Concerning automatic approaches, we resorted to both ATPG-based techniques, and optimization techniques based on Evolutionary algorithms. Some other techniques (labeled as Deterministic) refer to available solutions that exploit the sub-module regularity in order to propose a well-defined test algorithm. Finally, rows labeled as Manual refer to pure manual strategies performed by the test engineer exploiting the processor user manual, the ISA, as well as the available HDL processor descriptions.

As stated in section 2, the test duration and size of each single test program are on-line requirements that may vary depending on the mission application and physical limits of the microcontroller (e.g., the memory space available). In our case study, the limitations were given both in terms of duration of single programs and overall occupation of the complete test suite. In particular, the maximum duration of a single program labeled as run-time test should not exceed 512 clock cycles, while the FLASH memory area reserved for test purposes was limited to 256kB.

To match these constraints, every kind of generation method needs to be tailored opportunistically:
- ATPG-based generation methods can be constrained by asking the automatic engine for high compression and limiting the generation to a maximum number of patterns; the generated patterns may be eventually transformed into many test programs compliant with duration constraints.
- Fitness values used along Evolutionary computation experiments include program size and length measurement; in such a way, the programs exceeding the imposed limitations were discarded;
- Deterministic and manual require additional efforts by the test engineer to fit the programs length and size; more easily, if too long they can be split into several shorter programs.

Code characteristics fitting on-line requirements were also considered in all cases, such as having relocatable code (absolute branches and access by absolute address to memory locations are not allowed) and resorting to limited portion of memory space (1kB) reserved for testing sake.

As described in section 3, each generated program is encapsulated into the EABI standard frame and includes the additional code sequences that guarantee the test robustness. For the current case of study, the EABI compliant frame is accounting for very few instructions at the beginning of procedures (e.g., 3-5 instructions); this number increases whereas:
- extra registers have to be saved before being used and finally restored to their original values (e.g., non-volatile register or special purpose registers such as the Microprocessor Status Register)
- memory resources need to be protected (e.g., Memory Protection Unit is exploited)
- peripheral microcontroller resources need to be programmed for test robustness (e.g., watchdog timers).
The number of additional instructions required to afford robustness other than raw compliancy with EABI standards are about 20 instructions. Additionally, to further enforce robustness, additional instructions were added when a context switching is purposely forced through exception testing reasons. Concerning the development flow, the fault list generation and the adopted generation order follows the generic indications provided in section 4.

The fault lists were generated mainly according to the processor functionalities which are directly related to specific modules in the netlist hierarchy. There are some exceptions since the considered microcontroller is dual issue and replicated arithmetic modules, such as the adders, were considered as a unique fault list; in a different way, the data-forwarding unit is composed of several multiplexers, which faults are jointly considered. Another interesting case of resource partitioning is related to the multi-port register file that is contributing with two fault lists, the register bank and the register ports (decoders and multiplexers); this is due to both the fault list size that we need to split, and the different functionalities.

Table 2 shows the evolution of the coverage along the development flow. The final fault coverage reached was 87.23% of the fault list that includes around 750k stuck-at-faults.

There are modules not highly covered:
- Exception management modules, because it is not possible to purposely exercise all of them (e.g., it is not possible to forcing a bus error which is asking the exception unit to intervene)
- Branch prediction, program counter and load/store units; due to the memory mapping configuration of the specific system-on-chip, not all bits in the addressing registers can be functionally touched.
- Status and control flags, since many of these flags cannot be used because controlling circuitries outside the processor core.

### TABLE 1

<table>
<thead>
<tr>
<th>Sub-module</th>
<th>Technique</th>
<th>References</th>
</tr>
</thead>
<tbody>
<tr>
<td>Arithmetic adders</td>
<td>Deterministic +</td>
<td>[15][16]</td>
</tr>
<tr>
<td>Division unit</td>
<td>Constr. ATPG +</td>
<td>[17][18]</td>
</tr>
<tr>
<td>Logic unit</td>
<td>Evolutionary</td>
<td></td>
</tr>
<tr>
<td>Multiplication unit</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Shifters</td>
<td>Manual</td>
<td>[19]</td>
</tr>
<tr>
<td>Branch Unit</td>
<td>Deterministic</td>
<td>[20]</td>
</tr>
<tr>
<td>Timers</td>
<td>Manual</td>
<td></td>
</tr>
<tr>
<td>Register bank</td>
<td>Deterministic</td>
<td>[21]</td>
</tr>
<tr>
<td>Register ports</td>
<td>Deterministic</td>
<td>[21]</td>
</tr>
<tr>
<td>EA adder</td>
<td>Loop-based +</td>
<td></td>
</tr>
<tr>
<td>Load store unit</td>
<td>Evolutionary +</td>
<td>[22]</td>
</tr>
<tr>
<td>Program counter</td>
<td>Manual</td>
<td></td>
</tr>
<tr>
<td>Forward unit</td>
<td>Deterministic</td>
<td>[23]</td>
</tr>
<tr>
<td>Decode unit</td>
<td>Deterministic</td>
<td>[24]</td>
</tr>
<tr>
<td>Status/control flags</td>
<td>Deterministic</td>
<td>[25]</td>
</tr>
<tr>
<td>Fetch unit</td>
<td>Deterministic</td>
<td></td>
</tr>
</tbody>
</table>

### TABLE 2

<table>
<thead>
<tr>
<th>Sub-module</th>
<th>#faults</th>
<th>Single 1A</th>
<th>Synchro 1A</th>
<th>Single 1B</th>
<th>Synchro 1B</th>
<th>Single 2A</th>
<th>Synchro 2A</th>
<th>Single 2B</th>
<th>Synchro 2A+2B</th>
<th>Single 3</th>
<th>Synchro 3</th>
</tr>
</thead>
<tbody>
<tr>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
<td>FC [%]</td>
</tr>
<tr>
<td>Arithmetic adders</td>
<td>5,996</td>
<td>95.03</td>
<td>97.93</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>98.27</td>
<td>--</td>
<td>98.52</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Divider</td>
<td>19,018</td>
<td>83.98</td>
<td>83.98</td>
<td>--</td>
<td>--</td>
<td>83.99</td>
<td>83.99</td>
<td>--</td>
<td>83.82</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Logic instructions</td>
<td>22,294</td>
<td>76.32</td>
<td>78.57</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>78.70</td>
<td>--</td>
<td>83.34</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Multiplier</td>
<td>78,094</td>
<td>91.18</td>
<td>92.62</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>92.62</td>
<td>--</td>
<td>95.90</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Shifter</td>
<td>14,172</td>
<td>87.95</td>
<td>92.96</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>93.97</td>
<td>--</td>
<td>96.32</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Exceptions</td>
<td>40,718</td>
<td>--</td>
<td>--</td>
<td>66.08</td>
<td>67.17</td>
<td>--</td>
<td>--</td>
<td>68.16</td>
<td>72.48</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Branch prediction</td>
<td>24,489</td>
<td>--</td>
<td>70.91</td>
<td>70.95</td>
<td>--</td>
<td>--</td>
<td>72.67</td>
<td>--</td>
<td>72.67</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Timers</td>
<td>7,683</td>
<td>--</td>
<td>88.21</td>
<td>88.43</td>
<td>--</td>
<td>--</td>
<td>88.46</td>
<td>--</td>
<td>89.70</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Register bank</td>
<td>83,764</td>
<td>--</td>
<td>71.21</td>
<td>--</td>
<td>84.15</td>
<td>--</td>
<td>89.38</td>
<td>--</td>
<td>92.66</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Register ports</td>
<td>126,329</td>
<td>--</td>
<td>69.17</td>
<td>--</td>
<td>94.93</td>
<td>--</td>
<td>97.67</td>
<td>--</td>
<td>98.09</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Program counter</td>
<td>26,060</td>
<td>--</td>
<td>--</td>
<td>66.07</td>
<td>--</td>
<td>68.66</td>
<td>69.42</td>
<td>--</td>
<td>70.09</td>
<td></td>
<td></td>
</tr>
<tr>
<td>EA adder</td>
<td>5,228</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>66.51</td>
<td>--</td>
<td>92.02</td>
<td>93.75</td>
<td>--</td>
<td>94.57</td>
<td></td>
</tr>
<tr>
<td>Fetch unit</td>
<td>71,582</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>69.45</td>
<td>82.39</td>
<td>83.54</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Forward unit</td>
<td>84,758</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>70.95</td>
<td>84.29</td>
<td>84.82</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Status flags</td>
<td>33,277</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>59.31</td>
<td>78.08</td>
<td>78.61</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Control flags</td>
<td>10,328</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>64.21</td>
<td>68.63</td>
<td>69.83</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Decode unit</td>
<td>62,876</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>50.12</td>
<td>92.46</td>
<td>93.08</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Load/Store unit</td>
<td>15,971</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>73.50</td>
<td>75.42</td>
<td>76.73</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Glue logic</td>
<td>19,425</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>--</td>
<td>63.36</td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

TOTAL: 756,789 -- 36.07 -- 9.74 -- -- 76.87 -- 87.23
As a complement to the fault coverage measurements, the dimension and duration and coverage of the test along the entire development flow are included in Table 3. Having a frequency of working of 150 MHz, the overall time required for executing all 73 tests is about 0.8ms.

It is interesting to note, how the synchronization phases produce a very strong positive cascade effect over the modules not yet considered; at least the half of the faults of the modules that are going to be considered during the next generation steps were pruned from the list without any additional effort. Table 2 also permits to remark that the synchronization steps cause coverage improvement also for modules of the current and previous levels of the same branch, as described in section 4.2.

A significant advantage in terms of grading time reduction is achieved by a proper development order which is maximizing the cascade effect. As an example of effectiveness, by adopting the proposed order, the generated test programs over the 139,574 faults of arithmetic modules included in 1A (level 1 – branch A) led to a positive side effects on 2A consisting in 147,029 over 210,093 faults (corresponding to about 70%); i.e., these faults are already covered without any specific generation effort for 2A. In other words, the fault simulation experiments carried on level 2A need to consider only 63,064 faults.

To the sake of completeness, we also computed the results obtained by implementing an alternative generation order, considering by first the modules of 2Aand then the ones of 1A. We tackled the 210,093 faults of register bank and ports by obtaining a fault coverage comparable with results in table 2, and evaluated the side effect of such programs over 1A: only 10,318 faults (or 7.4%) were already covered over the total amount of 139,574 faults of the arithmetic modules.

The reduction in the fault list cardinality, achieved by properly ordering sub-modules and synchronizing fault lists, induces a great time gain due to a large reduction of fault simulation efforts. The effect is not limited to successive synchronization, but it permits faster generation iterations as required by evolutionary algorithm.

Table 4 shows the elapsed time for fault simulation in two cases:
1) a raw development flow not using synchronization but simply considering sub-modules separately
2) a development flow following the proposed order and implementing synchronization between levels.

All the experiments were executed on a single core of a 2 GHz processor; the resulting times would be reduced by running multi-process fault-simulations.

It is worth to notice that the fault simulation time becomes excessive if not implementing synchronization. As well, the development order is important to minimize the fault simulation efforts. Supposing again that level 2 branch A has been considered before level 1 branch A, the saved CPU time for fault simulation is decreased from about 37 to 34 hours, which is a negligible gain if compared to those obtained by the proper ordering.

### Table 3

<table>
<thead>
<tr>
<th>Development Flow Step</th>
<th>Number of test programs</th>
<th>Duration [Clock Cycles]</th>
<th>Code size [kB]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single 1A, Synchro 1A</td>
<td>29</td>
<td>8,840</td>
<td>23</td>
</tr>
<tr>
<td>Single 1B, Synchro 1B</td>
<td>8</td>
<td>19,716</td>
<td>11</td>
</tr>
<tr>
<td>Single 2A</td>
<td>10</td>
<td>32,634</td>
<td>36</td>
</tr>
<tr>
<td>Single 2B</td>
<td>8</td>
<td>28,212</td>
<td>17</td>
</tr>
<tr>
<td>Synchro 2A+2B</td>
<td>55</td>
<td>89,402</td>
<td>87</td>
</tr>
<tr>
<td>Single 3</td>
<td>18</td>
<td>26,700</td>
<td>32</td>
</tr>
<tr>
<td>Synchro 3</td>
<td>73</td>
<td>116,102</td>
<td>119</td>
</tr>
</tbody>
</table>

**5.3 Test deployment during mission**

The suite of test programs resulting from the development phases described above is integrated in an industrial demo project for STMicroelectronics. The project handles the whole test set of programs by means of two software modules, and provides the project integrator with a software Application Programming Interface (API), in order to include them in the mission application:

- **Tests for power-on**: 44 test program, including Non-exception and Critical tests, scheduled by an ad-hoc software module named Boot Time Self-Test Module (BTSTM)
- **Tests for Run-time**: 29 Run-Time test programs handled by an AUTOSAR 4.0 Complex Driver [26] named CST Library.

Both CST Library and BTSTM provide configuration capabilities at compile time, in such a way that the project integrator can selectively activate all programs or a subset of the entire suite. It is up to the user of the API to choose suitable test combinations and a scheduled execution order to fulfill the safety requirements of the system.

In the devised demo, after the execution of the tests for power-on that takes about 0.7ms, the run-time tests were scheduled according to some specific requirements for mission integration:

- **Self-test chunks must be less than 5µs long**
- **Self-test interrupts the mission application every 500µs**.

---

**Table 4**

<table>
<thead>
<tr>
<th>Fault simulation time [hours]</th>
<th>1) without synchro</th>
<th>2) with synchro</th>
<th>Saved time</th>
</tr>
</thead>
<tbody>
<tr>
<td>Level 1 Branch A</td>
<td>37</td>
<td>-</td>
<td></td>
</tr>
<tr>
<td>Level 1 Branch B</td>
<td>122</td>
<td>-</td>
<td></td>
</tr>
<tr>
<td>Level 2 Branch A</td>
<td>217</td>
<td>55</td>
<td>74.7%</td>
</tr>
<tr>
<td>Level 2 Branch B</td>
<td>72</td>
<td>23</td>
<td>68.1%</td>
</tr>
<tr>
<td>Level 3</td>
<td>630</td>
<td>195</td>
<td>69.0%</td>
</tr>
</tbody>
</table>
Along mission, using the proposed demo setup, the overall self-test length does not exceed 100µs and a complete self-test is performed in less than 2ms. Availability of the mission application is reduced by around 1% even though that self-test can be preempted at any time.

Along development, the demo test suite was encompassing several verification and validation stages towards software maturity, which were including embedded documentation of the code by means of special comments (e.g., Doxygen tags [27]) that are parsed by external tool for automatically generating user manuals.

Test programs also provide services for returning test results, i.e., error codes such as AUTOSAR DEM errors and malfunctioning signatures computed by the test programs for successive inspection of failing chips. BTSTM assumes that all the available processor functionalities can be exclusively accessed for testing purposes; on the contrary, CST Library has more restrictive requirements.

For validation sake we finally conducted two kinds of experiments emulating the in-field behavior of the system:

1) To verify the fault-free behavior, a sample OS was considered that was intensively triggering mission interrupts while self-test executed at regular intervals as described above; a physical target was programmed with such a complete software environment and left running for several hours, tracing the correctness of the test responses and liveness of the system.

2) To investigate on the robustness in case of faults, a specific fault injection campaign was performed by means of complete simulation (i.e., without fault dropping) in order to classify erroneous behaviors, as previously introduced in section 3.3:
   a) Self-test ends with a wrong signature
   b) Self-test is not ending due to deadlock configuration
   c) Self-test ends with unattended exception management due to
      i) Illegal instruction execution
      ii) Wrong branches in memory areas protected by MPU configuration.

### 6 Conclusions

This work is proposing a development flow for the effective Software-Based Self-Test generation of test programs to be run on-line during the mission of automotive environment. The paper encompasses:

- Identification of on-line constraints and implemented solutions
- Resources distribution and generation order for a most efficient and fast test program generation along the various sub-modules of the entire processor
- Execution management of the SBST library and robustness of its execution

The case of study reported the final results related to a SBST library generated for an industrial 32-bit processor core included in an automotive System-on-Chip manufactured by STMicroelectronics; the coverage figure obtained in 1 year team working is more than 87% over around 750k Stuck-at-faults.

### 7 Acknowledgments

The authors would like to thank Thomas Zsurmant, Renato Meregalli and Giovanni Di Sirio at STMicroelectronics, Matteo Sonza Reorda, Michelangelo Grosso, Lyl Ciganda and Giovanni Squillero of Politecnico di Torino, Oscar Ballan when working a STMicroelectronics, for the useful discussion and contributions to this work.

### References


[27] Doxygen web-site: http://www.stack.nl/~dimitri/doxygen/

Paolo BERNARDI (S’03–M’06) received the M.S. and Ph.D. degrees in Computer Science from Politecnico di Torino, Torino, Italy, in 2002 and 2006, respectively. Since 2001, he has been with the Department of Computer Engineering, Politecnico di Torino, where he is currently an Associate Professor. His interests cover the areas of testing of electronic circuits and systems and the design of fault-tolerant electronic systems. Dr. Bernardi is a member of the IEEE Computer Society.

Riccardo CANTORO received the M.Sc. degree in Computer Engineering from Politecnico di Torino, Torino, Italy in 2013. Since 2014, he is a Ph.D. student in the Department of Computer Engineering, Politecnico di Torino. His main research topic is microprocessor testing.

Sergio DE LUCA Team leader, project leader and Functional Safety expert at the STMicroelectronics; embedded software development for Automotive system-on-chip and real-time applications.

Ernesto SANCHEZ received his degree in Electronic Engineering from Universidad Javeriana, Bogota, Colombia in 2000. In 2006 he received his Ph.D. degree in Computer Engineering from the Politecnico di Torino, where currently, he is an Associate Professor with Dipartimento di Automatica e Informatica. His main research interests include microprocessor testing and evolutionary computation.

Alessandro SANSONETTI manages ST’s Automotive Product Group software design teams based in Italy (Agrate B.za, Naples and Catania) and in France (Le Mans). He has participated as an active member of the AUTOSAR consortium as SPI document owner. Sansonetti, who has worked at ST since 1996, graduated in computer science at the University of Milan.