# POLITECNICO DI TORINO Institutional Repository

# About the correlation between logical identified faulty gates and their layout characteristics

Original

About the correlation between logical identified faulty gates and their layout characteristics / Bernardi, Paolo; Cardone, Lorenzo; Iaria, Giusy; Appello, Davide; Garozzo, Giuseppe; Tancorre, Vincenzo. - (2023). (Paper presented at the IEEE International Symposium on On-Line Testing and Robust System Design, held 03-05 July 2023 in Crete, Greece) [10.1109/IOLTS59296.2023.10224897].

Availability: This version is available at: 11583/2979776 since: 2023-09-18T12:52:53Z

Publisher: IEEE

Published DOI:10.1109/IOLTS59296.2023.10224897

Terms of use:

This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository

Publisher copyright IEEE postprint/Author's Accepted Manuscript

©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collecting works, for resale or lists, or reuse of any copyrighted component of this work in other works.


# About the correlation between logical identified faulty gates and their layout characteristics

P. Bernardi, L. Cardone, G. Iaria, *Politecnico di Torino*, Italy

D. Appello, G. Garozzo, V. Tancorre, *STMicroelectronics*, Italy

Abstract—Electronics play a significant role in many areas of modern daily life. Companies producing embedded nano-electronic systems have responded to the ever-increasing demand for high-performance chips by developing and producing structurally complex designs, both in terms of the number of gates they contain and of how these gates are arranged on the silicon surface. Devices intended for safety-critical fields, such as automotive, in particular require a thorough and precise testing process before they are fielded.

This paper proposes a correlation analysis between the candidate faulty logic gates identified during the Manufacturing Test Flow as possible sources of a given failure and their layout characteristics on the silicon. The analysis provides manufacturers with meaningful feedback about the quality of their applied tests. Experimental results are reported for a production lot of an automotive System-on-Chip belonging to the SPC58 family produced by STMicroelectronics.

Index Terms—Manufacturing Test, Layout, Reliability

## I. INTRODUCTION

Electronics have become an indispensable part of our lives, permeating every aspect of our daily routines. The rapid evolution of digital systems has led to the proliferation of chips and integrated systems, which have become increasingly complex and ubiquitous [1]. While these advances have enabled the development of high-performance devices, they have also contributed to the emergence of anomalous behaviors in systems, posing a significant challenge to the industrial electronics sector.

The importance of managing these anomalous behaviors cannot be overstated, particularly in sectors such as aerospace or automotive, where even the slightest defect could result in catastrophic consequences. Digital systems must therefore undergo rigorous testing at all stages before reaching the market, as prescribed by well-defined standards such as ISO 26262 [2] in the automotive field. Such testing includes verifying logical operations during production, assessing reliability and lifespan, and identifying potential vulnerabilities.

The consequences of failing to apply these testing techniques can be dire, as evidenced by past incidents in which defective electronics caused significant harm to human lives and the environment. Therefore, industrial electronics manufacturers must prioritize the safety and reliability of their products and ensure that they meet the highest standards of quality and performance.

In today's complex and rapidly evolving electronics industry, it is critical to conduct manufacturing processes and tests with the utmost precision and accuracy, while avoiding excessive time and cost.

Many methodologies to predict and model the probability of fault occurrence have been proposed in the literature, e.g., [3]–[7].

To contain test time, novel techniques that exploit the distribution of gates on silicon have been proposed, based on metrics such as the criticality of a given area [8] or the density of gates on silicon [9]; the latter clusters elements on the silicon based on how many other elements fall within a certain radius of them.

In this context, this paper presents a novel methodology for evaluating the correlation between the candidate faulty logic gates identified during the Manufacturing Test Flow and their layout characteristics. It is based on a comprehensive analysis of the chip layout, and it was applied to results obtained from a production batch of an SPC58 family device produced by STMicroelectronics. By cross-referencing production test data with layout data, the proposed methodology can help manufacturers optimize their test flows and improve the quality and reliability of their products.

In summary, the methodology presented in this paper offers a promising approach for improving the Manufacturing Test Flow of Systems-on-Chip, and the experimental results reported for the SPC58 production lot demonstrate its effectiveness in practice. Further research and development in this area may improve the reliability of electronic devices and reduce the time required for testing.

The paper is organized as follows. Section II introduces the required technical background on the Manufacturing Test Flow and on the physical layout of the circuit. Section III describes the proposed strategy, i.e., the correlation analysis between the logical faulty gates and their layout characteristics. Section IV reports the experimental results computed on data from a production lot of an automotive System-on-Chip belonging to the SPC58 family produced by STMicroelectronics; these results were obtained with a tool developed in C++ that performs the proposed correlation analysis. Finally, Section V draws conclusions and outlines the next steps of the work.

This work is supported by the Italian Ministry of University and Research under DM1061.

## II. BACKGROUND

This section provides the notions needed to understand the methodology described in the next section. First, the phases of the Manufacturing Test Flow are explained; then, the typical physical layout characteristics of complex Systems-on-Chip are described.

### A. Manufacturing Test Flow

The Manufacturing Test Flow is a crucial process to ensure the quality and reliability of semiconductor devices. It consists of a series of phases, applied one after the other, to identify and discard any faulty devices produced during manufacturing. The primary goal of the Manufacturing Test Flow is to guarantee that only devices meeting the required specifications and quality standards are delivered to customers. A possible test flow is shown in Figure 1.



Figure 1: Manufacturing Test Flow

It typically starts with the Wafer Sort phase, performed at the wafer level, where each die on the wafer is tested for basic electrical functionality. The Package Test phase follows, in which the electrical characteristics of the pins are tested. These tests are complemented by structural tests, such as scan-based tests and Built-In Self-Test (BIST), which can be applied in both of these phases or in only one of them.

For safety-critical applications, an additional Burn-In phase [10] is performed before the Final Test. During burn-in, all components are tested under significant stress, beyond their certified operating conditions, in order to exacerbate latent defects so that they manifest during testing rather than during regular operation.

The Final Test is the last phase of the Manufacturing Test Flow and applies a mix of structural and functional tests [11] to detect any faults that might have escaped the previous phases. In recent years, System-Level Test [12] has been added at the end of the test flow to verify the correct interaction and communication between the components and peripherals of the device. System-Level Test usually consists of running multiple programs that mimic the operation of the device once it has been released to the market. This form of functional testing does not aim to cover every single defect, but to ensure that the device functions correctly during its typical use.

The Manufacturing Test Flow is a complex and time-consuming process that requires careful planning and execution to ensure that the final product meets the required quality and reliability standards within reasonable time and cost.

At the end of the test process, manufacturers can obtain a complete list of the gates that could have caused the misbehavior of discarded devices. This candidate fault information is typically used to diagnose failed devices: with fault dictionary techniques, the faults that generated the failures can be located with more or less accurate resolution. In recent years, much research has been devoted to building fault dictionaries [13], [14].
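As a generic illustration of the fault dictionary idea (not the authors' tool, and not the candidate-counting scheme used later in this paper), a pass/fail dictionary stores, for each modeled fault, the set of pattern sets expected to fail when that fault is present; a failing device is diagnosed by looking up the faults whose expected signature matches the observed failures. The sketch below assumes a single-fault model and exact signature matching; the pattern-set names `P1`..`P3` and faults `a`..`c` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

def build_passfail_dictionary(coverage: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Invert pattern set -> detected faults into fault -> expected failing pattern sets."""
    signatures: Dict[str, Set[str]] = {}
    for pattern_set, faults in coverage.items():
        for fault in faults:
            signatures.setdefault(fault, set()).add(pattern_set)
    return {fault: frozenset(sets) for fault, sets in signatures.items()}

def diagnose(dictionary: Dict[str, FrozenSet[str]], observed_fails: Set[str]) -> List[str]:
    """Return the faults whose expected signature exactly matches the observed fails."""
    observed = frozenset(observed_fails)
    return sorted(fault for fault, sig in dictionary.items() if sig == observed)

# Hypothetical pattern sets P1..P3 and faults a..c
dictionary = build_passfail_dictionary({"P1": {"a", "b"}, "P2": {"b", "c"}, "P3": {"c"}})
print(diagnose(dictionary, {"P1", "P2"}))  # ['b']
```

Exact matching gives high resolution under the single-fault assumption; real diagnosis flows typically relax it to tolerate multiple or unmodeled defects.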

The proposed methodology starts from the list of candidate faulty gates and enriches each of them with its layout characteristics in order to study their correlation.

### B. Physical Layout

The physical design phase of today's complex automotive Systems-on-Chip (SoCs) is a critical step in the development process. Starting from the synthesized design, this phase creates the topology of the device.

To preserve the functional behavior of the synthesized design, the physical description of transistors in a technology library (e.g., their length, width, and height) is used to place cells over the layout and to route the connections between transistors that form each logic gate. The placement phase follows, in which standard cells are placed along the layout rows, possibly creating high-density areas with a high concentration of cells. These areas may give rise to different types of defects due to electromagnetic interference. Thus, topology information related to the placement of standard cells has become essential in developing and analyzing modern automotive SoCs [15].

For example, some methods use layout analysis to identify critical bridges in critical areas and generate patterns that test the selected critical faults, thereby truncating the fault list [8]; other methods use a density metric to target faults in high-density areas of the circuit [9]. As the complexity of automotive SoCs continues to increase, so does the importance of accurate and efficient layout design, placement, and analysis. Manufacturers must adopt innovative techniques to minimize the risk of defects and ensure the quality and reliability of their products.

Figure 2 shows an example of the layout of an automotive SoC. The colors highlight the different density levels across the circuit: areas with higher gate density are shown in a lighter shade of green, while areas where fewer gates have been placed are shown in a darker shade of green.

## III. THE PROPOSED APPROACH

The proposed approach highlights the relation between the number of times a gate is identified as a candidate during the Manufacturing Test Flow and its layout characteristics. It is therefore based on analyzing the layout information together with the list of candidate faulty gates identified after the production phase. The results of the correlation analysis can be summarized in two types of visualization:

1) a spatial analysis of the relation between the two elements described so far;

2) a confusion matrix that correlates the number of neighbors of a given gate with the probability of it failing.

Figure 2: Coloured gate-density distribution over the layout of a System-on-Chip.

After the Manufacturing Test Flow, the dies that failed some of the pattern sets provide important information about the candidate faulty gates of each die. Therefore, at the end of the process, manufacturers have a list of the possible faults that caused the dies to fail.

The proposed approach is intended to provide manufacturers with a new metric beyond the existing one, i.e., the number of failed dies among whose candidates each fault appears. Using information derived from the chip layout, the developed methodology cross-references manufacturing data with design data, providing additional information on the location of candidate faulty gates. In particular, thanks to the information extracted from the layout about the number of neighbors of each circuit cell, it is possible to determine whether each candidate fault is located in a high-, medium-, or low-density zone.
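The paper does not state the thresholds that separate high-, medium-, and low-density zones. As a minimal sketch, assuming two hypothetical cut-off values on the neighbor count (`low_max` and `medium_max`, chosen here only for illustration), the classification could look like this:

```python
from enum import Enum

class DensityZone(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"

def classify_zone(neighbors: int, low_max: int = 2, medium_max: int = 5) -> DensityZone:
    """Map a cell's neighbor count to a density zone.

    The thresholds low_max and medium_max are hypothetical placeholders;
    the paper does not state the values it uses.
    """
    if neighbors < 0:
        raise ValueError("neighbor count cannot be negative")
    if neighbors <= low_max:
        return DensityZone.LOW
    if neighbors <= medium_max:
        return DensityZone.MEDIUM
    return DensityZone.HIGH

print([classify_zone(n).value for n in (1, 4, 9)])  # ['low', 'medium', 'high']
```

In practice the cut-offs would need to be calibrated per design, for example from the distribution of neighbor counts over the whole layout.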

### A. Candidate Faulty Gates

The proposed flow takes as input a list of logic gates identified as candidates for having caused a failure during the Manufacturing Test Flow. The list also reports how many times each fault was a candidate, that is, the number of dies whose failure could have been caused by it.

For example, consider a production lot of three dies, the pattern sets each die failed during the flow (see Table I), and the coverage of the considered pattern sets (see Table II). The resulting list of candidates, with the number of times each one appears in the production lot, is shown in Table III.

| Die ID | Pattern set failures |
|--------|----------------------|
| 1      | T1, T2               |
| 2      | T3, T4               |
| 3      | T3, T4               |

Table I: Pattern set failures during Manufacturing Test Flow

| Pattern set | Covered Faults |
|-------------|----------------|
| T1          | F1, F2         |
| T2          | F2             |
| T3          | F3, F4         |
| T4          | F3             |

Table II: Pattern set coverages

| Fault ID | # Dies |
|----------|--------|
| F1       | 1      |
| F2       | 1      |
| F3       | 2      |
| F4       | 2      |

Table III: Resulting list of candidates

By leveraging this metric, the proposed work aims to provide a more comprehensive understanding of the faults behind manufacturing failures, thereby improving the overall quality and reliability of the manufactured products. Additionally, the insights obtained from this analysis could help develop more effective test strategies.
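The computation behind Tables I–III can be sketched as follows. This is one reading of the example that reproduces Table III, assuming that a fault is a candidate for a die when any pattern set that the die failed covers it, and that each die contributes at most once per fault; it is not the authors' C++ implementation.

```python
from collections import Counter
from typing import Dict, List, Set

def count_candidates(die_failures: Dict[int, List[str]],
                     coverage: Dict[str, Set[str]]) -> Counter:
    """Count, for each fault, the number of dies for which it is a candidate.

    A fault is taken to be a candidate for a die if any pattern set that the
    die failed covers it; each die is counted at most once per fault.
    """
    counts: Counter = Counter()
    for failed_sets in die_failures.values():
        candidates: Set[str] = set()
        for pattern_set in failed_sets:
            candidates |= coverage.get(pattern_set, set())
        counts.update(candidates)  # one increment per distinct candidate
    return counts

die_failures = {1: ["T1", "T2"], 2: ["T3", "T4"], 3: ["T3", "T4"]}             # Table I
coverage = {"T1": {"F1", "F2"}, "T2": {"F2"}, "T3": {"F3", "F4"}, "T4": {"F3"}}  # Table II
print(sorted(count_candidates(die_failures, coverage).items()))
# [('F1', 1), ('F2', 1), ('F3', 2), ('F4', 2)]  -> Table III
```

Building the candidate set per die before counting is what prevents a fault covered by two failing pattern sets of the same die (F2 on die 1) from being counted twice.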

### B. Density Metric

As explained in the previous section, the physical layout of a highly complex circuit, such as those used in the automotive industry, may have denser areas with a larger number of adjacent cells. Specifically, each cell has a certain number of neighbors within a given distance, and this number can be used to infer whether the cell lies in a dense area.

The distance used to compute the neighborhood of each cell is determined experimentally as the distance between the two input pins of an AND gate. In this way, the approach can be reused for different case studies, even with different technologies.
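The paper does not describe how neighbors are counted. A minimal sketch, assuming that each cell is reduced to a single (x, y) point (e.g., its placement location from the layout), that distances are Euclidean, and that the radius has already been fixed (in the paper, from the AND-gate pin distance), could bucket cells into a uniform grid so that each cell is compared only with cells in the 3x3 block of buckets around it instead of with all other cells:

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def count_neighbors(cells: Dict[str, Point], radius: float) -> Dict[str, int]:
    """For every cell, count the other cells lying within `radius` (Euclidean).

    Cells are bucketed into a uniform grid of side `radius`, so only the 3x3
    block of buckets around a cell can contain its neighbors.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")

    def bucket(p: Point) -> Tuple[int, int]:
        return math.floor(p[0] / radius), math.floor(p[1] / radius)

    grid: Dict[Tuple[int, int], List[Tuple[str, Point]]] = defaultdict(list)
    for name, p in cells.items():
        grid[bucket(p)].append((name, p))

    counts: Dict[str, int] = {}
    for name, (x, y) in cells.items():
        gx, gy = bucket((x, y))
        n = 0
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other, (ox, oy) in grid.get((gx + dx, gy + dy), []):
                    if other != name and math.hypot(ox - x, oy - y) <= radius:
                        n += 1
        counts[name] = n
    return counts

# Hypothetical cell coordinates (same units as the radius)
cells = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (5.0, 5.0)}
print(count_neighbors(cells, radius=1.0))  # {'A': 2, 'B': 1, 'C': 1, 'D': 0}
```

With tens of millions of cells, avoiding the all-pairs comparison is what keeps such a step tractable; the grid choice is an assumption, and a k-d tree would serve the same purpose.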

In the proposed study, the neighborhood information of each cell is used to combine the production test results with the layout of the circuit. In this way, the number of times a fault is a candidate during the Manufacturing Test Flow in a given lot can be cross-referenced with whether or not that fault belongs to a dense zone.

Continuing the previous example, consider an SoC whose fault list is $\{F1, F2, F3, F4\}$; Table IV reports the number of neighbors of each fault.

| Fault ID | # Neighbors |
|----------|-------------|
| F1       | 2           |
| F2       | 2           |
| F3       | 3           |
| F4       | 1           |

Table IV: Number of neighbors for each fault cell
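The cross-referencing step amounts to a join between the production data (Table III) and the layout data (Table IV). A minimal sketch follows; the record type and function names are hypothetical, and cells that never appeared as candidates are kept with a count of zero.

```python
from typing import Dict, List, NamedTuple

class FaultRecord(NamedTuple):
    fault_id: str
    neighbors: int
    times_candidate: int

def cross_reference(candidates: Dict[str, int],
                    neighbors: Dict[str, int]) -> List[FaultRecord]:
    """Join layout data (neighbors per cell) with production data (times candidate).

    Every cell in the layout map is kept; cells that never appeared among the
    candidates get times_candidate = 0.
    """
    return [FaultRecord(fid, n, candidates.get(fid, 0))
            for fid, n in sorted(neighbors.items())]

records = cross_reference({"F1": 1, "F2": 1, "F3": 2, "F4": 2},   # Table III
                          {"F1": 2, "F2": 2, "F3": 3, "F4": 1})   # Table IV
for rec in records:
    print(rec)
```

Keeping the never-candidate cells matters: questions such as "of the gates with M neighbors, how many appeared as a candidate?" need the total population of each density class as the denominator.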



Figure 3: General overview of the proposed methodology

### C. Correlation Analysis

The proposed automated tool cross-references data from manufacturing and data from the circuit design phase to provide cross-metrics that act as feedback to manufacturers.

Specifically, the proposed study can answer the following questions:

- Of the faulty gates that appeared as a candidate *N* times, how many are found in dense areas?
- Of the gates with *M* neighbors, how many appeared as a candidate during production and, if they appeared, how many times?
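The two questions above can be expressed as simple queries over the cross-referenced records. In this sketch each record is a hypothetical `(fault_id, neighbors, times_candidate)` tuple, and `dense_min`, the minimum number of neighbors for a gate to count as being in a dense area, is an assumed parameter not given in the paper.

```python
from collections import Counter
from typing import Iterable, Tuple

# One record per gate: (fault_id, neighbors, times_candidate)
Record = Tuple[str, int, int]

def dense_among_n_times(records: Iterable[Record], n: int, dense_min: int) -> int:
    """Q1: among gates that were a candidate exactly n times, count those in dense areas.

    `dense_min` (minimum neighbors for a dense area) is a hypothetical threshold.
    """
    return sum(1 for _, nb, times in records if times == n and nb >= dense_min)

def candidates_with_m_neighbors(records: Iterable[Record], m: int) -> Counter:
    """Q2: among gates with exactly m neighbors, histogram of how often they were candidates.

    The sum of the values answers "how many appeared"; the keys answer "how many times".
    """
    return Counter(times for _, nb, times in records if nb == m and times > 0)

recs = [("F1", 2, 1), ("F2", 2, 1), ("F3", 3, 2), ("F4", 1, 2)]  # Tables III and IV
print(dense_among_n_times(recs, n=2, dense_min=3))  # 1
print(candidates_with_m_neighbors(recs, m=2))       # Counter({1: 2})
```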

To this end, two different analyses help visualize the resulting cross data: *spatial analysis* and *confusion matrices*.

1) Spatial analysis: Figure 2 shows an example of the layout of a complex System-on-Chip. As described in the previous section, it highlights the density level using different shades of green.

This view can be enhanced by overlaying the candidate faulty gates identified during production on the image, giving a comprehensive picture of both where the faulty gates lie and where the dense areas are.

The spatial analysis shows at a glance the trend of the analyzed production lot together with the characteristics of the device itself.

2) Confusion matrices: Confusion matrices are typically used to evaluate the accuracy of a prediction: actual and predicted values are cross-referenced in the rows (actual values) and columns (predicted values) of a matrix. They are mostly used in machine learning to validate classification models.

They are helpful for visualizing the results of our correlation analysis because they make it easy to cross the two quantities under consideration: the number of neighbors of each faulty gate and the number of times it appeared as a candidate during production.

Thus, like the spatial analysis, they give manufacturers a comprehensive view of how production is going, in relation to the chip layout developed by designers in the earlier stages. In particular, each matrix cell represents the correlation between the two analyzed metrics.

## IV. EXPERIMENTAL RESULTS

The proposed methodology has been implemented as a tool written in C++. It takes as input the list of candidate faulty gates and their neighborhood information, extracted from the .DEF file describing the layout, and produces both text statistics and graphs (see Figure 3).

The tool has been tested on the candidate faulty gates identified during the production of a lot of automotive microcontrollers belonging to the SPC58 family produced by STMicroelectronics [16], together with the corresponding layout information, to generate the cross-reference analysis.

The case study has the following characteristics:

- about 20 million gates;
- about 700 thousand flip-flops;
- multi-core architecture;
- ASIL-D compliant.

### A. Plotted Graphs

The tool can cluster data based on:

1) density on the silicon;
2) number of times the element has been cataloged as a fault candidate during production testing;
3) physical location on the silicon.

These three groupings allow many different representations, of which we selected the most representative ones to illustrate the relation between cell density and the location of candidate faulty gates.

Figure 4a and Figure 4b highlight the relation between density classes and the corresponding number of faulty gates, pointing out how these two metrics are distributed. In Figure 4a, the blue section of each bar counts all the gates that fall in that density category, while the orange part counts only the gates selected at least once as possible fault candidates. Figure 4b highlights how the density classes are distributed across the faulty gates. While other figures better represent the relation between the number of times a specific element was selected as a candidate and the density of its region, Figure 4b shows at a glance the relationship among the three metrics: number of elements, number of times selected as a candidate, and density on the silicon.

Figure 4: Visualizations of plotted graphs: (b) candidates separated by neighborhood; (c) number of elements separated by candidates, logarithmic scale; (d) colored cell density and fault density; (e), (f) confusion matrices of neighbors vs. times they were a candidate; (g) colored cell density; (h) colored fault density.
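The data behind a bar chart like Figure 4a (all gates per density class vs. gates selected at least once as a candidate) can be sketched as follows. Using the number of neighbors directly as the density class follows the text's "density class (i.e., number of neighbors)"; the input records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def per_class_totals(records: Iterable[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Return {density_class: (all_gates, gates_candidate_at_least_once)}.

    Each record is a (neighbors, times_candidate) pair for one gate; the
    number of neighbors is used directly as the density class.
    """
    total: Counter = Counter()
    hit: Counter = Counter()
    for neighbors, times in records:
        total[neighbors] += 1
        if times > 0:
            hit[neighbors] += 1
    # Counter returns 0 for classes with no candidate gates
    return {cls: (total[cls], hit[cls]) for cls in sorted(total)}

# Hypothetical gates: (neighbors, times they were a candidate)
print(per_class_totals([(2, 1), (2, 0), (3, 2), (1, 0), (1, 0)]))
# {1: (2, 0), 2: (2, 1), 3: (1, 1)}
```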

Figure 4c shows, on a logarithmic scale, the number of elements grouped by the number of times each was identified as a candidate faulty gate. Although very similar to Figure 4b, it makes visible the categories that contain only a few elements.

Figure 4d examines the third clustering method: physical location. It shows the new spatial representation, which overlays a 2D map of gate density (see Figure 4g) and a second map of the density of the recorded faults (see Figure 4h).

The gate-density map encodes the data through three channels (color luminance, saturation, and hue), which, while not very effective for comparing different sections of the map, give a general idea of which regions are more or less densely populated.

The fault-density map, the main contribution of this work, uses a different representation. Instead of a color gradient alone, it is drawn as small 2D circular areas. This choice exploits a fourth property to convey magnitude: in addition to color luminance, saturation, and hue, the size of each area helps the observer interpret the data.

Both maps share the same coordinate space and were overlaid to make the relation between positions in the two datasets clearer. A shift in color hue and the well-defined shape of the fault markers allow a quick visual distinction between gate density and fault density.

The representation gives a general idea of the distribution of faults with respect to gate density. Although it provides little information on specific gates, it shows that regions with medium to low gate density also tend to have very few faults.
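The paper does not detail how the two overlaid maps are computed. As a hedged sketch, both gate and fault locations could be binned into the same square tiles (the shared coordinate space), and each fault marker's radius scaled with the square root of its tile's count, so that circle area rather than radius is proportional to the number of faults. The tile size, `max_radius`, and the square-root scaling are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

def bin_points(points: Iterable[Tuple[float, float]], tile: float) -> Counter:
    """Count points per square tile of side `tile` (same grid for gates and faults)."""
    if tile <= 0:
        raise ValueError("tile must be positive")
    return Counter((math.floor(x / tile), math.floor(y / tile)) for x, y in points)

def marker_radii(fault_bins: Counter, max_radius: float) -> Dict[Tuple[int, int], float]:
    """Scale circle radii so that circle *area* is proportional to the fault count."""
    peak = max(fault_bins.values(), default=0)
    if peak == 0:
        return {}
    return {tile: max_radius * math.sqrt(c / peak) for tile, c in fault_bins.items()}

# Hypothetical fault locations
faults = [(0.5, 0.5), (0.6, 0.4), (0.2, 0.9), (0.1, 0.1), (3.2, 0.5)]
bins = bin_points(faults, tile=1.0)
radii = marker_radii(bins, max_radius=2.0)
print(bins, radii)
```

Scaling the radius linearly with the count would make large counts look disproportionately big, which is why area-proportional markers are a common choice for this kind of overlay.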

To analyze in more detail the relation between the number of neighbors of each gate and the number of times it was identified as a candidate, we used the first and second types of clustering to produce Figure 4e and Figure 4f. These figures abandon the spatial dimension entirely in exchange for finer granularity in observing the relation between the two metrics listed earlier.

The color scale represents the probability that an element belonging to a specific density class (i.e., number of neighbors) is reported a specific number of times during production testing as a potential cause of failure (i.e., number of times it has been a candidate for failure). Because some density classes contain a very large number of elements, the data were normalized by the number of elements belonging to each class.
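The normalization described above turns each row of the neighbors vs. times-candidate matrix into an estimate of P(times candidate | density class). A minimal sketch, assuming hypothetical `(neighbors, times_candidate)` input pairs and a sparse dictionary-of-dictionaries layout (absent entries are zero), could be:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

def normalized_matrix(records: Iterable[Tuple[int, int]]) -> Dict[int, Dict[int, float]]:
    """Build the neighbors x times-candidate matrix, normalized per density class.

    Entry [n][t] is the fraction of gates with n neighbors that were reported
    as a candidate exactly t times, so every row sums to 1.
    """
    cells: Dict[int, Counter] = defaultdict(Counter)
    for neighbors, times in records:
        cells[neighbors][times] += 1
    matrix: Dict[int, Dict[int, float]] = {}
    for neighbors, row in sorted(cells.items()):
        size = sum(row.values())  # number of gates in this density class
        matrix[neighbors] = {t: c / size for t, c in sorted(row.items())}
    return matrix

# Hypothetical gates: (neighbors, times they were a candidate)
print(normalized_matrix([(2, 0), (2, 0), (2, 1), (2, 3), (5, 1)]))
# {2: {0: 0.5, 1: 0.25, 3: 0.25}, 5: {1: 1.0}}
```

Normalizing by row rather than by the whole matrix is what keeps heavily populated classes from dominating the color scale.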

### B. Execution Performance

To assess the scalability of the method, we report its resource usage. The most critical factor for our application is RAM occupancy, since a lot of information must be stored to map elements to each other. Specifically, for each pin of the entire circuit, we need enough memory to store:

1) a string containing the unique name of the pin;
2) an integer holding the number of neighbors of the element;
3) an integer, for each fault type (e.g., stuck-at 0, stuck-at 1), counting how many times the element has been a candidate as a possible fault.

To simplify searching for and inserting elements in the data structure, we used a hash map, which, at the cost of some RAM overhead, provides average constant-time access, independent of the number of records. For the case study under consideration, the code occupies about 1 GB of RAM and takes about 50 seconds to cross-reference the fault candidates obtained during production testing with the data from the density analysis. These figures were obtained by running the code on a single thread and without compressing the cell names. A more compact representation of pin names could significantly reduce memory occupancy, since the strings occupy the largest portion of the data structure. The execution time, instead, is dominated by reading from disk, which is the main performance bottleneck; consequently, a more efficient representation of element names in the source files would also shorten the execution time.
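The per-pin record listed above can be sketched with Python's built-in `dict`, which, like the hash map in the authors' C++ tool, offers average constant-time lookup and insertion. This is a hypothetical illustration, not the authors' code: the class names, the pin name `u_core0/u_alu/U123/A`, and the fault-type labels `SA0`/`SA1` are made up. As in the description above, each record is keyed by the full pin-name string.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class PinRecord:
    """Per-pin data: neighbor count and candidate counts per fault type."""
    neighbors: int = 0
    candidate_counts: Dict[str, int] = field(default_factory=dict)

class PinTable:
    """Hash map from unique pin name to its record (average O(1) access)."""

    def __init__(self) -> None:
        self._pins: Dict[str, PinRecord] = {}

    def _record(self, pin: str) -> PinRecord:
        # Create the record on first access so layout and test data can arrive in any order
        rec = self._pins.get(pin)
        if rec is None:
            rec = self._pins[pin] = PinRecord()
        return rec

    def set_neighbors(self, pin: str, neighbors: int) -> None:
        self._record(pin).neighbors = neighbors

    def add_candidate(self, pin: str, fault_type: str) -> None:
        counts = self._record(pin).candidate_counts
        counts[fault_type] = counts.get(fault_type, 0) + 1

    def get(self, pin: str) -> PinRecord:
        # Unknown pins are reported as an empty record without being inserted
        return self._pins.get(pin, PinRecord())

    def __len__(self) -> int:
        return len(self._pins)

# Hypothetical pin name and fault-type labels
table = PinTable()
table.set_neighbors("u_core0/u_alu/U123/A", 4)
table.add_candidate("u_core0/u_alu/U123/A", "SA0")
table.add_candidate("u_core0/u_alu/U123/A", "SA0")
print(table.get("u_core0/u_alu/U123/A"))  # PinRecord(neighbors=4, candidate_counts={'SA0': 2})
```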

## V. CONCLUSION

Electronics are ubiquitous in everyday life, and their reliability is essential, especially in fields where user safety is critical. Testing electronic components that are used in areas such as the automotive industry is of paramount importance.

However, the increasing complexity of devices makes the testing process increasingly difficult. For this reason, new ways must be found to optimize this process without compromising the required high-quality standards.

Our method provides a helpful tool for designers and manufacturers of electronic devices to optimize the testing process. By leveraging circuit layout information and manufacturing test results, our automated tool provides cross-metrics on the performance of the applied tests.

In future developments of the tool, other information, such as the number of cell interconnections, will also be considered to provide even more accurate feedback on the coverage of the applied test vectors and on the types of failures found in production. This approach has the potential to significantly improve the quality and efficiency of the electronic component testing process.

## REFERENCES

- [1] R. Srivastava *et al.*, "SoC time to market improvement through device driver reuse: An industrial experience," in *2012 International Symposium on Electronic System Design*, 2012.
- [2] "ISO 26262-[1-10], Road vehicles: Functional safety," 2011.
- [3] I. Chen and A. Strojwas, "Realistic yield simulation for VLSIC structural failures," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 6, no. 6, pp. 965–980, 1987.
- [4] J. Pineda de Gyvez and C. Di, "IC defect sensitivity for footprint-type spot defects," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 11, no. 5, pp. 638–658, 1992.
- [5] J. Khare, D. Feltham, and W. Maly, "Accurate estimation of defect-related yield loss in reconfigurable VLSI circuits," *IEEE Journal of Solid-State Circuits*, vol. 28, no. 2, pp. 146–156, 1993.
- [6] A. Dalal, P. Franzon, and M. Lorenzetti, "A layout-driven yield predictor and fault generator for VLSI," *IEEE Transactions on Semiconductor Manufacturing*, vol. 6, no. 1, pp. 77–82, 1993.
- [7] G. Allan, "Yield prediction by sampling IC layout," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 19, no. 3, pp. 359–371, 2000.
- [8] P. Maxwell *et al.*, "Bridge over troubled waters: Critical area based pattern generation," in *IEEE European Test Symposium*, 2017.
- [9] G. Iaria *et al.*, "A novel pattern selection algorithm to reduce the test cost of large automotive systems-on-chip," in *2022 IEEE 23rd Latin American Test Symposium (LATS)*, 2022, pp. 1–6.
- [10] A. Benso *et al.*, "ATPG for dynamic burn-in test in full-scan circuits," in *Asian Test Symposium*, 2006.
- [11] A. Birolini, "Reliability engineering theory and practice," *Springer*, 2017.
- [12] H. Chen, "Beyond structural test, the rising need for system-level test," in *International Symposium on VLSI Design, Automation and Test*, 2018.
- [13] I. Pomeranz and S. Reddy, "On dictionary-based fault location in digital logic circuits," *IEEE Transactions on Computers*, vol. 46, no. 1, pp. 48–59, 1997.
- [14] P. Bernardi, M. Grosso, and M. S. Reorda, "An adaptive tester architecture for volume diagnosis," in *2010 15th IEEE European Test Symposium*, 2010, pp. 227–232.
- [15] W. Ruggeri *et al.*, "Innovative methods for burn-in related stress metrics computation," in *International Conference on Design Technology of Integrated Systems in Nanoscale Era*, 2021.
- [16] STMicroelectronics, "SPC58NN84C3, 32-bit Power Architecture MCU for high performance applications," https://www.st.com/en/automotive-microcontrollers/spc58nn84c3.html.