# POLITECNICO DI TORINO Repository ISTITUZIONALE

# A 28-nm CMOS pixel read-out ASIC for real-time tracking with time resolution below 20 ps

Original

A 28-nm CMOS pixel read-out ASIC for real-time tracking with time resolution below 20 ps / Cadeddu, Sandro; Frontini, Luca; Lai, Adriano; Liberali, Valentino; Piccolo, Lorenzo; Rivetti, Angelo; Stabile, Alberto. - ELETTRONICO. - (2020), pp. 1-5. (Intervento presentato al convegno 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) tenutosi a Boston, MA, USA nel 31 Oct.-7 Nov. 2020) [10.1109/NSS/MIC42677.2020.9507912].

Availability: This version is available at: 11583/2954930 since: 2022-02-14T16:59:22Z

Publisher: IEEE

Published DOI:10.1109/NSS/MIC42677.2020.9507912

Terms of use:

This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository

Publisher copyright IEEE postprint/Author's Accepted Manuscript

©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collecting works, for resale or lists, or reuse of any copyrighted component of this work in other works.


# A 28-nm CMOS pixel read-out ASIC for real-time tracking with time resolution below 20 ps

Sandro Cadeddu, Luca Frontini, Adriano Lai, Valentino Liberali, Lorenzo Piccolo, Angelo Rivetti and Alberto Stabile

Abstract–We present the development of a test ASIC, named Timespot1, designed in 28-nm CMOS technology, featuring a $32 \times 32$ pixel matrix with a pitch of 55 $\mu$m. The ASIC is conceived as the first prototype in a series, capable of reading out pixels with timing capabilities in the range of 30 ps and below. Each pixel is endowed with a charge amplifier, a discriminator and a Time-to-Digital Converter, capable of time resolutions below 20 ps and read-out rates (per pixel) around 3 MHz. The timing performance is obtained within a power budget of about 50 $\mu$W per pixel, corresponding to a power density of approximately 2 W/cm<sup>2</sup>. This feature makes the Timespot1 approach an interesting solution for vertex detectors of the next generation of colliders, where high space and time resolutions will be mandatory to cope with the huge number of tracks per event to be detected and processed.

### I. INTRODUCTION

Vertex detectors being conceived for the next generation of collider experiments will have to cope with a huge number of tracks per event. To tackle this problem, it will be mandatory to operate pixel sensors having both high space and time resolutions. Typical requirements are space resolutions of about 10 $\mu$m and time resolutions of 50 ps or better [1]. Dedicated development activities have already started to study possible technical solutions. The TIMESPOT project aims at the production of a small-scale demonstrator, which includes both a pixel sensor with a pixel size of $55 \times 55$ $\mu$m<sup>2</sup> and a pixel read-out chip satisfying the previously mentioned requirements. This demonstrator includes an ASIC, named Timespot1 and described in this document, which will be bump-bonded to dedicated 3D silicon sensors that have already shown a time resolution between 20 and 30 ps [2,3]. The electronics must preserve the detector time resolution, both in terms of jitter and of time measurement resolution. The system must detect the space-time information; for this reason, the implemented architecture is a charge pre-amplification stage followed by a discriminator and a Time-to-Digital Converter (TDC) for the digitization of the timing information. The power budget available to the electronics is constrained by the power dissipation system employed in the experiments, which imposes a power consumption limit of about 50 $\mu$W per pixel in our case.

Manuscript received November 1, 2020. (Corresponding author: Sandro Cadeddu.) This work was supported by the Fifth Scientific Commission (CSN5) of the Italian National Institute for Nuclear Physics (INFN), Project TIMESPOT (CSN5 open-call contest, 2017).

S. Cadeddu and A. Lai are with the I.N.F.N. Sezione di Cagliari, 09042 Cagliari, Italy (e-mail: sandro.cadeddu@ca.infn.it; adriano.lai@ca.infn.it).

L. Frontini, V. Liberali and A. Stabile are with INFN, Sezione di Milano and University of Milano, Via Celoria 16, 20133 Milano, Italy (e-mail: luca.frontini@mi.infn.it; alberto.stabile@mi.infn.it).

L. Piccolo is with INFN, Sezione di Torino, Via P. Giuria, 10125 - Torino, Italy, and Politecnico di Torino, Corso Duca degli Abruzzi, 24, 10129 Torino, Italy (e-mail: lorenzo.piccolo@to.infn.it).

A. Rivetti is with INFN, Sezione di Torino, Via P. Giuria, 10125 - Torino, Italy (e-mail: rivetti@to.infn.it).

### II. GENERAL ARCHITECTURE

The Timespot1 ASIC is designed to provide a readout array suitable for chip-to-chip bump-bonding with 3D pixel arrays having a pixel size of $55 \times 55\ \mu\text{m}^2$. The Timespot1 chip implements 1024 pixel readout channels, organized in a $32 \times 32$ matrix, each channel being equipped with its own Analog Front-End and TDC. Fig. 1 shows the block architecture of the chip.



Fig. 1. Timespot1 block architecture

The input channels are connected in groups of 256 to one Read Out Tree (ROT) block, which collects data from the active channels, assigns them a global timestamp and sends the formatted data to one of the two serializers, which are connected to LVDS drivers that output the data for acquisition.

Several service structures are implemented in the chip: DACs for voltage references, and a PLL and a DCO generating the needed clock frequency. Timespot1 is controlled and configured via I<sup>2</sup>C interfaces.



Fig. 2. Floorplan of the pixel blocks, summing up to a  $32 \times 32$  pixel matrix. Left:  $16 \times 16$  pixel modules. Right: 4 modules abutted to build up a larger matrix. The structure can be repeated to obtain larger sizes.

The array is arranged in  $16 \times 16$  pixel blocks. The size of each readout cell is  $50 \times 55 \ \mu\text{m}^2$ , to save 80  $\mu\text{m}$  every 16 pixels for distribution of analog references and power supplies on one side, and for digital power supplies and the read-out tree on the other side (fig. 2). The pitch of 55  $\mu\text{m}$  of the sensor pixel matrix is kept using a suitable re-distribution layer in the top metal layer.
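The 80 $\mu$m figure follows directly from the difference between the sensor pitch and the read-out cell width, accumulated over the 16 cells of a block:

$$16 \times (55 - 50)\ \mu\text{m} = 80\ \mu\text{m}.$$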

At the bottom of the $32 \times 32$ pixel matrix, four Read Out Tree (ROT) blocks collect the data from the pixels (Fig. 3). After a global timestamp is added, data are sent out through a serializer.



Fig. 3. ROT at the bottom of the pixel matrix.

### III. PIXEL ARCHITECTURE

The pixel circuit is based on an architecture consisting of an amplifier, a discriminator and a TDC, that performs the Time of Arrival (TA) and the Time Over Threshold (TOT) measurements at the same time. The available pixel area is  $50 \times 55 \ \mu\text{m}^2$ . A brief description of the main pixel circuits is given in the following.

#### A. Analog Front-End

The analog front-end circuit is presented in Fig. 4.



Fig. 4. Schematic representation of the analog part of the pixel-front-end.

This scheme consists of a charge sensitive amplifier (CSA), directly connected to the sensor, which converts the input current signal to a voltage one. The CSA is followed by a leading-edge discriminator (LED) which generates a digital pulse with a rising edge aligned to the crossing time between the CSA signal and a settable threshold. Every pixel is equipped with a charge injection capacitance for testing purposes. The layout of one pixel-front-end unit is presented in Fig. 5.



Fig. 5. Layout of the analog front-end. The total area is  $15 \times 50 \ \mu m^2$ .

Reference voltages and bias currents are provided by dedicated service blocks, each of which serves many channels. These blocks comprise 4 Sigma-Delta DACs and a digitally programmable bias-generation cell that enables power consumption regulation. These circuits are implemented in the analog column at the side of the matrix and serve 512 channels.

The CSA signal amplitude is proportional to the input charge. The CSA also features a constant-current discharge feed-back, which produces a signal with a constant and approximately linear relationship between its amplitude and width. In this way, the LED pulse can be used to retrieve the input charge by means of its TOT. The TOT measurement is also used to correct the threshold-crossing time-walk, which is a function of the signal amplitude. The scheme has been designed to produce an output signal with a total jitter lower than 20 ps while consuming less than 20 $\mu$W of static power, as indicated by the system-level simulations presented in Tab. I.

TABLE I. SIMULATED JITTER PERFORMANCE OF A SINGLE PIXEL-FRONT-END CHANNEL INSERTED IN A COMPLETE 1024-CHANNELS-MATRIX SYSTEM. THE SLEW-RATE AND RMS-NOISE FIGURES ARE REFERRED TO THE CSA OUTPUT. THE PER-CHANNEL POWER CONSUMPTION INCLUDES SERVICE BLOCKS

| Sim. Type          | Schematic | Schematic | Post-Layout | Post-Layout |
|--------------------|-----------|-----------|-------------|-------------|
| Power Regime       | Nominal   | Hi-Power  | Nominal     | Hi-Power    |
| Slew-Rate (mV/ns)  | 380       | 540       | 250         | 360         |
| Rms noise (mV)     | 5.0       | 4.9       | 3.9         | 3.8         |
| Jitter (ps)        | 13.2      | 9.1       | 15.6        | 10.5        |
| Power/Channel (µW) | 18.2      | 31.5      | 18.6        | 32.9        |
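The jitter figures of Tab. I are consistent with the usual first-order estimate for a leading-edge discriminator, in which the time jitter is the rms noise divided by the signal slew rate at the threshold crossing. The short Python sketch below recomputes the jitter from the slew-rate and noise rows of the table. The formula is a standard approximation that is assumed here rather than taken from the simulation setup, and all names are illustrative.

```python
# First-order jitter estimate: sigma_t = sigma_v / (dV/dt).
# Slew rate, noise and jitter values are copied from Tab. I; the formula is an
# assumed first-order approximation, not the simulation method itself.
TABLE_I = {
    # (simulation type, power regime): (slew rate [mV/ns], rms noise [mV], quoted jitter [ps])
    ("schematic", "nominal"): (380.0, 5.0, 13.2),
    ("schematic", "hi-power"): (540.0, 4.9, 9.1),
    ("post-layout", "nominal"): (250.0, 3.9, 15.6),
    ("post-layout", "hi-power"): (360.0, 3.8, 10.5),
}


def jitter_ps(slew_mv_per_ns, noise_mv):
    """Return the estimated jitter in ps (noise / slew rate, converted from ns)."""
    return noise_mv / slew_mv_per_ns * 1000.0


for (sim, regime), (slew, noise, quoted) in TABLE_I.items():
    estimate = jitter_ps(slew, noise)
    print(f"{sim:>11s} {regime:>8s}: estimated {estimate:5.2f} ps, quoted {quoted:5.1f} ps")
```

For all four columns the estimate agrees with the quoted jitter to within 0.1 ps.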

The CSA core voltage amplifier has been implemented using a cascoded variation of a traditional inverter, to boost the total voltage gain of this stage. The inverter scheme has been chosen over the usual common-source one because the summed trans-conductance of the two input transistors increases the slew-rate at the same power consumption. A well-defined bias point at the two inputs is mandatory to keep both the P and N transistors in saturation. Due to process variations, this condition cannot be achieved for both transistors with the same DC point. For this reason, AC coupling has been used to separate the two low-frequency voltages. The active feed-back loop has been used to define these voltages and to provide the constant-current discharge used to compensate the sensor leakage current. The jitter performance of this architecture is presented in Fig. 6.



Fig. 6. CSA simulated jitter performance versus total bias current. Red line: schematic simulation. Blue line: post-layout simulation. The simulated current range corresponds to the actual on-chip programmable one.

Finally, the LED scheme has been implemented with a discrete-time offset correction circuit, used to compensate mismatch-dependent effects and to set an absolute threshold value above a baseline level.
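As a rough illustration of how a TOT measurement can be used to correct the time-walk of a leading-edge discriminator, the following Python sketch uses a toy triangular pulse: a linear rise to the peak in a fixed peaking time, followed by a linear (constant-current) discharge. The pulse shape, the parameter values and all function names are assumptions made for this example only; they are not the simulated CSA response, and a real correction would presumably rely on a measured calibration curve.

```python
import math


def time_walk_ns(amplitude_mv, threshold_mv, t_peak_ns):
    """Threshold-crossing delay on the (assumed linear) rising edge."""
    return t_peak_ns * threshold_mv / amplitude_mv


def tot_ns(amplitude_mv, threshold_mv, t_peak_ns, discharge_mv_per_ns):
    """TOT of the toy pulse: linear rise to the peak, then linear discharge."""
    t_up = time_walk_ns(amplitude_mv, threshold_mv, t_peak_ns)
    t_down = t_peak_ns + (amplitude_mv - threshold_mv) / discharge_mv_per_ns
    return t_down - t_up


def amplitude_from_tot(tot, threshold_mv, t_peak_ns, discharge_mv_per_ns):
    """Invert tot_ns() for the amplitude (positive root of A^2 + b*A + c = 0)."""
    b = (t_peak_ns - tot) * discharge_mv_per_ns - threshold_mv
    c = -t_peak_ns * threshold_mv * discharge_mv_per_ns
    return (-b + math.sqrt(b * b - 4.0 * c)) / 2.0


# Hypothetical operating point, chosen only to exercise the functions.
THRESHOLD_MV = 20.0
T_PEAK_NS = 1.0
DISCHARGE_MV_PER_NS = 1.0

for amplitude in (50.0, 100.0, 200.0):
    measured_tot = tot_ns(amplitude, THRESHOLD_MV, T_PEAK_NS, DISCHARGE_MV_PER_NS)
    recovered = amplitude_from_tot(measured_tot, THRESHOLD_MV, T_PEAK_NS, DISCHARGE_MV_PER_NS)
    correction = time_walk_ns(recovered, THRESHOLD_MV, T_PEAK_NS)
    print(f"A = {amplitude:5.1f} mV: TOT = {measured_tot:6.1f} ns, "
          f"time-walk correction = {correction * 1000.0:5.1f} ps")
```

In this model the TOT grows almost linearly with the amplitude, so a single TOT value identifies the amplitude and hence the delay to subtract from the measured time of arrival.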

#### B. Time to Digital Converter

The constraints of small area, low power consumption and high resolution drive the choice of architecture. To keep the leakage current, which is a very relevant issue in 28-nm CMOS technology, as low as possible, we decided to use High Voltage Threshold (HVT) transistors. At the same time, we must satisfy the target requirement of a time resolution below 30 ps (LSB). We chose to implement a Vernier architecture [4], with two fully digital Digitally-Controlled Oscillators (DCO) working at a frequency below 1 GHz. In a Vernier architecture, the resolution is given by the difference between the two periods, so it is possible to set suitable DCO frequencies to obtain a resolution below the required value of 30 ps.



Fig. 7. Vernier Scheme Architecture
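The Vernier principle described above can be sketched with a small behavioural model in Python. A slow oscillator is started by the hit, a slightly faster one by the reference edge arriving later, and the number of cycles needed for the fast oscillator to catch up encodes the interval in units of the period difference. The periods used (1100 ps and 1090 ps, i.e. a 10 ps LSB), the start/stop assignment and the function name are assumptions for illustration; they are not the actual Timespot1 settings.

```python
def vernier_measure(delta_t_ps, t_slow_ps, t_fast_ps, max_cycles=100_000):
    """Behavioural model of a Vernier interval measurement.

    The slow oscillator starts at t = 0, the fast one at t = delta_t_ps.
    Returns (cycle count, estimated interval in ps) at the first coincidence.
    All times are integers in ps to avoid floating-point rounding.
    """
    lsb_ps = t_slow_ps - t_fast_ps
    if lsb_ps <= 0:
        raise ValueError("the fast oscillator must have the shorter period")
    for n in range(max_cycles + 1):
        slow_edge = n * t_slow_ps               # n-th edge of the slow oscillator
        fast_edge = delta_t_ps + n * t_fast_ps  # n-th edge of the fast oscillator
        if fast_edge <= slow_edge:              # the fast oscillator has caught up
            return n, n * lsb_ps
    raise RuntimeError("no coincidence found within max_cycles")


count, estimate_ps = vernier_measure(delta_t_ps=137, t_slow_ps=1100, t_fast_ps=1090)
print(count, estimate_ps)  # 14 140
```

The estimate overshoots the true interval by less than one LSB, which is the quantization error expected from the period difference.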

The fully digital DCO scheme is shown in Fig. 8. It is realized with a group of tri-state buffers activated according to the delay step needed, allowing a fine control of the DCO period. The ring is completed with a simple tapped delay line allowing a coarse control of the DCO period, for better compensation of large process and/or environment variations.



Fig. 8. The Fully digital DCO scheme

The DCOs are calibrated using an automatic procedure that sets the period of DCO\_1 (Fig. 8) slightly smaller than that of DCO\_0. During calibration the period difference, i.e. the resolution, can be set to one of four regimes: High, Mid-High, Mid-Low and Low Resolution. Table II summarizes the post-layout resolutions for different corners and resolution regimes.

TABLE II. POST LAYOUT RESOLUTION OBTAINED AT DIFFERENT CORNERS AND FOR DIFFERENT RESOLUTION REGIMES

| Res. Regime | High     | High     | Mid-High | Mid-High | Mid-Low  | Mid-Low  | Low      | Low      |
|-------------|----------|----------|----------|----------|----------|----------|----------|----------|
| Corner      | LSB (ps) | RMS (ps) | LSB (ps) | RMS (ps) | LSB (ps) | RMS (ps) | LSB (ps) | RMS (ps) |
| MIN         | 6        | 1.7      | 18       | 5.2      | 25       | 7.5      | 69       | 20.2     |
| TYP         | 9        | 2.9      | 20       | 6.1      | 31       | 9.3      | 42       | 12.4     |
| MAX         | 12       | 4        | 30       | 9.2      | 30       | 9.2      | 50       | 14.7     |
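A useful cross-check of Table II is the quantization floor of an ideal converter, whose rms error is LSB/$\sqrt{12}$ for a uniformly distributed input. The Python sketch below compares this floor with the quoted RMS values. Interpreting the RMS column in this way is an assumption on our side; the table values are copied as they are.

```python
import math

REGIMES = ("High", "Mid-High", "Mid-Low", "Low")
TABLE_II = {
    # corner: [(LSB [ps], quoted RMS [ps]) for each regime in REGIMES]
    "MIN": [(6, 1.7), (18, 5.2), (25, 7.5), (69, 20.2)],
    "TYP": [(9, 2.9), (20, 6.1), (31, 9.3), (42, 12.4)],
    "MAX": [(12, 4.0), (30, 9.2), (30, 9.2), (50, 14.7)],
}


def quantization_rms_ps(lsb_ps):
    """Rms error of an ideal quantizer with step lsb_ps (uniform input assumed)."""
    return lsb_ps / math.sqrt(12.0)


for corner, rows in TABLE_II.items():
    for regime, (lsb, rms) in zip(REGIMES, rows):
        floor = quantization_rms_ps(lsb)
        print(f"{corner} {regime:>8s}: LSB {lsb:2d} ps, floor {floor:5.2f} ps, "
              f"quoted {rms:5.1f} ps, ratio {rms / floor:4.2f}")
```

The quoted values lie close to or slightly above the ideal floor, which would be expected if quantization is the dominant contribution to the TDC resolution.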

The TDC performs two different kinds of measurement at the same time. The first, and most precise, is the Time of Arrival (TA) of the input signal coming from the Front-End discriminator. This measurement is done with respect to the 40 MHz master clock and its resolution can reach 10 ps LSB. In parallel, the TDC also performs a Time Over Threshold (TOT) measurement, used for time-walk correction, with a resolution of around 1 ns and a maximum TOT of 250 ns. When an input signal arrives at the TDC, the TDC cannot accept other inputs for 300 ns, the time needed to complete both the TA and TOT measurements. Therefore, the maximum sustainable rate from a single pixel is about 3 MHz.
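The rate limit follows from the dead time: a pixel that is blind for 300 ns after each hit cannot register more than one hit per dead-time interval, so that

$$f_{\max} = \frac{1}{t_{\text{dead}}} = \frac{1}{300\ \text{ns}} \approx 3.3\ \text{MHz},$$

where the symbols $f_{\max}$ and $t_{\text{dead}}$ are introduced here only for illustration.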

TABLE III. POWER CONSUMPTION FOR DIFFERENT DATA RATE.

|                   | Total Power Cons. (µW) |
|-------------------|------------------------|
| IDLE              | 20.7                   |
| Calibration       | 552                    |
| Input rate 3 MHz   | 175                    |
| Input rate 1 MHz   | 69                     |
| Input rate 500 kHz | 45                     |
| Input rate 100 kHz | 25                     |

Power consumption is one of the main constraints, together with the required resolution. For the TDC, the power consumption depends on the input rate, with a leakage power of around 3 $\mu$W. In IDLE mode, while the TDC is waiting for an input signal, the DCOs are switched off to minimize the power, and the consumption is around 20 $\mu$W. Tab. III summarizes the power consumption for different data rates.
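To relate these figures to the per-pixel budget of about 50 $\mu$W, the Python sketch below adds the nominal post-layout front-end power of Tab. I to the TDC power of Tab. III, and converts the result into a power density over the $55 \times 55\ \mu\text{m}^2$ pixel. Treating the pixel power as the plain sum of these two contributions is an assumption of this example, as is the extraction of an energy per conversion from the rate dependence; the names are illustrative.

```python
AFE_POWER_UW = 18.6  # Tab. I, post-layout simulation, nominal power regime
TDC_POWER_UW = {     # Tab. III: input rate [Hz] -> TDC power [uW]; 0 stands for IDLE
    0: 20.7,
    100_000: 25.0,
    500_000: 45.0,
    1_000_000: 69.0,
    3_000_000: 175.0,
}
PIXEL_PITCH_CM = 55e-4  # 55 um expressed in cm


def energy_per_hit_pj(rate_hz):
    """Rate-dependent TDC energy per conversion: (P(rate) - P_idle) / rate, in pJ."""
    extra_uw = TDC_POWER_UW[rate_hz] - TDC_POWER_UW[0]
    return extra_uw / rate_hz * 1e6  # uW / Hz = uJ, and 1 uJ = 1e6 pJ


def power_density_w_cm2(pixel_power_uw):
    """Power density for one pixel dissipating pixel_power_uw over pitch^2."""
    return pixel_power_uw * 1e-6 / PIXEL_PITCH_CM ** 2


for rate_hz, tdc_uw in TDC_POWER_UW.items():
    total_uw = AFE_POWER_UW + tdc_uw  # assumed: pixel power = front-end + TDC
    line = (f"{rate_hz / 1e3:6.0f} kHz: pixel {total_uw:6.1f} uW, "
            f"{power_density_w_cm2(total_uw):5.2f} W/cm^2")
    if rate_hz:
        line += f", {energy_per_hit_pj(rate_hz):4.1f} pJ/hit"
    print(line)
```

Under these assumptions the sum is about 39 $\mu$W in IDLE and about 44 $\mu$W at 100 kHz, while the rate-dependent part of the TDC power corresponds to roughly 43 to 51 pJ per conversion. A pixel dissipating exactly 50 $\mu$W corresponds to about 1.65 W/cm<sup>2</sup>.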

The TDC layout is shown in Fig. 9. The cell measures $50 \times 31.5\ \mu\text{m}^2$.



Fig. 9. Layout of the TDC cell. The cell measures $50 \times 31.5\ \mu\text{m}^2$.

### IV. READ-OUT TREE

The data corresponding to every hit come from the TDC (23 bits), serialized at 160 MHz. Figure 1 shows the block diagram of the Read Out Tree (ROT) and data serialization circuit. The TDC data are paired with the corresponding time-stamp information (9 bits), which indicates the sequential number of the bunch crossing in which the hit data are generated.

To avoid losing data when a pixel is hit again before the previous data have been read, two cache memories for each TDC are used to store the hit information waiting to be read out.

While the data are stored in the cache memories, an asynchronous ROT [5] sequentially reads the data from the buffers at a rate of 160 MHz and frees them. The ROT generates the geographical coordinates (8 bits) of the pixel from which the hit data come and implements a zero-suppression feature.

The output of the ROT, made of the 8-bit address and the 32-bit TDC data, is fed to one of two FIFOs, each made of 32 layers and working at 160 MHz. The FIFOs are used to mitigate activity peaks that could otherwise cause hit data loss.

The 40-bit output of the FIFO is encoded using a custom transmission protocol. The protocol is organized in 8-bit words and is constructed as follows: when data are present at the FIFO output, they are divided into five bytes preceded by a header byte; otherwise the protocol block transmits an idle byte. The idle and header words can be set using the $I^2C$ interface.
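The framing just described can be sketched in Python as follows. The field widths (8-bit address, 23-bit TDC data, 9-bit time-stamp) come from the text, whereas the field order inside the 40-bit word, the byte order and the header and idle values are assumptions chosen for this illustration; on the chip the header and idle words are programmable.

```python
# Field widths are taken from the text; field order, byte order and the
# header/idle values are hypothetical choices made for this sketch only.
ADDRESS_BITS = 8
TDC_BITS = 23
TIMESTAMP_BITS = 9
HEADER_BYTE = 0xA5  # hypothetical value (programmable on the chip)
IDLE_BYTE = 0x00    # hypothetical value (programmable on the chip)


def pack_hit(address, tdc, timestamp):
    """Pack one hit into a 6-byte frame: header byte + 40-bit payload."""
    fields = ((address, ADDRESS_BITS), (tdc, TDC_BITS), (timestamp, TIMESTAMP_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in its bit width")
    payload = (address << (TDC_BITS + TIMESTAMP_BITS)) | (tdc << TIMESTAMP_BITS) | timestamp
    return bytes([HEADER_BYTE]) + payload.to_bytes(5, "big")


def unpack_hit(frame):
    """Inverse of pack_hit(): return (address, tdc, timestamp)."""
    if len(frame) != 6 or frame[0] != HEADER_BYTE:
        raise ValueError("not a valid hit frame")
    payload = int.from_bytes(frame[1:], "big")
    timestamp = payload & ((1 << TIMESTAMP_BITS) - 1)
    tdc = (payload >> TIMESTAMP_BITS) & ((1 << TDC_BITS) - 1)
    address = payload >> (TDC_BITS + TIMESTAMP_BITS)
    return address, tdc, timestamp


frame = pack_hit(address=0x12, tdc=0x7FFFFF, timestamp=0x1FF)
print(frame.hex())        # a512ffffffff
print(unpack_hit(frame))  # (18, 8388607, 511)
```

Each hit therefore occupies six bytes on the link, which is the figure that matters when estimating the sustainable hit rate.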

The output block delivers a new byte at a frequency of 160 MHz. Each byte is serialized, converted to DDR and transmitted to the output over an LVDS link at 1280 Mbit/s.

The maximum output bandwidth is 10.24 Gbit/s and the chip maximum average hit rate is about 200 kHz.
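These numbers can be tied together with a short calculation, sketched below in Python. It assumes that every hit occupies one 6-byte frame (header plus five data bytes), that no idle bytes are transmitted at saturation, and that the quoted hit rate is an average per pixel over the 1024 channels; the last point is our reading of the figure, not a statement from the design specification.

```python
BYTE_RATE_HZ = 160_000_000            # one byte per 160 MHz cycle
BITS_PER_BYTE = 8
FRAME_BYTES = 6                       # header byte + five data bytes per hit
TOTAL_BANDWIDTH_BPS = 10_240_000_000  # maximum output bandwidth quoted in the text
N_PIXELS = 32 * 32

link_rate_bps = BYTE_RATE_HZ * BITS_PER_BYTE
hits_per_second = TOTAL_BANDWIDTH_BPS / (FRAME_BYTES * BITS_PER_BYTE)
hit_rate_per_pixel_hz = hits_per_second / N_PIXELS  # assumed: uniform occupancy

print(f"single LVDS link: {link_rate_bps / 1e6:.0f} Mbit/s")
print(f"chip: {hits_per_second / 1e6:.1f} Mhit/s, "
      f"{hit_rate_per_pixel_hz / 1e3:.0f} kHz per pixel")
```

With these assumptions the result is about 213 Mhit/s for the whole chip, or about 208 kHz per pixel, in line with the quoted value of about 200 kHz.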

A block scheme of the ROT is shown in fig. 10.



### V. CONCLUSION

The complete set of characteristics of the ASIC and the results from post-layout simulations are presented in this document. In particular, we have illustrated the design solutions that made it possible to meet the requirements for a high-density pixel read-out circuit with timing capabilities.

The circuit was submitted for fabrication in October 2020. It measures $2.618 \times 2.288\ \text{mm}^2$. The full chip layout is shown in Fig. 11.



Fig. 11. Full chip layout. The chip measures $2.618 \times 2.288$ mm<sup>2</sup>.

### REFERENCES

- [1] LHCb Collaboration, "VELO supporting document for the Upgrade-II FTDR", in preparation, 2020.
- [2] A. Lai et al., "First results of the TIMESPOT project on developments on fast sensors for future vertex detectors", to appear in Nuclear Instruments and Methods in Physics Research, Section A (NIMA), 2020.
- [3] L. Anderlini et al., "Intrinsic time resolution of 3D-trench silicon pixels for charged particle detection", arXiv eprint 2004.10881, submitted to JINST, 2020.
- [4] J. Kalisz, "Review of methods for time interval measurements with picosecond resolution", Metrologia 41, 2003, DOI: 10.1088/0026-1394/41/1/004.
- [5] P. Fischer, "First implementation of the MEPHISTO binary readout architecture for strip detectors", NIM Section A, Volume 461, Issues 1–3, 2001, Pages 499-504.