# POLITECNICO DI TORINO Repository ISTITUZIONALE

## Original

A 6-to-38Gb/s capture-range bang-bang clock and data recovery circuit with deliberate-current-mismatch frequency detection and interpolation-based multiphase clock generation / Lin, Wang; Yong, Chen; Chaowei, Yang; Xionghui, Zhou; Mei, Han; Crovetti, PAOLO STEFANO; Pui-In, Mak; Rui P., Martins. - In: INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS. - ISSN 1097-007X. - ELETTRONICO. - 51:5(2023), pp. 1988-2015. [10.1002/cta.3535]

Availability: This version is available at: 11583/2978392 since: 2023-05-08T15:55:30Z

Publisher: John Wiley & Sons, Ltd.

Published DOI: 10.1002/cta.3535

Terms of use: This article is made available under terms and conditions as specified in the corresponding bibliographic description in the repository.

Publisher copyright: Wiley postprint/Author's Accepted Manuscript. This is the peer reviewed version of the above quoted article, which has been published in final form at http://dx.doi.org/10.1002/cta.3535. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

# A 6-to-38Gb/s Capture-Range Bang-Bang Clock and Data Recovery Circuit with Deliberate-Current-Mismatch Frequency Detection and Interpolation-Based Multi-Phase Clock Generation

Lin Wang <sup>1</sup>, Yong Chen <sup>1</sup>, Chaowei Yang <sup>1</sup>, Xionghui Zhou <sup>1</sup>, Mei Han <sup>1</sup>, Crovetti Paolo Stefano <sup>2</sup>, Pui-In Mak, and Rui P. Martins <sup>1,3</sup>

Correspondence: Yong Chen

<sup>1</sup> State-Key Laboratory of Analog and Mixed-Signal VLSI and IME/ECE-FST, University of Macau, Macau, China

<sup>2</sup> Department of Electronics and Telecommunications (DET), Politecnico di Torino, Torino, Italy

<sup>3</sup> on leave from the Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal

#### **Abstract**

This paper reports a bang-bang clock and data recovery circuit (BBCDR) with an ultra-wide capture range. The circuit exhibits automatic frequency capture and phase locking over a wide 6-to-38Gb/s range without using a frequency detector, enabled by a recently proposed deliberate-current-mismatch technique. Moreover, we accurately obtain an eight-phase clock through analog interpolation of quadrature signals over the whole frequency range by introducing a tunable capacitor array before an inverter-based phase interpolator. A 65-nm prototype of the developed BBCDR occupies an area of 0.07 mm<sup>2</sup> and attains a bit error rate of less than 10<sup>-12</sup> under a continuously variable input frequency, with a total power consumption of 24.6 mW for a 32-Gb/s non-return-to-zero (NRZ) input, thus leading to 0.769-pJ/bit energy efficiency.

### **Index Terms**

Bang-Bang clock and data recovery (BBCDR), wide capture range, phase interpolator (PI), frequency detector (FD), switched-capacitor (SC) array, hybrid control circuit (HCC), current mismatch, ring oscillator (RO), R-2R digital-to-analog converter (DAC).

## I. Introduction

The exponential growth in data transmission over the last decades is driving a swift evolution in wireline receivers [1-2], with serializer-deserializer (SerDes) systems facing continuously increasing performance demands. In this scenario, SerDes operating at tens of gigabits per second are necessary to accommodate the growing data traffic over a limited number of channels.
Moreover, flexible wireline communication systems require a wide capture range [3] to meet the different requirements of a variety of standards and applications. Meeting both high-speed and wide-capture-range requirements simultaneously, without compromising reliability and power consumption, is extremely challenging and an area of intense recent research. To cover input data rates from a few gigabits per second to tens of gigabits per second, the "analog" receiver necessitates a clock and data recovery (CDR) circuit with a very wide frequency capture capability [4]. Previous work [5] employs a low-noise crystal as a reference clock, which increases the overall cost and the design complexity. Reference-less CDRs mainly target applications in which an external crystal is not feasible [6]. However, a reference-less CDR must extract the frequency and phase information from the incoming data, which makes wide-range frequency acquisition hard to achieve. State-of-the-art CDRs use a separate frequency detector (FD) [7-13] and a dual-loop structure [14-19] to extend the capture range, thus increasing the design complexity and hardware cost. Another line of work has opted for an analog-to-digital converter (ADC)-based scheme with a large amount of signal processing in the digital domain [20-24], which leads to large power consumption and loop latency. Although some solutions have been proposed in [25-31], they either require complex loop topologies [25,31] and large power consumption [26-28] or result in a limited capture range [29-30]. To tackle these issues, we developed a deliberate-current-mismatch frequency acquisition technique in [32-35] that is applied in a single-loop CDR without an external reference or a separate FD. In this paper, we further extend the technique presented in [32-35] to a wider frequency acquisition range to satisfy multi-standard applications.
Considering that the frequency capture range of the technique in [32-35] was limited by the clock generation concept, we propose an interpolation-based multi-phase clock generation to achieve a wider frequency acquisition range, as well as a group of band preset signals for fast frequency acquisition at a high data rate. The proposed wide-capture-range CDR enables fast acquisition with low hardware cost, low power consumption, and high energy efficiency.

Fig. 1. Commonly-used eight-phase clock generation schemes: (a) eight-phase ring oscillator, (b) the divider follows the four-phase oscillator, and (c) the phase interpolator follows the four-phase ring oscillator.

To relieve the constraints on clock generation, wide-range CDR designs commonly use half-rate and quarter-rate schemes. These structures reduce the required oscillator frequency to a half or a quarter of the data rate, relieving the design pressure on clock generation. However, they require multi-phase clock signals. Furthermore, in a bang-bang clock and data recovery (BBCDR) circuit with a 2x oversampling ratio, as proposed in [13], the number of clock phases needs to double for the phase detection function [34-36]. Fig. 1 illustrates the multi-phase clock generation used in different schemes. Fig. 1(a) uses eight inverter-based cells to build an eight-phase voltage-controlled ring oscillator (RO). This scheme commonly uses a current-starved RO to realize a wide frequency output, resulting in large power consumption, which negatively impacts the CDR energy efficiency. In addition, the eight-phase RO has intrinsic frequency limitations that render it unsuitable to operate at a very high frequency $f_{CK}$. As an alternative, a four-phase oscillator generates a quadrature clock signal at a double frequency $2f_{CK}$ [37], from which a frequency divider derives the eight-phase clock [Fig. 1(b)].
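To make the frequency requirements of the schemes in Fig. 1 concrete, the following minimal sketch computes the clock frequency, the number of phases, and the phase spacing for a quarter-rate, 2x-oversampling receiver, together with the oscillator frequency that the divider-based scheme of Fig. 1(b) would need. The function name `clock_plan` and its parameters are hypothetical and introduced only for illustration; the relations themselves follow from the quarter-rate and 2x-oversampling description above.

```python
def clock_plan(data_rate_gbps: float, rate_divisor: int = 4, oversampling: int = 2) -> dict:
    """Clock requirements for a 1/rate_divisor-rate CDR (illustrative helper).

    rate_divisor = 4 models the quarter-rate scheme; oversampling = 2 models
    the 2x oversampling of the bang-bang phase detector.
    """
    f_ck_ghz = data_rate_gbps / rate_divisor   # one clock period spans rate_divisor UI
    n_phases = rate_divisor * oversampling     # samples needed per clock period
    return {
        "f_ck_ghz": f_ck_ghz,
        "n_phases": n_phases,
        "phase_step_deg": 360.0 / n_phases,    # spacing between adjacent phases
        "phase_step_ui": 1.0 / oversampling,   # same spacing expressed in UI
        "divider_osc_ghz": 2.0 * f_ck_ghz,     # oscillator of Fig. 1(b) runs at 2*f_CK
    }


if __name__ == "__main__":
    # End points of the 6-to-38Gb/s capture range reported in the paper.
    for rate in (6.0, 38.0):
        print(rate, clock_plan(rate))
```

Under these assumptions, the upper end of the capture range requires $f_{CK}$ = 9.5 GHz with eight phases spaced 45 degrees apart, while a divider-based generator would need a 19-GHz quadrature oscillator.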
However, such a clock generation scheme also has several drawbacks. First, it requires an oscillator operating at a frequency $2f_{CK}$, much higher than the target clock frequency $f_{CK}$. Additionally, it needs a frequency divider with a wide frequency range, which increases the design complexity. Finally, with the divider operating in an open loop, it does not precisely constrain the phase relationship of the generated clock signals. An alternative solution consists in adopting a current-starved RO to generate a quadrature clock and then obtaining an eight-phase clock signal by interpolation, as in Fig. 1(c) [38]. Compared to Fig. 1(a), the adoption of a four-phase RO greatly reduces power consumption, leading to better energy efficiency. Besides, when compared with the scheme in Fig. 1(b), both the four-phase RO and the eight-phase clocks are at the same frequency $f_{CK}$, which avoids the burden of a very high-frequency oscillator.

In this paper, we present an ultra-wide-capture-range BBCDR structure covering a 6-to-38Gb/s input data rate with minimal hardware cost and low power consumption. The proposed BBCDR incorporates a quarter-rate scheme and a bang-bang phase detector (BBPD), which entails an eight-phase clock signal. Aiming to eliminate the problems of the multi-phase clocks in Figs. 1(a) and (b), we adopt the scheme in Fig. 1(c) with an analog, flexible, low-cost inverter-based PI, whose intrinsic frequency-range limitation is overcome by a configurable switched-capacitor (SC) array placed at the output of the clock buffers driving the PI.

The remainder of the paper is organized as follows: Section II details the top architecture of the proposed BBCDR. Then, Section III discloses the circuit implementation of the BBCDR, while Section IV validates the effectiveness of the proposed scheme based on measurements of a 65-nm CMOS prototype. Finally, Section V provides the summary and conclusions.

Fig. 2. The top architecture of the proposed BBCDR.

## II. TOP ARCHITECTURE OF THE PROPOSED BBCDR

Fig. 2 outlines the top architecture of the proposed BBCDR. It consists of three main blocks: the multi-phase clock generation block, the hybrid control circuit (HCC) logic, and the data path, whose structure and operation we describe in detail next.

### A. Multi-Phase Clock Generation

We employ a four-phase RO in the proposed BBCDR to enable a wide clock-frequency range at low power consumption, deriving an eight-phase clock from its quadrature outputs by an inverter-based PI module. Due to the wide range of the RO output, the four-phase clock is nearly sinusoidal at high frequencies, while it is closer to a square wave with steep transitions at low frequencies. Such square-like waveforms, however, are not suitable for the inverter-based PI, which relies on smooth, nearly sinusoidal phase inputs. To enable inverter-based phase interpolation over a wide frequency range, we introduce a digitally-controlled switched-capacitor (SC) array, automatically tuned by the BBCDR controller to offer a different capacitive load to the clock buffers at different frequencies, which ensures that the PI has an approximately sinusoidal clock signal as input over the whole frequency range. To keep the SC array from affecting the RO frequency, we use a pair of differential buffers to isolate the RO from the SC loads.

### B. Quarter-Rate Data Path

The lower part of Fig. 2 shows the data path in the BBCDR loop. Eight double-tail flip-flops (DT FFs) perform data retiming. DT FFs are preferred over other topologies for their higher robustness. Moreover, the transistor sizes in the regeneration phase of the FF are increased to support high-speed data transmission.
The circuit feeds the retimed data into the XOR gates and the charge pump (CP) blocks: the XOR gates compare the data signals sampled by adjacent clock phases and perform the phase detection of the 8-phase BBPD, while the CP module [39], operating as a transconductor, converts the phase difference between the discriminated clock and data into a current pulse. The output current of the CP block charges or discharges the second-order loop filter (LF), so that a real-time current pulse carries the information about the phase difference between the clock and the data and dynamically adjusts the VCO control voltage ($V_{\text{CONT}}$) through the loop filter, controlling the VCO in closed loop.

### C. Hybrid Control Circuit (HCC)

The hybrid control circuit (HCC) mainly comprises a hybrid logic module and is in charge of generating the band control signals for automatic band switching and continuous frequency acquisition. For this purpose, the HCC block consists of two parts. The first part is the VCO band control logic, including a binary counter (band counter), whose content corresponds to the VCO band, and a comparator with hysteresis driven by the $V_{\text{CONT}}$ signal from the data path, whose output transitions drive the clock of the band counter. The band counter can also be preset by the external signals ($A_{2\_\text{SET}} \sim A_{5\_\text{SET}}$) for fast frequency locking. The second part is the mode switch control logic, which is in charge of controlling the alternate selection between negative and positive SP in the CP. The HCC block interacts with both the data and the clock path, thus controlling the global mode switching and frequency acquisition.

### D. Deliberate-Current-Mismatch-based FD-less Technique

The proposed BBCDR is based on the deliberate-current-mismatch-based FD-less technique proposed in our previous papers [32-35] to achieve frequency capture at low hardware cost and low power consumption.
In addition, we insert a non-zero SP to modify the symmetric PD curve, so that the PD generates a non-zero net output within a cycle slip [40]. To this end, we insert an additional CP branch (indicated as the SP CP hereafter) controlled by the bias voltages $V_P$ and $V_N$ (Fig. 3), in parallel with the main CP circuit. Depending on the digitally selected bias voltages $V_P$ and $V_N$, we can operate the circuit in three modes: the positive-net-current (PNC) mode, when only the upper SP CP branch is on; the negative-net-current (NNC) mode, when only the lower SP CP branch is on; and the zero-net-current (ZNC) mode, in which both branches of the SP CP are off. The PNC and NNC non-zero output modes, activated during the frequency acquisition mode (FAM), provide the FD-less frequency capture ability. By contrast, the ZNC mode, activated after frequency locking with the loop in the phase tracking mode (PTM), achieves a better jitter performance by operating the PD as a normal PD with a zero net current.

Fig. 3. Block diagram of the proposed SP-selected CP along with open-loop PD and FD curves.

## III. CIRCUIT IMPLEMENTATION

### A. Voltage-Controlled Ring Oscillator

The proposed BBCDR includes a current-starved RO that provides a quadrature clock signal with a wide frequency tuning range. The entire four-phase oscillator consists of two identical cells, with the schematic shown in the right part of Fig. 4(a). By varying the gate voltages $V_A$ and $V_B$ of the bias transistors, we can control the frequency of the current-starved RO, which is determined by the current flowing through its branches. In the proposed cell, there are two bias transistors in each current path, such that the currents flowing through the upper and lower branches change at the same time when the control voltage changes. This topology is suitable to obtain four-phase clock signals over a very wide tuning range, as shown in Fig. 4(c).
$V_A$ and $V_B$ are in charge of the coarse tuning of the proposed RO for band switching, while $V_{CONT}$ controls the fine tuning after the selection of a certain band. To simplify the simulation setting, we set the sum of the bias voltages $V_A$ and $V_B$ to 1 V. The result shows that the operating frequency of the RO varies from 1 GHz to 14 GHz. Fig. 4(d) plots the post-layout simulation results of the phase noise (PN) profile. Within a cell, the drain terminals of the bias transistors controlled by the same voltage $V_A$ ($V_B$) in different branches are tied together to control their current precisely. We can combine all the PMOS devices controlled by $V_A$ and all the NMOS devices controlled by $V_B$. To have a symmetrical layout, we assign to each branch a pair of bias transistors of the same size.

Fig. 4. (a) Schematic of the four-phase voltage-controlled RO and (b) its layout, (c) tuning range under different biases, and (d) simulated PN.

As mentioned in Section II, the insertion of an SC array at the RO output buffers smooths the quadrature clock at low frequency, as required for proper phase interpolation. To decouple the SC array from the RO, i.e., to prevent the SC array from affecting the operation of the RO and its frequency, we use a differential buffer. Fig. 4(b) shows the complete layout of the RO. To ensure proportionality, the layout is strictly centrally symmetrical. The length of the VCO core is about 32 $\mu$m, and its width is 20 $\mu$m. We place the isolation buffer on the right side, and its size is close to $10 \times 20~\mu\text{m}^2$. This placement helps to keep the overall symmetry and to reduce the distance mismatch between the four phases of the clock. The RO provides a wide frequency range to support the loop and thus enables a wide-capture-range CDR design.

Fig. 5. (a) Schematic and (b) design size of the 6-bit R-2R DAC.

### B. 6-bit R-2R Digital-to-Analog Converter (DAC)

We introduce a 6-bit R-2R digital-to-analog converter (DAC) in our BBCDR to generate the bias voltages $V_A$ and $V_B$ of the RO and thus to perform coarse frequency tuning. The DAC consists of resistors and switches controlled by the band selection signals $A_0 \sim A_5$ [Fig. 5(a)], with the design sizes given in Fig. 5(b). Two pairs of reference voltages, $V_{REFP\_A} \sim V_{REFN\_A}$ and $V_{REFP\_B} \sim V_{REFN\_B}$, define the DAC output range. Depending on the binary value encoded in the band selection signals, the connection of the resistors changes accordingly, thus changing $V_A$ and $V_B$.

Fig. 6. Block diagram of (a) the control logic and (b) the SC array.

Fig. 7. Schematic of (a) the two-stage double-tail FF and (b) the RS latch.

### C. SC Array and Its Control Logic

The PI requires near-sinusoidal signals with smooth transitions for proper operation, whereas the RO output signal resembles a square wave in the low-frequency range. The SC array, acting as a variable load placed at the output of the RO clock buffers, therefore smooths their output. Considering that the VCO output is naturally a sine wave in the higher frequency range, we should configure a small load capacitor (or no capacitor at all) at high frequency to avoid excessive signal attenuation, whereas a larger load capacitor is necessary at low frequency. Fig. 6(a) displays the control logic and the corresponding truth table of the SC array [Fig. 6(b)]. The generated signal changes the load capacitance of the clock by controlling the PMOS-based switches. When the most significant bit $A_5$ is 0, the total load capacitance decreases as the VCO band, decided by the binary value $A_0 \sim A_4$, rises. The maximum load capacitance is $63C_0$ (315 fF) and is applied in the lowest frequency band.
Once the most significant bit $A_5$ is 1, we set all control signal outputs to VDD to turn off the PMOS switches, so that the load capacitance is completely cut off. The value of $C_0$ (5 fF), determined by post-layout simulation, ensures good phase interpolation over the wide frequency range. Since $C_{load}$ takes its maximum value at the lowest band (band0), in which the VCO operates at a relatively low frequency, the additional capacitance results in only a minor increase in power consumption. Based on post-layout simulation results, the VCO operates at 2.41 GHz in band0 with $V_{CONT}$ = 0.5 V, and the total power consumption of the VCO and isolation buffer increases from 5.7 mW to 6.16 mW with the maximum SC load.

### D. DT FF

The data path samples the input over eight uniformly-spaced phases, resulting in a 2x oversampling ratio under quarter-rate operation. Based on Alexander's topology, the modified BBPD employs eight flip-flops to strobe the input data. To retime the input and provide a clean demultiplexed output even at high frequency, we employ a DT FF in the proposed CDR. The schematic of the two-stage DT FF is shown in Fig. 7(a). The two-stage structure has less stacking and can therefore operate at a lower supply voltage. The double tail enables both a large current in the latching stage (wide M<sub>1</sub>) for fast latching, independently of the common-mode voltage of the input, and a small current in the input stage (small M<sub>12</sub>) [41]. Fig. 7(b) depicts an RS latch used to convert the DT FF output into an NRZ data format.

Fig. 8. Developed inverter-based PI for the clock generation.

### E. PI

To perform high-resolution phase generation, we employ in the proposed BBCDR an analog, inverter-based PI based on previous works [42-53]. Despite its intrinsic phase capture range limitation, this topology is preferable to digital PIs [45-47] for its intrinsic simplicity, adjustable output phase, and low power.
To ensure the monotonicity and linearity of the PI, we should carefully design the gain $K_{PI}$ to be constant [48]. Moreover, the analog PI requires a reset logic to limit the control voltage within its range. It follows that the PI has an intrinsic phase capture range limitation and cannot provide $0^{\circ}$ to $360^{\circ}$ phase rotation [48]. We choose this open-loop PI structure instead of a closed-loop PI-based DLL [44], considering that external tuning of the supply voltage is sufficient to cover the phase error caused by PVT variation. Since we found that a small change in the supply voltage results in a sufficiently large phase variation to cover the phase tuning requirement, this PI does not incur a large variation in output signal amplitude while tuning the phase. Moreover, since the PI we introduce in this paper is only used to generate a multi-phase clock signal, the accuracy of the locking process is entirely managed by the BBPD. An accurate digitally-controlled PI would not have brought any significant advantage in terms of accuracy in our design, while it would have unnecessarily increased the locking time, since digitally-controlled PIs show a tradeoff between accuracy and the number of locking cycles [44]. On the other hand, the analog structure we chose enables continuous fine-grained tuning without increasing the locking time.

Fig. 9. Effects of the phase skew on the PD curve (waveforms in (a)-(b) are based on simulation results).

The upper part of Fig. 8 shows the structure of the employed PI, with its core followed by four differential buffers to drive the generated clock signals. The same figure also shows the schematic of the drivers. Cells operated at the VDDPI1 supply voltage generate the quadrature signals $CK_0$, $CK_{90}$, $CK_{180}$ and $CK_{270}$, while the interpolated clocks are generated from equally weighted quadrature signals.
VDDPI2 controls the weight of the phase-leading signal, and VDDPI3 controls the weight of the signal lagging by 90 degrees. A change in these two supply voltages slightly changes the weights of the input signals, thus changing the phase generated by interpolation. For example, when $CK_{IP}$ and $CK_{QP}$ generate $CK_{45}$ with VDDPI2 = 1.05 V and VDDPI3 = 1 V, the generated $CK_{45}$ will be closer to $CK_0$. On the contrary, with VDDPI2 = 1 V and VDDPI3 = 1.05 V, the generated $CK_{45}$ will lag and be closer to $CK_{90}$.

While the SC array is used to smooth steep clocks [Fig. 9(a)] into sinusoidal waves [Fig. 9(b)] for better phase generation, the interpolated clock phase shows a slight non-ideal deviation over the entire tuning range due to the wide output range of the clock. This clock phase deviation affects the PD curve (Fig. 9), resulting in a shift of the SP along the x-axis. When interpolating a uniformly distributed clock [Fig. 9(c)], we obtain a symmetrically distributed PD curve [Fig. 9(d)], and the clock phase in itself does not contribute to the SP, since the net PD output over a duty cycle is 0. When the interpolated clock is leading [Fig. 9(e)], it expands the interval in which the CP current is negative [Fig. 9(f)], and the PD yields a negative net current over one duty cycle, which is equivalent to having a positive SP on the x-axis [19]. On the contrary, when the interpolated clock is lagging [Fig. 9(g)], it expands the positive current interval [Fig. 9(h)], and the PD yields a positive net current over a duty cycle, which is equivalent to having a negative SP. We can observe that, during the FAM, the alternately switched upper and lower extra CP branches introduce a change in the SP by translating the PD curve along the y-axis, as previously outlined in Section II-D, while the non-uniform distribution of the clock described above is equivalent to a translation along the x-axis.
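The supply-weighted interpolation described earlier in this subsection (with $CK_{45}$ moving toward $CK_0$ or $CK_{90}$) can be illustrated with an idealized model. The sketch below assumes perfectly sinusoidal quadrature inputs and treats the effect of VDDPI2 and VDDPI3 as two abstract weights; the function name `interpolated_phase_deg` and the weight values are hypothetical, and the actual mapping from supply voltage to weight in the inverter-based PI is not modeled.

```python
import math


def interpolated_phase_deg(w_lead: float, w_lag: float) -> float:
    """Phase lag (degrees, relative to the leading input) of the weighted sum
    w_lead*cos(wt) + w_lag*cos(wt - 90 deg) of two ideal quadrature sinusoids.

    The sum equals A*cos(wt - phi) with phi = atan2(w_lag, w_lead).
    """
    return math.degrees(math.atan2(w_lag, w_lead))


if __name__ == "__main__":
    # Equal weights place the output midway between the two inputs.
    print(interpolated_phase_deg(1.0, 1.0))
    # A slightly larger leading weight pulls the output toward the leading clock.
    print(interpolated_phase_deg(1.05, 1.0))
    # A slightly larger lagging weight pushes the output toward the lagging clock.
    print(interpolated_phase_deg(1.0, 1.05))
```

In this idealized picture, equal weights give a 45-degree output, and a small weight imbalance shifts the output by a small amount in the direction of the more heavily weighted input, which is the qualitative behavior exploited for phase tuning.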
In the frequency capture phase, the two SP shifts, one due to the deliberately inserted current mismatch and one related to the clock phase, superimpose to shape the PD curve; of the two, the SP deliberately inserted through the SP CP plays the dominant role. Moreover, fine-tuning the VDDPI supplies can eliminate the SP shift caused by the interpolated clock phase. The robustness of the PI is verified by simulation under PVT variation in Fig. 10. Without phase tuning, the PI shows a large phase deviation and differential non-linearity (DNL) at low frequency, and better linearity at high frequency. The non-ideal clock phase worsens the PD curve asymmetry. Although the negative output region is heavily compressed with a -0.6 DNL clock, as shown in Fig. 11(a), we must emphasize that the large PI nonlinearity will not affect the PD output polarity with an appropriate SP CP bias [Fig. 11(a)]. Although the lagging phase error compresses the negative output region of the PD curve, a large $V_{NN}$ bias guarantees that the PD has a negative output in a cycle slip under the NNC mode. The CDR can also achieve robust frequency acquisition and band searching without precise PI tuning, as observed in the measurement results reported in Fig. 21 in Section IV of this paper. The phase non-linearity, however, results in a large PD mismatch, which worsens the jitter performance in the phase-locked state. The open-loop PI enables manual external tuning to optimize the jitter performance, which ensures that the PI has a highly linear output under PVT variations over the entire operating range. With an ideal clock phase, the loop has a symmetrical PD curve [Fig. 11(b)], which results in better frequency tracking and phase locking.

Fig. 10. Simulated DNL of the PI with and without phase tuning under (a) different corners, (b) temperature variation, and (c) voltage variation.

Fig. 11. The PD curve under different SP CP biases with (a) a -0.6 DNL clock phase and (b) an ideal clock phase.

### F. Details of our HCC

The HCC introduced in Fig. 12 combines the VCO hopping control logic and the mode switching control logic. In the VCO hopping control logic, a hysteresis comparator (i.e., a Schmitt trigger) is employed to control the range of $V_{\text{CONT}}$, with its schematic disclosed in Fig. 13. Whenever $V_{\text{CONT}}$ reaches the upper or lower threshold voltage ($V_{S+}$ and $V_{S-}$) of the Schmitt trigger, it produces an edge on SW, which an edge detector converts into a pulse signal BANDSW that is fed to the 6-bit binary counter to realize RO band selection. In addition, we specifically introduce a set of signals $A_{2\_\text{SET}} \sim A_{5\_\text{SET}}$ to externally control the VCO band. In the same HCC block, the mode switch control logic is in charge of toggling the loop between the FAM and the PTM by selecting the SP shift. When MDSW is 0, it assigns the PD to the FAM, with the deliberate current mismatch inserted for FD-less frequency acquisition. Otherwise, when $V_{CONT}$ is within the preset $V_{REF}$ range ($V_{REF-} \sim V_{REF+}$) and the VCO does not jump to a different band within a fixed preset time (obtained by a 2-bit counter driven by a current-starved oscillator), a DFF maintains the high-level mode switching signal MDSW, as presented in Fig. 12. At that point, the BBCDR has successfully completed frequency locking and switches into the PTM with a zero net current output. Thanks to the SP selection mechanism, the proposed BBCDR can realize both continuous frequency acquisition and automatic band switching. The function of the references $V_{REF-}$ and $V_{REF+}$ is to set a reasonable range for $V_{CONT}$, making sure that the control voltage of the VCO does not settle at an extreme value such as 1 V or 0 V. Once $V_{CONT}$ is out of the $V_{REF}$ range, the CDR is forced into the FAM to continue searching for frequency lock; thus, $V_{REF}$ also acts as a fail-safe mechanism in the loop operation.
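The band-hopping and mode-switching behavior described above can be summarized with a small behavioral sketch. It is only an illustration under stated assumptions: the class name `HybridControlSketch`, the threshold and reference voltages, and the choice of four timer ticks for the 2-bit timer overflow are hypothetical values, not taken from the paper, and the model ignores the polarity reversal of the net CP current and all analog dynamics.

```python
from dataclasses import dataclass


@dataclass
class HybridControlSketch:
    """Behavioral model of the HCC: Schmitt trigger, band counter, mode switch."""
    v_s_minus: float = 0.20    # assumed lower Schmitt threshold V_S- (V)
    v_s_plus: float = 0.80     # assumed upper Schmitt threshold V_S+ (V)
    v_ref_minus: float = 0.25  # assumed lower lock-window bound V_REF- (V)
    v_ref_plus: float = 0.75   # assumed upper lock-window bound V_REF+ (V)
    timer_limit: int = 4       # assumed overflow count of the 2-bit timer
    sw: int = 0                # Schmitt trigger output SW
    band: int = 0              # 6-bit band counter value
    timer: int = 0             # ticks elapsed since the last band switch
    mdsw: int = 0              # 0 = FAM, 1 = PTM

    def step(self, v_cont: float) -> None:
        """Advance by one timer tick using the current V_CONT sample."""
        prev_sw = self.sw
        # Hysteresis: SW changes only when V_CONT crosses the far threshold.
        if self.sw == 0 and v_cont >= self.v_s_plus:
            self.sw = 1
        elif self.sw == 1 and v_cont <= self.v_s_minus:
            self.sw = 0
        bandsw = self.sw != prev_sw            # edge detector: pulse on any SW edge
        if bandsw:
            self.band = (self.band + 1) % 64   # 6-bit counter wraps at 64
        in_window = self.v_ref_minus < v_cont < self.v_ref_plus
        if bandsw or not in_window:
            self.timer = 0                     # restart the lock timer
            self.mdsw = 0                      # stay in (or fall back to) FAM
        else:
            self.timer = min(self.timer + 1, self.timer_limit)
            if self.timer == self.timer_limit:
                self.mdsw = 1                  # timer overflow: switch to PTM


if __name__ == "__main__":
    hcc = HybridControlSketch()
    for v in [0.5, 0.85, 0.5, 0.15, 0.5, 0.5, 0.5, 0.5]:
        hcc.step(v)
        print(f"V_CONT={v:.2f} SW={hcc.sw} band={hcc.band} MDSW={hcc.mdsw}")
```

In this toy run, two threshold crossings advance the band counter twice, and the mode signal goes high only after $V_{\text{CONT}}$ has stayed inside the reference window for the assumed preset time without a further band switch.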
The value of $V_{REF}$ is set close to the threshold of the Schmitt trigger. Although the CDR handles high-speed input data, the HCC operates at a low frequency, and the input signal of the counter is a rail-to-rail pulse of several hundred megahertz. Therefore, the HCC block is highly robust, which ensures successful locking. A 6-bit ripple counter included in the HCC block controls the RO band switching, as shown in Fig. 14. We specially design the 3<sup>rd</sup> to the 6<sup>th</sup> DFFs of the ripple counter as externally settable to allow presetting of the VCO band. Fig. 17 and Fig. 20 illustrate the detailed operation process, with some intermediate processes omitted for fast locking, respectively.

Fig. 12. Block diagram of our presented HCC.

Fig. 13. Schematic and detailed size of the Schmitt trigger.

We employ the current-starved timer to generate the preset time interval (Fig. 2), i.e., the time interval following a $V_{\text{CONT}}$ transition after which the circuit assumes the completion of frequency locking, unless it observes another $V_{\text{CONT}}$ transition in the meantime. Fig. 15 depicts the schematic of the current-starved timer. We add an inverter-based buffer to increase the driving capability of the output signal. The gates of the PMOS devices in the upper branches and of the NMOS devices in the lower branches connect, respectively, to the control voltages $V_{\text{CTIMERP}}$ and $V_{\text{CTIMERN}}$, which allow current tuning and thus adjustment of the preset time over a wide range.

Fig. 14. Block diagram of the bit signal counter with band preset function.

Fig. 15. Schematic and detailed size of the current-starved RO-based timer.

To ensure the consistency of $V_{\text{CONT}}$ during the entire frequency acquisition, the asymmetrical PD needs to generate a non-zero output current with opposite polarity in adjacent bands. Fig. 16 [34-35] shows the logic of the specific FAM control intended to select the polarity of the PD net current over a cycle slip, with the operating mechanism detailed as follows:

(1) When the loop is in FAM and the SW signal is 0, MUX<sub>1</sub> selects $V_{\text{PP}}$ for $V_{\text{P}}$ to open the extra upper CP current path, while MUX<sub>2</sub> selects $V_{\text{N}} = V_{\text{NZSP}} = 0$ to turn off the lower extra CP path. The loop results in an asymmetrical PD curve with a positive net current, and the charging of the capacitor in the LF makes $V_{\text{CONT}}$ continue to increase.

(2) When $V_{\text{CONT}}$ rises to the high threshold voltage $V_{\text{S+}}$ of the Schmitt trigger, SW becomes 1. In turn, the selection of $V_{\text{NN}}$ opens the lower current path and the selection of $V_{\text{PZSP}} = 1$ closes the upper current path. The loop thus results in a negative net output, and $V_{\text{CONT}}$ decreases due to the discharge of the LF.

(3) MDSW is 0 due to the dynamic variation of $V_{\text{CONT}}$ during the entire frequency acquisition, and we choose a larger bias voltage $V_{\text{CPF}}$ for $V_{\text{CP}}$ to enable faster frequency hunting. When MDSW goes high after frequency locking, both additional CP paths are turned off, and the selection of a smaller bias $V_{\text{CPP}}$ leads to better jitter performance.
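The selection logic of steps (1)-(3) can be sketched as a small lookup function. The function name `select_sp_bias` and the use of strings for the bias names are illustrative choices only; the sketch mirrors the multiplexer selections described in the text and does not model any analog behavior.

```python
def select_sp_bias(mdsw: int, sw: int) -> dict:
    """Return the mode and selected bias names for given MDSW and SW values."""
    if mdsw == 1:
        # Phase tracking mode: both extra CP paths off, smaller main CP bias.
        return {"mode": "PTM", "V_P": "V_PZSP", "V_N": "V_NZSP", "V_CP": "V_CPP"}
    if sw == 0:
        # FAM with SW = 0: upper extra path on, positive net current.
        return {"mode": "FAM", "V_P": "V_PP", "V_N": "V_NZSP", "V_CP": "V_CPF"}
    # FAM with SW = 1: lower extra path on, negative net current.
    return {"mode": "FAM", "V_P": "V_PZSP", "V_N": "V_NN", "V_CP": "V_CPF"}


if __name__ == "__main__":
    for mdsw in (0, 1):
        for sw in (0, 1):
            print(mdsw, sw, select_sp_bias(mdsw, sw))
```

Note that in PTM the SW value has no effect on the selection, which reflects the fact that both extra CP paths remain off once MDSW is high.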
| Truth Table | | | | | | | | | | | |-------------|------|----|-------------------|-------------------|-------------------|-------------------|------------------|--|--|--| | Mode | MDSW | SW | V <sub>PSEL</sub> | V <sub>NSEL</sub> | $V_P$ | V <sub>N</sub> | V <sub>CP</sub> | | | | | FAM | 0 | 0 | 1 | 0 | V <sub>PP</sub> | V <sub>NZSP</sub> | V <sub>CPF</sub> | | | | | FAIVI | 0 | 1 | 0 | 1 | V <sub>PZSP</sub> | V <sub>NN</sub> | V <sub>CPF</sub> | | | | | PTM | 1 | 0 | 0 | 0 | V <sub>PZSP</sub> | V <sub>NZSP</sub> | V <sub>CPP</sub> | | | | | | 1 | 1 | 0 | 0 | V <sub>PZSP</sub> | V <sub>NZSP</sub> | V <sub>CPP</sub> | | | | Fig. 16. Implementation and truth table for the SP point selection. Fig. 17 illustrates better the operation of the HCC, which offers the transient waveforms of the HCC internal node. With the loop initially locked at band3 and a *Reset* signal activated at $t_0$ the circuit releases the FAM and the CDR loses the lock. Regardless of $V_{\text{CONT}}$ , the VCO remains settled at band0 with the global *Reset* signal asserted. Nonetheless, even with the *Reset* signal maintained, the Schmitt trigger follows $V_{\text{CONT}}$ to generate the *SW* signal. Simultaneously, the edge detector detects the edge of *SW* and generates a pulse *BANDSW* signal. At $t_1$ , the circuit disables *Reset*, and the counter begins to increase at each pulse of the input *BANDSW* signal. After a continuous search time $T_a$ , the BBCDR achieves frequency locking at $t_2$ . When the timer overflows at $t_3$ , the circuit releases PTM for better jitter performance [54-57]. In Fig. 17(b), with the *Reset* maintained, we can observe the same transient waveforms from Fig. 17(a) when the $A_{2\_SET}$ is active. It can also perceive that the RO jumps from band0 to band4 instantaneously, thus reducing the total time of frequency locking at band9 to $T_b$ . In Fig. 
17(c), we show the case in which the circuit keeps $A_{3\_SET}$ active as a further demonstration: with the RO set to band8 immediately upon the activation of $A_{3\_SET}$, the frequency locking time reduces further to $T_c$.

Fig. 17. Critical transient waveforms of our HCC with band preset signal enabled.

#### IV. Measurement Results

Fig. 18 depicts the top layout of the proposed BBCDR fabricated in 65-nm CMOS. The active area of the prototype is 0.07 mm<sup>2</sup> and the LF occupies 91% of the data path. The PD, CP, and XOR modules, which enable data retiming, phase detection, and frequency acquisition, together occupy just 0.0026 mm<sup>2</sup>, i.e., 3.7% of the total core area. The multi-phase clock generation occupies 0.0031 mm<sup>2</sup> in total, which corresponds to 4.4% of the total active area. Within it, the RO and isolation buffer take 0.0011 mm<sup>2</sup>, the SC array 0.0011 mm<sup>2</sup>, the RDAC 0.0007 mm<sup>2</sup>, and the PI with its buffer 0.0002 mm<sup>2</sup>. The lower part of Fig. 18 shows the power breakdown. The prototype consumes 24.61 mW with a 32-Gb/s input, resulting in an energy efficiency of 0.769 pJ/bit.
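The area shares and the energy efficiency quoted above follow from simple arithmetic on the entries of Fig. 18. The sketch below only re-derives those figures as a sanity check; the dictionary layout and variable names are assumptions made here, and the per-block numbers are copied from the area and power breakdown of Fig. 18.

```python
# Re-derive the area shares and energy efficiency reported for Fig. 18.
TOTAL_AREA_MM2 = 0.07   # active area of the prototype
DATA_RATE_GBPS = 32.0   # input data rate used for the power measurement

area_mm2 = {
    "PD+CP+XOR": 0.0026,
    "RO+VCO buffer": 0.0011,
    "SC array": 0.0011,
    "RDAC": 0.0007,
    "PI+PI buffer": 0.0002,
    "HCC": 0.0005,
    "LF": 0.064,
}

power_mw = {
    "VCO": 10.37, "CP": 1.13, "PI": 1.50, "PI buffer": 5.72,
    "XOR": 1.01, "PD": 4.41, "HCC": 0.47,
}


def share_percent(part: float, total: float) -> float:
    """Return part/total expressed in percent."""
    return 100.0 * part / total


# Multi-phase clock generation = RO + SC array + RDAC + PI.
multiphase_mm2 = sum(
    area_mm2[k] for k in ("RO+VCO buffer", "SC array", "RDAC", "PI+PI buffer")
)
total_power_mw = sum(power_mw.values())
# mW divided by Gb/s gives pJ/bit (1e-3 W / 1e9 bit/s = 1e-12 J/bit).
energy_pj_per_bit = total_power_mw / DATA_RATE_GBPS

if __name__ == "__main__":
    print(f"PD+CP+XOR share:  {share_percent(area_mm2['PD+CP+XOR'], TOTAL_AREA_MM2):.1f} %")
    print(f"Multi-phase area: {multiphase_mm2:.4f} mm^2 "
          f"({share_percent(multiphase_mm2, TOTAL_AREA_MM2):.1f} %)")
    print(f"LF share:         {share_percent(area_mm2['LF'], TOTAL_AREA_MM2):.0f} %")
    print(f"Total power:      {total_power_mw:.2f} mW")
    print(f"Energy efficiency: {energy_pj_per_bit:.3f} pJ/bit")
```

The same division of power by data rate also reproduces the energy-efficiency entries of the other designs in Table I wherever both quantities are listed.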
| | Main Circuits | Area (mm<sup>2</sup>) | | Main Circuits | Area (mm<sup>2</sup>) | | Auxiliary Circuits |
|---|-----------------|--------|---|------|--------|----------------|---------------------------------|
| A | PD + CP + XOR | 0.0026 | E | RDAC | 0.0007 | L<sub>1</sub> | D<sub>OUT2</sub> Output Buffer |
| B | RO + VCO Buffer | 0.0011 | F | HCC | 0.0005 | L<sub>2</sub> | Douts Output Buffer |
| C | SC Array | 0.0011 | G | LF | 0.064 | L<sub>3</sub> | Clock Output Buffer |
| D | PI + PI Buffer | 0.0002 | | | | H | V<sub>CONT</sub> Output Buffer |
| | | | | | | | Decoupling Capacitor |

Power Breakdown

| Sub-Blocks | Power (mW) | Proportion | Sub-Blocks | Power (mW) | Proportion |
|------------|-------|-------|-----|------|-------|
| VCO | 10.37 | 42.1% | XOR | 1.01 | 4.1% |
| CP | 1.13 | 4.6% | PD | 4.41 | 18.0% |
| PI | 1.50 | 6.1% | HCC | 0.47 | 1.9% |
| PI Buffer | 5.72 | 23.2% | | | |

Energy Efficiency = 0.769 pJ/bit @ 32 Gb/s

Fig. 18. Chip micrograph with area breakdown.

Fig. 19. Experimental setup for the BBCDR testing.

Fig. 19 shows the setup employed for testing the BBCDR (device under test, DUT) [58]. The channel loss mainly comes from the high-speed connector and the PCB trace before the DUT. In our design, we chose a short connector and carefully designed the PCB traces on a Rogers substrate to minimize the high-speed path loss. Under these conditions, our prototype achieves robust operation over a wide capture range without an embedded equalizer. We use an HP6626A DC power supply to power multiple pins of the chip through an LDO on the power-supply PCB. We also use the Keysight M8040A bit error ratio tester (BERT) for NRZ data generation. The single-ended NRZ data stream is transmitted through the remote port of the BERT.
We feed the output clock signal from the DUT to the Keysight N9040B signal analyzer for real-time spectrum and phase-noise analysis, to observe the difference between the spectra of the unlocked and locked clocks. In addition, we deliver the clock and data to the oscilloscope to assess the eye diagrams of the recovered clock and data. We can also use the oscilloscope to measure the bathtub curve and BER in the presence of jitter injection, as well as time-domain metrics such as peak-to-peak jitter. Baluns convert the differential recovered clock and data signals to single-ended signals for testing. The Tektronix MDO3024 oscilloscope allows the testing of low-frequency logic signals from the HCC (e.g., *SW*, *V*<sub>CONT</sub>, and *MDSW*, previously shown in Fig. 17). These logic control signals facilitate the understanding and verification of the internal operation of the circuit. The memory depth of the oscilloscope is sufficient to capture the complete waveforms of all the HCC signals in a single locking process, as shown in Fig. 20(b) and Fig. 21. With the help of these waveforms, we can fully verify the automatic search of the VCO bands from low to high, and the switching of the loop from FAM to PTM after frequency locking. Moreover, we can read the time required for these operations directly from the instrument scale.

Fig. 20(a) illustrates an idealized timing diagram for comparison and better understanding. Fig. 20(b) plots acquired waveforms of the low-frequency logic signals from the HCC block to illustrate the settling process. In this example, the target frequency of the NRZ input signal lies in band32, and we show the settling process for different band preset signal configurations. (1) In case 1, we disable all the band preset signals. After releasing the global *Reset* signal, the BBCDR loses lock, and the circuit forces it to band 000000.
Since there is no active band preset signal, the loop searches from the lowest band and finally locks at the target frequency after 32 hops. In this case, the BBCDR requires 14.25 µs in total for frequency acquisition. (2) In case 2, we enable the $A_{2\_SET}$ signal after the release of the *Reset* signal, so that the loop loses lock after the *Reset* signal. After 10 ns, the $A_{2\_SET}$ activation sets bit2 to 1 and forces the VCO to jump from band0 to band4. Since no sudden current injection or leakage occurs in the CP, $V_{CONT}$ remains unchanged at band4. The loop then continues to charge or discharge the LF capacitor for frequency capture, toggling between the NNC and PNC modes. A green box in Fig. 20(b) highlights the search time saved from band1 to band3 by this operation. (3) In case 3, we trigger $A_{3\_SET}$ after the release of the *Reset* signal, and the loop undergoes one band hop within the 10 ns between the release of *Reset* and the enabling of $A_{3\_SET}$. Therefore, when the bit control signal arrives, the RO has already swept to band1, and the $A_{3\_SET}$ signal makes the RO jump from band1 to band9. We save the search time from band2 to band8, and the total lock time of the loop reduces from 14.02 µs in case 2 to 11.24 µs in case 3. (4) In case 4, we enable the $A_{4\_SET}$ signal and the loop jumps directly from band0 to band16, reducing the lock time to 6.42 µs. (5) Finally, in case 5, we enable $A_{5\_SET}$, and the VCO is quickly set to band32 (100000), which is exactly the location of the target. The loop then finds the target and reaches frequency locking in only 0.39 µs. These measurements show that the bit preset signals allow us to skip part of the search process during frequency capture, which demonstrates the possibility of designing a fully automated, wide-range, fast-locking BBCDR.

Fig. 20.
(a) Illustration of different band-jumping cases and (b) measured dynamic settling process with band preset signal enabled.

It is worth observing that the preset signals do not impair the operation of the loop during the FAM period, for two reasons: (1) $V_{\text{CONT}}$ does not change abruptly when the preset signals $A_{2\_\text{SET}}\sim A_{5\_\text{SET}}$ change the VCO band, since we only act on the control signal of the counter and the CP does not generate a sudden current to charge/discharge the LF. It follows that $V_{\text{CONT}}$ is continuous during this process. (2) After a sudden change in the VCO band, the loop remains in FAM and continues searching for the target under the PNC or NNC mode.

Fig. 21. Measured dynamic settling process with band preset signal disabled.

Fig. 21 displays the correct settling of the BBCDR over a wide frequency range after the *Reset* signal, where case (a) locks in band 0 immediately at the reset instant with a 6-Gb/s data rate, while case (d) settles at 38 Gb/s after a journey through 32 bands. The detailed process is as follows. Initially, let us consider the BBCDR locked in PTM. The BBCDR receives new target NRZ data with the global *Reset* signal released at $t_0$, thus resetting the VCO to band 0 (where band 0 is the lowest band and band 63 is the highest). Even though $V_{\text{CONT}}$ jumps up and down between $t_0$ and $t_1$, the VCO stays at band0 due to the active *Reset* signal. At $t_1$, the *Reset* signal goes high and the BBCDR enters FAM. Since a BBPD with a selected SP point generates a non-zero net current to charge and discharge the LF, $V_{\text{CONT}}$ wanders up and down between the upper and lower thresholds of the Schmitt trigger ($V_{S+}$ and $V_{S-}$) to search for the target frequency, and the BBCDR switches alternately between PNC and NNC in neighboring bands. In this way, we obtain a continuous automatic frequency capture.
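The band search and the preset cases of Fig. 20 can be mimicked with a toy counter model. The sketch below assumes, for illustration only, that a preset signal $A_{k\_SET}$ simply sets bit $k$ of the 6-bit band word, and that the counter then advances by one band per *BANDSW* pulse, wrapping around after band 63. The helper names are hypothetical, and the model counts hops only; it does not attempt to reproduce the measured lock times.

```python
from typing import List, Optional

N_BANDS = 64  # 6-bit band word: band0 (lowest) to band63 (highest)


def apply_preset(band: int, bit: int) -> int:
    """Set one bit of the band word, as a preset signal A_bit_SET would."""
    return band | (1 << bit)


def search(start: int, target: int, preset_bit: Optional[int] = None) -> List[int]:
    """Return the sequence of bands visited until the target band is reached.

    start:      band held by the counter when the preset (if any) arrives.
    target:     band that contains the input data rate.
    preset_bit: index k of the A_k_SET signal, or None for no preset.
    """
    if not (0 <= start < N_BANDS and 0 <= target < N_BANDS):
        raise ValueError("bands must lie in 0..63")
    band = start if preset_bit is None else apply_preset(start, preset_bit)
    visited = [band]
    while band != target:
        band = (band + 1) % N_BANDS  # one hop per BANDSW pulse, with wrap-around
        visited.append(band)
    return visited


if __name__ == "__main__":
    target = 32
    # (start band, preset bit) for cases 1 to 5 described for Fig. 20(b).
    cases = {1: (0, None), 2: (0, 2), 3: (1, 3), 4: (0, 4), 5: (0, 5)}
    for case, (start, bit) in cases.items():
        path = search(start, target, bit)
        print(f"case {case}: first band {path[0]}, hops to lock {len(path) - 1}")
```

Under these assumptions the number of hops falls from 32 without a preset to 0 with $A_{5\_SET}$, in line with the shrinking lock times measured across the five cases.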
After several band hops, the circuit finds the correct band and the BBCDR locks at $t_2$ in FAM. When the internal timer of the HCC overflows at $t_3$, *MDSW* goes high and PTM takes over. In PTM, the CP operates with zero net current over a period, which further suppresses the ripple on $V_{\text{CONT}}$ and yields better jitter performance. Finally, if the input data rate is outside the capture range of the BBCDR, i.e., above the highest or below the lowest capture frequency, the loop continuously searches all 64 bands and then restarts from the lowest band, band0, for the next sweep. Fig. 21(e) shows the relevant test result.

Fig. 22. Measured PN curve, spectrum, eye diagram and jitter histogram at (a) 6 Gb/s, (b) 20 Gb/s, (c) 32 Gb/s and (d) 38 Gb/s.

Figs. 22(a)-(d) show the test results of the recovered clock and data under different locking conditions in response to a pseudo-random binary sequence (PRBS) of length 2<sup>7</sup>-1. We report in detail the PN, the spectrum, a 4-unit-interval (UI) eye diagram, and the jitter histogram at four data rates: 6 Gb/s, 20 Gb/s, 32 Gb/s, and 38 Gb/s. In Fig. 22(a), the circuit recovers a 1.5-GHz clock from the 6-Gb/s data input, and the raw and smoothed integral jitters are 3.2 ps and 1.54 ps, respectively. In the time-domain jitter histogram, the standard deviation of the jitter is 2.53 ps while the peak-to-peak jitter (jitter<sub>P-P</sub>) is 18.74 ps. At low frequency, the retimed data and clock eye diagrams have steeper edges and are closer to a square wave. Fig. 22(b) shows the corresponding recovered clock and data at 20-Gb/s input. The raw and smoothed integral jitters of the clock are 3.11 ps and 1.89 ps, respectively, which are close to those of Fig. 22(a). The recovered clock histogram has a standard deviation of 3.84 ps and a peak-to-peak jitter of 25.86 ps. Figs. 22(c) and (d) show the test results at 32-Gb/s and 38-Gb/s input, respectively.
When the recovered clock is 8 GHz, the raw and smoothed integral jitters reduce to 1.54 ps and 919.5 fs, respectively. At the recovered clock of 9.5 GHz, the raw and smoothed integral jitter values further decrease to 1.37 ps and 636.4 fs. The jitter varies because both $K_{VCO}$ and $K_{PD}$ change with frequency, so the loop parameters are not constant over the wide operating range. We can tackle this problem in future work by designing a tunable LF. In addition, at high frequencies, the clock and data waveforms are closer to a sinusoid.

Fig. 23. Measured: (a) JTF and (b) JTOL under different data rates.

TABLE I. Performance Summary and Comparison with the State-of-the-Art.

| | This Work | ESSCIRC'18 [52] | SSC-L'20 [53] | JSSC'21 [22] | JSSC'17 [28] | JSSC'20 [54] | JSSC'22 [33] | JSSC'22 [34] |
|---|---|---|---|---|---|---|---|---|
| Key Technologies | Single-Loop<br>Quarter-Rate<br>BBPD | Half-Rate<br>CFD+DC-FD | Single-Loop<br>Half-Rate<br>Linear PD | Single-Loop<br>Half-Rate<br>BBPD | Single-Loop<br>Half-Rate<br>Baud-Rate PD | Single-Loop<br>Half-Rate<br>Extended BBPD | Single-Loop<br>Full-Rate<br>BBPD | Single-Loop<br>Half-Rate<br>BBPD |
| Oversampling Ratio | 2x | 2x | N/A | 4x | 1x | 2.5x | 3x | 2x |
| Need FD? | No | Yes | No | Yes | Yes | Yes | No | No |
| Data Sampling Clock? | No | Yes | Yes | No | No | No | No | No |
| Data Rate (Gb/s) | 6 to 38 | 4 to 10 | 6.4 to 11 | 4 to 20 | 22.5 to 32 | 6.5 to 12.5 | 23.4 to 29.1 | 47.6 to 58.8 |
| Absolute Capture Range (Gb/s) | 32 | 6 | 4.6 | 16 | 9.5 | 6 | 5.7 | 11.2 |
| Acquisition Speed ((Gb/s)/μs) | 3.19 | 0.26 | 0.29 | 1.6 | 0.001 | 2 | 8.2 | 9.81 |
| Output RMS Jitter (ps) | 1.81<sup>\*</sup> / 0.919<sup>\*\*</sup> @8GHz | 1.05 @5GHz | 1.66 @4GHz | 1.95 @10GHz | 2.6 @12.5GHz | 1.15 @5.9GHz | 0.487 @14GHz | 0.416 @13GHz |
| JTOL (UI<sub>pp</sub>) @ Jitter Frequency, D<sub>IN</sub> (NRZ) | 0.275 @100MHz / 0.275 @200MHz | N/A | 0.14 @100MHz | 0.42 @100MHz | 0.2 @100MHz | 0.34 @200MHz | N/A | 0.45 @100MHz |
| JTOL (UI<sub>pp</sub>) @ Jitter Frequency, D<sub>IN</sub> (PAM-4) | N/A | N/A | N/A | N/A | N/A | N/A | 0.4 @200MHz | 0.1 @100MHz |
| Power Consumption (mW) | 24.6 @32Gb/s | 33 @10Gb/s | N/A | 37.3 @20Gb/s | 87.61 @32Gb/s | 21.13 @10Gb/s | 19.16 | 11 to 13.1 |
| Energy Efficiency (pJ/bit) | 0.769 | 3.3 | 2.7 | 1.87 | 2.74 | 2.11 | 0.68 | 0.22 to 0.25 |
| Supply Voltage (V) | 1 | 1 | 1 | 1.2 | 0.9 | 1 | 1.2 / 0.6 | 1 / 0.6 |
| Core Area (mm<sup>2</sup>) | 0.07 | 0.48 | 0.11 | 0.045 | 0.138 | 0.031 | 0.0285 | 0.056 |
| CMOS Technology (nm) | 65 | 28 | 28 | 65 | 28 | 28 | 28 | 28 |

<sup>\*</sup> Obtained from oscilloscope with 0.456ps<sub>rms</sub> trigger signal; <sup>\*\*</sup> Obtained from integrated phase noise from 100 Hz to 1 GHz.

Fig. 23 plots the measured jitter tolerance (JTOL) and jitter transfer function (JTF) [59-60] of the proposed reference-less BBCDR under locked conditions at three data rates, i.e., 6 Gb/s, 20 Gb/s, and 32 Gb/s. We test all the JTF curves under a 0.05-UI<sub>PP</sub> jitter injection performed by the Keysight M8040A BERT. As analyzed before, the loop achieves better jitter performance in PTM, where the HCC control logic disables the extra CP paths and the loop operates with zero net current. At 6-Gb/s PRBS input, the recovered clock operates at 1.5 GHz, with a loop bandwidth of 25 MHz.
The loop bandwidths at 20 Gb/s and 32 Gb/s expand to about 60 MHz and 70 MHz, respectively, due to the variation of $K_{VCO}$ with frequency, as discussed above. Fig. 23(b) shows the JTOL curves measured under jitter injection by the M8040A BERT. The JTOL at 100 MHz reaches 0.2 UI<sub>PP</sub>, 0.29 UI<sub>PP</sub>, and 0.12 UI<sub>PP</sub> for 6-Gb/s, 20-Gb/s, and 32-Gb/s PRBS inputs, respectively, which demonstrates the robustness of the proposed BBCDR. Compared with the previous state of the art in Table I, the proposed BBCDR covers the widest input data-rate range, from 6 Gb/s to 38 Gb/s, corresponding to an absolute capture range of 32 Gb/s. Employing the current-mismatch-based SP selection mechanism, the proposed prototype achieves an acquisition speed of 3.19 (Gb/s)/µs and an energy efficiency of 0.769 pJ/bit.

#### V. Conclusions

This paper presented a wide-capture-range quarter-rate BBCDR that requires neither an external reference nor a separate FD. Employing the deliberate-current-mismatch technique and a wideband PI-based multi-phase clock generation scheme, the BBCDR automatically covered NRZ input signals ranging from 6 Gb/s to 38 Gb/s. Under the control of the HCC block, the loop achieved fast and robust frequency acquisition in a single loop. Fabricated in 65-nm CMOS, the prototype occupied an area of 0.07 mm<sup>2</sup> and achieved an energy efficiency of 0.769 pJ/bit at 32 Gb/s.

#### REFERENCES

- [1] R. P. Martins, et al., "Bird's-eye view of analog and mixed-signal chips for the 21st century," *Int. J. Circuit Theory Appl.*, vol. 49, no. 3, pp. 746–761, 2021. - [2] R. P. Martins, et al., "Revisiting the frontiers of analog and mixed signal integrated circuits architectures and techniques towards the future Internet of Everything (IoE) applications," *Found. Trends Integr. Circuits Syst.*, vol. 1, nos. 2–3, pp. 72–216, Nov. 2021. - [3] D.
Dalton, et al., "A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback," *IEEE J. Solid-State Circuits*, vol. 40, no. 12, pp. 2713-2725, Dec. 2005. - [4] K. Park et al., "A 4–20-Gb/s 1.87-pJ/b continuous-rate digital CDR circuit with unlimited frequency acquisition capability in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 5, pp. 1597-1607, May 2021. - [5] J. Kenney, et al., "A 6.5Mb/s to 11.3Gb/s continuous-rate clock and data recovery," IEEE CICC, San Jose, CA, pp. 1-4, Sep. 2014. - [6] J. Lee, et al., "A 20 G/s full-rate linear clock and data recovery circuit with automatic frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 44, pp. 3590–3602, Dec. 2009. - [7] S. B. Anand, et al., "A 2.75 Gb/s CMOS clock recovery circuit with broad capture range," IEEE ISSCC, San Francisco, CA, pp. 214–215, Feb. 2001. - [8] M. S. Jalali, et al., "An 8mW frequency detector for 10Gb/s half-rate CDR using clock phase selection," IEEE CICC, San Jose, CA, pp. 1-8, Sep. 2013. - [9] M. S. Jalali, et al., "A reference-less single-loop half-rate binary CDR," IEEE J. Solid-State Circuits, vol. 50, no. 9, pp. 2037-2047, Sep. 2015. - [10] S. Choi, et al., "A 0.65-to-10.5 Gb/s reference-less CDR with asynchronous baud-rate sampling for frequency acquisition and adaptive equalization," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 63, no. 2, pp. 276-287, Feb. 2016. - [11] Y. Lee, et al., "An unbounded frequency detection mechanism for continuous-rate CDR circuits," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 5, pp. 500-504, May 2017. - [12] Rong-Jyi Yang, et al., "A 155.52 Mbps-3.125 Gbps continuous-rate clock and data recovery circuit," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1380-1390, Jun. 2006. - [13] J. 
Jin, et al., "A 4.0-10.0-Gb/s referenceless CDR with wide-range, jitter-tolerant, and harmonic-lock-free frequency acquisition technique," *IEEE ESSCIRC*, pp. 146-149, Sep. 2018. - [14] N. Kocaman, et al., "An 8.5-11.5-Gbps SONET transceiver with referenceless frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1875–1884, Aug. 2013. - [15] J. Jin, et al., "A 0.75–3.0-Gb/s dual-mode temperature-tolerant referenceless CDR with a deadzone-compensated frequency detector," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 10, pp. 2994-3003, Oct. 2018. - [16] C. Hsieh, et al., "A 1–16-Gb/s wide-range clock/data recovery circuit with a bidirectional frequency detector," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 58, no. 8, pp. 487-491, Aug. 2011. - [17] Y. Tsunoda, et al., "A 24-to-35Gb/s x4 VCSEL driver IC with multi-rate referenceless CDR in 0.13µm SiGe BiCMOS," *IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers*, San Francisco, CA, Feb. 2015. - [18] K. Son, et al., "A 0.42–3.45 Gb/s referenceless clock and data recovery circuit with counter-based unrestricted frequency acquisition," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 67, no. 6, pp. 974-978, Jun. 2020. - [19] S. Lee, et al., "A 650Mb/s-to-8Gb/s referenceless CDR circuit with automatic acquisition of data rate," *IEEE International Solid-State Circuits Conference Digest of Technical Papers*, San Francisco, CA, Feb. 2009. - [20] R. Inti, et al., "A 0.5-to-2.5 Gb/s reference-less half-rate digital CDR with unlimited frequency acquisition range and improved input duty-cycle error tolerance," *IEEE J Solid-State Circuits*, vol. 46, no. 12, pp. 3150-3162, Dec. 2011. - [21] M. H. Perrott, et al., "A 2.5-Gb/s multi-rate 0.25-μm CMOS clock and data recovery circuit utilizing a hybrid analog/digital loop filter and all-digital referenceless frequency acquisition," *IEEE J. of Solid-State Circuits*, vol. 
41, no. 12, pp. 2930-2944, Dec. 2006. - [22] G. Shu, et al., "A 4-to-10.5 Gb/s continuous-rate digital clock and data recovery with automatic frequency acquisition," *IEEE J. Solid-State Circuits*, vol. 51, no. 2, pp. 428-439, Feb. 2016. - [23] W. Rahman, et al., "A 22.5-to-32-Gb/s 3.2-pJ/b referenceless baud-rate digital CDR with DFE and CTLE in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3517-3531, Dec. 2017. - [24] C. Yu, et al., "A 6.5–12.5-Gb/s half-rate single-loop all-digital referenceless CDR in 28-nm CMOS," *IEEE JSSC*, vol. 55, no. 10, pp. 2831-2841, Oct. 2020. - [25] J. Yoon, et al., "A DC-to-12.5 Gb/s 9.76 mW/Gb/s all-rate CDR with a single LC VCO in 90 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 3, pp. 856-866, Mar. 2017. - [26] K. Park, et al., "A 27.1 mW, 7.5-to-11.1 Gb/s single-loop referenceless CDR with direct Up/dn control," *IEEE Custom Integrated Circuits Conference (CICC)*, Austin, TX, May 2017. - [27] F. Chen, et al., "A 10-Gb/s low jitter single-loop clock and data recovery circuit with rotational phase frequency detector," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 11, pp. 3278-3287, Nov. 2014. - [28] S. Lin, et al., "Full-rate bang-bang phase/frequency detectors for unilateral continuous-rate CDRs," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 55, no. 12, pp. 1214-1218, Dec. 2008. - [29] K. Park, et al., "A 6.7–11.2 Gb/s, 2.25 pJ/bit, single-loop referenceless CDR with multi-phase, oversampling PFD in 65-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 53, no. 10, pp. 2982-2993, Oct. 2018. - [30] R. Shivnaraine, et al., "An 8–11 Gb/s reference-less bang-bang CDR enabled by "phase reset," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 7, pp. 2129-2138, Jul. 2014. - [31] K. Park, et al., "A 6.4-to-32Gb/s 0.96pJ/b referenceless CDR employing ML-inspired stochastic phase-frequency detection technique in 40nm CMOS," *ISSCC*, pp. 
124-126, Feb. 2020. - [32] X. Zhao, et al., "A 0.0285mm<sup>2</sup> 0.68pJ/bit single-loop full-rate bang-bang CDR without reference and separate frequency detector achieving an 8.2(Gb/s)/μs acquisition speed of PAM-4 data in 28nm CMOS," *IEEE Custom Integrated Circuits Conference (CICC)*, Boston, MA, USA, Mar. 2020. - [33] X. Zhao, et al., "A 0.0285mm² 0.68pJ/bit single-loop full-rate bang-bang CDR without reference and separate FD pulling off an 8.2-Gb/s/μs acquisition speed of the PAM-4 input in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 2, pp. 546-561, Feb. 2022. - [34] X. Zhao, et al., "A sub-0.25-pJ/bit 47.6-to-58.8-Gb/s reference-less FD-less single-loop PAM-4 bang-bang CDR with a deliberate-current-mismatch frequency acquisition technique in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 5, pp. 1358-1371, May 2022. - [35] X. Zhao, et al., "A sub-0.25pJ/bit 47.6-to-58.8Gb/s reference-less FD-less single-loop PAM-4 bang-bang CDR with a deliberately-current-mismatch frequency acquisition technique in 28nm CMOS," 2021 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2021, pp. 131-134. - [36] X. Zhao, et al., "A 0.14-to-0.29-pJ/bit 14-GBaud/s trimodal (NRZ/PAM-4/PAM-8) half-rate bang-bang clock and data recovery circuit (BBCDR) in 28-nm CMOS," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 68, no. 1, pp. 89-102, Jan. 2021. - [37] Shenggao Li, et al., "A 10-GHz CMOS quadrature LC-VCO for multirate optical applications," *IEEE J. Solid-State Circuits*, vol. 38, no. 10, pp. 1626-1634, Oct. 2003. - [38] K. Yamaguchi, et al., "A 2.5-GHz four-phase clock generator with scalable no-feedback-loop architecture," *IEEE J. Solid-State Circuits*, vol. 36, no. 11, pp. 1666-1672, Nov. 2001. - [39] M.-S. Hwang, et al., "Reduction of pump current mismatch in charge-pump PLL," *Electronics Letters*, vol. 45, no. 3, pp. 135–136, Jan 2009 - [40] S. 
Huang, et al., "An 8.2 Gb/s-to-10.3 Gb/s full-rate linear referenceless CDR without frequency detector in 0.18 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 2048-2060, Sep. 2015. - [41] D. Schinkel, et al., "A double-tail latch-type voltage sense amplifier with 18ps setup+hold time," *IEEE International Solid-State Circuits Conference. Digest of Technical Papers*, San Francisco, CA, Feb. 2007. - [42] T. Masuda et al., "A 12 Gb/s 0.9 mW/Gb/s wide-bandwidth injection-type CDR in 28 nm CMOS with reference-free frequency capture," *IEEE J. Solid-State Circuits*, vol. 51, no. 12, pp. 3204-3215, Dec. 2016. - [43] A. Tharayil Narayanan, et al., "A fractional-N sub-sampling PLL using a pipelined phase-interpolator with an FoM of -250dB," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 7, pp. 1630-1640, July 2016. - [44] G. Wu, et al., "A 1–16 Gb/s all-digital clock and data recovery with a wideband high-linearity phase interpolator," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 7, pp. 2511-2520, July 2016. - [45] J. Lin, et al., "A 5-bit phase-interpolator-based fractional-N frequency divider for digital phase-locked loops," 2017 IEEE International Symposium on Circuits and Systems (ISCAS), 2017, pp. 1-4. - [46] A. Goyal, et al., "A high-resolution digital phase interpolator based CDR with a half-rate hybrid phase detector," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019, pp. 1-5. - [47] Y.-H, et al., "A phase-interpolator-based fractional counter for all-digital fractional-N phase-locked loop," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 3, pp. 249-253, March 2017. - [48] R. Kreienkamp, et al., "A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator," *IEEE J. Solid-State Circuits*, vol. 40, no. 3, pp. 736-743, March 2005. - [49] A. 
Balachandran, et al., "A 32-Gb/s 3.53-mW/Gb/s adaptive receiver AFE employing a hybrid CTLE plus LFEQ, edge-DFE and merged data-DFE/CDR in 65-nm CMOS," *IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)*, pp. 221-224, Nov. 2019 - [50] Q. Liao, et al., "A 50-Gb/s PAM-4 Silicon-Photonic Transmitter Incorporating Lumped-Segment MZM, Distributed CMOS Driver, and Integrated CDR,", IEEE Journal of Solid-State Circuits vol. 57, no. 3, pp. 767-780, March 2022. - [51] M. Zhong, et al., "A 4x25Gb/s Serializer with Integrated CDR and 3-Tap FFE Driver for NIC Optical Interconnects," *IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)*, pp. 255-256, Nov. 2021. - [52] M. You, et al., "A 4×25Gb/s De-Serializer with Baud-Rate Sampling CDR and Standing-Wave Clock Distribution for NIC Optical Interconnects," *IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)*, pp. 253-254, Nov. 2021. - [53] B. Abiri, et al., "A 1-to-6Gb/s phase-interpolator-based burst-mode CDR in 65nm CMOS," 2011 IEEE International Solid-State Circuits Conference, 2011, pp. 154-156. - [54] H. -J. Jeon, et al., "A bang-bang clock and data recovery using mixed mode adaptive loop gain strategy," *IEEE Journal of Solid-State Circuits*, vol. 48, no. 6, pp. 1398-1415, June 2013. - [55] X. Ge, et al., "Analysis and verification of jitter in bang-bang clock and data recovery circuit with a second-order loop filter," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 27, no. 10, pp. 2223-2236, Oct. 2019. - [56] Jri Lee, et al., "Modeling of jitter in bang-bang clock and data recovery circuits," *Proceedings of the IEEE 2003 Custom Integrated Circuits Conference*, 2003., 2003, pp. 711-714. - [57] F. A. Musa, et al., "Modeling and design of multilevel bang-bang CDRs in the presence of ISI and noise," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 54, no. 10, pp. 2137-2147, Oct. 2007. 
- [58] X. Zhao, et al., "A 0.14-to-0.29-pJ/bit 14-GBaud/s trimodal (NRZ/ PAM-4/PAM-8) half-rate bang-bang clock and data recovery circuit (BBCDR) in 28-nm CMOS," IEEE Proc. Asia Pacific Conference on Circuits and Systems (APCCAS), Bangkok, Thailand, pp. 229-232, Nov. 2019. - [59] L. -H. Chiueh, et al., "A 6-Gb/s adaptive-loop-bandwidth clock and data recovery (CDR) circuit," 2014 IEEE Asian Solid-State Circuits Conference (A-SSCC), 2014, pp. 289-292. - [60] H. -R. Kim, et al., "A 6.4–11 Gb/s wide-range referenceless single-loop CDR with adaptive JTOL," *IEEE LSSC*, vol. 3, pp. 470-473, Sep. 2020.