# A 2.5-Gb/s fully-integrated, low-power clock and recovery circuit in 0.18-$\mu$m CMOS\*

Zhang Changchun(张长春), Wang Zhigong(王志功)<sup>†</sup>, Shi Si(施思), and Guo Yufeng(郭宇峰)

(Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China)

**Abstract:** Based on a newly devised system-level design methodology, a 2.5-Gb/s monolithic bang-bang phase-locked clock and data recovery (CDR) circuit has been designed and fabricated in SMIC's 0.18-$\mu$m CMOS technology. A Pottbäcker phase frequency detector and a differential 4-stage inductorless ring VCO are adopted, and an additional current source is added to the VCO cell to improve the linearity of the VCO tuning characteristic. The CDR occupies an active area of $340 \times 440\ \mu\mathrm{m}^2$ and consumes only about 60 mW from a 1.8 V supply, with an input sensitivity of less than 25 mV and a single-ended output swing of more than 300 mV. It has a pull-in range of 800 MHz and a phase noise of -111.54 dBc/Hz at 10 kHz offset. The CDR works reliably at any input data rate between 1.8 Gb/s and 2.6 Gb/s without a reference clock, off-chip tuning, or external components.

**Key words:** clock and data recovery; phase frequency detector; voltage-controlled oscillator; bang-bang; jitter

**DOI:** 10.1088/1674-4926/31/3/035007

**EEACC:** 1265

## 1. Introduction

With ever-growing demand for high-volume data transmission, increasing attention is being paid both to high-speed long-haul serial optical communication and to short-haul parallel optical interconnects, both of which are developing rapidly. In all such communication systems, the clock and data recovery (CDR) circuit is a pivotal building block.

Driven by diverse application environments and requirements, by rapid advances in IC fabrication processes (mainly CMOS), and by improved design techniques, many kinds of CDR have emerged. Broadly, however, continuous-mode CDRs fall into three groups<sup>[1]</sup>: filter-type<sup>[3-5]</sup>, PLL-based<sup>[6-12]</sup>, and phase-picking<sup>[13]</sup>.

Among these, the PLL-based CDR, also called the tracking-type CDR, is widely used at gigabit data rates because of its many advantages<sup>[14]</sup>, such as low power, high integration, and automatic phase alignment.

This paper describes the design and fabrication of a 2.5 Gb/s monolithic PLL-based CDR in SMIC's 0.18-$\mu$m CMOS technology. The system-level and circuit-level design methods are analyzed first, followed by simulation and measurement results.

## 2. System-level design

At its core, the CDR comprises a phase detector (PD), a voltage-to-current (V/I) converter, a loop filter (LF), and a VCO<sup>[16]</sup>. Among these, the choice of PD critically affects the system architecture, the jitter characteristics, the design procedure, the phase-alignment precision, and even the way data recovery (DR) is performed.

There are two basic types of PD: bang-bang (binary, or early/late) and linear. Compared with linear PDs, bang-bang PDs offer several distinctive advantages<sup>[14, 16–18]</sup>: inherent alignment of the sampling phase, easy adaptation to multi-phase sampling structures, operation at the highest speed at which a process can produce a working flip-flop, and no need for a charge pump thanks to their high gain. A bang-bang PD is therefore employed in this CDR, as shown in Fig. 1.

Unlike a linear PD, which detects both the magnitude and the sign of the phase error, a bang-bang PD detects only the sign, so classical linear control theory cannot be applied directly. Fortunately, owing to real-world effects such as metastability and input jitter, the ideally binary characteristic of a practical bang-bang PD (BBPD) exhibits a finite slope over a narrow range of input phase difference, and this slope, i.e., the effective gain of the PD, is inversely proportional to the input jitter amplitude<sup>[16, 18, 19]</sup>. Figure 1(a) illustrates these effects. Consequently, small phase errors lead to linear operation, whereas large phase errors cause the loop to slew<sup>[16]</sup>.
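The inverse dependence of the effective gain on jitter can be checked numerically. The sketch below is a simplified model, not the authors' analysis: it assumes the PD is an ideal sign detector (metastability is ignored) and that the input jitter is Gaussian with RMS value σ in UI. Averaging sign(φ + n) over the jitter gives erf(φ/(√2σ)), whose slope at φ = 0 is 2/(√(2π)σ). All function names are hypothetical.

```python
import math
import random


def bbpd_average_output(phase_err_ui, sigma_ui, n=200_000, seed=0):
    """Monte-Carlo mean of an ideal +/-1 bang-bang PD whose input has Gaussian jitter."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        # The PD only reports whether the jittered phase error is positive or negative.
        total += 1 if phase_err_ui + rng.gauss(0.0, sigma_ui) > 0 else -1
    return total / n


def bbpd_average_output_analytic(phase_err_ui, sigma_ui):
    """Closed form of the same average: E[sign(phi + n)] = erf(phi / (sqrt(2) * sigma))."""
    return math.erf(phase_err_ui / (math.sqrt(2.0) * sigma_ui))


def bbpd_effective_gain(sigma_ui):
    """Slope of the averaged characteristic at phi = 0, in 1/UI: 2 / (sqrt(2*pi) * sigma)."""
    return 2.0 / (math.sqrt(2.0 * math.pi) * sigma_ui)


if __name__ == "__main__":
    for sigma in (0.01, 0.02, 0.04):
        print(f"sigma = {sigma:.2f} UI -> effective gain = {bbpd_effective_gain(sigma):.1f} /UI")
```

Doubling σ halves the gain, which is the behavior that lets a bang-bang loop be analyzed as a linear loop for small phase errors, with a gain that depends on the incoming jitter.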

For compactness, the topology<sup>[16, 20, 21]</sup> in which the bang-bang (proportional) branch and the integral branch are implicitly combined, with $R$ setting the proportional step and $C$ the integral action of a single loop filter, is chosen over the one<sup>[14, 17, 22]</sup> that uses a separate direct-drive path from the PD to the VCO.

For the system-level design of a CDR, four jitter specifications<sup>[2, 11]</sup> are usually considered: jitter transfer (JTRAN), jitter peaking (JPEAK), jitter tolerance (JTOL), and jitter generation (JGEN). Based on the large-signal piecewise-linear model of the bang-bang PD in Fig. 1(b) and related theory<sup>[16–18, 20–22]</sup>, the following scaling relations can be derived for JTRAN, JTOL, and JGEN:

$$BW_{\rm JTRAN} \propto \Delta F_{\rm bb} \times DF / JITTER_{\rm PP}, \tag{1}$$

$$BW_{\rm JTOL} \propto \Delta F_{\rm bb} \times DF, \tag{2}$$

$$JG_{\rm PP} \propto m \times \Delta F_{\rm bb}, \tag{3}$$

$$\Delta F_{\rm bb} = I_{\rm CP} \times R \times K_{\rm VCO}, \tag{4}$$

$$\zeta \propto R \times C / m, \tag{5}$$

where BW<sub>JTRAN</sub> is the closed-loop JTRAN bandwidth, BW<sub>JTOL</sub> is the JTOL bandwidth, JG<sub>PP</sub> is the peak-to-peak jitter generation, $\Delta F_{\rm bb}$ is the bang-bang frequency step, $\zeta$ is the loop stability factor, JITTER<sub>PP</sub> is the peak-to-peak input data jitter, DF is the transition density factor (0.5 for random data), *m* is the number of bit periods of latency around the loop, *I*<sub>CP</sub> is the saturated output current of the V/I converter, *K*<sub>VCO</sub> is the VCO gain, *R* is the loop-filter resistor, and *C* is the loop-filter capacitor.

\* Project supported by the National High Technology Research and Development Program of China (No. 2007AA01Z2a5) and the National Natural Science Foundation of China (No. 60806027).

† Corresponding author. Email: zgwang@seu.edu.cn

Received 7 September 2009, revised manuscript received 27 October 2009

Fig. 1. (a) Phase transfer characteristic of a bang-bang PD that exhibits metastability and input jitter. (b) Basic bang-bang CDR architecture.

Fig. 2. Circuit block diagram of the adopted bang-bang CDR.

The overall design targets are a low JTRAN bandwidth, small JPEAK, a high JTOL bandwidth, and low JGEN<sup>[11]</sup>. Equations (1)–(3) therefore show that choosing $\Delta F_{bb}$ involves a delicate trade-off among BW<sub>JTRAN</sub>, BW<sub>JTOL</sub>, and JG<sub>PP</sub>. For example, a larger $\Delta F_{bb}$ improves jitter tolerance, but at the cost of worse output jitter, because both the transferred and the generated jitter increase.
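Because Eqs. (1)–(3) are proportionalities, they are most useful for comparing candidate designs. The sketch below evaluates Eq. (4) and the resulting ratios for a few hypothetical component sets; the values of $I_{\rm CP}$, $R$, $K_{\rm VCO}$ and $C$ are illustrative assumptions, not the values used in this chip, and DF, JITTER<sub>PP</sub> and *m* are held fixed.

```python
F_BIT = 2.5e9  # target data rate of this design (b/s)


def bang_bang_step(i_cp, r, k_vco):
    """Eq. (4): dF_bb = I_CP * R * K_VCO (A * ohm * Hz/V = Hz), plus the phase step per bit in UI."""
    df_bb = i_cp * r * k_vco
    return df_bb, df_bb / F_BIT


def relative_change(design, baseline):
    """Ratios of the quantities in Eqs. (1)-(3) and (5) between two (i_cp, r, k_vco, c) designs.

    DF, JITTER_PP and the latency m are assumed unchanged, so every ratio
    follows either from dF_bb (Eq. 4) or from R * C (Eq. 5).
    """
    k = bang_bang_step(*design[:3])[0] / bang_bang_step(*baseline[:3])[0]
    zeta = (design[1] * design[3]) / (baseline[1] * baseline[3])
    return {"BW_JTRAN": k, "BW_JTOL": k, "JG_PP": k, "zeta": zeta}


# Hypothetical component values, chosen only for illustration.
baseline = (50e-6, 200.0, 800e6, 100e-12)                         # I_CP, R, K_VCO, C
print(bang_bang_step(*baseline[:3]))                              # dF_bb in Hz, step in UI/bit
print(relative_change((50e-6, 400.0, 800e6, 100e-12), baseline))  # double R
print(relative_change((25e-6, 400.0, 800e6, 100e-12), baseline))  # halve I_CP, double R
```

With these numbers $\Delta F_{\rm bb}$ is 8 MHz, i.e. 0.0032 UI of phase per bit. Doubling $R$ doubles every bandwidth and the jitter generation along with $\zeta$, whereas halving $I_{\rm CP}$ while doubling $R$ keeps $\Delta F_{\rm bb}$ (and hence the JTRAN/JTOL/JGEN balance) unchanged but still doubles $\zeta$, since Eq. (5) depends on $R \times C$ and not on $I_{\rm CP}$.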

As for JPEAK, it has been shown that the jitter transfer of slew-limited bang-bang CDR loops exhibits negligible peaking<sup>[16]</sup>. To ensure that the loop's JTRAN function does not peak, $\zeta$ should be made large enough that the proportional path dominates during slew limiting (slope overload)<sup>[22]</sup>.

According to Eqs. (3) and (5), *m* should be made as small as possible, because a smaller latency *m* reduces jitter generation and improves loop stability. This also makes a ring VCO preferable to an LC VCO in a bang-bang CDR, since the usually higher Q-factor of the LC tank results in a larger tuning delay<sup>[14, 18]</sup>.
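The effect of latency on jitter generation can be seen in a deliberately stripped-down model. The sketch below assumes a proportional-only loop (the integral path through $C$ is neglected), a data transition in every bit (DF = 1), no input jitter, and a decision delay of exactly *m* bits. Under these assumptions the phase settles into a triangular limit cycle whose peak-to-peak value is $(2m+1)\,\Delta F_{\rm bb}/f_{\rm bit}$ UI, which grows linearly with both *m* and $\Delta F_{\rm bb}$, in line with the scaling of Eq. (3). The function name is hypothetical.

```python
def limit_cycle_pp(m, step=1.0, n_bits=2000, phi0=0.3):
    """Peak-to-peak phase of a proportional-only bang-bang loop with m bits of latency.

    Recursion: phi[n+1] = phi[n] - step * sign(phi[n-m]), where step = dF_bb / f_bit (UI).
    """
    history = [phi0] * (m + 1)          # holds phi[n-m] ... phi[n]
    samples = []
    for _ in range(n_bits):
        delayed = history[0]            # decision made m bits ago
        nxt = history[-1] - step * (1.0 if delayed > 0 else -1.0)
        history = history[1:] + [nxt]
        samples.append(nxt)
    tail = samples[n_bits // 2:]        # discard the start-up transient
    return max(tail) - min(tail)


for m in range(4):
    print(m, limit_cycle_pp(m))         # compare with (2m + 1) * step
```

The phase keeps moving in the same direction for *m* extra bits after it crosses zero, so every bit of latency adds two steps to the swing.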

Additionally, because the loop bandwidth is narrow, the capture range of a CDR usually does not exceed a few percent of the data rate, which cannot guarantee that the loop locks unaided under PVT variations<sup>[2]</sup>. Some form of aided acquisition<sup>[8, 11, 20]</sup> must therefore be employed; otherwise, manual off-chip tuning<sup>[18, 22]</sup> is required.

## 3. Circuit-level design

As shown in Fig. 2, a digital phase and frequency detector, known as a digital quadricorrelator, is employed. It allows the loop to combine a narrow loop bandwidth with a wide pull-in range, and to operate without a local reference clock or off-chip tuning<sup>[2, 12]</sup>.

However, the PFD requires quadrature (I/Q) clocks. There are mainly five ways to generate quadrature clocks<sup>[4, 8–10, 12, 15, 20, 22]</sup>: (1) a VCO followed by a $\pi/2$ delay line; (2) a VCO combined with a polyphase filter (or an RC–CR network) and output buffers (or limiters); (3) a VCO at twice the frequency followed by a divide-by-2 circuit; (4) two cross-coupled LC VCOs; and (5) an even-stage ring-type VCO. As a trade-off<sup>[15]</sup> among power consumption, area, operating frequency range, I/Q phase accuracy, phase noise, and so on, an even-stage inductorless ring VCO is selected.

Fig. 3. Circuit diagram of (a) the PD and (b) the tri-state selector.

At start-up, the input data are compared with the I and Q clocks, respectively, producing two beat notes, which the FD processes into a frequency-error signal. This signal pulls the VCO frequency toward the data rate and hands control over to the PD once the frequency error is sufficiently small. The PD then locks the rising edge of the VCO clock to the data transitions, and the falling edge retimes the data at the bit center. In this circuit, retiming is performed in a multiplexer (MUX).
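The frequency-detection principle can be illustrated with a behavioral model. This is a sketch under stated assumptions, not the transistor-level circuit of Fig. 3: it assumes ideal I/Q clocks, random NRZ data with transition density 0.5, no jitter, and represents the two beat notes by the signs of $\sin\theta$ and $\cos\theta$, where $\theta$ is the VCO phase seen at each data edge. The sign of the frequency error follows from the direction in which the (I, Q) beat pair rotates, which is the idea behind the quadricorrelator of Ref. [12]; the function and variable names are hypothetical.

```python
import math
import random


def fd_vote(f_vco, f_data, n_bits=20_000, density=0.5, seed=1):
    """Behavioural sketch of a quadricorrelator-style frequency detector.

    Returns the net number of decisions: > 0 means "speed the VCO up",
    < 0 means "slow it down", 0 means the FD stayed in its neutral state.
    """
    rng = random.Random(seed)
    t_bit = 1.0 / f_data
    prev_i = None
    votes = 0
    for k in range(n_bits):
        if rng.random() >= density:          # no data transition at this bit boundary
            continue
        # Phase of the VCO clock seen by this data edge (radians).
        theta = 2.0 * math.pi * (f_vco - f_data) * k * t_bit
        i_beat = math.sin(theta) >= 0.0      # I clock sampled by the data edge
        q_beat = math.cos(theta) >= 0.0      # Q clock sampled by the same edge
        if prev_i is not None and i_beat != prev_i:
            # A rising I beat with Q high, or a falling I beat with Q low,
            # means theta is increasing: the VCO is faster than the data.
            vco_fast = (q_beat == i_beat)
            votes += -1 if vco_fast else 1
        prev_i = i_beat
    return votes


print(fd_vote(2.4e9, 2.5e9), fd_vote(2.6e9, 2.5e9), fd_vote(2.5e9, 2.5e9))
```

When the frequencies match, $\theta$ no longer rotates, the I beat stops toggling, and the model issues no decisions, which mirrors the neutral third state in which the FD hands control to the PD.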

### 3.1. Phase/frequency detector (PFD)<sup>[12]</sup>

As Fig. 2 shows, the PFD consists of a PD, a QPD, and an FD. The PD, which is identical to the QPD and shown in Fig. 3(a), is a double-edge-triggered flip-flop (DETFF) consisting of two CML latches and a selector. A DETFF is chosen over a conventional DFF because it uses both edges of the input data, providing more phase-error information and thus improving loop performance. The FD differs from the PD only in its selector, which, as shown in Fig. 3(b), is modified to provide a ternary output.
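The two-latch-plus-selector structure of the DETFF can be captured in a few lines of behavioral code. The sketch assumes, as we read the description, that the input data drive the trigger input and the VCO clock is the sampled `d` input; signals are lists of 0/1 values on a fine time grid, and the function name is hypothetical.

```python
def detff(d, trigger):
    """Behavioural DETFF: two level-sensitive latches plus a 2:1 selector.

    `d` and `trigger` are equal-length lists of 0/1 samples on a fine time grid.
    The output updates on both the rising and the falling edges of `trigger`.
    """
    latch_hi = 0   # transparent while trigger is high
    latch_lo = 0   # transparent while trigger is low
    out = []
    for d_now, trig in zip(d, trigger):
        if trig:
            latch_hi = d_now
        else:
            latch_lo = d_now
        # The selector passes whichever latch is currently holding (opaque).
        out.append(latch_lo if trig else latch_hi)
    return out


# Trigger edges at index 2 (rising), 4 (falling) and 6 (rising).
trigger = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 0, 0, 1, 1, 0, 1, 1]
print(detff(d, trigger))
```

The example returns `[0, 0, 0, 0, 1, 1, 0, 0]`: after each edge of either polarity, the output holds the `d` value sampled just before that edge. A single-edge DFF would update only on the rising edges, so with random data it would see roughly half as many phase-error samples.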

When frequency acquisition is complete, the FD remains in its third state, in which both outputs are high. Control of the loop is thereby handed over from the FD to the PD, and the tracking stage begins.

Another strength of the PFD is that it tolerates an I/Q phase mismatch of up to $\pm 45^{\circ}$<sup>[12]</sup>, which relaxes the requirements on the VCO and improves the robustness of the CDR.

### 3.2. VCO

As indicated in Fig. 2, a 4-stage inductorless ring VCO is adopted to generate the required I/Q clocks. Because the loop bandwidth of the bang-bang loop is relatively wide, the loop suppresses much of the VCO phase noise inside that bandwidth, so the inferior phase noise of a ring oscillator can be tolerated; meanwhile, its inherently wide tuning range is desirable for the bang-bang CDR.
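Two first-order ring-oscillator relations explain why four stages yield I/Q clocks and what each stage must deliver. Assuming identical stages with delay $t_d$, an N-stage differential ring oscillates at $f = 1/(2 N t_d)$, and adjacent stage outputs are $180^\circ/N$ apart because the stages together supply half a period of delay. The numbers below are derived from these idealized relations, not from the paper; the function names are hypothetical.

```python
def ring_stage_delay(f_osc, n_stages):
    """Per-stage delay t_d of an N-stage differential ring: f_osc = 1 / (2 * N * t_d)."""
    return 1.0 / (2.0 * n_stages * f_osc)


def ring_stage_phase_deg(n_stages):
    """Phase step between adjacent stage outputs: the stages together supply half a period."""
    return 180.0 / n_stages


N = 4
print(ring_stage_phase_deg(N), 2 * ring_stage_phase_deg(N))   # per stage, and two stages apart
for f in (1.8e9, 2.5e9, 2.6e9):
    print(f"{f / 1e9:.1f} GHz -> t_d = {ring_stage_delay(f, N) * 1e12:.1f} ps")
```

For N = 4, adjacent outputs are 45° apart, so outputs two stages apart form the 90° I/Q pair, and at 2.5 GHz each stage must provide 50 ps of delay.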

Figure 4 shows the VCO cell, in which a current-folding technique is used to ease the conflict between voltage headroom and VCO sensitivity<sup>[2]</sup>. The control terminal (Vcon) sets the delay of the cell, and hence the VCO frequency, by tuning the strength of the cross-coupled transistor pair (M3 and M4). PMOS transistors rather than resistors are used as loads, because resistor values are hard to control precisely in fabrication. Through M5 and M6, $V_1$ and $V_2$ differentially control the currents flowing through the differential pair (M1 and M2) and the cross-coupled pair (M3 and M4), respectively, so the total current through the loads (M7 and M8) remains almost constant and the output swing varies little across the tuning range. Two current sources, $I_1$ and $I_2$, are added to prevent the oscillation from stopping and to improve the linearity of the VCO characteristic, respectively. In addition, the gate lengths of transistors such as M3, M4, M6, M9, M11, and M12 are slightly increased to improve tuning linearity.

Fig. 4. Circuit diagram of the VCO cell.

Fig. 5. Circuit diagram of V/I converter and loop filter.

A differential topology is adopted for two reasons. First, differential circuits are used throughout the CDR, with their usual benefits such as rejection of common-mode and supply noise. Second, a differential ring produces a better duty cycle than a single-ended one, and the duty cycle critically affects CDR performance: because the rising clock edge is locked to the data transitions and the falling edge retimes the data, any departure from a 50% duty cycle moves the retiming instant away from the bit center.

Fig. 6. Some characteristic simulation waveforms for the CDR. (a) $f_{\rm int} < 2.5$ GHz. (b) $f_{\rm int} > 2.5$ GHz.
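The cost of a duty-cycle error can be quantified with simple arithmetic. Assuming a full-rate clock whose rising edge sits exactly on the data transition, the falling edge arrives $D \cdot T$ later, while the ideal sampling point is $T/2$; the offset is therefore $(D - 0.5)$ UI. The helper name below is hypothetical; the 49.6% input is the measured duty cycle reported in Section 5.

```python
def retiming_offset(duty_cycle, f_bit):
    """Offset of the falling (retiming) clock edge from the bit centre for a full-rate clock.

    The rising edge is locked to the data transition, so the falling edge follows it by
    duty_cycle * T; the ideal sampling point is T / 2. Returns (offset in UI, offset in s).
    """
    offset_ui = duty_cycle - 0.5
    return offset_ui, offset_ui / f_bit


print(retiming_offset(0.496, 2.5e9))   # measured duty cycle, 2.5 Gb/s
```

At 2.5 Gb/s, a 49.6% duty cycle moves the sampling instant by -0.004 UI, about -1.6 ps, which is small compared with the 400 ps bit period.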

### 3.3. V/I converter and loop filter (LF)

Figure 5 shows the implementation of the V/I converter and loop filter. The V/I converter is essentially an adder with a single-ended high-impedance output: the phase-error signal from the PD and the frequency-error signal from the FD are applied to its two input ports, and its output current, through the loop filter, generates the VCO tuning voltage. To reduce channel-length modulation in M8 and M10, both the gate lengths and the widths of these transistors are increased.

The loop filter is entirely passive, consisting only of resistors and capacitors. The high gain of the bang-bang PD keeps the static phase offset low, so this simple filter is sufficient; it consumes less power and gives better loop stability than an active filter, because it introduces none of the higher-order poles that an op-amp would add<sup>[6]</sup>.
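How the single V/I converter and the series R–C filter realize both loop paths can be seen in a bit-by-bit simulation. This sketch is an illustrative model, not a simulation of the fabricated circuit: it assumes a transition in every bit, no jitter, no loop latency, an unwrapped phase (no cycle slips), and hypothetical values ($I_{\rm CP}$ = 50 µA, $R$ = 200 Ω, $C$ = 100 pF, $K_{\rm VCO}$ = 800 MHz/V, a free-running VCO 5 MHz below the data rate). The current through $R$ produces the ±$\Delta F_{\rm bb}$ proportional step of Eq. (4), and the charge on $C$ integrates until the residual frequency offset is cancelled.

```python
def simulate_bb_loop(n_bits=20_000, f_data=2.5e9, f_free=2.495e9,
                     i_cp=50e-6, r=200.0, c=100e-12, k_vco=800e6):
    """Bit-by-bit sketch of a bang-bang CDR with a series R-C loop filter.

    Returns the VCO frequency at every bit and the final capacitor voltage.
    """
    t_bit = 1.0 / f_data
    phi = 0.01      # clock phase relative to the data edge, in UI (> 0: clock early)
    v_c = 0.0       # voltage stored on C (integral path)
    freqs = []
    for _ in range(n_bits):
        up = 1.0 if phi < 0 else -1.0    # BBPD: only the sign of the phase error
        i_out = up * i_cp                # saturated V/I output current
        v_ctrl = v_c + i_out * r         # R gives the proportional step, C the integral
        f_vco = f_free + k_vco * v_ctrl
        freqs.append(f_vco)
        phi += (f_vco - f_data) * t_bit  # phase error accumulates the frequency error
        v_c += i_out * t_bit / c         # C integrates the V/I current
    return freqs, v_c


freqs, v_c = simulate_bb_loop()
tail = freqs[-5000:]
print(sum(tail) / len(tail), v_c)
```

In lock, the VCO frequency toggles by ±8 MHz around the data rate every few bits, its average equals the data rate, and the capacitor settles near the 6.25 mV needed to cancel the 5 MHz free-running offset. The ratio $RC/T_{\rm bit}$ = 50 keeps the proportional path dominant, consistent with Eq. (5).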

Because the MIM capacitor offers a low capacitance density of less than 1 fF/$\mu$m<sup>2</sup>, NMOS transistors, with about 8 fF/$\mu$m<sup>2</sup>, are used as capacitors, especially for the large capacitor $C$ in Fig. 5.
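The area argument can be made concrete. The sketch takes the densities as 1 and 8 fF/µm², the usual orders of magnitude for MIM and thin-gate-oxide capacitors in 0.18-µm CMOS, and assumes a hypothetical 100 pF loop capacitor, since the paper does not state the value of $C$. The 340 × 440 µm² CDR core area is taken from Section 5.

```python
def cap_area_um2(c_farad, density_ff_per_um2):
    """Silicon area (um^2) needed for capacitance c_farad at a given density (fF/um^2)."""
    return c_farad / (density_ff_per_um2 * 1e-15)


C_LOOP = 100e-12              # hypothetical value; the paper does not state C
CORE_AREA = 340 * 440         # CDR core area from Section 5, um^2
for name, density in (("MIM", 1.0), ("NMOS", 8.0)):
    area = cap_area_um2(C_LOOP, density)
    print(f"{name}: {area:,.0f} um^2 ({area / CORE_AREA:.0%} of the CDR core)")
```

Under these assumptions a MIM implementation would need about 100,000 µm², roughly two-thirds of the whole core, while an NMOS capacitor needs about 12,500 µm².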

## 4. Simulation analysis

Figure 6 shows characteristic simulation waveforms of the CDR for $f_{\rm int} < 2.5$ GHz and $f_{\rm int} > 2.5$ GHz, where $f_{\rm int}$ is the initial frequency of the VCO. In both Figs. 6(a) and 6(b), with signal names as in Fig. 2, the top two waveforms ($Q_{\rm PD}$ and $/Q_{\rm PD}$) are the PD outputs, the third and fourth ($Q_{\rm FD}$ and $/Q_{\rm FD}$) are the FD outputs, and the bottom one is the VCO tuning voltage.

As Fig. 6 shows, and as described in Section 3.1, the FD drives the loop during the acquisition stage; during the tracking stage, both FD outputs ($Q_{\rm FD}$ and $/Q_{\rm FD}$) remain high and only the PD operates.

## 5. Experimental results

The chip was designed and fabricated in SMIC 0.18-$\mu$m CMOS technology. As shown in Fig. 7, the whole IC occupies an area of $670 \times 760\ \mu\mathrm{m}^2$, with a CDR core area of $340 \times 440\ \mu\mathrm{m}^2$.

The performance of the CDR was evaluated on-wafer using a Cascade probe station. The main instruments were an Advantest D3186 pulse pattern generator, an Agilent 86100A Infiniium DCA wide-bandwidth oscilloscope, and an E4440A spectrum analyzer.

Fig. 7. Chip photograph of the whole IC.

Fig. 8. Measured VCO tuning characteristic curve.

Figure 8 shows the measured VCO tuning characteristic. The VCO achieves a tuning range of more than 800 MHz (over 32% of the 2.5 GHz target). Owing to parasitics and imprecise device models, the measured center frequency is about 400 MHz lower than the simulated one, but the range still covers the desired 2.5 GHz.

Figure 9 shows the measured waveform, phase noise, and spectrum of the 2.5 GHz recovered clock in response to a 2.5 Gb/s pseudorandom bit sequence (PRBS) of length $2^{31} - 1$, with a 1.8 V supply. As Fig. 9 shows, the recovered clock has an RMS jitter of 3.69 ps, a duty cycle of 49.6%, and a phase noise of -111.54 dBc/Hz at 10 kHz offset (-117.45 dBc/Hz at 1 MHz offset).

Fig. 9. Measured (a) waveform, (b) phase noise, and (c) spectrum of the 2.5 GHz recovered clock.

Table 1. Performance comparison with previously published 2.5 Gb/s CDRs.

| Parameter | This work | Ref. [8] | Ref. [9] | Ref. [5] | Ref. [10] | Ref. [7] |
|---|---|---|---|---|---|---|
| Process | 0.18-μm CMOS | 0.18-μm CMOS | 0.25-μm CMOS | 0.25-μm CMOS | 0.18-μm CMOS | Si bipolar |
| Off-chip capacitor | N | Y | Y | Y | N | Y |
| Off-chip tuning | N | N | Y | Y | – | N |
| Reference clock | N | N | N | N | N | Y |
| $P_{\rm diss}$ (mW) | 60 | 26.1 | 550 | 680 | 120 | 800 |
| Area (μm²) | 670 × 760 | 2400 × 2400 | 970 × 970 | 1490 × 1000 | 675 × 875 | 2500 × 2500 |
| Pull-in range (MHz) | 800 | 220 | 80 | 40 | 250 | 400 |
| Phase noise (dBc/Hz) | -111.5 @ 10 kHz; -117.45 @ 1 MHz | -100 @ 1 MHz | -106 @ 100 kHz | -110 @ 100 kHz | -111 @ 10 kHz | – |

Fig. 10. Measured 5 Gb/s multiplexed eye diagram when two 2.5 Gb/s data streams were applied.

Figure 10 shows the measured 5 Gb/s multiplexed single-ended eye diagram, with an RMS jitter of 6.95 ps, when two different 2.5 Gb/s data streams were applied to the two input ports of the IC; here the data recovery function is embedded in the MUX. If the same data stream is applied to both ports simultaneously, the MUX serves as the DR unit, as shown in Fig. 11, where a single 2.0 Gb/s data stream was applied.

The measurements show that, under a 1.8 V supply, the pull-in range of the CDR is limited by the available operating range of the VCO, i.e., approximately 1.8 to 2.6 GHz. In other words, the CDR works properly at any input data rate between 1.8 and 2.6 Gb/s; Figs. 9 and 11 show two examples.

Fig. 11. Measured eye diagrams of recovered clock and data when one 2.0 Gb/s data stream was applied to two input ports at the same time.

Under a 1.8 V supply, the whole IC consumes about 112 mW, of which about 53% is taken by the CDR and its associated buffers. The CDR has an input sensitivity of less than 25 mV and a single-ended output swing of more than 300 mV. It still operates properly when the supply is reduced to 1.45 V, where it consumes only 23 mW.
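The power figures above can be cross-checked and converted into supply current and energy per bit with simple arithmetic. The sketch uses only numbers stated in this paper and in Table 1; energy per bit is a derived figure of merit that the paper itself does not quote, and the 1.45 V case is left out of it because the data rate for that measurement is not stated. Function names are hypothetical.

```python
def share_mw(total_mw, fraction):
    """Power of a sub-block given the total and its fractional share."""
    return total_mw * fraction


def supply_current_ma(power_mw, vdd):
    """Average supply current in mA (mW / V = mA)."""
    return power_mw / vdd


def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/bit (mW divided by Gb/s gives pJ/bit)."""
    return power_mw / rate_gbps


print(share_mw(112, 0.53))                                        # CDR share of the IC power
print(supply_current_ma(60, 1.8), supply_current_ma(23, 1.45))    # mA at 1.8 V and 1.45 V
print(energy_per_bit_pj(60, 2.5), energy_per_bit_pj(26.1, 2.5))   # this work vs Ref. [8]
```

53% of 112 mW is about 59.4 mW, consistent with the 60 mW quoted for the CDR; at 2.5 Gb/s this corresponds to about 24 pJ/bit, against about 10.4 pJ/bit for the 26.1 mW design of Ref. [8].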

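Table 1 compares this work with similar, previously published CDRs. This CDR achieves the widest pull-in range (800 MHz) and the smallest area without an off-chip capacitor, off-chip tuning, or a reference clock, its phase noise is at least as good as that reported at the same offsets, and it dissipates less power than all the other designs except Ref. [8].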

## 6. Conclusions

A system-level design methodology for bang-bang CDRs has been devised from existing design theory and practice. Based on this methodology, a 2.5 Gb/s monolithic PLL-based bang-bang CDR has been designed and fabricated in SMIC's 0.18-$\mu$m CMOS technology.

The choice of circuits and the operating principles of the CDR and its building blocks have been discussed and analyzed in detail. A Pottbäcker PFD was selected for its wide acquisition range. A differential 4-stage inductorless ring VCO was used because of its better duty cycle, smaller tuning delay, and smaller area. In addition, an extra current source was added to the VCO cell to improve the linearity of the VCO characteristic.

The CDR has an active area of $340 \times 440\ \mu\mathrm{m}^2$ and consumes about 60 mW from a 1.8 V supply, with an input sensitivity of less than 25 mV and a single-ended output swing of more than 300 mV. It also operates properly from a 1.45 V supply, consuming only about 23 mW.

The CDR works reliably at any input data rate between 1.8 Gb/s and 2.6 Gb/s without a reference clock, off-chip tuning, or external components. With a 2.5 Gb/s input, it achieves a phase noise of -111.54 dBc/Hz at 10 kHz offset.

## References

- [1] Ahmed S I, Kwasniewski T A. Overview of oversampling clock and data recovery circuits. Electrical and Computer Engineering, 2005, 5: 1876
- [2] Razavi B. Design of integrated circuits for optical communications. New York: McGraw-Hill, 2003
- [3] Wang Zhigong. MultiGbits/s data regeneration and clock recovery IC design. Annals of Telecommunications, 1993, 48(3): 132
- [4] Andrea P, Francesco C, Alessandro T, et al. A monolithic GaAs clock and data recovery circuit for 2.5 Gb/s NRZ data stream. 5th European Gallium Arsenide and Related III–V Compounds Applications Symposium, 1997, 9: 263
- [5] Wang Huan, Wang Zhigong, Feng Jun, et al. 2.488 Gbit/s clock and data recovery circuit in 0.35 μm CMOS. Journal of Southeast University, 2006, 22(6): 143
- [6] Gutierrez G, Kong S. Unaided 2.5 Gb/s silicon bipolar clock and data recovery IC. IEEE Radio Frequency Integrated Circuits Symposium, 1998: 173
- [7] Gutierrez G, Kong S, Coy B. 2.488 Gb/s silicon bipolar clock and data recovery IC for SONET (OC-48). IEEE Custom Integrated Circuits Conference, 1998: 575
- [8] Raja M K, Yan D L, Ajjikuttira A B, et al. A 1.4-psec jitter 2.5-Gb/s CDR with wide acquisition range in 0.18-μm CMOS. ESSCIRC, 2007, 9: 524
- [9] Chen Yingmei, Wang Zhigong, Xiong Mingzhen, et al. 2.5 Gb/s monolithic IC of clock recovery, data decision, and 1:4 demultiplexer. Chinese Journal of Semiconductors, 2005, 26(8): 74
- [10] Liu Yongwang, Wang Zhigong, Li Wei. 2.5 Gb/s 0.18 μm CMOS clock and data recovery circuit. Chinese Journal of Semiconductors, 2007, 28(4): 537
- [11] Dalton D, Chai K, Evans E, et al. A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback. IEEE J Solid-State Circuits, 2005, 40(11): 2713
- [12] Pottbäcker A, Langmann U. An 8 GHz silicon bipolar clock-recovery and data-regenerator IC. IEEE J Solid-State Circuits, 1994, 29(12): 1572
- [13] Yang C K, Horowitz M A. A 0.8-μm CMOS 2.5 Gb/s oversampling receiver and transmitter for serial links. IEEE J Solid-State Circuits, 1996, 31(12): 2015
- [14] Walker R C. Designing bang-bang PLLs for clock and data recovery in serial data transmission systems. In: Phase-locking in high-performance systems: from devices to architectures. New York: IEEE Press, 2003
- [15] Tiebout M. Low-power low-phase-noise differentially tuned quadrature VCO design in standard CMOS. IEEE J Solid-State Circuits, 2001, 36(7): 1018
- [16] Lee J, Kundert K S, Razavi B. Analysis and modeling of bang-bang clock and data recovery circuits. IEEE J Solid-State Circuits, 2004, 39(9): 1571
- [17] Chen T S. A 10 Gb/s half-rate clock and data recovery circuit with direct bang-bang tuning. IEEE International Workshop on Radiofrequency Integration Technology, 2005, 11: 57
- [18] Greshishchev Y M, Schvan P. A fully integrated SiGe receiver IC for 10-Gb/s data rate. IEEE J Solid-State Circuits, 2000, 35(12): 1949
- [19] Lee B J, Hwang M S, Lee S H, et al. A 2.5-10-Gb/s CMOS transceiver with alternating edge-sampling phase detection for loop characteristic stabilization. IEEE J Solid-State Circuits, 2003, 38(11): 1821
- [20] Ong A, Benyamin S, Cancio J, et al. A 40-43-Gb/s clock and data recovery IC with integrated SFI-5 1:16 demultiplexer in SiGe technology. IEEE J Solid-State Circuits, 2003, 38(12): 2155
- [21] Henrickson L, Shen D, Nellore U, et al. Low-power fully integrated 10-Gb/s SONET/SDH transceiver in 0.13-μm CMOS. IEEE J Solid-State Circuits, 2003, 38(10): 1595
- [22] Rogers J E, Long J R. A 10 Gb/s CDR/DEMUX with LC delay line VCO in 0.18 μm CMOS. IEEE J Solid-State Circuits, 2002, 37(5): 1781