# A 750 MHz semi-digital clock and data recovery circuit with 10<sup>-12</sup> BER

Wei Xueming(韦雪明)<sup>†</sup>, Wang Yiwen(王忆文), Li Ping(李平)<sup>†</sup>, and Luo Heping(罗和平)

State Key Laboratory of Electronic Thin Films and Integrated Devices, University of Electronic Science & Technology of China, Chengdu 610054, China

**Abstract:** A semi-digital clock and data recovery (CDR) circuit is presented. To lower the CDR trace jitter and decrease the loop latency, an average-based phase detection algorithm is adopted and realized with a novel circuit. Implemented in a 0.13 μm standard 1P8M CMOS process, the CDR is integrated into a high-speed serializer/deserializer (SERDES) chip. Measurement results show three things. The CDR tracks the phase of the input data well. The RMS jitter of the recovered clock at the observation pin is 122 ps at a 75 MHz clock frequency. The bit error rate of the recovered data is less than $10^{-12}$.

**Key words:** clock and data recovery; interpolator; SERDES **DOI:** 10.1088/1674-4926/32/12/125009 **EEACC:** 1280

## 1. Introduction

High-speed serializer/deserializer (SERDES) data transmission has become popular in recent years<sup>[1, 2]</sup>. Applications include optical communication systems, backplane data-link routing and chip-to-chip interconnection. In a SERDES application, the clock information is embedded in the serially transmitted data, so the clock and data recovery (CDR) circuit is the pivotal block. It extracts the transmitted data sequence from the distorted received signal and recovers the associated clock timing information. To sample the received data correctly, the CDR adjusts the phase of the local clock that retimes the received data. It does so according to the phase error between the data and the local clock. To meet the bit error rate (BER) requirement, the local clock must be aligned with the center of the data. Many studies have proposed different clock-alignment schemes<sup>[3–6]</sup>, and the phase interpolator (PI) is one of the most widely adopted circuits<sup>[5, 6]</sup>. Jitter tolerance is the maximum amplitude of input jitter that can be tolerated without causing data recovery errors. One way to improve the jitter tolerance of a CDR is to reduce the jitter of its sampling clock and the effect of noise in the input samples<sup>[7]</sup>. In this paper, a semi-digital interpolator-based CDR is presented and implemented. It adopts an average-based phase detection algorithm to lower the trace jitter of the CDR and to reduce the effect of noise in the input samples. Both algorithms are average-based, but the average circuit in this paper is smaller than the one described in Ref. [7].

## 2. Clock and data recovery system

The CDR is composed of two loops. The first is a phase-locked loop (PLL), which acts as a clock generator and produces the local clock. The PLL generates four clock phases spaced 90° apart. Its details are presented in Ref. [8] and are not discussed in this paper. The second loop is the phase calibration (PC) loop, which is composed of a phase detector (PD) and a phase interpolator (PI). It adjusts the phase of the local clock by comparing the phase of the sampling clock with that of the input data. To sample the data correctly, the PI selects two of the four clocks and combines their phases according to control signals from the shift register. Finally, the data are sampled by the output clock of the interpolator.

As shown in Fig. 1, the sampling circuit (SA) samples the input serial data (RX\_DATA) with the sampling clock (SCLK). The PD detects the transitions in the SA output data and judges whether the phase of SCLK is earlier than that of the data. If the phase of SCLK is earlier than that of the received data, the detection signal UP is '1' and DOWN is '0'. Otherwise, UP is '0' and DOWN is '1'. The 4:2 multiplexer selects the local clocks to be interpolated, and the output of the shift register sets the weight of the interpolator.

Fig. 1. Block diagram of the CDR.

© 2011 Chinese Institute of Electronics

<sup>†</sup> Corresponding author. Email: scuweixue@gmail.com, pli@uestc.edu.cn. Received 22 May 2011; revised manuscript received 22 July 2011.

Fig. 2. Circuit schematic of the sense amplifier based flip-flop.

## 3. Clock and data recovery design

#### 3.1. Sampling circuit design

To sample high-speed data, a sense-amplifier-based flip-flop (FF)<sup>[9]</sup> is used as the sampling circuit. Its schematic is shown in Fig. 2. To optimize the speed and power consumption of the FF, the parasitic capacitance at nodes N1 and N2 must be kept small.

#### 3.2. Phase detector and the phase interpolator design

To meet the requirement of the BER, the PD and the PI must be designed carefully.

The Alexander phase detector<sup>[10]</sup> detects the phase relationship between the local clock and the input data. Its principle is shown in Fig. 3(a). The Alexander PD uses three data samples taken by three consecutive clock edges. From these, it determines whether a data transition is present and whether the clock is early or late with respect to the data. In the absence of data transitions, all three samples are equal and no action is taken. If the clock is early, the middle sample S2 is taken before the data transition. The first two samples, S1 and S2, are then equal but differ from the last sample, S3. Conversely, if the clock is late, S2 is taken after the transition, and the first sample, S1, differs from the last two. Thus, S1 $\oplus$ S2 and S2 $\oplus$ S3 provide the early-late information. Let *T* be the result of the phase detection at each transition.

If S1  $\oplus$  S2 is high and S2  $\oplus$  S3 is low, the sampling clock is late, and T = 0.

If S1  $\oplus$  S2 is low and S2  $\oplus$  S3 is high, the sampling clock is early, and T = 1.

If S1 $\oplus$ S2 is equal to S2 $\oplus$ S3, no usable data transition is present, and T = 0.
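The three rules above form a small truth table. The Python function below is a behavioral sketch only. The name `alexander_t` is hypothetical, and samples are modeled as 0/1 integers. It returns T for one transition exactly as the rules state, including T = 0 when no decision can be made.

```python
def alexander_t(s1: int, s2: int, s3: int) -> int:
    """Return the per-transition result T from three consecutive samples.

    S1 and S3 are data samples of adjacent bits; S2 is the sample taken
    between them. T = 1 means the sampling clock is early. T = 0 means it is
    late or, following the rule in the text, that S1 xor S2 equals S2 xor S3.
    """
    late_flag = s1 ^ s2    # high when S1 != S2 (late: S1 != S2 = S3)
    early_flag = s2 ^ s3   # high when S2 != S3 (early: S1 = S2 != S3)
    if early_flag and not late_flag:
        return 1           # clock early
    return 0               # clock late, or no usable transition


# Enumerate all eight sample patterns.
for s in range(8):
    s1, s2, s3 = (s >> 2) & 1, (s >> 1) & 1, s & 1
    print(s1, s2, s3, "T =", alexander_t(s1, s2, s3))
```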

To lower the frequency of the sampling clock, the CDR uses quadrature clocks for half-rate phase detection, as shown in Fig. 3(b). If the PD acted on the phase state of every single bit, the high data rate would produce equally fast phase-detection results, which increases the trace jitter of the CDR<sup>[7]</sup>. For this reason, all the phase-detection results are averaged before they are fed into the shift register. During each decision period, the PD continuously samples 8 data bits with the quadrature clocks. The detection results are then averaged and held until the next detection process. The drawback of this design is a larger layout area, because seven slices are needed to sample the 8-bit data.
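The sketch below makes the 8-bit window concrete by modeling one decision period with an ideal, noise-free NRZ waveform. It assumes the usual half-rate arrangement: the two edges of one quadrature clock sample the bit centers, and the two edges of the other sample the bit boundaries. This arrangement is not taken from the schematic in Fig. 3(b). `sample_nrz` and `window_results` are hypothetical names, and the per-transition rule is the one listed above.

```python
from typing import List

UI = 1.0  # one unit interval; bit k occupies the time interval [k, k + 1)


def sample_nrz(bits: List[int], t: float) -> int:
    """Ideal, noise-free NRZ waveform: value of the bit whose interval contains t."""
    k = min(max(int(t // UI), 0), len(bits) - 1)
    return bits[k]


def window_results(bits: List[int], offset: float) -> List[int]:
    """Return T[1..7] for one 8-bit decision window.

    Assumed half-rate arrangement: the two edges of one quadrature clock
    sample the bit centers, and the two edges of the other clock (90 degrees
    later) sample the boundaries between bits. `offset` is the clock timing
    error in UI: negative means early, positive means late (|offset| < 0.5).
    """
    if len(bits) != 8:
        raise ValueError("expected an 8-bit window")
    data = [sample_nrz(bits, k + 0.5 + offset) for k in range(8)]  # bit centers
    edge = [sample_nrz(bits, k + 1.0 + offset) for k in range(7)]  # bit boundaries
    results = []
    for k in range(7):
        s1, s2, s3 = data[k], edge[k], data[k + 1]
        early = s1 == s2 and s2 != s3          # S1 = S2 != S3: clock early
        results.append(1 if early else 0)      # late or no transition: T = 0
    return results


bits = [0, 1, 0, 1, 1, 0, 0, 1]
print(window_results(bits, -0.1))  # early clock: [1, 1, 1, 0, 1, 0, 1]
print(window_results(bits, 0.1))   # late clock:  [0, 0, 0, 0, 0, 0, 0]
```

Under the rule in the text, T = 0 is assigned both to late decisions and to bit pairs without a transition. In this model, a window with few transitions therefore yields mostly zeros.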

The PD circuit topology is shown in Fig. 3(b). Let T[1], T[2], ..., T[7] be the phase-detection results at the seven transitions between adjacent bits of the 8-bit data. The decision algorithm of the phase detection is as follows.

If $\sum_{k=1}^{7} T[k] > \sum_{k=1}^{7} \overline{T[k]}$, then UP = 1 and DOWN = 0;

If $\sum_{k=1}^{7} T[k] < \sum_{k=1}^{7} \overline{T[k]}$, then UP = 0 and DOWN = 1.
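The decision rule is a 4-of-7 majority vote. Because $\sum T[k] + \sum \overline{T[k]} = 7$ is odd, the two sums can never be equal, so no tie-breaking rule is needed. Below is a behavioral sketch; the function name `decide` is hypothetical.

```python
from typing import Sequence, Tuple


def decide(t: Sequence[int]) -> Tuple[int, int]:
    """Map seven per-transition results T[1..7] to (UP, DOWN).

    sum(T) is compared with sum(not T) = 7 - sum(T). With seven inputs the
    two sums can never be equal, so exactly one of UP or DOWN is asserted.
    """
    if len(t) != 7:
        raise ValueError("expected seven phase-detection results")
    ones = sum(t)
    zeros = len(t) - ones
    return (1, 0) if ones > zeros else (0, 1)


print(decide([1, 1, 0, 1, 0, 1, 0]))  # four early votes -> (1, 0)
```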

To improve the detection speed and lower the loop latency, a novel average circuit implements the average-based decision algorithm. Its topology is shown in Fig. 3(c). The average circuit is composed of twenty-two 2:1 multiplexers. Each multiplexer has three input ports (A, B and D) and one output port (C). The select signal of each multiplexer is a phase-detection result T[k]. The maximum delay of the circuit is seven multiplexer delays.
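The text gives the multiplexer count and the maximum delay but not the wiring. The sketch below is therefore only one possible multiplexer-only realization, assumed for illustration. It is a decision-diagram network in which every multiplexer on level k is selected by T[k] and the leaves are tied to constant 0 or 1. It reproduces the 4-of-7 majority decision, and its worst-case path passes through seven multiplexers, consistent with the delay stated above. This reduced form needs only 16 multiplexers rather than twenty-two, so the circuit in Fig. 3(c) is presumably arranged differently. The names `mux`, `node` and `mux_levels` are hypothetical.

```python
from itertools import product

N = 7          # T[1]..T[7]
THRESHOLD = 4  # UP when at least four of the seven results are 1


def mux(sel: int, a: int, b: int) -> int:
    """2:1 multiplexer: output b when sel = 1, else a."""
    return b if sel else a


def node(k: int, count: int, t) -> int:
    """Output of the network node reached after k results, `count` of them 1.

    Nodes whose result is already fixed collapse to constants; every other
    node is one multiplexer selected by T[k + 1] (t[k] here, 0-based).
    """
    if count >= THRESHOLD:
        return 1
    if count + (N - k) < THRESHOLD:
        return 0
    return mux(t[k], node(k + 1, count, t), node(k + 1, count + 1, t))


def mux_levels():
    """Number of multiplexers needed at each of the seven select levels."""
    return [
        sum(1 for c in range(k + 1) if c < THRESHOLD and c + (N - k) >= THRESHOLD)
        for k in range(N)
    ]


# The network must agree with the majority rule for all 2**7 inputs.
for t in product((0, 1), repeat=N):
    assert node(0, 0, t) == (sum(t) >= THRESHOLD)

print(mux_levels(), sum(mux_levels()))  # [1, 2, 3, 4, 3, 2, 1] 16
```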

To track the jitter of the data, the trace velocity of the CDR ($T_{\rm CDR}$) must be larger than the drift velocity of the jitter ($D_{\rm J}$). Simplifying the relationship between jitter and data transfer rate to a proportion gives

$$D_{\rm J}/D_{\rm JBW} = T_{\rm J}/T_{\rm JBW},\tag{1}$$

where $D_{\rm JBW}$, $T_{\rm J}$ and $T_{\rm JBW}$ are the jitter data transfer rate, the total jitter of the data and the data transfer rate, respectively. The trace velocity of the digital CDR is given by

$$T_{\rm CDR} = \alpha / N_{\rm D}, \tag{2}$$

where $\alpha$ is the ratio of the detection time to the hold time, and $N_{\rm D}$ is the total number of data sequences during the phase-detection period. The smaller $N_{\rm D}$ is, the larger $T_{\rm CDR}$ becomes. To track the jitter, the CDR must satisfy

$$T_{\rm CDR} \ge D_{\rm J}. \tag{3}$$

Substituting Eq. (2), and $D_{\rm J} = T_{\rm J} D_{\rm JBW} / T_{\rm JBW}$ from Eq. (1), into Eq. (3) gives

$$N_{\rm D} \leqslant \frac{\alpha T_{\rm JBW}}{T_{\rm J} D_{\rm JBW}}. \tag{4}$$

Fig. 3. Phase detection principle and its block diagram. (a) Principle of the Alexander phase detector. (b) Half-rate phase detection block. (c) Schematic of the average circuit.

$\alpha$ should be designed as small as possible to decrease the jitter of the CDR. However, too small an $\alpha$ reduces the detection accuracy.

The design requirements give the specifications $T_{\rm JBW} = 1500$ Mbit/s, $T_{\rm J} = 0.6$ UI and $D_{\rm JBW} = 2$ Mbit/s, so that

$$N_{\rm D} \leqslant 1250\alpha. \tag{5}$$
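The step from Eq. (4) to Eq. (5) can be checked numerically. In this sketch the helper `nd_bound` is hypothetical. It follows the same two steps as the derivation: it computes $D_{\rm J}$ from Eq. (1), then the largest $N_{\rm D}$ allowed by Eqs. (2) and (3). Units follow the text (Mbit/s and UI).

```python
def nd_bound(alpha: float, t_jbw: float, t_j: float, d_jbw: float) -> float:
    """Upper bound on N_D from Eq. (4): alpha * T_JBW / (T_J * D_JBW)."""
    d_j = t_j * d_jbw / t_jbw   # jitter drift velocity D_J, from Eq. (1)
    return alpha / d_j          # T_CDR = alpha / N_D >= D_J, Eqs. (2) and (3)


# Design values from the text: 1500 Mbit/s data, 0.6 UI jitter, 2 Mbit/s jitter rate.
coeff = nd_bound(alpha=1.0, t_jbw=1500.0, t_j=0.6, d_jbw=2.0)
print(coeff)  # coefficient of alpha in Eq. (5), approximately 1250
```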

Let the decision period of the phase state be $T_{\rm D}$ and let the shift register have $N_{\rm S}$ bits. Substituting $N_{\rm D} = T_{\rm D} N_{\rm S}$ into Eq. (5) then gives

$$T_{\rm D}N_{\rm S} \le 1250\alpha. \tag{6}$$

The design also has to account for other factors, such as the tolerable peak-to-peak jitter and the layout area. Combining these with the analysis above, the CDR detects the phase state once every 16 clock cycles, which makes $\alpha$ equal to 0.33, and a 16-bit shift register is used.
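As a quick numerical check of Eq. (6), assume that the 16-cycle decision period is counted in the same units as $N_{\rm D}$ (the text does not state the units of $T_{\rm D}$ explicitly). The chosen values then satisfy the constraint:

$$T_{\rm D}N_{\rm S} = 16 \times 16 = 256 \leqslant 1250 \times 0.33 = 412.5.$$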

As shown in Fig. 1, the output of the 16-bit shift register sets the weight of the phase interpolator. The inputs of the shift register are the UP and DOWN signals produced by the PD. The shift register shifts a '1' in on the left and a '0' in on the right. Control logic senses when the shift register contains all 1s or all 0s. It then produces the control signal CS, which selects the two local clocks to be interpolated, and the shift register continues to shift in the opposite direction.
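The shift-register behavior can be sketched as follows. This is a behavioral model under stated assumptions, not the paper's control logic. The class and attribute names are hypothetical. `pair` stands in for the control signal CS, which selects the adjacent pair of 90°-spaced clocks to be interpolated. The register holds a thermometer code with the 1s packed on the left. Under these assumptions, each UP or DOWN moves the phase by one interpolator step, and one clock period spans 4 × 16 = 64 steps.

```python
N_BITS = 16   # 16-bit shift register, as in the text
N_PAIRS = 4   # hypothetical: four adjacent pairs of the 90-degree-spaced clocks


class ShiftRegisterControl:
    """Behavioral sketch of the shift-register-controlled interpolator weight."""

    def __init__(self) -> None:
        self.sr = [0] * N_BITS   # all zeros corresponds to 0 degrees (Fig. 9 test)
        self.pair = 0            # hypothetical stand-in for the control signal CS

    def _shift(self, insert_one: bool) -> None:
        if insert_one:
            self.sr = [1] + self.sr[:-1]   # shift a '1' in on the left
        else:
            self.sr = self.sr[1:] + [0]    # shift a '0' in on the right

    def step(self, up: bool) -> None:
        """Apply one PD decision: up=True for UP, up=False for DOWN."""
        # Assumed convention: on even pairs adding 1s moves the phase forward;
        # on odd pairs the two clocks swap roles, so removing 1s moves it forward.
        insert_one = up == (self.pair % 2 == 0)
        ones = sum(self.sr)
        if (insert_one and ones == N_BITS) or (not insert_one and ones == 0):
            # All 1s or all 0s: switch the clock pair (CS) and keep shifting
            # in the opposite direction.
            self.pair = (self.pair + (1 if up else -1)) % N_PAIRS
            insert_one = not insert_one
        self._shift(insert_one)

    def phase_index(self) -> int:
        """Phase position in interpolator steps, 0..63."""
        ones = sum(self.sr)
        within = ones if self.pair % 2 == 0 else N_BITS - ones
        return (self.pair * N_BITS + within) % (N_PAIRS * N_BITS)


ctrl = ShiftRegisterControl()
for _ in range(20):
    ctrl.step(up=True)
print(ctrl.pair, sum(ctrl.sr), ctrl.phase_index())  # 1 12 20
```

In this model, twenty UP decisions from reset first fill the register. The model then switches to the next clock pair at the all-1s state and removes four 1s, ending at step 20.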

A type-I analog interpolator<sup>[6]</sup> is used, so that the clock inputs to be interpolated are shared among all 16 shift-register bits, as shown in Fig. 4. The output voltage of the interpolator, $v_{\rm out}$, can be expressed as

$$v_{\text{out}} = R\left(\sum_{k=1}^{16} i_{\phi} \cdot \text{SR}[k] + \sum_{k=1}^{16} i_{\varphi} \cdot \overline{\text{SR}[k]}\right). \tag{7}$$

The phase of the current $i_R$ in each branch is set by the phase of the clock driving that branch. The shift-register bits SR and the phases of the input clocks therefore control the phase of the output voltage $v_{\rm out}$. Changing how many branch currents follow each clock shifts the phase of $v_{\rm out}$.
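Eq. (7) can be illustrated with phasors. The sketch below rests on several assumptions that do not come from the paper. All branch currents are sinusoids of equal amplitude. Clock $\phi$ is taken as 0° and clock $\varphi$ as 90°. $R$ is dropped because it scales only the amplitude. The function name `output_phase_deg` is hypothetical. In this idealization, the 16 steps come out unequal: about 3.8° near the ends and about 7.1° in the middle. This may be one reason the simulated steps in Fig. 5 are not uniform, although the real buffered clock edges are not sinusoids.

```python
import cmath
import math

N_BITS = 16


def output_phase_deg(ones: int, skew_deg: float = 90.0) -> float:
    """Phase of v_out in degrees, from Eq. (7) with idealized sinusoidal currents.

    `ones` is the number of SR[k] = 1 bits; those branches follow clock phi
    (0 degrees) and the rest follow clock varphi (skew_deg). R is omitted
    because it does not change the phase.
    """
    i_phi = cmath.rect(1.0, 0.0)
    i_varphi = cmath.rect(1.0, math.radians(skew_deg))
    v = ones * i_phi + (N_BITS - ones) * i_varphi
    return math.degrees(cmath.phase(v))


phases = [output_phase_deg(n) for n in range(N_BITS + 1)]
steps = [abs(b - a) for a, b in zip(phases, phases[1:])]
print(f"smallest step {min(steps):.2f} deg, largest step {max(steps):.2f} deg")
```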

The interpolator was simulated with HSPICE, and the results are shown in Fig. 5. The simulations use the extracted layout, and the buffer shapes the output clocks. The clocks to be interpolated are $\phi$ and $\varphi$, and the light grey lines are the 16 phase steps between the interpolated clocks. The simulation results show that, of the steps spanning the 209 ps interval, the largest is 28 ps and the smallest is 9 ps. When the CDR is locked, the phase of the sampling clock (SCLK) bounces back and forth between two adjacent steps. The largest phase step therefore determines the maximum peak-to-peak jitter of the CDR.

Fig. 4. Circuit schematic of the phase interpolator.

Fig. 5. Simulation results of the clock phase interpolation.

## 4. Test results and discussion

The CDR is implemented in a TSMC 0.13 μm 1P8M CMOS process, and its supply voltage is 1.5 V. The die micrograph is shown in Fig. 6.

The frequency of the sampling clock in the CDR is 750 MHz. To observe it, the sampling clock is divided by 10 (to 75 MHz) and brought out to an external clock observation pin. The waveform of the recovered clock is shown in Fig. 7. Its RMS jitter in lock is 122 ps. In lock, the sampling point of the recovered clock should be located at the center of the recovered data. Figure 8 shows the waveforms of the recovered clock and one of the 10 parallel data channels. Fig. 8 shows that the clock samples the center point of the data, which confirms that the CDR works correctly.

Fig. 6. Die micrograph of the chip.

Fig. 7. Waveform of the recovery clock.

Fig. 8. Waveforms of the recovery data and clock.

The BER of the chip is tested by feeding different data into the SERDES chip. After on-chip 8B/10B encoding<sup>[11]</sup> and serialization, the serial data at the CDR input form a random stream, which the CDR recovers. The bit error rate of the recovered data is less than $10^{-12}$.

The transfer curve of the PI is shown in Fig. 9. In test mode, the control signal (SR) of the interpolator is set manually. If SR is "0000000000000000", the phase of the recovered clock is 0°. In Fig. 9, the x-axis is the 16-bit control signal of the interpolator. The number of 1s increases step by step until all 16 bits are 1. The transfer curve shows that the phase interpolator has good phase-shift characteristics.

Fig. 9. Test results of the transfer curve of the phase interpolator.

Table 1. Comparison with other work.

| Parameter  | Technology (µm) | Transmission rate (Gb/s) | Frequency tolerance (ppm) |
|------------|-----------------|--------------------------|---------------------------|
| Ref. [12]  | 0.13            | 0.4–4                    | 400                       |
| Ref. [13]  | 0.09            | 1–4.25                   | 440                       |
| This paper | 0.13            | 0.4–1.5                  | 600                       |

Table 1 compares the frequency tolerance of this CDR with that of other published work. This CDR tolerates 600 ppm, more than the 400 ppm and 440 ppm reported in Refs. [12] and [13].

## 5. Conclusions

A semi-digital, interpolator-based clock and data recovery (CDR) circuit has been designed. A novel implementation of the average-based phase detection algorithm is proposed to decrease the trace jitter of the CDR and reduce the effect of input noise. Test results show that the interpolator has good phase-shift characteristics and low jitter, and that the CDR tracks the phase of the input data. In lock, the RMS jitter of the recovered clock at the observation pin is 122 ps at a 75 MHz clock frequency. The bit error rate of the recovered data is less than $10^{-12}$.

## References

- [1] Harwood M, Warke N, Simpson R, et al. A 12.5 Gb/s SerDes in 65 nm CMOS using a baud-rate ADC with digital receiver equalization and clock recovery. IEEE International Solid-State Circuits Conference, 2007: 436
- [2] Sorna M, Beukema T, Selander K, et al. A 6.4 Gb/s CMOS SerDes core with feedforward and decision-feedback equalization. IEEE International Solid-State Circuits Conference, 2005: 62
- [3] Nakagawa J, Nogami M, Suzuki N, et al. 10.3-Gb/s burst-mode 3R receiver incorporating full AGC optical receiver and 82.5-GS/s over-sampling CDR for 10G-EPON systems. IEEE Photonics Technol Lett, 2010, 22(7): 471
- [4] Maruko K, Sugioka T, Hayashi H, et al. A 1.296-to-5.184 Gb/s transceiver with 2.4 mW/(Gb/s) burst-mode CDR using dual-edge injection-locked oscillator. IEEE International Solid-State Circuits Conference, 2010: 364
- [5] Seong C K, Lee S W, Choi W Y. A 1.25 Gb/s digitally-controlled dual-loop clock and data recovery circuit with an improved effective phase resolution. IEEE International Symposium on Circuits and Systems, 2006: 2113
- [6] Sidiropoulos S, Horowitz M A. A semi-digital dual delay-locked loop. IEEE J Solid-State Circuits, 1997, 32(11): 1683
- [7] Van Ierssel M, Sheikholeslami A, Tamura H, et al. A 3.2 Gb/s CDR using semi-blind oversampling to achieve high jitter tolerance. IEEE J Solid-State Circuits, 2007, 42(10): 2224
- [8] Wei X M, Li P. The self-biased based PLL with fast lock circuit. International Conference on Communications, Circuits and Systems, ICCCAS, 2010: 901
- [9] Nikolic B, Oklobdzija V G, Jia W Y, et al. Improved sense-amplifier-based flip-flop: design and measurements. IEEE J Solid-State Circuits, 2000, 35(6): 876
- [10] Razavi B. Challenges in the design of high-speed clock and data recovery circuits. IEEE Commun Mag, 2002, 40(8): 94
- [11] Kim Y W, Shin B, Kang J K. High-speed 8B/10B encoder design using a simplified coding table. IEICE Electronics Express, 2008, 5(16): 581
- [12] Chang K Y K, Wei J, Huang C. A 0.4–4-Gb/s CMOS Quad transceiver cell using on-chip regulated dual-loop PLLs. IEEE J Solid-State Circuits, 2003, 38(5): 747
- [13] Chen L D, Spagna F, Marzolf P, et al. A 90 nm 1–4.25 Gb/s multidata rate receiver for high speed serial links. IEEE Asia Solid-State Circuits Conference, 2006: 391