# 5-Gb/s 0.18-µm CMOS 2:1 multiplexer with integrated clock extraction\*

Zhang Changchun(张长春)<sup>1</sup>, Wang Zhigong(王志功)<sup>1,†</sup>, Shi Si(施思)<sup>1</sup>, Miao Peng(苗澎)<sup>1</sup>, and Tian Ling(田玲)<sup>2</sup>

(1 Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China) (2 School of Science and Engineering, Southeast University, Nanjing 210096, China)

**Abstract:** A 5-Gb/s 2:1 multiplexer (MUX) with an on-chip clock extraction (CE) circuit that provides automatic phase alignment (APA) has been designed and fabricated in SMIC's 0.18-µm CMOS technology. The chip area is 670 × 780 µm<sup>2</sup>. At a single supply voltage of 1.8 V, the total power consumption is 112 mW, with an input sensitivity of less than 50 mV and an output single-ended swing of above 300 mV. The measurement results show that the IC works reliably at any input data rate between 1.8 and 2.6 Gb/s, with no need for external components, a reference clock, or phase alignment between data and clock. It can be used in a parallel optical-fiber data interconnect system.

**Key words:** multiplexer; clock extraction; automatic phase alignment; phase frequency detector; voltage-controlled oscillator

**DOI:** 10.1088/1674-4926/30/9/095009 **EEACC:** 1265

### 1. Introduction

In the field of ultra-high-speed IC design, parallel optical links have replaced serial long-haul optical communications as a research hotspot, because of their advantages over copper interconnects: longer link distance, higher bandwidth density, smaller cables and connectors, less susceptibility to electromagnetic interference, and potentially lower power dissipation<sup>[1,2]</sup>. Moreover, considering fabrication cost, power consumption, integration scale, and compatibility, continuously scaled CMOS processes are preferred over traditional high-speed processes such as GaAs, InP, and SiGe.

Just as in serial optical communications, multiplexers are critical blocks in parallel optical links. However, ordinary multiplexers do not include CE circuits with APA<sup>[3–7]</sup>, so a clock signal of specified amplitude and frequency must be provided externally, and precise phase alignment between that clock and the input data signals must be maintained externally by complex means, which restricts their application. On-chip techniques<sup>[8,9]</sup> such as a delay-locked loop (DLL), manual phase adjustment, or a clock multiplying unit (CMU) can alleviate these shortcomings, but they cannot completely solve the problem.

This paper presents a 5-Gb/s half-rate multiplexer in 0.18-µm CMOS, accompanied by a CE circuit that produces the required clock signal and realizes precise phase alignment, with no need for external components, a reference clock, or adjustment. It also reduces the chip size and facilitates the integration required for a parallel optical link system.

### 2. System architecture overview

As shown in Fig. 1, the target parallel optical link system consists of 12 independent but identical channels. In each channel, the transmitter comprises a 2:1 MUX with CE and APA, a laser diode driver (LDD), and a vertical-cavity surface-emitting laser (VCSEL), and the receiver comprises a photodiode (PD), a transimpedance amplifier (TIA), a limiting amplifier (LA), and a 1:2 DEMUX with clock and data recovery (CDR). A bundle of 850 nm multimode optical fibers carries the optical signals between the transmitters and the receivers. The whole system is intended for interconnections between high-performance processors.

The 2:1 MUX with CE and APA is a critical block in the system and is the subject of this work. The system budget requires that, under a 1.8 V supply, the block accept two 2.5 Gb/s differential CML signals with a single-ended swing of less than 300 mVpp and multiplex them into a 5 Gb/s differential CML signal with a single-ended swing of 300 mVpp, without any external help.

### 3. Circuit design

A block diagram of the 2:1 MUX accompanied by CE and APA is shown in Fig. 2. Two 2.5 Gb/s differential signals (Din0 and Din1) are multiplexed into a 5 Gb/s differential signal (Dout), and the required clock signal is extracted from Din0. At the same time, a 2.5 GHz clock is provided for test purposes.

Fig. 1. Block diagram of the target parallel optical link system.

Fig. 2. Block diagram of the 2:1 MUX with a CE circuit.

<sup>\*</sup> Project supported by the National High Technology Research and Development Program of China (Nos. 2007AA01Z2a5, 2006AA01Z239).

<sup>†</sup> Corresponding author. Email: zgwang@seu.edu.cn

Received 18 March 2009, revised manuscript received 11 May 2009

A two-stage input buffer is added for each pair of input signals, for amplification, reshaping, and improving the input sensitivity. A three-stage output buffer is used for amplification, reshaping, and driving the 50  $\Omega$  load. All inputs are terminated with on-chip 50  $\Omega$  resistors, and 100  $\Omega$  resistors are adopted for output matching, where a tradeoff between the signal integrity and power consumption is made.
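The tradeoff behind the 100 Ω output resistors can be illustrated with a first-order estimate. It assumes (this is not stated in the paper) that the last output stage is a CML differential pair whose tail current $I_{SS}$ is steered into the on-chip resistor $R_D$ in parallel with the external 50 Ω load $R_L$, so that the single-ended swing is $V_{sw} = I_{SS}(R_D \parallel R_L)$. For a 300 mV single-ended output swing:

$$
I_{SS} = \frac{V_{sw}}{R_D \parallel R_L}:\qquad
R_D = 100\ \Omega \;\Rightarrow\; \frac{300\ \text{mV}}{33.3\ \Omega} \approx 9\ \text{mA},\qquad
R_D = 50\ \Omega \;\Rightarrow\; \frac{300\ \text{mV}}{25\ \Omega} = 12\ \text{mA}.
$$

Under these assumptions, the larger resistor saves roughly a quarter of the output-stage current, at the price of an imperfect match to the 50 Ω line.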

#### 3.1. Clock extraction subcircuit

The core of the CE circuit comprises a phase frequency detector (PFD), a voltage-to-current conversion (V/I) circuit, a loop filter (LF), and a VCO, as shown in Fig. 2. First, the PFD compares the input data signal with the clock signal from the VCO and generates the frequency and phase difference information in the form of voltage waveforms. Second, the V/I circuit converts this voltage signal into a current. Finally, the loop filter generates the control signal that the VCO requires to produce the clock at the desired frequency. The whole process is an autonomous feedback loop, which produces the clock and dynamically adjusts it to the desired frequency and phase according to the input data signal. In other words, besides clock extraction, the CE circuit performs the important function of APA, which guarantees the phase alignment between the extracted clock and the input data for proper operation of the MUX.

Fig. 3. Block diagram of the PFD.

Figure 3 shows a block diagram of the PFD<sup>[10]</sup>. It includes two identical phase detectors (PDs) and a frequency detector (FD). Each PD is composed of two latches and a selector, which together form a double-edge-triggered flip-flop (DETFF). The FD differs from the PD only in that its selector is modified to generate a ternary output. The two PDs compare the input data with the in-phase (I) clock signal and the quadrature (Q) clock signal, respectively, to produce two beat notes, which are then processed by the FD to deliver a frequency difference signal. As indicated in Fig. 3, one beat note and the frequency difference signal are sufficient, and these are applied to the succeeding V/I circuit.
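The beat-note mechanism can be illustrated with a small behavioral sketch. It is not the transistor-level circuit of Fig. 3: it assumes ideal square I/Q clocks, a data transition at every bit boundary (a 1010 pattern), and a sign convention chosen here for illustration. The function names `beat_notes`, `frequency_detector`, and `frequency_sign` are hypothetical.

```python
import math


def beat_notes(f_clk_ghz, f_data_gbps, n_bits=200, phase0=0.3):
    """Sample the I and Q clocks at every data bit boundary.

    Returns two lists of +1/-1 values: Q1 (I clock sampled by the data)
    and Q2 (Q clock sampled by the data). Both are beat notes at the
    difference frequency when the clock and data rates differ.
    """
    q1, q2 = [], []
    for k in range(n_bits):
        # Clock phase seen at the k-th data transition (assumed every bit).
        theta = 2 * math.pi * f_clk_ghz * k / f_data_gbps + phase0
        q1.append(1 if math.cos(theta) >= 0 else -1)  # I clock level
        q2.append(1 if math.sin(theta) >= 0 else -1)  # Q clock level
    return q1, q2


def frequency_detector(q1, q2):
    """Ternary output: 0 unless Q1 toggles, otherwise +1 or -1.

    At each Q1 edge, the level of Q2 combined with the edge direction
    tells which way the clock phase is drifting relative to the data.
    """
    out = [0]
    for k in range(1, len(q1)):
        direction = (q1[k] - q1[k - 1]) // 2  # +1 rising, -1 falling, 0 none
        out.append(-direction * q2[k])
    return out


def frequency_sign(f_clk_ghz, f_data_gbps):
    """Net sign of the FD output: +1 clock fast, -1 clock slow, 0 no drift."""
    total = sum(frequency_detector(*beat_notes(f_clk_ghz, f_data_gbps)))
    return (total > 0) - (total < 0)
```

In this model the FD output stays at zero once the clock and data rates match, so only the phase beat note steers the loop near lock, which is consistent with the statement that one beat note plus the frequency difference signal suffice.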



Fig. 4. Block diagram and schematic diagram of the VCO.



Fig. 5. Simulated extracted clock eye diagram (bottom) and multiplexed data eye diagram (top).

As shown in Fig. 4, the 2.5 GHz four-stage ring VCO<sup>[11]</sup> uses a source-coupled logic (SCL) topology, in which a current-folding technique alleviates the conflict between the voltage headroom and the sensitivity of the VCO. PMOS transistors are used as the loads instead of resistors, because resistor values are hard to control precisely during fabrication. As indicated in Fig. 4, two current sources are added in each VCO cell: one prevents the oscillation from stopping, and the other improves the linearity of the VCO characteristic.
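As a rough consistency check on the VCO, the usual first-order relation for an $N$-stage differential ring oscillator can be applied, where $t_d$ is the delay of one stage. The per-stage delay is not given in the paper; it is inferred here from $N = 4$ and the 2.5 GHz target:

$$
f_{osc} = \frac{1}{2 N t_d}
\quad\Rightarrow\quad
t_d = \frac{1}{2 \cdot 4 \cdot 2.5\ \text{GHz}} = 50\ \text{ps},
\qquad
\Delta\varphi_{stage} = \frac{180^\circ}{N} = 45^\circ .
$$

With 45° of phase shift per stage, outputs taken two stages apart are 90° apart, which is how a four-stage ring can supply the I and Q clocks needed by the PFD. Which taps the fabricated circuit actually uses is not stated in the paper.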

Because the quality of the clock largely determines the jitter and pulse-width distortion of the multiplexed data, the clock extraction subcircuit must be optimized for low clock jitter and a good duty cycle. The simulated extracted clock eye diagram is shown in Fig. 5.



Fig. 6. (a) Block diagram and (b) timing diagram of the 5 Gb/s half-rate MUX.

#### 3.2. Multiplexer (MUX) subcircuit

Because it dissipates less power and its clock runs at only half the output bit rate, a half-rate MUX is selected. Its core is composed of five latches, a selector, and a clock delay buffer, as shown in Fig. 6(a).

These latches are employed to retime and adjust the phase of the two input data streams. At the same time, as shown in Fig. 6, the clock signal is delayed by 1/4 clock period immediately before the selector, which, together with the 1/2-clock-period timing offset between the two input data streams, gives the optimum phase margin for the selector.

Fig. 7. Schematic diagrams for (a) latch and (b) selector in the MUX.

CMOS current-mode logic (CML), i.e. source-coupled logic (SCL), is employed across the whole circuit, including the latches, selectors, and buffers, because of advantages such as small internal voltage swing, reduced timing jitter and crosstalk, and good common-mode rejection. Figure 7 shows the schematic diagrams for a latch and a selector used in the MUX. In fact, all latches and selectors in the whole circuit use the topologies shown in Fig. 7, except the selector in the FD mentioned in Section 3.1.

The phase relation between the two input data streams, and that between the data and the clock, are so important that extra delay buffers must be inserted and the layout must be carried out carefully. The simulated multiplexed data eye diagram is shown in Fig. 5.
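The timing relations of the half-rate MUX can be put into numbers with a short sketch. It models only the ideal behavior described above; the function names `half_rate_mux` and `mux_timing`, and the choice that Din0 is passed on the first clock phase and Din1 on the second, are assumptions made for illustration.

```python
def half_rate_mux(d0, d1):
    """Interleave two equal-length bit streams into one at twice the rate.

    Assumes the selector passes d0 during one clock phase and d1 during
    the other, so each clock period carries one bit from each input.
    """
    if len(d0) != len(d1):
        raise ValueError("input streams must have the same length")
    out = []
    for b0, b1 in zip(d0, d1):
        out.append(b0)
        out.append(b1)
    return out


def mux_timing(input_rate_gbps):
    """Ideal timing figures, in picoseconds, for a half-rate 2:1 MUX."""
    # Half-rate clocking: the clock frequency equals the input bit rate.
    clock_period_ps = 1000.0 / input_rate_gbps
    return {
        "clock_period_ps": clock_period_ps,
        "selector_clock_delay_ps": clock_period_ps / 4,  # 1/4 period delay
        "input_offset_ps": clock_period_ps / 2,          # 1/2 period offset
        "output_bit_period_ps": clock_period_ps / 2,     # two bits per period
    }
```

For 2.5 Gb/s inputs, `mux_timing(2.5)` gives a 400 ps clock period, a 100 ps clock delay before the selector, and a 200 ps output bit period.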

### 4. Layout and implementation

The final performance of the circuit is closely related to the layout, so several careful measures are taken: layout symmetry is preserved for the CML differential structures, enough substrate contacts are added to guarantee a consistent substrate potential, especially around key MOS transistors, and additional capacitors are placed beside the power pads for a cleaner supply.

Fig. 8. Chip micrograph of the whole IC.

As shown in Fig. 8, the total chip area is 670 × 780 µm<sup>2</sup>, including the pads. The two input data signals are applied at the left pads, and the recovered clock signal and the multiplexed data are taken from the right pads. The top pads are used only for testing the CE circuit.

### 5. Measurement results

The performance of the fabricated MUX was evaluated on-wafer using a Cascade Microtech probe station. Because a suitable multi-output pulse pattern generator was not available at the time, two single-ended $2^{31}-1$ pseudo-random bit sequence (PRBS) data signals were applied instead of differential ones, which slightly degraded the measured performance. An Agilent 86100A Infiniium DCA wide-bandwidth oscilloscope was used to receive, display, and analyze the output signals.

The measurement results show that, under a 1.8 V supply, the pull-in range of the CE loop is limited by the available operating frequency range of the VCO, that is, 1.8 to 2.6 GHz. In other words, the CE circuit works properly at any input data rate between 1.8 and 2.6 Gb/s, so multiplexed output data at any rate between 3.6 and 5.2 Gb/s can be obtained. Figure 9 shows output single-ended eye diagrams of the extracted clock and the multiplexed data at input data rates of 2 Gb/s, 2.5 Gb/s, and 2.6 Gb/s, respectively.
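The link between the VCO range and the usable data rates can be sketched as follows. The loop is reduced to a first-order frequency-tracking model with a clamped VCO, which is a deliberate simplification and not the paper's loop dynamics; `settle_vco`, `mux_output_rate`, the starting frequency, and the gain are hypothetical. Only the 1.8 to 2.6 GHz limits come from the measurement.

```python
VCO_MIN_GHZ = 1.8  # measured lower limit of the VCO range
VCO_MAX_GHZ = 2.6  # measured upper limit of the VCO range


def settle_vco(input_rate_gbps, start_ghz=2.2, gain=0.2, steps=200):
    """Let an idealized loop pull the VCO toward the input data rate.

    Each step moves the VCO by a fraction of the frequency error and then
    clamps it to the range the VCO can actually reach.
    """
    f = start_ghz
    for _ in range(steps):
        f += gain * (input_rate_gbps - f)
        f = min(VCO_MAX_GHZ, max(VCO_MIN_GHZ, f))
    return f


def mux_output_rate(input_rate_gbps, tol=1e-6):
    """Return the multiplexed output rate in Gb/s, or None without lock."""
    f = settle_vco(input_rate_gbps)
    if abs(f - input_rate_gbps) > tol:
        return None  # VCO stuck at a range limit: no lock, no valid output
    return 2 * input_rate_gbps  # 2:1 multiplexing doubles the bit rate
```

In this model, inputs of 1.8, 2.5, and 2.6 Gb/s lock and give 3.6, 5.0, and 5.2 Gb/s outputs, while an input outside the VCO range leaves the VCO pinned at a limit and returns `None`.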

Figure 10 shows output single-ended eye diagrams of the multiplexed data at an input data rate of 2.5 Gb/s, under supplies of 1.7 V and 2.0 V, respectively. From these figures and Fig. 9(b), it can be seen that the higher the supply voltage, the smaller the pulse-width distortion. The MUX can operate under a 1.6 V supply at the cost of worse pulse-width distortion. At a 1.8 V supply, the IC consumes about 112 mW, with an input sensitivity of less than 50 mV and an output single-ended swing of above 300 mV. However, when a 1.6 V supply is used, the consumed power is below 70 mW.

Fig. 9. Output single-ended eye diagrams of extracted clock and multiplexed data with a data input rate of (a) 2 Gb/s, (b) 2.5 Gb/s and (c) 2.6 Gb/s, respectively, under a supply of 1.8 V.

Fig. 10. Output single-ended eye diagrams of multiplexed data with a data input rate of 2.5 Gb/s, under a supply of (a) 1.7 V and (b) 2.0 V, respectively.

Table 1. Performance comparison of published high-speed MUXs.

| Parameter             | This work    | Ref. [3]     | Ref. [4]         | Ref. [5]     | Ref. [6]     | Ref. [7]         |
|-----------------------|--------------|--------------|------------------|--------------|--------------|------------------|
| Process               | 0.18-µm CMOS | 0.18-µm CMOS | 0.18-µm SOI CMOS | 0.35-µm CMOS | 0.35-µm CMOS | 0.2-µm GaAs HEMT |
| Function              | 2:1          | 16:1         | 16:1             | 4:1          | 2:1          | 2:1              |
| Data rate (Gb/s)      | 5            | 2            | 3.6              | 4.8          | 5            | 10               |
| With CE               | ✓            | ×            | ×                | ×            | ×            | ×                |
| $P_{\rm diss}$ (mW)   | 112          | 36.2         | 340              | 1000         | 135          | 460              |
| Area (µm<sup>2</sup>) | 670 × 780    | 820 × 950    | 1750 × 1750      | 1000 × 1050  | –            | 1300 × 900       |

A comparison with several previously published high-speed MUXs is listed in Table 1. According to the data, this MUX is the only one that is accompanied by a CE circuit. Moreover, even though it includes the complete CE function, the MUX compares favorably with the listed designs in terms of operating rate, power consumption, and chip area.
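One way to put the rows of Table 1 on a common footing is energy per bit, that is, power dissipation divided by data rate. This metric is not used in the paper; it is added here only as an illustrative calculation on the tabulated values, and the names `TABLE_1`, `energy_per_bit_pj`, and `ranked_by_energy` are hypothetical.

```python
# (data rate in Gb/s, power dissipation in mW), copied from Table 1.
TABLE_1 = {
    "This work": (5.0, 112.0),
    "Ref. [3]": (2.0, 36.2),
    "Ref. [4]": (3.6, 340.0),
    "Ref. [5]": (4.8, 1000.0),
    "Ref. [6]": (5.0, 135.0),
    "Ref. [7]": (10.0, 460.0),
}


def energy_per_bit_pj(rate_gbps, power_mw):
    """Energy per bit in pJ: 1 mW divided by 1 Gb/s equals 1 pJ/bit."""
    return power_mw / rate_gbps


def ranked_by_energy(table=TABLE_1):
    """Return (name, pJ/bit) pairs sorted from lowest to highest energy."""
    pairs = [(name, energy_per_bit_pj(*row)) for name, row in table.items()]
    return sorted(pairs, key=lambda item: item[1])


if __name__ == "__main__":
    for name, pj in ranked_by_energy():
        print(f"{name:10s} {pj:6.1f} pJ/bit")
```

When reading the result, note that the figure for this work (22.4 pJ/bit) includes the CE circuit, whereas none of the other designs in the table contain one.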

### 6. Conclusions

A 5 Gb/s half-rate 2:1 multiplexer for a parallel optical link system has been designed and fabricated in SMIC's 0.18-µm CMOS technology. The MUX IC has an area of 670 × 780 µm<sup>2</sup> and can operate properly at any input data rate between 1.8 and 2.6 Gb/s. Under a 1.8 V supply voltage, the IC consumes a DC power of about 112 mW, with an input sensitivity of less than 50 mV and an output single-ended swing of above 300 mV. However, when the supply voltage is lowered to 1.6 V, the total consumed power is below 70 mW.

The MUX works reliably without an external reference clock or manual phase alignment between data and clock, because the CE circuit provides APA. Moreover, the CE circuit is highly integrated and has a wide pull-in range, so neither external adjustment nor external components such as capacitors are needed.

### References

- [1] Berger C, Kossel M A, Menolfi C, et al. High-density optical interconnects within large-scale systems. Proc SPIE, 2003, 4942: 222
- [2] Kuchta D. 100 Gb/s-class parallel optical interconnects for high productivity computing systems. IEEE LEOS, 2005: 583
- [3] Tang X, Wang X J, Zhang S Y, et al. A 2-Gb/s 16:1 multiplexer in 0.18-µm CMOS. IEEE ICMMT, 2008, 2: 868
- [4] Nakura T, Ueda K, Kubo K, et al. A 3.6-Gb/s 340-mW 16:1 pipe-lined multiplexer using 0.18 µm SOI-CMOS technology. IEEE J Solid-State Circuits, 2000, 35(5): 751
- [5] Lu Jianhua, Wang Zhigong, Tian Lei, et al. A 0.35 µm CMOS 4.8 Gb/s 4:1 multiplexer. IEEE CCSWSE, 2002, 1: 824
- [6] Runge K, Thomas P B. 5 Gbit/s 2:1 multiplexer fabricated in 0.35 µm CMOS and 3 Gbit/s 1:2 demultiplexer fabricated in 0.5 µm CMOS technology. Electron Lett, 1999, 35(19): 1631
- [7] Xia Chunxiao, Wang Zhigong, Zhu En. A half-rate-clock 2:1 multiplexer in GaAs HEMT technology for 10 Gb/s optic-fiber link systems. Optoelectron Technol, 2004, 24(4): 211 (in Chinese)
- [8] Hai T, Shaeffer D K, Min X, et al. 40-43-Gb/s OC-768 16:1 MUX/CMU chipset with SFI-5 compliance. IEEE J Solid-State Circuits, 2003, 38(12): 2169
- [9] Nakasha Y, Suzuki T, Kano H, et al. A 43-Gb/s full-rate-clock 4:1 multiplexer in InP-based HEMT technology. IEEE J Solid-State Circuits, 2002, 37(12): 1703
- [10] Pottbäcker A, Langmann U. A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s. IEEE J Solid-State Circuits, 1992, 27: 1747
- [11] Razavi B. Design of integrated circuits for optical communications. New York: McGraw-Hill, 2003