# 12 Gb/s 0.25 µm CMOS Low Power 1:4 Demultiplexer \*

Ding Jingfeng<sup>†</sup>, Wang Zhigong, Zhu En, Zhang Li, and Wang Gui

(Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China)

Abstract: A low-power 12 Gb/s single-stage 1:4 demultiplexer (DEMUX) for SONET OC-192 applications is realized in TSMC's mixed-signal 0.25 µm CMOS. All of the circuits use source coupled FET logic (SCFL) to achieve as high a speed as possible and to suppress common-mode distortions. The DEMUX achieves single-stage demultiplexing by using a quarter-rate IQ clock. This method not only reduces the number of components in the DEMUX but also lowers its power dissipation. The fabricated DEMUX operates error free at 12 Gb/s with 2<sup>31</sup> - 1 pseudorandom bit sequences in on-wafer testing. The chip size is 0.9 mm × 0.9 mm and the power dissipation is only 210 mW with a single 2.5 V supply.

Key words: demultiplexer; latch; CMOS; optical receiver

**EEACC:** 1230B; 7250E **CLC number:** TN722 **Document code:** A **Article ID:** 0253-4177(2006)01-0019-05

## 1 Introduction

To meet the rapidly growing demand for advances in the information infrastructure, China's domestic 2.5 Gb/s SDH backbone transmission network must be expanded. A time-division demultiplexer (DEMUX) is a key component in high-speed data transmission. It normally sits at the end of an optical-fiber receiver, where it recovers the original low-speed parallel bit streams from a high-speed serial input. Until now, most DEMUXs operating at more than 10 Gb/s have been fabricated in GaAs HEMT<sup>[1]</sup>, SiGe BiCMOS<sup>[2]</sup>, InP HEMT<sup>[3]</sup>, and InP HBT<sup>[4]</sup>. They all share the drawback of high power dissipation at a high supply voltage. Recent achievements in CMOS<sup>[5~9]</sup> have demonstrated that data transmission systems can be designed at economical cost, with high yield and a high level of integration.
At bit rates above 10 Gb/s, however, most of these CMOS circuits relied on more advanced technologies with smaller feature sizes of 0.18 µm<sup>[5]</sup> or 0.12 µm<sup>[6]</sup>, while those realized in 0.25 µm CMOS<sup>[7,8]</sup> consumed fairly high power.

In this paper, a low-power 12 Gb/s 1:4 DEMUX with an IQ clock in TSMC mixed-signal 0.25 µm CMOS is described. The architecture employs a quarter-rate IQ clock generated by an IQ frequency divider, which reduces the number of circuit elements and lowers the power dissipation compared with a conventional tree-type DEMUX at the same bit rate. The fabricated 1:4 DEMUX operates error free at 12 Gb/s with 2<sup>31</sup> - 1 pseudorandom bit sequences (PRBS) in on-wafer testing. The chip size, including the bonding pads, is only 0.9 mm × 0.9 mm, and its power dissipation is only 210 mW with a single 2.5 V power supply.

<sup>\*</sup> Project supported by the National High Technology Research and Development Program of China (Nos. 2002AA312230 and 2003AA31G030)

<sup>†</sup> Corresponding author. Email: dingjingfeng@seu.edu.cn

## 2 Block diagram and timing

The block diagram of the one-stage 1:4 DEMUX is shown in Fig. 1. This architecture demultiplexes one serial data stream into four parallel data streams in a single stage. The DEMUX consists of a 2:1 IQ static frequency divider, two master-slave D-type flip-flops (MSDFF), two master-slave-master D-type flip-flops (MSMDFF), two extra latches used for alignment, and several buffers.

As shown in the time chart in Fig. 2, the two MSDFFs capture the leading bit on the positive edges of the 1/2 I and Q clocks (the input clock divided by two), respectively, and the two MSMDFFs capture the leading bit on the negative edges of the 1/2 I and Q clocks, respectively. The input data is thus distributed into four data streams, with every fourth bit going to the same stream, and aligned with the positive edges of the 1/2 IQ clock.
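The sampling order described above can be illustrated with a small behavioural model. The Python sketch below is not the authors' circuit and ignores all analog and retiming effects. It assumes an ideal quarter-rate clock in which I leads Q by 90°, so that the four sampling edges (I rising, Q rising, I falling, Q falling) are spaced exactly one input bit apart. The names `EDGE_ORDER`, `demux_1to4`, and `mux_4to1` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, List

# Sampling edges of the quarter-rate IQ clock in the order they occur,
# assuming (for illustration only) that I leads Q by 90 degrees,
# which corresponds to one input bit period.
EDGE_ORDER = ["I_rising", "Q_rising", "I_falling", "Q_falling"]


def demux_1to4(bits: List[int]) -> Dict[str, List[int]]:
    """Distribute a serial bit list onto four channels, one bit per clock edge."""
    channels: Dict[str, List[int]] = {edge: [] for edge in EDGE_ORDER}
    usable = len(bits) - len(bits) % 4  # ignore an incomplete trailing group
    for index in range(usable):
        channels[EDGE_ORDER[index % 4]].append(bits[index])
    return channels


def mux_4to1(channels: Dict[str, List[int]]) -> List[int]:
    """Inverse operation, used only to check that no bit is lost or reordered."""
    words = zip(*(channels[edge] for edge in EDGE_ORDER))
    return [bit for word in words for bit in word]


serial = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
parallel = demux_1to4(serial)
recovered = mux_4to1(parallel)
```

Each channel receives every fourth input bit, so a 12 Gb/s input yields four 3 Gb/s outputs. Re-interleaving the four channels in edge order returns the original serial sequence.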
For synchronization, two extra latches follow the MSMDFF and the MSDFF that are clocked by the 1/2 I clock, aligning their outputs with the positive edge of the 1/2 Q clock.

Compared to the traditional tree topology<sup>[8]</sup>, the one-stage 1:4 DEMUX has the following advantages: (1) fewer components and a simpler structure; (2) lower power dissipation.

Fig. 1 Block diagram of the one-stage DEMUX

Fig. 2 Time chart of the one-stage DEMUX

This topology uses just 14 latches, including the two latches in the 1:2 IQ static frequency divider, whereas a traditional tree-type DEMUX usually has 17 latches. The buffers that a tree-type DEMUX needs between its stages to adjust the timing are also eliminated, because they are no longer required in this structure. Since the first high-speed stage of the tree type, which consumes much more power than the other stages, is eliminated, this DEMUX saves power substantially.

## 3 Circuit design

Circuit design becomes very challenging when the operating speed of the circuit is comparable with the $f_{\rm T}$ of the transistors. In this case, a suitable circuit topology is indispensable. The circuits presented in this paper are designed exclusively in source coupled FET logic (SCFL), which is important in ultra-high-speed circuit design for lowering the voltage swing and reducing common-mode distortions. The design has been optimized with the ADS2003A simulator. Because CMOS processes show larger variation than GaAs processes, the function of the circuit is verified by post-layout simulations at all process corners.

### 3.1 Latch circuit

Figure 3 shows the schematic of the latch, which is a typical SCFL structure. It samples the input data while the clock is high and holds the sampled data while the clock is low. By setting the voltage levels properly, the source followers commonly used for level shifting in GaAs and InP SCFL circuits can be omitted.
Omitting the source followers not only improves the speed of the latch but also enhances the symmetry of the schematic topology and the layout. At a speed of 10 Gb/s, it is very difficult to sample and hold the input data precisely. To enhance the performance, all of the data path transistors (NM1 ~ NM4) have the same size, which is 4/5 of the size of the clock transistors (NM5 and NM6). The smaller width of the data path devices reduces the parasitic capacitance. The larger gate width of the clock transistors increases the slew rate of the tail current when the input clock signal switches. Polysilicon resistors are used as low-capacitance loads instead of pMOS transistors to guarantee high-speed operation at a low supply voltage. The tail current of the latch is set to 1.0 mA.

Fig. 3 Schematic of the latch

### 3.2 IQ frequency divider

Figure 4 shows the schematic of the 1:2 static frequency divider used in the 1:4 DEMUX. It consists of two latches with negative feedback. It is very important to maintain an accurate 90° phase offset between the output I and Q clocks to reduce the skew of the four channel outputs. Source followers are added directly after its outputs to reduce its load and enhance its drive capability. The topology and layout of the frequency divider can be fully balanced so that a source follower is not needed. To minimize the phase mismatch of the output, the layout of the differential clock paths is kept carefully symmetrical, which suppresses common-mode noise and stabilizes the high-frequency ground. Interconnections are kept as short as possible, particularly in the feedback lines. All differential signals are placed symmetrically to avoid differences in transit time. Because of the limited chip area, only one pair of the divider outputs is connected to pads for testing.

Fig. 4 Schematic of the 1:2 IQ static divider

### 3.3 Output drive circuit

Figure 5 shows the schematic of the output drive circuit, which is designed to provide enough voltage swing on external 50 Ω loads.
In each stage of the driver, the tail current is twice that of the previous stage, including the source followers. The first differential amplifier stage delivers a high voltage swing of 1.2 V, which drives the second stage into a limiting state. An on-chip output termination resistor of 100 Ω is provided to reduce reflections from the chip.

Fig. 5 Schematic of the output drive circuit

## 4 Fabrication and measurement

The circuit is realized in TSMC's standard 0.25 µm single-poly 5-metal (5M1P) CMOS technology via the multi-project wafer (MPW) service of our institute. The cutoff frequency $f_{\rm T}$ of this process is 18.6 GHz. A micro-photo of the fabricated chip is shown in Fig. 6. The chip size is 0.9 mm × 0.9 mm and is determined by the bonding pads. The input data and clock are AC-coupled and terminated with 50 Ω on-chip resistors. The output buffers are designed to drive 50 Ω external loads. An on-chip output termination resistor of 100 Ω is provided to reduce output reflections compared with an open-drain configuration.

Fig. 6 Micro-photo of the DEMUX

The performance of the fabricated DEMUX was measured on-wafer with a Cascade Microtech probe station. Error-free operation was verified from DC up to 12 Gb/s, with input data and clock amplitudes of 500 mV<sub>pp</sub>. Figure 7 shows the measured eye-diagram of one single-ended output with a 10 Gb/s 2<sup>31</sup> - 1 PRBS input and a 5 GHz sinusoidal clock signal. The measured rms jitter, rise time, and fall time of the eye-diagram are 2.3, 144.4, and 137.8 ps, respectively. Figure 8 shows the measured eye-diagram with a 12 Gb/s 2<sup>31</sup> - 1 PRBS input and a 6 GHz sinusoidal clock signal. The four parallel eye-diagrams in Fig. 9 show a skew of less than 20 ps (5 %), which confirms that the timing adjustment method adopted in this chip is practicable.
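The 5 % figure is consistent with expressing the skew relative to one output bit period. The short Python sketch below works through that arithmetic, assuming the skew measurement of Fig. 9 was taken with the 10 Gb/s input, so that each of the four outputs runs at 2.5 Gb/s. The function names are hypothetical and serve only this illustration.

```python
def output_bit_period_ps(input_rate_gbps: float, ratio: int = 4) -> float:
    """Bit period of one demultiplexed output, in picoseconds."""
    output_rate_gbps = input_rate_gbps / ratio  # e.g. 10 Gb/s / 4 = 2.5 Gb/s
    return 1000.0 / output_rate_gbps            # 1 / (Gb/s) = ns, so x1000 gives ps


def skew_fraction(skew_ps: float, input_rate_gbps: float, ratio: int = 4) -> float:
    """Channel-to-channel skew as a fraction of the output bit period."""
    return skew_ps / output_bit_period_ps(input_rate_gbps, ratio)


period_ps = output_bit_period_ps(10.0)  # output bit period for a 10 Gb/s input
fraction = skew_fraction(20.0, 10.0)    # 20 ps skew relative to that period
print(f"output bit period: {period_ps:.0f} ps, skew: {fraction:.1%}")
```

With a 10 Gb/s input, the output bit period is 400 ps, so a 20 ps skew corresponds to 5 % of the output bit period.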
Figure 10 shows the divide-by-two output of the frequency divider at the maximum input clock frequency of 7 GHz. The typical power consumption of the chip is 210 mW with a single 2.5 V supply. The chip also works properly with supply voltages from 2.1 to 3.0 V. The features of this DEMUX and of the one in Ref. [8] are summarized in Table 1.

Fig. 7 Eye-diagrams of the DEMUX at 10 Gb/s input

Fig. 8 Eye-diagrams at 12 Gb/s input

Fig. 9 All of the outputs of the DEMUX at 10 Gb/s input

Fig. 10 Measured output of divider at 7 GHz input

Table 1 Summary of this DEMUX

| | This work | Ref. [8] |
|---|---|---|
| Technology | 0.25 µm CMOS, $f_{\rm T}$ = 18.6 GHz | 0.25 µm CMOS, $f_{\rm T}$ = 18.6 GHz |
| Chip size | 0.9 mm × 0.9 mm | 1.0 mm × 1.0 mm |
| Maximum speed | 12 Gb/s | 10 Gb/s |
| Supply voltage | 2.5 V | 3.3 V |
| Power dissipation | 210 mW | 693 mW |
| Jitter (rms) | 2.3 ps | N/A |
| Rise time | 144.4 ps (10 % ~ 90 %) | N/A |
| Fall time | 137.8 ps (10 % ~ 90 %) | N/A |

## 5 Conclusion

A 12 Gb/s one-stage 1:4 DEMUX in 0.25 µm CMOS has been designed, fabricated, and measured. By using a quarter-rate IQ clock, the topology is simplified and the power dissipation is reduced. The DEMUX works from DC up to 12 Gb/s with only 210 mW power dissipation. It can be applied in STM-64 or SONET OC-192 optical receivers.

## References

- [1] Lang M, Wang Z, Lao Z, et al. 20-40 Gb/s 0.2-µm GaAs HEMT chip set for optical data receiver. IEEE J Solid-State Circuits, 1997, 32(9): 1384
- [2] Meghelli M, Rylyakov A V, Shan L. 50 Gb/s SiGe BiCMOS 4:1 multiplexer and 1:4 demultiplexer for serial communication systems. ISSCC, 2002: 260
- [3] Sano K, Murata K, Kitabayashi H, et al. 50-Gbps InP HEMT 4:1 multiplexer/1:4 demultiplexer chip set with a multiphase clock architecture. IEEE Trans Microw Theory Tech, 2003, 51(12): 2548
- [4] Yen J, Case M G, Nielsen S, et al. A fully integrated 43.2 Gb/s clock and data recovery and 1:4 DEMUX IC in InP HBT technology. ISSCC, 2003: 240
- [5] Tanabe A, Umetani M, Fujiwara I, et al. 0.18-µm CMOS 10-Gb/s multiplexer/demultiplexer ICs using current mode logic with tolerance to threshold voltage fluctuation. IEEE J Solid-State Circuits, 2001, 36(6): 988
- [6] Kehrer D, Wohlmuth H, Knapp H, et al. 40-Gb/s 2:1 multiplexer and 1:2 demultiplexer in 120 nm CMOS. ISSCC, 2003: 344
- [7] Wang Huan, Wang Zhigong, Feng Jun, et al. 12 Gb/s data decision and 1:2 demultiplexer in 0.25 µm CMOS. Chinese Journal of Semiconductors, 2004, 25(11): 1521
- [8] Tian Lei, Wang Zhigong, Chen Haitao, et al. 10 Gb/s 1:4 demultiplexer in 0.25 µm CMOS. SPIE, 2001, 4063: 121
- [9] Chen Yingmei, Wang Zhigong, Xiong Mingzhen, et al. 2.5 Gb/s monolithic IC of clock recovery, data decision and 1:4 demultiplexer. Chinese Journal of Semiconductors, 2005, 26(8): 1532

# 12 Gb/s 0.25 µm CMOS Low-Power 1:4 Demultiplexer \*

(Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, China)

Abstract: A low-power single-stage demultiplexer for the SONET OC-192 optical transmission system is realized, operating at up to 12 Gb/s. The circuit is implemented in TSMC's mixed-signal CMOS process with a feature gate length of 0.25 µm. All of the circuits use source-coupled logic, which suppresses common-mode noise while reaching the highest possible operating speed. The demultiplexer achieves single-stage demultiplexing by using a quarter-rate quadrature clock, which reduces the number of components and lowers the power dissipation. In on-wafer testing, the chip demultiplexes correctly with a 12 Gb/s 2<sup>31</sup> - 1 pseudorandom bit stream at its input. The chip area is 0.9 mm × 0.9 mm, and the typical power dissipation is 210 mW with a single 2.5 V supply.

Key words: demultiplexer; latch; CMOS; optical receiver

**EEACC:** 1230B; 7250E **CLC number:** TN722 **Document code:** A **Article ID:** 0253-4177(2006)01-0019-05

<sup>\*</sup> Project supported by the National High Technology Research and Development Program of China (Nos. 2002AA312230 and 2003AA31G030)