# A low-power high-swing voltage-mode transmitter\*

Chen Shuai(陈帅)<sup>1,2,3,†</sup>, Li Hao(李昊)<sup>1,2,3</sup>, Shi Xiaobing(石小兵)<sup>1,3</sup>, Yang Liqiong(杨丽琼)<sup>1,2,3</sup>, Yang Zongren(杨宗仁)<sup>1,2,3</sup>, Zhong Shiqiang(钟石强)<sup>1,3</sup>, and Huang Lingyi(黄令仪)<sup>1,3</sup>

<sup>1</sup>Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China <sup>2</sup>Graduate University of the Chinese Academy of Sciences, Beijing 100049, China <sup>3</sup>Loongson Technologies Corporation Limited, Beijing 100190, China

**Abstract:** A low-power voltage-mode-logic (VML) transmitter fabricated in TSMC 28 nm CMOS technology is presented. The VML driver outputs a high-swing signal and consumes less power than a current-mode-logic (CML) driver. To further reduce power, level shifters divide the driver into two voltage domains. Moreover, the proposed driver topology decouples impedance self-calibration from equalization control. Measurement results show that the transmitter dissipates only 23 mW per channel while achieving an 880 mV differential eye height at 4.488 Gb/s.

**Key words:** voltage-mode transmitter; low power; impedance self-calibration; equalization; mutually decoupled **DOI:** 10.1088/1674-4926/33/4/045003 **EEACC:** 6150D; 1280

### 1. Introduction

As the bandwidth of high-speed serial links used in processor interconnects such as HyperTransport<sup>[1]</sup> has grown aggressively, up to 51.2 GB/s, power consumption has become a major concern in SerDes design. Many SerDes circuits use CML drivers<sup>[2,3]</sup>. However, CML drivers dissipate static power and cannot support a wide range of termination voltages. VML drivers overcome these disadvantages<sup>[4]</sup>: for the same output swing they consume only 1/4 of the output-stage power of a CML driver, and they support high-swing termination voltages. Besides low-power operation, signal integrity is another key concern in SerDes design. To preserve signal integrity and minimize reflections, the driver's output impedance must be calibrated to match the transmission-line impedance despite process and temperature variations. In addition, a feed-forward equalizer (FFE) is commonly used to compensate for channel attenuation. However, implementing impedance calibration and equalization independently and efficiently remains a challenge in VML driver design.

We present a source-series-terminated (SST) transmitter, a type of VML driver, whose impedance self-calibration and equalization control are mutually decoupled. The pull-up and pull-down impedances are self-calibrated independently to tolerate process variations, including skewed PMOS/NMOS corners. The 2-tap FFE has eight programmable settings. To reduce power, the thick-oxide SST output stage operates at 1.2 V, while the other, thin-oxide devices operate at 0.85 V. The transmitter outputs a high-swing signal with a power efficiency of 5.2 mW/Gb/s at 4.488 Gb/s.

### 2. Related work

Figure 1(a) shows the output stage of a CML driver. It maintains good signal integrity because its current source has a high output impedance and its internal resistors provide good impedance matching, but it draws considerably more current. Assuming the signal 'in' is low, the output stage current of the CML driver can be expressed as

$$\begin{aligned} I_{\text{CML}} &= I_1 + I_2 \\ &= I_1 + 3I_1 \\ &= \frac{4 \left( V_{\text{outb}} - V_{\text{out}} \right)}{100}. \end{aligned} \tag{1}$$

Fig. 1. (a) CML driver and (b) SST driver.

<sup>\*</sup> Project supported by the National Sci & Tech Major Project of China (Nos. 2009ZX01028-002-003, 2009ZX01029-001-003) and the National Natural Science Foundation of China (Nos. 60921002, 61003064, 61050002, 61070025, 61100163, 61133004, 61173001, 60801045).
† Corresponding author. Email: chenshuai@ict.ac.cn

Received 29 August 2011, revised manuscript received 17 November 2011

Fig. 2. Previous SST transmitter architecture.

Figure 1(b) shows the output stage of an SST driver. It contains a pull-up and a pull-down branch, each implemented as a PMOS or NMOS transistor in series with a poly resistor. Because the transistor's resistance is nonlinear and sensitive to process and temperature, the poly resistor is made large enough to dominate the total branch impedance. This, however, requires a wider transistor to keep the transistor's own resistance small, so the transistor-to-poly-resistance ratio is set by a trade-off between linearity and area. Assuming the signal 'in' is low, the output stage current of the SST driver can be expressed as

$$\begin{aligned} I_{\text{SST}} &= I_3 \\ &= \frac{V_{\text{outb}} - V_{\text{out}}}{100}. \end{aligned} \tag{2}$$

According to Eqs. (1) and (2), for the same output signal, the CML driver draws four times the output-stage current of the SST driver. Although the SST driver is much more power efficient, its impedance matching must be achieved with additional impedance-adjustment circuits. Recent SST transmitters reported in Refs. [5–7] are shown in Fig. 2.
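As a numerical check of Eqs. (1) and (2), the following sketch assumes 50 $\Omega$ internal loads and a 100 $\Omega$ differential far-end termination, which are the values those equations imply. It splits the CML current between the direct load path and the path through the termination, then compares the total with the single-loop SST current. The function names and the 0.4 V example swing are hypothetical.

```python
def cml_output_current(v_diff, r_load=50.0, r_term=100.0):
    """Output-stage current of the CML driver in Fig. 1(a), per Eq. (1).

    v_diff is V_outb - V_out in volts. The defaults (50-ohm loads,
    100-ohm differential termination) are assumptions matching Eq. (1).
    """
    # I1 flows from the supply through the far-side load and the termination.
    i1 = v_diff / r_term
    # I2 flows through the load on the 'out' side. Both paths into the 'out'
    # node see the same drop (VDD - V_out), so their currents scale inversely
    # with path resistance: r_load versus r_load + r_term (3x for 50/100 ohm).
    i2 = i1 * (r_load + r_term) / r_load
    return i1 + i2


def sst_output_current(v_diff, r_term=100.0):
    """Output-stage current of the SST driver in Fig. 1(b), per Eq. (2).

    A single series loop (VDD -> pull-up -> termination -> pull-down -> GND),
    so the whole supply current I3 flows through the termination.
    """
    return v_diff / r_term


if __name__ == "__main__":
    v_diff = 0.4  # hypothetical differential output voltage (V)
    i_cml = cml_output_current(v_diff)
    i_sst = sst_output_current(v_diff)
    print(f"I_CML = {i_cml * 1e3:.1f} mA, I_SST = {i_sst * 1e3:.1f} mA, "
          f"ratio = {i_cml / i_sst:.2f}")
```

The ratio is independent of the chosen swing, which is the factor of four stated above.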

As illustrated in Fig. 2(a), Philpott *et al.*<sup>[5]</sup> connected 64 selectable resistors in series with all SST slices to adjust the impedance, but the additional FETs caused a voltage-headroom penalty<sup>[7]</sup>. Another SST driver, shown in Fig. 2(b), was proposed by Menolfi *et al.*<sup>[6]</sup>. It achieved impedance matching by enabling a selectable number of SST slices and controlled equalization by feeding the enabled slices with different data taps. A disadvantage of this topology was that the equalization setting depended on the number of enabled slices, so equalization tuning and impedance matching were interdependent. Kossel *et al.*<sup>[7]</sup> improved on Ref. [6] by controlling equalization inside each slice, as shown in Fig. 2(c), so that the equalization was no longer affected by the number of enabled slices. However, this approach had two drawbacks: (1) the impedance calibration could not be carried out automatically; (2) the pull-up and pull-down resistances were always adjusted together (both smaller or both larger), which is incorrect in skewed process corners such as fast PMOS with slow NMOS.

### 3. Circuit design

We propose an SST transmitter that overcomes the disadvantages mentioned above. Its top-level circuit is illustrated in Fig. 3. The 4:2 SER serializes the quarter-rate data into half-rate data, and the following FFE circuit generates full-rate main-tap and post-cursor-tap data for 2-tap pre-emphasis. The slice input multiplexer sets the pre-emphasis coefficient. The level shifter partitions the transmitter into two voltage domains to reduce power: the thick-oxide SST output stage operates at 1.2 V (VDD), and the other, thin-oxide devices operate at the 0.85 V core voltage. The SST output stage, composed of 15 identical slices, is divided into four binary-weighted segments (1X, 2X, 4X, and 8X). The output stage combines the main-tap and post-cursor-tap data and outputs the pre-emphasized signal. In addition, the driver's output impedance is adjusted to 50 $\Omega$ using the codes generated by the impedance calibration cell.

#### 3.1. 4:2 SER and 2-tap FFE

As seen in Fig. 4, a 4-bit parallel input data stream is latched by four flip-flops clocked by the quarter-rate clock. The D[2] and D[3] paths use two latches to delay their data by half a quarter-rate clock period. Multiplexers then convert the four data paths into half-rate even and odd data, which are in turn converted into full-rate main-tap data with the half-rate clock in the same way. To obtain the post-cursor-tap data, two additional latches delay the half-rate data by half a half-rate clock period (one full-rate unit interval), and a multiplexer converts the result to full rate. The full-rate main-tap and post-cursor-tap data are then used to de-emphasize the output signal.
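A behavioral sketch of the Fig. 4 data path may help. It models only the bit ordering, not clocks or latch timing. The assignment of D[0]/D[2] to the even stream and D[1]/D[3] to the odd stream, as well as the function names, are assumptions for illustration. The post-cursor stream is modeled as the main stream delayed by one unit interval (UI).

```python
def serialize_4to2(words):
    """4:2 stage: each quarter-rate word (D0, D1, D2, D3) contributes two bits
    to the half-rate even stream and two bits to the odd stream.

    Assumed mapping: even <- D0, D2 ; odd <- D1, D3. In hardware, D[2] and
    D[3] are re-timed by extra latches; only the resulting order is modeled.
    """
    even, odd = [], []
    for d0, d1, d2, d3 in words:
        even.extend((d0, d2))
        odd.extend((d1, d3))
    return even, odd


def serialize_2to1(even, odd):
    """2:1 stage: interleave the half-rate streams into the full-rate main tap."""
    main = []
    for e, o in zip(even, odd):
        main.extend((e, o))
    return main


def post_cursor(main, previous_bit=0):
    """Post-cursor tap: the main-tap stream delayed by one UI.

    previous_bit is the (assumed) bit transmitted before the stream starts.
    """
    return [previous_bit] + main[:-1]


if __name__ == "__main__":
    words = [(1, 0, 1, 1), (0, 0, 1, 0)]  # hypothetical input words
    even, odd = serialize_4to2(words)
    main = serialize_2to1(even, odd)
    print("main:", main)
    print("post:", post_cursor(main))
```

With this mapping, the main tap reproduces the input words in order, and the post-cursor tap is the same sequence shifted by one bit.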

#### 3.2. Level shifter

Figure 5 shows the topology of the differential level shifter, which converts the signal swing from 0.85 to 1.2 V. A conventional level shifter consists only of transistors P2, P3, N1, and N2. However, the cross-coupled P2 and P3 cannot pull the output from low to high fast enough, so the additional transistors P1 and P4 are used to pre-bias the outputs and help the PMOS transistors switch faster. Because the voltage difference between the two domains (1.2 V − 0.85 V = 350 mV) is smaller than the threshold voltage of P2 and P3, there is no leakage current when the input signals are logically high.

#### 3.3. SST output stage

Impedance matching: the circuit topology of the SST output stage is shown in Fig. 6. The SST output stage consists of 15 identical parallel slices, which are partitioned into four segments. In our topology all slices are always enabled, unlike the partially enabled schemes of Refs. [6, 7].



Fig. 3. Proposed SST transmitter.



Fig. 4. 4:2 SER and 2-tap FFE.

Because the total parallel output impedance must be 50 $\Omega$ to match the transmission-line impedance, each of the 15 parallel slices must be adjusted to 750 $\Omega$ (15 × 50 $\Omega$).

Each slice contains an always-enabled SST branch (4×) and five programmable binary-weighted SST branches (1×, 2×, 4×, 8×, and 16×). The always-enabled branch bounds the maximum slice impedance, which refines the calibration accuracy. NAND/NOR gates gate the input signals of the five programmable branches using the compensation codes U\_code<0:4> and D\_code<0:4>, which are obtained from the impedance calibration cell. These codes independently switch the five programmable pull-up and pull-down branches on or off to set the slice output impedance to 750 $\Omega$.
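The following sketch works through the impedance arithmetic of this section. It treats every branch as an ideal resistor whose conductance scales with its size, so a slice with a 5-bit code c has conductance (4 + c)·G_unit. The parameter `r_unit`, which stands for the resistance of a hypothetical 1× branch, and the helper names are assumptions. Transistor nonlinearity is ignored.

```python
from fractions import Fraction

SEGMENT_SIZES = (1, 2, 4, 8)          # 1X, 2X, 4X, 8X segments
N_SLICES = sum(SEGMENT_SIZES)         # 15 identical slices
Z_TARGET = 50                         # ohm, transmission-line impedance
R_SLICE_TARGET = N_SLICES * Z_TARGET  # 750 ohm per slice
ALWAYS_ON = 4                         # always-enabled 4x branch
PROGRAMMABLE = (1, 2, 4, 8, 16)       # five binary-weighted branches


def slice_resistance(r_unit, code):
    """Slice resistance for a 5-bit code (0..31); bit i enables PROGRAMMABLE[i].

    Because the weights are binary, the enabled size equals ALWAYS_ON + code.
    Code 0 gives the maximum resistance r_unit / 4, set by the always-on branch.
    """
    size = ALWAYS_ON + sum(w for i, w in enumerate(PROGRAMMABLE) if (code >> i) & 1)
    return Fraction(r_unit) / size


def total_resistance(r_slice, n=N_SLICES):
    """n identical slices connected in parallel."""
    return Fraction(r_slice) / n


def best_code(r_unit):
    """Code whose slice resistance is closest to the 750-ohm target."""
    return min(range(32), key=lambda c: abs(slice_resistance(r_unit, c) - R_SLICE_TARGET))


if __name__ == "__main__":
    print(total_resistance(R_SLICE_TARGET))   # 15 x 750 ohm in parallel
    r_unit = 15000                            # hypothetical nominal 1x resistance
    print(slice_resistance(r_unit, 0))        # upper bound on slice resistance
    c = best_code(r_unit)
    print(c, float(slice_resistance(r_unit, c)))
```

Exact `Fraction` arithmetic keeps the parallel combination free of rounding, so 15 slices of 750 $\Omega$ combine to exactly 50 $\Omega$.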



Fig. 5. Level shifter.

Equalization control: two-tap equalization is implemented by assigning each of the four segments either the main-tap or the post-cursor-tap data, as shown in Fig. 7. The slice input multiplexers select the input signal of each segment. When equalization is turned on, the segments of the SST output stage combine the main-tap and post-cursor-tap data to generate pre-emphasized output signals.

Mutually decoupled: when different equalization settings are applied, the impedance of each slice does not change, and the total output impedance remains 50 $\Omega$. Equalization is tuned by selecting the slice input signals, whereas impedance is adjusted inside each slice. This architecture therefore removes the dependency between the two functions.
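The next sketch shows why equalization leaves the impedance untouched. It models the 15 slices as always-enabled unit drivers: k of them are fed the post-cursor data (inverted, for de-emphasis), and the rest are fed the main-tap data. Several details are assumptions rather than details from Fig. 7: which segments form the post-cursor group, the ±1 symbol model, and the idea that the eight FFE settings correspond to k = 0..7 built from the 1X, 2X, and 4X segments.

```python
import math

N_SLICES = 15  # 1X + 2X + 4X + 8X


def ffe_output(bits, k_post, n=N_SLICES):
    """Normalized differential output for a 0/1 bit stream.

    (n - k_post) slices drive the main-tap symbol a[i] (mapped to +/-1) and
    k_post slices drive the inverted post-cursor symbol -a[i-1]. All n slices
    stay enabled, so the output impedance is the same for every k_post.
    """
    a = [1 if b else -1 for b in bits]
    out = []
    prev = a[0]  # assume the line was idle at the first symbol's level
    for cur in a:
        out.append(((n - k_post) * cur - k_post * prev) / n)
        prev = cur
    return out


def deemphasis_db(k_post, n=N_SLICES):
    """Level of a repeated bit relative to a transition bit, in dB."""
    return 20 * math.log10((n - 2 * k_post) / n)


if __name__ == "__main__":
    for k in range(8):  # hypothetical: post-cursor group built from 1X, 2X, 4X
        print(k, f"{deemphasis_db(k):.2f} dB")
    print(ffe_output([0, 1, 1, 1, 0], k_post=2))
```

In this model, transition bits always reach full swing, and repeated bits are attenuated by (15 − 2k)/15. Changing k only re-routes data to the slices and never changes how many slices are on.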

#### 3.4. Impedance calibration cell

The impedance calibration is carried out automatically during transmitter initialization. The calibration circuit, depicted in Fig. 8, uses a current-mirror topology to suppress power supply noise. The resistances of the pull-up and pull-down dummy branches are calibrated to 750 $\Omega$ independently, so that process variations, including skewed PMOS/NMOS corners, are tolerated.

Fig. 6. SST output stage.

Fig. 7. Equalization control.

The calibration principle is as follows. A reference current $I_1$, insensitive to PVT variations, is generated from the external reference resistor $R_{\text{ext}}$. The Ucodes/Dcodes generated by counters change $V_{\text{mid1,2}}$. When a code set makes $V_{\text{mid1,2}}$ equal to $V_{\text{DD}}/2$, that set provides the required impedance match and is latched. The reason is that, at that point, the mirrors accurately copy $I_2$ into the pull-up/pull-down dummy branches, and the resistance of each dummy branch equals $R_{\text{ext}}$ (750 $\Omega$). After the dummy branches are calibrated to 750 $\Omega$, the cell is powered down to save power. The latched codes are sent to the pull-up/pull-down branches (identical to the dummy branches) in each slice, and these codes adjust the total output impedance of the SST transmitter to 50 $\Omega$.
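The counter-based search can be sketched behaviorally. The sketch makes three assumptions: the mirrored current equals $(V_{\text{DD}}/2)/R_{\text{ext}}$, so $V_{\text{mid}}$ reaches $V_{\text{DD}}/2$ exactly when a dummy branch equals $R_{\text{ext}}$; the dummy branch follows the idealized model R(code) = r_unit/(4 + code); and the PMOS and NMOS unit resistances are hypothetical, deliberately skewed values. None of the names or values are taken from Fig. 8.

```python
VDD = 1.2        # V, output-stage supply
R_EXT = 750.0    # ohm, external reference (the slice target)
N_CODES = 32     # 5-bit counter


def dummy_resistance(r_unit, code):
    """Idealized dummy branch: always-on 4x branch plus binary branches."""
    return r_unit / (4 + code)


def v_mid(r_dummy, pull_up):
    """Midpoint voltage when the mirrored current (VDD/2)/R_EXT flows through
    the dummy branch (assumed bias condition). The pull-up branch hangs from
    VDD; the pull-down branch returns to ground."""
    i_ref = (VDD / 2) / R_EXT
    drop = i_ref * r_dummy
    return VDD - drop if pull_up else drop


def calibrate(r_unit, pull_up):
    """Step the counter upward (lowering the branch resistance) until V_mid
    crosses VDD/2, then latch that code. This linear search stops at the first
    code with R <= R_EXT, so the residual error is below one code step."""
    for code in range(N_CODES):
        v = v_mid(dummy_resistance(r_unit, code), pull_up)
        crossed = v >= VDD / 2 if pull_up else v <= VDD / 2
        if crossed:
            return code
    return N_CODES - 1  # saturate if the target is out of range


if __name__ == "__main__":
    # Hypothetical skewed corner: fast PMOS (low r_unit), slow NMOS (high r_unit).
    u_code = calibrate(12300.0, pull_up=True)
    d_code = calibrate(17700.0, pull_up=False)
    print("U_code", u_code, round(dummy_resistance(12300.0, u_code), 1))
    print("D_code", d_code, round(dummy_resistance(17700.0, d_code), 1))
```

Because the two searches run separately, the skewed corner yields different pull-up and pull-down codes. A single shared code could not provide this.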

### 4. Chip measurements

The test board of our 28 nm chip is shown in Fig. 9 and the chip includes our transmitter, PLLs and other modules. The layout of the 3-channel transmitter is depicted in Fig. 10. Each channel employs the same driver topology proposed in this paper. The transmitter core occupies  $220 \times 320 \ \mu\text{m}^2$  and has 7 signal pads including a reference resistor pad.

Monte–Carlo simulations with 500 runs were performed to evaluate the spread of the driver's output resistance. The results plotted in Fig. 11 show that, without calibration, the output resistance may fall outside the acceptable range (high-speed I/O circuits commonly require it to stay between 45 and 55 $\Omega$). After calibration, all 500 Monte–Carlo results lie within the required 45–55 $\Omega$ range.
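The effect summarized in Fig. 11 can be illustrated with a toy Monte–Carlo model. This is not the authors' transistor-level simulation. It assumes a Gaussian spread (hypothetically 10% sigma) in the unit-branch resistance, the idealized branch model R = r_unit/(4 + code), and a fixed mid-scale code for the uncalibrated case. It then counts how many of 500 samples land in the 45–55 $\Omega$ window, with and without choosing the best code.

```python
import random

N_SLICES = 15
R_TARGET_SLICE = 750.0
R_UNIT_NOMINAL = 15000.0   # hypothetical 1x-branch resistance at nominal
SIGMA = 0.10               # hypothetical relative spread
FIXED_CODE = 16            # hypothetical uncalibrated setting (nominal match)
LOW, HIGH = 45.0, 55.0     # acceptance window (ohm)


def total_resistance(r_unit, code):
    """15 identical slices in parallel, each r_unit / (4 + code)."""
    return r_unit / (4 + code) / N_SLICES


def best_code(r_unit):
    """Calibrated code: slice resistance closest to 750 ohm."""
    return min(range(32), key=lambda c: abs(r_unit / (4 + c) - R_TARGET_SLICE))


def run(n_runs=500, seed=1):
    """Return the fraction of samples inside the 45-55 ohm window
    (uncalibrated, calibrated)."""
    rng = random.Random(seed)
    uncal_ok = cal_ok = 0
    for _ in range(n_runs):
        r_unit = R_UNIT_NOMINAL * (1 + rng.gauss(0.0, SIGMA))
        if LOW <= total_resistance(r_unit, FIXED_CODE) <= HIGH:
            uncal_ok += 1
        if LOW <= total_resistance(r_unit, best_code(r_unit)) <= HIGH:
            cal_ok += 1
    return uncal_ok / n_runs, cal_ok / n_runs


if __name__ == "__main__":
    uncal, cal = run()
    print(f"in 45-55 ohm: uncalibrated {uncal:.0%}, calibrated {cal:.0%}")
```

The assumed spread and the code step size determine the numbers this model produces, so its output illustrates the mechanism only and does not reproduce Fig. 11.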

Figure 12 shows the measured eye diagram of a 4.488 Gb/s PRBS-23 signal with -3 dB pre-emphasis. The measured differential eye opening is 880 mV, and the measured peak-to-peak jitter is 69 ps. The jitter exceeds the simulated value, mainly because of simultaneous switching output (SSO) noise.

Fig. 8. Impedance calibration cell.

Fig. 9. Chip test board.

Fig. 10. Transmitter layout.
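For context, the measured jitter can be expressed in unit intervals (UI). The short calculation below uses only the 4.488 Gb/s data rate and the 69 ps peak-to-peak jitter quoted above.

```python
BIT_RATE = 4.488e9      # bit/s
JITTER_PP = 69e-12      # s, measured peak-to-peak jitter

ui = 1.0 / BIT_RATE     # one unit interval in seconds
print(f"UI = {ui * 1e12:.1f} ps, jitter = {JITTER_PP / ui:.2f} UI")
```

At this rate, one UI is about 222.8 ps, so the measured jitter occupies roughly 0.31 UI.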



Fig. 11. Monte–Carlo simulation results. (a) Pull-up resistance. (b) Pull-down resistance.

Table 1 summarizes the design of our transmitter and compares it with related works.

### 5. Conclusion

A low-power high-swing voltage-mode transmitter has been fabricated in TSMC 28 nm CMOS technology. It outputs an 880 mVpp differential signal at 4.488 Gb/s with a power efficiency of 5.2 mW/Gb/s. Moreover, compared with previous SST designs, our SST transmitter can: (1) self-calibrate the pull-up and pull-down impedances independently to tolerate process variations, including skewed corners; (2) control impedance self-calibration and equalization independently.



Fig. 12. Measurement results.

Table 1. Design summary and comparison.

| Reference                                               | [2]<sup>1)</sup> | [3]<sup>1)</sup> | [5]<sup>2)</sup> | [7]<sup>2)</sup> | Our work<sup>2)</sup> |
|---------------------------------------------------------|------|------|------|------|-------|
| Technology (nm)                                         | 90   | 110  | 65   | 65   | 28    |
| Data rate (Gb/s)                                        | 10   | 6.4  | 20   | 8.5  | 4.488 |
| Eye opening (mVpp)                                      | 900  | 104  | 300  | 1000 | 880   |
| Power efficiency (mW/Gb/s)                              | 17.4 | 23   | 8.3  | 11.3 | 5.2   |
| Mutually decoupled<sup>3)</sup>                         | -    | -    | N    | Y    | Y     |
| Independent pull-up/pull-down calibration<sup>4)</sup>  | -    | -    | Y    | N    | Y     |

1) CML driver. 2) SST driver. 3) Impedance self-calibration and equalization are decoupled. 4) Pull-up and pull-down resistances are calibrated independently.

### References

- [1] HyperTransport Specification 3.10 First Release [online]. Available: http://www.hypertransport.org
- [2] Rylyakov A, Rylov S. A low power 10 Gb/s serial link transmitter in 90-nm CMOS. Compound Semiconductor Integrated Circuit Symposium, 2005: 4
- [3] Higashi H, Masaki S, Kibune M, et al. A 5-6.4-Gb/s 12-channel transceiver with pre-emphasis and equalization. IEEE J Solid-State Circuits, 2005, 40(4): 978
- [4] Bugharbieh K A, Krishnan S, Mohan J, et al. An ultralow-power 10-Gbits/s LVDS output driver. IEEE Trans Circuits Syst I: Regular Papers, 2010, 57(1): 262
- [5] Philpott R A, Humble S J, Kertis R A, et al. A 20 Gb/s SerDes transmitter with adjustable source impedance and 4-tap feedforward equalization in 65 nm bulk CMOS. Custom Integrated Circuits Conference, 2008: 623
- [6] Menolfi C, Toifl T, Buchmann P, et al. A 16 Gb/s source-series terminated transmitter in 65 nm CMOS SOI. IEEE International Solid-State Circuits Conference, 2007: 446
- [7] Kossel M, Menolfi C, Weiss J, et al. A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with 16 dB return loss over 10 GHz bandwidth. IEEE J Solid-State Circuits, 2008, 43(12): 2905