1. Introduction
With the rapid development of integrated circuit (IC) applications such as microprocessors, optical transmission links, and chip-to-chip communications, the transceiver, a key component in these applications, must continually provide high transmission speed with good signal integrity (SI) and low power consumption. In recent years, several high performance wireline communication standards such as PCI express (PCIE)[1] and universal serial bus (USB) have been developed, in which the parameters of the transmitter and receiver for the serial link are specified. The specifications (PCIE 2.1 and USB 3.0) support data rates from 2.5 to 5 Gb/s, with even higher speeds expected in the future. However, a higher data rate or a longer cable length degrades the signal magnitude. Thus, the transmitter is specified with a large output signal swing (800-1200 mV differential peak to peak voltage swing) and a pre-emphasis option of 3.5 dB and 6 dB to compensate for the loss. As a result, large power is consumed at the transmitter. This also places a heavy load on the pre-stage circuit, which makes the circuit design difficult and complex. Furthermore, the 8b/10b code specified in the standards provides the odd input bits for the transmitter, making the traditional 2
In this paper, a 5 Gb/s serial link transmitter with pre-emphasis for multiple standards is proposed. A current mode (CM) output driver with the reverse scaling technique and bias current filtering (BCF) is implemented. The reverse scaling technique is introduced to save pre-driver power and reduce the load on the pre-stage circuit, and the BCF is adopted to suppress the output common mode noise. To work with the 8b/10b encoder, a high speed, low power combined serializer with a 10 bit input is proposed. The transmitter operates at both 2.5 Gb/s and 5 Gb/s, and its parameters comply fully with the PCIE 2.1 and USB 3.0 standard specifications with a significant margin. Section 2 discusses the typical pre-emphasis driver architectures. Section 3 describes the proposed transmitter. Finally, Section 4 presents the measurement results.
2. Pre-emphasis driver architectures
Transmitters usually contain several components, such as data generators and encoders, serializers, and output drivers. Among them, the output driver is the most important part of the transmitter, since it is the bandwidth bottleneck and consumes most of the power of the whole transmitter. The output driver usually employs differential data transmission to improve noise immunity. In essence, the output driver is a switch bridge driving a differential 100
There are generally two different architectures used for pre-emphasis output drivers: voltage mode (VM)[2-4] and current mode (CM)[5-9]. The equivalent circuit of a 2-tap pre-emphasis VM output driver is shown in Fig. 1(a). The output driver is subdivided into a pull-up branch and a pull-down branch implemented as PMOS or NMOS switch transistors. Each of the two branches with several slice units is sized to match the transmission line's 50
Figure 1(b) presents the simplified schematic of the 2-tap CM pre-emphasis output driver. The main tap data drives the switches of the main driver to switch the current from one leg to the other, and the current
Table 1 compares the VM and CM pre-emphasis output drivers. The VM output driver consumes about 1/4 of the output driver power of the CM driver, but it has difficulty achieving flexible pre-emphasis and swing settings, constant current, and area efficiency. Additionally, its power savings are offset by the higher complexity of the impedance control, the pre-emphasis setting circuit, and the supply regulator. Fundamentally, the CM output driver separates the impedance matching from the switching devices, which allows a flexible current summing technique to implement the pre-emphasis and swing level control. Also, the current source of the CM driver has a high output impedance that has less impact on the termination impedance, so it offers better SI performance than VM output drivers[9].
3. Circuit implementation
Figure 2 shows the transmitter architecture. The 10 bit parallel input data is generated by either a parallel pseudo random bit stream (PRBS) generator or an 8b/10b encoder, running at 500 MHz. These data are then converted into a 5 Gb/s data stream by the serializer, and finally sent to the load by the output driver with 2-tap pre-emphasis. The data generator and serializer operate from a 1.2 V power supply and use thin-oxide CMOS devices, while the output driver is powered by a 2.5 V supply and uses thick-oxide CMOS devices.
3.1 Serializer
The serializer is used to convert the 10 bit 500 Mb/s parallel data into 5 Gb/s serial main and post cursor tap data for pre-emphasis operation. The traditional high speed serializer[2] using tree architecture, however, could only convert 2
In order to serialize 10 bit parallel data into a 5 Gb/s data stream with the lowest power, a serializer that combines the SR and tree architectures is implemented. As shown in Fig. 3, a 5:1 SR clocked by both the half rate clock CLK2 and the select clock SCLK generates an even (Deven) and an odd (Dodd) data stream from the 10 bit input at 500 MHz. The relationship of CLK2, CLK10, and SCLK is shown on the right-hand side of Fig. 3. The lowest bit of the even (or odd) data is shifted to the output by the MUX first, and the highest bit is shifted last. The 2:1 MUX at the last stage of the serializer (in fact, the simplest tree architecture) converts Deven and Dodd into the 5 Gb/s data stream with the half rate clock, using both rising and falling clock edges. Since the 5:1 SR handles the 10 bit input at low speed and the 2:1 MUX operates at half the data rate, the whole serializer can work at the high speed data rate with the lowest power consumption.
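The data path described above can be illustrated with a purely behavioral sketch. It models only the bit ordering (even/odd split, lowest bit first, then 2:1 interleaving), not the clocking circuits of Fig. 3. The function names, and the choice that the rising edge of the half rate clock selects the even stream, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def split_even_odd(word: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Split a 10-bit word (index 0 = lowest bit) into even and odd 5-bit groups.

    This stands in for the two 5:1 shift-register outputs Deven and Dodd;
    list order is the shift-out order, lowest bit first.
    """
    if len(word) != 10:
        raise ValueError("expected a 10-bit word")
    return list(word[0::2]), list(word[1::2])


def mux_2to1(d_even: Sequence[int], d_odd: Sequence[int]) -> List[int]:
    """Interleave two half-rate streams into one full-rate stream."""
    out: List[int] = []
    for e, o in zip(d_even, d_odd):
        out.append(e)  # assumed: rising edge of the half rate clock picks Deven
        out.append(o)  # assumed: falling edge of the half rate clock picks Dodd
    return out


def serialize(word: Sequence[int]) -> List[int]:
    """Behavioral 10:1 serializer: 5:1 even/odd shift followed by a 2:1 MUX."""
    d_even, d_odd = split_even_odd(word)
    return mux_2to1(d_even, d_odd)
```

In this model the serial output reproduces the parallel word in index order, which is the property the combined architecture has to preserve.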
In order to implement a 2-tap pre-emphasis transmitter, delayed versions of the transmitter data must be created. This is accomplished with the PE shift register. The PE shift register consists of only two latches for the 2-tap pre-emphasis transmitter scheme. The latch clocked by CLK2 outputs the even and odd data for the main tap. The other one, clocked by CLK2b, generates the delayed and inverted even and odd data for the post cursor tap.
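As a behavioral illustration of the tap generation, the sketch below derives the main tap and the delayed, inverted post cursor tap from a serial bit stream, then sums them with two weights. The weights `w_main` and `w_post`, the initial previous bit, and the mapping of bits to +1/-1 are hypothetical choices for illustration; they are not values from this design.

```python
from typing import List, Sequence, Tuple


def pe_taps(bits: Sequence[int], initial_prev: int = 0) -> Tuple[List[int], List[int]]:
    """Return (main, post): main is the current bit, post is the inverted previous bit."""
    main = list(bits)
    post: List[int] = []
    prev = initial_prev  # assumed state of the latch before the first bit
    for b in bits:
        post.append(1 - prev)  # delayed by one bit and inverted
        prev = b
    return main, post


def drive_levels(bits: Sequence[int], w_main: float = 1.0, w_post: float = 0.25,
                 initial_prev: int = 0) -> List[float]:
    """Sum the two taps as signed weights (normalized, unitless output level)."""
    main, post = pe_taps(bits, initial_prev)

    def sign(b: int) -> float:
        return 1.0 if b else -1.0

    return [w_main * sign(m) + w_post * sign(p) for m, p in zip(main, post)]
```

With these assumptions, a bit that differs from its predecessor is driven at the full level `w_main + w_post`, and a repeated bit is driven at the reduced level `w_main - w_post`.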
3.2 CM pre-emphasis output driver
Based on the comparison in Table 1, the CM output driver has good characteristics and is easy to implement with little overall power overhead. Hence, the CM pre-emphasis output driver is chosen in our design, as shown in Fig. 4. In order to save the power of the whole driver while maintaining the driving capability, a reverse scaling technique is used. Moreover, we adopt the BCF technique to reduce the common mode noise. The two full rate data streams for the main stage and the post cursor stage are first level-shifted from the 1.2 V power supply domain of the thin-oxide serializer to the 2.5 V domain of the thick-oxide output driver. Since the post cursor stage's current is much lower than that of the main stage, it is implemented as a scaled replica of the main stage.
3.2.1 Analysis of the output signal swing
Figure 5(a) shows a simplified schematic of the 2-tap CM pre-emphasis output driver, where the switches are implemented by the NMOS transistors. The signals A[0] and A[-1] denote the current and the previous bit and An[
$$V_{OD\_H} = I_{Load} R_L = \frac{R_T (I_1 + I_2)}{2 R_T + R_L} R_L. \tag{1}$$
In contrast, when A[0] and An[-1] are fed with inverted data, a lower output signal swing is produced to implement the pre-emphasis. In this case, the equivalent circuit is shown in Fig. 5(c), and the output signal swing is given by
$$V_{OD\_L} = I_{Load} R_L = \frac{I_1 - I_2}{2 + R_L / R_T} R_L. \tag{2}$$
In the fully matched case of
$$V_{OD\_H} = \frac{1}{4} (I_1 + I_2) R_L, \tag{3}$$

$$V_{OD\_L} = \frac{1}{4} (I_1 - I_2) R_L. \tag{4}$$
Therefore, by properly setting the value of the tail current
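The swing relations in Eqs. (1) to (4) can be checked numerically with a short sketch. Equations (3) and (4) follow from Eqs. (1) and (2) when R_L = 2R_T, so the defaults below use R_T = 50 Ω and R_L = 100 Ω as assumed values. The function names and the example currents are hypothetical, and the pre-emphasis level is taken here as 20·log10(V_OD_H / V_OD_L), which is an assumption about the definition.

```python
import math
from typing import Tuple


def vod_high(i1: float, i2: float, rt: float = 50.0, rl: float = 100.0) -> float:
    """Eq. (1): output swing when the two tail currents add."""
    return rt * (i1 + i2) / (2.0 * rt + rl) * rl


def vod_low(i1: float, i2: float, rt: float = 50.0, rl: float = 100.0) -> float:
    """Eq. (2): output swing when the post cursor current subtracts."""
    return (i1 - i2) / (2.0 + rl / rt) * rl


def pre_emphasis_db(i1: float, i2: float, rt: float = 50.0, rl: float = 100.0) -> float:
    """Assumed definition: ratio of the high swing to the low swing, in dB."""
    return 20.0 * math.log10(vod_high(i1, i2, rt, rl) / vod_low(i1, i2, rt, rl))


def tail_currents(vod_h: float, pe_db: float, rt: float = 50.0,
                  rl: float = 100.0) -> Tuple[float, float]:
    """Invert Eqs. (1) and (2): find (I1, I2) for a target swing and pre-emphasis."""
    ratio = 10.0 ** (pe_db / 20.0)           # V_OD_H / V_OD_L
    volts_per_amp = rt * rl / (2.0 * rt + rl)
    i_sum = vod_h / volts_per_amp            # I1 + I2
    i_diff = i_sum / ratio                   # I1 - I2
    return (i_sum + i_diff) / 2.0, (i_sum - i_diff) / 2.0
```

For example, under these assumptions `tail_currents(1.0, 3.5)` returns the pair of tail currents that gives a 1.0 V high swing with 3.5 dB of pre-emphasis; changing only I2 while holding I1 + I2 fixed changes the pre-emphasis level without changing the high swing.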
3.2.2 Reverse scaling
As mentioned in Section 2, the CM output driver should employ multiple pre-driver stages to guarantee the driving capability. However, the power and the bandwidth of the pre-driver can become issues. If
Reverse scaling[11, 12] provides bandwidth improvement in amplifiers and equalizers. In this paper, we introduce this concept for the output driver to save power and mitigate the load of the level shifter. As illustrated in Fig. 6, cascading of stages is scaled down in size by a factor of
$$\tau_{k=n} = R_T (C_L + C_{OUT}), \tag{5}$$
where
$$\tau_{k<n} = \beta^{n-k} R_T \left( \frac{C_{IN}}{\beta^{n-k-1}} + \frac{C_{OUT}}{\beta^{n-k}} \right) = R_T (\beta C_{IN} + C_{OUT}), \tag{6}$$
where
Obviously, the bandwidth of the pre-driver should be larger than the output stage, which means
$$\tau_{k<n} < \tau_{k=n}. \tag{7}$$
From Eq. (7), the relationship between
$$C_L \geqslant \beta C_{IN}. \tag{8}$$
In our design,
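The time constants in Eqs. (5) to (8) can be explored with a small numerical sketch. The reading of the symbols is an assumption based on the form of the equations: R_T is the load resistance of the output stage, C_L the external load capacitance, C_IN and C_OUT the input and output capacitances of the output stage, β the scaling factor, n the number of stages, and k the stage index. All component values in the usage note are hypothetical and are not taken from this design.

```python
def tau_output(rt: float, cl: float, cout: float) -> float:
    """Eq. (5): time constant of the output stage (k = n)."""
    return rt * (cl + cout)


def tau_pre(rt: float, cin: float, cout: float, beta: float, n: int, k: int) -> float:
    """Eq. (6), left-hand form: time constant of pre-driver stage k (k < n)."""
    m = n - k
    if m < 1:
        raise ValueError("tau_pre is defined for k < n only")
    return beta ** m * rt * (cin / beta ** (m - 1) + cout / beta ** m)


def tau_pre_closed_form(rt: float, cin: float, cout: float, beta: float) -> float:
    """Eq. (6), right-hand form: the same time constant, independent of k."""
    return rt * (beta * cin + cout)


def max_beta(cl: float, cin: float) -> float:
    """Eq. (8) rearranged: the largest scaling factor that satisfies C_L >= beta * C_IN."""
    return cl / cin
```

For example, with the hypothetical values R_T = 50 Ω, C_IN = 100 fF, C_OUT = 200 fF, C_L = 500 fF, and β = 2, every pre-driver stage has a time constant of 20 ps, the output stage has 35 ps, and the bound from Eq. (8) allows β up to 5.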
3.2.3 Bias current filtering (BCF)
As the high speed data sequence is pushed into the CM output driver, noise couples to the bias current at the same frequency and finally appears as part of the output common mode voltage noise. In PCIE 2.1, the variation of the common mode voltage is specified to be less than 25 mV. The BCF technique, which is usually used in LC VCO design[13], is applied to the CM output driver in order to suppress this noise. As shown in Fig. 7, passive components
4. Experimental results
The transmitter test chip is fabricated in 65 nm CMOS technology. For the ESD consideration, the output driver is fabricated in 280 nm thick-oxide technology. Figure 9 shows the die micrograph of the transmitter. The entire transmitter occupies an area of 240
A PRBS7 data stream produced by the on chip PRBS generator is sent to the transmitter. A Tektronix DSA 72004C digital analyzer is used to capture the eye diagram and measure jitter. Figures 10(a) and 10(b) show the differential eye diagram after passing through a 5 cm FR4 PCB trace, a connector, and a 1 m RG58 cable without pre-emphasis, at 2.5 Gb/s and 5 Gb/s, respectively. The total channel loss is 1.8 dB at 2.5 Gb/s and 4 dB at 5 Gb/s. The root mean square (RMS) jitter is 5.82 ps at 2.5 Gb/s and 9.94 ps at 5 Gb/s, and the eye width reaches 0.8 UIpp at both 2.5 Gb/s and 5 Gb/s; 60% thereof is deterministic jitter, mainly due to period jitter caused by the spur of the clock and by intersymbol interference (ISI). The required eye height is greater than 800 mV at the transmitter output ports, and hence the measured far end eye height of 823 mV and 511 mV must be de-embedded at data rates of both 2.5 Gb/s and 5 Gb/s with a multiplication factor of 10
The total current consumption of the transmitter with no pre-emphasis is 39.8 mA and 41.2 mA for 2.5 Gb/s and 5 Gb/s operation, respectively, which is much smaller than with the identical stage pre-driver technique (60 mA for the output driver alone). The small difference between the two data rates is mainly caused by the serializer (1.0 mA at 2.5 Gb/s and 2.2 mA at 5 Gb/s), while the output driver's consumption is almost constant. This is because the power consumption of the serializer is proportional to the data rate, but the power dissipation of the output driver is defined by the output signal swing and the reverse scaling factor
The performance of the overall transmitter is summarized in Table 2 and compared with prior works with a similar architecture. Since the output driver, which dominates the bandwidth and the power consumption, is fabricated in 280 nm thick-oxide technology, the comparison with work in older technologies is meaningful. The reverse scaling output driver and the combined serializer largely reduce the power consumption. In addition, the widest eye width (EW) indicates better SI performance of the overall transmitter.
5. Conclusion
In this paper, the comparison of two typical output driver architectures indicates that the CM output driver is suitable for high speed transmitter design because of its flexibility in controlling the output swing and the pre-emphasis level ratio. Also, based on the output signal swing analysis, the CM output driver has no DDSC fluctuations, which relaxes the requirements on the power supply regulator. The large output swing specified in the standards dictates the use of a multiple stage pre-driver for the CM output driver, which causes power and bandwidth issues. These issues can be mitigated by the reverse scaling technique. Additionally, bias current filtering (BCF) is adopted to suppress the output common mode noise. Furthermore, a high speed combined serializer architecture is implemented to relax the timing constraints with a 10 bit input. The whole transmitter circuit is implemented in 65 nm CMOS technology. It provides an eye height greater than 800 mV for data rates of 2.5 Gb/s and 5 Gb/s, and the major parameters shown in Table 2 comply fully with the PCIE 2.1 and USB 3.0 standard specifications with a significant margin. The experimental results show that the proposed techniques improve the transmitter output signal quality and power efficiency.
Acknowledgement: The authors would like to thank the Tektronix Open Library for the test equipment.