# A high speed sampler for sub-sampling IR-UWB receiver\*

Shao Ke(邵轲), Lu Bo(陆波), Xia Lingli(夏玲琍), and Hong Zhiliang(洪志良)<sup>†</sup>

(State Key Laboratory of ASIC and System, Fudan University, Shanghai 201203, China)

**Abstract:** A high speed sampler for a sub-sampling impulse radio UWB receiver is presented. In this design, the sampler uses a time-interleaved topology with a single track and hold circuit, full custom clock generator, and offset cancelled comparator. These three main blocks are also discussed and analyzed. The circuit was fabricated in 0.13  $\mu$ m CMOS technology. Measurement results indicate that the sampler achieves a maximum 3 GS/s sampling rate. The power consumption of the sampler is 27 mW under a supply voltage of 1.2 V. The total chip area including pads is  $1.4 \times 0.97$  mm<sup>2</sup>.

**Key words:** IR-UWB; sampler; sub-sampling; TH; clock generator

**DOI:** 10.1088/1674-4926/31/4/045004

**EEACC:** 2570

## 1. Introduction

Ultra-wideband (UWB) transmission was approved by the FCC in 2002 for several frequency bands (0–960 MHz, 3.1–10.6 GHz, and 22–29 GHz)<sup>[1]</sup>, and has since been used in a variety of applications. One of the most discussed applications is high-data-rate communication in the frequency band from 3.1 to 10.6 GHz. IR-UWB (impulse radio UWB) uses carrier-less short pulses to spread energy over at least 500 MHz of bandwidth. This has the potential for much lower power consumption and higher integration than conventional approaches<sup>[2]</sup>.

Digital IR-UWB receivers offer numerous advantages over architectures based on analog correlation, such as RAKE<sup>[3]</sup> and transmitted reference (TR) systems<sup>[4]</sup>, but present implementation challenges in analog-to-digital converter (ADC) resolution and power consumption. The sub-sampling technique addresses this problem. For example, a low-power, high-speed ADC<sup>[5]</sup> was developed to realize a 2 GHz sub-sampling rate, and a direct sampling approach based on this ADC was proposed in Ref. [6]. Other analyses of ADC requirements were published in Refs. [2, 7–9]. They concluded that a receiver with 1 bit quantization accuracy can still provide adequate throughput over relatively short distances (<10 m) with power consumption on the order of a milliwatt. However, these works were based only on system-level modeling and simulation and did not present a circuit implementation.

In this paper, a high speed sampler for a sub-sampling IR-UWB receiver is designed in 0.13 $\mu$m CMOS technology. Section 2 describes the circuit implementation of this sampler. Section 3 shows the experimental results. Finally, a conclusion is given in Section 4.

## 2. Circuit implementation

### 2.1. Architecture

Two kinds of samplers are implemented in our IR-UWB receiver. For the kind of sampler proposed in this paper, the input data frequency and sampling rate are specified at 1–2 GS/s. Because the quantization accuracy of the proposed architecture is reconfigurable, it can also be adopted for the other kind of sampler, which has a sampling rate beyond 2 GS/s. To meet the system requirements, 16 separate time-interleaved channels convert the signal into data and deliver it to the baseband. This interleaving relaxes the bandwidth and speed requirements of the individual blocks.
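The effect of interleaving on the per-channel speed can be sketched in a few lines of Python. This is only an illustration: the 16 channels and the 3 GS/s aggregate rate come from this paper, while the function names and the round-robin sample ordering are assumptions made for the example.

```python
N_CHANNELS = 16  # number of time-interleaved channels stated in the text


def per_channel_rate(aggregate_rate_hz, n_channels=N_CHANNELS):
    """Each channel only has to run at 1/N of the aggregate sampling rate."""
    return aggregate_rate_hz / n_channels


def interleave(samples, n_channels=N_CHANNELS):
    """Round-robin assignment (assumed ordering): sample k goes to channel k mod N."""
    channels = [[] for _ in range(n_channels)]
    for k, sample in enumerate(samples):
        channels[k % n_channels].append(sample)
    return channels


if __name__ == "__main__":
    # 3 GS/s is the maximum sampling rate reported for the sampler.
    print(per_channel_rate(3e9) / 1e6, "MS/s per channel")
    print(interleave(list(range(32)))[0])
```

At the maximum rate, each comparator channel therefore has to resolve a sample only every sixteenth clock period, which is the relaxation referred to above.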

The block diagram of the proposed sampler is shown in Fig. 2. It is composed of five blocks (excluding buffers): TH (track and hold) circuit, channel selection clock generator, channel clock generator, current bias and comparators. Figure 2 shows a single-ended path, but the actual implementation is fully differential.

A TH circuit is implemented to improve the dynamic performance of the sampler. By holding the sampled analog value during quantization, the TH circuit largely removes the errors due to clock skew, limited input bandwidth, signal-dependent dynamic nonlinearity, and aperture jitter<sup>[10]</sup>. To create the interleaved channels, the switches that connect the TH circuit to the comparators have to be turned on one after another. The channel selection clock generator produces 16-phase non-overlapping clocks to control these switches. An auto-zeroing technique is adopted in the comparator design for offset cancellation, and the clocks it needs are provided by the channel clock generator.

Fig. 1. Sub-sampling receiver architecture.

Fig. 2. Block diagram of the proposed sampler.

Fig. 3. Proposed TH circuit.

\* Project supported by the National High Technology Research and Development Program of China (No. 2009AA01Z261) and the State Key Laboratory of Wireless Telecommunication, Southeast University.

Received 25 September 2009, revised manuscript received 18 November 2009

© 2010 Chinese Institute of Electronics

<sup>†</sup> Corresponding author. Email: zlhong@fudan.edu.cn

### 2.2. TH circuit

The TH circuit is critical for achieving good dynamic performance over broadband input signals at gigahertz sampling rates. Figure 3 shows an open-loop TH circuit.

The switches are implemented using single NMOS transistors M1 and M2 without bootstrapping. Although bootstrapping would result in a lower and more constant on-resistance, it is not needed for the accuracy required here. A MIM capacitor $C_{\rm H}$ is used as the hold capacitor. In an open-loop TH circuit, the trade-off between the on-resistance of the switch and the hold capacitance limits the speed and accuracy. The TH acquisition time constant $\tau$ in the tracking mode is given as

$$\tau = R_{\rm on}C_{\rm H} = \frac{C_{\rm H}}{\mu_{\rm n}C_{\rm ox}(W/L)(V_{\rm GS} - V_{\rm TH})},\tag{1}$$

where $R_{\rm on}$ is the on-resistance of the switch. A larger bandwidth can be achieved by reducing the capacitance and the resistance. In the hold mode, the pedestal error $\Delta V_{\rm P}$ due to charge injection is given as

$$\Delta V_{\rm P} = \frac{Q_{\rm channel}}{2C_{\rm H}} = \frac{WLC_{\rm ox}(V_{\rm GS} - V_{\rm TH})}{2C_{\rm H}},\tag{2}$$

where $Q_{\text{channel}}$ is the charge stored in the MOS transistor channel. Equation (2) shows that the pedestal error can be reduced by increasing the capacitance and by using a smaller switch, that is, a larger on-resistance, which is the opposite of what the bandwidth requires. Based on the two equations above, a hold capacitor of 80 fF and a switch aspect ratio of 10 $\mu$m/0.13 $\mu$m are chosen for a $\Delta V_{\text{P}}$ of 80 mV.
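The trade-off between Eqs. (1) and (2) can be explored numerically with a minimal sketch such as the one below. The hold capacitance and switch dimensions are taken from the text; the values of $\mu_{\rm n}C_{\rm ox}$, $C_{\rm ox}$ and the overdrive voltage are not given in the paper and are assumed here purely for illustration, so the absolute numbers should not be read as the design values.

```python
import math

# Values taken from the text
C_H = 80e-15   # hold capacitor, 80 fF
W = 10e-6      # switch width, 10 um
L = 0.13e-6    # switch length, 0.13 um

# ASSUMED process and bias values (not given in the paper)
MU_N_COX = 300e-6  # mu_n * C_ox in A/V^2
C_OX = 15e-3       # gate oxide capacitance in F/m^2 (15 fF/um^2)
V_OV = 0.5         # overdrive V_GS - V_TH in volts


def tau(c_h, w, l, mu_n_cox=MU_N_COX, v_ov=V_OV):
    """Eq. (1): tracking time constant R_on * C_H."""
    r_on = 1.0 / (mu_n_cox * (w / l) * v_ov)
    return r_on * c_h


def pedestal(c_h, w, l, c_ox=C_OX, v_ov=V_OV):
    """Eq. (2): pedestal error caused by half of the channel charge."""
    q_channel = w * l * c_ox * v_ov
    return q_channel / (2.0 * c_h)


if __name__ == "__main__":
    for c_h in (C_H, 2 * C_H):
        t = tau(c_h, W, L)
        bw = 1.0 / (2.0 * math.pi * t)  # -3 dB tracking bandwidth
        print(f"C_H={c_h * 1e15:.0f} fF  tau={t * 1e12:.2f} ps  "
              f"BW={bw / 1e9:.1f} GHz  pedestal={pedestal(c_h, W, L) * 1e3:.1f} mV")
```

Doubling $C_{\rm H}$ doubles $\tau$ and halves $\Delta V_{\rm P}$. Multiplying Eqs. (1) and (2) gives $\tau\,\Delta V_{\rm P} = L^2/(2\mu_{\rm n})$, which is independent of $C_{\rm H}$, $W$ and the overdrive, so within this simple model the product can only be improved by a shorter channel.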

When the switches are turned off, the charge $Q_{\text{channel}}$ must be released. The half of $Q_{\text{channel}}$ that flows onto $C_{\text{H}}$ causes charge injection. Moreover, the gate-drain capacitance of the switches causes clock feed-through. Dummy switches M7 and M8, which are half the size of the main switches and are driven by Clkb, the complement of Clk, absorb the released charge and cancel the influence of Clk on $C_{\rm H}$, so the two effects above are compensated<sup>[11]</sup>. Source followers built from sufficiently large PMOS devices M3 and M4 are used as output buffers to drive the subsequent 16-channel comparators and to keep the device offset smaller than half an LSB. Meanwhile, the outputs of small replica source followers M5 and M6 are used to bias the wells of the main source followers. This gives a linearity advantage over a source follower with a well-to-source connection<sup>[12]</sup>.

Fig. 4. HD3 and ENOB of TH circuit.

Fig. 5. Timing sequence of clocks in the proposed sampler.

Figure 4 shows the post-simulation results for the HD3 (third-harmonic distortion) and ENOB (effective number of bits) of the TH circuit. The ENOB is close to 10.3 bits when acquiring samples of a sinusoid with 200 mV amplitude at a 3 GS/s sampling rate, and the TH circuit delivers samples to the comparators with an HD3 of about -63.6 dBc. Simulation results also indicate that the ENOB of the TH circuit remains above 8 bits even when the amplitude of the input sinusoid increases to 400 mV.
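The two simulated figures are consistent with each other, as a short check suggests. The sketch below uses the standard relation ENOB = (SINAD − 1.76 dB)/6.02 dB and assumes, as a simplification not stated in the paper, that the third harmonic is the only significant error term, so that SINAD is approximately equal to the magnitude of HD3.

```python
def enob_from_sinad(sinad_db):
    """Standard relation: ENOB = (SINAD - 1.76 dB) / 6.02 dB per bit."""
    return (sinad_db - 1.76) / 6.02


if __name__ == "__main__":
    hd3_dbc = -63.6  # simulated HD3 quoted in the text
    # ASSUMPTION: HD3 dominates noise and all other distortion terms.
    print(round(enob_from_sinad(-hd3_dbc), 2))  # prints 10.27
```

Under that assumption the result is about 10.3 bits, matching the simulated ENOB.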

### 2.3. Clock generator

The clock generator divides the input global clock into two kinds of 16-phase clocks. Figure 5 shows the timing sequence of these clocks. Clk is the global clock from the PLL, S01–S16 are the channel selection clocks with a duty cycle of less than 1/16, and C01–C16 are the channel clocks with a 1/2 duty cycle.

A ring counter consisting of positive-edge-triggered D flip-flops (DFFs) is used as the core component of the clock generator. To operate at gigahertz frequencies with low power consumption, the DFF is implemented in true single-phase clock (TSPC) logic<sup>[13]</sup>. The DFF can easily provide a set or reset function by adding a NAND or NOR gate after its output. The ring counter is triggered by Clk, and its output waveforms are Q01–Q16 in Fig. 5.

Fig. 6. Block diagram of clock generator.

Fig. 7. Proposed comparator circuit.

To obtain the desired channel selection clocks, for example S01, the global clock Clk is combined with Q01 in a NAND gate, which guarantees that the gate output S01 has a duty cycle of less than 1/16. The NAND gate in this design contains delay circuits for Q01 to make sure that only one positive pulse of Clk falls within the interval when Q01 is active. The transistor sizes in these circuits need to be carefully adjusted through simulation to tolerate process and temperature variations. To save power, the channel clocks are also generated from the ring counter outputs Q01–Q16. As shown in Fig. 6, the output of an OR gate whose inputs are Q02 and Q10 is used as the clock signal of a standard positive-edge-triggered DFF. The output terminals of this DFF provide two phases of the channel clock, C04 and C12, each with a 1/2 duty cycle.

Because both kinds of multi-phase clocks are generated from the same ring counter, and every phase of a given kind is created by an identical circuit, the required timing relationship among all clocks is satisfied.
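The clock generation scheme described in this section can be mimicked with an idealized, zero-delay behavioral model. The following Python sketch is not the authors' circuit: it assumes a one-hot ring counter, models each selection clock as active-high (Clk AND Qk, ignoring the NAND polarity and the delay cells), and assumes the DFF clocked by the OR gate is connected as a toggle stage whose true and complementary outputs serve as the C04/C12 pair.

```python
N = 16  # number of clock phases


def simulate(n_cycles):
    """Return waveforms sampled every half Clk period (Clk is high on even steps)."""
    q = [1] + [0] * (N - 1)  # one-hot ring counter state, Q01 active first
    c04, prev_or = 0, 0      # toggle DFF state (ASSUMED wiring) and last OR output
    waves = {"clk": [], "q": [], "s": [], "c04": [], "c12": []}
    for step in range(2 * n_cycles):
        clk = 1 - step % 2
        if clk and step > 0:
            q = [q[-1]] + q[:-1]  # rising Clk edge: move the single '1' one stage on
        or_out = q[1] | q[9]      # OR of Q02 and Q10
        if or_out and not prev_or:
            c04 ^= 1              # rising edge of the OR output toggles the DFF
        prev_or = or_out
        waves["clk"].append(clk)
        waves["q"].append(list(q))
        waves["s"].append([clk & bit for bit in q])  # selection clocks S01..S16
        waves["c04"].append(c04)
        waves["c12"].append(1 - c04)
    return waves


def duty(wave):
    """Fraction of time steps during which the waveform is high."""
    return sum(wave) / len(wave)


if __name__ == "__main__":
    w = simulate(32)
    s01 = [s[0] for s in w["s"]]
    print("S01 duty:", duty(s01), " C04 duty:", duty(w["c04"]))
```

In this ideal model every selection clock is high for half a Clk period out of sixteen periods, the selection clocks never overlap, and the OR output rises once every eight Clk periods, so the toggle stage produces a 1/2 duty cycle at one sixteenth of the Clk frequency.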

### 2.4. Comparator

The comparator circuit is composed of a differential amplifier with offset cancellation followed by a sense-amplifier based latch<sup>[14]</sup> and a driver, as shown in Fig. 7.

The preamplifier provides both sufficient gain to compensate for the relatively high input-referred offset voltage of the latch $V_{\rm OL}$ and isolation from latch kickback noise. To reduce the preamplifier's contribution to the comparator offset, output offset storage (OOS) is used<sup>[15]</sup>. In this technique, the preamplifier is auto-zeroed by storing the offset voltage on its output capacitor $C_{\rm C}$. The auto-zero process not only cancels the preamplifier's offset, but also significantly reduces the preamplifier's 1/f noise.



Fig. 8. Die microphotograph.

The latch offset $V_{\rm OL}$ is determined by the matching of the latch's input transistors. Referred to the comparator input, it should not exceed 0.5 LSB, so the gain of the preamplifier $A_{\rm V}$ must be at least

$$A_{\rm V} = \frac{2^{b+1} V_{\rm OL}}{V_{\rm FS}},\tag{3}$$

where $V_{\rm FS}$ is the minimum full-scale input voltage and *b* is the quantization accuracy of the sampler in bits. When $V_{\rm FS}$ and $V_{\rm OL}$ equal 18 mV and 15 mV, respectively, $A_{\rm V}$ is about 3.33 (10.5 dB). Because this gain is low, common-mode feedback is not required in the preamplifier.
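The quoted gain can be reproduced directly from Eq. (3). In the sketch below, $V_{\rm OL}$ and $V_{\rm FS}$ are the values from the text, while $b = 1$ is inferred from the quoted result rather than stated at this point in the paper; the function name is illustrative.

```python
import math


def min_preamp_gain(v_ol, v_fs, bits):
    """Eq. (3): A_V = 2^(b+1) * V_OL / V_FS."""
    return 2 ** (bits + 1) * v_ol / v_fs


if __name__ == "__main__":
    # bits=1 is an inference: it is the value that yields the quoted 3.33.
    a_v = min_preamp_gain(v_ol=15e-3, v_fs=18e-3, bits=1)
    print(round(a_v, 2), round(20 * math.log10(a_v), 1))  # prints 3.33 10.5
```

Each additional bit of quantization accuracy doubles the required gain for the same latch offset and full-scale voltage.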

Because the offset of the preamplifier is completely canceled, the residual offset results only from the charge injection mismatch between the two switches s1 and s2 and from the latch offset referred to the input. As a result, the comparator input-referred offset $V_{\rm OS}$ is

$$V_{\rm OS} = \frac{\Delta Q}{A_{\rm V}C_{\rm C}} + \frac{V_{\rm OL}}{A_{\rm V}},\tag{4}$$

where  $\Delta Q$  is the charge injection mismatch of switches s1 and s2.
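A small numerical sketch of Eq. (4) shows how the two terms compare. $V_{\rm OL}$ = 15 mV and $A_{\rm V}$ = 3.33 are taken from the text; the values of $\Delta Q$ and $C_{\rm C}$ are not reported in the paper and are hypothetical placeholders chosen only to make the example run.

```python
def input_referred_offset(delta_q, c_c, v_ol, a_v):
    """Eq. (4): V_OS = dQ / (A_V * C_C) + V_OL / A_V."""
    return delta_q / (a_v * c_c) + v_ol / a_v


if __name__ == "__main__":
    V_OL = 15e-3   # latch offset from the text
    A_V = 10 / 3   # preamplifier gain from Eq. (3), about 3.33
    # HYPOTHETICAL values, not from the paper:
    DELTA_Q = 0.1e-15  # charge injection mismatch, 0.1 fC
    C_C = 100e-15      # offset storage capacitor, 100 fF

    latch_term = input_referred_offset(0.0, C_C, V_OL, A_V)
    total = input_referred_offset(DELTA_Q, C_C, V_OL, A_V)
    print(f"latch term: {latch_term * 1e3:.2f} mV, total: {total * 1e3:.2f} mV")
```

With the minimum gain from Eq. (3), the latch term alone equals $V_{\rm FS}/2^{b+1}$ = 4.5 mV, that is, 0.5 LSB for $b = 1$, so any charge injection mismatch adds directly on top of that budget.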

## 3. Experimental results

The proposed sampler is implemented in SMIC 0.13 $\mu$m 1P8M CMOS technology. The die photomicrograph is shown in Fig. 8. To keep the test scheme feasible with the available test equipment, the chip contains all blocks except 15 of the channel comparators. The core area is $0.66 \times 0.38 \text{ mm}^2$, and the total die area including pads is $1.4 \times 0.97 \text{ mm}^2$.

The chip is directly bonded to a 4-layer FR-4 (flame retardant 4, a type of material used for making a PCB) substrate for measurement, as shown in Fig. 9. In the PCB design, the reflection parameter ($S_{11}$) is the only loss coefficient that was considered. Other loss factors whose values are hard to estimate, such as the parasitics of the PCB traces, the discrete devices of the differential matching network, the bondwires and the pads, were all neglected. In practice, these parasitics cause significant degradation in sampler performance.

Fig. 9. Test PCB.

Table 1. Performance summary.

| Parameter                 | Value                           |
|---------------------------|---------------------------------|
| Input data frequency      | 1.6–2.1 GHz                     |
| Sampling rate (max)       | 3 GS/s                          |
| Power (without CK buffer) | 27 mW @ 1.7 GHz input frequency |
| Supply voltage            | 1.2 V                           |
| Technology                | 0.13 $\mu$m CMOS                |
| Die area                  | $1.4 \times 0.97 \text{ mm}^2$  |

Fig. 10. Measured sampler output and channel selection clock wave.

Table 1 summarizes the measured performance of the sampler. The measurement results reported in this paper include all the PCB-related parasitics, and all pads are ESD protected. The TH circuit and the clock generator are the dominant sources of power consumption. One channel comparator, by contrast, consumes an average current of only 340 $\mu$A. Thus, when the other 15 channels are included, the total power consumption will still be comparable with 27 mW.
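The extrapolation to all 16 channels can be estimated with simple arithmetic. The sketch below uses the measured 27 mW, the 340 $\mu$A comparator current and the 1.2 V supply from the text; it assumes, which the paper does not state explicitly, that the comparators draw their current from the same 1.2 V supply and that the measured 27 mW already includes the one on-chip comparator.

```python
def estimated_total_power(measured_w, comparator_current_a, supply_v, extra_channels):
    """Measured power plus the estimated power of the comparators not on the test chip."""
    per_comparator_w = comparator_current_a * supply_v
    return measured_w + extra_channels * per_comparator_w


if __name__ == "__main__":
    # ASSUMPTION: comparators run from the 1.2 V supply; 27 mW includes one comparator.
    total = estimated_total_power(27e-3, 340e-6, 1.2, 15)
    print(f"{total * 1e3:.2f} mW")  # prints 33.12 mW
```

Under these assumptions each comparator adds about 0.41 mW, so the 15 missing channels would add roughly 6 mW to the measured figure.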

The measured time-domain waveform is illustrated in Fig. 10 for a 1.7 GHz input data frequency at a 2.97 GS/s sampling rate. Even when the input clock frequency is higher than 4 GHz, the measured channel selection clock still keeps the correct timing relationship with the input clock, as shown in Fig. 11.
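Because the 1.7 GHz input lies above half of the 2.97 GS/s sampling rate, the sampled data appear at a folded (aliased) frequency. The following sketch applies the standard frequency-folding relation for an ideal sampler; it is an illustration of sub-sampling in general rather than a description of the measurement setup.

```python
def alias_frequency(f_in_hz, f_s_hz):
    """Apparent frequency of a tone after ideal sampling at f_s (folded into 0..f_s/2)."""
    remainder = f_in_hz % f_s_hz
    return min(remainder, f_s_hz - remainder)


if __name__ == "__main__":
    # Measurement condition quoted in the text: 1.7 GHz input, 2.97 GS/s sampling.
    print(alias_frequency(1.7e9, 2.97e9) / 1e9, "GHz")
```

For the quoted measurement condition the tone folds to 2.97 GHz − 1.7 GHz = 1.27 GHz.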

Fig. 11. Measured input and channel selection clock wave.

## 4. Conclusion

A 3 GS/s sampler for a sub-sampling IR-UWB receiver is implemented in a 0.13 $\mu$m 1P8M CMOS technology. The sampler uses a time-interleaved topology with three main blocks: a single TH circuit, a full-custom clock generator, and an offset-cancelled comparator. These blocks are discussed and analyzed in the paper. The sampler dissipates 27 mW from a 1.2 V supply at a 1.7 GHz input data frequency. Based on this architecture, the quantization accuracy can be easily improved.

## References

- [1] First report and order. Federal Communications Commission Std. FCC, 02-48, 2002
- [2] O'Donnell I D, Brodersen R W. An ultra-wideband transceiver architecture for low power, low rate, wireless systems. IEEE Trans Vehicular Technology, 2005, 54(5): 1623
- [3] Choi J D, Stark W E. Performance of ultra-wideband communications with suboptimal receivers in multipath channels. IEEE J Selected Areas in Communications, 2002, 20(9): 1754
- [4] Dang Q H, Trindade A, van der Veen A J, et al. Signal model and receiver algorithms for a transmit-reference ultra-wideband communication system. IEEE J Selected Areas in Communications, 2006, 24(4): 773
- [5] Chen S M, Brodersen R W. A 6-bit 600-MS/s 5.3-mW asynchronous ADC in 0.13-μm CMOS. IEEE J Solid-State Circuits, 2006, 41(12): 2669
- [6] Chen S M, Brodersen R W. A subsampling radio architecture for ultrawideband communications. IEEE Trans Signal Processing, 2007, 55(10): 5018
- [7] Hoyos S, Sadler B, Arce G. Monobit digital receivers for ultrawideband communications. IEEE Trans Wireless Communications, 2005, 4(4): 1337
- [8] O'Donnell I D, Brodersen R W. A 2.3mW baseband impulse-UWB transceiver front-end in CMOS. Symposium on VLSI Circuits Dig Tech Papers, 2006: 200
- [9] O'Donnell I D, Brodersen R W. A flexible, low power, DC-1GHz impulse-UWB transceiver front-end. IEEE International Conference on Ultra-Wideband, 2006: 275
- [10] Choi M, Abidi A A. A 6-b 1.3-Gsample/s A/D converter in 0.35μm CMOS. IEEE J Solid-State Circuits, 2001, 36(12): 1847
- [11] van de Plassche R. CMOS integrated analog-to-digital and digital-to-analog converters. Kluwer Academic Publishers, 2003
- [12] Jiang X C, Chang M F. A 1-GHz signal bandwidth 6 bit CMOS ADC with power-efficient averaging. IEEE J Solid-State Circuits, 2005, 40(2): 532
- [13] Huang Q T, Rogenmoser R. Speed optimization of edge-triggered CMOS circuits for gigahertz single-phase clocks. IEEE J Solid-State Circuits, 1996, 31(3): 456
- [14] Heo S, Krashinsky R, Asanovic K. Activity-sensitive flip-flop and latch selection for reduced energy. IEEE Trans VLSI Syst, 2007, 15(9): 1060
- [15] Razavi B. Design of analog CMOS integrated circuits. McGraw-Hill Press, 2001