# VLSI Implementation of a Single-Chip DVB-C Demodulator \*

Tian Junhua<sup>1</sup>, Shen Bo<sup>1</sup>, Su Jianing<sup>1</sup>, Li Zheng<sup>1</sup>, Li Jian<sup>1</sup>, Guo Yawei<sup>2</sup>, and Zhang Qianling<sup>1</sup>

(1 State Key Laboratory of ASIC & System, Fudan University, Shanghai 200433, China)
(2 Shanghai MicroScience Integrated Circuits Co., Ltd, Shanghai 200433, China)

**Abstract:** A single-chip DVB-C quadrature amplitude modulation (QAM) demodulator is proposed, which integrates a 3.3V 10bit 40MSPS analog-to-digital converter (ADC) and a forward error correction (FEC) decoder. The demodulator chip supports $4 \sim 256$ QAM with a variable bit rate of up to 80Mbps. It features a wide carrier offset acquisition range, an optimized demodulation algorithm, and a small circuit area. The chip is implemented in SMIC 0.25µm 1P5M mixed-signal CMOS technology with a die size of 3.5mm × 3.5mm. The maximum power consumption is 447mW.

**Key words:** QAM demodulator; VLSI implementation; carrier recovery; blind equalization

**EEACC:** 1250; 1280; 2570D

**CLC number:** TN432 **Document code:** A **Article ID:** 0253-4177(2005)07-1309-08

## 1 Introduction

Quadrature amplitude modulation (QAM) demodulators are widely used in the physical layer interface of digital cable TV and cable modems<sup>[1,2]</sup>, as well as many other high-speed digital communication transceivers.

According to the DVB-C and ITU-T J.83-A specifications<sup>[1]</sup>, one HDTV program or $2 \sim 4$ SDTV programs can be transmitted in a single 8MHz/6MHz channel with QAM modulation. For such wideband applications, implementing the demodulation algorithm on general-purpose DSPs is impractical because of their insufficient performance and high cost. Therefore, a dedicated ASIC implementation is the right choice for digital cable TV demodulators.

Several ASIC implementations of QAM demodulators have been proposed<sup>[3~9]</sup>. A QAM receiver integrating a 10bit ADC and an FEC decoder was presented in Ref. [3], but it had several shortcomings: it could not directly sample a 36MHz/44MHz IF signal, its sampling rate was fixed at 4× the IF frequency, and its carrier frequency offset acquisition range was small. Reference [4] introduced a QAM demodulator with a carrier frequency offset range of ±80kHz, but it had a relatively large area and a lower integration level. Reference [5] proposed a highly integrated QAM demodulator, but the architecture of its carrier recovery loop suggests that its carrier acquisition range is limited. Reference [6] introduced a VLSI architecture for a blind QAM demodulator that uses a conventional four-corner carrier recovery algorithm. Reference [7] proposed a 64/256QAM receiver with a symbol rate of up to 8MBaud, but its implementation loss was large. Reference [8] targeted low power dissipation but required a complex architecture with two additional digital-to-analog converters (DACs). Reference [9] proposed a low-IF QAM transceiver, but it supports only a single mode.

Received 16 January 2005, revised manuscript received 10 March 2005

<sup>\*</sup> Project supported by the Shanghai Municipality IC Design Innovation Project (No. 047062008)

Tian Junhua male, was born in 1978, PhD candidate. His research interest focuses on VLSI design for communication.

Shen Bo male, was born in 1975, PhD. His research interests include ASIC design and SOC design.

Zhang Qianling female, was born in 1936, professor. Her research interests include ASIC design and signal processing.

©2005 Chinese Institute of Electronics

In this paper, we propose a monolithic multimode QAM demodulator with an integrated direct IF sampling ADC and FEC decoder. Compared with existing QAM receivers, our demodulator offers a wider carrier frequency offset acquisition range, a robust blind demodulation algorithm, and a smaller circuit area.

## 2 Demodulator architecture

### 2.1 Principles of QAM demodulation

Figure 1 shows the block diagram of the proposed QAM demodulator.



Fig. 1 Block diagram of the proposed QAM demodulator

The modulated QAM RF signal is down-converted to a 36/44MHz IF by a TV tuner, and the IF signal is then filtered by a band-pass surface acoustic wave (SAW) filter to remove out-of-band noise and interference. The output of the SAW filter is amplified by a voltage-controlled variable gain amplifier (VGA) so that the full resolution of the ADC is used. The tuner, SAW filter, and VGA are discrete components in the analog front end and are not integrated into the QAM demodulator.

A 10bit ADC directly samples the VGA output at a clock frequency of 28MHz or 36MHz, depending on the IF frequency. The signal is then down-converted to baseband by a digital mixer<sup>[10]</sup> and filtered by a pair of low-pass image rejection filters. A timing recovery loop and a fully digital interpolator recover the correct sampling frequency and phase of the QAM symbols. Joint carrier recovery and blind equalization then remove the carrier frequency offset and the impairments introduced by the channel.
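The text gives the two IFs (36/44MHz) and the two ADC clocks (28/36MHz) but not which goes with which. Assuming a 36MHz IF is sampled at 28MHz and a 44MHz IF at 36MHz, bandpass undersampling places both at the same 8MHz digital IF, which the digital mixer then shifts to 0Hz. The Python sketch below checks that arithmetic and models an ideal complex mixer. `alias_frequency` and `digital_mix` are illustrative names, and the DDFS-based mixer of Ref. [10] is not modeled.

```python
import cmath
import math


def alias_frequency(f_if: float, f_s: float) -> float:
    """Frequency in [0, f_s/2] at which a real IF tone appears after sampling at f_s."""
    f = math.fmod(f_if, f_s)
    return f_s - f if f > f_s / 2 else f


def digital_mix(samples, f_c: float, f_s: float):
    """Multiply real ADC samples by exp(-j*2*pi*f_c*n/f_s), moving f_c to 0 Hz.
    The mirror component at -2*f_c (modulo f_s) is left for the image rejection filters."""
    return [x * cmath.exp(-2j * math.pi * f_c * n / f_s) for n, x in enumerate(samples)]


# Assumed IF / ADC-clock pairing (not stated explicitly in the text).
for f_if, f_s in [(36e6, 28e6), (44e6, 36e6)]:
    f_dig = alias_frequency(f_if, f_s)
    print(f"IF {f_if / 1e6:.0f} MHz @ fs {f_s / 1e6:.0f} MHz -> {f_dig / 1e6:.0f} MHz")
```

Choosing the ADC clock so that both IFs alias to the same digital IF would let one mixer and filter design serve both tuner standards; this is an inference from the numbers, not a statement from the paper.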

The output of the equalizer is sent to a DVB-compliant de-interleaver to improve robustness against impulse and burst noise. A (204,188) Reed-Solomon (RS) decoder then corrects up to 8 byte errors in each 204-byte MPEG packet.
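The 8-byte figure follows from $t = (n - k)/2$ for an RS code with $n - k$ parity bytes. In DVB-C, RS(204,188) is a shortened form of the RS(255,239) code over GF($2^8$): 51 zero bytes are implied in front of each 188-byte packet. A small Python helper (a hypothetical name, for illustration only) makes the arithmetic explicit:

```python
def rs_parameters(n: int, k: int, n_mother: int = 255) -> dict:
    """Basic parameters of a (possibly shortened) Reed-Solomon code over GF(2^8)."""
    parity = n - k
    return {
        "parity_bytes": parity,
        "t": parity // 2,                 # byte errors correctable per codeword
        "shortened_by": n_mother - n,     # zero bytes virtually prepended before encoding
        "mother_code": (n_mother, n_mother - parity),
    }


print(rs_parameters(204, 188))
```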

Finally, the MPEG transport stream (TS) data from the output of the QAM receiver is sent to an MPEG decoder to recover the video and audio signals.

### 2.2 Timing recovery

In the QAM demodulator, there are four mutually correlated control loops: automatic gain control (AGC), timing recovery, carrier recovery, and equalization. All of them must lock to their appropriate states before correct demodulation is possible, and making these loops work properly under various channel distortions is a major issue in QAM demodulator design. Prior to carrier recovery and equalization, timing in the demodulator must be synchronized to the received symbols by the timing recovery loop shown in Fig. 2.



Fig. 2 Timing recovery loop

A polynomial interpolator that supports variable baud rates adjusts the sampling instant under the control of a numerically controlled oscillator (NCO). The timing-recovered data stream at the output of the interpolator is then filtered by the matched filter, which is implemented with a canonic signed digit (CSD) architecture. The timing recovery loop consists of a timing error detector, a loop filter, and the NCO. The timing error detector calculates the timing error from the output of the matched filter, and the NCO uses the timing error filtered by the loop filter to produce the variable baud rate clock. Once the timing recovery loop has locked, the received symbols are sampled with the correct frequency and phase.
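The text does not say which polynomial the interpolator uses or how its NCO is built. A common choice, assumed here, is a cubic Lagrange polynomial evaluated in Farrow (Horner) form, driven by a simplified accumulator that plays the NCO role: its integer part selects the base sample and its fractional part is the interpolation phase $\mu$. The same sketch shows how a quantized matched-filter coefficient can be rewritten in CSD form, which minimizes the number of non-zero digits and hence the adders a constant multiplier needs. All function names are hypothetical.

```python
import math


def farrow_cubic(a: float, b: float, c: float, d: float, mu: float) -> float:
    """Cubic Lagrange interpolation between b (t=0) and c (t=1) from samples at
    t = -1, 0, 1, 2, evaluated in Horner (Farrow) form for fractional phase mu."""
    c0 = b
    c1 = -a / 3 - b / 2 + c - d / 6
    c2 = a / 2 - b + c / 2
    c3 = -a / 6 + b / 2 - c / 2 + d / 6
    return ((c3 * mu + c2) * mu + c1) * mu + c0


def resample(x, step: float, start: float = 1.0):
    """Simplified interpolation control: advance `step` input samples per output."""
    out = []
    t = start
    while math.floor(t) + 2 < len(x):
        m = math.floor(t)          # base sample index
        mu = t - m                 # fractional interpolation phase
        out.append(farrow_cubic(x[m - 1], x[m], x[m + 1], x[m + 2], mu))
        t += step
    return out


def to_csd(value: int) -> list:
    """CSD digits of an integer, least significant first, each in {-1, 0, 1},
    with no two adjacent non-zero digits."""
    digits = []
    while value != 0:
        if value & 1:
            d = 2 - (value & 3)    # +1 if value % 4 == 1, -1 if value % 4 == 3
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


print(to_csd(59))   # 59 = 64 - 4 - 1: three non-zero digits instead of five
```

Because cubic Lagrange interpolation reproduces any cubic exactly, resampling samples of a cubic polynomial is a convenient self-check.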

### 2.3 Joint carrier recovery and blind equalization

The AGC and timing recovery loops work independently in the first step of demodulation; however, carrier recovery and blind equalization must cooperate because channel distortion and carrier frequency offset affect each other.

Figure 3 shows the joint architecture of carrier recovery and blind equalization in detail. The blind equalizer consists of a feed-forward equalizer (FFE) and a decision feedback equalizer (DFE), and is coupled with the carrier recovery loop. The carrier recovery loop consists of an optimal frequency detector (FD), a phase detector (PD), a loop filter, and a digitally controlled oscillator (DCO).



Fig. 3 Joint carrier recovery and blind equalization

The optimal FD takes the soft decision s(n) of the equalizer as its input to improve frequency offset acquisition; this is the key technique of the joint architecture. Without the optimal FD, a carrier recovery loop using the conventional decision-directed (DD) algorithm<sup>[3,11]</sup> is sensitive to channel impairments. The optimal FD can acquire a large frequency offset quickly, even when the signal-to-noise ratio (SNR) is relatively low. However, the FD alone cannot achieve low steady-state jitter. Therefore, to lower the steady-state jitter and eliminate the phase offset, the carrier recovery loop switches to phase detecting mode once the frequency offset is locked. The phase detecting mode uses both s(n) and the hard decision h(n) to track the phase offset with a conventional DD algorithm.
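The optimal FD is not described in enough detail to sketch, but the conventional DD phase detector used in phase detecting mode can be. The Python sketch below assumes 16QAM on a $\pm1, \pm3$ grid, a residual phase offset with no remaining frequency offset (already removed by the FD), and a first-order loop in which a single proportional gain stands in for the loop filter and DCO. The detector output $\mathrm{Im}\{s(n)h^*(n)\}/|h(n)|^2$ equals the sine of the phase error whenever the hard decision is correct.

```python
import cmath
import random

LEVELS = (-3, -1, 1, 3)


def hard_decision(s: complex) -> complex:
    """h(n): nearest 16QAM point, sliced independently on I and Q."""
    def nearest(v: float) -> int:
        return min(LEVELS, key=lambda lv: abs(v - lv))
    return complex(nearest(s.real), nearest(s.imag))


def dd_phase_error(s: complex, h: complex) -> float:
    """Im(s * conj(h)) / |h|^2, equal to sin(phase error) when h is correct."""
    return (s * h.conjugate()).imag / abs(h) ** 2


def track_phase(symbols, phase_offset: float, gain: float = 0.05) -> float:
    phi = 0.0                                    # phase estimate (DCO output)
    rotation = cmath.exp(1j * phase_offset)      # residual carrier phase offset
    for a in symbols:
        s = a * rotation * cmath.exp(-1j * phi)  # soft decision s(n) after de-rotation
        h = hard_decision(s)
        phi += gain * dd_phase_error(s, h)       # first-order loop update
    return phi


random.seed(1)
symbols = [complex(random.choice(LEVELS), random.choice(LEVELS)) for _ in range(2000)]
print(round(track_phase(symbols, 0.2), 4))
```

The sketch also shows why a DD loop needs help from the FD: it only converges when most hard decisions are already correct, which fails once the constellation is spinning.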

Additionally, to extend the acquisition range of the frequency offset, we incorporate a frequency sweeping control unit in the carrier recovery loop. When the FD fails to lock within a certain time, the sweeping control unit periodically adjusts the frequency of the digital mixer until the FD locks. With the aid of frequency sweeping, our carrier recovery loop outperforms those in Refs. [11,12]. For typical DVB-C applications at 6.875MBaud, our demodulator can acquire a frequency offset of up to $\pm 18\%$ of the baud rate, i.e., more than 1MHz.
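In absolute terms, $\pm 18\%$ of 6.875MBaud is $\pm 1.2375$MHz, and at the 10MBaud maximum it is $\pm 1.8$MHz, the figure listed in Table 2. The sweep step and ordering are not given in the text, so the zig-zag schedule below (0, $+\Delta$, $-\Delta$, $+2\Delta$, ...) and its 3% step are purely hypothetical; they only illustrate how a sweep can cover the full range when the FD's own pull-in range is narrower.

```python
def acquisition_range_hz(baud: float, fraction: float = 0.18) -> float:
    """Carrier offset acquisition range quoted as a fraction of the baud rate."""
    return fraction * baud


def sweep_offsets(baud: float, step_fraction: float = 0.03, max_fraction: float = 0.18):
    """Hypothetical zig-zag schedule of digital-mixer offsets: 0, +d, -d, +2d, -2d, ...
    Step size and ordering are illustrative assumptions, not values from the paper."""
    step = step_fraction * baud
    limit = max_fraction * baud * (1 + 1e-12)   # tolerate floating-point rounding
    offsets = [0.0]
    k = 1
    while k * step <= limit:
        offsets += [k * step, -k * step]
        k += 1
    return offsets


for baud in (6.875e6, 10e6):
    print(f"{baud / 1e6} MBaud: +/-{acquisition_range_hz(baud) / 1e6:.4f} MHz")
```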

### 2.4 ISI cancellation and blind equalizer

In high-speed digital communication, intersymbol interference (ISI) introduced by the channel is the main cause of performance degradation. Because digital TV signals carry no training sequences, a blind equalizer is indispensable for reducing ISI.

A VLSI blind equalizer is built from an adaptive finite impulse response (FIR) or infinite impulse response (IIR) filter, in which every tap needs multipliers and coefficient-update logic, so the circuit area of such equalizers is very large. To design an area-efficient QAM receiver, it is therefore very important to optimize the equalizer.

The intrinsic frequency response of the cable channel, such as ripple and echoes, can be modeled as an FIR filter; we therefore adopt a blind equalizer with an 8-tap FFE and a 16-tap DFE to cancel ISI. The equalizer architecture, shown in Fig. 4, is implemented in transposed form. In a transposed filter the critical path is one multiplier and one adder regardless of the number of taps, which gives a constant critical path delay and good timing stability.
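The equivalence between direct and transposed form, and why the transposed version keeps a short critical path, can be seen in a small Python sketch. It uses fixed real coefficients (the adaptive update is omitted), and the function names are illustrative.

```python
def fir_direct(coeffs, xs):
    """Direct form: y[n] = sum_k c[k] * x[n-k]."""
    return [sum(c * xs[n - k] for k, c in enumerate(coeffs) if n >= k)
            for n in range(len(xs))]


def fir_transposed(coeffs, xs):
    """Transposed form: each new input is multiplied by every tap at once and the
    products are accumulated along a register chain, so each output needs only
    one multiply and one add after the registers, whatever the tap count."""
    regs = [0.0] * (len(coeffs) - 1)   # partial sums waiting for later outputs
    out = []
    for x in xs:
        y = coeffs[0] * x + (regs[0] if regs else 0.0)
        for i in range(len(regs) - 1):          # shift partial sums toward the output
            regs[i] = coeffs[i + 1] * x + regs[i + 1]
        if regs:
            regs[-1] = coeffs[-1] * x
        out.append(y)
    return out


taps = [0.5, -0.25, 0.125, 1.0]
signal = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0]
print(fir_transposed(taps, signal))
```

In the direct form, by contrast, the adder tree that sums all products grows with the number of taps.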



Fig. 4 Equalizer architecture

The FFE, shown in the upper part of Fig. 4, can be configured in either T-spaced or T/2-spaced mode via 3 additional MUXs. In T-spaced mode, the input and output rates are both 1/T; in T/2-spaced mode, the input and output rates are 2/T and 1/T, respectively. The MUXs keep the output rate at 1/T in both modes. This structure requires less area than a conventional fractionally spaced equalizer (FSE). The lower part of Fig. 4 is the DFE. The FFE and DFE cancel the pre-cursor and post-cursor ISI, respectively.
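How an FFE and a DFE combine into one output per symbol, with the FFE fed at either one or two samples per symbol, can be sketched behaviorally in Python. The indexing (DFE tap $k$ weights the decision made $k+1$ symbols earlier) and the 16QAM slicer are assumptions for illustration; the MUX arrangement of Fig. 4 is not modeled.

```python
LEVELS = (-3, -1, 1, 3)


def slice_16qam(y: complex) -> complex:
    """Nearest 16QAM point, sliced independently on I and Q."""
    def nearest(v: float) -> int:
        return min(LEVELS, key=lambda lv: abs(v - lv))
    return complex(nearest(y.real), nearest(y.imag))


def ffe_dfe(xs, ffe, dfe, half_spaced=False):
    """One output per symbol. T-spaced: one input per symbol.
    T/2-spaced: two inputs per symbol, still one output (rate 1/T)."""
    step = 2 if half_spaced else 1
    outputs, decisions = [], []
    for n in range(len(xs) // step):
        newest = n * step + step - 1            # index of the newest input for symbol n
        ff = sum(c * xs[newest - k] for k, c in enumerate(ffe) if newest - k >= 0)
        fb = sum(c * decisions[-1 - k] for k, c in enumerate(dfe) if k < len(decisions))
        y = ff - fb                             # FFE handles pre-cursor, DFE post-cursor ISI
        outputs.append(y)
        decisions.append(slice_16qam(y))
    return outputs, decisions
```

For example, with a channel that adds half of the previous symbol as post-cursor ISI, `ffe_dfe(x, [1.0], [0.5])` removes that ISI entirely through the DFE as long as past decisions are correct.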

The equalizer updates the tap coefficients $H_{\rm ffe}(0) \sim H_{\rm ffe}(7)$ and $H_{\rm dfe}(0) \sim H_{\rm dfe}(15)$ with a dual-mode scheme combining the constant modulus algorithm (CMA) and the least mean square (LMS) algorithm. CMA depends only on the modulus (power) of the QAM signal, so it is insensitive to the carrier frequency offset. Therefore, CMA is used before carrier recovery, when the QAM signal is severely distorted by the cable channel. However, CMA cannot fully compensate for the channel distortion, so LMS is used afterwards to converge to the final solution with low residual error once the frequency and phase offsets have been eliminated.
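A minimal Python sketch of the two update rules shows this division of labor. It assumes 16QAM, a flat channel with gain 0.5, and a single complex tap; the step sizes are illustrative. CMA drives $|y|^2$ toward the dispersion constant $R_2 = E|a|^4 / E|a|^2$ ($13.2$ for the $\pm1, \pm3$ grid) without using decisions, which brings the tap close to the correct gain of 2. Decision-directed LMS then removes the remaining error. In the chip the same kind of updates would apply to each of the 24 FFE/DFE taps.

```python
import random

LEVELS = (-3, -1, 1, 3)
CONSTELLATION = [complex(i, q) for i in LEVELS for q in LEVELS]

# CMA dispersion constant R2 = E|a|^4 / E|a|^2 (13.2 for this 16QAM grid)
R2 = (sum(abs(a) ** 4 for a in CONSTELLATION)
      / sum(abs(a) ** 2 for a in CONSTELLATION))


def slice_16qam(y: complex) -> complex:
    def nearest(v: float) -> int:
        return min(LEVELS, key=lambda lv: abs(v - lv))
    return complex(nearest(y.real), nearest(y.imag))


def cma_step(c: complex, x: complex, mu: float) -> complex:
    y = c * x
    err = y * (abs(y) ** 2 - R2)       # depends on |y| only, not on carrier phase
    return c - mu * err * x.conjugate()


def lms_step(c: complex, x: complex, mu: float) -> complex:
    y = c * x
    err = slice_16qam(y) - y           # decision-directed error
    return c + mu * err * x.conjugate()


random.seed(7)
received = [0.5 * random.choice(CONSTELLATION) for _ in range(40000)]  # flat gain 0.5

c = 1 + 0j
trace = []
for x in received[:20000]:             # CMA phase
    c = cma_step(c, x, 1e-4)
    trace.append(abs(c))
cma_gain = sum(trace[-2000:]) / 2000

for x in received[20000:]:             # LMS phase
    c = lms_step(c, x, 1e-2)

print(f"after CMA: |c| ~ {cma_gain:.2f}, after LMS: |c| = {abs(c):.6f}")
```

CMA's stochastic update keeps jittering around the optimum because 16QAM is not a constant-modulus constellation, which mirrors the large residual noise seen before the switch to LMS.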

### 2.5 Other channel impairments and solutions

Besides the carrier frequency offset and ISI described above, channel impairments include (1) AM and FM hum modulation, i.e., amplitude and frequency modulation caused by coupling of low-frequency AC power; (2) thermal noise, modeled as white Gaussian noise; and (3) impulse and burst noise, which depend on the surroundings.

Dual analog AGCs compensate for the amplitude attenuation of the cable plant response: one controls the tuner gain and the other controls the IF amplifier. Compared with the single AGCs<sup>[3~5]</sup>, dual AGCs provide a larger dynamic range and less distortion. AM hum modulation is compensated for by a digital AGC placed after the matched filter.
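The form of the digital AGC loop is not specified. A simple first-order power-tracking loop, assumed here, already shows why hum is easy to remove at this point: hum varies at power-line rates, far slower than the symbol-rate loop. The Python sketch uses QPSK so that the per-sample power is constant. The 0.5 attenuation, the 10% hum depth, and its period are illustrative numbers only.

```python
import math
import random


def digital_agc(samples, p_ref: float, mu: float = 0.01, g0: float = 1.0):
    """Hypothetical first-order AGC: raise the gain when the output power is
    below p_ref and lower it when the output power is above p_ref."""
    g = g0
    outputs, gains = [], []
    for x in samples:
        y = g * x
        outputs.append(y)
        g += mu * (p_ref - abs(y) ** 2)
        gains.append(g)
    return outputs, gains


random.seed(0)
qpsk = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(40000)]
# received amplitude: 0.5 attenuation plus a slow 10% AM hum ripple
rx = [0.5 * (1 + 0.1 * math.sin(2 * math.pi * n / 20000)) * s for n, s in enumerate(qpsk)]
out, gains = digital_agc(rx, p_ref=2.0)
```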

An RS decoder, together with a convolutional de-interleaver with an interleaving depth of 12, is used to resist white noise and burst noise.
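The DVB-C convolutional interleaver (EN 300 429) has $I = 12$ branches whose FIFO depths grow in steps of $M = 17$ bytes ($17 \times 12 = 204$), and the de-interleaver uses the complementary depths. A burst of corrupted bytes on the channel is therefore spread over several RS codewords. The Python sketch below is a behavioral model only (byte-level commutators, FIFOs preloaded with zeros, no sync-byte alignment). It confirms that the interleaver/de-interleaver pair is a pure delay of $I(I-1)M = 2244$ bytes.

```python
from collections import deque

I, M = 12, 17   # DVB-C: 12 branches, FIFO depth increment of 17 bytes


def _commutate(data, depths):
    """Route byte n to branch n % len(depths); each branch is a FIFO of the given depth."""
    fifos = [deque([0] * d) for d in depths]
    out = []
    for n, byte in enumerate(data):
        fifo = fifos[n % len(depths)]
        if not fifo:                  # zero-depth branch passes bytes straight through
            out.append(byte)
        else:
            fifo.append(byte)
            out.append(fifo.popleft())
    return out


def interleave(data):
    return _commutate(data, [j * M for j in range(I)])


def deinterleave(data):
    return _commutate(data, [(I - 1 - j) * M for j in range(I)])


DELAY = I * (I - 1) * M   # end-to-end delay in bytes
print(DELAY)
```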

## 3 Chip implementation methodology

### 3.1 Design flow

Designing a high-performance QAM demodulator chip is a challenging task. In particular, an all-digital QAM demodulator with an integrated 10bit ADC requires a mixed-signal design flow.

First, a detailed functional specification of the QAM demodulator is written, which defines the baud rate, the QAM constellation modes, the clock frequency, etc. Table 1 summarizes this specification.

Table 1 Functional specification of the demodulator

| Parameter        | Specification                         |
|------------------|---------------------------------------|
| QAM mode         | 4/16/32/64/128/256QAM                 |
| Baud rate        | 1~10MBaud                             |
| Carrier offset   | ±18% baud rate                        |
| Baud rate offset | ±400ppm                               |
| AGC              | Dual PWM output                       |
| ADC              | 10bit, 40MSPS                         |
| RS decoder       | (204,188)                             |
| Interleaver      | Convolutional interleaver, depth = 12 |
| Output interface | DVB common interface                  |
| Clock frequency  | 56/72MHz                              |
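The 80Mbps maximum quoted in the abstract follows from the 10MBaud upper limit of Table 1 and the 8 bits per symbol of 256QAM. A short Python calculation gives the gross bit rate for every mode and, assuming RS parity is the only framing overhead, the transport stream rate left after removing the RS parity bytes:

```python
import math

QAM_MODES = (4, 16, 32, 64, 128, 256)


def gross_bit_rate(baud: float, m: int) -> float:
    """Channel bit rate: baud rate times log2(M) bits per symbol."""
    return baud * math.log2(m)


def net_bit_rate(baud: float, m: int) -> float:
    """Bit rate after removing RS(204,188) parity (other overhead ignored)."""
    return gross_bit_rate(baud, m) * 188 / 204


for m in QAM_MODES:
    print(f"{m:>3}QAM @ 10 MBaud: {gross_bit_rate(10e6, m) / 1e6:.0f} Mbps gross, "
          f"{net_bit_rate(10e6, m) / 1e6:.2f} Mbps after RS")
```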

Second, we describe functional models of the QAM demodulator in C and Matlab. Both floating-point and fixed-point models are developed. Through extensive algorithm and architecture exploration, we arrive at a robust QAM demodulation algorithm suitable for chip implementation.

Third, we define the detailed VLSI architecture, including module partitioning, the system control interface, and the clock strategy. After evaluating the pros and cons of different architectures, the optimized chip architecture is validated.

Finally, we implement the chip according to the optimized architecture. The digital circuits are designed in Verilog HDL, while the analog circuits are designed at the transistor level with schematic entry. The digital and analog circuits are verified together with an advanced mixed-signal verification methodology.

### 3.2 Clock strategies

Clock strategies are very important for reliable chip design. We divide the demodulator into two clock domains: the system clock domain ($clk\_sys$) and the 4× baud rate clock domain ($clk\_b4$). $clk\_b4$ is derived from $clk\_sys$ by an NCO. The rising edge of $clk\_b4$ is synchronized with the falling edge of $clk\_sys$, which ensures reliable data transfer between the two clock domains.
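How an NCO can derive a 4× baud rate timing reference from the system clock is sketched below in Python. This is a generic phase-accumulator model, and the 32-bit width and function name are assumptions: each accumulator overflow produces one pulse, so the average pulse rate equals the requested frequency, while every pulse stays aligned to a $clk\_sys$ edge. That alignment is what makes the two domains synchronous. For a 6.875MBaud channel on the 56MHz system clock, the target is $4 \times 6.875 = 27.5$MHz.

```python
def nco_clock_enables(f_sys: float, f_out: float, n_cycles: int, acc_bits: int = 32):
    """Phase-accumulator NCO clocked at f_sys: every accumulator overflow produces
    one pulse, so the average pulse rate approximates f_out (requires f_out < f_sys)."""
    modulus = 1 << acc_bits
    step = round(f_out / f_sys * modulus)   # frequency control word
    acc = 0
    pulses = []
    for _ in range(n_cycles):
        acc += step
        if acc >= modulus:
            acc -= modulus
            pulses.append(1)
        else:
            pulses.append(0)
    return pulses


f_sys = 56e6                   # one of the two PLL output frequencies
f_b4 = 4 * 6.875e6             # 4x baud rate for a 6.875 MBaud channel
enables = nco_clock_enables(f_sys, f_b4, 56000)   # 1 ms worth of system-clock cycles
print(sum(enables))
```

Even at the 10MBaud maximum, $4 \times 10 = 40$MHz stays below both 56MHz and 72MHz, so at most one pulse is needed per system clock cycle.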

Compared with the clock strategies in Ref. [3], our method is more reliable and greatly facilitates the clock tree generation during logic synthesis and layout design.

The system clock is generated by an on-chip PLL with low phase noise (-120dBc/Hz @ 100kHz). The PLL is programmable to output a 56MHz or 72MHz clock according to different IF frequencies.

### 3.3 Area reduction techniques

Several area reduction techniques are adopted in this demodulator.

Because the in-phase and quadrature-phase branches of the QAM demodulator perform nearly identical functions, time interleaving is used to share the computation units between the two branches. This technique is adopted in modules such as the image rejection low-pass filters, the interpolator, and the matched filters.
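The idea behind interleaving I and Q through one datapath can be checked in Python. The sketch below is a behavioral model under simple assumptions: samples arrive as I0, Q0, I1, Q1, ..., one multiply-accumulate datapath (the coefficient set) serves both branches, and only the delay-line state is duplicated. The output must match two separate filters. How the chip schedules this in hardware is not described, and the function names are hypothetical.

```python
def fir(coeffs, xs):
    """Reference direct-form FIR filter for one branch."""
    return [sum(c * xs[n - k] for k, c in enumerate(coeffs) if n >= k)
            for n in range(len(xs))]


def shared_fir_iq(coeffs, i_samples, q_samples):
    """One shared coefficient path, two delay lines: I and Q alternate in time."""
    delay_lines = ([0.0] * len(coeffs), [0.0] * len(coeffs))     # separate state per branch
    stream = [s for pair in zip(i_samples, q_samples) for s in pair]  # I0, Q0, I1, Q1, ...
    outputs = ([], [])
    for n, x in enumerate(stream):
        branch = n % 2                  # even slots: I, odd slots: Q
        line = delay_lines[branch]
        line.insert(0, x)               # shift the new sample into this branch's delay line
        line.pop()
        outputs[branch].append(sum(c * v for c, v in zip(coeffs, line)))
    return outputs
```

The datapath must then run at twice the per-branch sample rate, which trades clock cycles for area.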

The complex equalizer occupies a large proportion of the chip area; therefore, we use a multiplier-sharing technique that reduces the number of multipliers from 4 per tap to only 1 per tap.
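A complex tap product $(c_r + jc_i)(x_r + jx_i)$ needs four real products, which is where "4 per tap" comes from. The text does not describe how the single multiplier is scheduled. A plausible reading, assumed here, is time multiplexing: the four products go through one multiplier on consecutive system clock cycles, which requires at least 4 clock cycles per symbol. At the 10MBaud maximum, the 56MHz clock gives 5.6 cycles per symbol and the 72MHz clock gives 7.2, assuming one complex product per tap per symbol. The Python sketch counts multiplier uses; the coefficient-update path is ignored.

```python
class SharedMultiplier:
    """Models one physical real multiplier and counts how often it is used."""

    def __init__(self) -> None:
        self.uses = 0

    def __call__(self, a: float, b: float) -> float:
        self.uses += 1
        return a * b


def complex_tap(mult: SharedMultiplier, coeff: complex, x: complex) -> complex:
    """One complex tap product as four sequential real products on the same
    multiplier (one product per clock cycle in a time-multiplexed datapath)."""
    rr = mult(coeff.real, x.real)
    ii = mult(coeff.imag, x.imag)
    ri = mult(coeff.real, x.imag)
    ir = mult(coeff.imag, x.real)
    return complex(rr - ii, ri + ir)


def cycles_per_symbol(f_clk: float, baud: float) -> float:
    return f_clk / baud


mult = SharedMultiplier()
print(complex_tap(mult, 0.5 - 0.25j, 3 + 1j), mult.uses)
```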

## 4 Experimental results

Figure 5 illustrates the locking process of 256QAM under severe channel distortion and frequency offset. Fig. 5(a) shows the constellation after CMA equalization; the QAM constellation cannot be identified because of the frequency offset. Fig. 5(b) shows the constellation after frequency offset recovery: the constellation no longer rotates, but a phase offset remains. After phase recovery, the QAM constellation is correctly positioned, as shown in Fig. 5(c). Owing to the nature of CMA, the residual noise is still very large. After switching from CMA to LMS mode, the SNR of the QAM constellation is greatly improved, as shown in Fig. 5(d).



Fig. 5 256QAM with frequency offset locked

Figure 6 shows the bit error rate (BER) performance before the FEC decoder. The measured BER is very close to the theoretical estimate. The implementation loss is less than 0.5dB for 256QAM at BER = $1 \times 10^{-4}$ (uncoded).
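The paper does not state how its theoretical curve was computed. One common estimate for Gray-coded square $M$-QAM in AWGN is the nearest-neighbour approximation $P_b \approx \frac{4}{\log_2 M}\left(1 - \frac{1}{\sqrt{M}}\right) Q\left(\sqrt{\frac{3E_s/N_0}{M-1}}\right)$. Implementation loss is then the extra $E_s/N_0$ the measured chip needs, relative to such a curve, to reach the same BER. The Python sketch evaluates the approximation and finds the theoretical $E_s/N_0$ for a target BER by bisection.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def ber_square_qam(m: int, es_n0_db: float) -> float:
    """Approximate BER of Gray-coded square M-QAM in AWGN (nearest-neighbour bound)."""
    es_n0 = 10 ** (es_n0_db / 10)
    bits = math.log2(m)
    return (4 / bits) * (1 - 1 / math.sqrt(m)) * q_function(math.sqrt(3 * es_n0 / (m - 1)))


def required_es_n0_db(m: int, target_ber: float, lo: float = 0.0, hi: float = 60.0) -> float:
    """Bisection on the (monotonically decreasing) BER curve."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if ber_square_qam(m, mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"256QAM, BER 1e-4: Es/N0 ~ {required_es_n0_db(256, 1e-4):.1f} dB (theory)")
```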



Fig. 6 BER performance of proposed QAM demodulator

The ASIC chip is fabricated in SMIC 0.25µm 1P5M mixed-signal CMOS technology. The core area of the chip is 3mm × 3mm and the die size is 3.5mm × 3.5mm. The chip contains about 6.4 × 10<sup>5</sup> (640K) transistors and is packaged in a 128-pin QFP. The supply voltages of the analog and digital circuits are 3.3V and 2.5V, respectively. The maximum power dissipation is 447mW at 6MBaud.

The die microphotograph of the chip is shown in Fig. 7. The ADC and PLL are located in the upper right corner of the chip; the remaining area is occupied by the digital logic circuits and memory units.



Fig. 7 QAM demodulator die microphotograph

We tested the chip in a digital cable TV system consisting of a transmitter, a receiver, and a cable channel. The transmitter comprises a DVD player, a DVB-C compliant encoder, and a QAM modulator. The receiver includes the analog front end (tuner, SAW filter, and IF amplifier), the proposed QAM demodulator, an MPEG decoder, and a TV monitor. In Fig. 8, the oscilloscope shows correct MPEG TS signals labeled Syn, Error, Valid, etc. The high-quality video (BER < $1 \times 10^{-12}$) shown in Fig. 9 further demonstrates the robustness of the proposed demodulation algorithm and its VLSI implementation.

Fig. 8 MPEG TS signals

Fig. 9 Test system for QAM demodulator

## 5 Conclusion

An area-efficient, high-performance $4 \sim 256$QAM demodulator chip has been designed with a maximum bit rate of 80Mbps and a frequency offset acquisition range of $\pm 18\%$ of the baud rate. The demodulator IC integrates high-precision analog functions such as a 10bit A/D converter and a PLL for on-chip clock generation referenced to an off-chip crystal. The QAM demodulator meets all the requirements of DVB-C/ITU-T J.83-A.

Table 2 compares our demodulator with other reported QAM demodulators. Our demodulator features a higher maximum baud rate, a wider carrier offset acquisition range, a higher integration level, and lower circuit complexity.

|                   | Tan<sup>[3]</sup> | Yamanaka<sup>[4]</sup> | D'Luna<sup>[5]</sup> | Zhang<sup>[6]</sup> | Shin<sup>[7]</sup> | Fukuoka<sup>[8]</sup> | Chang<sup>[9]</sup> | This work    |
|-------------------|-------------------|------------------------|----------------------|---------------------|--------------------|-----------------------|---------------------|--------------|
| QAM mode          | 4~1024            | 16/64/256              | 4~256                | 64                  | 64/256             | 4/16/32/64            | 64                  | 4~256        |
| Baud rate         | 1~7M              | < 8.25M                | 1~7M                 | 0.875~7M            | < 8M               | 7M                    | 5.38M               | 1~10M        |
| Carrier offset    | N/A               | ±80kHz                 | N/A                  | N/A                 | N/A                | ±200kHz               | ±100kHz             | ±1.8MHz      |
| Baud rate offset  | N/A               | N/A                    | N/A                  | N/A                 | N/A                | ±60ppm                | ±200ppm             | ±400ppm      |
| AGC               | 1 × PWM           | 1 × PWM                | 1 × PWM              | N/A                 | N/A                | 8-bit DAC             | N/A                 | 2 × PWM      |
| ADC               | 10bit             | -                      | 10bit                | -                   | 10bit              | 8bit                  | -                   | 10bit        |
| PLL               | Y                 | -                      | Y                    | -                   | -                  | -                     | -                   | Y            |
| RS decoder        | (204,188)         | -                      | (204,188)            | -                   | -                  | (204,188)             | -                   | (204,188)    |
| Interleaver       | Y                 | -                      | Y                    | -                   | -                  | Y                     | -                   | Y            |
| Technology        | 0.5µm             | 0.5µm                  | 0.35µm               | -                   | 0.35µm             | 0.35µm                | 0.35µm              | 0.25µm       |
| Chip area         | 7mm × 6.7mm       | 12.87mm × 12.49mm      | 8mm × 8mm            | -                   | -                  | N/A                   | 5.5mm × 5.5mm       | 3.5mm × 3.5mm |
| Transistors/gates | 650K              | 880K                   | 2.3M                 | -                   | 210K gates         | 640K                  | 280K gates          | 640K         |

Table 2 Comparison of existing QAM demodulators

### References

- [1] ETSI EN 300 429 V1.2.1, Digital Video Broadcasting (DVB); Framing structure, channel coding and modulation for cable systems, 1998
- [2] Data-Over-Cable Service Interface Specifications, Radio Frequency Interface Specification, SP-RFIv2.0-I05-040407, 2004
- [3] Tan L K, Putnam J S, Lu F, et al. A 70-Mb/s variable-rate 1024-QAM cable receiver IC with integrated 10-b ADC and FEC decoder. IEEE J Solid-State Circuits, 1998, 33(12):2205
- [4] Yamanaka K, Takeuchi S, Murakami S, et al. A multilevel QAM demodulator VLSI with wideband carrier recovery and dual equalizing mode. IEEE J Solid-State Circuits, 1997, 32(7):1101
- [5] D'Luna L J, Tan L K, Mueller D, et al. A single-chip universal cable set-top box/modem transceiver. IEEE J Solid-State Circuits, 1999, 34(11):1647
- [6] Zhang Yongxue, Fei Haidong, Yu Lixin, et al. Practical implementation of blind equalization, carrier recovery and timing recovery for QAM cable receiver chip. Proceedings of 5th International Conference on ASIC, Beijing, China, 2003:886

- [7] Shin D, Park K H, Sunwoo M H. A 64/256 QAM receiver chip for high-speed communications. Proceedings of 13th Annual IEEE International ASIC/ SOC Conference, Arlington, VA USA,2000:214
- [8] Fukuoka T, Nakai Y, Hayashi D, et al. An area effective 1-chip QAM LSI for digital CATV. IEEE Trans Consumer Electron, 1997, 43(3):649
- [9] Chang C C, Shiue M T, Wang C K. A hardware efficient 64-QAM low-IF transceiver baseband for broadband communications. IEEE AP-ASIC 2004, Fukuoka, Japan, 2004:252
- [10] Shen Bo, Zhang Qianling. A high performance DDFS suitable for digital video encoder. Chinese Journal of Semiconductors, 2001, 22(6):796 (in Chinese)
- [11] Kim K Y, Choi H J. Design of carrier recovery algorithm for high-order QAM with large frequency acquisition range. Proc ICC, Helsinki, Finland, 2001:1016
- [12] Yuan Ouyang, Wang Chinliang. A new carrier recovery loop for high-order quadrature amplitude modulation. Global Telecommunication Conference, Taipei, 2002:478

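# VLSI Implementation of a Single-Chip DVB-C Demodulator \*

Tian Junhua<sup>1</sup>, Shen Bo<sup>1</sup>, Su Jianing<sup>1</sup>, Li Zheng<sup>1</sup>, Li Jian<sup>1</sup>, Guo Yawei<sup>2</sup>, and Zhang Qianling<sup>1</sup>

(1 State Key Laboratory of ASIC & System, Fudan University, Shanghai 200433, China) (2 Shanghai MicroScience Integrated Circuits Co., Ltd, Shanghai 200433, China)

Abstract: A single-chip QAM demodulator for DVB-C is designed. A 3.3V 10bit 40MSPS analog-to-digital converter and an FEC decoder are integrated on chip. The chip supports multiple modes from 4 to 256QAM, with a maximum bit rate of 80Mbps and a wide carrier frequency offset acquisition range. With improved algorithms and VLSI architecture, its performance is stable and its area is optimized. It is fabricated in SMIC 0.25µm 1P5M mixed-signal CMOS technology, with an area of 3.5mm × 3.5mm and a maximum power consumption of 447mW.

Key words: QAM demodulator; VLSI implementation; carrier recovery; blind equalization EEACC: 1250; 1280; 2570D CLC number: TN432 Document code: A Article ID: 0253-4177(2005)07-1309-08

<sup>\*</sup> Project supported by the Shanghai Municipality IC Design Innovation Project (No. 047062008)

Tian Junhua male, was born in 1978, PhD candidate. His research focuses on algorithms and VLSI design of communication integrated circuits.

Shen Bo male, was born in 1975, PhD. His research focuses on ASIC and SOC chip design.

Zhang Qianling female, was born in 1936, professor and doctoral supervisor. Her research focuses on signal processing and ASIC design.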