# A direct digital frequency synthesizer with high-speed current-steering DAC\*

Yu Jinshan(余金山)<sup>1,2,3,†</sup>, Fu Dongbing(付东兵)<sup>1,2</sup>, Li Ruzhang(李儒章)<sup>1,2</sup>, Yao Yafeng(姚亚峰)<sup>1,2</sup>,

Yan Gang(严刚)<sup>1,2</sup>, Liu Jun(刘军)<sup>1,2</sup>, Zhang Ruitao(张瑞涛)<sup>1,2</sup>,

Yu Zhou(俞宙)<sup>1,2</sup>, and Li Tun(李暾)<sup>3</sup>

(1 National Laboratory of Analog IC's, Chongqing 400060, China)
(2 Sichuan Institute of Solid State Circuits, Chongqing 400060, China)
(3 School of Computer Science, National University of Defense Technology, Changsha 410073, China)

Abstract: A high-speed SiGe BiCMOS direct digital frequency synthesizer (DDS) is presented. The design integrates a high-speed digital DDS core, a high-speed differential current-steering 10-bit D/A converter, a serial/parallel interface, and clock control logic. The DDS is fabricated in a standard 0.35 $\mu$m SiGe BiCMOS process and operates at a system clock frequency of 1 GHz. Measured results show that the DDS can generate a frequency-agile analog output sine wave at up to 400+ MHz.

**Key words:** DDS; CORDIC; DAC; current steering **DOI:** 10.1088/1674-4926/30/10/105006 **EEACC:** 2570

## 1. Introduction

Direct digital frequency synthesizers, commonly referred to as direct digital synthesizers (DDS) or DDFS<sup>[1,2]</sup>, play an important role in many areas of digital electronics, such as digital communications, electronic warfare and radar systems, hydrogen maser receivers, particle accelerators, test and measurement equipment, broadcasting, and medical equipment. The importance of DDS in practical applications is expected to keep growing, since future generations of products will inevitably demand higher-performance circuits.

A possible DDS implementation<sup>[3]</sup> uses large lookup tables to store sine and cosine values. Another approach<sup>[4–7]</sup> is based on the CORDIC algorithm. In this case, an overflowing accumulator generates the angle, while a rotator using the CORDIC algorithm<sup>[6]</sup> implements the angle rotation. In its standard implementation, the CORDIC algorithm accomplishes the required rotation as a sequence of subrotations, where the input of each subrotation stage depends on the output of the previous stage. In Refs. [8–10], a two-stage angle rotation architecture is presented. In the first stage, a coarse rotation of the input vector is performed with the help of a small ROM (storing a few sine/cosine values) and a complex multiplier. After the coarse stage, the residual rotation angle is very small, and its sine/cosine values can be approximated with a few terms of a Taylor expansion. The second stage therefore performs the final rotation without requiring any lookup table. Both rotation stages are implemented using small multipliers. The technique described in Ref. [11] also uses a coarse-fine decomposition of the rotation stages. This approach is similar to the memoryless mixed-CORDIC architecture first proposed in Ref. [12].

In this paper, we present a high-speed SiGe BiCMOS direct digital frequency synthesizer, which integrates a DDS core based on the adjusted CORDIC algorithm<sup>[8]</sup>, a high-speed 10-bit current-steering D/A converter, a serial/parallel interface, and a clock block. The DDS is fabricated in a standard 0.35 $\mu$m SiGe BiCMOS process and operates at a system clock frequency of 1 GHz. Measured results show that the DDS can generate a frequency-agile analog output sine wave at up to 400+ MHz.

## 2. DDS architecture

The DDS system, shown in Fig. 1, consists of a digital DDS core, serial and parallel interfaces, a 10-bit current-steering digital-to-analog converter (DAC), and a clock block. The phase accumulator is an overflowing *M*-bit accumulator whose value specifies the instantaneous phase. The *M*-bit value may be truncated to an *N*-bit value, which is fed as the argument $\theta$ to a phase-to-amplitude converter that computes the digital sin $\theta$ and cos $\theta$ values to a precision of *K* bits. The digital sine and cosine values are subsequently converted to analog values by a DAC.
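The behaviour of the overflowing phase accumulator can be sketched in a few lines of Python. This is a minimal illustration, not the circuit used on the chip: the word lengths (`M_BITS = 32`, and the toy values in the last line) are assumptions, since the text does not state *M* or *N*, and the tuning relation $f_{\rm out} = {\rm FCW} \cdot f_{\rm clk} / 2^M$ is the standard DDS relation rather than a formula quoted from this paper.

```python
def phase_accumulator(fcw, m_bits, n_bits, steps):
    """Return the truncated N-bit phase words of an overflowing M-bit accumulator."""
    mask = (1 << m_bits) - 1
    acc = 0
    phases = []
    for _ in range(steps):
        phases.append(acc >> (m_bits - n_bits))  # keep the N most significant bits
        acc = (acc + fcw) & mask                 # wrap-around models the overflow
    return phases


def output_frequency(fcw, m_bits, f_clk):
    """Standard DDS tuning relation: f_out = FCW * f_clk / 2**M."""
    return fcw * f_clk / (1 << m_bits)


M_BITS = 32   # assumed accumulator width (not given in the text)
F_CLK = 1e9   # 1 GHz system clock, as in the text

# Frequency control word for an output near 360 MHz.
fcw = round(360e6 / F_CLK * (1 << M_BITS))
print(output_frequency(fcw, M_BITS, F_CLK))

# Toy example: 4-bit accumulator, step 3, truncated to 2 bits.
print(phase_accumulator(3, 4, 2, 7))  # [0, 0, 1, 2, 3, 3, 0]
```

The truncated phase word is what the phase-to-amplitude converter receives as $\theta$; a larger frequency control word simply makes the accumulator wrap around more often.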

## 3. DDS core

### 3.1. Basic CORDIC algorithm

To design a high-speed DDS core, we need a high-speed phase-to-amplitude converter and a parallel architecture. There are three main approaches to phase-to-amplitude conversion: table-based, polynomial-based, and CORDIC-based. The table-based method takes too many on-chip resources, and the polynomial-based method is hard to implement in digital circuits. The CORDIC-based method, by contrast, is easy to implement in circuits.

† Corresponding author. Email: yujinshan@yeah.net Received 17 March 2009, revised manuscript received 8 May 2009

<sup>\*</sup> Project supported by the National Natural Science Foundation of China (Nos. 60773025, 60906009) and the Program for Changjiang Scholars and Innovative Research Team in University.

Fig. 1. DDS architecture.

The CORDIC algorithm computes the sine and cosine of an angle $\theta$, specified in radians, by iteration. For a given angle $\theta$, the computation of cos $\theta$ and sin $\theta$ can be viewed as the computation of the *X*-axis and *Y*-axis coordinates ($X_{\theta}, Y_{\theta}$) of a point on the unit circle. This point can be located by rotating a vector counterclockwise through the angle $\theta$ from an initial position coincident with the *X*-axis.

As shown in Fig. 2, the vector $\overline{OB}$ is obtained by rotating the vector $\overline{OA}$ counterclockwise through the angle $\theta$, and the relation between $\overline{OA}$ and $\overline{OB}$ is given by Eq. (1).

$$\begin{aligned} \begin{bmatrix} x_j \\ y_j \end{bmatrix} &= \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \\ &= \cos \theta \begin{bmatrix} 1 & -\tan \theta \\ \tan \theta & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix}. \end{aligned} \tag{1}$$

Assume the angle $\theta$ is given by

$$\theta = \delta_0 \theta_0 + \delta_1 \theta_1 + \dots + \delta_N \theta_N,$$

where $\delta_n \in \{-1, 1\}$, $n = 0, 1, \dots, N$. The coordinates of the vector $\overline{OB}$ can then be obtained by decomposing the rotation into a sequence of subrotations, as follows:

$$\begin{aligned} \begin{bmatrix} x_{j} \\ y_{j} \end{bmatrix} &= \cos \theta_{N} \cos \theta_{N-1} \cdots \cos \theta_{0} \begin{bmatrix} 1 & -\tan \delta_{N} \theta_{N} \\ \tan \delta_{N} \theta_{N} & 1 \end{bmatrix} \begin{bmatrix} 1 & -\tan \delta_{N-1} \theta_{N-1} \\ \tan \delta_{N-1} \theta_{N-1} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\tan \delta_{0} \theta_{0} \\ \tan \delta_{0} \theta_{0} & 1 \end{bmatrix} \begin{bmatrix} x_{i} \\ y_{i} \end{bmatrix} \\ &= K \begin{bmatrix} 1 & -\tan \delta_{N} \theta_{N} \\ \tan \delta_{N} \theta_{N} & 1 \end{bmatrix} \begin{bmatrix} 1 & -\tan \delta_{N-1} \theta_{N-1} \\ \tan \delta_{N-1} \theta_{N-1} & 1 \end{bmatrix} \cdots \begin{bmatrix} 1 & -\tan \delta_{0} \theta_{0} \\ \tan \delta_{0} \theta_{0} & 1 \end{bmatrix} \begin{bmatrix} x_{i} \\ y_{i} \end{bmatrix}, \end{aligned} \tag{2}$$

where  $K = \cos \theta_N \cos \theta_{N-1} \cdots \cos \theta_0$  is a scale factor.



Fig. 2. Vector rotation coordinates.

In the basic CORDIC algorithm, $\theta_n = \arctan(2^{-n})$, that is, $\theta_0 = \arctan(1/2^0) = 45^\circ$, $\theta_1 = \arctan(1/2) = 26.57^\circ$, $\theta_2 = \arctan(1/4) = 14.04^\circ$, and so on. The rotation can be rewritten as

$$\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \cos \theta_n \begin{bmatrix} 1 & -\delta_n 2^{-n} \\ \delta_n 2^{-n} & 1 \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix}. \tag{3}$$

As Eqs. (1) and (3) show, because $\tan \theta_n = 2^{-n}$, the multiplication by $\tan \theta_n$ can be implemented as a simple shift-and-add operation, resulting in a multiplierless datapath. The constant factors $\cos \theta_n$ are collected in the scale factor $K$. When the initial vector $\overline{OA}$ coincides with the *X*-axis and its coordinates are $(x_i, y_i) = (K, 0)$, the sequence of subrotations gives the coordinates of the vector $\overline{OB}$ as $(x_j, y_j) = (\cos \theta, \sin \theta)$, which are the cosine and sine values of the input angle $\theta$.
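The iteration of Eq. (3) can be sketched as follows. This is a floating-point illustration of the basic algorithm only: the function name, the iteration count of 16, and the use of floats instead of fixed-point shifts are assumptions made for readability, and the residual-angle variable `z` is the comparison step that the improved algorithm in the next subsection removes.

```python
import math


def cordic_cos_sin(theta, n_iter=16):
    """Basic CORDIC rotation; valid for |theta| up to about 1.74 rad."""
    angles = [math.atan(2.0 ** -n) for n in range(n_iter)]
    # Scale factor K = cos(theta_0) * cos(theta_1) * ... from Eq. (2).
    k_scale = math.prod(math.cos(a) for a in angles)

    x, y = k_scale, 0.0   # start on the X-axis at (K, 0)
    z = theta             # residual angle still to be rotated
    for n, a in enumerate(angles):
        d = 1 if z >= 0 else -1            # direction delta_n from the residual angle
        shift = 2.0 ** -n                  # a right shift by n bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * a
    return x, y                            # approximately (cos(theta), sin(theta))


c, s = cordic_cos_sin(0.5)
print(c, s, math.cos(0.5), math.sin(0.5))
```

Each loop iteration needs the sign of `z` from the previous one, which is the serial dependency discussed in Section 3.2.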

### 3.2. Improved CORDIC algorithm

In the basic CORDIC algorithm, the rotation direction of the (N+1)th iteration is determined from the residual angle left by the Nth iteration, which limits the speed of the algorithm. Here an improved CORDIC algorithm is presented, in which each rotation direction is determined directly by a bit of the binary input angle. This raises the computation speed of the algorithm and reduces the circuit size. The basic principle of the improved algorithm is as follows.

Consider an arbitrary positive angle $\theta$ in $[0, \pi/4]$, which can be represented as $\theta = \sum_{k=1}^{N} b_k 2^{-k}$, where $b_k \in \{0, 1\}$ are the bits corresponding to the (N+1)-bit fractional binary representation of the angle $\theta$. Recoding the bits as $r_k = 2b_{k-1} - 1$, the angle $\theta$ can be represented as:

$$\theta = \sum_{k=1}^{N} b_k 2^{-k} = \varphi_0 + \sum_{k=2}^{N+1} r_k 2^{-k}, \tag{4}$$

where  $\varphi_0 = \sum_{k=2}^{N+1} 2^{-k} = 1/2 - 1/2^{N+1}$  is a constant and  $r_k \in \{-1, 1\}, k = 2, 3, \dots, N+1.$ 
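The recoding in Eq. (4) can be checked numerically. The sketch below assumes the recoding $r_k = 2b_{k-1} - 1$ (the form that makes $r_k \in \{-1, 1\}$ and reproduces $\varphi_0$ as defined above) and an 8-bit angle word chosen only for illustration; the helper names are hypothetical.

```python
def recode(word, n_bits):
    """Map the bits b_1..b_N of an angle word to digits r_2..r_(N+1) in {-1, +1}."""
    bits = {k: (word >> (n_bits - k)) & 1 for k in range(1, n_bits + 1)}  # b_1 is the MSB
    return {k: 2 * bits[k - 1] - 1 for k in range(2, n_bits + 2)}


def angle_from_recoding(digits, n_bits):
    """Evaluate phi_0 + sum(r_k * 2**-k), the right-hand side of Eq. (4)."""
    phi0 = 0.5 - 2.0 ** -(n_bits + 1)
    return phi0 + sum(r * 2.0 ** -k for k, r in digits.items())


N_BITS = 8  # assumed word length for the check
word = 0b10110001
print(word / 2 ** N_BITS, angle_from_recoding(recode(word, N_BITS), N_BITS))
```

Both printed values are the same angle in radians: every digit is a nonzero rotation, so the direction of each stage is known as soon as the angle word is available.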

Unlike the basic CORDIC algorithm<sup>[1]</sup>, the rotation begins with $\varphi_0$, followed by the sequence of rotations by $2^{-2}$, $2^{-3}$, ..., $2^{-(N+1)}$, and each rotation direction is determined by a bit of the input angle $\theta$. A bit $b_k = 1$ corresponds to a positive (counterclockwise) rotation by $2^{-k-1}$ rad, and a bit $b_k = 0$ corresponds to a negative (clockwise) rotation by $2^{-k-1}$ rad. This yields the following iteration for the sine and cosine values of the angle $\theta$:

$$\begin{cases} x_{k+1} = x_k - r_k \tan(2^{-k})y_k, \\ y_{k+1} = y_k + r_k \tan(2^{-k})x_k. \end{cases} \tag{5}$$

One of the major benefits of recoding over the basic CORDIC algorithm is that the direction of rotation at each stage is obtained immediately from the binary representation of the angle $\theta$, eliminating the need to compare angles at each stage. However, Eq. (5) requires a multiplication by $\tan(2^{-k})$, which can be expanded in a Taylor series as follows:

$$\tan(2^{-k}) = 2^{-k} + \frac{1}{3} \times 2^{-3k} + \frac{2}{15} \times 2^{-5k} + \cdots . \tag{6}$$

Since $2^{-k}$ is a power of two, the multiplication by $\tan(2^{-k})$ can be simplified by employing the approximation $\tan(2^{-k}) \approx 2^{-k}$ (for $k \ge N/3$). This approximation causes no loss of accuracy so long as $2^{-k}$ is small enough that the difference $(\tan 2^{-k} - 2^{-k})$ is below the quantization error of the finite-precision datapath. For $k < N/3$, the rotations are combined with the initial angle $\varphi_0$: the recoded rotation is a fixed initial rotation $\varphi_0$ followed by the sequence $2^{-2}$, $2^{-3}$, $2^{-4}$, $2^{-5}$, etc., and the results of these leading rotations are saved in a ROM that is addressed directly by the high-order bits of the angle. The subsequent rotations start from the ROM output.
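A short worked bound shows why the threshold $k \ge N/3$ is sufficient. It assumes that the quantization step of the datapath is about $2^{-N}$ (the exact word length is not stated in the text) and keeps only the leading error term of Eq. (6):

$$\tan(2^{-k}) - 2^{-k} \approx \frac{1}{3} \times 2^{-3k} \le \frac{1}{3} \times 2^{-N} < 2^{-N} \quad (k \ge N/3).$$

For example, with an assumed $N = 12$ and $k = 4$, the leading error term is $2^{-12}/3 \approx 8.1 \times 10^{-5}$, which is below $2^{-12} \approx 2.4 \times 10^{-4}$.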

For $2^{-k}$ with $k \ge N/2$, we can further use the approximations $\cos(2^{-k}) \approx 1$ and $\sin(2^{-k}) \approx 2^{-k}$, whose error is smaller than the quantization error at the datapath resolution. Applying these approximations to Eq. (1) gives:

$$x_j = x_i - r_k 2^{-k} y_i, \quad y_j = y_i + r_k 2^{-k} x_i. \tag{7}$$

For $k > N/2$, $m$ consecutive rotations can then be merged into a single step:

$$\begin{cases} x_{k+m} = x_k - y_k \sum_{i=k}^{k+m-1} r_i 2^{-i}, \\ y_{k+m} = y_k + x_k \sum_{i=k}^{k+m-1} r_i 2^{-i}. \end{cases} \tag{8}$$



Fig. 3. Parallel architecture.

According to the above results, the improved CORDIC algorithm divides the rotation into three phases. First, the first N/3 rotations are obtained by looking up a small ROM table. Next, the following N/3 rotations are performed by shift-and-add butterfly stages. Finally, the remaining rotations are completed in one step by Eq. (8). The improved algorithm raises the operating frequency, saves area, and reduces the delay between input and output. Moreover, the number of rotations, the width of the intermediate datapath, and the width of the input angle can be adjusted to obtain the required waveform at a specific resolution and frequency.
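The three phases can be put together in a small floating-point model. This is a sketch under stated assumptions, not the circuit of the paper: the word length `N = 12`, the 4/4/4 split of the recoded digits, the recoding $r_k = 2b_{k-1} - 1$, and the choice to fold the gain of the shift-and-add stages into the ROM contents are all illustrative choices, and the names `ROM`, `digit`, and `improved_cordic` are hypothetical.

```python
import math

N = 12                          # angle word length in bits (assumed)
K1, K2 = 5, 9                   # r_2..r_K1 -> ROM, r_(K1+1)..r_K2 -> shift-and-add
PHI0 = 0.5 - 2.0 ** -(N + 1)    # constant offset phi_0 from Eq. (4)

# Gain of the shift-and-add stages; dividing the ROM entries by it is an assumption.
GAIN = math.prod(math.sqrt(1.0 + 4.0 ** -k) for k in range(K1 + 1, K2 + 1))


def digit(word, k):
    """Recoded digit r_k = 2*b_(k-1) - 1 of the N-bit angle word."""
    return 2 * ((word >> (N - (k - 1))) & 1) - 1


def build_rom():
    """One (cos, sin) pair per combination of the high bits b_1..b_(K1-1)."""
    rom = []
    for addr in range(1 << (K1 - 1)):
        angle = PHI0 + sum((2 * ((addr >> (K1 - k)) & 1) - 1) * 2.0 ** -k
                           for k in range(2, K1 + 1))
        rom.append((math.cos(angle) / GAIN, math.sin(angle) / GAIN))
    return rom


ROM = build_rom()


def improved_cordic(word):
    """Approximate (cos, sin) of theta = word / 2**N rad in three phases."""
    x, y = ROM[word >> (N - (K1 - 1))]          # phase 1: ROM lookup on the high bits
    for k in range(K1 + 1, K2 + 1):             # phase 2: tan(2^-k) ~ 2^-k, shift-and-add
        r = digit(word, k)
        x, y = x - r * y * 2.0 ** -k, y + r * x * 2.0 ** -k
    s = sum(digit(word, k) * 2.0 ** -k for k in range(K2 + 1, N + 2))
    return x - y * s, y + x * s                 # phase 3: merged rotation, Eq. (8)


c, s = improved_cordic(1 << (N - 1))            # theta = 0.5 rad
print(c, s, math.cos(0.5), math.sin(0.5))
```

No stage has to wait for a residual-angle comparison: the ROM address and every digit come straight from the angle word, so the phases can be pipelined freely.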

### 3.3. Circuit implementation

To achieve a 1 GHz sample rate in 0.35 $\mu$m SiGe BiCMOS technology, the digital circuit of the DDS adopts parallel and pipelined techniques. A single accumulator datapath is designed, and four interleaved phase outputs are derived from it by adding phase offsets. Four CORDIC modules then implement the four parallel signal datapaths. The architecture is shown in Fig. 3.

The parallel design and the shared accumulator lower the operating frequency of the digital core to 250 MHz and reduce circuit complexity and power.
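One common way to realize such a 4-way scheme, shown here as an assumption since the text does not give the exact arithmetic, is to let the shared accumulator advance by four times the frequency control word per slow clock cycle and to add offsets of 0, 1, 2, and 3 times the control word for the four lanes. The sketch checks that the interleaved lanes reproduce the phase sequence of a single accumulator running at the full rate.

```python
def serial_phases(fcw, m_bits, steps):
    """Phase sequence of one accumulator clocked at the full sample rate."""
    mask = (1 << m_bits) - 1
    return [(i * fcw) & mask for i in range(steps)]


def parallel_phases(fcw, m_bits, slow_cycles, lanes=4):
    """Shared accumulator at 1/lanes of the sample rate plus per-lane phase offsets."""
    mask = (1 << m_bits) - 1
    acc = 0
    out = []
    for _ in range(slow_cycles):
        out.extend((acc + lane * fcw) & mask for lane in range(lanes))
        acc = (acc + lanes * fcw) & mask   # one update per slow (250 MHz) cycle
    return out


FCW = 0x5C28F5C3  # example control word (assumed), roughly 0.36 * 2**32
print(parallel_phases(FCW, 32, 2) == serial_phases(FCW, 32, 8))
```

Each lane then feeds its own CORDIC datapath at the slow clock, and the four results are multiplexed back to the full sample rate.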

## 4. High-speed current-steering DAC

### 4.1. DAC architecture

The 10-bit 1 GSPS differential current-steering D/A converter operates from a 3.3 V supply. The converter has an 80-channel CMOS digital input and a 2-channel complementary differential analog current output, with a full-scale output current of 20 mA. Figure 4 shows a block diagram of the converter. As shown in Fig. 4, it consists of a time division multiplexer, a 5-31 "thermometer" decoder, a delay circuit, a master-slave latch array, a current switch array, a current source array, a timing controller, a bandgap reference, and other modules.

The converter operates as follows. The 10-bit 8-channel (a total of 80 channels) CMOS digital input enters the time division multiplexer under control of the system clock (125 MHz) and is multiplexed into a single 10-bit data signal. Under control of the main clock (1 GHz), the upper 5 bits are sent to the 5-31 "thermometer" decoder, which eliminates "code-dependent" ripple, and the lower 5 bits are sent to the equivalent delay circuit, so that the data are converted into a 36-channel digital signal. The converted signal is sent synchronously from the master-slave latch array to the following current switch array and current source array, which generate the complementary analog current output. The on-chip bandgap reference provides the reference current for the constant current sources. In this way, digital/analog conversion with a resolution of 10 bits is achieved at a clock frequency of 1 GHz.

Fig. 5. Block diagram of the time division multiplexing unit.
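The segmentation described in this subsection (31 thermometer-coded signals for the upper 5 bits plus 5 binary-weighted signals for the lower 5 bits, 36 in total) can be modeled in a few lines. The sketch is an idealized behavioural model: the unit current of full scale divided by 1023 and the function name are assumptions, and mismatch, timing, and the latch stages are ignored.

```python
def segmented_dac(code, i_fs=20e-3):
    """Ideal 10-bit segmented DAC: returns (I_out, I_out_bar, number of switch signals)."""
    upper, lower = code >> 5, code & 0x1F
    therm = [1 if i < upper else 0 for i in range(31)]   # 31 unary sources, 32 LSB each
    binary = [(lower >> b) & 1 for b in range(5)]        # sources of 1, 2, 4, 8, 16 LSB
    units = 32 * sum(therm) + sum(bit << b for b, bit in enumerate(binary))
    i_lsb = i_fs / 1023                                  # assumed LSB current
    i_out = units * i_lsb
    return i_out, i_fs - i_out, len(therm) + len(binary)


print(segmented_dac(512))
```

In this model the two complementary outputs always sum to the full-scale current, and only one unary source changes state when the upper 5 bits increase by one.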

### 4.2. Time division multiplexer

The data input of a very high speed D/A converter is usually a CMOS signal provided by the DDS core in the preceding stage. At present, CMOS signals are normally transmitted at about 200 MHz. The input must therefore be multiplexed by the time division multiplexer before it is processed by the very high speed D/A converter at a clock frequency of 1 GHz. The 10-bit 8-channel time division multiplexer consists of 10 time division multiplexing units; a block diagram is shown in Fig. 5.

The operational principle of the time division multiplexing unit is as follows. The CMOS/differential ECL converter converts the 8-channel CMOS data input into high-frequency, interference-resistant internal differential ECL signals. The signals enter the time division multiplexing core via the first- and second-stage master-slave latches. To eliminate the effect of transmission delay in the system data signal, the first-stage latch is synchronized by the system clock, while the second-stage latch is synchronized by the on-chip clock. The time division multiplexing core consists of 7 gates divided into 3 stages. The gating order of the 8 channels is 8-7-6-5-4-3-2-1. To reduce signal mismatch, the layout is arranged in the cross-symmetric order 1-5-3-7-2-6-4-8. From front to end, the clock frequencies of the three gate stages are main frequency/8, main frequency/4, and main frequency/2. The delays of the three clocks relative to the transmitted data are designed accurately, and undefined states generated while the 8 channels are combined in the time division multiplexing core are filtered out, in order to reduce the noise of the whole circuit and to avoid coding errors. Finally, the 8-channel main frequency/8 CMOS signal is multiplexed into a 1-channel main-frequency differential ECL signal, which is output through the master-slave latch/buffer to drive the following stage.
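A behavioural sketch of such a 7-gate, 3-stage core is given below. It assumes that each gate is a 2:1 selector, that the first-stage pairs follow the layout order 1-5, 3-7, 2-6, 4-8, and that the select signals are square waves at main frequency/8, /4, and /2 derived from a 3-bit count; these details are illustrative and are not taken from the schematic.

```python
def mux2(a, b, sel):
    """2:1 selector: returns b when sel is 1, otherwise a."""
    return b if sel else a


def tdm_8_to_1(ch, tick):
    """ch[0] is channel 1 ... ch[7] is channel 8; tick counts main-clock periods."""
    sel = 7 - (tick & 7)             # counts down so the gating order is 8-7-...-1
    s_div8 = (sel >> 2) & 1          # toggles at main frequency/8
    s_div4 = (sel >> 1) & 1          # toggles at main frequency/4
    s_div2 = sel & 1                 # toggles at main frequency/2

    # Stage 1: four selectors, pairs 1-5, 3-7, 2-6, 4-8.
    a = mux2(ch[0], ch[4], s_div8)
    b = mux2(ch[2], ch[6], s_div8)
    c = mux2(ch[1], ch[5], s_div8)
    d = mux2(ch[3], ch[7], s_div8)
    # Stage 2: two selectors.
    e = mux2(a, b, s_div4)
    f = mux2(c, d, s_div4)
    # Stage 3: one selector producing the full-rate stream.
    return mux2(e, f, s_div2)


channels = [1, 2, 3, 4, 5, 6, 7, 8]
print([tdm_8_to_1(channels, t) for t in range(8)])  # [8, 7, 6, 5, 4, 3, 2, 1]
```

With this pairing the slowest select signal drives the widest first stage and only the single final selector switches at the highest rate.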



Fig. 6. Block diagram of the 5-31 "thermometer" decoder.

### 4.3. 5-31 "thermometer" decoder

The segmented structure offers a good trade-off between the reduction of ripple energy and differential nonlinearity on the one hand and the increase in decoding-logic complexity and total layout area on the other. The very high speed D/A converter is therefore designed as a segmented structure, with a 5-31 "thermometer" decoder for the upper 5 bits and binary weighting for the lower 5 bits, thus eliminating 96.9% of the "code-dependent" ripple and greatly improving the dynamic performance of the whole circuit.

A block diagram of the 5-31 "thermometer" decoder is shown in Fig. 6. During 5-31 "thermometer" decoding, the toggle frequencies of the input bits increase from the high bits to the low bits. To keep the decoding logic simple and to obtain a very high data rate, the upper 5 input bits are split between a 2-4 decoder for bits D5-D6 and a 3-8 decoder for bits D7-D9. Finally, a 32-channel decoder generates the 31-channel decoded output data needed by the following stage.
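One way such a two-stage decoder can be organized is the row/column scheme sketched below: D7-D9 are decoded into 8 "row" signals, D5-D6 into 4 "column" signals, and each of the 31 outputs combines one column signal with two row signals. This arrangement is an assumption used for illustration; the actual gate-level logic of Fig. 6 is not given in the text.

```python
def two_stage_thermometer(upper5):
    """Decode the upper 5 DAC bits (0..31) into 31 thermometer outputs."""
    hi, lo = upper5 >> 2, upper5 & 0b11        # hi = D9..D7, lo = D6..D5
    row_ge = [hi >= r for r in range(9)]       # 3-8 stage; row_ge[8] is always False
    col_ge = [lo >= c for c in range(4)]       # 2-4 stage; col_ge[0] is always True
    outputs = []
    for j in range(1, 32):                     # output j is on when upper5 >= j
        r, c = j // 4, j % 4
        outputs.append(int(row_ge[r + 1] or (row_ge[r] and col_ge[c])))
    return outputs


print(two_stage_thermometer(5))
```

Each output needs only a few gates after the two small decoders, which keeps the logic depth short for the fast-toggling low bits.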

The logic of the 5-31 "thermometer" decoder is implemented with 112 NAND gates divided into 4 stages (3 stages for bits D5 and D6, which toggle at high frequency, and 4 stages for bits D7, D8, and D9). Each 1-bit differential ECL data output of the time division multiplexer in the preceding stage must drive 6-8 NAND gates in the decoder simultaneously, and over long lines. The driving capacity of the master-slave latch/buffer in the time division multiplexer and the linewidth and spacing of the long signal lines (which determine their parasitic capacitance) are therefore key to the performance of the very high-speed D/A converter, and post-layout simulation must be used to optimize the layout. In the circuit design, a master-slave latch array is inserted in the middle of the decoding path (i.e., between the NAND gates of the preceding and following stages) to resynchronize the signals and improve the timing control of the 31-channel decoded data signal. In the layout design, the 31-channel decoded data signals are arranged cross-symmetrically and are realized by NAND gate arrays of identical structure, in order to eliminate mismatch between signals and to improve circuit resolution.

### 4.4. Master-slave latch array

Under the control of the main clock, the master-slave latch array sends the 36-channel digital signal from the preceding stage synchronously to the current steering switch array in the following stage for digital/analog conversion. The array consists of 36 master-slave latches and occupies a large layout area. To drive the array effectively and synchronously, a clock tree is used. The number of clock buffers is set to $2^n$ (16) so that the layout of the clock tree is symmetrical. The paths from the main clock to the different master-slave latches are designed with equal physical length and are then tuned by post-layout simulation to have equal parasitic effects. With these two key points, the clock is driven synchronously.

Fig. 7. Schematic diagram of current steering switch array.

### 4.5. Current steering switch array

The current steering switch array consists of 36 current steering switches. As a key unit for digital/analog conversion, the array converts the digital voltage input signal into an analog current output signal. A schematic diagram of the current steering switch array is shown in Fig. 7.

The current steering switch array uses a fully differential symmetric structure. Q1 and Q2 form the main differential pair; the two branches Q3, Q5, R1 and Q4, Q6, R2 provide the common-mode bias points for Q1 and Q2, respectively. N groups of identical pull-up PMOS transistor branches are provided at the bases of Q3 and Q4. By gating 1 to N of these PMOS transistors and thereby optimizing the common-mode bias points of Q1 and Q2, the interference immunity of the current steering switch array can be improved. Q7 and Q9 are, respectively, pull-down NPN transistors at the Q3 and Q4 bases. If the inverted data input signal is 1, then Q9 is on, the current at the Q3 base is extracted, Q3 is off, and there is no current input at the Q1 base; at the same time, as Q7 is on, the current at the Q1 base is extracted. Owing to the simultaneous action of these two factors, the turn-off of Q1 is accelerated and the in-phase current output signal is 0. This structure improves the switching speed of the differential pair and is of key importance for the high-frequency performance of the circuit. Q8 and Q10 operate on the same principle as Q7 and Q9. A small capacitor C1 is connected between Q1 and Q2 in order to suppress surges and oscillation during differential switching. If the capacitance is too large, however, the settling of the converted signal is slowed. The capacitance is therefore optimized by post-layout simulation.

| O | B | F | J | 12 | 10 | 6  | 2  |
|---|---|---|---|----|----|----|----|
| D | M | H | L | 8  | 1  | 14 | 4  |
| K | I | A | N | 5  | 15 | 0  | 9  |
| G | C | P | E | 3  | 7  | 11 | 13 |

Fig. 8. Switching sequence of a $16 \times 16$ array.

### 4.6. Current source array

The matching accuracy of the constant current sources is the bottleneck for the resolution of the very high-speed D/A converter. It depends on random mismatch and systematic mismatch. Random mismatch is determined by the intrinsic matching properties of the process, and the most effective way of reducing it is to increase the area of the constant current sources. However, increasing the resolution of the D/A converter by 1 bit means increasing the area of the constant current sources by 3 times, which in turn increases the systematic mismatch of the constant current source array. In this study, a quad-quadrant second-order systematic error compensation technique is used to compensate for the systematic mismatch caused by process gradients (first-order error) and by temperature and electrical gradients (second-order error). The technique optimizes the interconnection sequence between the "thermometer" decoder outputs and the switch controls in the constant current source array, so that as the input code of the "thermometer" decoder increases, the current sources are switched on in an order that compensates the accumulated systematic mismatch. Figure 8 shows the switching sequence of a $16 \times 16$ array<sup>[13, 14]</sup>.

As shown in Fig. 8, the 16 constant current sources numbered 0, which correspond to X1, are distributed over the modules A-P (16 large modules). This distribution eliminates the second-order error. Within each large module, the constant current sources corresponding to X1-X16 are arranged with common-centroid symmetry, radiating outward from the center. The distribution within each module eliminates the first-order error.

In the circuit design, the constant current source array is divided into four quadrants, and each quadrant is further divided into 8 parts, giving 32 modules in all. The 32 constant current sources corresponding to each output of the 5-31 "thermometer" decoder are distributed over the large modules according to the second-order systematic error compensation scheme. Each large module consists of 34 constant current sources in all: 31 for the upper 5 bits, 1 for the lower 5 bits, and 2 for the reference constant current source. The lower 5-bit constant current source and the reference constant current sources are placed in the center of the module.



Fig. 9. Microphotograph of the DDS die.



Fig. 10. Evaluation board of the DDS.

## 5. Serial/parallel interface

Data are transferred from the user to the DDS core through the serial/parallel interface in a two-step process. In the write operation, the user first writes the data to the I/O buffer through either the parallel port or the serial interface. In the update operation, the register memory of the DDS core is then updated with the buffered data.
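The two-step transfer behaves like a double-buffered register file, which can be sketched as follows. The class name, the register address, and the stored value are hypothetical and only illustrate the write-then-update sequence; the real register map is not described in the text.

```python
class RegisterInterface:
    """Toy model of a write buffer in front of the DDS core registers."""

    def __init__(self):
        self.io_buffer = {}        # filled by serial or parallel writes
        self.core_registers = {}   # what the DDS core actually uses

    def write(self, address, value):
        """Step 1: store the value in the I/O buffer only."""
        self.io_buffer[address] = value

    def update(self):
        """Step 2: copy all buffered values into the core registers at once."""
        self.core_registers.update(self.io_buffer)


iface = RegisterInterface()
iface.write(0x04, 0x5C28F5C3)      # hypothetical address and frequency control word
print(iface.core_registers)        # {} : the core is unchanged until update
iface.update()
print(iface.core_registers[0x04] == 0x5C28F5C3)  # True
```

Separating the two steps lets several registers be written one after another and then take effect in the core at the same moment.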

## 6. Measured results

The high-speed DDS is designed and fabricated in a double-well 2P4M 0.35 $\mu$m SiGe BiCMOS process. The die area is 5.6 × 5.6 mm<sup>2</sup>; the die is shown in Fig. 9.

The evaluation board of the DDS, shown in Fig. 10, incorporates the DDS, the waveform output ports, and the clock generator. The DDS is clocked and the output is taken from the DAC.

Through the serial/parallel interface, frequency control words are written into the control registers of the DDS to evaluate the single-tone mode, the frequency-sweep mode, and phase shifting.

The sample rate is 1 GSPS, the supply voltage is 3.3 V, and the supply current is 365 mA. The measured electrical parameters of the DDS are as follows.

(1) Wide-band SFDR DC to Nyquist

The measured results for wide-band SFDR DC to Nyquist are shown in Table 1.

(2) Narrow-band SFDR

The measured results for narrow-band SFDR are shown in Table 2.

Table 1. Wide-band SFDR DC to Nyquist.

| Frequency (MHz)                          | SFDR (dBc) |
|------------------------------------------|------------|
| $F_{\rm out} = 40$                       | 56.9       |
| $F_{\rm out} = 100$                      | 57.4       |
| $F_{\rm out} = 180$                      | 53.6       |
| $F_{\rm out} = 360$                      | 46.2       |
| $F_{\rm out} = 180$, 700 MHz REFCLK      | 51.5       |

Table 2. Narrow-band SFDR.

| Frequency (MHz)                             | SFDR (dBc) |
|---------------------------------------------|------------|
| $F_{\rm out} = 40 \pm 15$                   | 86.2       |
| $F_{\rm out} = 40 \pm 1$                    | 86.9       |
| $F_{\rm out} = 40 \pm 50$                   | 92.5       |
| $F_{\rm out} = 100 \pm 15$                  | 82.7       |
| $F_{\rm out} = 100 \pm 1$                   | 88.5       |
| $F_{\rm out} = 100 \pm 50$                  | 92.2       |
| $F_{\rm out} = 180 \pm 15$                  | 82.7       |
| $F_{\rm out} = 180 \pm 1$                   | 88.3       |
| $F_{\rm out} = 180 \pm 50$                  | 89.1       |
| $F_{\rm out} = 360 \pm 15$                  | 83.9       |
| $F_{\rm out} = 360 \pm 1$                   | 88.4       |
| $F_{\rm out} = 360 \pm 50$                  | 91.8       |
| $F_{\rm out} = 180 \pm 15$, 700 MHz REFCLK  | 60.9       |
| $F_{\rm out} = 180 \pm 1$, 700 MHz REFCLK   | 90.2       |
| $F_{\rm out} = 180 \pm 50$, 700 MHz REFCLK  | 88.5       |

Table 3. Output phase noise characteristics.

| Frequency                                                     | Phase noise (dBc/Hz) |
|---------------------------------------------------------------|----------------------|
| @103 MHz $I_{OUT}$, @10 kHz offset                            | -146                 |
| @103 MHz $I_{OUT}$, @100 kHz offset                           | -157                 |
| @403 MHz $I_{OUT}$, @10 kHz offset                            | -132                 |
| @403 MHz $I_{OUT}$, @100 kHz offset                           | -143                 |
| @100 MHz $I_{OUT}$ with 700 MHz REFCLK, @10 kHz offset        | -138                 |
| @100 MHz $I_{OUT}$ with 700 MHz REFCLK, @100 kHz offset       | -153                 |
| @100 MHz $I_{OUT}$ with 700 MHz REFCLK, @1 MHz offset         | -159                 |
| @100 MHz $I_{OUT}$ with 700 MHz REFCLK, @10 MHz offset        | -160                 |

(3) Output phase noise characteristics

The measured output phase noise characteristics are shown in Table 3.

The wide-band SFDR is 46.2 dBc and the narrow-band SFDR is 83.9 dBc (at a clock frequency of 1 GHz and an output sine wave frequency of 360 MHz), as shown in Figs. 11 and 12.

In this DDS design, the worst spurs in the output spectrum are due to the truncation of the phase word from the accumulator to the 10-bit sine-weighted DAC.





Fig. 12. Narrow-band SFDR, 360 MHz F<sub>OUT</sub>, 30 MHz BW.

## 7. Conclusions

In this paper, we have proposed a high-speed SiGe BiCMOS direct digital frequency synthesizer design, which integrates a DDS core, a high-performance current-steering DAC, a serial and parallel interface circuit, and a high-speed clock generator. The chip is fabricated in 0.35 $\mu$m SiGe BiCMOS technology. The test results show that the DDS can generate a frequency-agile analog output sine wave at up to 400+ MHz.

## References

- [1] Kang C Y. CORDIC-based high-speed direct digital frequency synthesis. University of Texas at Austin, 2003: 39
- [2] De Caro D, Petra N, Strollo A G M. A 380 MHz direct digital synthesizer/mixer with hybrid CORDIC architecture in 0.25 μm CMOS. IEEE J Solid-State Circuits, 2007, 42(1): 151
- [3] Vankka J. Methods of mapping from phase to sine amplitude in direct digital synthesis. IEEE Trans Ultrason, Ferroelectr, Freq Control, 1997, 44(2): 526
- [4] Gielis G, van de Plassche R, van Valburg J. A 540-MHz 10-b polar-to-Cartesian converter. IEEE J Solid-State Circuits, 1991, 26(11): 1645
- [5] Ahn Y, Nahm S, Sung W. VLSI design of a CORDIC-based derotator. Proc IEEE ISCAS, 1998, 2: 449

- [6] Volder J E. The CORDIC trigonometric computing technique. IRE Trans Electron Comput, 1959, EC-8(3): 330
- [7] De Caro D, Petra N, Strollo A G M. Digital synthesizer/mixer with hybrid CORDIC–multiplier architecture: error analysis and optimization. IEEE Trans Circuits Syst I: Regular Papers, 2009, 56(2): 364
- [8] Fu D, Willson A N. A high-speed processor for digital sine/cosine generation and angle rotation. Proc 32nd Asilomar Conf Signal Syst Comput, 1998, 1: 177
- [9] Torosyan A, Fu D, Willson A N. A 300-MHz quadrature direct digital synthesizer/mixer in 0.25 μm CMOS. IEEE J Solid-State Circuits, 2003, 38(6): 875
- [10] Song Y, Kim B. A quadrature digital synthesizer/mixer architecture using fine/coarse coordinate rotation to achieve 14-b input, 15-b output, and 100-dBc SFDR. IEEE J Solid-State Circuits, 2004, 39(11): 1853

- [11] Curticapean F, Niittylahti J. An improved digital quadrature frequency down-converter architecture. Proc 35th Asilomar Conf Signals, Syst Comput, 2001: 1318
- [12] Ahmed H M. Efficient elementary function generation with multipliers. Proc 9th Symp Comput Arithmetic, 1989: 52
- [13] Bastos J, Marques A M, Steyaert M S J, et al. A 12-bit intrinsic accuracy high-speed CMOS DAC. IEEE J Solid-State Circuits, 1998, 33: 1959
- [14] Van der Plas G, Vandenbussche J, Sansen W, et al. A 14-bit intrinsic accuracy Q<sup>2</sup> random walk CMOS DAC. IEEE J Solid-State Circuits, 1999, 34: 1708