# High-Speed, Robust CMOS Dynamic Circuit Design

Lai Lianzhang<sup>†</sup>, Tang Tingao, and Lin Yinyin

(State Key Laboratory of ASIC & System, Fudan University, Shanghai 200433, China)

**Abstract:** A novel circuit with a narrow-pulse driving structure is proposed to enhance the noise immunity and improve the performance of wide fan-in dynamic circuits. An analytical model that agrees well with simulations is also presented for transistor sizing. Simulation results show that the circuit improves noise immunity by up to 12% over the conventional technique at 1 GHz, and that it can run 1.6 times faster than the existing technique at the same noise immunity.

**Key words:** domino circuit; noise immunity; high-speed; keeper; narrow pulse

**EEACC:** 1265B

**CLC number:** TN431.2 **Document code:** A **Article ID:** 0253-4177(2006)06-1006-06

## 1 Introduction

A domino circuit is a particular type of dynamic circuit that consists of an n-type dynamic logic block followed by a static inverter<sup>[1]</sup>. The static inverter is added to prevent the malfunction that occurs when dynamic gates are cascaded directly. Generally, there are two topologies for dynamic circuits: the conventional style with a footer transistor (Mf in Fig. 1) and a footless style with no footer transistor<sup>[2]</sup>. A footless domino is much faster than a footed domino, but it requires that the input signals be grounded during the precharge phase.

Domino circuits are widely used in high-performance VLSI to implement functional units such as MUXes, comparators, and high-speed adders<sup>[3]</sup>, because they realize a function compactly in a single gate and operate at high speed. However, they are vulnerable to noise<sup>[4]</sup>, especially in wide fan-in and deep sub-micron designs. Many techniques have been proposed to improve their noise immunity.
The feedback keeper technique<sup>[5]</sup> uses a pMOS transistor as a keeper to maintain the charge at the dynamic node. The keeper must be upsized to compensate for large leakage current, but a large keeper leads to serious speed degradation and high power consumption. This is referred to as the "contention problem". To solve this problem, a conditional keeper structure has been proposed<sup>[6]</sup>, in which two keepers are employed and conditionally activated by different input combinations. However, this technique still suffers from contention in the transition window.

In this paper, a novel circuit that completely eliminates the contention problem is proposed, and an analytical model is presented to demonstrate the characteristics of this circuit.

Fig. 1 Standard domino with footer Mf

<sup>†</sup> Corresponding author. Email: 032021085@fudan.edu.cn

Received 2 November 2005, revised manuscript received 28 February 2006

## 2 Circuit theory and implementation

Figure 2 shows the basic structure of the proposed circuit. An output circuit consisting of transistors Mp2 and Mn2 is driven by a narrow pulse generator (NPG) circuit with two inputs: the dynamic node G and the clock signal. When the dynamic node G is pulled down by the input signals, the NPG generates a narrow pulse at the onset of the evaluation phase to drive the output circuit. The NPG is delay-sensitive, and the amplitude of the narrow pulse is controlled by the pull-down speed of node G. A slow pull-down of node G generates only a small, narrow pulse, which cannot pull down the output circuit. This is why the proposed circuit can tolerate input noise.

Fig. 2 A circuit with narrow pulse generator

Figure 3 shows the circuit implementation. The NPG, which consists of transistors Mpc, Mc, Mn and a delay element, is utilized to transfer the evaluated results from dynamic node G to F.
The narrow pulse is generated with the help of the delay element. When the clock goes high, node P is still at the low logic level because the "1" has not yet propagated to it, so transistor Mpc is on. If dynamic node G is pulled down at the same time, transistor Mc also turns on and node U is charged until a "1" appears at node P, which turns off Mpc and turns on Mn. Thus a narrow pulse is generated at node U within the time window formed by the delay element. Note that the output node F is a "floating" node from which charge leaks via the sub-threshold leakage of transistor Mn2.

Fig. 3 Implementation of Fig. 2

To avoid sub-threshold noise, a keeper (PK) is added as shown in Fig. 4. It does not induce contention because it is only activated while the dynamic node F remains high.

Fig. 4 Improved version of Fig. 3

Figure 5 shows the simulated results of an 8-bit OR gate in a 0.18 $\mu m$ CMOS process at 1.8 V/55 $^{\circ}{\rm C}$. The simulations show that a time window Td, determined by the delay element, is available for generating the narrow pulse before the voltage at node P goes high. The amplitude of the narrow pulses is controlled by the input signals. A relatively small input signal (e.g. 600 mV) generates a narrow pulse whose amplitude is too small to pull down the dynamic node F.

Fig. 5 Simulated waveforms of the nodes U and F in the evaluation phase, sweeping input voltages from 580 to 700 mV

Although the NPG and the output circuit add delay, a large speed improvement is still observed because there is no contention in the circuit and the parasitic capacitance at node G is much smaller than in the keeper structures.

## 3 Analytical model

In this section, an analytical model is presented to reveal the relationship between the generated narrow pulses and the input signals.
The charging of node U in the evaluation phase can be simplified as shown in the macro model in Fig. 6. The transistors Meffn and Meffp are equivalent transistors for the pull-down network and for the pMOS transistors Mpc and Mc, respectively. The capacitors $C_{\rm g}$, $C_{\rm u}$, and $C_{\rm f}$ are the parasitic capacitors at the nodes G, U, and F, respectively. We assume that $C_{\rm g}$ has previously been charged high and $C_{\rm u}$ has previously been discharged low.

Fig. 6 Macro model of the proposed circuit

First, we consider Loop A when $V_{\rm in}$ begins to pull down the node G. Since $V_{\rm in} - V_{\rm g}(t) \leqslant V_{\rm thn}$, transistor Meffn is in the saturation region. We obtain

$$i_1 = \frac{1}{2} \beta_{\text{effn}} (V_{\text{in}} - V_{\text{thn}})^2 \tag{1}$$

for the equivalent pull-down transistor, and

$$i_1 = -C_{\rm g} \frac{\mathrm{d}V_{\rm g}(t)}{\mathrm{d}t} \tag{2}$$

for capacitor $C_{\rm g}$, where $V_{\rm g}(t)$ is the voltage at the node G, $V_{\rm thn}$ is the equivalent threshold voltage of the pull-down network, and $\beta_{\rm effn}$ is the equivalent transconductance. Equating (1) and (2) yields

$$\frac{\mathrm{d}V_{\rm g}(t)}{\mathrm{d}t} = -\frac{\beta_{\rm effn}(V_{\rm in} - V_{\rm thn})^2}{2C_{\rm g}} \tag{3}$$

Solving Eq. (3) with the initial condition $V_{\rm g}(0) = V_{\rm dd}$, we obtain

$$V_{\rm g}(t) = -\frac{1}{2} \times \frac{\beta_{\rm effn}}{C_{\rm g}} (V_{\rm in} - V_{\rm thn})^2 t + V_{\rm dd} \tag{4}$$

Second, we consider Loop B when Mc is turned on to charge capacitor $C_{\rm u}$.
If

$$\begin{cases} V_{\mathrm{u}}(t) - V_{\mathrm{g}}(t) \leqslant |V_{\mathrm{thp}}| \\ V_{\mathrm{dd}} - V_{\mathrm{g}}(t) \geqslant |V_{\mathrm{thp}}| \end{cases}$$

or, equivalently,

$$V_{\mathrm{u}}(t) - |V_{\mathrm{thp}}| \leqslant V_{\mathrm{g}}(t) \leqslant V_{\mathrm{dd}} - |V_{\mathrm{thp}}|$$

are satisfied, we also have

$$i_{2} = \frac{1}{2} \beta_{\text{effp}} [V_{\text{dd}} - V_{\text{g}}(t) - |V_{\text{thp}}|]^{2} = C_{\text{u}} \frac{\mathrm{d}V_{\text{u}}(t)}{\mathrm{d}t} \tag{5}$$

where $V_{\rm thp}$ and $\beta_{\rm effp}$ are the equivalent threshold voltage and the equivalent transconductance of the pMOS transistors Mpc and Mc, respectively, and $V_{\rm u}(t)$ is the voltage at the node U.

We assume that the charging time of the node U equals the delay time $(\tau)$ from V(G) to V(P). Integrating Eq. (5) between zero and $\tau$, and substituting $V_{\rm g}(t)$ from Eq. (4), we obtain the peak voltage at the node U,

$$\begin{aligned} V_{\text{peak}} &= \frac{1}{2} \times \frac{\beta_{\text{effp}}}{C_{\text{u}}} \int_{0}^{\tau} \left[ V_{\text{dd}} - V_{\text{g}}(t) - |V_{\text{thp}}| \right]^{2} \mathrm{d}t \\ &= A (V_{\text{in}} - V_{\text{thn}})^{4} - B (V_{\text{in}} - V_{\text{thn}})^{2} + C = Y \end{aligned} \tag{6}$$

where

$$A = \frac{\beta_{\text{effp}} \tau^3}{24 C_{\text{u}}} \left(\frac{\beta_{\text{effn}}}{C_{\text{g}}}\right)^2 \tag{7}$$

$$B = \frac{\beta_{\text{effn}}\beta_{\text{effp}} |V_{\text{thp}}| \tau^2}{4C_{\text{u}}C_{\text{g}}} \tag{8}$$

and

$$C = \frac{\beta_{\text{effp}} |V_{\text{thp}}|^2 \tau}{2C_{\text{u}}} \tag{9}$$

The curve expressed by Eq. (6) is a quadratic function of $X = (V_{\rm in}-V_{\rm thn})^2$ with a minimum at coordinates $(X_0,Y_{\rm min})=(\frac{B}{2A},C-\frac{B^2}{4A})$, which lies above zero when $B^2 < 4AC$.
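As a numerical sanity check on Eqs. (4) to (9), the sketch below integrates Eq. (5) directly and compares the result with the closed form $AX^2 - BX + C$. Every parameter value in it (`BETA_EFFN`, `BETA_EFFP`, `C_G`, `C_U`, `V_THP`, `TAU`) is an illustrative assumption chosen only to exercise the formulas; none of them is taken from the paper's process or transistor sizing, and the function names are hypothetical.

```python
import math

# Illustrative (assumed) values, NOT taken from the paper.
V_DD = 1.8         # supply voltage, V
V_THN = 0.35       # equivalent nMOS threshold, V
V_THP = 0.40       # |V_thp|, V (assumed)
BETA_EFFN = 5e-3   # equivalent nMOS transconductance, A/V^2 (assumed)
BETA_EFFP = 1e-3   # equivalent pMOS transconductance, A/V^2 (assumed)
C_G = 10e-15       # parasitic capacitance at node G, F (assumed)
C_U = 10e-15       # parasitic capacitance at node U, F (assumed)
TAU = 100e-12      # delay-element time window, s (assumed)


def coefficients():
    """Closed-form coefficients A, B, C of Eqs. (7), (8), (9)."""
    a = BETA_EFFP * TAU**3 / (24 * C_U) * (BETA_EFFN / C_G) ** 2
    b = BETA_EFFN * BETA_EFFP * V_THP * TAU**2 / (4 * C_U * C_G)
    c = BETA_EFFP * V_THP**2 * TAU / (2 * C_U)
    return a, b, c


def v_peak_closed_form(v_in):
    """Right-hand side of Eq. (6) with X = (V_in - V_thn)^2."""
    a, b, c = coefficients()
    x = (v_in - V_THN) ** 2
    return a * x**2 - b * x + c


def v_peak_numeric(v_in, steps=1000):
    """Integrate Eq. (5) over [0, tau] by Simpson's rule (steps must be even)."""
    slope = 0.5 * BETA_EFFN / C_G * (v_in - V_THN) ** 2  # -dV_g/dt from Eq. (3)
    h = TAU / steps

    def integrand(t):
        v_g = V_DD - slope * t                 # Eq. (4)
        return (V_DD - v_g - V_THP) ** 2

    total = integrand(0.0) + integrand(TAU)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return 0.5 * BETA_EFFP / C_U * total * h / 3


if __name__ == "__main__":
    a, b, c = coefficients()
    print("X0    =", b / (2 * a))
    print("Ymin  =", c - b * b / (4 * a))
    print("ratio =", b * b / (4 * a * c))
    for v_in in (0.4, 0.5, 0.6, 0.7):
        print(v_in, v_peak_closed_form(v_in), v_peak_numeric(v_in))
```

Because the integrand is quadratic in $t$, Simpson's rule reproduces the closed form up to rounding. Substituting Eqs. (7) to (9) also shows that $B^2/(4AC) = 3/4$ for any parameter choice, so within this model the condition $B^2 < 4AC$ always holds.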
Substituting A, B, and C, we obtain

$$X_0 = \frac{3C_{\rm g} |V_{\text{thp}}|}{\tau \beta_{\text{effn}}}, \quad Y_{\text{min}} = \frac{\beta_{\text{effp}} \tau |V_{\text{thp}}|^2}{8C_{\rm u}} \tag{10}$$

where $Y_{\min}$ is the minimum amplitude of the narrow pulse that the circuit can generate and $X_0$ is the corresponding value of $X = (V_{\rm in} - V_{\rm thn})^2$. $Y_{\min}$ must be set to a small value and $X_0$ to a relatively large value to enhance the noise immunity. In other words, even a relatively large input signal then generates only a small, narrow pulse. This is why the proposed circuit has good noise immunity.

According to Eq. (10), the delay time $\tau$ of the delay element should be kept small to enhance the noise immunity. However, a small $Y_{\min}$ and a large $X_0$ (which imply small $\tau$, $\beta_{\text{effp}}$, and $\beta_{\text{effn}}$) sacrifice speed. Therefore, the parameters should be chosen carefully for a given design specification. Equation (10) gives a way to size the transistors and to manage the trade-off between noise immunity and operation speed.

## 4 Simulation and comparison

In this section, we present simulations performed in a 0.18 $\mu m$ CMOS process at 1.8 V/55 $^{\circ}{\rm C}$ to verify the model. In addition, 16 fan-in OR gates are implemented in the feedback keeper technique (FK), the conditional keeper technique (CK), and the proposed circuit (Fig. 4) for performance comparison.

Figure 7 shows the simulated curve of $V_{\text{peak}}$ versus $V_{\rm in}$ for the proposed circuit, which is separated into two sections by the curve $V_{\text{peak}} = V_{\text{in}}$. The part below the curve is the noise-insensitive section, and the part above it is the noise-sensitive section. The area enclosed by the noise-insensitive section and the curve $V_{\text{peak}} = V_{\text{in}}$ represents the ability of the circuit to tolerate noise disturbance. A larger area corresponds to better noise immunity.
Therefore, moving the curve towards the right and the bottom (increasing $X_0$ and decreasing $Y_{\min}$) is one way to enhance the noise immunity. The noise-insensitive section of the curve is shown in the inset of the figure, with the abscissa changed to $(V_{\rm in} - V_{\rm thn})^2$. The fitting result of the noise-insensitive section is

$$V_{\text{peak}} = 1876.76557 (V_{\text{in}} - V_{\text{thn}})^4 - 11.5222 (V_{\text{in}} - V_{\text{thn}})^2 + 0.05271 \tag{11}$$

which agrees well with the form of Eq. (6) and confirms the model.

Fig. 7 Simulation results for peak voltage of narrow pulses versus input voltages ($V_{\rm thn} \approx 0.35$ V)

In this paper, we use unity noise gain (UNG)<sup>[7]</sup> to characterize the noise immunity. The UNG is defined as the amplitude of an input noise signal, applied with all inputs tied together, that produces a signal of the same amplitude at the output:

$$\mathrm{UNG} = \{ V_{\text{noise}} ;\ V_{\text{noise}} = V_{\text{out}} \}$$

Table 1 shows the UNG comparisons for 16 fan-in OR gates implemented in the FK structure (Fig. 8), the CK structure (Fig. 9), and the proposed circuit (Fig. 4). The proposed circuit improves the UNG by 8.8% and 6.1% compared to the FK and CK structures, respectively, when the signal duration is fixed at 300 ps and the worst-case delay is the same. When the duration is increased to 500 ps (i.e. 1 GHz), the improvements are 12% and 8%, respectively. The proposed circuit thus has better noise immunity at the same operation speed.
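The percentages quoted above follow directly from the UNG values listed in Table 1. A minimal sketch of that arithmetic is given here, with the values transcribed from the table; the names `TABLE_1_UNG_MV`, `improvement_percent`, and `ung_improvements` are illustrative and do not come from the paper.

```python
# UNG values in mV transcribed from Table 1, keyed by signal duration in ps.
# Each entry is (FK structure, CK structure, proposed structure).
TABLE_1_UNG_MV = {
    300: (476, 488, 518),
    400: (475, 492, 528),
    500: (482, 500, 540),
}


def improvement_percent(proposed, baseline):
    """Relative UNG gain of the proposed circuit over a baseline, in percent."""
    return 100.0 * (proposed - baseline) / baseline


def ung_improvements():
    """Return {duration_ps: (gain over FK, gain over CK)}, rounded to 0.1%."""
    result = {}
    for duration_ps, (fk, ck, proposed) in TABLE_1_UNG_MV.items():
        result[duration_ps] = (
            round(improvement_percent(proposed, fk), 1),
            round(improvement_percent(proposed, ck), 1),
        )
    return result


if __name__ == "__main__":
    for duration_ps, (over_fk, over_ck) in ung_improvements().items():
        print(f"{duration_ps} ps: +{over_fk}% over FK, +{over_ck}% over CK")
```

The gain grows with signal duration for both baselines, which is the trend the text summarizes as better noise immunity at the same operation speed.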

Table 1 Noise-immunity comparisons at the same operation speed for 16 fan-in OR gates implemented in feedback keeper, conditional keeper, and our proposal

| 0.18 $\mu m$ CMOS at 1.8 V/55 °C | FK structure | CK structure | The proposed structure |
|---|---|---|---|
| UNG/mV (worst-case delay = 115 ps, signal duration = 300 ps) | 476 | 488 | 518 (improved by 8.8% over FK, by 6.1% over CK) |
| UNG/mV (worst-case delay = 119 ps, signal duration = 400 ps) | 475 | 492 | 528 (improved by 11.2% over FK, by 7.3% over CK) |
| UNG/mV (worst-case delay = 127 ps, signal duration = 500 ps) | 482 | 500 | 540 (improved by 12.0% over FK, by 8.0% over CK) |

Fig. 8 16 fan-in OR gate implemented in feedback keeper structure

Fig. 9 16 fan-in OR gate implemented in conditional keeper structure (proposed in Ref. [6])

Table 2 shows the worst-case delay at the same noise immunity at 1 GHz. The FK, the CK, and the proposed circuit are optimized for performance. According to Ref. [6], the relations $T_{\text{keeper}} > 1.2 T_{\text{max}}$ and the size of transistor PK2 $\gg$ the size of transistor PK1 must be satisfied for the CK technique in order to get the optimized results. The table shows that under a high UNG requirement, the proposed circuit achieves speed improvements of $1.6 \times$ and $1.4 \times$ over the FK and CK techniques, respectively. Under a low UNG requirement, the proposed circuit has about the same delay as the CK technique. However, the delay of the proposed circuit remains basically stable as the UNG increases, whereas the delays of the CK and FK techniques increase dramatically. The simulation results show that in applications with a high UNG requirement, the proposed circuit offers good noise immunity as well as good operation speed.
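The speed ratios and the delay growth discussed above can be recomputed from the entries of Table 2. The sketch below is only illustrative arithmetic on values transcribed from that table; the names `TABLE_2_DELAY_PS`, `speed_ratio`, and `delay_growth_percent` are hypothetical.

```python
# Worst-case delays in ps transcribed from Table 2, keyed by UNG in mV.
TABLE_2_DELAY_PS = {
    480: {"FK": 126, "CK": 115, "proposed": 112},
    500: {"FK": 147, "CK": 127, "proposed": 113},
    520: {"FK": 183, "CK": 153, "proposed": 116},
}


def speed_ratio(ung_mv, baseline):
    """Baseline delay divided by the proposed circuit's delay at a given UNG."""
    row = TABLE_2_DELAY_PS[ung_mv]
    return row[baseline] / row["proposed"]


def delay_growth_percent(structure):
    """Delay increase of one structure from UNG = 480 mV to UNG = 520 mV."""
    low = TABLE_2_DELAY_PS[480][structure]
    high = TABLE_2_DELAY_PS[520][structure]
    return 100.0 * (high - low) / low


if __name__ == "__main__":
    for baseline in ("FK", "CK"):
        print(baseline, "ratio at 520 mV:", round(speed_ratio(520, baseline), 2))
    for structure in ("FK", "CK", "proposed"):
        print(structure, "delay growth:", round(delay_growth_percent(structure), 1), "%")
```

With the tabulated delays, the ratio at UNG = 520 mV comes out near 1.58 against FK and near 1.32 against CK, while the delay of the proposed circuit grows by less than 4% across the three UNG settings, compared with roughly 45% for FK and 33% for CK.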

Table 2 Worst-case delay comparisons at the same UNG (@1 GHz)

| 0.18 $\mu m$ CMOS at 1.8 V/55 °C | FK structure | CK structure | The proposed structure |
|---|---|---|---|
| Worst-case delay/ps (UNG = 480 mV) | 126 | 115 | 112 |
| Worst-case delay/ps (UNG = 500 mV) | 147 | 127 | 113 |
| Worst-case delay/ps (UNG = 520 mV) | 183 | 153 | 116 |

## 5 Conclusion

In this paper, we have proposed a novel circuit for enhancing the noise immunity and improving the performance of wide fan-in dynamic circuits. In addition, we have presented an analytical model for transistor sizing that agrees well with the simulations. The simulations show that, compared to existing techniques, the proposed circuit has better noise immunity at the same speed and runs much faster at the same noise immunity. The simulations also show that the proposed technique is particularly well suited to wide fan-in applications and other conditions with high UNG requirements.

#### References

- [1] Rabaey J M, Chandrakasan A, Nikolic B. Digital integrated circuits: a design perspective. Beijing: Tsinghua University Press, 2003
- [2] Wang L, Krishnamurthy R, Soumyanath K, et al. An energy-efficient leakage-tolerant dynamic circuit technique. Proc Int ASIC/SOC Conf, 2000
- [3] Sun Xuguang, Mao Zhigang, Lai Fengchang. Design and implementation of a 64bit CMOS parallel adder with modified architecture. Chinese Journal of Semiconductors, 2003, 24(2):203 (in Chinese) [孙旭光, 毛志刚, 来逢昌. 改进结构的 64 位 CMOS 并行加法器设计与实现. 半导体学报, 2003, 24(2):203]
- [4] Kabbani A, Al-Khalili A J. A technique for dynamic CMOS noise immunity evaluation. IEEE Trans Circuits Syst I, 2003, 50(1):74
- [5] Ding L, Mazumder P. On circuit techniques to improve noise immunity of CMOS dynamic logic. IEEE Trans Very Large Scale Integration Systems, 2004, 12(9):910
- [6] Alvandpour A, Krishnamurthy R K, Soumyanath K, et al. A sub-130-nm conditional keeper technique. IEEE J Solid-State Circuits, 2002, 37(5):633
- [7] Mahmoodi-Meimand H, Roy K. Diode-footed Domino: a leakage-tolerant high fan-in dynamic circuit design style. IEEE Trans Circuits Syst I, 2004, 51(3):495

## High-Speed, Noise-Immune CMOS Dynamic Circuit Design

Lai Lianzhang, Tang Tingao, and Lin Yinyin

(State Key Laboratory of ASIC & System, Fudan University, Shanghai 200433, China)

**Abstract:** A wide fan-in dynamic logic circuit is proposed in which a narrow pulse generator drives the output stage, improving the noise immunity of the circuit while retaining the high speed of dynamic circuits. An analytical model of the circuit is presented to explain its noise-immunity characteristics and to guide the choice of transistor parameters. Simulations in a 0.18 $\mu m$ CMOS process at a supply voltage $V_{\rm dd}$ of 1.8 V and an ambient temperature of 55 °C show that, compared with two existing techniques, the new structure has better noise immunity at the same worst-case delay, with improvements of 12% and 8%, respectively, and higher speed at the same noise immunity, with speedups of 1.6 and 1.4 times, respectively.

**Key words:** domino circuit; noise immunity; high speed; charge keeper; narrow pulse

**EEACC:** 1265B

**CLC number:** TN431.2 **Document code:** A **Article ID:** 0253-4177(2006)06-1006-06