# Robustness aware high performance high fan-in domino OR logic design\*

Gong Na(宫娜)<sup>1,†</sup>, Wang Jinhui(汪金辉)<sup>2</sup>, Guo Baozeng(郭宝增)<sup>1</sup>, Wang Yongqing(王永清)<sup>1</sup>, Cao Xiaobing(曹晓兵)<sup>1</sup>, and Tian Xiuli(田秀丽)<sup>1</sup>

(1 College of Electronic and Informational Engineering, Hebei University, Baoding 071002, China)
(2 VLSI & System Laboratory, Beijing University of Technology, Beijing 100022, China)

**Abstract:** A novel technique using a keeper with a simultaneous low supply voltage and low body voltage is proposed to improve the overall performance of high fan-in OR gates without modifying the physical dimensions of the keeper. Simulation results of a 16-input domino OR gate using 45 nm CMOS technology show that the proposed technique could trade off between a high power/speed efficient operation and the robustness to noise effectively. Also, a Monte Carlo analysis indicates that the proposed domino OR gate is more robust to parameter variation compared to a conventional domino OR gate.

**Key words:** domino OR; robustness; power consumption; parameter variation

**DOI:** 10.1088/1674-4926/30/6/065005

**EEPACC:** 1130B; 1265

## 1. Introduction

High fan-in domino OR circuits and similar structures are widely used in high-speed chip design, for example in registers and cache array bit lines, because they yield simple and fast structures<sup>[1]</sup>. However, robustness is a major inherent concern for domino OR gates, because the parallel evaluation transistors can easily leak charge from the evaluation node<sup>[2–4]</sup>.

Conventionally, the robustness of a standard OR domino gate can be improved by a weak keeper, with little performance penalty. However, as the technology scales down below the 65 nm node, the scaling of the threshold voltage ( $V_{th}$ ) and the gate oxide thickness ( $t_{ox}$ ) results in an exponential increase of the leakage current and thus keepers must be upsized to offset the worst-case leakage through the pull-down network, which reduces the performance advantage of dynamic gates over other circuit structures. Also, continued device scaling makes the robustness problem worse due to the increase in cross-talk noise between adjacent wires<sup>[5]</sup>. Furthermore, the increasing process variations<sup>[6]</sup>, which are introduced during chip device fabrication steps, also have a significant effect on the robustness of high fan-in OR domino circuits.

Therefore, effective techniques are needed to improve the robustness of high fan-in OR domino gates. An effective forward body biased keeper circuit technique was proposed in Ref. [2] to enhance the robustness of domino logic, but it incurs a speed and power consumption overhead. Considering the stack effect of NMOS transistors, an alternative domino design technique was proposed in Ref. [3] to achieve a great improvement in performance and noise immunity, but this technique leads to a considerable power consumption penalty. The diode-footed domino technique in Ref. [4] exhibits a considerable improvement in robustness to noise compared to standard domino circuits. However, this technique, like the other techniques mentioned above, fails to consider robustness to parameter variations and therefore cannot solve the robustness problem completely.

A tradeoff therefore exists in high fan-in domino OR logic between power/speed-efficient operation and robustness to noise and parameter variation. In this paper, we propose a novel robustness-aware high fan-in OR domino design that effectively improves the overall performance.

## 2. Proposed high fan-in domino gates

Figure 1 shows the conventional domino OR structure and the proposed domino OR circuit, respectively. In the proposed design, the low supply voltage keeper technique (LSK) and the low body voltage keeper technique (LBK) are applied, where $V_{ddL} < V_{dd}$ and $V_b < V_{ddL}$.

Fig. 1. N-input OR domino gates: (a) Standard OR dominos; (b) Proposed OR dominos.

<sup>\*</sup> Project supported by the 2008 Science and Research Foundation of Hebei Education Department (No. 2008308).

<sup>†</sup> Corresponding author. Email: gongna\_china@yahoo.com.cn

Received 21 February 2009, revised manuscript received 12 March 2009

### 2.1. LSK

LSK provides two significant benefits over conventional domino OR gates. First, as indicated in Eq. (1), both switching and leakage components of the power consumption have a super-linear relation to the supply voltage ( $V_{dd}$ ), so lowering $V_{dd}$ can reduce the total power consumption effectively. Second, LSK could reduce the contention current provided by the keeper to charge the evaluation node while the pull-down NMOS network is attempting to discharge the evaluation node, which provides a significant improvement of the delay time compared to conventional domino logic. Therefore, from the high-speed/energy efficient operation perspective, $V_{ddL}$ should be set as small as possible.

$$P = P_{\text{switching}} + P_{\text{leak}} = \alpha f C_{\text{L}} V_{\text{dd}} V_{\text{swing}} + I_{\text{leak}} V_{\text{dd}}, \quad (1)$$

where $\alpha$, $f$, and $I_{\text{leak}}$ are the switching activity factor, the clock frequency, and the leakage current of the dynamic node of the gate, respectively, $C_{\text{L}}$ is the capacitive load at the evaluation node, and $V_{\text{swing}}$ is the voltage swing at that node.
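As a rough numerical illustration of Eq. (1), the following Python sketch evaluates the two power components. The clock frequency (1 GHz), load (8 fF), and supply (0.8 V) match the simulation setup of Section 3; the activity factor, the leakage current, and the full-swing assumption ($V_{\text{swing}} = V_{\text{dd}}$) are illustrative assumptions, not values from this paper. The sketch covers Eq. (1) only and does not model keeper contention.

```python
def total_power(alpha, f, c_load, v_dd, v_swing, i_leak):
    """Return (switching, leakage, total) power in watts, following Eq. (1)."""
    p_switching = alpha * f * c_load * v_dd * v_swing
    p_leak = i_leak * v_dd
    return p_switching, p_leak, p_switching + p_leak


# Illustrative operating point; ALPHA and I_LEAK are hypothetical values.
ALPHA = 0.5      # switching activity factor (assumption)
F_CLK = 1e9      # clock frequency in Hz
C_LOAD = 8e-15   # capacitive load in F
I_LEAK = 1e-6    # leakage current in A (assumption)

for v_dd in (0.8, 0.7):
    # Full-swing assumption: V_swing is taken equal to V_dd.
    p_sw, p_lk, p_tot = total_power(ALPHA, F_CLK, C_LOAD, v_dd, v_dd, I_LEAK)
    print(f"Vdd={v_dd:.1f} V: switching={p_sw:.3e} W, "
          f"leak={p_lk:.3e} W, total={p_tot:.3e} W")
```

Under the full-swing assumption the switching term scales with the square of the supply voltage, which is why even a modest supply reduction lowers the total noticeably.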

However, these benefits come at the cost of degraded noise immunity, which results in two problems. On the one hand, in the evaluation phase, if the inputs are all low, the high logic level of the evaluation node must be maintained by the keeper. A keeper with LSK is weaker at maintaining this level and may therefore cause a logic swing at the output, so an excessively low keeper supply voltage would induce a logic error. On the other hand, even if $V_{ddL}$ is large enough to keep the output swing within an acceptable level, the power consumption may actually increase due to the charging and discharging of the evaluation node during the unnecessary logic swing, as can be seen from Fig. 2.

### 2.2. LBK

To improve the degraded noise immunity induced by LSK, LBK is applied to our design. As the body voltage $V_b$ decreases below $V_{dd}$, $V_{th}$ of the keeper will be reduced (Eq. (2)<sup>[7]</sup>), increasing the contention current as compared to a zero body biased keeper with the same physical dimensions. A keeper with low body voltage, therefore, improves the noise immunity characteristics as compared to conventional domino logic with the same keeper physical size. However, contrary to LSK, LBK would increase the contention current due to the low $V_{th}$, thereby increasing both the power consumption and delay time of a domino OR gate.

$$V_{\rm th} = V_{\rm th0} + \gamma (\sqrt{|-2\phi_{\rm F} + V_{\rm sb}|} - \sqrt{|-2\phi_{\rm F}|}), \qquad (2)$$

where  $V_{\text{th0}}$  is the threshold voltage when  $V_{\text{sb}} = 0$ ,  $\gamma$  is the body effect coefficient,  $2\phi_{\text{F}}$  is the silicon surface potential at the onset of strong inversion, and  $V_{\text{sb}}$  is the source to body voltage.
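To make the trend in Eq. (2) concrete, the sketch below evaluates the threshold voltage for a few source-to-body voltages. The default values of `v_th0`, `gamma`, and `phi_f` are hypothetical, and the sign convention is the usual NMOS one (negative $\phi_{\text{F}}$, so that $-2\phi_{\text{F}}$ is positive); for the PMOS keeper the polarities are mirrored, but the magnitude of the threshold follows the same trend.

```python
import math


def threshold_voltage(v_sb, v_th0=0.4, gamma=0.4, phi_f=-0.3):
    """Eq. (2): threshold voltage (V) versus source-to-body voltage v_sb (V).

    v_th0, gamma (in V**0.5), and phi_f are illustrative assumptions.
    """
    return v_th0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb))
                            - math.sqrt(abs(-2 * phi_f)))


# Positive v_sb is reverse body bias, negative v_sb is forward body bias.
for v_sb in (0.2, 0.0, -0.2):
    print(f"V_sb={v_sb:+.1f} V -> V_th={threshold_voltage(v_sb):.3f} V")
```

With these assumed parameters a forward bias (negative `v_sb`) gives a threshold below `v_th0`, which is the mechanism LBK relies on to strengthen the keeper.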

From the above analysis, we can see that utilizing LSK in conjunction with LBK (LSBK) has the potential to improve the overall performance of high fan-in OR domino gates by (1) first reducing the supply voltage of the keeper to improve the power and speed characteristics and (2) then applying LBK to enhance the robustness to noise.

Furthermore, LSBK can compensate for parameter variations. In this scheme,  $V_b$  is lowered below  $V_{ddL}$ ; therefore, the keeper is forward body biased, known as FBB. FBB has the desirable result of a reduction in  $V_{th}$  roll-off and DIBL, thereby reducing the sensitivity to a critical-dimension variation.

In the next section, we investigate the effectiveness of LSK, LBK and LSBK for improving the overall performance of high fan-in OR domino gates.

## 3. Simulation results

To evaluate the effectiveness of the proposed technique, delay, power consumption, noise immunity, and robustness to parameter variations were measured for the proposed 16-input domino OR gate and compared with those of the conventional OR domino gate. Each domino gate drives a capacitive load of 8 fF. HSPICE simulation results were obtained for CMOS 45 nm BSIM4 models<sup>[8]</sup> with a power supply of 0.8 V. The simulations were performed at 110 °C, where power consumption, delay, and noise immunity are all more critical than at low temperatures. All OR gates were tuned to operate at a 1 GHz clock frequency, and the keeper to pull-down network equivalent transistor width ratio (KPR)<sup>[9]</sup> was two for all circuits.

Noise immunity is defined as the signal amplitude at the inputs that induces a drop of 10% of $V_{dd}$ in the voltage at the output of the domino OR gate. The noise signal is assumed to be a wave with 500 ps duration and 80% duty cycle<sup>[7]</sup>.
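The noise-immunity definition above amounts to a one-dimensional search: raise the input noise amplitude until the output drop reaches 10% of $V_{dd}$. A minimal sketch using bisection follows, assuming the output drop grows monotonically with the noise amplitude. The function `toy_output_drop` is a hypothetical stand-in for a circuit simulation and is not the HSPICE setup used in this work.

```python
def noise_immunity(output_drop, v_dd=0.8, margin=0.10, tol=1e-4):
    """Smallest input noise amplitude (V) whose output drop reaches margin * v_dd.

    output_drop(amplitude) must be non-decreasing in amplitude.
    Returns v_dd if the margin is never reached within [0, v_dd].
    """
    limit = margin * v_dd
    lo, hi = 0.0, v_dd
    if output_drop(hi) < limit:
        return hi
    # Invariant: output_drop(hi) >= limit; shrink the bracket until it is tight.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if output_drop(mid) >= limit:
            hi = mid
        else:
            lo = mid
    return hi


def toy_output_drop(amplitude):
    # Hypothetical stand-in: no drop below 0.3 V of noise, then linear growth.
    return max(0.0, amplitude - 0.3)


print(f"noise immunity ~ {noise_immunity(toy_output_drop):.3f} V")
```

In practice each call to `output_drop` would be one transient simulation, so bisection keeps the number of runs logarithmic in the required resolution.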

### 3.1. Effectiveness of LSK

Figure 2 shows how the output waveform of a 16-input domino OR gate varies with the supply voltage of the keeper. In our simulation, we assume a logic margin of 10% of $V_{dd}$ at the domino output; that is, as long as the logic swing at the output is below 80 mV, the output is reliable. It can be seen that, to keep the logic reliable, $V_{ddL}$ must be larger than 0.5 V. The effectiveness of LSK in improving the power consumption and speed characteristics is shown in Fig. 3. The delay time decreases with decreasing $V_{ddL}$; in particular, it drops quickly as $V_{ddL}$ decreases from 0.8 to 0.7 V. As can also be seen from Fig. 3, once $V_{ddL}$ reaches 0.7 V, any further decrease actually increases the power consumption because of the unnecessary logic swing. To achieve a low-power and high-speed design, therefore, the minimum $V_{ddL}$ in our design is 0.7 V. However, when $V_{ddL}$ varies from 0.8 to 0.7 V, the noise immunity degrades greatly, as shown in Fig. 4(a).

Fig. 2. Transient output of a 16-input domino gate with LSK.

Fig. 3. Power and delay of a 16-input OR gate with LSK.

### 3.2. Effectiveness of LBK

Table 1 lists the simulation results of a 16-input domino OR gate with LBK, which show an obvious power and delay penalty when $V_b$ is reduced. However, the noise immunity is enhanced with decreasing $V_b$, as shown in Fig. 4(b). It can also be seen that once $V_b$ reaches 0.2 V, decreasing it further does not improve the noise immunity any more. This is because, when $V_b$ is below 0.2 V, the keeper is strongly forward biased and produces enough drain-to-body diode current to oppose the drain current of the keeper, thereby lowering the voltage of the evaluation node and ending the enhancement of the noise immunity<sup>[2]</sup>.

Based on the simulation results, we can conclude that LSK can be applied in conjunction with LBK to improve the overall performance of high fan-in OR domino gates.

Fig. 4. Noise immunity of a 16-input OR gate: (a) With LSK; (b) With LBK.
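The power and delay penalty of LBK can be summarized by multiplying the two rows of Table 1 to obtain the power-delay product at each body voltage. The sketch below does only that arithmetic; the numbers are copied from Table 1, while the dictionary layout and function name are illustrative choices.

```python
# Table 1 data: keeper body voltage V_b (V) -> (power in W, delay in s).
TABLE_1 = {
    0.1: (4.78e-3, 4.73e-10),
    0.2: (8.29e-4, 4.94e-10),
    0.3: (7.78e-5, 4.98e-10),
    0.4: (3.14e-5, 4.94e-10),
    0.5: (2.88e-5, 4.87e-10),
    0.6: (3.02e-5, 4.97e-10),
    0.7: (2.98e-5, 4.92e-10),
    0.8: (2.94e-5, 4.86e-10),
}


def power_delay_product(power, delay):
    """Power-delay product in joules."""
    return power * delay


pdp = {v_b: power_delay_product(p, d) for v_b, (p, d) in TABLE_1.items()}
for v_b, value in pdp.items():
    print(f"V_b={v_b:.1f} V: PDP={value:.3e} J")

# Body voltage with the smallest power-delay product in Table 1.
best_v_b = min(pdp, key=pdp.get)
print(f"lowest PDP at V_b={best_v_b:.1f} V")
```

On these figures the product stays between roughly $1.4 \times 10^{-14}$ and $1.6 \times 10^{-14}$ J for $V_b \geq 0.4$ V and is more than two orders of magnitude larger at $V_b = 0.1$ V, driven almost entirely by the power row.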

### 3.3. Proposed domino OR gate with LSBK

To better investigate the tradeoff between the power consumption, the delay time, and the noise immunity, we define the overall performance (OP) of OR dominos as

$$\text{OP} = \frac{\text{Power} \times \text{Delay}}{\text{Noise immunity}} = \frac{\text{PDP}}{\text{Noise immunity}}. \quad (3)$$

The circuit achieves its optimal overall performance when the OP value is minimized. We simulate the OP value of a 16-input domino OR gate with LSBK by varying $V_{ddL}$ from 0.7 to 0.8 V and $V_b$ from 0.1 to 0.8 V. The results are shown in Fig. 5, and the optimal condition ( $V_{ddL}$ = 0.74 V and $V_b$ = 0.6 V), which gives the minimum OP value, is obtained. As can also be seen from Fig. 5, when $V_b$ is less than 0.2 V, the OP value increases greatly as $V_b$ decreases. This is because the significantly decreasing noise immunity (Fig. 4(b)) is the decisive factor as compared to the power consumption and the delay time.
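The optimization described above is a grid search over ($V_{ddL}$, $V_b$) for the smallest value of Eq. (3). A minimal sketch follows; the entries of `example_results` are made-up placeholders chosen only so that the example runs and are not measurements from this work.

```python
def overall_performance(power, delay, noise_immunity):
    """Eq. (3): OP = (power * delay) / noise_immunity; lower is better."""
    return power * delay / noise_immunity


def best_operating_point(results):
    """Return the (v_ddl, v_b) key with the smallest OP value.

    results maps (v_ddl, v_b) -> (power_W, delay_s, noise_immunity_V).
    """
    return min(results, key=lambda point: overall_performance(*results[point]))


# Hypothetical placeholder data for illustration only (not from the paper).
example_results = {
    (0.70, 0.2): (2.6e-5, 4.5e-10, 0.20),
    (0.75, 0.5): (2.7e-5, 4.6e-10, 0.30),
    (0.80, 0.8): (2.9e-5, 4.9e-10, 0.31),
}

print(best_operating_point(example_results))
```

Because noise immunity sits in the denominator, a point with a slightly higher PDP can still win if its noise immunity is sufficiently better, which is the tradeoff the OP metric is meant to capture.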

Fig. 5. OP value of a 16-input domino OR gate with LSBK.

Table 1. Simulation results of a 16-input domino OR gate with LBK.

| $V_b$ (V) | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|-----------|-----|-----|-----|-----|-----|-----|-----|-----|
| Power (W) | $4.78 \times 10^{-3}$ | $8.29 \times 10^{-4}$ | $7.78 \times 10^{-5}$ | $3.14 \times 10^{-5}$ | $2.88 \times 10^{-5}$ | $3.02 \times 10^{-5}$ | $2.98 \times 10^{-5}$ | $2.94 \times 10^{-5}$ |
| Delay (s) | $4.73 \times 10^{-10}$ | $4.94 \times 10^{-10}$ | $4.98 \times 10^{-10}$ | $4.94 \times 10^{-10}$ | $4.87 \times 10^{-10}$ | $4.97 \times 10^{-10}$ | $4.92 \times 10^{-10}$ | $4.86 \times 10^{-10}$ |

We also analyze the robustness of the proposed design with respect to process variations. In the experiment, 1000 Monte Carlo simulations are run to evaluate the impact of variations in the most important parameters: gate length ( $L_{gate}$ ), channel doping concentration ( $N_{ch}$ ), and $t_{ox}$. Each parameter is assumed to follow a Gaussian statistical distribution with a three sigma ( $3\sigma$ ) variation of 10%<sup>[10, 11]</sup>.
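The sampling step of such a Monte Carlo run can be sketched as follows: a $3\sigma$ variation of 10% means each parameter is drawn from a Gaussian whose standard deviation is one third of 10% of its nominal value. The nominal values below, the independence of the three parameters, and all names are assumptions made for illustration; they are not taken from the model files used in this work.

```python
import random


def sample_parameters(nominal, three_sigma_fraction=0.10, n_samples=1000, seed=0):
    """Draw Gaussian samples for each parameter with 3*sigma = fraction * nominal.

    Parameters are sampled independently (an assumption of this sketch).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        samples.append({
            name: rng.gauss(value, value * three_sigma_fraction / 3.0)
            for name, value in nominal.items()
        })
    return samples


# Nominal values are hypothetical and serve only to make the example run.
NOMINAL = {
    "l_gate_nm": 45.0,    # gate length (assumption)
    "n_ch_cm3": 3.0e18,   # channel doping concentration (assumption)
    "t_ox_nm": 1.1,       # gate oxide thickness (assumption)
}

runs = sample_parameters(NOMINAL)
inside = sum(abs(r["l_gate_nm"] - 45.0) <= 4.5 for r in runs)
print(f"{inside} of {len(runs)} gate-length samples lie within +/-10% of nominal")
```

Each sampled dictionary would then parameterize one circuit simulation, and the resulting PDP values form the distribution that is analyzed next.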

Figure 6 shows the power-delay product (PDP) distribution curves of 16-input domino OR gates with the conventional technique and with the proposed technique. The curves indicate that the proposed technique yields a lower PDP in the majority of the samples under process parameter fluctuations, which is consistent with the analysis in the normal corner.

To evaluate the impact of the process variation on the PDP of the proposed and the conventional 16-input domino OR gates, we compare the parameter uncertainty (SD/A, the ratio of the standard deviation to the average)<sup>[12]</sup>. The uncertainty of the PDP for the proposed OR gate (SD/A = $3.96 \times 10^{-11}/7.1 \times 10^{-10} = 0.056$ ) is much less than that for the conventional gate (SD/A = $6.81 \times 10^{-11}/8.68 \times 10^{-10} = 0.078$ ), as shown in Fig. 6. Therefore, the proposed domino OR gate is more robust to parameter variations than the conventional OR gate.
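The two SD/A ratios quoted above can be checked directly, and the same ratio can be computed from raw Monte Carlo samples. In the sketch below the function name is illustrative, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics


def uncertainty(samples):
    """Return SD/A: population standard deviation divided by the mean."""
    return statistics.pstdev(samples) / statistics.mean(samples)


# Ratios recomputed from the SD and A values quoted in the text.
proposed = 3.96e-11 / 7.1e-10
conventional = 6.81e-11 / 8.68e-10

print(f"proposed: {proposed:.3f}, conventional: {conventional:.3f}")
print(f"relative reduction: {1 - proposed / conventional:.1%}")
```

The recomputed ratios round to the quoted 0.056 and 0.078, so the PDP uncertainty of the proposed gate is about 29% lower than that of the conventional gate.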

Fig. 6. PDP distribution curves of two 16-input domino OR gates.

## 4. Conclusion

Design tradeoffs among power consumption, speed, and robustness exist in high fan-in OR domino gates. In this paper, a novel design combining the low supply voltage keeper technique and the low body voltage keeper technique is proposed to address this dilemma. Simulation results show that the proposed technique improves the overall performance, taking high fan-in domino OR logic to a new level of high-speed, low-power, and robust operation. Thus, high fan-in domino OR logic may still be employed in deep submicron technologies, where robustness to noise and process variations is becoming an increasingly limiting issue. The significant improvement, however, comes at the cost of additional complexity, as multiple supply voltages and bias generators are necessary, as well as a more complex algorithm for determining the optimum set of supply voltages and the bias voltage of the keeper.

## References

- [1] Rusu S, Singer G. The first IA-64 microprocessor. IEEE J Solid-State Circuits, 2000, 35(11): 1539
- [2] Kursun V, Friedman E G. Forward body biased keeper for enhanced noise immunity in domino logic circuits. Proc IEEE Int Symp Circuits Syst, 2004: 917
- [3] Elgharbawy W, Golconda P, Bayoumi M. Noise-tolerant high fan-in dynamic CMOS circuit design. GLSVLSI, 2005: 134
- [4] Mahmoodi-Meimand H, Roy K. Diode-footed domino: a leakage-tolerant high fan-in dynamic circuit design style. IEEE Trans Circuits Syst I, 2004, 51(3): 495
- [5] Kumar R. Interconnect and noise immunity design for the Pentium 4 processor. Intel Technology Journal, Q1 2001 Issue, Feb. 2001
- [6] International Technology Roadmap for Semiconductors, 2008, http://public.itrs.net
- [7] Wang Jinhui, Gong Na, Geng Shuqin, et al. PN mixed pulldown network Domino XOR gate design in 45 nm technology. Journal of Semiconductors, 2008, 29(12): 2443
- [8] Predictive Technology Model (PTM). http://www.eas.asu.edu/~ptm
- [9] Kursun V, Friedman E G. Multi-voltage CMOS circuit design. John Wiley & Sons Ltd, 2006
- [10] Gong Na, Guo Baozeng, Lou Jianzhong, et al. Analysis and optimization of leakage current characteristics in sub-65 nm dual V<sub>t</sub> footed domino circuits. Microelectronics Journal, 2008, 39(9): 1149
- [11] Liu Z, Kursun V. Leakage power characteristics of dynamic circuits in nanometer CMOS technologies. IEEE Trans Circuits Syst II, 2006, 53(8): 692
- [12] Tsai Y F, Vijaykrishnan N, Xie Y, et al. Influence of leakage reduction techniques on delay/leakage uncertainty. Proceedings of VLSI Design, 2005: 374