## Complementary Pass-Transistor Adiabatic Logic Circuit Using Three-Phase Power Supply

Hu Jianping, Wu Yangbo and Zhang Weiqiang

(Faculty of Information Science and Technology, Ningbo University, Ningbo 315211, China)

Abstract: A new low-power quasi-adiabatic logic family, complementary pass-transistor adiabatic logic (CPAL), is presented. CPAL circuits are driven by a new three-phase power clock, and their non-adiabatic loss on output loads is effectively reduced by using complementary pass-transistor logic and transmission gates. Furthermore, the energy consumption can be minimized by choosing the optimal size of the bootstrapped nMOS transistors, which gives more efficient energy transfer and recovery. A three-phase power supply generator with a small control logic circuit and a single inductor is proposed. An 8-bit adder based on CPAL is designed and verified. With MOSIS 0.25μm CMOS technology, the CPAL adder consumes only 35% of the energy dissipated by a 2N-2N2P adder and about 50% of that dissipated by a PFAL adder for clock rates ranging from 50 to 200 MHz.

Key words: complementary pass-transistor logic; adiabatic logic; low-power; 3-phase power-clock generator

EEACC: 1265A; 2570D; 2560

CLC number: TN432 Article ID: 0253-4177(2004)08-0918-07 Document code: A

## 1 Introduction

Demands for low-power circuits have motivated VLSI designers to explore new design approaches. Adiabatic logic, which uses AC power supplies to recycle the energy stored on node capacitances, is an attractive approach to low power [1~9]. Adiabatic circuits can be classified into two classes: fully adiabatic and quasi-adiabatic logic. The former is much more complex than the latter. For example, the complexity of a 16-bit carry-lookahead adder based on fully reversible logic is about 32 times that of a static CMOS one<sup>[1]</sup>. Quasi-adiabatic circuits, such as ECRL, 2N-2N2P, PFAL, and NERL, have relatively simple architectures [3~8]. ECRL, 2N-2N2P, and PFAL use cross-coupled pMOS transistors for energy recovery. Thus they suffer non-adiabatic loss on the output loads, and their energy losses depend strongly on the output load capacitances. In NERL[8], although the charge on the output loads can be recovered well, the non-adiabatic loss of the internal nodes is considerable because the output nMOS transistor must be sufficiently large to maintain a high bootstrapped node voltage<sup>[9]</sup>.

Based on our previous research, we propose a complementary pass-transistor adiabatic logic (CPAL) using a new three-phase clocking scheme. The non-adiabatic energy loss on output loads is reduced by using CPL (complementary pass-transistor logic) for evaluation and transmission gates for energy recovery. Furthermore, we explain how to minimize the total energy dissipation. A three-phase power-clock generator is also presented. With 0.25μm CMOS technology, we confirm that the CPAL circuit consumes substantially less energy than the other logic circuits considered.

<sup>\*</sup> Project supported by the National Natural Science Foundation of China (No. 60273093) and the Scientific Research Fund of Zhejiang Provincial Education Department (No. 20010238)

Hu Jianping, male, born in 1961, associate professor. His current research interests focus on low-power digital integrated circuit design.

Wu Yangbo, male, born in 1972, master's candidate. His current research interests focus on ASIC design.

Zhang Weiqiang, male, born in 1963, associate professor. His current research interests focus on VLSI design.

## 2 Operation of CPAL

The basic structure of the CPAL buffer is shown in Fig. 1. It is a dual-rail logic consisting of a CPL network (N1~N4) and a pair of transmission gates (N5, P1 and N6, P2). The clamp transistors (N7 and N8) hold the undriven output node at ground. Cascaded CPAL gates are driven by the three-phase power clock, as shown in Fig. 2. A clocking rule must be followed to form a chain of logic circuits: each clock lags the preceding one by 120°, giving a complete pipeline operation.



Fig. 1 Schematic of the CPAL buffer



Fig. 2 CPAL buffer chain and its three-phase power clock

The simulated waveforms of the CPAL buffer are shown in Fig. 3. They were obtained by propagating a periodic sequence "1010..." through the buffer chain. The power-clock frequency is 100 MHz and the peak voltage $V_{DD}$ is 2.5 V. The device sizes (W/L) of the nMOS and pMOS transistors are $3\lambda/2\lambda$ and $9\lambda/2\lambda$, respectively, with $\lambda = 0.12\mu m$.



Fig. 3 Simulation waveforms of the CPAL buffer

Referring to the schematic in Fig. 1 and the waveforms in Fig. 3, the operation of the CPAL buffer can be summarized as follows. During $T_1$, node Y is clamped to ground, while node X is pre-charged to about $V_{\rm DD}-V_{\rm TN}$, where $V_{\rm TN}$ is the threshold voltage of the nMOS transistor. During $T_3$, node OUT is charged through the transmission gate (N5, P1) as the clock $\phi$ rises. Although node X has lost a threshold voltage during this evaluation, the pMOS transistor P1 compensates for the threshold loss, so a full swing is obtained during $T_3$. During $T_6$, as $\phi$ falls from $V_{\rm DD}$ to ground, the charge on node OUT is recovered through N5 and P1.

Since node OUT is evaluated during $T_3$ and holds its state during $T_4$, the input IN (driven by the power clock of the previous stage) may fall to ground during $T_4$. Similarly, the output of the present stage is used to evaluate the next stage during $T_5$. During $T_6$, because the output of the next stage has been evaluated and latched, the power clock of the present stage may fall to ground and the charge on the output nodes is recovered.
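The phase relationships described above can be tabulated with a short sketch. It assumes, as an inference from the $T_1 \sim T_6$ description rather than from a figure in this paper, that each power clock follows a six-interval trapezoidal pattern (rise, high, high, fall, low, low) and that each stage lags the previous one by two intervals (120°). The names `RAIL_PATTERN` and `rail_state` are hypothetical.

```python
# Six-interval trapezoidal power-clock pattern, starting at the rising edge.
# Assumption: inferred from the T1..T6 description of the CPAL buffer.
RAIL_PATTERN = ["rise", "high", "high", "fall", "low", "low"]

def rail_state(stage: int, interval: int) -> str:
    """State of the power clock of pipeline stage `stage` during interval
    T(interval+1). Stage 0 rises during T1; each later stage lags by two
    intervals (120 degrees), so the pattern repeats every three stages."""
    offset = (interval - 2 * stage) % 6
    return RAIL_PATTERN[offset]

if __name__ == "__main__":
    names = ["IN (previous stage)", "phi (present stage)", "next stage"]
    for stage, name in enumerate(names):
        row = [rail_state(stage, t) for t in range(6)]
        print(f"{name:20s}", " ".join(f"{s:>4s}" for s in row))
```

Reading down any column of the printed table shows exactly one rail rising or falling in each interval, while the rail that feeds the stage being evaluated is held high.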

During $T_5$ and $T_6$, N1 and N2 are off, so node X is in a high-impedance state. The voltage of node X can therefore be bootstrapped above $V_{DD}-V_{TN}$ through the gate-to-channel capacitance of N5 as $\phi$ changes. When $\phi$ rises from 0 to $V_{DD}$, the voltage of node X increases by $\Delta V$, given by

$$\Delta V = \frac{C_{\rm G}}{C_{\rm D1} + C_{\rm D2} + C_{\rm W} + C_{\rm G}} \times V_{\rm DD} \qquad (1)$$

where $C_G = WLC_{OX}$ is the gate-to-channel capacitance of N5 (or N6), $W$ and $L$ are the channel width and length of N5 (or N6), $C_{OX}$ is the gate-oxide capacitance per unit area, $C_{D1}$ and $C_{D2}$ are the diffusion capacitances of N1 and N2, and $C_{W}$ is the wiring capacitance. According to Eq. (1), $\Delta V$ increases with the channel width of N5 and N6. A larger $\Delta V$ reduces the adiabatic loss because it lowers the turn-on resistance of the output-driving nMOS transistors (N5 and N6). Figure 4 shows the waveform of node X when the device size of N5 and N6 is $18\lambda/2\lambda$. The simulated $\Delta V$ of the CPAL buffer for various channel widths of N5 and N6 is plotted in Fig. 5.
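Eq. (1) can be evaluated directly to see how $\Delta V$ saturates toward $V_{DD}$ as $C_G$ grows. This is a sketch only: $\lambda$, $L$, and $V_{DD}$ come from the text, but the oxide capacitance `COX_FF_PER_UM2` and the lumped parasitic `C_PARASITIC_FF` ($C_{D1}+C_{D2}+C_W$) are assumed values, not numbers extracted from the MOSIS process.

```python
# Bootstrap voltage step of node X from Eq. (1).
# COX_FF_PER_UM2 and C_PARASITIC_FF are illustrative assumptions.
LAMBDA_UM = 0.12          # lambda from the paper, in micrometres
L_UM = 2 * LAMBDA_UM      # channel length of N5/N6 (0.24 um)
VDD = 2.5                 # peak power-clock voltage, volts
COX_FF_PER_UM2 = 6.0      # assumed gate-oxide capacitance per unit area, fF/um^2
C_PARASITIC_FF = 1.5      # assumed C_D1 + C_D2 + C_W at node X, fF

def delta_v(width_um: float) -> float:
    """Eq. (1): Delta V = C_G / (C_D1 + C_D2 + C_W + C_G) * V_DD."""
    c_g = width_um * L_UM * COX_FF_PER_UM2   # C_G = W * L * C_OX, in fF
    return c_g / (C_PARASITIC_FF + c_g) * VDD

if __name__ == "__main__":
    for w_lambda in (3, 6, 9, 12, 15, 18, 21):
        w_um = w_lambda * LAMBDA_UM
        print(f"W = {w_lambda:2d} lambda ({w_um:.2f} um): dV = {delta_v(w_um):.2f} V")
```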



Fig. 4 Simulation waveform of the internal node X when the device size of N5 and N6 is  $18\lambda/2\lambda(\lambda=0.12\mu\mathrm{m})$ 



Fig. 5 Simulated $\Delta V$ of the CPAL buffer versus channel width of N5 and N6. The channel length L is 0.24 μm.

## 3 Energy dissipation and optimization

Energy is dissipated whenever the nodes of the CPAL buffer are charged or discharged. First, we analyze the energy dissipation per cycle of node X (or Y). Assume that the charging current is $i$ during $T_1$ and that the drain-to-source voltage of N1 stays almost constant at $V_{\text{TN}}$. The energy consumed in the nMOS transistor N1 (or N4) is then approximately

$$E_{X,\text{pre-charge}} = \int_{0}^{T_{1}} V_{TN}\, i\, dt = \int_{0}^{C_{X}(V_{DD} - V_{TN})} V_{TN}\, dq = C_{X}(V_{DD} - V_{TN})V_{TN} \tag{2}$$

where $C_X = C_{D1} + C_{D2} + C_W + WLC_{OX}$ is the capacitance of node X (or Y), and $W$ and $L$ are the channel width and length of N5 (or N6). While node X is bootstrapped, N1 and N2 are off and isolate it, so the energy dissipation during $T_{3} \sim T_{6}$ can be ignored. During $T_{7}$, as shown in Fig. 4, node X is discharged to ground, and the resulting non-adiabatic energy loss is

$$E_{\rm X, \, discharge} = \frac{1}{2} C_{\rm X} (V_{\rm DD} - V_{\rm TN})^2 \tag{3}$$

Therefore, the energy dissipation per cycle of the internal nodes can be written as

$$E_{X} = C_{X}(V_{DD} - V_{TN})V_{TN} + \frac{1}{2}C_{X}(V_{DD} - V_{TN})^{2} \tag{4}$$
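Factoring $C_X(V_{DD}-V_{TN})$ out of Eq. (4) gives a more compact form (an algebraic rearrangement, not an additional result):

$$E_X = C_X(V_{DD}-V_{TN})\left[V_{TN} + \frac{1}{2}(V_{DD}-V_{TN})\right] = \frac{1}{2}C_X(V_{DD}-V_{TN})(V_{DD}+V_{TN}) = \frac{1}{2}C_X\left(V_{DD}^2 - V_{TN}^2\right)$$

Since $C_X = C_{D1} + C_{D2} + C_W + WLC_{OX}$, this loss grows linearly with the channel width $W$ of N5 and N6.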

When the output node OUT (or OUTb) is charged or discharged, the energy dissipation per cycle can be represented as

$$E_{\text{output}} = 2\left[\frac{RC_L}{T}\right]C_L V_{DD}^2 \tag{5}$$

where $C_L$ is the load capacitance of the CPAL buffer, $T$ is the transition time of the power clock, and $R \propto 1/W$ is the turn-on resistance of the transmission gate (N5, P1 or N6, P2). The factor of 2 accounts for one charging and one recovery transition per cycle. The total energy dissipation per cycle of the CPAL buffer can then be expressed as

$$E_{\text{total}} = E_{X} + E_{\text{output}} \tag{6}$$
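The trade-off in Eq. (6) can be explored numerically by sweeping the width of N5 and N6 over multiples of $\lambda$. The sketch below implements Eqs. (4) and (5) as written, with $R = $ `R_TIMES_W`$/W$; every device constant (`VTN`, `COX`, `C_PAR`, `R_TIMES_W`) and the choice of `T_TRANS` as one sixth of a 100 MHz period are assumptions, so only the qualitative trend is meaningful.

```python
# Total energy per cycle of a CPAL buffer, Eqs. (4)-(6), swept over the
# channel width of N5/N6. All device constants are illustrative assumptions.
LAMBDA_UM = 0.12
L_UM = 0.24
VDD = 2.5
VTN = 0.5                    # assumed nMOS threshold voltage, volts
COX = 6.0e-15                # assumed gate-oxide capacitance, F per um^2
C_PAR = 1.5e-15              # assumed C_D1 + C_D2 + C_W, farads
R_TIMES_W = 4.0e3            # assumed TG on-resistance times width, ohm*um
T_TRANS = 1.0 / (6 * 100e6)  # assumed ramp time: one sixth of a 100 MHz period

def energy_internal(w_um: float) -> float:
    """Eq. (4): pre-charge loss plus non-adiabatic discharge of node X."""
    c_x = C_PAR + w_um * L_UM * COX
    return c_x * (VDD - VTN) * VTN + 0.5 * c_x * (VDD - VTN) ** 2

def energy_output(w_um: float, c_load: float) -> float:
    """Eq. (5): adiabatic charge and recovery loss, with R proportional to 1/W."""
    r_on = R_TIMES_W / w_um
    return 2 * (r_on * c_load / T_TRANS) * c_load * VDD ** 2

def optimal_width_lambda(c_load: float, max_lambda: int = 60) -> int:
    """Grid search over integer multiples of lambda for the Eq. (6) minimum."""
    def total(w_lambda: int) -> float:
        w_um = w_lambda * LAMBDA_UM
        return energy_internal(w_um) + energy_output(w_um, c_load)
    return min(range(3, max_lambda + 1), key=total)

if __name__ == "__main__":
    for c_load_ff in (5, 20, 35, 50):
        w_opt = optimal_width_lambda(c_load_ff * 1e-15)
        print(f"C_L = {c_load_ff:2d} fF -> W_opt = {w_opt} lambda")
```

With these placeholder constants the absolute optima differ from the simulated ones, but the optimum still moves to wider devices as $C_L$ grows.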

According to Eq. (4), $E_X$ can be reduced by reducing the device sizes of N5 and N6, whereas according to Eq. (5), $E_{\text{output}}$ can be reduced by increasing their channel widths. The sizes of N5 and N6 can therefore be chosen to minimize the total energy dissipation. Figure 6 shows the simulated energy dissipation of the CPAL buffer for various channel widths of N5 and N6. All other transistors use $W/L = 0.36 \mu \text{m}/0.24 \mu \text{m}$ (nMOS) and $W/L = 1.08 \mu \text{m}/0.24 \mu \text{m}$ (pMOS).
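Under first-order assumptions the optimum can also be written in closed form. Let $C_0 = C_{D1} + C_{D2} + C_W$, so that $C_X = C_0 + WLC_{OX}$, and let $R = k/W$ for a proportionality constant $k$ (a hypothetical model parameter). Factoring Eq. (4) gives $E_X = \frac{1}{2}C_X(V_{DD}^2 - V_{TN}^2)$, and substituting into Eq. (6):

$$E_{\text{total}}(W) = \frac{1}{2}\left(C_0 + WLC_{OX}\right)\left(V_{DD}^2 - V_{TN}^2\right) + \frac{2kC_L^2V_{DD}^2}{WT}$$

$$\frac{dE_{\text{total}}}{dW} = \frac{1}{2}LC_{OX}\left(V_{DD}^2 - V_{TN}^2\right) - \frac{2kC_L^2V_{DD}^2}{W^2T} = 0 \;\Rightarrow\; W_{\text{opt}} = 2C_LV_{DD}\sqrt{\frac{k}{TLC_{OX}\left(V_{DD}^2 - V_{TN}^2\right)}}$$

This simple model predicts that the optimal width grows linearly with $C_L$; the absolute values depend on parasitics it ignores.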



Fig. 6 Total energy dissipation per cycle of the CPAL buffer versus channel width of N5 and N6 for several values of load capacitance. The frequency is 100 MHz and $V_{DD} = 2.5 \text{V}$.

From the simulation results, the optimal channel widths of N5 and N6 are 3λ, 9λ, 15λ, and 21λ when the load capacitance $C_L$ is 5, 20, 35, and 50 fF, respectively. For a small load capacitance, a minimum-size nMOSFET can be used to reduce the total energy loss by decreasing $E_{\rm X}$.
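The four reported optima lie exactly on a straight line, so they can be summarized as a sizing rule of thumb. The sketch below only re-expresses the numbers in the preceding paragraph with an ordinary least-squares fit; applying the fitted line outside 5 to 50 fF is an untested assumption.

```python
# Least-squares line through the optimal widths reported above.
c_load_ff = [5, 20, 35, 50]
w_opt_lambda = [3, 9, 15, 21]

def linear_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = linear_fit(c_load_ff, w_opt_lambda)
print(f"W_opt ~= {slope:.2f} lambda/fF * C_L + {intercept:.2f} lambda")
```

The fit gives about 0.4λ of extra width per femtofarad of load, that is, 6λ for every 15 fF step in the reported data.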

Figure 7 compares the energy consumption of CPAL with that of 2N-2N2P, PFAL, and static CMOS logic. In all circuits, the nMOS and pMOS sizes are $W/L = 0.36 \mu \text{m}/0.24 \mu \text{m}$ and $W/L = 1.08 \mu \text{m}/0.24 \mu \text{m}$, respectively, except for N5 and N6 in CPAL, whose size is $1.08\mu m/0.24\mu m$. Figure 7(a) plots the energy consumption per cycle versus the load capacitance. The energy loss of CPAL is much lower than that of the other two adiabatic circuits, especially at large load capacitances, because the capacitance of its internal nodes is much smaller than that of the output nodes. Figure 7(b) plots the energy consumption per cycle versus the power-clock frequency. Compared with 2N-2N2P and PFAL, CPAL dissipates less energy at all operating frequencies and is less sensitive to the clock frequency, because the turn-on resistance of its transmission gates is smaller than that of the pMOS transistors in 2N-2N2P and PFAL.



Fig. 7 (a) Comparison of energy consumption per cycle versus load capacitance among CPAL, 2N-2N2P, and PFAL; (b) Comparison of energy consumption per cycle versus power-clock frequency among CPAL, 2N-2N2P, PFAL, and static CMOS circuits


## 4 Gates and 8-bit adder

Complex gates are easily realized by replacing the transistors N1~N4 of the CPAL buffer with a CPL network. Figure 8(a) shows the AND/NAND gate and Fig. 8(b) the XOR/XNOR gate. All basic gates, such as the inverter, AND, OR, and XOR, share the same topology; only the inputs are permuted. Figure 8(c) shows the ANDOR/NANDOR gate, which is built by cascading CPL networks<sup>[10]</sup>.



Fig. 8 CPAL gates: (a) AND; (b) XOR; (c) ANDOR

Complex digital systems can be built from these gates. An 8-bit Brent-Kung adder, shown in Fig. 9, is used to demonstrate the efficiency of CPAL.
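Before mapping a prefix adder onto CPAL AND, OR, and XOR gates, its carry network can be checked with a bit-level behavioural model. The sketch below uses the standard Brent-Kung up-sweep and down-sweep prefix tree; it is a generic model of the algorithm, not the gate netlist of Fig. 9, and it does not model the CPAL gates, the three-phase clocking, or the pipeline buffers. The function name `brent_kung_add` is hypothetical.

```python
# Behavioural (bit-level) model of an n-bit Brent-Kung prefix adder.
# It checks the carry-network logic only; it does not model CPAL gates,
# three-phase clocking, or pipeline buffers.

def brent_kung_add(a: int, b: int, n: int = 8, cin: int = 0) -> tuple[int, int]:
    """Return (sum mod 2**n, carry out) using a Brent-Kung prefix tree.
    n must be a power of two."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # propagate bits
    G, P = g[:], p[:]
    G[0] |= P[0] & cin                                  # fold carry-in into bit 0
    # Up-sweep: build group (G, P) for blocks ending at 2d-1, 4d-1, ...
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d *= 2
    # Down-sweep: complete the remaining prefixes G[i:0].
    d = n // 4
    while d >= 1:
        for i in range(3 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
        d //= 2
    carries = [cin] + G[:-1]            # carry into bit i is the prefix G[i-1:0]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, G[n - 1]

if __name__ == "__main__":
    print(brent_kung_add(200, 100))     # 8-bit sum and carry out
```

Comparing the model against ordinary integer addition over many operand pairs is a quick way to confirm the prefix wiring before each prefix node is assigned to a CPAL gate and pipeline stage.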



Fig. 9 Schematic of 8-bit BK adder

Under the three-phase scheme, the 8-bit adder consists of 6 pipeline stages, with buffers inserted to maintain pipelining. It performs one 8-bit addition per cycle with a latency of two cycles. The average energy dissipation of the adder is shown in Fig. 10. In the simulation, the CPAL, 2N-2N2P, and PFAL adders use the same gate-level structure and differ only in aspects specific to each logic style. All primary outputs drive a 20 fF load, and the input patterns are generated randomly. The simulation results show that the CPAL adder consumes only 35% and 50% of the energy dissipated by the 2N-2N2P and PFAL adders, respectively.
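The stated latency follows from the clocking, assuming each pipeline stage corresponds to one 120° phase step, i.e. one third of the clock period $T_{\rm clk}$:

$$t_{\rm latency} = N_{\rm stage}\times\frac{T_{\rm clk}}{3} = 6\times\frac{T_{\rm clk}}{3} = 2\,T_{\rm clk}$$

At 100 MHz ($T_{\rm clk}$ = 10 ns), for example, a result appears 20 ns after its operands, while a new operand pair can still enter every cycle.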

## 5 Three-phase power supply generator

CPAL is supplied by the three-phase power clock, so an efficient clock circuit that converts DC to AC power must be designed. The three-phase power supply generator we designed is shown in Fig. 11. It is divided into two parts: the clock rail drivers, which change the inductor connection, and the controller, which generates the control signals for the clock rail drivers.

Fig. 10 Comparison of energy consumption per cycle of the 8-bit adder among CPAL, 2N-2N2P, and PFAL



Fig. 11 Three-phase power supply generator

The clock nodes of CPAL are represented by $R_1$, $C_1$, $R_2$, $C_2$, $R_3$, and $C_3$, which can be obtained in simulation (or measurement) by forcing a sinusoidal wave onto the clock nodes and measuring the power loss and the current<sup>[2]</sup>. Small external capacitors ($C_{E1}$, $C_{E2}$, and $C_{E3}$) are added to the clock nodes to balance their capacitances. Each clock rail driver either connects its clock node to the inductor via a transmission gate (TG1, TG2, or TG3) or clamps the clock node to $V_{\rm DD}$ or GND. The equivalent capacitance at the clock nodes and the inductor L form a resonant tank. One terminal of the inductor L is connected to the DC source $V_{\rm DD}/2$, so the voltage of node $N_{\rm L}$ is an exponentially damped sinusoid with a maximum close to $V_{\rm DD}$.
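Both modelling steps in this paragraph can be sketched under simple circuit assumptions: each clock node is treated as a series RC, and the tank as a series RLC driven from $V_{\rm DD}/2$ starting at 0 V with zero current. The extraction uses $P = I^2R/2$ and $|Z| = \sqrt{R^2 + (1/\omega C)^2}$ for a sinusoidal drive; the first peak of the underdamped step response occurs at $t = \pi/\omega_d$, where the current is zero. The function names are hypothetical, and this is not necessarily the exact procedure of [2].

```python
import math

def extract_rc(v_amp: float, i_amp: float, p_avg: float, freq_hz: float) -> tuple[float, float]:
    """Series-RC fit of a clock node driven by v(t) = v_amp*sin(2*pi*f*t).
    i_amp is the measured current amplitude, p_avg the average power."""
    omega = 2 * math.pi * freq_hz
    r = 2 * p_avg / i_amp ** 2              # P = I_amp^2 * R / 2
    z_mag = v_amp / i_amp                   # |Z| = sqrt(R^2 + (1/(w*C))^2)
    x_c = math.sqrt(z_mag ** 2 - r ** 2)    # capacitive reactance
    return r, 1 / (omega * x_c)

def first_peak(vdd: float, r: float, l: float, c: float) -> tuple[float, float]:
    """Series RLC tank biased at vdd/2, starting from 0 V and zero current.
    Returns (time, voltage) of the first maximum, where the current is zero.
    Requires an underdamped tank (R < 2*sqrt(L/C))."""
    alpha = r / (2 * l)
    omega_d = math.sqrt(1 / (l * c) - alpha ** 2)
    t_peak = math.pi / omega_d
    return t_peak, vdd / 2 * (1 + math.exp(-alpha * t_peak))
```

The peak voltage $\frac{V_{\rm DD}}{2}\left(1 + e^{-\alpha\pi/\omega_d}\right)$ approaches $V_{\rm DD}$ as the series resistance falls, which matches the description of the damped sinusoid at $N_{\rm L}$.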

The control signals of the clock rail drivers are generated by a mod-6 counter. Figure 12 shows the waveforms of the control signals for the rail $\Phi$ and illustrates how node $N_{\rm L}$ of the inductor is connected to the three clock nodes. The edges of each clock rail are taken from the sinusoidal pulse at node $N_{\rm L}$. Assume that $N_{\rm L}$ is initially at ground. During $T_1$, $N_{\rm L}$ is connected to $\Phi$, and $\Phi$ swings from ground to $V_{DD}$. During $T_2$, $N_{\rm L}$ is connected to $\Phi$, and $\Phi$ is clamped to $V_{DD}$. During $T_3$, $N_{\rm L}$ is connected to $\Phi$, and $\Phi$ is still clamped to $V_{DD}$ to keep it firmly high. Each unconnected clock node is clamped to its own state, $V_{DD}$ or ground. The power supply generator is energy-efficient because the inductor connection is changed only when the inductor current is zero.
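A controller of this kind can be sketched as a lookup from counter state to the rail that $N_{\rm L}$ drives. The sketch assumes, as an inference rather than a reading of Fig. 12, that each rail follows a rise/high/high/fall/low/low pattern and that rail $k$ rises in counter state $2k$; the rail labels `phi1`..`phi3` and the function `schedule` are hypothetical.

```python
# Mod-6 controller sketch: in each counter state exactly one rail is
# connected to the inductor node N_L and makes a transition; the other
# rails stay clamped to V_DD or ground.
# Assumption: rail k follows rise/high/high/fall/low/low and rises in state 2k.
RAILS = ("phi1", "phi2", "phi3")

def schedule(state: int) -> tuple[str, str]:
    """Return (rail connected to N_L, edge direction) for counter state 0..5."""
    state %= 6
    for k, rail in enumerate(RAILS):
        offset = (state - 2 * k) % 6
        if offset == 0:
            return rail, "rise"
        if offset == 3:
            return rail, "fall"
    raise AssertionError("unreachable: every state has exactly one edge")

if __name__ == "__main__":
    for s in range(6):
        rail, edge = schedule(s)
        print(f"state {s}: N_L -> {rail} ({edge})")
```

Under these assumptions the edges alternate rise, fall, rise, fall, so $N_{\rm L}$ swings once between ground and $V_{DD}$ per counter state, and six states (three full oscillations of the tank) make one power-clock period.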



Fig. 12 Control signals of the clock-rail driver for $\Phi$

Given a desired power-clock frequency $f$, the required inductance $L$ can be found from $3f = 1/(2\pi\sqrt{L\,C_{\rm eq}})$, where $C_{\rm eq}$ is the equivalent capacitance at the clock node. Note that the frequency of the reference clock must be three times that of the three-phase clock. The energy efficiency can be defined as the ratio of the energy dissipated in the CPAL clock nodes to the total energy delivered by the DC supply. The efficiency ranges from 18% to 60%, depending on the power-clock frequency and the complexity of the driven CPAL circuits. When driving the 8-bit CPAL adder, the conversion efficiency of the power supply generator is about 35% at 80 MHz.
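Sizing the inductor from the resonance condition and computing the efficiency figure as defined above takes only a few lines. The 10 pF equivalent capacitance used in the example is a placeholder, not a value reported for the CPAL adder, and the function names are hypothetical.

```python
import math

def required_inductance(f_clock_hz: float, c_eq_farad: float) -> float:
    """Solve 3f = 1 / (2*pi*sqrt(L*C_eq)) for L: the tank rings at three
    times the three-phase power-clock frequency."""
    omega_ref = 2 * math.pi * 3 * f_clock_hz
    return 1 / (omega_ref ** 2 * c_eq_farad)

def conversion_efficiency(e_clock_nodes: float, e_dc_supply: float) -> float:
    """Energy dissipated in the CPAL clock nodes divided by the energy
    delivered by the DC supply over the same interval."""
    return e_clock_nodes / e_dc_supply

if __name__ == "__main__":
    c_eq = 10e-12    # hypothetical equivalent clock-node capacitance
    l = required_inductance(80e6, c_eq)
    print(f"L = {l * 1e9:.1f} nH for f = 80 MHz and C_eq = 10 pF")
```

Because $L \propto 1/f^2$, doubling the power-clock frequency for the same load cuts the required inductance by a factor of four.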

## 6 Conclusion

The energy consumption of CPAL is relatively insensitive to the output load capacitance and depends only weakly on the power-clock frequency. With 0.25µm CMOS technology, we confirmed that the CPAL circuit consumes substantially less energy than the other logic circuits. Circuit design based on CPAL is simple because of its regular topology. However, the energy loss of adiabatic circuits must include the overhead of the power-clock generator, so the generator should be carefully optimized to reduce its own energy loss.

## References

- [1] Lim J, Kim D G, Chae S I. A 16-bit carry-lookahead adder using reversible energy recovery logic for ultra-low-energy systems. IEEE J Solid-State Circuits, 1999, 34(6): 898
- [2] Maksimovic D, Oklobdzija V G. Integrated power clock generators for low energy logic. Proc IEEE Power Electronics Specialists Conference, 1995: 61
- [3] Moon Y, Jeong D K. An efficient charge recovery logic circuit. IEEE J Solid-State Circuits, 1996, 31(4): 514
- [4] Kramer A, Denker J S, Flower B, et al. 2nd order adiabatic computation with 2N-2P and 2N-2N2P logic circuits. Proc of the International Symposium on Low Power Electronics and Design, Dana Point, 1995: 191
- [5] Vetuli A, Pascoli S D, Reyneri L M. Positive feedback in adiabatic logic. Electron Lett, 1996, 32(20): 1867
- [6] Li Xiaomin, Qiu Yulin, Chen Chaoshu. Design of low voltage charge-recovery logic circuit. Chinese Journal of Semiconductors, 2001, 22(10): 1352 (in Chinese)
- [7] Dai Hongyu, Zhang Sheng, Zhou Runde. Power optimization methods of energy recovery circuits. Chinese Journal of Semiconductors, 2002, 23(9): 996 (in Chinese)

- [8] Kim C, Yoo S M, Kang S. Low power adiabatic computing with NMOS energy recovery logic. Electron Lett, 2000, 36(16): 1349
- [9] Lim J, Kim D G, Chae S I. nMOS reversible energy recovery logic for ultra-low-energy applications. IEEE J Solid-State Circuits, 2000, 35(6): 865
- [10] Rabaey J M. Digital integrated circuits: a design perspective. New York: Prentice Hall, 1996: 221

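## Complementary Pass-Transistor Adiabatic Logic Circuit Using Three-Phase Power Supply\*

#### Hu Jianping, Wu Yangbo, Zhang Weiqiang

(Faculty of Information Science and Engineering, Ningbo University, Ningbo 315211, China)

Abstract: A new adiabatic logic circuit driven by a three-phase power supply, complementary pass-transistor adiabatic logic (CPAL), is proposed. The logic operations are performed by CPL circuits, and the output loads are driven adiabatically by complementary transmission gates, so the overall power consumption of the circuit is small. It is shown that choosing appropriate device sizes for the output driving transistors further reduces the total energy consumption of CPAL circuits. A three-phase power-clock generator consisting of only one inductor and a simple control circuit is designed. To verify the proposed CPAL circuit and clock generator, an 8-bit full adder was designed and simulated. Using MOSIS 0.25μm CMOS technology, over the frequency range of 50 to 200 MHz, the power consumption of the CPAL full adder is only 50% and 35% of that of the PFAL and 2N-2N2P circuits, respectively.

Key words: complementary pass-transistor logic; adiabatic logic; low-power techniques; three-phase power clock

EEACC: 1265A; 2570D; 2560

CLC number: TN432 Document code: A Article ID: 0253-4177(2004)08-0918-07

<sup>\*</sup> Project supported by the National Natural Science Foundation of China (No. 60273093) and the Zhejiang Provincial Education Department (No. 20010238)

Hu Jianping, male, born in 1961, associate professor. His research interests are low-power digital integrated circuit design and ASIC design.

Wu Yangbo, male, born in 1972, master's degree. He mainly works on ASIC design.

Zhang Weiqiang, male, born in 1963, associate professor. His research field is VLSI integrated circuit design.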