## Fast Analysis of Power/ Ground Networks via Circuit Reduction Cai Yici<sup>1</sup>, Pan Zhu<sup>1</sup>, Sheldon X D Tan<sup>2</sup>, Hong Xianlong<sup>1</sup>, and Fu Jingjing<sup>1</sup> (1 Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China) (2 Department of Electrical Engineering, University of California, Riverside, CA 92521, USA) **Abstract:** This paper presents an efficient algorithm for reducing RLC power/ ground network complexities by exploitation of the regularities in the power/ ground networks. The new method first builds the equivalent models for many series RLC-current chains based on their Norton's form companion models in the original networks, and then the precondition conjugate gradient based iterative method is used to solve the reduced networks, which are symmetric positive definite. The solutions of the original networks are then back solved from those of the reduced networks. Experimental results show that the complexities of reduced networks are typically significantly smaller than those of the original circuits, which makes the new algorithm extremely fast. For instance, power/ ground networks with more than one million branches can be solved in a few minutes on modern Sun workstations. Key words: circuit simulation; power/ground network; model reduction; RLC circuit **EEACC:** 2570 **CCACC:** 7410D **CLC number:** TN402 **Document code:** A **Article ID:** 0253-4177 (2005) 07-1340-07 #### 1 Introduction Efficient transient analysis techniques of P/G networks are required to precisely capture the dynamic voltage fluctuations on P/G wires for guiding the designs of reliable and robust P/G networks. As millions of devices are integrated into a single chip, transient analysis of a power/ground network becomes a challenging task due to the increasing sizes of the networks. Traditional circuit simulators like SPICE are no longer able to meet the formidable tasks of analyzing P/G circuits with millions of extracted RL KC elements in a timely manner. Significant research efforts have been carried out to find efficient simulation approaches for P/G grid analysis $^{[1^{\circ}9]}$ in the past. Among them, the multi-grid method $^{[8]}$ , hierarchical method $^{[10]}$ , pre- conditioned conjugate gradient<sup>[2]</sup>, hierarchical model order reduction<sup>[3]</sup>, and frequency domain analysis<sup>[1]</sup> are the latest proposed methods. Although significant improvements have been made to analyze large P/G grids by those proposed methods, the naturally regular physical structures of a P/G network are not fully explored to speedup transient analysis of networks. Tree-mesh structured P/G circuits are analyzed in Ref. [10]. The tree portions of the P/G network are reduced via the model order reduction method PRIMA<sup>[16]</sup> to obtain the equivalent circuits of RLC tree circuits. But ,some errors will be introduced. The methods in Refs. [4,11] fully explore the naturally regular structures of a P/G network to reduce the P/G network complexities in an error-free fashion. However, they can only be applied for DC analysis of resistor-only P/G networks. In this paper, we propose a new approach to <sup>\*</sup> Project supported by the National Hi-Tech Research & Development Program of China (No. 2002AA1Z1460) and the National Natural Science Foundation of China (No. 90307017) Cai Yici female ,was born in 1960 ,associate professor. Her research interests include theory and algorithms on VLSI physical design. Pan Zhu male ,was born in 1979 ,master student. His research interests include algorithm study on P/G network analysis. the transient analysis of large P/ G grids. Our new method is based on the observation that P/ G grids, especially those used in many cell-based layout ASIC applications, consist of any series RLC chains<sup>[11-13]</sup> as shown in Fig. 2. We show that a compact equivalent model can be built for the RLC chain circuit in each time step of approximate integration. With the equivalent chain models, the original networks can be reduced significantly and efficient transient analyses of very large P/ G networks becomes possible. Fig. 1 $\,$ A P/ G network of standard cell based ASIC layout As only resistors and constant current sources are present in the reduced P/G networks in each time step, and the resulting circuit matrix of the network is symmetric positive definite (the voltage sources are also modeled by Norton equivalent circuits), a preconditioned conjugate gradient based iterative method<sup>[14]</sup> can be used to efficiently solve the resulting circuit matrix. Our contribution is the introduction of an efficient algorithm for reducing RLC P/G network complexities in an error-free manner by exploitation of the regularities in the power/ground networks. Experimental results show that at least two orders of magnitude speed up over SPICE is observed for large P/G networks. # 2 Construction of equivalent circuit models The P/G networks are typically mesh-struc- tured into VLSI technology and consist of many cascaded RLC sections extracted from chip layouts, as shown in Fig. 1, for a power network. Fig. 2 A series RLC chain in a P/G network Here $R_i$ , $C_i$ , $L_i$ , denote , respectively , the resistance , capacitance , and inductance at the P/G wire segment i. Capacitance $C_i$ includes both the parasitic capacitances of the P/G wire i and the decoupling capacitances. $e_i$ is a time-varying current source that captures the dynamic current consumptions of the circuit cells. $N_1$ and $N_{n+1}$ are called cross nodes and the rest of the nodes are called intermediate nodes. Our goal is to suppress all the intermediate nodes in this series RLC chain circuit and replace them in each integration time step of transient analysis using an electrically equivalent circuit that consists of only the cross nodes. ## 2. 1 Integration approximation in Norton's form Let h be the time step used at simulation step k+1. For the capacitors and inductors, trapezoidal companion models of Norton's form are used. In this way, we will not introduce any extra nodes. Specifically, the current across a capacitor at the k+1 step is: $$I_{c,k+1} = \frac{2C}{h} V_{c,k+1} - \left( \frac{2C}{h} V_{c,k} + I_{c,k} \right)$$ (1) where $V_{c,k}$ , $I_{c,k}$ , $V_{c,k+1}$ , $I_{c,k+1}$ denote the branch voltage and branch current of the capacitor at step k and step k+1, respectively, and C is the value of the capacitor. Similarly the current through an inductor at step k+1 is: $$I_{L,k+1} = \frac{h}{2L} V_{L,k+1} + (\frac{h}{2L} V_{L,k} + I_{L,k})$$ (2) where $V_{L,k}$ , $I_{L,k}$ , $V_{L,k+1}$ , $I_{L,k+1}$ denote the branch voltage and branch current of the inductor at step k and step k + 1, respectively, and L is the value of the inductor, as shown in Fig. 3. Fig. 3 RLC chain circuit companion models at step k+1 Next, we merge two floating resistors for each RLC section, as shown in Fig. 4, using Norton theory. Thus the circuit in Fig. 3 can be simplified into the following circuit shown in Fig. 5, which is ready for further reduction to be explained in the next subsection. Fig. 4 Merge two resistors in a RLC section Fig. 5 Simplified RLC chain circuit ## 2. 2 Transformation from Y model to model In this subsection, we show how the RLC chain circuit shown in Fig. 5 can be further reduced by an equivalent circuit consisting of only the cross nodes. This is done by repeatedly transforming a Y model circuit to a model circuit, as shown in Fig. 6. In Fig. 6, $E_0$ and $E_0$ are the currents inflowing into node a and node b, respectively, and the corresponding arrowheads show the current directions. The equivalent currents and resistances are shown on the right-hand side of Fig. 6. Fig. 6 Y model circuit to model circuit ## 2. 3 Construction and back solving algorithms for the equivalent circuits By repeatedly applying the Y model to model transformation, starting from node $N_1$ until we reach node $N_{n+1}$ , the whole chain circuit is in Fig. 5 can be reduced into an equivalent circuit shown in Fig. 7. Fig. 7 Equivalent chain circuit model The algorithm for computing $R_1^{\text{equiv}}$ , $R_{n+1}^{\text{equiv}}$ , $R_{n+1}^{\text{equiv}}$ , $R_{n+1,k+1}^{\text{equiv}}$ , $I_{n+1,k+1}^{\text{equiv}}$ , and $I_{n+1,k+1}^{\text{equiv}}$ from the circuit is shown in Fig. 5. Once the reduced network has been solved, all the intermediate nodes of original circuits can be back solved using the superposition principle. As shown in Fig. 5, for voltage $V_{i,k+1}$ at the node i, when the current flowing through the resistor $R_i^*$ is obtained, the voltage $V_{i+1,k+1}$ at node i+1 is equal to $V_{i,k+1}$ plus the voltage drop on $R_i^*$ . The current flowing through the resistor $R_{i+1}^*$ can be computed simply with the KCL law. ## 3 Overall algorithm #### 3. 1 Algorithms description After the reduced network is obtained at each time step, a preconditioned conjugate gradient (PCG) based iterative method<sup>[14]</sup> is used for the so- lutions as the resulting circuit matrix is symmetric positive definite and the circuit consists of only resistors and constant current sources. Assume initial node voltages are known and the number of RLC sections in a P/G network is M. The overall simulation algorithm is shown as follows. ``` PG-Solver() { Initiate V_i, R_i^*, r_i for i [1, M]; for k = 1 to n_{\text{step}} do { RENEW(ec<sub>i,k</sub>, el<sub>i,k</sub>); For all RLC chains EQVCKT-CON-STRUCT(); PCG-SOLVER(); For all RLC chains BACK-SOLVER(); } } ``` $n_{\text{step}}$ is the number of time steps used in the simulation. RENEW(ec<sub>i,k</sub>,el<sub>i,k</sub>) computes the combined current sources from previous simulation results. EQVCKT-CONSTRUCT constructs the equivalent circuits for all the RLC chains. PCG-SOLVER is the linear solver based on the PCG algorithm for the solutions of the reduced network. BACK-SOLVER back solves the node voltage of all intermediate nodes. #### 3. 2 Time complexity analysis Suppose there is a P/G network with N nodes, among which $N_{\rm cross}$ is the number of cross nodes and $N_{\rm mid}$ is the intermediate nodes and $N=N_{\rm cross}+N_{\rm mid}$ . Typically $N_{\rm cross}$ is far less than N in standard cell based P/G networks of ASICs. The computation cost of the resulting algorithm is $T_{\rm eqvekt}(N) + T_{\rm back}(N_{\rm mid}) + T_{\rm pcg}(N_{\rm cross})$ (3) where $T_{\rm eqvekt}(N)$ , $T_{\rm back}(N)$ , and $T_{\rm pcg}(N)$ are the time costs for routines EQVCKT-CONSTRUCT, BACK-SOLVER and PCG-SOLVER, respectively. We notice that EQVCKT-CONSTRUCT and BACK-SOLVER are of linear complexity. As for the PCG-SOLVER, it is greater than linear theoretically, but it was shown in Refs. [2,13] that it practically shows linear time or close to linear time complexity for many practical large P/G networks due to sparsity of those circuits. Also ,if $N_{cross}$ is far less than $N_{mid}$ ,the nonlinear time cost has marginal effects on the scalability of the whole algorithm. As a result , the whole algorithm , which consists of PCG-SOLVER ( $N_{cross}$ ) , EQVCKT-CONSTRUCT (N) and BACK-SOLVER ( $N_{mid}$ ) , is of linear or close to linear time complexity in practice. This is illustrated in Fig. 8. In Fig. 8, we define $= N/N_{cross}$ , which reflects the reduction ratio between the original P/G grid and the reduced P/G grid and also draws a CPU time versus number of circuit nodes curves for both the pure PCG and new method. Four networks with different are solved for each total number of nodes. The CPU time for the pure PCG method (no node reduction is performed) is the average time used for the four circuits as the CPU time of the pure PCG method does not change significantly with . It is shown that the CPU time used by the new algorithm grows almost linearly when 10. Also ,as more reduction become possible with a larger ,the increase rate of the CPU time decreases with the number of nodes. Fig. 8 Time complexity comparison ## 4 Experimental results The proposed simulation algorithm is implemented in C and $C^{+\,+}$ . All the experimental results are obtained on a SUN UltraSparc workstation with a 750MHz CPU and 1Gb memory. The number of time steps $n_{\text{step}}$ is assigned as 120 for a clock cycle according to the requirement of accuracy<sup>[8]</sup>. We have tested our program on a number of P/G networks with complexities ranging from 2500 nodes to 1 million nodes. The statistics of those P/G circuits are shown in Table 1. Table 1 Statistics on the original and reduced nets | P/ G grids | Node see | Reduction ratio/ X | | |------------------------------|----------|--------------------|-----| | | Pure PCG | Our method | | | 50 <b>x</b> 50 <b>x</b> 10 | 2551 | 501 | 5 | | 100 ×100 ×10 | 10101 | 1001 | 10 | | 200 ×200 ×10 | 40201 | 2001 | 20 | | 400 <b>x</b> 400 <b>x</b> 10 | 160401 | 4001 | 40 | | 600 <b>x</b> 600 <b>x</b> 10 | 360601 | 6001 | 60 | | 800 <b>x</b> 800 <b>x</b> 10 | 640801 | 8001 | 80 | | 1000 ×1000 ×10 | 1001001 | 10001 | 100 | First, we compare our method with that of SPICE in terms of accuracy. Both programs are used to solve a 50 $\times$ 50 $\times$ 5 ( $x \times x \times y$ P/G network has x P/G straps and y P/G trunks, each strap has x+1 nodes.) P/G network. The results are shown in Fig. 9. The two waveforms are essentially same. The maximum error is just 0.00289 %. Next, we compare our new simulation method Fig. 9 Accuracy comparison with SPICE with the SPICE and the pure PCG algorithm. The effectiveness of equivalent circuit modeling is shown in Table 1 where the number of nodes after node reduction and the reduction ratio for each circuit are shown in columns 3 and 4. It can be seen that complexities of the P/G network can be significantly reduced. On the other hand, we notice that the topologies of P/G grids do have impacts on the performance of the new algorithm. Tables 1 and 2 show that the pure PCG can not deal with very large P/G networks. This is because PCG will become memory hungry when large circuits are loaded. On the other hand, the new algorithm is much more memory efficient and powerful than the pure PCG method. Table 2 Comparison on CPU times | | CPU time/ s | | | | | Speedup over | | |--------------------------------|-------------|----------|------------------|---------------|------------|--------------|----------| | P/ G grid | apice | p pcc | Our new method | | | apice | D DOG | | | SPICE | Pure PCG | Eqvckt-construct | Back _ solver | Total time | SPICE | Pure PCG | | 50 <b>x</b> 50 <b>x</b> 10 | 49.08 | 4.48 | 0.15 | 0.17 | 0.98 | 50.08 | 4.57 | | 100 <b>x</b> 100 <b>x</b> 10 | 762.74 | 28.47 | 0.62 | 0.55 | 3.26 | 233.97 | 8.73 | | 200 <b>x</b> 200 <b>x</b> 10 | N/ A | 143.88 | 6.86 | 6.33 | 24.97 | N/ A | 5.76 | | 400 <b>x</b> 400 <b>x</b> 10 | N/ A | 1391.55 | 34.87 | 30.40 | 115.82 | N/ A | 12.04 | | 600 <b>x</b> 600 <b>x</b> 10 | N/ A | 3874.67 | 88.33 | 66.91 | 250.70 | N/ A | 15.46 | | 800 <b>x</b> 800 <b>x</b> 10 | N/ A | 13102.46 | 135.68 | 117.85 | 435.26 | N/ A | 30.10 | | 1000 <b>×</b> 1000 <b>×</b> 10 | N/ A | N/ A | 211.13 | 190.37 | 687.50 | N/ A | N/ A | From Table 2, it is seen that the new algorithm is one order of magnitude faster than the pure preconditioned conjugate gradient method and two orders of magnitude faster $(50X \sim 233X)$ than SPICE. The speedup of the new algorithm is com- parable with the recently proposed P/G simulation algorithm in Ref. [3]. The capability of the new algorithm is also significantly improved compared with that of the pure PCG, as shown in Table 2. The maximum level of a P/G network that can be solved by the PCG method is 800 ×800 with 640k nodes and it takes 13102. 46s to solve the circuits, while the new algorithm can easily solve a 1000 × 1000 P/G network in 687. 50s. For the 800 ×800 circuit, the PCG SOLVER takes 181. 73s (which is the total time minus the times of EQVCKT-CON-STRUCT and BACK-SOLVER shown in the table), which implies that PCG SOLVER only contributes 41. 75 % of the time cost of our algorithm. As SPICE bails out on very small circuits, we expect more speedup will be seen on large circuits given the super-linear time complexity of SPICE. ## 5 Conclusion and future work This paper proposes an efficient algorithm for reducing RLC power/ ground network complexities by exploitation of the regularities in the power/ ground networks. The experimental results demonstrate that the node reduction technique contributes at least one order of magnitude speedup over methods without node reduction and the resulting new algorithm is two orders of magnitude faster than SPICE with almost no accuracy loss for many standard cell based P/G networks. The new algorithm takes 687. 5s to solve P/G networks with more than 1 million nodes on a 750MHz Sun workstation. Also the method can be easily extended to deal with tree-mesh structured P/G networks. This makes our algorithm a serious contender for attacking real industry P/G networks. #### References - [1] Bai G,Bobba S, Hajj I N. Simulation and optimization of the power distribution network in VLSI circuits. Proc IEEE/ ACM International Conf on Computer Aided Design, 2000: 481 - [2] Chen T, Chen C C. Efficient large-scale power grid analysis - based on preconditioned Krylov-subspace iterative method. Proc ACM/ IEEE Design Automation Conf ,2001:559 - [ 3 ] Cao Y,Lee Y,Chen T,et al. HiPRIME: hierarchical and passivity reserved interconnect macromodeling engine for RL KC power delivery. Proc ACM/ IEEE Design Automation Conf, 2002:379 - [4] Stark D, Horowitz M. Techniques for calculating currents and voltages in VLSI power supply networks. IEEE Trans Comput-Aided Des, 1990, 9(2):126 - [5] Chen H H, Ling D D. Power supply noise analysis methodology for deep-submicron VLSI chip design. 34th ACM/ IEEE Design Automation Conf, 1997:683 - [6] Dharchoudhury A, Panda R, Blaauw D, et al. Design and analysis of power distribution networks in power PC microprocessors. Proc ACM/ IEEE Design Automation Conf, 1998:738 - [7] Lee Y M, Chen C P. Power grid transient simulation in linear time based on transmission-line-modeling alternating-direction-implicit method. Proc IEEE/ ACM International Conf on Computer-Aided Design, 2001:75 - [8] Kozhaya J N, Nassif S R, Najm F N. A multigrid-like technique for power grid analysis. IEEE Trans Comput-Aided Des, 2002, 21(10):1148 - [ 9 ] Zhao M, Panda R V, Sapatnekar S S, et al. Hierarchical analysis of power distribution networks. IEEE Trans Computer Aided Des, 1990, 9(2):159 - [10] Su H H, Gala K H, Sapatnekar S S. Fast analysis and optimization of power/ ground networks. Proc IEEE/ ACM International Conf on Computer-Aided Design, 2000:477 - [11] Tan X D, Shi C J. Fast power-ground network optimization using equivalent circuit modeling. Proc 38th ACM/ IEEE Design Automation Conf, 2001:550 - [12] Yeh C W, Kang Y S. Cell-based layout techniques supporting gate-level voltage scaling for low power. IEEE Trans Very Large Scale Integration Systems, 2000, 8(5):629 - [13] Wu X H, Hong X L, Cai Y C, et al. Area minimization of power distribution network using efficient nonlinear programming techniques. Proc IEEE/ ACM International Conf on Computer-Aided Design, 2001:153 - [14] Golub G, Van Loan C. Matrix computation. 3rd Edition. Johns Hopkins University Press, 1996 - [15] Tierney J A. Differential equations ,reading. Boston: Allyn and Bacon Incorporation ,1985 - [16] Odabasioglu A ,Celik M ,Pilleggi L T. PRIME:passive reduction-order interconnect macromodeling algorithm. IEEE Trans Comput-Aided Des ,1998 ,17(8) :645 ## 基于等效电路降阶的电源/ 地线网络快速瞬态模拟 \* 蔡懿慈<sup>1</sup> 潘 著<sup>1</sup> Sheldon X D Tan<sup>2</sup> 洪先龙<sup>1</sup> 傅静静<sup>1</sup> (1 清华大学计算机科学与技术系, 北京 100084) (2 Department of Electrical Engineering, University of California, Riverside, CA 92521, USA) 摘要:提出了一种电源/地线网络的快速时域分析方法.本算法在每一模拟时刻,首先离散化原始电路并且利用诺顿定律简化电路,继而针对电路中的链式结构的串联支路,提出了一种无误差的等效电路的模型降阶算法,将原始电路压缩为仅由交叉节点组成的电路.然后用预条件共轭梯度法求解被压缩的电路,最后恢复求解中间节点的电压值.有效地提高了算法的分析效率,在不损失精度的情况下,比目前工业界普遍采用的 SPICE 软件快两个数量级. 关键词:时域分析; P/G网络;模型降阶; RLC电路 **EEACC:** 2570 **CCACC:** 7410D 中图分类号: TN402 文献标识码: A 文章编号: 0253-4177(2005)07-1340-07 <sup>\*</sup>国家高技术研究发展计划(批准号:2002AA1Z1460)和国家自然科学基金(批准号:90307017)资助项目 蔡懿慈 女,1960年出生,副教授,主要研究方向为 VLSI自动布图的算法与理论.