# Reducing vulnerability to soft errors in sub-100 nm content addressable memory circuits\*

Sun Yan(孙岩)<sup>1,†</sup>, Zhang Jiaxing(张甲兴)<sup>1</sup>, Zhang Minxuan(张民选)<sup>1</sup>, and Hao Yue(郝跃)<sup>2</sup>

(1 School of Computer, National University of Defense Technology, Changsha 410073, China) (2 School of Microelectronics, Xidian University, Xi'an 710071, China)

**Abstract:** We first study the impacts of soft errors on various types of CAM for different feature sizes. After presenting a soft error immune CAM cell, SSB-RCAM, we propose two kinds of reliable CAM, DCF-RCAM and DCK-RCAM. In addition, we present an ignore mechanism to protect dual cell redundancy CAMs against soft errors. Experimental results indicate that the 11T-NOR CAM cell has an advantage in soft error immunity. Based on 11T-NOR, the proposed reliable CAMs reduce the SER by about 81% on average with acceptable overheads. The SER of dual cell redundancy CAMs can also be decreased using the ignore mechanism in specific applications.

**Key words:** soft error; content addressable memory; reliability; vulnerability; critical charge

**DOI:** 10.1088/1674-4926/31/2/025013

**EEACC:** 1265A; 2570D

# 1. Introduction

Circuit reliability has become a key challenge in the deep submicron design age. Radiation-induced soft errors are one of the main causes of reliability problems<sup>[1]</sup>. As VLSI technology scales to sub-100 nm, smaller node capacitances, higher clock frequencies and lower supply voltages make circuits more susceptible to soft errors. At the same time, the total number of devices on a chip is increasing rapidly. Together, these trends result in a high vulnerability to soft errors. The soft error problem is becoming a serious threat for modern VLSI circuits in the sub-100 nm design age, especially for applications in which reliability is an important attribute<sup>[2]</sup>.

Content addressable memory (CAM) is widely used for locating data in a fully associative parallel search. It compares input search data against a table of stored data, and returns the address of the matching data<sup>[3]</sup>. CAMs have a single clock cycle throughput, which makes them faster than other search systems, so they can be used in high speed applications such as tag memory in translation lookaside buffers (TLBs) or highly associative caches.

Unfortunately, techniques that reduce the vulnerability of SRAMs (such as ECC or parity) are not immediately applicable to CAMs because they typically depend on processing the full contents of the memory word outside the array. This is not possible in a normal CAM access as it only returns match or miss results<sup>[4]</sup>. Some methods have been proposed to protect CAMs against soft errors. Azizi *et al.* presented a kind of ternary CAM (TCAM) cell that reduces SER at the cost of some area<sup>[5]</sup>, but it is only applicable to TCAMs. Pagiamtzis *et al.* proposed an error-correcting-match scheme for CAMs that adds parity bits to tolerate bit errors in the stored contents<sup>[4]</sup>, but the design is complex and depends strongly on the circuit parameters and the precision of the design. At present, systematic research into the soft error vulnerability of nanometer CAM circuits is rare.

In this paper, we study the impacts of soft errors on CAMs, compare the soft error immunity of different CAM cells, and present three kinds of soft error immune reliable CAM circuits and an error protection mechanism. Experiments show that the proposed techniques can efficiently reduce vulnerability to soft errors in nanometer CAM circuits.

# 2. Impact of soft error on CAMs

Soft errors are caused by external radiation rather than design or manufacturing defects. When an energetic particle strikes a sensitive region of a VLSI storage element, it creates an ionizing track through the associated p–n junction, along which a current pulse may flow. The particle generates electron–hole pairs in its wake. The source and drain diffusion nodes of a transistor can collect these charges. If the collected charge is sufficiently large, it can cause the contents of the storage element to flip, thus causing a single event upset (SEU)<sup>[6]</sup>. The critical value of the collected charge is called the critical charge (expressed as $Q_{\text{crit}}$), and the frequency of soft errors for a device is designated the soft error rate (SER). The SER is related to $Q_{\text{crit}}$ as shown in Eq. (1).

$$\text{SER} \propto N_{\text{flux}} A_{\text{node}} \exp\left(-\frac{Q_{\text{crit}}}{Q_{\text{s}}}\right), \tag{1}$$

where $N_{\text{flux}}$ is the intensity of the neutron flux, $A_{\text{node}}$ is the area of the node and $Q_{\text{s}}$ is the charge collection efficiency; we use $Q_{\text{s}} = 12$ fC<sup>[7]</sup>. The SER decreases exponentially with increasing $Q_{\text{crit}}$ of the cell nodes. The unit of SER is failures in time (FIT). One FIT is one failure in one billion hours. To evaluate the soft error vulnerability of circuits, we inject a current pulse at the sensitive node of the circuits to model a particle strike in a SPICE simulation. The pulse has a rapid rise time and a gradual fall time.

<sup>\*</sup> Project supported by the National Natural Science Foundation of China (No. 60703074) and the National High-Tech Research and Development Program of China (No. 2009AA01Z124).

<sup>†</sup> Corresponding author. Email: yansun@nudt.edu.cn

Received 26 March 2009, revised manuscript received 26 June 2009

Fig. 1. CAM cells for (a) 10T-NAND, (b) 11T-NAND, (c) 10T-NOR, and (d) 11T-NOR type.

Fig. 2. Waveforms when a particle strikes a CAM cell.

The waveform of the pulse can be approximated as Eq. (2).

$$I(t) = \frac{2Q}{T\sqrt{\pi}} \sqrt{\frac{t}{T}} \exp\left(-\frac{t}{T}\right), \tag{2}$$

where Q is the charge collected and T is the technology constant. We use T = 16 ps for our experiments<sup>[7]</sup>.
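As a rough numerical check of this pulse model, the following Python sketch evaluates the pulse of Eq. (2) with T = 16 ps. It assumes the prefactor $2Q/(T\sqrt{\pi})$, which is the normalization under which the pulse integrates to the collected charge Q. The function names and the choice of units (fC, ps, µA) are illustrative assumptions, not part of the original simulation setup.

```python
import math

T_PS = 16.0  # technology constant T from the text, in picoseconds


def pulse_current_uA(q_fC, t_ps, T_ps=T_PS):
    """Current in microamps at time t_ps for a collected charge q_fC.

    Assumes I(t) = 2Q / (T sqrt(pi)) * sqrt(t/T) * exp(-t/T).
    """
    if t_ps <= 0.0:
        return 0.0
    x = t_ps / T_ps
    # fC / ps equals mA, so multiply by 1000 to express the result in uA.
    return 1000.0 * 2.0 * q_fC / (T_ps * math.sqrt(math.pi)) * math.sqrt(x) * math.exp(-x)


def peak_current_uA(q_fC, T_ps=T_PS):
    """Peak of the pulse; sqrt(x) * exp(-x) is maximal at x = 1/2."""
    return pulse_current_uA(q_fC, 0.5 * T_ps, T_ps)


def charge_for_peak_fC(i_peak_uA, T_ps=T_PS):
    """Collected charge that produces a given peak current (linear in Q)."""
    return i_peak_uA / peak_current_uA(1.0, T_ps)


def integrated_charge_fC(q_fC, T_ps=T_PS, steps=20000, t_end_factor=40.0):
    """Midpoint-rule integral of the pulse, to confirm it recovers Q."""
    dt = t_end_factor * T_ps / steps
    total_uA_ps = sum(pulse_current_uA(q_fC, (k + 0.5) * dt, T_ps) for k in range(steps)) * dt
    return total_uA_ps / 1000.0  # uA * ps equals 1e-3 fC
```

Under this normalization, `charge_for_peak_fC(200.0)` gives roughly 6.6 fC, so peak currents of a few hundred microamps correspond to collected charges of a few femtocoulombs.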

A CAM cell is made up of two parts: a 6T SRAM storage cell and a comparison circuit. There are two major types of CAM cell configuration, NAND and NOR, which differ in the comparison circuit. Figures 1(a) to 1(d) show four kinds of CAM cells: 10T-NAND, 11T-NAND, 10T-NOR and 11T-NOR, respectively.

In CAM cells, nodes Q and NQ are the most sensitive regions. To determine the $Q_{\text{crit}}$ of these nodes, we performed SPICE simulations, injecting current pulses of the form of Eq. (2) with various values of the collected charge Q between the drain and substrate of the off transistor. Figure 2 shows the waveforms of node Q when a particle strikes a CAM cell. The duration of the glitch at node Q, measured at 200 mV from either the supply voltage or the ground voltage, is defined as the recovery time<sup>[8]</sup>. In Fig. 2, the state of node Q recovers when the modeled peak current is 100 to 180 $\mu$A. When the peak current is equal to or greater than 200 $\mu$A, an SEU failure happens.

Fig. 3. $Q_{\text{crit}}$ of various CAM cells.
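The search for $Q_{\text{crit}}$ described in the preceding paragraph amounts to finding the smallest injected charge that flips the cell. The text does not say how the charge values were stepped, so the Python sketch below uses bisection as one possible way to automate the sweep. The callback `cell_upsets` is a hypothetical stand-in for a SPICE run, `toy_cell` uses a made-up threshold purely for demonstration, and the sketch assumes the cell's response is monotonic in the injected charge.

```python
from typing import Callable


def find_q_crit(cell_upsets: Callable[[float], bool],
                q_low_fC: float,
                q_high_fC: float,
                tol_fC: float = 0.01) -> float:
    """Bisect for the smallest injected charge (fC) that upsets the cell.

    cell_upsets(q) is assumed to run one transient simulation with
    collected charge q and report whether the stored state flipped.
    """
    if cell_upsets(q_low_fC):
        raise ValueError("lower bound already upsets the cell")
    if not cell_upsets(q_high_fC):
        raise ValueError("upper bound does not upset the cell")
    while q_high_fC - q_low_fC > tol_fC:
        mid = 0.5 * (q_low_fC + q_high_fC)
        if cell_upsets(mid):
            q_high_fC = mid   # upset: the critical charge is at or below mid
        else:
            q_low_fC = mid    # recovery: the critical charge is above mid
    return q_high_fC


def toy_cell(q_fC: float, threshold_fC: float = 6.3) -> bool:
    """Hypothetical cell model with an arbitrary threshold, for testing only."""
    return q_fC >= threshold_fC


q_crit_estimate = find_q_crit(toy_cell, 1.0, 20.0)
```

Each call to the callback costs one transient simulation, so bisection reaches a 0.01 fC resolution over a 19 fC interval in about eleven runs.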

We simulated the four types of CAM cells shown in Fig. 1 using 180, 130, 90 and 65 nm CMOS processes. The $Q_{\text{crit}}$ of each type is compared in Fig. 3. As shown in the figure, for all types of CAM cells, $Q_{\text{crit}}$ decreases steadily from 180 to 90 nm. The line with symbols in Fig. 3 is the prediction of the SRAM cell $Q_{\text{crit}}$ from Ref. [9]. The average $Q_{\text{crit}}$ in our experiment is lower than predicted, while the basic trend of CAM cell vulnerability is consistent with the prediction. As the VLSI feature size decreases from 180 nm to 65 nm, the $Q_{\text{crit}}$ of the memory cell decreases rapidly. In the sub-100 nm age, the soft error problem becomes a sufficiently serious challenge that the risks of soft errors must be addressed in circuit design.

In each technology generation, the $Q_{\text{crit}}$ of the four kinds of CAM cells follows a similar pattern. The 10T-NAND and 10T-NOR CAM cells have almost equal $Q_{\text{crit}}$ because they are the same in structure except for the connection of matchlines and comparison transistors. The $Q_{\text{crit}}$ of the 11T-NAND cell is also the same as that of the 11T-NOR cell for the same reason. The two kinds of 11T CAM cells have larger $Q_{\text{crit}}$ than the 10T cells because the capacitance of the sensitive nodes in the 11T cells is bigger than that in the 10T cells. So 11T structures are more immune to soft errors than 10T structures.

However, NAND type CAM operation is slower than NOR because the matchline delay increases approximately linearly with the stack depth. Hence NAND CAMs have not been commonly used in high-speed circuits, such as microprocessor caches<sup>[10]</sup>. The reliable CAM cells proposed in the next section are all based on the 11T-NOR structure, but the soft error immune circuits are also suitable for other types of CAM cells.

# 3. Soft error immune CAM design

In this section, we first present a stable-structure-based reliable CAM cell, and then propose two kinds of dual cell soft error immune reliable CAM cells. Finally, we provide a soft error protection mechanism.

## 3.1. SSB-RCAM

Rockett presents a stable structure<sup>[11]</sup>, as shown in Fig. 4(a). In this structure, carefully designed redundancy nodes and a feedback mechanism allow the memory cell to recover from upsets automatically<sup>[12]</sup>. Because the hardened CAM we present is based on this stable structure, we call it a stable-structure-based reliable CAM (SSB-RCAM). It does not need decoupling resistors, so it can perform high speed operations. Additionally, it can scale freely as technology progresses. Figure 4(b) shows the circuit topology of SSB-RCAM.

Fig. 4. SSB-RCAM. (a) Basic stable structure. (b) Circuit topology of SSB-RCAM.

Fig. 5. Circuit topology of (a) DCF-RCAM and (b) DCK-RCAM.

In the SSB-RCAM structure, transistors M1 and M3 are cross-coupled and are larger than the other transistors. M2 and M4 are two pulldown transistors, and their sizes are correspondingly smaller. Transistors M1 to M4 constitute the basic stable structure, which enhances the reliability of the memory cell by supplying charge continuously when Q or NQ is struck by a particle. However, transistors M1 to M6 must be sized carefully for the stable structure to balance reliability, speed and area. Bigger transistors in the stable structure need more layout area and lead to overheads of at least 200%. Because the additional stable structure is not in the critical path, the speed decrease in write or match operations can be ignored.

## 3.2. DCF-RCAM and DCK-RCAM

In the next design, we add an SRAM cell to the original CAM cell and connect the two cells through transmission gates for feedback. We call this structure a dual cell feedback reliable CAM (DCF-RCAM) because the two storage cells are always in a state of feedback, as illustrated in Fig. 5(a). Although the critical charge of DCF-RCAM is enhanced by the dual cell feedback, the driving strength is still somewhat weak because only pass transistors connect the two cells. We improve on this by connecting one of the storage nodes to the supply or ground. The strong driving strength of the supply or ground makes the connected nodes more robust, as shown in Fig. 5(b). Because M1 and M2 act like keeper transistors in dynamic logic, we call this structure a dual cell keeping reliable CAM (DCK-RCAM).

Fig. 6. Ignore mechanism in NOR dual cell RCAM.

Because the two cells watch over each other, the $Q_{\text{crit}}$ of DCF-RCAM is greatly improved. The added SRAM cell is set to minimum size, so the area overhead of DCF-RCAM is smaller than that of SSB-RCAM. However, the write delay increases because of the additional SRAM cell. Fortunately, the increase is not large and can be controlled by transmission gate sizing. The trends of operation delay, power and area of DCK-RCAM are close to those of DCF-RCAM because of their similar structures. Because of the strong driving of M1 and M2, their sizes can be slightly smaller. The $Q_{\text{crit}}$ of DCK-RCAM is bigger than that of DCF-RCAM, while its area and power are a little smaller.

## 3.3. Ignore mechanism

A single bit upset in a CAM array may cause two different faults<sup>[2]</sup>. If an incoming search word matches a stored word when it should really have mismatched, it is called a false positive match. The opposite case is called a false negative match. The two kinds of false match cause different kinds of harm. Reference [2] lists both cases for CAMs in a write-through cache, a DTLB and a store buffer. A false positive match will cause the corresponding data array to deliver the incorrect entry, potentially causing incorrect execution. Except for the store buffer, a false negative match would only result in a miss, causing the entry to be refetched rather than incorrectly executed.

The idea of our soft error protection mechanism is based on the inequality in harm of the two kinds of false match. In dual cell redundancy CAMs, if either cell is struck by a particle, the states of the two cells differ. A false match could then occur and an error might propagate to the system. To reduce the SER of a reliability-critical system, we should eliminate the more harmful false positive matches as early as possible. That is, regardless of whether the word actually matches or mismatches, if an upset happens in the word, the result always becomes a mismatch. This soft error protection mechanism is called the ignore mechanism. Figure 6 shows a circuit diagram of the ignore mechanism in a NOR dual cell redundancy CAM.

Fig. 7. $Q_{\text{crit}}$ of the proposed RCAMs with different areas.

The solid block in Fig. 6 is a redundancy CAM cell, and the circuit in the dashed block is a pulldown network. Normally, the two paths M1-M2 and M3-M4 are both turned off because each path always contains an NMOS transistor that is off. When an upset arises, one of the two paths turns on and the matchline is pulled to "0". Theoretically, the SER of components with backup data (such as write-through caches and TLBs) can be reduced to zero using the ignore mechanism.
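The logical effect of the ignore mechanism can be illustrated with a small behavioral model in Python. This is a sketch at the bit level, not a circuit simulation: it assumes that one cell of each pair (called `primary` here) drives the comparison and that any disagreement between the two cells of a pair turns on the pulldown path. The function names and the example word are hypothetical.

```python
from typing import List, Sequence


def matchline(stored: Sequence[int], search: Sequence[int]) -> bool:
    """NOR-type matchline: stays high (True) only if every bit matches."""
    return all(s == k for s, k in zip(stored, search))


def matchline_with_ignore(primary: Sequence[int],
                          redundant: Sequence[int],
                          search: Sequence[int]) -> bool:
    """Matchline of a dual cell word with the ignore mechanism.

    If the two cells of any bit disagree, an upset has occurred and the
    matchline is forced low, whatever the comparison result is.
    """
    upset_detected = any(p != r for p, r in zip(primary, redundant))
    return matchline(primary, search) and not upset_detected


def flip(word: Sequence[int], bit: int) -> List[int]:
    """Return a copy of word with one bit inverted."""
    out = list(word)
    out[bit] ^= 1
    return out


stored = [1, 0, 1, 1, 0, 0, 1, 0]      # 8-bit stored word (arbitrary example)
search = flip(stored, 3)               # search word at Hamming distance one

# An SEU flips bit 3 of the primary cells, making them equal to the search word.
primary_after_seu = flip(stored, 3)
redundant = list(stored)

false_positive = matchline(primary_after_seu, search)                          # True
protected = matchline_with_ignore(primary_after_seu, redundant, search)        # False
clean_match = matchline_with_ignore(stored, stored, stored)                    # True
```

In this model, an upset in either cell of a pair forces a mismatch, so a false positive becomes a miss, which components with backup data can recover from by refetching the entry.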

# 4. Results and discussion

The proposed reliable CAM cells are simulated using a 90 nm CMOS process. To evaluate the impact of transistor sizing on critical charge, delay and power, we simulate eight circuits of different sizes for every structure. These circuits are denoted 1x to 8x, where 1x is the cell with minimum size. For SSB- and DCF-RCAM, the 2x cell is the one whose reliability enhancing transistors are 0.5 times bigger than those of 1x, and the remaining sizes follow the same pattern. Because of the strong driving strength of M1 and M2 in DCK-RCAM, the added circuit of its 2x cell is 0.4 times bigger than that of 1x. Figure 7 shows the $Q_{\text{crit}}$ trends for the three kinds of RCAM cell with increasing area. Figure 8 shows the write delay, mismatch delay, power and area of the proposed RCAMs with different sizes.

In Fig. 7, the dashed lines are approximate curves that follow the trends of the simulation results. We find that DCK-RCAM has the biggest $Q_{\text{crit}}$ and therefore the lowest vulnerability to soft errors. The $Q_{\text{crit}}$ of DCF-RCAM is bigger than that of SSB-RCAM when the area is small. With increasing area, $Q_{\text{crit}}$ increases more and more slowly, especially in DCF-RCAM and DCK-RCAM. That is, an excessive increase in size is unnecessary.

The write delay of SSB-RCAM is almost constant, while the write delay of DCF- and DCK-RCAM increases with size, reaching at most about 60% more than that of the 1x cell, with DCK-RCAM being slightly slower than DCF-RCAM. The mismatch delay of all RCAMs is nearly constant because the reliability enhancing circuits have no impact on the match path. All three structures have a linearly increasing power consumption, but the power of SSB-RCAM grows faster than the others because of its large area. The most serious disadvantage of SSB-RCAM is its large area overhead, which is about 160% larger than 1x on average.

Fig. 8. (a) Write delay, (b) mismatch delay, (c) power, and (d) area of the proposed RCAMs with different sizes.

Table 1. $Q_{\text{crit}}$, write delay, mismatch delay, power and area of different CAM cells

| Cell | $Q_{\text{crit}}$ (fC) | Write delay (ps) | Mismatch delay (ps) | Power ($\mu$W) | Area ($\mu\text{m}^2$) |
|------|------------------------|------------------|---------------------|----------------|------------------------|
| 10T  | 5.06                   | 50.0             | 19.5                | 6.54           | 6.0                    |
| 11T  | 7.13                   | 52.0             | 19.8                | 6.12           | 5.2                    |
| SSB  | 25.61                  | 61.7             | 19.5                | 19.94          | 23.8                   |
| DCF  | 23.39                  | 77.4             | 19.3                | 18.64          | 12.2                   |
| DCK  | 25.92                  | 79.9             | 18.7                | 19.59          | 11.4                   |

Table 1 lists details of  $Q_{crit}$ , write delay, mismatch delay, power and area of 10T-NOR, 11T-NOR CAM cells and the proposed RCAM cells. For the RCAM cells, the data in the table are average values for all eight sizes.

The 11T CAM cell has the advantage of a bigger $Q_{\text{crit}}$ and a more compact layout than the 10T cell. The $Q_{\text{crit}}$ values of the SSB and DCK cells are greater than that of the DCF cell, and DCK has the best immunity against soft errors. Compared to the other two cells, the SSB-RCAM cell shows the lowest impact of the reliability circuits on write delay. However, it has the biggest power and area overheads. Besides its advantage in $Q_{\text{crit}}$, DCK also has the lowest area of the three RCAM cells. In most applications, DCK is the most suitable RCAM cell.
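The comparison above can be reproduced directly from the values in Table 1. The short Python sketch below computes each RCAM's overhead relative to the 11T cell. Taking the 11T row as the baseline is an assumption made here because the RCAMs are built on the 11T-NOR structure, and the dictionary keys and function name are illustrative.

```python
# Average values copied from Table 1 (RCAM rows are averages over all eight sizes).
TABLE_1 = {
    "11T": {"write_delay_ps": 52.0, "mismatch_delay_ps": 19.8, "power_uW": 6.12, "area_um2": 5.2},
    "SSB": {"write_delay_ps": 61.7, "mismatch_delay_ps": 19.5, "power_uW": 19.94, "area_um2": 23.8},
    "DCF": {"write_delay_ps": 77.4, "mismatch_delay_ps": 19.3, "power_uW": 18.64, "area_um2": 12.2},
    "DCK": {"write_delay_ps": 79.9, "mismatch_delay_ps": 18.7, "power_uW": 19.59, "area_um2": 11.4},
}

RCAMS = ("SSB", "DCF", "DCK")


def overhead_percent(cell: str, metric: str, baseline: str = "11T") -> float:
    """Overhead of one metric relative to the baseline cell, in percent."""
    return 100.0 * (TABLE_1[cell][metric] / TABLE_1[baseline][metric] - 1.0)


# Cells with the smallest write-delay and area overheads among the RCAMs.
lowest_write_delay = min(RCAMS, key=lambda c: overhead_percent(c, "write_delay_ps"))
lowest_area = min(RCAMS, key=lambda c: overhead_percent(c, "area_um2"))

for cell in RCAMS:
    print(cell,
          f"write delay +{overhead_percent(cell, 'write_delay_ps'):.1f}%",
          f"area +{overhead_percent(cell, 'area_um2'):.1f}%")
```

With these numbers, SSB has the smallest write-delay overhead and DCK the smallest area overhead, in line with the discussion of Table 1.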

Figure 9 compares the matchline waveforms in three scenarios: an 8-bit CAM word without an SEU, with an SEU (leading to a false positive match) and with an SEU but employing the ignore mechanism. In Fig. 9, the search word and the stored word in the CAM differ in only one bit position; that is, they are separated by a Hamming distance of one<sup>[2]</sup>. When an SEU happens, the state of one bit in the CAM word is upset, which might make the two words equal. Without an error protection mechanism, a false positive match will occur and will lead to data corruption or a system crash. In our experiment, the ignore mechanism solves this problem. We can see from Fig. 9 that once a bit upset caused by a particle strike has occurred, the matchline is pulled down immediately, regardless of whether the two words match. We simulated 40 groups of stimuli with a Hamming distance of one in our experiment. The results show that the ignore mechanism is able to eliminate all the false positive matches. Hence, for components with backup data, the ignore mechanism can reduce the SER to zero.

Fig. 9. Simulation waveform of the ignore mechanism.

Fig. 10. SER of different CAM cells.

Figure 10 shows the SER reduction of the radiation hardened reliable CAMs, calculated using Eq. (1). Compared to the 10T-NOR CAM cell, the proposed RCAMs reduce the SER by about 81% on average. The average SER is marked by the dashed ellipse in Fig. 10. Across the different sizes of RCAMs, there is about 15% SER reduction on average. The mean SER of the RCAMs corresponds to a size between 3x and 4x. In most applications, we prefer to select the small size to reduce overheads.
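The order of magnitude of this reduction can be checked from the exponential dependence of SER on $Q_{\text{crit}}$ and the average $Q_{\text{crit}}$ values in Table 1. The Python sketch below assumes that the neutron flux and the sensitive node area are the same for all cells, so that the prefactor cancels in the ratio. That simplification is ours, and the exact procedure behind Fig. 10 may differ.

```python
import math

Q_S_FC = 12.0  # charge collection efficiency used in the text, in fC

# Average critical charges from Table 1, in fC.
Q_CRIT_FC = {
    "10T-NOR": 5.06,
    "SSB": 25.61,
    "DCF": 23.39,
    "DCK": 25.92,
}


def relative_ser(q_crit_fC: float, q_ref_fC: float, q_s_fC: float = Q_S_FC) -> float:
    """SER of a cell divided by the SER of a reference cell.

    Assumes equal flux and node area, so only the exponential term remains.
    """
    return math.exp(-(q_crit_fC - q_ref_fC) / q_s_fC)


def ser_reduction(q_crit_fC: float, q_ref_fC: float) -> float:
    """Fractional SER reduction relative to the reference cell."""
    return 1.0 - relative_ser(q_crit_fC, q_ref_fC)


reductions = {
    name: ser_reduction(Q_CRIT_FC[name], Q_CRIT_FC["10T-NOR"])
    for name in ("SSB", "DCF", "DCK")
}
average_reduction = sum(reductions.values()) / len(reductions)

for name, value in reductions.items():
    print(f"{name}: {100.0 * value:.1f}% lower SER than 10T-NOR")
print(f"average: {100.0 * average_reduction:.1f}%")
```

Under these assumptions the three reductions come out at roughly 78% to 82%, with an average of about 81%.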

# 5. Conclusion

Continuous scaling of CMOS technology results in an increasing soft error vulnerability in VLSI circuits. CAMs are widely used in on-chip structures, so understanding the soft error problem of CAM structures is important. In this paper, we first study the impacts of soft errors on various types of CAM for different feature sizes. Experimental results indicate that the soft error vulnerability of sub-100 nm CAMs increases rapidly, and an 11T-NOR CAM cell has an advantage in soft error immunity. Then we present a soft error immune CAM cell, SSB-RCAM, and propose two kinds of dual cell reliable CAM cells, DCF-RCAM and DCK-RCAM. The structures proposed reduce SER by 81% on average. In addition, we propose an ignore mechanism to protect dual cell redundancy CAMs. Our future work will include how to reduce the overheads of delay, power and area incurred from reliability enhancing circuits.

# References

- [1] Mukherjee S S, Emer J S, Reinhardt S K. The soft error problem: an architectural perspective. 11th International Symposium on High-Performance Computer Architecture, 2005: 243
- [2] Mukherjee S. Architecture design for soft errors. Burlington: Morgan Kaufmann Publishers, 2008
- [3] Pagiamtzis K, Sheikholeslami A. Content-addressable memory (CAM) circuits and architectures: a tutorial and survey. IEEE J Solid-State Circuits, 2006, 41(3): 712
- [4] Pagiamtzis K, Azizi N, Najm F N. A soft-error tolerant content-addressable memory (CAM) using an error-correcting-match scheme. IEEE Custom Integrated Circuits Conference, 2006: 301
- [5] Azizi N, Najm F N. A family of cells to reduce the soft-error-rate in ternary-CAM. 43rd ACM/IEEE Design Automation Conference, 2006: 779
- [6] He Chaohui, Li Guozheng, Luo Jinsheng, et al. Analysis of single event upset in CMOS SRAMs. Chinese Journal of Semiconductors, 2000, 21(2): 174 (in Chinese)
- [7] Gill B S, Papachristou C, Wolff F G. A new asymmetric SRAM cell to reduce soft errors and leakage power in FPGA. Design, Automation and Test in Europe Conference and Exposition, 2007: 1
- [8] Lin S, Yang H, Luo R. A new family of sequential elements with built-in soft error tolerance for dual-VDD systems. IEEE Trans Very Large Scale Integration (VLSI) Syst, 2008, 16(10): 1372
- [9] Shivakumar P, Kistler M, Keckler S W, et al. Modeling the effect of technology trends on the soft error rate of combinational logic. International Dependable Systems and Networks, 2002: 389
- [10] Mupid A, Mutyam M, Vijaykrishnan N, et al. Variation analysis of CAM cells. 8th International Symposium on Quality Electronic Design, 2007: 333
- [11] Rockett L R. An SEU-hardened CMOS data latch design. IEEE Trans Nucl Sci, 1988, 35(6): 1682
- [12] Liu B, Chen S, Liang B. A novel low power SEU hardened storage cell. Chinese Journal of Semiconductors, 2007, 28(5): 755 (in Chinese)