
530 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 43, NO. 2, FEBRUARY 2008

High-Speed and Low-Power Design Techniques for TCAM Macros

Chao-Ching Wang, Jinn-Shyan Wang, Member, IEEE, and Chingwei Yeh

Abstract: Ternary content addressable memory (TCAM) is an important component for many applications. For TCAM-based networking systems, the rapidly growing size of routing tables brings the challenge of achieving higher search speed and lower power consumption. In this work, two techniques are proposed to realize high-performance, low-power TCAM for IP address lookup. One technique is the tree AND-type match-line scheme for high search speed. The other technique is the segmented search-line scheme for low power. The implemented 1.8-V 0.18-µm 256 × 128b TCAM macro achieves a 1.56-ns search time with an energy of 1.42 fJ/bit/search.

Index Terms: Associative memories, content-addressable memory, high speed, low power, PF-CDPD, pseudo-footless, segmented search line, tree match line.

I. INTRODUCTION

THE content addressable memory (CAM) is an important component for accelerating data search operations in many applications such as database access [1], pattern matching [2], signal processing [3], and networking IP address lookup [4]–[6]. In some applications such as IP address lookup, ternary content addressable memory (TCAM) is required to implement the masking function by storing X (don't care) in the TCAM cell. When X is stored in a TCAM cell, the cell datum is always regarded as matching the search datum, whatever the search datum is. Fig. 1 shows a search engine realized with a TCAM. The input data are fed into the search lines through the search-line buffers and are then compared simultaneously with all the stored data in the TCAM array. Row-based data search is performed to generate matching results through match-line circuits. Previous works [7]–[14] have demonstrated that the design of the match-line circuit has a major impact on search speed and power consumption.
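To make the masking behavior concrete, the following minimal Python sketch models a TCAM search at the logic level only (no circuit behavior). The (value, care) encoding and the function names are assumptions chosen for illustration, not the authors' design.

```python
"""Logic-level sketch of a TCAM search (an illustration, not a circuit model).

Each stored entry is a (value, care) pair of integers: a 1 in `care` means
the bit holds 0 or 1, and a 0 means the bit holds X (don't care).
"""


def ternary_match(key: int, value: int, care: int) -> bool:
    # A row matches when every cared-for bit equals the key bit;
    # X bits are masked out and therefore always match.
    return ((key ^ value) & care) == 0


def tcam_search(key: int, entries: list[tuple[int, int]]) -> list[bool]:
    # Every row is compared with the same search word; in hardware this
    # comparison happens simultaneously on all match lines.
    return [ternary_match(key, value, care) for value, care in entries]


if __name__ == "__main__":
    # Hypothetical 8-bit entries: 1010XXXX, 101100XX, XXXXXXXX
    entries = [
        (0b1010_0000, 0b1111_0000),
        (0b1011_0000, 0b1111_1100),
        (0b0000_0000, 0b0000_0000),
    ]
    print(tcam_search(0b1010_1101, entries))  # [True, False, True]
```

The all-X row matches every key, which is exactly the masking behavior that IP address lookup relies on.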

It is generally recognized that NOR-type match-line circuits [7]–[9] achieve high search speed at the expense of high power consumption, while NAND-type match-line circuits [10], [11] are power efficient but slow.

Recently, an AND-type match-line circuit [12] constructed with pseudo-footless clock-and-data pre-charged dynamic (PF-CDPD) logic was proposed to achieve both high speed and low power.

Manuscript received February 24, 2007; revised August 27, 2007. This work was supported by the National Science Council, the Ministry of Economic Affairs, and the National Si-Soft Project of Taiwan.

The authors are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSSC.2007.914330

Fig. 1. A search engine realized by a TCAM.

The works in [9]–[12] show that the power consumption of match-line circuits has been greatly reduced by advances in match-line circuit techniques. The work in [13], which targets IP address lookup, used low-swing search-line circuits for further power reduction and added pipelining to the NOR-type match-line circuit [9] to enhance throughput. This indeed increases the throughput, but the area overhead of the flip-flops and the clock driver required for pipelining makes this design not cost effective, and the extra power consumed by these added components nullifies any power saving from the new search-line circuits. On the other hand, the research in [14] added the non-pipelined split-path technique on top of the AND-type match-line circuit [12] to achieve over 50% search speed improvement compared to the pipelined NOR-type match-line scheme [13]. However, compared to the original AND-type match-line design [12], the power efficiency of the split-path AND-type match-line scheme is sacrificed due to a much larger clock loading.

This work proposes both high-speed and low-power design techniques [15] for TCAM macros. The speed enhancement technique is a tree AND-type match-line scheme, which can efficiently speed up search operations with only a slight sacrifice in energy efficiency due to a slightly more complex interconnection. In addition, total power consumption can be reduced through the proposed segmented search-line scheme by utilizing the specific feature of IP address lookup.

The rest of the paper is organized as follows. Section II describes the tree match-line circuitry, and Section III describes the segmented search-line circuitry. Other design considerations are described in Section IV. Test chip implementation and experimental results are presented in Section V. Finally, conclusions are drawn in Section VI.

0018-9200/$25.00 © 2008 IEEE



Fig. 2. (a) The original cascaded AND-type match-line circuit. (b) Logic transformation.

II. TREE MATCH-LINE CIRCUITRY

In the first part of this section, we point out the problems with the split-path AND-type match-line circuit [14]. This analysis leads to the design concept of the proposed tree AND-type match-line circuitry, which is described in the second part of this section.

A. Problems With the Split-Path AND-Type Match-Line

An original m-stage PF-CDPD AND-type match-line circuit [12] is shown in the upper part of Fig. 2(a), while the lower part of Fig. 2(a) depicts the same circuit represented by logic symbols. Except for the first gate, all other gates perform a p-input AND function. The evolution from the cascaded AND-type match line to the split-path AND-type match line is shown in Fig. 2(b); the split-path AND-type match line is built from separate p-input AND gates. It was shown in [14] that a 23.33% speed gain (delay reduced from 2.1 ns to 1.61 ns) is obtained by this logic transformation, mainly due to a much simpler critical-path circuitry.
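The quoted gain is the relative delay reduction computed from the two delays reported in [14]:

```latex
\[
\frac{t_{\text{cascaded}} - t_{\text{split}}}{t_{\text{cascaded}}}
= \frac{2.1\,\text{ns} - 1.61\,\text{ns}}{2.1\,\text{ns}}
= \frac{0.49}{2.1} \approx 23.33\%
\]
```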

However, the speed enhancement comes at a cost. For a 256 × 128b BiCAM macro designed in a 0.18-µm CMOS technology [16], the energy efficiency deteriorates substantially: the energy index rises from 2.33 fJ/bit/search to 4.83 fJ/bit/search [14]. Our analysis indicates that this deterioration has three causes. First, the clock driver must be enlarged because all the separated gates need to be triggered by the clock signal, which increases the power consumption of the clock driver. Second, all separated p-input AND gates are evaluated independently. This means that the evaluation of each gate does not depend on the evaluation results of any other gate, so the switching activity of these gates is higher than that of the p-input AND gates in the cascaded design. Third, the number of logic gates is increased, and hence the interconnections among the logic gates and the parasitic capacitance are correspondingly increased.
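The second cause can be illustrated with a deliberately simplified, logic-level proxy. We assume (this is our assumption, not a result from [14]) that each p-bit group matches independently with probability q, and we count a gate as switching when its AND output evaluates true. In the cascaded chain, gate k can only switch when every group up to k matches, whereas each split-path gate depends only on its own group.

```python
def expected_switching(m: int, q: float) -> tuple[float, float]:
    """Expected number of AND-gate outputs that evaluate true per search.

    Simplified proxy model (illustrative assumption):
    - m gates, each covering one p-bit group;
    - each group matches independently with probability q;
    - a gate "switches" when its output evaluates true.
    """
    if m < 1 or not 0.0 <= q <= 1.0:
        raise ValueError("need m >= 1 and 0 <= q <= 1")
    # Cascaded: gate k is true only if groups 1..k all match -> q**k.
    cascaded = sum(q**k for k in range(1, m + 1))
    # Split-path: every gate looks only at its own group -> q each.
    split = m * q
    return cascaded, split


if __name__ == "__main__":
    for q in (0.25, 0.5, 0.9):
        c, s = expected_switching(m=8, q=q)
        print(f"q={q}: cascaded={c:.3f}, split-path={s:.3f}")
```

Because q**k never exceeds q for 0 <= q <= 1, the split-path count is never lower than the cascaded one. The model ignores precharge, clock, and wire energy, so it only captures the direction of the effect.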

B. The Proposed Tree AND-Type Match-Line Circuitry

The basic idea behind the split-path AND-type match-line circuit is to implement, in a different way, a big AND function originally realized by m cascaded AND gates. However, there are several ways to achieve the same goal, and three of them are shown in Fig. 3 (assuming 64b in each half plane).

Fig. 3. (a) Parallel, (b) 3-level tree, and (c) 2-level tree AND-type match lines.

The design in Fig. 3(a) uses two short parallel match lines in each half plane and merges the outputs from both planes into a 4-input AND gate to generate the final matching result. On the other hand, the designs in Fig. 3(b) and (c) adopt a 3-level and a 2-level tree match-line circuit, respectively, in each half plane, and use an 8-input and a 4-input AND gate, respectively, to generate the final matching results. The electrical behaviors, including delay time and power consumption, are used to determine the final choice. The evaluation results are described as follows.

Let us take the design of a 0.18-µm 128b TCAM match line as an example. Post-layout evaluation results of different implementations are listed in Table I. All the designs use the same TCAM cell for a fair evaluation, and the cell layout is shown in Fig. 4. The impacts of the TCAM cell design and the cell layout design are described in Section IV.

TABLE I

PERFORMANCE COMPARISONS BETWEEN DIFFERENT MATCH LINES

Fig. 4. The TCAM cell layout used for match-line evaluation.

The following observations are drawn from the extracted features and parameters.

1) Designs 1 and 2 both have the deepest logic depth, but design 1 performs a more complex function in the critical path than design 2. Therefore, design 1 has the longest search delay.

2) Designs 3, 4, and 5 improve search speed by nearly 30% compared to design 1.

3) The delay times of designs 3, 4, and 5 differ by no more than 1.5%. Therefore, the final decision can be made based on power consumption.

4) Compared to the power consumption of the cascaded design (233.6, as listed in Table I), the split-path design consumes 85% more power. On the other hand, compared to the cascaded design, the parallel design and the 3-level tree design consume about 20% more power, and the 2-level tree design consumes only 9% more.

5) Therefore, we adopt the 2-level tree match-line circuitry in the TCAM design.
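The decision rule in observations 3) to 5) (keep the designs whose delay is within 1.5% of the fastest, then take the lowest power) can be sketched as below. The normalized power values come from observation 4; the normalized delays are illustrative placeholders chosen only to be consistent with observations 2) and 3), not the values in Table I.

```python
# Normalized power from observation 4 (cascaded = 1.00).
# Normalized delays are ILLUSTRATIVE PLACEHOLDERS, not Table I data.
CANDIDATES = {
    "cascaded": {"delay": 1.000, "power": 1.00},
    "split-path": {"delay": 0.850, "power": 1.85},
    "parallel": {"delay": 0.700, "power": 1.20},
    "3-level tree": {"delay": 0.705, "power": 1.20},
    "2-level tree": {"delay": 0.710, "power": 1.09},
}


def pick_design(candidates: dict, tolerance: float = 0.015) -> str:
    """Return the lowest-power design among those whose delay is within
    `tolerance` (relative) of the fastest design."""
    fastest = min(c["delay"] for c in candidates.values())
    near_fastest = {
        name: c
        for name, c in candidates.items()
        if c["delay"] <= fastest * (1.0 + tolerance)
    }
    return min(near_fastest, key=lambda name: near_fastest[name]["power"])


if __name__ == "__main__":
    print(pick_design(CANDIDATES))  # 2-level tree
```

With a zero tolerance the rule would simply return the fastest design, so the 1.5% window is what lets power decide among near-equal delays.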

III. SEGMENTED SEARCH-LINE CIRCUITRY

To reduce power consumption, we must first note that a TCAM macro consumes power mainly in three parts: the clock driver, the match lines, and the search lines. Owing to advances in match-line circuit techniques, the power consumption of both the match-line circuits and the clock driver has been greatly reduced. Fig. 5 shows the search time and the energy index breakdown, normalized to a 1.2-V 0.1-µm technology, based on the data published in [9], [11], and [12]. We find that the search lines account for about 54%, 71%, and 82% of the total power consumption of the CAM designs in [9], [11], and [12], respectively. To reduce the power consumption of the search lines, this study proposes the segmented search-line technique for TCAM macros used in IP address lookup. In the following, we first describe the attributes of TCAM for IP address lookup, and then describe the design of the segmented search-line circuitry.

A. Attributes of TCAM for IP Address Lookup

In Internet Protocol version 6 (IPv6), the length of an IP address extends to 128 bits. In a routing table, the prefix region stores either 0 or 1, and the rest stores X. The statistical prefix length distribution observed at a specific router [17] is shown in Fig. 6(a). We find that more than 90% of the IP address prefixes are shorter than 64 bits. Therefore, when the routing table is constructed with a TCAM array, a large portion of the array contains mask bits (i.e., X bits), as shown in Fig. 6(b).
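The fraction of X cells in such an array follows directly from the prefix lengths. The sketch below computes it and renders the rows in a progressive pattern; we assume (for illustration) that "progressive" means rows sorted by decreasing prefix length, and the prefix lengths used in the example are hypothetical, not the distribution in Fig. 6(a).

```python
def masked_fraction(prefix_lengths: list[int], width: int = 128) -> float:
    """Fraction of TCAM cells that store X when each routing entry is stored
    as `length` prefix bits followed by (width - length) X bits."""
    if not prefix_lengths:
        raise ValueError("need at least one entry")
    if any(not 0 <= n <= width for n in prefix_lengths):
        raise ValueError("prefix length out of range")
    x_cells = sum(width - n for n in prefix_lengths)
    return x_cells / (width * len(prefix_lengths))


def progressive_rows(prefix_lengths: list[int], width: int = 128) -> list[str]:
    """Render rows as strings ('b' = prefix bit, 'X' = mask bit), sorted by
    decreasing prefix length (assumed progressive pattern)."""
    return ["b" * n + "X" * (width - n)
            for n in sorted(prefix_lengths, reverse=True)]


if __name__ == "__main__":
    lengths = [48, 32, 64, 40]  # hypothetical prefix lengths
    print(f"{masked_fraction(lengths):.3f} of the cells store X")  # 0.641
    for row in progressive_rows([2, 4, 3], width=4):
        print(row)  # bbbb / bbbX / bbXX
```

Sorting produces the staircase of Fig. 6(b): in any column, once a row stores X, every row below it also stores X.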

B. The Proposed Segmented Search-Line Circuitry

Since the X cells in Fig. 6(b) do nothing but pass matching signals, they do not have to be involved in the search operation. This property, combined with the progressive layout pattern, indicates that the search lines behind the X cells can be turned off to save energy. This idea leads to the segmented search-line design shown in Fig. 7. Multiple segmentation entries (SEs) are inserted into the cell array. A segmentation entry contains a row of segmentation cells (SCs), and the SCs are used to control signal propagation in the search lines.

The circuit containing an SC and two TCAM cells is shown in Fig. 8. The SC is composed of a dummy cell and a path-control switch. The word line (WL) for the upper TCAM cell is also applied to the dummy cell. When an X is written into the upper TCAM cell, both the WBLP and WBLN lines are raised high. In that case, the output of the dummy cell goes low to cut off signal propagation, and the upper segment of the search lines (SBLNu and SBLPu) is pulled down to ground. In other words, when the ternary cell above the segmentation cell (SC) stores an X, the segmentation cell automatically blocks the search data from propagating forward and thus saves energy.

Fig. 5. Search time and energy index of conventional CAM macros.

Fig. 6. (a) The prefix length distribution of IP addresses, and (b) the corresponding TCAM array.

Fig. 7. Concept of the segmented search-line scheme.

Fig. 8. The circuit showing the relationship between the SC and neighboring TCAM cells.
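The segmentation rule can be modeled at the array level to estimate how many search-line cell loads remain driven. This is a sketch under stated assumptions, not the authors' model: rows are sorted by decreasing prefix length with row 0 nearest the search-line buffers; an SE placed just before row r cuts column j when row r (the first cell beyond the SC) stores X; and the count of driven cells is only a rough proxy for search-line switching energy.

```python
def search_line_activity(prefix_lengths: list[int], se_rows: list[int],
                         width: int = 128) -> tuple[int, int]:
    """Return (unsegmented, segmented) counts of search-line cell loads
    driven per search. Column j of a row with prefix length n stores X
    when n <= j (0-based columns)."""
    rows = sorted(prefix_lengths, reverse=True)  # assumed progressive order
    n = len(rows)
    boundaries = sorted(set(se_rows))
    if any(not 0 < r < n for r in boundaries):
        raise ValueError("SE rows must lie strictly inside the array")
    unsegmented = n * width
    segmented = 0
    for j in range(width):
        reach = n  # rows the search signal reaches in column j
        for r in boundaries:
            if rows[r] <= j:  # cell beyond the SC stores X -> SC blocks
                reach = r     # everything farther away stays cut
                break
        # Sanity check: a cut segment never hides a cared-for bit.
        assert all(rows[i] <= j for i in range(reach, n))
        segmented += reach
    return unsegmented, segmented


if __name__ == "__main__":
    # Hypothetical 4-entry table with one SE before row 2.
    print(search_line_activity([64, 48, 32, 24], [2]))  # (512, 320)
```

The internal check holds only because the table is sorted; this is why the progressive pattern is a precondition for the scheme.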

The number and locations of segmentation entries can be decided from the statistical features of the routing table. Once the TCAM array has been designed, the segmentation entries cannot be changed for a specific embedded application. If an entry needs to be added to the lookup table, the table should first be re-sorted at the system level, and then write operations are performed to update the table. A design example and experimental results are described in Sections IV and V.

TABLE II

PERFORMANCE COMPARISON BETWEEN DIFFERENT INTERCONNECTION MANNERS

Fig. 9. The proposed TCAM cell.
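The system-level update step can be sketched as an ordered insertion that reports which rows must be rewritten. The (prefix_value, prefix_length) tuple layout and the function name are hypothetical; the sketch assumes the table is kept in decreasing prefix-length order.

```python
import bisect


def insert_entry(table: list[tuple[str, int]], entry: tuple[str, int]):
    """Insert `entry` = (prefix_value, prefix_length) into a table kept in
    decreasing prefix-length order. Returns (new_table, rows_to_write)."""
    # bisect works on ascending sequences, so search on negated lengths.
    neg = [-length for _, length in table]
    pos = bisect.bisect_right(neg, -entry[1])
    new_table = table[:pos] + [entry] + table[pos:]
    # Every row from `pos` onward shifts by one and must be rewritten.
    rows_to_write = list(range(pos, len(new_table)))
    return new_table, rows_to_write


if __name__ == "__main__":
    table = [("a", 64), ("b", 48), ("c", 32)]
    print(insert_entry(table, ("d", 48)))
    # ([('a', 64), ('b', 48), ('d', 48), ('c', 32)], [2, 3])
```

Each row in `rows_to_write` is then updated with an ordinary TCAM write operation.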

IV. OTHER DESIGN CONSIDERATIONS

In this section, we describe the physical design considerations for a 1.8-V 0.18-µm 256 × 128b TCAM macro, including the TCAM cell design, the interconnections among TCAM cells, the TCAM cell layout, and the design of the segmentation entries.

A. TCAM Cell Design

Fig. 9 depicts the schematic of the proposed TCAM cell for the AND-type match line. This cell uses two independent latches to store the three possible kinds of data, similar to the TCAM cell [8] used for the NOR-type match line. QN and QP are the storage nodes, and they store complementary values when the stored datum is either 0 or 1. MN and MP perform the comparison (XOR) function between (QN, QP) and (SBLN, SBLP). When the cell needs to store X, both QN and QP should be written 0 to turn off MN and MP and thereby disable the XOR operation. In the AND-type match line, transistor MC should always be turned on in this case, and MX1 and MX2 are used to charge node cell_out to a high voltage level VDDC for this purpose.
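At the logic level, the storage encoding and the XOR comparison can be summarized as follows. The polarity (which of QN and QP is high for a stored 1) is not given in the text and is an assumption here; only the facts that QN and QP are complementary for 0/1 and both 0 for X come from the description above.

```python
# Assumed encoding (hypothetical polarity):
#   stored 1 -> (QN, QP) = (0, 1), stored 0 -> (1, 0), X -> (0, 0)
ENCODE = {"1": (0, 1), "0": (1, 0), "X": (0, 0)}


def search_lines(bit: int) -> tuple[int, int]:
    # Complementary search-line pair (SBLN, SBLP), same assumed polarity.
    return (1 - bit, bit)


def cell_matches(stored: str, search_bit: int) -> bool:
    qn, qp = ENCODE[stored]
    sbln, sblp = search_lines(search_bit)
    # XOR-style comparison at the logic level: a mismatch occurs only when
    # a stored node and the opposite search line are both high. With X,
    # QN = QP = 0, so both comparison devices stay off and the cell always
    # reports a match.
    mismatch = (qp and sbln) or (qn and sblp)
    return not mismatch


if __name__ == "__main__":
    for stored in ("0", "1", "X"):
        print(stored, [cell_matches(stored, b) for b in (0, 1)])
```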

This cell is almost identical to the TCAM cell used in [12], except that VDDC is a reduced level, one diode drop below VDD, rather than VDD itself. This change accounts for the charge sharing effect (CSE). We adopt the same simulation model [Fig. 10(a)] as that used in [12] to observe the worst-case CSE of the proposed design. During the evaluation period, the voltage at node out should be kept sufficiently low to guarantee correct operation. However, due to the CSE, there will be a ripple or even a logic change at node out. The simulated relationship between the undesired ripple voltage at node out and the channel length (L) of the feedback pMOS at the typical (TT) and worst (SF) process corners is shown in Fig. 10(b). The results indicate that if cell_out is connected to VDD, the adjustable range of L that keeps the maximal ripple voltage from exceeding 0.4 V is very limited. Moreover, for the same ripple voltage, say 0.15 V, the design with the reduced VDDC can use a longer L (0.5 µm) and obtain a shorter gate delay (188 ps), while the design with VDDC = VDD must use a shorter L (0.21 µm) and has a larger gate delay (379 ps).

B. Interconnections Among TCAM Cells

All the gates in the proposed 2-level tree match line [Fig. 3(c)] should be arranged in one row in the memory array. Therefore, a long interconnection is needed to link the two branches of the tree. The way this interconnection is made influences the amount of parasitic capacitance and in turn influences both search speed and power consumption. We have studied two interconnection methods for performance evaluation.

Fig. 11(a) and (b) show the conceptual and layout diagrams

of straightforward and leap-frog interconnection methods,

respectively. Post-layout simulation results are summarized in

Table II.

The simulation data in Table I are based on the leap-frog interconnection. The data in Table II reveal that if a straightforward interconnection is adopted, not only is the search delayed but the power consumption also increases. This is mainly because the long interconnection in the straightforward manner lies in the critical path and results in a larger RC product.
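A standard first-order way to see why a long wire on the critical path hurts is the Elmore delay of a driver with on-resistance R_d driving a distributed RC wire (total resistance R_w, total capacitance C_w, per-unit-length r and c, length ℓ) into a load C_L. This textbook estimate is added for illustration and is not the paper's own analysis:

```latex
\[
t_{\text{Elmore}} \approx R_d\,(C_w + C_L) + R_w\left(\tfrac{1}{2}C_w + C_L\right),
\qquad R_w = r\,\ell,\quad C_w = c\,\ell
\]
```

Because both R_w and C_w grow with ℓ, the wire term grows roughly as ℓ², and the added C_w also increases the capacitance switched on every evaluation.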

C. TCAM Cell Layout

Both evaluation results in Tables I and II are based on the TCAM cell shown in Fig. 4. In the following, we show performance evaluations based on different cell layouts. Fig. 4 shows a TCAM cell with an aspect ratio of 1.17. We designed two other cell layouts with smaller aspect ratios, as shown in Fig. 12. Table III summarizes the post-layout evaluation results for a 128b 2-level tree match line (ML).

The data in Table III show that the smaller the aspect ratio of the TCAM cell, the longer the search delay and the larger the power consumption of the match line, but the smaller the capacitance on the search lines (SL). A good tradeoff is to use the design of Fig. 12(a), because it sacrifices only 1.97% in search delay but obtains a 33% SL capacitance reduction. The overall power reduction from the 33% SL capacitance reduction more than compensates for the 4% ML power increase.
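The last claim can be checked with a first-order power budget. The sketch assumes (our assumption) that search-line power scales with search-line capacitance and that all other components are unchanged; the 54% search-line share in the example is borrowed from the lowest figure in Fig. 5, which describes other CAM designs, and the 30% match-line share is hypothetical.

```python
def relative_total_power(sl_share: float, ml_share: float,
                         sl_scale: float = 1 - 0.33,
                         ml_scale: float = 1.04) -> float:
    """Total power after the layout change, relative to the original (1.0).

    sl_share, ml_share: original fractions of total power spent in the
    search lines and match lines. Other components are assumed unchanged.
    """
    if sl_share < 0 or ml_share < 0 or sl_share + ml_share > 1:
        raise ValueError("shares must be non-negative and sum to <= 1")
    other = 1.0 - sl_share - ml_share
    return sl_share * sl_scale + ml_share * ml_scale + other


if __name__ == "__main__":
    print(f"{relative_total_power(0.54, 0.30):.4f}")  # 0.8338
```

Under this model the change pays off whenever 0.33 × (SL share) exceeds 0.04 × (ML share), i.e., whenever search-line power is more than about 12% of match-line power, which the shares in Fig. 5 comfortably satisfy.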




Fig. 10. (a) Simulation model for observing the CSE and (b) simulation results.

Fig. 11. (a) Straightforward and (b) leap-frog interconnection manners.

TABLE III

IMPACT OF DIFFERENT CELL LAYOUTS

D. Design of Segmentation Entries

Although more power can be saved if more SEs are used, the propagation delay of the search signal is strongly affected by the sizing of the SC and the number of series SCs along the search line. To trade off search speed against power consumption, we design the SEs with the following steps. First, assume there is only one SC along the search line, and size the transmission gate (TG) in the SC. For a 256-entry TCAM macro, the TG should be sized up to 2.4 times the minimal size so that the voltage at node cell_out in the TCAM cell can be pulled up to VDDC from 0 V, or pulled down to 0 V from VDDC, within the first half clock cycle to guarantee a safe match operation in the second half clock cycle. We also found that, with this sizing, the operation at cell_out becomes too slow even if there are only two SCs along the search path, as shown in Fig. 13. Second, increase the number of SEs along the search line for larger power savings, but keep the above sizing and avoid further speed loss with the aid of floorplan design. Fig. 14 shows the final floorplan of the 256 × 128b TCAM macro with two SEs. One SE is located at the quarter and the other at the half of the search line. The write and search buffers are located at the center of the array so that they can drive the search lines in the upper and lower half arrays simultaneously. With this design, each search signal passes through only one SC, although one 256b search line has two SCs.

Fig. 12. (a) The second style, and (b) the third style TCAM cell layouts.

Fig. 13. Simulation waveforms under different numbers of segmentation entries.

In Fig. 14 we also show the diode used to generate VDDC. The large diode is realized by many distributed small diodes located at the top and bottom of the cell array.
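A simplified model shows why even two series SCs are too slow. If each SC is treated as an identical RC section (switch resistance r driving the capacitance c of the following search-line segment), the Elmore delay at the far end of n series sections grows quadratically with n. This uniform-ladder model is our illustrative assumption, not the authors' simulation setup, and it ignores the driver resistance.

```python
def elmore_ladder_delay(n: int, r: float, c: float) -> float:
    """Elmore delay at the far end of n identical series RC sections.

    Resistor k (1-based, counted from the driver) carries the charge of
    every capacitor downstream of it, i.e. (n - k + 1) capacitors, so
    t = sum_k r * c * (n - k + 1) = r * c * n * (n + 1) / 2.
    """
    if n < 1:
        raise ValueError("need at least one section")
    return sum(r * c * (n - k + 1) for k in range(1, n + 1))


if __name__ == "__main__":
    # Normalized units: r = c = 1.
    for n in (1, 2, 3):
        print(n, elmore_ladder_delay(n, 1.0, 1.0))
    # prints: 1 1.0 / 2 3.0 / 3 6.0
```

Under this model, two SCs in series roughly triple the delay of one, which is consistent in direction with Fig. 13 and with the floorplan choice that limits each search signal to a single SC.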

V. EXPERIMENTAL RESULTS

We implemented a 1.8-V 0.18-µm TCAM test chip to verify the proposed design techniques. The critical-path circuit of the TCAM macro is shown in Fig. 15(a). Before the TCAM can be used for searching, the TCAM array must be filled with data using the write operation. The timing waveforms for the write mode are shown in Fig. 15(b). When writing a don't care (X), the corresponding mask bit is set to 0, and the corresponding bitline enable signal and write bitlines (WBLP, WBLN) are pulled low and high, respectively. Thus, both storage nodes (QP and QN) of the TCAM cell are written with 0, and the inner node cell_out is pulled up to the voltage level of VDDC, as described earlier. When writing a 1 or a 0, the corresponding control signal goes high and one of the write bitlines is pulled low according to the input datum.

Fig. 14. The floorplan of the 256 × 128b TCAM macro.
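The write-mode conditions can be collected into a small encoding table. The X case follows the description above; the mask-bit value for 0/1 and which write bitline is pulled low for a 1 versus a 0 are not stated in the text and are assumptions in this hypothetical sketch.

```python
def write_signals(datum: str) -> dict:
    """Hypothetical write-driver encoding for one TCAM cell.

    X     -> mask bit 0, bitline enable low, WBLP = WBLN = 1 (both high).
    0 / 1 -> enable high, exactly one write bitline pulled low.
    """
    if datum == "X":
        return {"mask": 0, "enable": 0, "WBLP": 1, "WBLN": 1}
    if datum in ("0", "1"):
        bit = int(datum)
        # Assumed: mask bit 1 for data, WBLN low to write 1, WBLP low for 0.
        return {"mask": 1, "enable": 1, "WBLP": bit, "WBLN": 1 - bit}
    raise ValueError("datum must be '0', '1', or 'X'")


if __name__ == "__main__":
    for d in ("0", "1", "X"):
        print(d, write_signals(d))
```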

The timing waveforms for the search operation are shown in Fig. 15(c). The internal clock signal of the match circuit is the complement of the external clock signal clk. When the internal clock goes low, the match-line circuit enters the pre-charge phase, and the external datum and its complement are fetched by the rising edge of clk onto the search lines through the search-line buffers. In this phase, the datum on the search lines begins to be compared with all the data previously written and stored in the TCAM array, and the voltage at node cell_out of each memory cell moves toward its final value. When the internal clock goes high, the match-line circuit enters the evaluation phase. Please refer to [12] for the detailed operation of the PF-CDPD match-line circuit. All match lines are evaluated at the same time, and each match line generates a match output, which goes high if the search data match the stored data.

Fig. 15. (a) Schematic of the critical-path circuit, (b) waveforms of the write operation, and (c) waveforms of the search operation.

The block diagram of the test chip is shown in Fig. 16(a). The TCAM macro contains two segmentation entries, with prefix lengths of 64 bits and 32 bits, respectively. A voltage-controlled oscillator (VCO) and a divide-by-two circuit are used to generate the clock signals with a 50% duty cycle. The clock frequency can be adjusted from 200 MHz to 600 MHz. A dummy clock buffer synchronizes the rising (falling) edge of the clock clkt for the peripheral circuits with the falling (rising) edge of the clock for the TCAM core.

The pre-stored data and the search data are generated by four 32b linear feedback shift registers (LFSRs), and the seed of the LFSRs can be controlled to vary the data sequence. The mask-bit control circuit is used to help generate the progressive data pattern. The 8b counter is used to generate the address for the write operation. At the beginning of a measurement, the four 32b LFSRs generate a random pattern, which is ANDed with the pattern output by the mask-bit control circuit to generate the progressive pattern for the lookup table. During matching operations, the random search data are generated again by the LFSRs with the same seed, and therefore each search datum matches one of the stored data. In this sense, the power is measured with one and only one match output per clock cycle. The timing diagrams of the test chip are shown in Fig. 16(b). From the timing diagrams, the search time can be calculated from the measured clock cycle time and the simulated set-up time of the flip-flop (5.2 ps).
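The seed-replay idea behind this measurement can be sketched in software. The LFSR polynomial used on the chip is not given in the text; the sketch assumes x^32 + x^22 + x^2 + x + 1 (a commonly listed maximal-length 32-bit choice) in Galois form, and `prefix_entry` is a hypothetical helper that keeps the top prefix bits and masks the rest.

```python
MASK32 = 0xFFFF_FFFF
# Assumed Galois feedback mask for x^32 + x^22 + x^2 + x + 1.
TAPS32 = 0x8020_0003


def lfsr32_step(state: int) -> int:
    """One step of a 32-bit Galois LFSR (right-shifting form)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS32
    return state


def random_words(seeds, count: int) -> list[int]:
    """Concatenate four 32-bit LFSR outputs into 128-bit words."""
    if len(seeds) != 4 or any((s & MASK32) == 0 for s in seeds):
        raise ValueError("need four non-zero 32-bit seeds")
    states = [s & MASK32 for s in seeds]
    words = []
    for _ in range(count):
        states = [lfsr32_step(s) for s in states]
        word = 0
        for s in states:
            word = (word << 32) | s
        words.append(word)
    return words


def prefix_entry(word: int, prefix_len: int, width: int = 128):
    """Keep the top `prefix_len` bits as cared-for bits; the rest are X.
    Returns a (value, care) pair."""
    care = ((1 << prefix_len) - 1) << (width - prefix_len)
    return word & care, care


if __name__ == "__main__":
    seeds = (1, 2, 3, 4)
    lengths = [64, 56, 48, 40, 32, 32, 24, 16]  # hypothetical prefixes
    table = [prefix_entry(w, n) for w, n in zip(random_words(seeds, 8), lengths)]
    keys = random_words(seeds, 8)  # replay with the same seeds
    # Each replayed key matches (at least) the entry built from its word.
    print(all(((k ^ v) & c) == 0 for k, (v, c) in zip(keys, table)))  # True
```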

The photograph of the test chip is shown in Fig. 17(a), and the measured waveforms are shown in Fig. 17(b). The test chip can run at a minimal clock cycle time of 3.12 ns at the typical supply voltage of 1.8 V. In this case, the search time calculated from the 3.12-ns cycle time and the 5.2-ps set-up time is 1.56 ns. The lowest working supply voltage is 1.2 V, as shown in Fig. 17(c), and the corresponding search time is 5.63 ns. The energy indexes of the TCAM macro are measured to be 1.42 and 0.63 fJ/bit/search at 1.8 V and 1.2 V, respectively.
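For orientation, the measured energy index can be converted into energy per search and average power. These are back-of-the-envelope derived values, not measurements reported here; the sketch assumes one search per clock cycle at the 3.12-ns minimum cycle time.

```python
def energy_per_search_pj(energy_index_fj: float, entries: int, width: int) -> float:
    # fJ/bit/search x number of bits in the array -> fJ per search -> pJ.
    return energy_index_fj * entries * width / 1000.0


def average_power_mw(energy_pj: float, cycle_ns: float) -> float:
    # One search per clock cycle: P = E / T, and pJ / ns = mW.
    return energy_pj / cycle_ns


if __name__ == "__main__":
    e = energy_per_search_pj(1.42, 256, 128)
    print(f"{e:.2f} pJ per search")                    # 46.53 pJ per search
    print(f"{average_power_mw(e, 3.12):.1f} mW")       # 14.9 mW
    # Energy ratio between 1.2 V and 1.8 V versus the (V ratio)^2 scaling.
    print(f"{0.63 / 1.42:.3f} vs {(1.2 / 1.8) ** 2:.3f}")  # 0.444 vs 0.444
```

The measured energy ratio between 1.2 V and 1.8 V is close to (1.2/1.8)², which is consistent with first-order CV² scaling.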

Chip features extracted from the implemented chip are listed in Table IV. We obtained eight samples from an educational program, and all of them functioned correctly. The standard deviations of the search time and the energy index of these eight chips are 0.021 ns and 0.012 fJ/bit/search, respectively. This result suggests that the proposed design techniques are robust to process variations.

Fig. 16. (a) The block diagram and (b) the timing diagrams of the test chip.

Fig. 17. (a) Photograph, (b) measurement waveforms, and (c) shmoo chart of the test chip.

TABLE IV

FEATURES SUMMARY OF THE TEST CHIP

Performance comparisons are listed in Table V. Even though the proposed TCAM cell is 11% larger than the conventional TCAM cell used in [13], the normalized area per bit of the proposed design is 60% smaller than that of the conventional high-speed pipelined design [13]. By adopting the tree AND-type match-line circuitry and the segmented search-line technique, the 256 × 128b TCAM macro achieves a search time of 1.56 ns and an energy index of 1.42 fJ/bit/search at 1.8 V. This represents a 51% improvement in the energy index compared to the TCAM design in [13]. It also represents a 38% reduction in the minimal clock cycle time and a 39% improvement in the energy index compared to the BiCAM design in [12]. Because the segmented search line (SSL) is an application-specific technique, Table V also shows the energy index of the proposed TCAM without the SSL technique. Compared to the TCAM design [13], the proposed design still shows a 25% improvement in the energy index.

TABLE V

FEATURES SUMMARY AND PERFORMANCE COMPARISON

TABLE VI

OTHER PERFORMANCE COMPARISONS

To see how speed and power are affected by the bit width and the CMOS technology, we have also implemented a 0.18-µm 1.8-V 256 × 144b TCAM macro and a 0.13-µm 1.2-V 256 × 128b TCAM macro. Table VI summarizes the design features. To realize a 144b match line, a four-input PF-CDPD AND gate is added at the end of each branch of the 2-level tree AND-type match line [refer to Fig. 3(c)]. Compared to the 128b-wide TCAM macro, the search delay and the energy index of the 144b-wide TCAM match line increase by 12% and 23%, respectively. On the other hand, comparing the 0.13-µm 1.2-V 256 × 128b TCAM design to the 0.18-µm 1.8-V 256 × 128b TCAM design, the search delay and the energy index improve by 29% and 75%, respectively. These results indicate the benefits of technology scaling.

VI. CONCLUSION

In this work, the tree AND-type match-line scheme is proposed for its high search speed, and the segmented search-line scheme for its high energy efficiency in the TCAM-based application of IP address lookup. The design of the TCAM cell, the interconnections among TCAM cells, the TCAM cell layout, and the segmentation entries are also described. The realized 1.8-V 0.18-µm 256 × 128b TCAM macro achieves a search time of 1.56 ns with an energy of 1.42 fJ/bit/search.

ACKNOWLEDGMENT

The authors thank the Chip Implementation Center for supporting the chip fabrication.

REFERENCES

[1] K.-J. Lin and C.-W. Wu, "A low-power CAM design for LZ data compression," IEEE Trans. Comput., vol. 49, pp. 1139–1145, Oct. 2000.

[2] F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit rate packet pattern-matching using TCAM," in Proc. IEEE ICNP, 2004, pp. 174–183.

[3] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536–544, Apr. 2000.

[4] R. Sangireddy and A. K. Somani, "High-speed IP routing with binary decision diagrams based hardware address lookup engine," IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 513–521, May 2003.

[5] T. Hayashi and T. Miyazaki, "High-speed table lookup engine for IPv6 longest prefix match," in Proc. GLOBECOM'99, 1999, vol. 2, pp. 1586–1571.

[6] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. GLOBECOM 2001, vol. 3, pp. 1877–1881.

[7] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.

[8] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.

[9] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.

[10] F. Shafai, K. J. Schultz, G. F. R. Gibson, A. G. Bluschke, and D. E. Somppi, "Fully parallel 30-MHz, 2.5-Mb CAM," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1690–1696, Nov. 1998.

[11] S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2 ns search time, hybrid type TCAM architecture," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 254–260, Jan. 2005.

[12] Li Hung-Yu, C.-C. Chen, J.-S. Wang, and C. Yeh, "An AND-type match-line scheme for high-performance energy-efficient content addressable memories," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.

[13] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.

[14] C.-C. Chen, Li Hung-Yu, and J.-S. Wang, "The split-path AND-type match-line scheme for very high-speed content addressable memories," in Proc. Asian Solid-State Circuits Conf., 2005, pp. 525–528.

[15] J.-S. Wang, C.-C. Wang, and C. Yeh, "TCAM for IP-address lookup using tree-style AND-type match lines and segmented search lines," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2006, pp. 577–586.

[16] TSMC 0.18um mixed signal 1P6M + MIM salicide 1.8 V/3.3 V process documents, Taiwan Semiconductor Manufacturing Co., Ltd., T-018-MM-TM-002.

[17] BGP Table Statistics. 2006 [Online]. Available: http://bgp.potaroo.net & http://bgpview.6test.edu.cn

Chao-Ching Wang was born in Taiwan, R.O.C., in 1981. He received the B.S. degree in electrical engineering from National Chung Cheng University, Taiwan, in 2003. He is currently working toward the Ph.D. degree at the Institute of Electrical Engineering, National Chung Cheng University.

His research interests include high-speed, low-leakage, and low-power memory designs.

Jinn-Shyan Wang (S'85–M'88) was born in Taiwan, R.O.C., in 1959. He received the B.S. degree in electrical engineering from National Cheng-Kung University, Tainan, Taiwan, in 1982, and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively.

He was with the Industrial Technology Research Institute (ITRI) from 1988 to 1995, engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures.

Chingwei Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1986, and the Ph.D. degree in electrical and computer engineering from the University of California at San Diego in 1992.

Since then, he has been a faculty member with the Electrical Engineering Department, National Chung-Cheng University, Taiwan. His research interests include digital VLSI design and CAD.
