530 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 43, NO. 2, FEBRUARY 2008
High-Speed and Low-Power Design Techniques for TCAM Macros
Chao-Ching Wang, Jinn-Shyan Wang, Member, IEEE, and Chingwei Yeh
Abstract: Ternary content addressable memory (TCAM) is an important component for many applications. For TCAM-based networking systems, the rapidly growing size of routing tables brings with it the challenge of designing for higher search speed and lower power consumption. In this work, two techniques are proposed to realize high-performance and low-power TCAM for IP address lookup. One technique is the tree AND-type match-line scheme for high search speed. The other technique is the segmented search-line scheme for low power. The implemented 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a 1.56 ns search time with an energy of 1.42 fJ/bit/search.
Index Terms: Associative memories, content-addressable memory, high speed, low power, PF-CDPD, pseudo-footless, segmented search line, tree match line.
I. INTRODUCTION
THE content addressable memory (CAM) is an important component for accelerating data search operations in many applications such as database access [1], pattern matching [2], signal processing [3], and networking IP address lookup [4]–[6]. In some applications such as IP address lookup, ternary content addressable memory (TCAM) is required to implement the masking function by storing "X" (don't care) in the TCAM cell. When "X" is stored in the TCAM cell, the cell datum is always regarded as matching the search datum, no matter what the search datum is. Fig. 1 shows a search engine realized with a TCAM. The input data are fed into the search lines through the search-line buffers and are then compared simultaneously with all the stored data in the TCAM array. Row-based data search is performed to generate matching results through match-line circuits. Previous works [7]–[14] have demonstrated that the design of the match-line circuit has a major impact on search speed and power consumption.
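The ternary matching rule described above can be sketched in a few lines of Python. This is a behavioral illustration only, not a model of the circuit: the string encoding of entries ('0', '1', 'X') and the names `cell_matches` and `search` are assumptions introduced here.

```python
def cell_matches(stored: str, search_bit: str) -> bool:
    """A ternary cell matches if it stores 'X' or equals the search bit."""
    return stored == "X" or stored == search_bit


def search(table: list[str], key: str) -> list[int]:
    """Return the indices of all rows in which every cell matches the key."""
    hits = []
    for row, entry in enumerate(table):
        if len(entry) != len(key):
            raise ValueError("entry and key widths differ")
        # A row matches only if all of its cells match (a wide AND).
        if all(cell_matches(s, k) for s, k in zip(entry, key)):
            hits.append(row)
    return hits


table = ["1011", "10XX", "0XXX"]
print(search(table, "1011"))  # rows 0 and 1 match
print(search(table, "0110"))  # only row 2 matches
```

In hardware, all rows are compared in parallel within one search cycle; the loop here only stands in for that parallelism.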
It is generally recognized that NOR-type match-line circuits [7]–[9] achieve high search speed but at the expense of high power consumption, while NAND-type match-line circuits [10], [11] are power efficient with the penalty of low speed. Recently, an AND-type match-line circuit [12] constructed with the pseudo-footless clock-and-data pre-charged dynamic (PF-CDPD) logic was proposed to achieve not only high speed but also low power.
Manuscript received February 24, 2007; revised August 27, 2007. This work was supported by the National Science Council, the Ministry of Economic Affairs, and the National Si-Soft Project of Taiwan.
The authors are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSSC.2007.914330
Fig. 1. A search engine realized by a TCAM.
The works in [9]–[12] show that the power consumption of match-line circuits has been greatly reduced by advances in match-line circuit techniques. The work in [13], which targets the IP address lookup application, used low-swing search-line circuits for further power reduction and added the pipelining technique to the NOR-type match-line circuit [9] to enhance throughput. This indeed increases the throughput, but the area overhead resulting from the flip-flops and the clock driver for pipelining makes this design not cost effective, and the extra power consumption required for these added components nullifies any power saving from the new search-line circuits. On the other hand, the research in [14] added the non-pipelined split-path technique on top of the AND-type match-line circuit [12] to achieve over 50% search speed improvement compared to the pipelined NOR-type match-line scheme [13]. However, compared to the original AND-type match-line design [12], the power efficiency of the split-path AND-type match-line scheme is sacrificed due to a much larger clock loading.
This work proposes both high-speed and low-power design techniques [15] for TCAM macros. The speed enhancement technique is a tree AND-type match-line scheme, which can efficiently speed up search operations with only a slight sacrifice in energy efficiency due to a slightly more complex interconnection. In addition, total power consumption can be reduced through the proposed segmented search-line scheme by utilizing the specific feature of IP address lookup.
The rest of the paper is organized as follows. Section II de-
scribes the tree match-line circuitry, and Section III describes
the segmented search-line circuitry. Other design considerations
are described in Section IV. Test chip implementation and ex-
perimental results are presented in Section V. Finally, conclu-
sions are drawn in Section VI.
0018-9200/$25.00 © 2008 IEEE
Fig. 2. (a) The original cascaded AND-type match-line circuit. (b) Logic transformation.
II. TREE MATCH-LINE CIRCUITRY
In the first part of this section, we point out the problems with the split-path AND-type match-line circuit [14]. The analysis emphasizes the design concept of the proposed tree AND-type match-line circuitry, which is described in the second part of this section.
A. Problems With the Split-Path AND-Type Match-Line
An original m-stage PF-CDPD AND-type match-line circuit [12] is shown in the upper part of Fig. 2(a), while the lower part of Fig. 2(a) depicts the same circuit represented by logic symbols. Except for the first gate, all other gates perform a p-input AND function. The evolution from the cascaded AND-type match line to the split-path AND-type match line is shown in Fig. 2(b). There are separated p-input AND gates in the split-path AND-type match line. It was shown [14] that a 23.33% speed gain (delay reduced from 2.1 ns to 1.61 ns) is obtained by the logic transformation, mainly due to a much simpler critical-path circuitry.
However, the speed enhancement comes at a cost. For a 256 × 128b BiCAM macro designed in a 0.18 μm CMOS technology [16], the energy efficiency deteriorates substantially, from 2.33 fJ/bit/search to 4.83 fJ/bit/search [14]. Our analysis indicates that the energy efficiency deterioration has three causes. First, the clock driver needs to be enlarged because all the separated gates need to be triggered by the clock signal, resulting in increased power consumption of the clock driver. Second, all separated p-input AND gates are evaluated independently. This means that the evaluation does not depend on the evaluation results of any other gate, and the switching activity of these gates will be higher than that of the p-input AND gates in the cascaded design. Third, the number of logic gates is increased, and hence the interconnections among the logic gates and the parasitic capacitance are correspondingly increased.
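As a quick check of the trade-off quoted above, the following Python sketch recomputes the relative changes from the delay and energy figures cited from [14]. The helper name `relative_change` is introduced here for illustration; the input numbers are the ones given in the text.

```python
def relative_change(before: float, after: float) -> float:
    """Signed change of `after` relative to `before`, as a fraction."""
    return (after - before) / before


delay_change = relative_change(2.1, 1.61)    # search delay in ns, from [14]
energy_change = relative_change(2.33, 4.83)  # energy in fJ/bit/search, from [14]

print(f"delay change:  {delay_change:+.2%}")
print(f"energy change: {energy_change:+.2%}")
```

The delay falls by about 23.33%, as stated, while the energy per bit per search rises by roughly 107%, that is, it slightly more than doubles.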
B. The Proposed Tree AND-Type Match-Line Circuitry
The basic concept behind the split-path AND-type match-line circuit is that it tries a different way to implement a big AND function originally realized by m cascaded AND gates. However, there are several ways to achieve the same goal, and three of them are shown in Fig. 3 (assuming 64b in each half plane).
Fig. 3. (a) Parallel, (b) 3-level tree, and (c) 2-level tree AND-type match lines.
The design in Fig. 3(a) uses two short parallel match lines in each half plane and merges the outputs from both planes into a 4-input AND gate to generate the final matching result. On the other hand, the designs in Fig. 3(b) and (c) adopt a 3-level and a 2-level tree match-line circuit, respectively, in each half plane, and use an 8-input and a 4-input AND gate, respectively, to generate the final matching results. The electrical behaviors, including delay time and power consumption, are used to determine the final choice. The evaluation results are described as follows.
Let us take the design of a 0.18 μm 128b TCAM match line as an example. Post-layout evaluation results of different implementations are listed in Table I. All the designs use the same TCAM cell for a fair evaluation, and the cell layout is shown in Fig. 4. The impacts of the TCAM cell design and the cell layout design will be described in Section IV.
The following are the observations from the extracted fea-
tures and parameters.
1) Both designs 1 and 2 have the deepest logic depth, but design 1 performs a more complex function in the critical path than design 2. So, design 1 has the longest search delay.
TABLE I
PERFORMANCE COMPARISONS BETWEEN DIFFERENT MATCH LINES
Fig. 4. The TCAM cell layout used for match-line evaluation.
2) Designs 3, 4, and 5 have nearly 30% improvement on
search speed compared to design 1.
3) The differences of the delay times of designs 3, 4, and 5 do
not exceed 1.5%. Therefore, the final decision can be made
based on the power consumption.
4) Compared to 233.6 of power consumption of the cas-
caded design, the split-path design has 85% more power
consumption. On the other hand, compared to the cascaded
design, the parallel design and the 3-level tree design have
about 20% more power consumption and the 2-level design
has only 9% more power consumption.
5) Therefore, we adopt the 2-level tree match-line circuitry in
the TCAM design.
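The difference in logic depth between a cascaded and a tree organization of the same wide AND can be illustrated with a small Python sketch. It is a generic logical model, not the PF-CDPD circuit, and it does not reproduce the gate widths of Fig. 3: the functions `cascaded_and` and `tree_and`, the uniform `fan_in`, and the count of 16 partial match results are all assumptions made for illustration.

```python
def cascaded_and(groups: list[bool]) -> tuple[bool, int]:
    """Chain of gates: each stage ANDs its group with the previous output."""
    result, depth = True, 0
    for g in groups:
        result = result and g
        depth += 1  # one more gate on the critical path
    return result, depth


def tree_and(groups: list[bool], fan_in: int) -> tuple[bool, int]:
    """Reduce the groups level by level with `fan_in`-input AND gates."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not groups:
        raise ValueError("at least one group is required")
    level, depth = list(groups), 0
    while len(level) > 1:
        level = [all(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


groups = [True] * 16         # 16 partial match results, all matching
print(cascaded_and(groups))  # (True, 16)
print(tree_and(groups, 4))   # (True, 2)
```

Both functions compute the same result; only the number of gates in series differs. The model says nothing about power, which in the text is what separates the tree variants from the split-path design.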
III. SEGMENTED SEARCH-LINE CIRCUITRY
To reduce power consumption, we must be aware that a TCAM macro consumes power mainly in three parts: the clock driver, the match lines, and the search lines. Owing to advances in match-line circuit techniques, the power consumption of both the match-line circuit and the clock driver has been greatly reduced. According to the data published in [9], [11], and [12], the search time and the energy index breakdown normalized to a 1.2 V 0.1 μm technology are shown in Fig. 5. We find that the search lines account for about 54%, 71%, and 82% of the total power consumption of the CAM designs in [9], [11], and [12], respectively. To reduce the power consumption of the search lines, this study proposes the segmented search-line technique for the TCAM macro in the application of IP address lookup. In the following, we first describe the attributes of TCAM for IP address lookup and then describe the design of the segmented search-line circuitry.
A. Attributes of TCAM for IP Address Lookup
In Internet Protocol version 6 (IPv6), the length of an IP address extends to 128 bits. In a routing table, the prefix region stores either 0 or 1, and the rest stores "X". The statistical prefix length distribution observed at a specific router [17] is shown in Fig. 6(a). We find that more than 90% of the IP address prefixes are shorter than 64 bits. Therefore, when the routing table is constructed with a TCAM array, a large portion of the array contains the mask bits (i.e., the "X" bits), as shown in Fig. 6(b).
B. The Proposed Segmented Search-Line Circuitry
Since the "X" cells in Fig. 6(b) do nothing but pass matching signals, they do not have to be involved in the search operation. This property, when combined with the progressive layout pattern, indicates that the search lines behind the "X" cells can be turned off to save energy. The idea leads to the segmented search-line design shown in Fig. 7. Multiple segmentation entries (SEs) are inserted into the cell array. A segmentation entry contains a row of segmentation cells (SCs), and the SCs are used to control signal propagation in the search lines.
The circuit containing an SC and two TCAM cells is shown in Fig. 8. The SC is composed of a dummy cell and a path-control switch. The word line (WL) for the upper TCAM cell is also applied to the dummy cell. When writing an "X" into the upper TCAM cell, both the WBLP and WBLN lines are raised to high. In that case, the output of the dummy cell receives a "low" to cut off the signal propagation, and the upper segment of the search
Fig. 5. Search time and energy index of conventional CAM macros.
Fig. 6. (a) The prefix length distribution of IP addresses, and (b) the corresponding TCAM array.
Fig. 7. Concept of the segmented search-line scheme.
lines (SBLNu and SBLPu) will be pulled down to ground. In other words, when the ternary cell above the segmentation cell (SC) stores an "X", the segmentation cell will automatically block the search data from propagating forward and so save energy.
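The gating rule described above can be explored with a simplified counting model. The Python sketch below is an assumption-laden illustration, not the circuit: it assumes rows are ordered by non-increasing prefix length moving away from the search-line driver (the progressive pattern), that each SC looks at the first cell of the segment behind it, and that the number of driven cell positions is a rough proxy for switched search-line capacitance. The function `driven_cells` and its parameters are hypothetical names, and the overhead of the SCs themselves is ignored.

```python
def driven_cells(prefix_lens: list[int], width: int, se_rows: list[int]) -> int:
    """Count cell positions whose search-line segment is driven.

    prefix_lens: prefix length of each row, listed from the search-line
                 driver outward, in non-increasing order.
    se_rows:     row indices in front of which a segmentation entry sits.
    """
    if any(a < b for a, b in zip(prefix_lens, prefix_lens[1:])):
        raise ValueError("rows must be sorted by non-increasing prefix length")
    if any(not 0 < r < len(prefix_lens) for r in se_rows):
        raise ValueError("segmentation entries must lie inside the array")
    bounds = [0] + sorted(se_rows) + [len(prefix_lens)]
    total = 0
    for col in range(width):
        for start, end in zip(bounds, bounds[1:]):
            # A segment behind an SE is cut off when its first cell stores
            # X in this column, i.e. the row's prefix does not reach it.
            if start > 0 and prefix_lens[start] <= col:
                break
            total += end - start
    return total


rows = [8, 8, 6, 4, 4, 2, 2, 1]        # toy 8-row, 8-bit table
print(driven_cells(rows, 8, []))       # no segmentation: every cell is driven
print(driven_cells(rows, 8, [4]))      # one SE in the middle
print(driven_cells(rows, 8, [2, 4, 6]))
```

Because the rows are sorted, a segment whose first cell stores "X" in a column contains only "X" cells in that column, so cutting it off does not change any match result in this model.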
Fig. 8. The circuit showing the relationship between the SC and neighboring TCAM cells.
The number and locations of segmentation entries can be decided by the statistical features of the routing table. Once the TCAM array has been designed, segmentation entries cannot be changed for a specific embedded application. If an entry needs to be added to the look-up table, the table should be re-sorted at the system level first, and then write operations are performed
TABLE II
PERFORMANCE COMPARISON BETWEEN DIFFERENT INTERCONNECTION MANNERS
Fig. 9. The proposed TCAM cell.
to update the table. A design example and experimental results
will be described in Sections IV and V.
IV. OTHER DESIGN CONSIDERATIONS
In this section, we describe the physical design considerations for a 1.8 V 0.18 μm 256 × 128b TCAM macro, including the TCAM cell design, the interconnections among TCAM cells, the TCAM cell layout, and the design of the segmentation entries.
A. TCAM Cell Design
Fig. 9 depicts the schematic of the proposed TCAM cell for the AND-type match line. This cell uses two independent latches for storing three possible kinds of data, similar to the TCAM cell [8] used for the NOR-type match line. QN and QP are the storage nodes, and they store complementary values when the stored datum is either 0 or 1. MN and MP perform the comparison (XOR) function between (QN, QP) and (SBLN, SBLP). When the cell needs to store "X", both QN and QP should be written 0 to turn off MN and MP and thus disable the XOR operation. In the AND-type match line, transistor MC should always be turned on in this case, and MX1 and MX2 are used to charge node cell_out to a high voltage level VDDC for this purpose.
This cell is almost identical to the TCAM cell used in [12]
except that VDDC is rather than . This change
considers the charge sharing effect (CSE). We adopt the same
simulation model [Fig. 10(a)] as that used in [12] to observe the worst CSE of the proposed design. During the evaluation period, the voltage at node out should be kept sufficiently low to guarantee a correct function. However, due to the CSE, there will be a ripple or even a logic change at node out. The simulation results reflecting the relationship between the undesired ripple voltage at node out and the channel length (L) of the feedback pMOS at the typical (TT) and worst (SF) process corners are
shown in Fig. 10(b). The results indicate that if cell_out is connected to , the adjustable range of L for a maximal ripple voltage not exceeding 0.4 V is very limited. Moreover, for the same ripple voltage, say 0.15 V, the design with can use a longer L (0.5 μm) and obtain a shorter gate delay (188 ps), while the design with should use a shorter L (0.21 μm) and get a larger gate delay (379 ps).
B. Interconnections Among TCAM Cells
All the gates in the proposed 2-level tree match line [Fig. 3(c)]
should be arranged in one row in the memory array. Therefore,
it is necessary to make a long interconnection to link two
branches of the tree. The way an interconnection is made influ-
ences the amount of parasitic capacitance and in turn influences
both search speed and power consumption. We have studied
two interconnection methods for performance evaluation.
Fig. 11(a) and (b) show the conceptual and layout diagrams
of straightforward and leap-frog interconnection methods,
respectively. Post-layout simulation results are summarized in
Table II.
The simulation data in Table I are based on the leap-frog in-
terconnection. The data in Table II reveal that if a straightfor-
ward interconnection is adopted, not only will the search be de-
layed but also the power consumption will increase. This effect
is mainly because the long interconnection in the straightfor-
ward manner lies in the critical path and results in a larger RC
product.
C. TCAM Cell Layout
The evaluation results in both Tables I and II are based on the TCAM cell shown in Fig. 4. In the following, we show performance evaluations based on different cell layouts. Fig. 4 is a TCAM cell with an aspect ratio of 1.17. We designed two other cell layouts with smaller aspect ratios, as shown in Fig. 12. Table III summarizes the post-layout evaluation results for a 128b 2-level tree match line (ML).
The data in Table III show that the smaller the aspect ratio of the TCAM cell, the longer the search delay and the larger the power consumption of the match line, but the smaller the capacitance on the search lines (SL). A good tradeoff is to use the design of Fig. 12(a) because it sacrifices only 1.97% in search delay but obtains a 33% SL capacitance reduction. The overall power reduction from the 33% SL capacitance reduction will more than compensate for the 4% ML power increase.
Fig. 10. (a) Simulation model for observing the CSE and (b) simulation results.
Fig. 11. (a) Straightforward and (b) leap-frog interconnection manners.
TABLE III
IMPACT OF DIFFERENT CELL LAYOUTS
D. Design of Segmentation Entries
Although we can save more power if more SEs are used, the propagation delay of the search signal will be strongly affected by the sizing of the SC and the number of series SCs along the search line. To balance search speed and power consumption, we design the SEs with the following steps. First, assume there is only one SC along the search line, and size the transmission gate (TG) in the SC. For a 256-entry TCAM macro, the TG should be sized up to 2.4 times the minimal size so that the voltage at node cell_out in the TCAM cell can be pulled up to VDDC from 0 V or pulled down to 0 V from VDDC within the first half clock cycle to guarantee a safe match operation in the second half clock cycle. We also found that, with this sizing, the operation at cell_out becomes too slow even if there are only two SCs along the search path, as shown in Fig. 13. Second, increase the number of SEs along the search
Fig. 12. (a) The second style, and (b) the third style TCAM cell layouts.
Fig. 13. Simulation waveforms under different number of segmentation entries.
line for a larger power saving, but keep the above sizing while avoiding further speed loss with the aid of floorplan design. Fig. 14 shows the final floorplan of the 256 × 128b TCAM macro with two SEs. One SE is located at one quarter and the other at one half of the search line. The write and search buffers are located at the center of the array so that they can drive the search lines in the upper and lower half arrays simultaneously. With this design, each search signal passes through only one SC although one 256b search line has two SCs.
In Fig. 14 we also show the diode used to generate VDDC.
The large diode is realized by many distributed small diodes
located at top and bottom of the cell array.
V. EXPERIMENTAL RESULTS
We implemented a 1.8 V 0.18 μm TCAM test chip to verify the proposed design techniques. The critical-path circuit of the TCAM macro is shown in Fig. 15(a). Before the TCAM can be used for searching, the TCAM array should be filled with data using the write operation. The timing waveforms for the write mode are shown in Fig. 15(b). When writing a "don't care", the corresponding mask bit will be set as 0, and the corresponding bitline enable signal and write bitlines (WBLP, WBLN) will be pulled low and high, respectively. So, both storage nodes (QP and QN) of a TCAM cell will be written a "0" and the inner node cell_out will be pulled up to the voltage level of VDDC as described earlier. When writing "1" or "0", the signal goes high and
Fig. 14. The floorplan of the 256 × 128b TCAM macro.
one of the write bitlines will be pulled low according to the input
datum.
The timing waveforms for the search operation are shown in Fig. 15(c). The signal is the internal clock signal for the match circuit, and the complementary signal of the external clock signal clk. When goes low, the match-line circuit enters the pre-charge phase and the external datum and its complement are fetched by the rising clk into the search lines through the search-line buffers. In this phase, the datum on the search lines begins to be compared with all the data previously written and stored
Fig. 15. (a) Schematic of the critical-path circuit, (b) waveforms of the write operation, and (c) waveforms of the search operation.
into the TCAM array, and the voltage at node cell_out of each
memory cell goes toward its final value. When goes high, the
match-line circuit enters the evaluation phase. Please refer to
[12] for the detailed operation of the PF-CDPD match-line cir-
cuit. All match lines are evaluated at the same time, and each
match line generates an output . will go high if
the search data matches with the stored data.
The block diagram of the test chip is shown in Fig. 16(a). The
TCAM macro contains two segmentation entries with the prefix length of each being equal to 64 bits and 32 bits, respectively.
A voltage controlled oscillator (VCO) and a divide-by-two cir-
cuit are used to generate the clock signals with a 50% duty
cycle. The clock frequency range can be adjusted from 200 MHz
to 600 MHz. A dummy clock buffer synchronizes the rising
(falling) edge of the clock clkt for the peripheral circuits, and
the falling (rising) edge of the clock for the TCAM core.
The pre-stored data and the search data are generated by four 32b linear feedback shift registers (LFSRs), and the seed for the LFSRs can be controlled to vary the data sequence. The mask-bit control circuit is used to help generate the progressive data pattern. The 8b counter is used to generate the address for the write operation. At the beginning of a measurement, the 4 × 32b LFSRs generate a random pattern, which is ANDed with the pattern output by the mask-bit control circuit to generate the progressive pattern for the look-up table. During matching operations, the random search data are generated by the LFSRs again with the same seed, and therefore each search datum matches one of the stored data. In this sense, the power is measured with one and only one match output per clock cycle. The timing diagrams of the test chip are shown in Fig. 16(b). From the timing diagrams, the search time can be calculated as , where is the measured clock cycle time, and is the simulated set-up time of the flip-flop (5.2 ps).
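The pattern-generation idea described above (four 32b LFSRs, a mask that produces the progressive pattern, and a replay with the same seed for the search data) can be sketched in Python. Everything specific here is an assumption for illustration: the tap positions (32, 22, 2, 1), the choice to clock each LFSR 32 times per word, the seeds, the representation of masked bits as zeros, and the function names. The test chip's actual LFSR polynomial and mask-bit control logic are not given in the text.

```python
MASK32 = 0xFFFFFFFF


def lfsr32_step(state: int) -> int:
    """Advance a 32-bit Fibonacci LFSR by one bit (taps 32, 22, 2, 1 assumed)."""
    bit = (state ^ (state >> 10) ^ (state >> 30) ^ (state >> 31)) & 1
    return ((state >> 1) | (bit << 31)) & MASK32


def next_word(states: list[int]) -> tuple[int, list[int]]:
    """Clock each LFSR 32 times and pack the states into one wide word."""
    new_states, word = [], 0
    for s in states:
        for _ in range(32):
            s = lfsr32_step(s)
        new_states.append(s)
        word = (word << 32) | s
    return word, new_states


def prefix_mask(prefix_len: int, width: int = 128) -> int:
    """Ones in the leading prefix_len bits, zeros in the remaining bits."""
    return ((1 << prefix_len) - 1) << (width - prefix_len)


def generate(seeds: list[int], prefix_lens: list[int]) -> list[int]:
    """Produce one masked 128-bit word per requested prefix length."""
    states, words = list(seeds), []
    for plen in prefix_lens:
        word, states = next_word(states)
        words.append(word & prefix_mask(plen))
    return words


seeds = [1, 2, 3, 4]                         # hypothetical non-zero seeds
prefix_lens = [128, 96, 64, 32]              # progressive (sorted) pattern
stored = generate(seeds, prefix_lens)        # table contents
keys = generate(seeds, [128] * len(stored))  # replay with the same seeds
for key, plen, entry in zip(keys, prefix_lens, stored):
    assert key & prefix_mask(plen) == entry  # each key hits its own entry
```

Replaying the generators from the same seeds is what guarantees that each search word agrees with one stored entry on that entry's prefix bits.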
The photograph of the test chip is shown in Fig. 17(a), and
the measured waveforms are shown in Fig. 17(b). The test chip
can run at a minimal clock cycle time of 3.12 ns at the typical
supply voltage of 1.8 V. In this case, the search time is calculated
as ns ps ns. The lowest working supply voltage is
1.2 V as shown in Fig. 17(c), and the corresponding search time
is 5.63 ns. The energy indexes of the TCAM macro are measured
to be 1.42 and 0.63 fJ/bit/search at 1.8 V and 1.2 V, respectively.
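The two measured energy indexes are consistent with dynamic energy scaling roughly with the square of the supply voltage, which the short Python sketch below checks. The quadratic scaling rule is a standard first-order assumption, not a claim made in the text, and the per-search energy and average power printed at the end are derived here under the further assumptions that the energy index is the energy per search divided by the 256 × 128 stored bits and that one search is issued per 3.12 ns cycle; they are not reported figures.

```python
ROWS, BITS = 256, 128


def scaled_energy(e_ref: float, v_ref: float, v_new: float) -> float:
    """Scale a dynamic-energy figure with the square of the supply voltage."""
    return e_ref * (v_new / v_ref) ** 2


# Measured 1.42 fJ/bit/search at 1.8 V, extrapolated to 1.2 V.
e_1v2 = scaled_energy(1.42, 1.8, 1.2)

# Derived quantities at 1.8 V (assumptions stated above).
energy_per_search_pj = 1.42 * ROWS * BITS / 1000.0  # fJ -> pJ
power_mw = energy_per_search_pj / 3.12              # pJ per ns equals mW

print(f"predicted energy index at 1.2 V: {e_1v2:.2f} fJ/bit/search")
print(f"energy per search at 1.8 V:      {energy_per_search_pj:.2f} pJ")
print(f"average power at 3.12 ns cycle:  {power_mw:.2f} mW")
```

The extrapolated value of about 0.63 fJ/bit/search agrees with the measured figure at 1.2 V.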
Chip features extracted from the implemented chip are listed in Table IV. We obtained eight samples from an educational program, and all of them functioned correctly. The standard deviation of the search time and the energy index of these eight chips
Fig. 16. (a) The block diagram and (b) the timing diagrams of the test chip.
Fig. 17. (a) Photograph, (b) measurement waveforms, and (c) shmoo chart of the test chip.
TABLE IV
FEATURES SUMMARY OF THE TEST CHIP
are 0.021 ns and 0.012 fJ/bit/search, respectively. This result implies that the proposed design techniques are robust to process variations.
Performance comparisons are given in Table V. Even though the proposed TCAM cell is 11% larger than the conventional TCAM cell used in [13], the normalized area per bit of the proposed design is 60% smaller than that of the conventional high-speed pipelined design [13]. When adopting the tree AND-type match-line circuitry and the segmented search-line technique, the 256 × 128b TCAM macro achieves a search time of 1.56 ns and an energy index of 1.42 fJ/bit/search at 1.8 V. This achievement represents a 51% improvement in the energy index compared to the TCAM design in [13]. It also represents a 38% reduction in the minimal clock cycle time and a 39% improvement in the energy index compared to the BiCAM design in [12]. Because the segmented search line (SSL) is an application-specific technique, we also show in Table V the energy index of the proposed TCAM without the SSL technique. Compared to
TABLE V
FEATURES SUMMARY AND PERFORMANCE COMPARISON
TABLE VI
OTHER PERFORMANCE COMPARISONS
the TCAM design [13], the proposed design still shows a 25%
improvement in the energy index.
To see how the speed and power are affected by the bit width and the CMOS technology, we have also implemented a 0.18 μm 1.8 V 256 × 144b TCAM macro and a 0.13 μm 1.2 V 256 × 128b TCAM macro. Table VI summarizes the design features. When realizing a 144b match line, a four-input PF-CDPD AND gate is added at the end of each branch of the 2-level tree AND-type match line [refer to Fig. 3(c)]. Compared to the 128b-wide TCAM macro, the search delay and the energy index of the 144b-wide TCAM match line increase by 12% and 23%, respectively. On the other hand, comparing the 0.13 μm 1.2 V 256 × 128b TCAM design to the 0.18 μm 1.8 V 256 × 128b TCAM design, the search delay and the energy index improve by 29% and 75%, respectively. The results indicate the benefits of technology scaling.
VI. CONCLUSION
In this work, the tree AND-type match-line scheme is proposed for its high search speed, and the segmented search-line scheme for its high energy efficiency in the TCAM-based application of IP address lookup. The design of the TCAM cell, the interconnections among TCAM cells, the TCAM cell layout, and the segmentation entries are also described. The realized 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a search time of 1.56 ns with an energy of 1.42 fJ/bit/search.
ACKNOWLEDGMENT
The authors thank the Chip Implementation Center for sup-
porting the chip fabrication.
REFERENCES
[1] K.-J. Lin and C.-W. Wu, "A low-power CAM design for LZ data compression," IEEE Trans. Comput., vol. 49, pp. 1139–1145, Oct. 2000.
[2] F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit rate packet pattern-matching using TCAM," in Proc. IEEE ICNP, 2004, pp. 174–183.
[3] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536–544, Apr. 2000.
[4] R. Sangireddy and A. K. Somani, "High-speed IP routing with binary decision diagrams based hardware address lookup engine," IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 513–521, May 2003.
[5] T. Hayashi and T. Miyazaki, "High-speed table lookup engine for IPv6 longest prefix match," in Proc. GLOBECOM'99, 1999, vol. 2, pp. 1586–1571.
[6] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. GLOBECOM 2001, vol. 3, pp. 1877–1881.
[7] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.
[8] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[9] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
[10] F. Shafai, K. J. Schultz, G. F. R. Gibson, A. G. Bluschke, and D. E. Somppi, "Fully parallel 30-MHz, 2.5-Mb CAM," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1690–1696, Nov. 1998.
[11] S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2 ns search time, hybrid type TCAM architecture," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 254–260, Jan. 2005.
[12] Li Hung-Yu, C.-C. Chen, J.-S. Wang, and C. Yeh, "An AND-type match-line scheme for high-performance energy-efficient content addressable memories," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.
[13] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[14] C.-C. Chen, Li Hung-Yu, and J.-S. Wang, "The split-path and-type match-line scheme for very high-speed content addressable memories," in Proc. Asian Solid-State Circuits Conf., 2005, pp. 525–528.
[15] J.-S. Wang, C.-C. Wang, and C. Yeh, "TCAM for IP-address lookup using tree-style and-type match lines and segmented search lines," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2006, pp. 577–586.
[16] "TSMC 0.18um mixed signal 1P6M + MIM salicide 1.8 V/3.3 V process documents," Taiwan Semiconductor Manufacturing Co., Ltd., T-018-MM-TM-002.
[17] BGP Table Statistics. 2006 [Online]. Available: http://bgp.potaroo.net & http://bgpview.6test.edu.cn
Chao-Ching Wang was born in Taiwan, R.O.C., in 1981. He received the B.S. degree in electrical engineering from National Chung Cheng University, Taiwan, in 2003. He is currently working toward the Ph.D. degree at the Institute of Electrical Engineering, National Chung Cheng University.
His research interests include high-speed, low-leakage, and low-power memory designs.
Jinn-Shyan Wang (S'85–M'88) was born in Taiwan, R.O.C., in 1959. He received the B.S. degree in electrical engineering from National Cheng-Kung University, Tainan, Taiwan, in 1982, and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively.
He was with Industrial Technology Research Institute (ITRI) from 1988 to 1995, engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures.
Chingwei Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1986, and the Ph.D. degree in electrical and computer engineering from the University of California at San Diego in 1992.
Since then, he has been a faculty member with the Electrical Engineering Department, National Chung-Cheng University, Taiwan. His research interests include digital VLSI design and CAD.