
    530 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 43, NO. 2, FEBRUARY 2008

    High-Speed and Low-Power Design Techniques for TCAM Macros

    Chao-Ching Wang, Jinn-Shyan Wang, Member, IEEE, and Chingwei Yeh

    Abstract: Ternary content addressable memory (TCAM) is an important component for many applications. For TCAM-based networking systems, the rapidly growing size of routing tables brings with it the challenge of achieving higher search speed and lower power consumption. In this work, two techniques are proposed to realize high-performance and low-power TCAM for IP address lookup. One technique is the tree AND-type match-line scheme for high search speed. The other technique is the segmented search-line scheme for low power. The implemented 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a 1.56 ns search time using 1.42 fJ/bit/search of energy.

    Index Terms: Associative memories, content-addressable memory, high speed, low power, PF-CDPD, pseudo-footless, segmented search line, tree match line.

    I. INTRODUCTION

    The content addressable memory (CAM) is an important component for accelerating data search operations in many applications such as database access [1], pattern matching [2], signal processing [3], and networking IP address lookup [4]-[6]. In some applications such as IP address lookup, ternary content addressable memory (TCAM) is required to implement the masking function by storing X (don't care) in the TCAM cell. When X is stored in the TCAM cell, the cell datum is always regarded as matching the search datum, no matter what the search datum is. Fig. 1 shows a search engine realized with a TCAM. The input data are fed into the search lines through the search-line buffers, and are then compared simultaneously with all the stored data in the TCAM array. Row-based data search is performed to generate matching results through match-line circuits. Previous works [7]-[14] have demonstrated that the design of the match-line circuit has a major impact on search speed and power consumption.
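    The masking behavior described above can be sketched in software. This is a behavioral illustration only, not the circuit: the function names and the 8-bit example table are hypothetical, and rows are compared one after another here whereas the TCAM array compares them simultaneously.

```python
from typing import List


def ternary_match(stored: str, key: str) -> bool:
    """Return True if every stored symbol equals the key bit or is 'X' (don't care)."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have equal width")
    return all(s == "X" or s == k for s, k in zip(stored, key))


def search(table: List[str], key: str) -> List[bool]:
    """One match result per row; the hardware produces these in parallel."""
    return [ternary_match(row, key) for row in table]


# Hypothetical 8-bit table: a masked entry, a fully specified entry, a short prefix.
table = ["1010XXXX", "10100000", "0XXXXXXX"]
match_lines = search(table, "10101111")
print(match_lines)  # [True, False, False]
```

    Only the first row matches: its four masked bits accept any search value, while the second row differs in the low bits and the third in the leading bit.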

    It is generally recognized that NOR-type match-line circuits [7]-[9] achieve high search speed but at the expense of high power consumption, while NAND-type match-line circuits [10], [11] are power efficient with the penalty of low speed. Recently, an AND-type match-line circuit [12] constructed with the pseudo-footless clock-and-data pre-charged dynamic (PF-CDPD) logic was proposed to achieve not only high speed but also low power.

    Manuscript received February 24, 2007; revised August 27, 2007. This work was supported by the National Science Council, the Ministry of Economic Affairs, and the National Si-Soft Project of Taiwan.

    The authors are with the Department of Electrical Engineering, National Chung Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: [email protected]).

    Digital Object Identifier 10.1109/JSSC.2007.914330

    Fig. 1. A search engine realized by a TCAM.

    The works in [9]-[12] show that the power consumption of match-line circuits has been greatly reduced by advances in match-line circuit techniques. The work in [13], which targets the IP address lookup application, used low-swing search-line circuits for further power reduction, and added the pipelining technique to the NOR-type match-line circuit [9] to enhance throughput. This indeed increases the throughput, but the area overhead resulting from the flip-flops and the clock driver for pipelining makes this design not cost effective, and the extra power consumption required for these added components nullifies any power saving from the new search-line circuits. On the other hand, the research in [14] added the non-pipelined split-path technique on top of the AND-type match-line circuit [12] to achieve over 50% search speed improvement compared to the pipelined NOR-type match-line scheme [13]. However, compared to the original AND-type match-line design [12], the power efficiency of the split-path AND-type match-line scheme is sacrificed due to a much larger clock loading.

    This work proposes both high-speed and low-power design techniques [15] for TCAM macros. The speed enhancement technique is a tree AND-type match-line scheme, which can efficiently speed up search operations with only a slight sacrifice in energy efficiency due to a slightly more complex interconnection. In addition, total power consumption can be reduced through the proposed segmented search-line scheme by utilizing a specific feature of IP address lookup.

    The rest of the paper is organized as follows. Section II describes the tree match-line circuitry, and Section III describes the segmented search-line circuitry. Other design considerations are described in Section IV. Test chip implementation and experimental results are presented in Section V. Finally, conclusions are drawn in Section VI.

    0018-9200/$25.00 2008 IEEE


    Fig. 2. (a) The original cascaded AND-type match-line circuit. (b) Logic transformation.

    II. TREE MATCH-LINE CIRCUITRY

    In the first part of this section, we point out the problems with the split-path AND-type match-line circuit [14]. The analysis motivates the design concept of the proposed tree AND-type match-line circuitry, which is described in the second part of this section.

    A. Problems With the Split-Path AND-Type Match-Line

    An original m-stage PF-CDPD AND-type match-line circuit [12] is shown in the upper part of Fig. 2(a), while the lower part of Fig. 2(a) depicts the same circuit represented by logic symbols. Except for the first gate, all other gates perform a p-input AND function. The evolution from the cascaded AND-type match line to the split-path AND-type match line is shown in Fig. 2(b). There are separated p-input AND gates in the split-path AND-type match line. It was shown [14] that a 23.33% speed gain (delay reduced from 2.1 ns to 1.61 ns) is obtained by the logic transformation, mainly due to a much simpler critical-path circuitry.

    However, the speed enhancement comes at a cost. For a 256 × 128b BiCAM macro designed in a 0.18 μm CMOS technology [16], the energy efficiency deteriorates substantially, from 2.33 fJ/bit/search to 4.83 fJ/bit/search [14]. Our analysis indicates that the deterioration has three causes. First, the clock driver needs to be enlarged because all the separated gates need to be triggered by the clock signal, which increases the power consumption of the clock driver. Second, all separated p-input AND gates are evaluated independently. This means that the evaluation does not depend on the evaluation results of any other gate, so the switching activity of these gates is higher than that of the p-input AND gates in the cascaded design. Third, the number of logic gates is increased, and hence the interconnections among the logic gates and the parasitic capacitance are correspondingly increased.
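    The trade-off quoted above can be checked with a few lines of arithmetic. The sketch below only recomputes the ratios from the four figures given in the text (2.1 ns, 1.61 ns, 2.33 and 4.83 fJ/bit/search); the helper name is hypothetical.

```python
def relative_change(before: float, after: float) -> float:
    """Signed change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Delay of the cascaded versus the split-path match line, in ns [14].
delay_gain = -relative_change(2.1, 1.61)
# Energy index of the cascaded versus the split-path design, in fJ/bit/search [14].
energy_penalty = relative_change(2.33, 4.83)

print(f"delay reduced by {delay_gain:.2%}")            # delay reduced by 23.33%
print(f"energy index increased by {energy_penalty:.1%}")  # energy index increased by 107.3%
```

    In other words, the split-path transformation buys a delay reduction of roughly a quarter at the price of more than doubling the energy per bit per search.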

    B. The Proposed Tree AND-Type Match-Line Circuitry

    The basic concept behind the split-path AND-type match-line circuit is that it tries a different way to implement a big AND function originally realized by m cascaded AND gates. However, there are several ways to achieve the same goal, and three of them are shown in Fig. 3 (assuming 64b in each half plane).

    Fig. 3. (a) Parallel, (b) 3-level tree, and (c) 2-level tree AND-type match lines.

    The design in Fig. 3(a) uses two short parallel match lines in each half plane and merges the outputs from both planes into a 4-input AND gate to generate the final matching result. On the other hand, the designs in Fig. 3(b) and (c) adopt a 3-level and a 2-level tree match-line circuit, respectively, in each half plane, and use an 8-input and a 4-input AND gate, respectively, to generate the final matching results. The electrical behaviors, including delay time and power consumption, are used to determine the final choice. The evaluation results are described as follows.

    Let us take the design of a 0.18 μm 128b TCAM match line as an example. Post-layout evaluation results of different implementations are listed in Table I. All the designs use the same TCAM cell for a fair evaluation, and the cell layout is shown in Fig. 4. The impacts of the TCAM cell design and the cell layout design will be described in Section IV.

    TABLE I
    PERFORMANCE COMPARISONS BETWEEN DIFFERENT MATCH LINES

    Fig. 4. The TCAM cell layout used for match-line evaluation.

    The following are the observations from the extracted features and parameters.

    1) Both designs 1 and 2 have the deepest logic depth, but design 1 performs a more complex function in the critical path than design 2. So, design 1 has the longest search delay.

    2) Designs 3, 4, and 5 show nearly 30% improvement in search speed compared to design 1.

    3) The delay times of designs 3, 4, and 5 differ by no more than 1.5%. Therefore, the final decision can be made based on the power consumption.

    4) Compared with the power consumption of the cascaded design (233.6), the split-path design consumes 85% more power. On the other hand, compared to the cascaded design, the parallel design and the 3-level tree design consume about 20% more power, and the 2-level design consumes only 9% more power.

    5) Therefore, we adopt the 2-level tree match-line circuitry in the TCAM design.
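    The reason a tree organization shortens the critical path while computing the same AND function can be sketched as follows. This is a purely logical model under stated assumptions: the fan-in p = 8 and the 64-bit width are chosen for illustration, the grouping does not reproduce the exact gate arrangement of Fig. 3, and the special first gate of the PF-CDPD chain is ignored.

```python
import random
from typing import List, Tuple


def cascaded_and(bits: List[bool], p: int) -> Tuple[bool, int]:
    """Chain of gates: each stage ANDs p new bits with the previous stage output.
    Returns (result, number of gate stages on the critical path)."""
    result, stages = True, 0
    for i in range(0, len(bits), p):
        result = result and all(bits[i:i + p])
        stages += 1
    return result, stages


def tree_and(bits: List[bool], p: int) -> Tuple[bool, int]:
    """Tree of p-input gates: groups are evaluated side by side, then merged."""
    level, stages = list(bits), 0
    while len(level) > 1:
        level = [all(level[i:i + p]) for i in range(0, len(level), p)]
        stages += 1
    return level[0], stages


random.seed(0)
half_plane = [random.random() < 0.95 for _ in range(64)]
chain_result, chain_depth = cascaded_and(half_plane, 8)
tree_result, tree_depth = tree_and(half_plane, 8)
print(chain_result == tree_result, chain_depth, tree_depth)  # True 8 2
```

    Both organizations return the same match result, but the chain places eight gate stages in series while the tree needs two levels, which is the kind of depth reduction the tree match line exploits.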

    III. SEGMENTED SEARCH-LINE CIRCUITRY

    To reduce power consumption, we must be aware that a TCAM macro consumes power mainly in three parts: the clock driver, the match lines, and the search lines. Due to advances in match-line circuit techniques, the power consumption of both the match-line circuit and the clock driver has been greatly reduced. According to the data published in [9], [11], and [12], the search time and the energy index breakdown normalized to a 1.2 V 0.1 μm technology are shown in Fig. 5. We find that the search lines account for about 54%, 71%, and 82% of the total power consumption of the CAM designs [9], [11], and [12], respectively. To reduce the power consumption of the search lines, this study proposes the segmented search-line technique for the TCAM macro in the application of IP address lookup. In the following, we will first describe the attributes of TCAM for IP address lookup, and then describe the design of the segmented search-line circuitry.

    A. Attributes of TCAM for IP Address Lookup

    In Internet Protocol version 6 (IPv6), the length of an IP address extends to 128 bits. In a routing table, the prefix region stores either 0 or 1, and the rest stores X. The statistical prefix-length distribution observed at a specific router [17] is shown in Fig. 6(a). We find that more than 90% of the IP address prefixes are shorter than 64 bits. Therefore, when the routing table is constructed with a TCAM array, a large portion of the array contains the mask bits (i.e., the X bits), as shown in Fig. 6(b).
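    How a routing prefix maps to a ternary word can be sketched with Python's standard `ipaddress` module. The function name and the example prefix (taken from the IPv6 documentation range) are illustrative assumptions, not part of the original design.

```python
import ipaddress


def prefix_to_ternary(prefix: str) -> str:
    """Expand an IPv6 prefix such as '2001:db8::/32' into 128 ternary symbols:
    the prefix bits are stored as 0/1 and the remaining bits as X."""
    net = ipaddress.ip_network(prefix, strict=True)
    if net.version != 6:
        raise ValueError("expected an IPv6 prefix")
    bits = format(int(net.network_address), "0128b")
    return bits[:net.prefixlen] + "X" * (128 - net.prefixlen)


word = prefix_to_ternary("2001:db8::/32")
print(word[:32])        # 00100000000000010000110110111000
print(word.count("X"))  # 96
```

    For a 32-bit prefix, three quarters of the 128 cells in the row store X, which is the property the segmented search-line scheme exploits.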

    B. The Proposed Segmented Search-Line Circuitry

    Since the X cells in Fig. 6(b) do nothing but pass matching signals, they do not have to be involved in the search operation. This property, when combined with the progressive layout pattern, indicates that the search lines behind the X cells can be turned off to save energy. The idea leads to the segmented search-line design shown in Fig. 7. Several segmentation entries (SEs) are inserted into the cell array. A segmentation entry contains a row of segmentation cells (SCs), and the SCs are used to control signal propagation in the search lines.

    Fig. 5. Search time and energy index of conventional CAM macros.

    Fig. 6. (a) The prefix length distribution of IP addresses, and (b) the corresponding TCAM array.

    Fig. 7. Concept of the segmented search-line scheme.

    The circuit containing an SC and two TCAM cells is shown in Fig. 8. The SC is composed of a dummy cell and a path-control switch. The word line (WL) for the upper TCAM cell is also applied to the dummy cell. When writing an X into the upper TCAM cell, both WBLP and WBLN lines are raised to high. In that case, the output of the dummy cell receives a low to cut off the signal propagation, and the upper segment of the search lines (SBLNu and SBLPu) will be pulled down to ground. In other words, when the ternary cell above the segmentation cell (SC) stores an X, the segmentation cell automatically blocks the search data from propagating forward and so saves energy.

    Fig. 8. The circuit showing the relationship between the SC and neighboring TCAM cells.
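    The saving that segmentation can offer may be estimated with a simple count of the cell loads each search line still drives. The sketch below is a rough behavioral model under several assumptions: rows are sorted so that longer prefixes sit closer to the search-line buffers, an SC blocks a column exactly when the cell just before it stores X, and the load of the SC itself is ignored. The function name, the 8 × 8 array, and the prefix lengths are hypothetical.

```python
from typing import List


def driven_cells(prefix_lens: List[int], width: int, se_rows: List[int]) -> int:
    """Count the TCAM cell loads that the search-line buffers actually drive.

    prefix_lens: prefix length of each row, ordered from the buffer outward and
                 sorted so that longer prefixes are closer to the buffer.
    se_rows:     row indices r such that a segmentation entry sits just after row r.
    """
    rows = len(prefix_lens)
    total = 0
    for col in range(width):
        reach = rows  # rows driven in this column if nothing blocks the signal
        for r in sorted(se_rows):
            # The cell in row r stores X in this column when its prefix ends before col.
            if prefix_lens[r] <= col:
                reach = r + 1
                break
        total += reach
    return total


lens = [8, 8, 6, 6, 4, 4, 2, 2]           # progressive pattern, 8 rows of 8 bits
print(driven_cells(lens, 8, []))          # 64
print(driven_cells(lens, 8, [3]))         # 56
```

    With one segmentation entry at the middle of this toy array, the two rightmost columns are cut off beyond row 3, so 56 of the 64 cell loads remain driven. A real estimate would weight the columns by the measured prefix-length distribution.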

    The number and locations of segmentation entries can be decided by the statistical features of the routing table. Once the TCAM array has been designed, segmentation entries cannot be changed for a specific embedded application. If an entry needs to be added to the look-up table, the table should be re-sorted at the system level first, and then write operations are performed to update the table. A design example and experimental results will be described in Sections IV and V.

    TABLE II
    PERFORMANCE COMPARISON BETWEEN DIFFERENT INTERCONNECTION MANNERS

    Fig. 9. The proposed TCAM cell.

    IV. OTHER DESIGN CONSIDERATIONS

    In this section, we describe the physical design considerations for a 1.8 V 0.18 μm 256 × 128b TCAM macro, including the TCAM cell design, the interconnections among TCAM cells, the TCAM cell layout, and the design of the segmentation entries.

    A. TCAM Cell Design

    Fig. 9 depicts the schematic of the proposed TCAM cell for the AND-type match line. This cell uses two independent latches for storing three possible kinds of data, similar to the TCAM cell [8] used for the NOR-type match line. QN and QP are the storage nodes, and they store complementary values when the stored datum is either 0 or 1. MN and MP perform the comparison (XOR) function between (QN, QP) and (SBLN, SBLP). When the cell needs to store X, both QN and QP should be written 0 to turn off MN and MP and so disable the XOR operation. In the AND-type match line, transistor MC should always be turned on in this case, and MX1 and MX2 are used to charge node cell_out to a high voltage level VDDC for this purpose.
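    The storage and comparison behavior of the cell can be summarized as a truth-table model. The sketch is logical only and rests on assumptions not stated in the text: the polarity of the (QN, QP) encoding, the convention that SBLP carries the search bit and SBLN its complement, and a cell_out that is high on a match. Voltage levels such as VDDC are abstracted to 1.

```python
# Assumed encoding of the stored symbol as (QN, QP); X uses (0, 0) as in the text.
ENCODE = {"0": (1, 0), "1": (0, 1), "X": (0, 0)}


def cell_out(stored: str, search_bit: int) -> int:
    """Logical level of node cell_out: 1 keeps MC on (match), 0 turns it off."""
    qn, qp = ENCODE[stored]
    sbln, sblp = 1 - search_bit, search_bit  # assumed search-line polarity
    if qn == 0 and qp == 0:
        # X: MN and MP are off, and MX1/MX2 hold cell_out high.
        return 1
    # 0 or 1: the conducting pass device forwards the matching search line.
    return (qn & sbln) | (qp & sblp)


for symbol in "01X":
    print(symbol, [cell_out(symbol, b) for b in (0, 1)])
```

    Under these assumptions a stored 0 or 1 passes a high level only when the search bit agrees with it, while a stored X yields a high cell_out for either search value.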

    This cell is almost identical to the TCAM cell used in [12] except that VDDC is rather than . This change accounts for the charge sharing effect (CSE). We adopt the same simulation model [Fig. 10(a)] as that used in [12] to observe the worst CSE of the proposed design. During the evaluation period, the voltage at node out should be kept sufficiently low to guarantee correct function. However, due to the CSE, there will be a ripple or even a logic change at node out. The simulation results showing the relationship between the undesired ripple voltage at node out and the channel length (L) of the feedback pMOS at typical (TT) and worst (SF) process corners are shown in Fig. 10(b). The results indicate that if cell_out is connected to , the adjustable range of L for a maximal ripple voltage not exceeding 0.4 V is very limited. Moreover, for the same ripple voltage, say 0.15 V, the design with can use a longer L (0.5 μm) and obtain a shorter gate delay (188 ps), while the design with should use a shorter L (0.21 μm) and get a larger gate delay (379 ps).

    B. Interconnections Among TCAM Cells

    All the gates in the proposed 2-level tree match line [Fig. 3(c)] should be arranged in one row in the memory array. Therefore, it is necessary to make a long interconnection to link two branches of the tree. The way an interconnection is made influences the amount of parasitic capacitance and in turn influences both search speed and power consumption. We have studied two interconnection methods for performance evaluation. Fig. 11(a) and (b) show the conceptual and layout diagrams of the straightforward and leap-frog interconnection methods, respectively. Post-layout simulation results are summarized in Table II.

    The simulation data in Table I are based on the leap-frog interconnection. The data in Table II reveal that if a straightforward interconnection is adopted, not only will the search be slower but the power consumption will also increase. This is mainly because the long interconnection in the straightforward manner lies in the critical path and results in a larger RC product.

    C. TCAM Cell Layout

    The evaluation results in both Tables I and II are based on the TCAM cell shown in Fig. 4. In the following, we show performance evaluations based on different cell layouts. Fig. 4 is a TCAM cell with an aspect ratio of 1.17. We designed two other cell layouts with smaller aspect ratios, as shown in Fig. 12. Table III summarizes the post-layout evaluation results for a 128b 2-level tree match line (ML).

    The data in Table III show that the smaller the aspect ratio of the TCAM cell, the longer the search delay and the larger the power consumption of the match line, but the smaller the capacitance on the search lines (SL). A good tradeoff is to use the design of Fig. 12(a) because it costs only 1.97% in search delay but obtains a 33% SL capacitance reduction. The overall power reduction from the 33% SL capacitance reduction will more than compensate for the 4% ML power increase.


    Fig. 10. (a) Simulation model for observing the CSE and (b) simulation results.

    Fig. 11. (a) Straightforward and (b) leap-frog interconnection manners.

    TABLE III
    IMPACT OF DIFFERENT CELL LAYOUTS

    D. Design of Segmentation Entries

    Although we can save more power if more SEs are used, the propagation delay of the search signal will be strongly affected by the sizing of the SC and the number of series SCs along the search line. To balance search speed against power consumption, we design the SEs with the following steps. First, assume there is only one SC along the search line, and size the transmission gate (TG) in the SC. For a 256-entry TCAM macro, the TG should be sized up to 2.4 times the minimal size so that the voltage at node cell_out in the TCAM cell can be pulled up to from 0 V or pulled down to 0 V from within the first half clock cycle to guarantee a safe match operation in the second half clock cycle. We also found that, with this sizing, the operation at cell_out becomes too slow even if there are only two SCs along the search path, as shown in Fig. 13. Second, increase the number of SEs along the search line for a larger power saving, but keep the above sizing while avoiding further speed loss with the aid of floorplan design. Fig. 14 shows the final floorplan of the 256 × 128b TCAM macro with two SEs. One SE is located at the quarter and the other at the half of the search line. The write and search buffers are located at the center of the array so that they can drive search lines in the upper and the lower half arrays simultaneously. With this design, each search signal will pass only one SC although one 256b search line has two SCs.

    Fig. 12. (a) The second style, and (b) the third style TCAM cell layouts.

    Fig. 13. Simulation waveforms under different number of segmentation entries.

    In Fig. 14 we also show the diode used to generate VDDC.

    The large diode is realized by many distributed small diodes

    located at top and bottom of the cell array.

    V. EXPERIMENTAL RESULTS

    We implemented a 1.8 V 0.18 μm TCAM test chip for verifying the proposed design techniques. The critical-path circuit of the TCAM macro is shown in Fig. 15(a). Before we can use the TCAM for searching purposes, the TCAM array should be filled with data using the write operation. The timing waveforms for the write mode are shown in Fig. 15(b). When writing a don't care, the corresponding mask bit will be set as 0, and the corresponding bitline enable signal and write bitlines (WBLP, WBLN) will be pulled low and high, respectively. So, both storage nodes (QP and QN) of a TCAM cell will be written a 0 and the inner node cell_out will be pulled up to the voltage level of VDDC as described earlier. When writing 1 or 0, the signal goes high and one of the write bitlines will be pulled low according to the input datum.

    Fig. 14. The floorplan of the 256 × 128b TCAM macro.

    The timing waveforms for the search operation are shown in Fig. 15(c). The signal is the internal clock signal for the match circuit, and the complementary signal of the external clock signal clk. When goes low, the match-line circuit enters the pre-charge phase and the external datum and its complement are fetched by the up-going clk into the search lines through the search-line buffers. In this phase, the datum on the search lines is compared with all the data previously written and stored into the TCAM array, and the voltage at node cell_out of each memory cell goes toward its final value. When goes high, the match-line circuit enters the evaluation phase. Please refer to [12] for the detailed operation of the PF-CDPD match-line circuit. All match lines are evaluated at the same time, and each match line generates an output . will go high if the search data match the stored data.

    Fig. 15. (a) Schematic of the critical-path circuit, (b) waveforms of the write operation, and (c) waveforms of the search operation.

    The block diagram of the test chip is shown in Fig. 16(a). The TCAM macro contains two segmentation entries with prefix lengths of 64 bits and 32 bits, respectively. A voltage controlled oscillator (VCO) and a divide-by-two circuit are used to generate the clock signals with a 50% duty cycle. The clock frequency range can be adjusted from 200 MHz to 600 MHz. A dummy clock buffer synchronizes the rising (falling) edge of the clock clkt for the peripheral circuits, and the falling (rising) edge of the clock for the TCAM core. The pre-stored data and the search data are generated by four 32b linear feedback shift registers (LFSRs), and the seed for the LFSRs can be controlled for varying the data sequence. The mask-bit control circuit is used to help generate the progressive data pattern. The 8b counter is used for generating the address for the write operation. At the beginning of measurement, the four 32b LFSRs generate a random pattern, which is ANDed with the pattern output by the mask-bit control circuit to generate the progressive pattern for the look-up table. When doing matching operations, the random search data are generated by the LFSRs again with the same seed, and therefore each search datum will match one of the stored data. In this sense, the power is measured with one and only one match output per clock cycle. The timing diagrams of the test chip are shown in Fig. 16(b). From the timing diagrams, the search time can be calculated as , where is the measured clock cycle time, and is the simulated set-up time of the flip-flop (5.2 ps).
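    The test-pattern generation described above can be mimicked in software. The sketch below is a minimal illustration under several assumptions: the feedback taps (32, 22, 2, 1) are a commonly tabulated choice and not necessarily those of the test chip, the seeds and prefix lengths are arbitrary, and masking is modeled by storing X instead of ANDing with a mask pattern.

```python
from typing import List, Sequence

MASK32 = 0xFFFFFFFF


def lfsr32(state: int) -> int:
    """One step of a 32-bit Fibonacci LFSR with assumed taps 32, 22, 2, 1."""
    bit = ((state >> 31) ^ (state >> 21) ^ (state >> 1) ^ state) & 1
    return ((state << 1) | bit) & MASK32


def words(seeds: Sequence[int], count: int) -> List[str]:
    """Concatenate four 32b LFSR outputs into 128b words, one word per step."""
    states = list(seeds)
    out = []
    for _ in range(count):
        states = [lfsr32(s) for s in states]
        out.append("".join(format(s, "032b") for s in states))
    return out


def ternary_match(stored: str, key: str) -> bool:
    return all(s == "X" or s == k for s, k in zip(stored, key))


seeds = (0x1, 0xDEADBEEF, 0x12345678, 0xCAFEBABE)  # hypothetical nonzero seeds
prefix_lens = [96, 64, 32, 16]                      # progressive pattern
table = [w[:n] + "X" * (128 - n) for w, n in zip(words(seeds, 4), prefix_lens)]
keys = words(seeds, 4)                              # replay with the same seeds
print(all(ternary_match(row, key) for row, key in zip(table, keys)))  # True
```

    Replaying the generators from the same seeds reproduces every stored word, so the i-th search key is guaranteed to match the i-th row regardless of how many of its bits were masked.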

    The photograph of the test chip is shown in Fig. 17(a), and the measured waveforms are shown in Fig. 17(b). The test chip can run at a minimal clock cycle time of 3.12 ns at the typical supply voltage of 1.8 V. In this case, the calculated search time is 1.56 ns. The lowest working supply voltage is 1.2 V as shown in Fig. 17(c), and the corresponding search time is 5.63 ns. The energy indexes of the TCAM macro are measured to be 1.42 and 0.63 fJ/bit/search at 1.8 V and 1.2 V, respectively.
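    To put the energy index into perspective, it can be converted back to energy per search and average power. The sketch assumes that the index is the energy of one search divided by the number of bits in the 256 × 128b array, and that one search is issued per 3.12 ns clock cycle; both are assumptions about the metric rather than statements from the measurement setup.

```python
ENTRIES, WIDTH = 256, 128


def energy_per_search_pj(index_fj_per_bit: float) -> float:
    """Energy of one search over the whole array, in pJ (assumed metric definition)."""
    return index_fj_per_bit * ENTRIES * WIDTH / 1000.0


e18 = energy_per_search_pj(1.42)   # at 1.8 V
e12 = energy_per_search_pj(0.63)   # at 1.2 V
p18_mw = e18 / 3.12                # pJ per ns equals mW, one search per 3.12 ns cycle

print(f"{e18:.2f} pJ/search at 1.8 V")   # 46.53 pJ/search at 1.8 V
print(f"{e12:.2f} pJ/search at 1.2 V")   # 20.64 pJ/search at 1.2 V
print(f"{p18_mw:.2f} mW at 1.8 V")       # 14.91 mW at 1.8 V
```

    Under these assumptions the macro spends about 46.5 pJ per search at 1.8 V, or roughly 15 mW when searching every cycle at the minimal cycle time.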

    Chip features extracted from the implemented chip are listed in Table IV. We obtained eight samples from an educational program, and all of them functioned correctly. The standard deviations of the search time and the energy index of these eight chips are 0.021 ns and 0.012 fJ/bit/search, respectively. This result implies that the proposed design techniques are robust to process variations.

    Fig. 16. (a) The block diagram and (b) the timing diagrams of the test chip.

    Fig. 17. (a) Photograph, (b) measurement waveforms, and (c) shmoo chart of the test chip.

    TABLE IV
    FEATURES SUMMARY OF THE TEST CHIP

    Performance comparisons are illustrated in Table V. Even though the proposed TCAM cell is 11% larger than the conventional TCAM cell used in [13], the normalized area per bit of the proposed design is 60% smaller than that of the conventional high-speed pipelined design [13]. When adopting the tree AND-type match-line circuitry and the segmented search-line technique, the 256 × 128b TCAM macro achieves a search time of 1.56 ns and an energy index of 1.42 fJ/bit/search at 1.8 V. This achievement represents a 51% improvement in the energy index as compared to the TCAM design in [13]. It also represents a 38% reduction in the minimal clock cycle time and a 39% improvement in the energy index compared to the BiCAM design in [12]. Because the segmented search line (SSL) is an application-specific technique, we also show the energy index in Table V for the proposed TCAM without using the SSL technique. Compared to the TCAM design [13], the proposed design still shows a 25% improvement in the energy index.

    TABLE V
    FEATURES SUMMARY AND PERFORMANCE COMPARISON

    TABLE VI
    OTHER PERFORMANCE COMPARISONS

    To see how the speed and power are affected by the bit width and CMOS technology, we have also implemented a 0.18 μm 1.8 V 256 × 144b TCAM macro and a 0.13 μm 1.2 V 256 × 128b TCAM macro. Table VI summarizes the design features. When realizing a 144b match line, a four-input PF-CDPD AND gate is added at the end of each branch of the 2-level tree AND-type match line (refer to Fig. 3(c)). Compared to the 128b-wide TCAM macro, the search delay and the energy index of the 144b-wide TCAM match line increase by 12% and 23%, respectively. On the other hand, comparing the 0.13 μm 1.2 V 256 × 128b TCAM design to the 0.18 μm 1.8 V 256 × 128b TCAM design, the search delay and the energy index improve by 29% and 75%, respectively. The results indicate the benefits of technology scaling.

    VI. CONCLUSION

    In this work, the tree AND-type match-line scheme is proposed for its high search speed, and the segmented search-line scheme for its high energy efficiency in the TCAM-based application of IP address lookup. The design of the TCAM cell, interconnections among TCAM cells, TCAM cell layout, and segmentation entries are also described. The realized 1.8 V 0.18 μm 256 × 128b TCAM macro achieves a search time of 1.56 ns with 1.42 fJ/bit/search energy.

    ACKNOWLEDGMENT

    The authors thank the Chip Implementation Center for supporting the chip fabrication.

    REFERENCES

    [1] K.-J. Lin and C.-W. Wu, "A low-power CAM design for LZ data compression," IEEE Trans. Comput., vol. 49, pp. 1139-1145, Oct. 2000.

    [2] F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit rate packet pattern-matching using TCAM," in Proc. IEEE ICNP, 2004, pp. 174-183.

    [3] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536-544, Apr. 2000.

    [4] R. Sangireddy and A. K. Somani, "High-speed IP routing with binary decision diagrams based hardware address lookup engine," IEEE J. Sel. Areas Commun., vol. 21, no. 5, pp. 513-521, May 2003.

    [5] T. Hayashi and T. Miyazaki, "High-speed table lookup engine for IPv6 longest prefix match," in Proc. GLOBECOM'99, 1999, vol. 2, pp. 1586-1571.

    [6] N.-F. Huang, W.-E. Chen, J.-Y. Luo, and J.-M. Chen, "Design of multi-field IPv6 packet classifiers using ternary CAMs," in Proc. GLOBECOM 2001, vol. 3, pp. 1877-1881.

    [7] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956-968, Jun. 2001.

    [8] I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155-158, Jan. 2003.

    [9] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958-1966, Nov. 2003.

    [10] F. Shafai, K. J. Schultz, G. F. R. Gibson, A. G. Bluschke, and D. E. Somppi, "Fully parallel 30-MHz, 2.5-Mb CAM," IEEE J. Solid-State Circuits, vol. 33, no. 11, pp. 1690-1696, Nov. 1998.

    [11] S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2 ns search time, hybrid type TCAM architecture," IEEE J. Solid-State Circuits, vol. 40, no. 1, pp. 254-260, Jan. 2005.

    [12] Li Hung-Yu, C.-C. Chen, J.-S. Wang, and C. Yeh, "An AND-type match-line scheme for high-performance energy-efficient content addressable memories," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108-1119, May 2006.

    [13] K. Pagiamtzis and A. Sheikholeslami, "A low-power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512-1519, Sep. 2004.

    [14] C.-C. Chen, Li Hung-Yu, and J.-S. Wang, "The split-path and-type match-line scheme for very high-speed content addressable memories," in Proc. Asian Solid-State Circuits Conf., 2005, pp. 525-528.

    [15] J.-S. Wang, C.-C. Wang, and C. Yeh, "TCAM for IP-address lookup using tree-style and-type match lines and segmented search lines," in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, 2006, pp. 577-586.

    [16] "TSMC 0.18um mixed signal 1P6M + MIM salicide 1.8 V/3.3 V process documents," Taiwan Semiconductor Manufacturing Co., Ltd., T-018-MM-TM-002.

    [17] BGP Table Statistics. 2006 [Online]. Available: http://bgp.potaroo.net & http://bgpview.6test.edu.cn

    Chao-Ching Wang was born in Taiwan, R.O.C., in 1981. He received the B.S. degree in electrical engineering from National Chung Cheng University, Taiwan, in 2003. He is currently working toward the Ph.D. degree at the Institute of Electrical Engineering, National Chung Cheng University. His research interests include high-speed, low-leakage, and low-power memory designs.

    Jinn-Shyan Wang (S'85-M'88) was born in Taiwan, R.O.C., in 1959. He received the B.S. degree in electrical engineering from National Cheng-Kung University, Tainan, Taiwan, in 1982, and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively. He was with Industrial Technology Research Institute (ITRI) from 1988 to 1995, engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures.

    Chingwei Yeh received the B.S. degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1986, and the Ph.D. degree in electrical and computer engineering from the University of California at San Diego in 1992. Since then, he has been a faculty member with the Electrical Engineering Department, National Chung-Cheng University, Taiwan. His research interests include digital VLSI design and CAD.