

A Parallel Multiple Hashing Architecture for IP Address Lookup

Hyesook Lim* and Yeojin Jung

Information Electronics, Ewha W. University, Seoul, Korea
*E-mail: [email protected]
TEL: +82-2-3277-3403

Abstract- Address lookup is one of the main functions of Internet routers and a very important feature in evaluating router performance. As Internet traffic keeps growing and the number of routing table entries continues to increase, an efficient address-lookup mechanism is essential. In recent years, various fast address-lookup schemes have been proposed, but most of those schemes are not practical in terms of the memory size required for the routing table and the complexity required in table update. In this paper, we propose a parallel IP address lookup architecture based on multiple hashing. The proposed scheme has advantages in required memory size, the number of memory accesses, and table update. We have evaluated the performance of the proposed scheme through simulation using data from the MAE-WEST router. The simulation result shows that the proposed scheme requires a single memory access for the address lookup of each route when 203 Kbytes of memory and a few-hundred-entry TCAM are used.

Index Terms- IP address lookup, best matching prefix, parallel multiple hashing, longest prefix matching

I. INTRODUCTION

Routers have to perform IP address lookup in real time for each incoming packet in order to forward the packet toward its final destination. The Classless Inter-Domain Routing (CIDR) scheme was introduced to address the exhaustion of IP address space by allowing network aggregation. With CIDR, the IP prefixes of a routing table have arbitrary lengths. As a result, when a packet arrives, a router compares the destination IP address of the input packet with all prefixes of its routing table and determines the most specific match among the matching entries; this is called longest prefix matching (LPM). Since the prefix length is not specified in the IP address of the incoming packet, longest prefix matching is a complex operation and becomes the bottleneck of router performance.

There are several important factors in evaluating router performance related to IP address lookup. The first is the number of memory accesses, since memory accesses are the major overhead of router performance and memory speed has not improved as much as link speed. The second factor is the required memory size. As the number of networks attached to backbone routers grows exponentially, the memory required for storing the routing table increases significantly, so the number of prefixes in the routing table should be kept compact. The complexity of routing table update and the scalability toward IPv6 also have to be considered.

In this paper, we propose an IP address lookup structure which takes these factors into account. The proposed scheme combines hardware parallelism with the multiple hashing proposed in [1].

The rest of the paper is organized as follows. Section II describes issues of existing approaches in IP address lookup. Section III explains address lookup using multiple hashing, and Section IV describes the proposed scheme. In Section V, simulation results and the performance evaluation are shown. Section VI concludes the paper.

0-7803-8375-3/04/$20.00 © 2004 IEEE

II. PREVIOUS SCHEMES

Previous IP address lookup schemes can be classified as follows.

The first is the TCAM-based scheme [2]. A TCAM compares the lookup key with all entries concurrently in one memory access. However, it is more expensive than ordinary memory, has smaller storage capacity than a RAM of the same size, and consumes more power. Therefore, it is impractical to implement a routing table with several hundred thousand prefixes in TCAM. Moreover, TCAM has a scalability issue with IPv6, which has a 128-bit address space.

Secondly, numerous lookup algorithms based on the trie structure have been proposed. A trie is a tree-based data structure that organizes prefixes on a digital basis by using the bits of the prefixes to direct the branching [3]. However, several issues are related to the trie structure. Assuming W is the height of a trie, a lookup requires up to W memory accesses. A trie also has large storage requirements [3]-[8].

In order to reduce the number of memory accesses in a trie, prefix expansion has been proposed [4], and it has been used in [7]. Prefixes shorter than 24 bits are expanded to 24 bits, and the initial lookup is performed on the $2^{24}$-entry table. If the indexed entry indicates that longer prefixes may exist, an additional lookup is executed at the second memory using the pointer stored in the indexed entry of the first memory. This scheme has the advantage of at most two memory accesses. However, 32 Mbytes of memory is required to store the $2^{24}$ entries.

The scheme proposed in [8] first constructs a forwarding table with $2^{16}$ entries after expanding prefixes to 16 bits, and then builds sub-trees pointed to by each entry. If the input prefix is longer than 16 bits, the address lookup continues along the sub-tree. This scheme requires a long preprocessing time to construct the sub-trees, and hence table updating is an issue.

Finally, there are hash-based schemes. Hashing has been widely used for address lookups by exact matching. Several schemes have been proposed to apply hashing to IP address lookup [9], [10]. By constructing separate routing tables and separate hash functions for each prefix length, Lim et al. [9] suggested parallel hashing for each prefix length. Collided prefixes are stored in sub-tables, and binary search is applied to the collided prefixes. The number of memory accesses varies because of the binary search on sub-tables in their scheme. Waldvogel et al. [10] proposed to organize a routing table by prefix length and apply binary search on the prefix length. Memory access in each prefix length is performed by hashing in their scheme. The binary search requires $\log_2 W$ memory accesses in the worst case (W is the number of different prefix lengths). Moreover, the scheme requires long preprocessing to compute markers noting the existence of longer prefixes, and hence the routing table update is not trivial. The scheme also assumes that a perfect hash function can be found for a given prefix distribution, but it is known that it takes several minutes to find a perfect hash function [4]. Therefore, the application of this scheme is limited to cases where prefix information changes infrequently.
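As an illustration of the prefix expansion idea described above, the following minimal Python sketch expands a short prefix into all prefixes of a target length and computes the size of a directly indexed table. The function names are hypothetical, and the 2-byte entry width is an assumption inferred from the 32 Mbytes quoted for $2^{24}$ entries; resolving overlaps between expanded prefixes of different original lengths is not shown.

```python
def expand_prefix(value, length, target):
    """Expand one prefix into all `target`-bit prefixes it covers.

    `value` holds the top `length` bits of the prefix as an integer.
    """
    if length > target:
        raise ValueError("prefix is already longer than the target length")
    extra = target - length
    base = value << extra
    # Every combination of the `extra` low-order bits is covered.
    return [base + i for i in range(1 << extra)]


def table_bytes(index_bits, entry_bytes=2):
    """Size of a directly indexed table; entry_bytes=2 is an assumed width."""
    return (1 << index_bits) * entry_bytes


# The 3-bit prefix 101 expanded to 5 bits covers 10100 .. 10111.
print(expand_prefix(0b101, 3, 5))
# A 2^24-entry first-level table with 2-byte entries, in Mbytes.
print(table_bytes(24) // 2**20)
```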

III. ADDRESS LOOKUP USING MULTIPLE HASHING

Hashing converts a bit representation into a shorter bit representation which is used as an index into a table, and hence hashing produces collisions, in which different bit representations are converted into the same hash index. Good hash functions produce few collisions, but it takes several minutes to find a semi-perfect hash function for a given prefix distribution.

Instead of looking for a semi-perfect hash function, Broder et al. proposed multiple hashing [1]. Their analysis and simulation results show that multiple hash functions associated with multiple entries in each routing table effectively improve hashing performance. Figure 1 shows the construction of a routing table using multiple hashing. A hash key is passed through both hash functions and two hash indices are obtained. Each hash index is used as an index into its own table, and the hash key is stored in the bucket which has the smaller load. Therefore, keys are more evenly distributed across the tables. In order to apply multiple hashing to IP address lookup, Broder et al. tailored the binary search on levels presented in [10] after extending prefixes to 16, 26, and 32 bits. As a result, their scheme requires 4 Mbytes of memory to store a routing table and achieves a route lookup with at most 2 memory accesses.

Figure 1. Routing table construction using multiple hash functions
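The two-table construction of Figure 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the two hash functions are stand-ins built from `zlib.crc32` and `zlib.adler32`, the table size of 8 buckets is arbitrary, and ties are sent to the first table.

```python
import zlib

NUM_BUCKETS = 8  # illustrative table size, not a value from the paper


def h1(key: bytes) -> int:
    # Stand-in hash function 1 (assumption): CRC-32 of the key.
    return zlib.crc32(key) % NUM_BUCKETS


def h2(key: bytes) -> int:
    # Stand-in hash function 2 (assumption): Adler-32 of the key.
    return zlib.adler32(key) % NUM_BUCKETS


def insert(table1, table2, key):
    """Store key in whichever candidate bucket currently holds fewer keys."""
    b1, b2 = table1[h1(key)], table2[h2(key)]
    target = b1 if len(b1) <= len(b2) else b2  # a tie goes to table 1
    target.append(key)


def contains(table1, table2, key):
    """A key can only live in one of its two candidate buckets."""
    return key in table1[h1(key)] or key in table2[h2(key)]


table1 = [[] for _ in range(NUM_BUCKETS)]
table2 = [[] for _ in range(NUM_BUCKETS)]
keys = [bytes([10, i]) for i in range(12)]
for k in keys:
    insert(table1, table2, k)
```

A lookup only ever inspects two buckets, one per table, which is what makes the number of memory accesses predictable.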

IV. THE PROPOSED SCHEME

In this section, we propose a parallel IP address lookup architecture based on multiple hashing. The proposed scheme has multiple routing tables separated by prefix length, and address lookup in each table is performed in parallel using hashing. Hashing is implemented with a cyclic redundancy code (CRC) checker in our proposed scheme. Figure 2 shows the proposed hardware structure.

Figure 2. Proposed hardware structure

A. Parallel Lookup

As mentioned in the earlier section, LPM is a complex operation since there could be many matching prefixes with different lengths in the routing table and the longest matching prefix has to be determined. In other words, the search has to continue until the entire table is examined, since longer matching prefixes may exist in the table even after a match is found.

The proposed scheme organizes multiple routing tables separated by prefix length and stores each table in a separate memory. Hence the LPM problem is converted into an exact matching problem in each table, and as a result, parallel lookups using hashing on each table are achieved. In other words, finding a match in each prefix table is performed by a separate process, and all processes are executed in parallel. The longest match is finally determined among the matches gathered from all processes.
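The conversion of LPM into one exact match per prefix length can be illustrated with a small Python model. It is a software sketch only: Python dictionaries stand in for the per-length hash tables, the loop stands in for the parallel hardware units, and the routes and next-hop labels are made-up examples.

```python
def build_tables(routes):
    """Group routes by prefix length.

    routes: iterable of (prefix_value, prefix_len, next_hop), where
    prefix_value holds the top prefix_len bits of the prefix.
    """
    tables = {}
    for value, length, hop in routes:
        tables.setdefault(length, {})[value] = hop
    return tables


def lookup(tables, dst):
    """Exact-match dst (a 32-bit integer) in every table, keep the longest hit."""
    matches = []
    for length, table in tables.items():  # these probes run concurrently in hardware
        key = dst >> (32 - length)         # top `length` bits of the address
        if key in table:
            matches.append((length, table[key]))
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]


# Hypothetical routes: 10.0.0.0/8 -> "A", 10.1.0.0/16 -> "B".
tables = build_tables([(10, 8, "A"), ((10 << 8) | 1, 16, "B")])
print(lookup(tables, (10 << 24) | (1 << 16) | (2 << 8) | 3))  # address 10.1.2.3
```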



B. Multiple Hashing Using CRC

A hash function takes an incoming IP address as input and generates a shorter fixed-size string known as a hash index. The hash index is used as a pointer into a routing table. Good hash functions produce few collisions, which means that not many prefixes are mapped to the same hash index [11].

An important issue in using hashing for IP address lookup is how to minimize collisions. Our proposed scheme applies the multiple hashing presented by Broder et al. [1] in order to solve the collision problem and uses CRC, which is known as a semi-perfect hash function [12], as the hash function. While Broder et al. suggested a software-based scheme which searches for an appropriate hash function for a given prefix distribution, our proposed scheme is a hardware-based scheme which uses a fixed hash function and applies parallel searching in each prefix table. Additionally, we show a scheme which extracts multiple hash indices for each prefix length from a single CRC hardware and hence significantly reduces the burden of hardware implementation.

Figure 3 depicts the CRC-32 hashing hardware structure used in the proposed scheme [11]. Since the routing table in the proposed scheme is organized independently by prefix length, separate hash indices are required as indices into the routing tables of each prefix length. Hash indices are extracted from the CRC hashing hardware as follows. First, each bit of the destination IP address is serially entered into the CRC hashing hardware. After L cycles (for L = 8, 9, ..., 32), two fixed hash indices for prefix length L are extracted from register 0 to register L-1. By repeating the same procedure, hash indices for different prefix lengths are taken at different times from the CRC registers. For example, the hash index for prefix length 8 is taken from the CRC registers after 8 cycles, and one clock cycle later, the hash index for prefix length 9 is taken. All the hash indices for the routing tables are available after 32 cycles.
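The serial extraction described above can be modeled in software roughly as follows. This is a sketch under several assumptions that the text does not fix: the register starts at zero, bits are shifted in most significant bit first, the standard CRC-32 polynomial is used, and the two indices are taken as two adjacent bit fields of a fixed illustrative width (`index_bits`) from registers 0 to L-1.

```python
CRC32_POLY = 0x04C11DB7  # standard CRC-32 generator polynomial


def crc_step(reg, bit):
    """Shift one input bit into a 32-bit CRC register (MSB-first LFSR form)."""
    feedback = ((reg >> 31) & 1) ^ bit
    reg = (reg << 1) & 0xFFFFFFFF
    return reg ^ CRC32_POLY if feedback else reg


def hash_indices(dst, index_bits=4):
    """Return {L: (H1, H2)} for L = 8..32 from one serial pass over dst.

    index_bits and the way the two fields are cut are assumptions
    made for illustration only.
    """
    mask = (1 << index_bits) - 1
    reg, out = 0, {}
    for cycle in range(1, 33):
        bit = (dst >> (32 - cycle)) & 1       # feed the address MSB first
        reg = crc_step(reg, bit)
        if cycle >= 8:
            low = reg & ((1 << cycle) - 1)    # registers 0 .. L-1
            out[cycle] = (low & mask, (low >> index_bits) & mask)
    return out


# Two addresses that share their first 8 bits get the same indices for L = 8.
a, b = 0x0A010203, 0x0AFFFFFF
print(hash_indices(a)[8] == hash_indices(b)[8])
```

Because the register state after L cycles depends only on the first L address bits, the indices taken at cycle L are a function of the length-L prefix alone, which is what allows one pass to serve every prefix table.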


Figure 3. CRC-32

We analyze whether the 32 cycles needed to obtain all the hash indices prevent routers from working at line rate. Suppose that the minimum packet length is 72 bytes including the preamble and Start of Frame Delimiter (SFD), and the Inter-Frame Gap (IFG) is 12 bytes. If the forwarding engine operates at a 100 MHz clock, the time required to obtain all the hash indices is 320 ns, and hence a router with an aggregated bandwidth of up to 2.1 Gbps works at line rate. At a 200 MHz clock, a router with an aggregated bandwidth of up to 4.2 Gbps works at line rate. Beyond this rate, multiple hashing hardware units are required as shown in Figure 2.
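The line-rate figures above follow from simple arithmetic, reproduced here as a short Python check. The function name is hypothetical; the packet size, gap, and cycle count are the values stated in the text.

```python
def max_line_rate_gbps(clock_hz, cycles=32, min_packet_bytes=72, ifg_bytes=12):
    """Highest aggregate rate at which one lookup finishes per minimum-size packet slot."""
    lookup_time_s = cycles / clock_hz                  # 32 cycles at the given clock
    bits_per_slot = (min_packet_bytes + ifg_bytes) * 8  # 84 bytes = 672 bits
    return bits_per_slot / lookup_time_s / 1e9


for clock in (100e6, 200e6):
    print(f"{clock / 1e6:.0f} MHz: {max_line_rate_gbps(clock):.1f} Gbps")
```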

C. Building Forwarding Tables

We refer to the analysis of [ ] in order to determine the number of buckets and the number of entries per bucket in each table. Assuming that N prefixes are hashed into N/2 buckets using two hash indices, the analysis shows that the probability for a bucket to have two or more loads is 5.0e-7. Figure 4 shows the bucket structure, which stores a maximum of two loads.

Each bucket of the routing table consists of a field indicating the number of items and multiple fields storing loads, and each load consists of a field for the prefix and a field for the forwarding RAM pointer, as shown in Figure 4.

Figure 4. Entry structure of the forwarding table in the proposed scheme

The length of the hash index is determined according to the number of buckets. In order to store N prefixes in tables with a total capacity of 2N loads, we need two tables composed of N/2 buckets, each bucket having two loads. Because the hash index is used as an index into the routing table, the required length of the hash index is the nearest integer to $\log_2(N/2)$. The minimum length of the hash index is 2.

The forwarding table is built using the algorithm shown in Figure 5. First, a prefix enters the CRC hashing hardware bit by bit, and after L cycles (L is the prefix length), two hash indices are extracted from the CRC registers between bit 0 and bit L-1. Each hash index indicates a bucket of each table, and the new prefix is stored in the bucket which has the smaller number of loads. If the two buckets have equal loads, the prefix is placed in the bucket of the first table by default. In case of overflow, the prefix is stored in the overflow table. In the proposed scheme, two hash tables are used for each prefix length, and hence a total of 48 hash tables (prefix lengths 8-32, except prefix length 31) and an overflow table are used. Figure 6 depicts the process of constructing the forwarding table.

Figure 5. Algorithm to build forwarding table
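A minimal Python sketch of the building procedure for one prefix length is given below. It assumes two loads per bucket and uses stand-in hash functions from `zlib` rather than indices drawn from a shared CRC register; the helper names (`index_width`, `make_tables`, `insert_prefix`) are hypothetical.

```python
import math
import zlib

BUCKET_CAPACITY = 2  # loads per bucket, as in the two-load bucket structure


def index_width(n):
    """Hash index length: nearest integer to log2(n/2), with a minimum of 2 bits."""
    return max(2, round(math.log2(n / 2)))


def make_tables(num_buckets):
    """Two tables of empty buckets for one prefix length."""
    return [[[] for _ in range(num_buckets)] for _ in range(2)]


def indices(prefix_bits, num_buckets):
    # Stand-in hash functions (assumption); the scheme takes both from one CRC register.
    data = prefix_bits.encode()
    return zlib.crc32(data) % num_buckets, zlib.adler32(data) % num_buckets


def insert_prefix(tables, overflow, prefix_bits, fwd_ptr):
    """Store the prefix in the less loaded candidate bucket, else in the overflow table."""
    i1, i2 = indices(prefix_bits, len(tables[0]))
    b1, b2 = tables[0][i1], tables[1][i2]
    target = b1 if len(b1) <= len(b2) else b2    # equal loads: first table by default
    if len(target) < BUCKET_CAPACITY:
        target.append((prefix_bits, fwd_ptr))
    else:
        overflow.append((prefix_bits, fwd_ptr))  # the less loaded bucket is full, so both are


tables, overflow = make_tables(2), []
for i in range(10):                               # 10 prefixes into 8 slots forces overflow
    insert_prefix(tables, overflow, format(i, "08b"), i)
print(len(overflow))
```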

Figure 6. Construction procedure of forwarding table

D. Overflow Table

When both buckets are full and have no space to store more prefixes, a newly inserted prefix has to be stored in the overflow table. As will be shown in Table 1 of Section V, using two hash indices and 203 Kbytes of memory, the overflow rate is 0.52% of the entire prefixes. A small-sized TCAM can be used for the overflow table. The TCAM is also searched in parallel.

E. Searching Forwarding Tables

Searches in each table are executed in parallel using the hash indices obtained from the CRC hash function. As mentioned earlier, 48 hash indices are obtained from a single CRC hardware. As shown in Figure 7, the entries in each table are concurrently searched in the buckets indicated by the hash indices. Additionally, the overflow TCAM is searched in parallel. The LPM (Longest Prefix Match) is selected by the priority encoder among the matching entries from the tables and the overflow TCAM. The packet is finally forwarded to the output port pointed to by the forwarding RAM pointer of the selected entry. Figure 8 is the block diagram of the searching procedure. As shown in Figure 8, only a single memory access time is required since the lookup in each prefix length is performed in parallel.

At cycle L (for L = 8-32)
    let D[31:31-L+1] be the L bits of destination address D
    D[31:31-L+1] serially entered to CRC hash function
    Extract H1(L), H2(L) from CRC registers
Do Parallel (L = 8-32)
    table1_ptr = H1(L)
    table2_ptr = H2(L)
    If (D[31:31-L+1] = prefix(table1_ptr))
        Then fwd_ptr = fwd_ptr(table1_ptr)
    Else if (D[31:31-L+1] = prefix(table2_ptr))
        Then fwd_ptr = fwd_ptr(table2_ptr)
End Do Parallel
Search from overflow TCAM
Determine LPM among matching entries

Figure 7. Searching algorithm

Figure 8. Search procedure of forwarding table
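The search procedure can be modeled in Python as shown below. It is a sequential software sketch of what the hardware does concurrently: `demo_hash` is a made-up index function, the overflow TCAM is represented by a plain list, and the table contents are illustrative only.

```python
def search(dst, tables, overflow, hash_fn):
    """Return the forwarding pointer of the longest matching prefix, or None.

    tables:   {L: (table1, table2)}; each bucket is a list of (prefix_value, fwd_ptr)
    overflow: list of (prefix_value, prefix_len, fwd_ptr), standing in for the TCAM
    hash_fn:  hypothetical function (key, L) -> (index1, index2)
    """
    matches = []
    for L, (t1, t2) in tables.items():   # one probe per length; concurrent in hardware
        key = dst >> (32 - L)             # the L most significant bits of dst
        i1, i2 = hash_fn(key, L)
        for bucket in (t1[i1], t2[i2]):
            for prefix, fwd in bucket:
                if prefix == key:
                    matches.append((L, fwd))
    for prefix, L, fwd in overflow:       # the overflow store is searched as well
        if dst >> (32 - L) == prefix:
            matches.append((L, fwd))
    # Priority encoder: the match with the longest prefix length wins.
    return max(matches, key=lambda m: m[0])[1] if matches else None


def demo_hash(key, L):
    return key % 4, (key >> 2) % 4


def empty_table():
    return [[] for _ in range(4)]


tables = {8: (empty_table(), empty_table()), 16: (empty_table(), empty_table())}
tables[8][0][demo_hash(10, 8)[0]].append((10, "A"))            # 10.0.0.0/8 in table 1
tables[16][1][demo_hash(0x0A01, 16)[1]].append((0x0A01, "B"))  # 10.1.0.0/16 in table 2
overflow = [(0x0A0102, 24, "C")]                                # 10.1.2.0/24 in overflow

print(search(0x0A010203, tables, overflow, demo_hash))
```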

F. Update and Expansion to IPv6

Routing table update for the proposed scheme is incremental. The update process is the same as the building process. A newly added prefix is placed in the bucket with fewer loads. If both buckets are full, the newly added prefix is stored in the overflow TCAM. Prefix deletion is also incremental. After deleting the prefix from the bucket indicated by the hash index, the loads of the bucket are re-arranged so that no invalid entry remains between the valid entries, and the number of items is reduced by one. If the bucket has no matching prefix, the prefix is searched for in the overflow table and deleted. No long computation is required to build a new table, and hence fast update is achieved. Too many memories may be required for IPv6 since the proposed scheme uses a separate memory for each prefix length; this problem is expected to be solved by prefix grouping. The grouping solution is not included in this paper due to the page limitation.
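Incremental deletion as described above might look like the following Python sketch. Buckets are modeled as lists, so the item count is the list length and removing an element compacts the remaining loads; the function name and data layout are assumptions for illustration.

```python
def delete_prefix(bucket1, bucket2, overflow, prefix):
    """Remove a prefix from its candidate buckets or, failing that, the overflow table.

    Each store is a list of (prefix, fwd_ptr). Returns True if the prefix was found.
    """
    for store in (bucket1, bucket2, overflow):
        for i, (p, _) in enumerate(store):
            if p == prefix:
                del store[i]  # later loads shift down, so no invalid gap remains
                return True
    return False


b1, b2, ov = [("a", 1), ("b", 2)], [], [("c", 3)]
print(delete_prefix(b1, b2, ov, "a"), b1)
```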

V. SIMULATION RESULT AND COMPARISON

We have performed address lookup simulation for our proposed scheme using data from a snapshot of the MAE-WEST router (2002/03/15), which has 29584 prefixes. Figure 9 shows the prefix distribution of MAE-WEST. In order to find the memory efficiency rate and the overflow rate, we have performed several test cases based on the given prefix distribution. The memory efficiency rate represents the fraction of the entire table entries that are filled with prefixes.

Table 1 describes the test cases and the results according to the number of buckets and the number of hash functions. Case 1 uses a single hash function. For each prefix length (except prefix length 31), a single table is used, and hence a total of 24 tables and an overflow TCAM are used in the simulation. When N items are hashed into N/2 buckets which have 4 entries per bucket, about 203 Kbytes of memory is required, and the overflow rate is 3.4%. When two hash functions are used (Case 2) under the same condition, in which 48 tables and an overflow TCAM are used, the overflow rate is significantly reduced to 0.52% (154 entries). The reason is that prefixes are more evenly distributed by using multiple hash functions. Case 3 uses two hash functions and 3 entries per bucket instead of 2, and the overflow is completely removed. Case 4 uses N/4 buckets with three hash functions and results in a high memory efficiency rate but more overflows than Case 3. This case consumes 152 Kbytes of memory, and 136 overflows occur. Case 4 can be chosen for optimum memory usage. The test cases show that multiple hash functions improve hashing performance, and that memory efficiency is traded off against the overflow rate.

Figure 9. Prefix distribution of the MAE-WEST router (03/15/2002), by prefix length (8-32)

TABLE 1. MEMORY SIZE AND ENTRY EFFICIENCY ANALYSIS

Case  Number of Items  Number of Buckets  Number of Hash Functions  Entries/Bucket  Memory Size  Memory Efficiency  Overflow Rate
1     N                N/2                1                         4               203KB        49.85%             3.4%
2     N                N/2                2                         2               203KB        49.85%             0.52%
3     N                N/2                2                         3               303KB        33.41%             0%
4     N                N/4                3                         2               152KB        66.3%              0.46%
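The memory efficiency column can be approximated with a short calculation, shown here as a Python sketch. It assumes one table per hash function and exactly N/2 or N/4 buckets per table, so it yields idealized values (50%, 33.3%, 66.7%) that differ slightly from the reported 49.85%, 33.41%, and 66.3%, presumably because the real bucket counts per prefix length are not exact fractions of N. The overflow counts are recovered from the reported rates.

```python
N = 29584  # prefixes in the MAE-WEST snapshot


def ideal_efficiency(num_tables, bucket_fraction, entries_per_bucket):
    """Prefixes divided by total entry slots, assuming N * bucket_fraction buckets per table."""
    capacity = num_tables * (N * bucket_fraction) * entries_per_bucket
    return N / capacity


# case: (tables per prefix length, buckets as a fraction of N, entries per bucket)
cases = {1: (1, 1 / 2, 4), 2: (2, 1 / 2, 2), 3: (2, 1 / 2, 3), 4: (3, 1 / 4, 2)}
for case, args in cases.items():
    print(case, f"{ideal_efficiency(*args):.1%}")

# Overflow entries implied by the reported rates for Case 2 and Case 4.
overflow_case2 = round(0.0052 * N)
overflow_case4 = round(0.0046 * N)
print(overflow_case2, overflow_case4)
```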

We compare our proposed scheme with existing schemes in Table 2. The proposed scheme deserves close attention when memory size and the number of memory accesses are considered.

VI. CONCLUSION

We have proposed a practical and efficient hardware structure for IP address lookup. The essence of the proposed structure is to apply parallelism to multiple hashing. Prefixes are classified according to prefix length, and tables are constructed separately for each prefix length. Therefore, searches in each prefix length are performed in parallel. This also makes it possible to apply hashing to IP address lookup. The proposed scheme applies multiple hashing to improve hashing performance. The proposed scheme requires just one memory access to find the longest prefix match using a total of 203 Kbytes of memory and a small-sized TCAM. It also has excellent characteristics in routing table update and in scalability to IPv6. Our proposed structure can be easily implemented in VLSI since it has a regular and modular structure.

TABLE 2. COMPARISON WITH EXISTING SCHEMES

Address Lookup Scheme   Number of Memory Accesses (Minimum, Maximum)   Forwarding Table Size
Huang's scheme [8]      1,                                             450KB - 470KB
DIR-24-8 [7]            1, 2                                           33MB
DIR-21-3-8 [7]          1, 3                                           9MB
SFT [13]                2, 9                                           150KB - 160KB
Parallel hashing [9]    1, 5                                           189KB
Proposed Architecture   1                                              203KB + 154-entry TCAM

REFERENCES

[1] A. Broder and M. Mitzenmacher, "Using Multiple Hash Functions to Improve IP Lookups," IEEE INFOCOM, pp. 1454-1463, 2001.

[2] A. McAuley and P. Francis, "Fast routing lookup using CAMs," in Proc. IEEE INFOCOM, 1993, pp. 1382-1391.

[3] M. A. Ruiz-Sanchez, E. W. Biersack and W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Network, pp. 8-23, March/April 2001.

[4] V. Srinivasan and G. Varghese, "Fast address lookups using controlled prefix expansion," in Proc. ACM Sigmetrics'98 Conf., Madison, WI, pp. 1-11.

[5] W. N. Eatherton, "Hardware-based Internet protocol prefix lookups," M.S. thesis, Washington Univ., St. Louis, MO, 1998. [Online]. Available at wwv.arl.wstl.edu.

[6] David E. Taylor, Jonathan S. Turner, John W. Lockwood, Todd S. Sproull and David B. Parlour, "Scalable IP Lookup for Internet routers," IEEE Journal on Selected Areas in Communications, Vol. 21, No. 4, pp. 522-533, May 2003.

[7] N. McKeown, P. Gupta and S. Lin, "Routing lookups in hardware at memory access speeds," in Proc. IEEE INFOCOM'98 Conf., pp. 1240-1247.

[8] Nen-Fu Huang and Shi-Ming Zhao, "A Novel IP Routing Lookup Scheme and Hardware Architecture for Multigigabit Switching Routers," IEEE Journal on Selected Areas in Communications, Vol. 17, No. 6, pp. 1093-1104, June 1999.

[9] Hyesook Lim, Ji-Hyun Seo and Yeo-Jin Jung, "High Speed IP Address Lookup Architecture Using Hashing," IEEE Communications Letters, Vol. 7, No. 10, pp. 502-504, Oct. 2003.

[10] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable high speed IP routing lookups," in Proc. ACM SIGCOMM'97 Conf., Cannes, France, pp. 25-35.

[11] Rich Seifert, The Switch Book, Wiley, 2000.

[12] Raj Jain, "Comparison of Hashing Schemes for Address Lookup in Computer Networks," IEEE Transactions on Communications, Vol. 40, No. 10, pp. 1570-1573, Oct. 1992.

[13] M. Degermark, A. Brodnik, S. Carlsson, S. Pink, "Small Forwarding Tables for Fast Routing Lookups," Proc. ACM SIGCOMM, pp. 3-14, 1997.
