
Page 1:

Intel® IXP2XXX Network Processor Architecture Overview

John Morgan

Infrastructure Processor Division

September 2004

Page 2:

IXP2400 External Features

[Figure: dual-chip IXP2400 system. Labels: Customer ASICs; Utopia 1/2/3 or POS-PL2/3 Interface; PCI 64-bit / 66 MHz; IXP2400 (Ingress); IXP2400 (Egress); Host CPU (Optional); ATM / POS PHY or Ethernet MAC; Flash; Classification Accelerator; CoProc Bus; Micro-Engine Clusters; Slow Port; Switch Fabric Port Interface (Utopia 1, 2, 3; SPI-3 (POS-PL3); CSIX); Flow Control Bus; DDR DRAM, 2 GByte; QDR SRAM, 1.6 GBs, 64 M Byte; IXA SW.]

External Interfaces

MSF Interface supports UTOPIA 1/2/3, SPI-3 (POS-PL3), and CSIX.
  – Four independent, configurable, 8-bit channels with the ability to aggregate channels for wider interfaces.
Media interface can support channelized media on RX and a 32-bit connection to the Switch Fabric over SPI-3 on TX (and vice versa) to support the Switch Fabric option.
2 Quad Data Rate (QDR) SRAM channels.
  – A QDR SRAM channel can interface to Co-Processors.
1 DDR SDRAM channel.
PCI 64/66 Host CPU interface.
Flash and PHY Mgmt interface.
Dedicated inter-IXP channel to communicate fabric flow control information from egress to ingress for dual chip solution.

Page 3:

IXP2400

[Block diagram. Labels: Intel® XScale™ Core (32K IC, 32K DC); MEv2 1 through MEv2 8; Rbuf 64 @ 128B; Tbuf 64 @ 128B; Hash 64/48/128; Scratch 16KB; QDR SRAM 1 and QDR SRAM 2 (each with E/D Q); DDRAM; GASKET; PCI (64b) 66 MHz; SPI3 or CSIX; CSRs (Fast_wr, UART, Timers, GPIO, BootROM/Slow Port). Bus-width labels: 32b, 32b, 18, 18, 18, 18, 72, 64b.]

Page 4:

IXP2400 Resources Summary

Half Duplex OC-48 / 2.5 Gb/sec Network Processor
(8) Multi-Threaded Microengines
Intel® XScale™ Core
Media / Switch Fabric Interface
PCI interface
2 QDR SRAM interface controllers
1 DDR SDRAM interface controller
8 bit asynchronous port
  – Flash and CPU bus
Additional integrated features
  – Hardware Hash Unit
  – 16 KByte Scratchpad Memory
  – Serial UART port
  – 8 general purpose I/O pins
  – Four 32-bit timers
  – JTAG Support

Page 5:

IXP2800 External Features

[Figure: dual-chip IXP2800 system. Labels: Customer ASICs; SPI-4 or CSIX-L1; PCI 64-bit / 66 MHz; IXP2800 (Ingress); IXP2800 (Egress); Host CPU (Optional); ATM / POS PHY or Ethernet MAC; Flash; Classification Accelerator; CoProc Bus; Micro-Engine Clusters; Slow Port; Switch Fabric Port Interface (SPI-4, CSIX-L1); Flow Control Bus; RDR DRAM, 50+ Gbps, 2 Gbyte total for 3 channels; QDR SRAM, 12.8 Gbps x 4, 64 M Byte x 4 channels; IXA SW.]

External Interfaces

Media Interface supports both SPI-4 and CSIX
4 Quad Data Rate (QDR) SRAM channels
  – Each channel can interface to Co-processors
3 RDRAM Channels
PCI 64/66 Host CPU interface
Flash and PHY Management interface
Dedicated inter-IXP channel to communicate fabric flow control information from egress to ingress for dual chip solution

Page 6:

IXP2800

[Block diagram. Labels: Intel® XScale™ Core (32K IC, 32K DC); MEv2 1 through MEv2 16; Rbuf 64 @ 128B; Tbuf 64 @ 128B; Hash 48/64/128; Scratch 16KB; QDR SRAM 1 through QDR SRAM 4 (each with E/D Q and 18/18 data paths); RDRAM 1, RDRAM 2, RDRAM 3 (18 each); Stripe; GASKET; PCI (64b) 66 MHz; SPI4 or CSIX (16b, 16b); CSRs (Fast_wr, UART, Timers, GPIO, BootROM/Slow Port). Other bus-width label: 64b.]

Page 7:

IXP2800 Resources Summary

Half Duplex OC-192 / 10 Gb/sec Network Processor
(16) Multi-Threaded Microengines
Intel® XScale™ Core
Media / Switch Fabric Interface
PCI interface
4 QDR SRAM Interface Controllers
3 Rambus* DRAM Interface Controllers
8 bit asynchronous port
  – Flash and CPU bus
Additional integrated features
  – Hardware Hash Unit for generating 48-, 64-, or 128-bit adaptive polynomial hash keys
  – 16 KByte Scratchpad Memory
  – Serial UART port for debug
  – 8 general purpose I/O pins
  – Four 32-bit timers
  – JTAG Support

Page 8:

IXP2800 and IXP2400 Comparison

Feature | IXP2800 | IXP2400
Frequency | 1.4/1.0 GHz / 650 MHz | 600/400 MHz
DRAM Memory | 3 channels RDRAM 800/1066 MHz; up to 2 GB | 1 channel DDR DRAM, 150 MHz; up to 2 GB
SRAM Memory | 4 channels QDR (or co-processor) | 2 channels QDR (or co-processor)
Media Interface | Separate 16 bit Tx & Rx, configurable to SPI-4 P2 or CSIX_L1 | Separate 32 bit Tx & Rx, configurable to SPI-3, UTOPIA 3 or CSIX_L1
Number of MicroEngines | 16 (MEv2) | 8 (MEv2)
Performance | Dual chip full duplex OC192 | Dual chip full duplex OC48

Page 9:

MicroEngine v2

[Block diagram. Labels: Control Store (4K/8K instructions); two banks of 128 GPRs; Local Memory (640 words) with LM Addr 0 and LM Addr 1; 128 Next Neighbor registers; 128 S Xfer In and 128 D Xfer In (fed by S-Push Bus and D-Push Bus); 128 S Xfer Out and 128 D Xfer Out (driving S-Pull Bus and D-Pull Bus); From Next Neighbor, To Next Neighbor; 32-bit Execution Data Path (Multiply; Find first bit; Add, shift, logical) with A_Operand, B_Operand, Prev A, Prev B, A_op, B_op, and ALU_Out; CAM (TAGs 0-15, Lock 0-15, Status and LRU Logic (6-bit), Status, Entry#); CRC Unit and CRC remain; P-Random #; Timers; Timestamp; Other Local CSRs; "2 per CTX".]

Page 10:

Microengine v2 Features – Part 1

Clock Rates
  – IXP2400: 600/400 MHz
  – IXP2800: 1.4/1.0 GHz / 650 MHz
Control Store
  – IXP2400: 4K instruction store
  – IXP2800: 8K instruction store
Configurable to 4 or 8 threads
  – Each thread has its own program counter, registers, signal and wakeup events
  – Generalized Thread Signaling (15 signals per thread)
Local Storage Options
  – 256 GPRs
  – 256 Transfer Registers
  – 128 Next Neighbor Registers
  – 640 32-bit words of local memory

Page 11:

Microengine v2 Features – Part 2

CAM (Content Addressable Memory)
  – Performs parallel lookup on 16 32-bit entries
  – Reports a 9-bit lookup result
      – 4 State bits (software controlled, no impact to hardware)
      – Hit: entry number that hit; Miss: LRU entry
      – 4-bit index of CAM entry (Hit) or LRU (Miss)
  – Improves usage of multiple threads on same data
CRC hardware
  – IXP2400: provides CRC_16, CRC_32
  – IXP2800: provides CRC_16, CRC_32, iSCSI, CRC_10 and CRC_5
  – Accelerates CRC computation for ATM AAL/SAR, ATM OAM and Storage applications
Multiply hardware
  – Supports 8x24, 16x16 and 32x32
  – Accelerates metering in QoS algorithms
      – DiffServ, MPLS
Pseudo Random Number generation
  – Accelerates RED, WRED algorithms
64-bit Time-stamp and 16-bit Profile count
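
The CAM behavior listed on this slide can be illustrated with a small software model. The sketch below is only a toy: the class name `Cam`, the helper `decode`, the bit positions inside the 9-bit result, and the rule that a hit refreshes the LRU order are all assumptions made for illustration, not the hardware's documented encoding.

```python
from dataclasses import dataclass, field

NUM_ENTRIES = 16  # the slide states 16 entries of 32 bits each


@dataclass
class Cam:
    """Toy model of a 16-entry CAM with LRU tracking (hypothetical)."""
    tags: list = field(default_factory=lambda: [None] * NUM_ENTRIES)
    state: list = field(default_factory=lambda: [0] * NUM_ENTRIES)
    # lru[0] is the least recently used entry, lru[-1] the most recent.
    lru: list = field(default_factory=lambda: list(range(NUM_ENTRIES)))

    def lookup(self, tag):
        """Return a 9-bit result: state (4) | hit (1) | entry (4).

        The bit layout is an assumption for this sketch.
        """
        tag &= 0xFFFFFFFF
        if tag in self.tags:  # hardware compares all entries in parallel
            entry = self.tags.index(tag)
            self._touch(entry)  # assumed: a hit refreshes the LRU order
            return (self.state[entry] << 5) | (1 << 4) | entry
        # Miss: state bits are 0, hit bit is 0, index is the LRU entry.
        return self.lru[0]

    def write(self, entry, tag, state=0):
        """Load a tag and its software-defined state bits into an entry."""
        self.tags[entry] = tag & 0xFFFFFFFF
        self.state[entry] = state & 0xF
        self._touch(entry)

    def _touch(self, entry):
        self.lru.remove(entry)
        self.lru.append(entry)


def decode(result):
    """Split the 9-bit result into its three fields."""
    return {
        "state": (result >> 5) & 0xF,
        "hit": bool((result >> 4) & 1),
        "entry": result & 0xF,
    }
```

A typical use is for a thread to look up a flow identifier, and on a miss to load its data into the entry the CAM reported as LRU, so that later threads handling the same flow hit in the CAM instead of re-fetching.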

Page 12:

Intel® XScale™ Core Overview

High-performance, Low-power, 32-bit Embedded RISC processor
Clock rate
  – IXP2400: 600 MHz
  – IXP2800: 700/500/325 MHz
32 Kbyte instruction cache
32 Kbyte data cache
2 Kbyte mini-data cache
Write buffer
Memory management unit

Page 13:

Web Switch Design Using Network Processors – NSF Project 2002-2005

Funded by NSF and Intel – Not Intel Confidential

L. Zhao, Y. Luo, L. Bhuyan and R. Iyer, “A Network Processor-Based Content Aware Switch,” IEEE Micro, May/June 2006

Page 14:

Web Switch or Layer 5 Switch

Layer 4 switch
  – Content blind
  – Storage overhead
  – Difficult to administer
Content-aware (Layer 5/7) switch
  – Partition the server’s database over different nodes
  – Increase the performance due to improved hit rate
  – Server can be specialized for certain types of request

[Figure: a request for www.yahoo.com arrives from the Internet at the Switch, which fronts an Image Server, an Application Server, and an HTML Server. Example request: "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…". Packet layout labels: IP, TCP, APP. DATA.]
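
Content-aware dispatch of the kind described on this slide can be sketched as a lookup on the request path. The rule table, the server names, and the `/images/` prefix below are hypothetical; only the example request line comes from the slide.

```python
def parse_request(payload: bytes):
    """Extract (method, path, host) from the head of an HTTP/1.1 request."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, path, _version = lines[0].split(" ", 2)
    host = ""
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    return method, path, host


# Hypothetical routing rules: URL prefix -> specialized back-end server.
RULES = [
    ("/cgi-bin/", "application-server"),
    ("/images/", "image-server"),
]
DEFAULT_SERVER = "html-server"


def choose_server(payload: bytes) -> str:
    """Pick a back-end server by inspecting the application-layer data."""
    _method, path, _host = parse_request(payload)
    for prefix, server in RULES:
        if path.startswith(prefix):
            return server
    return DEFAULT_SERVER
```

A Layer 4 switch cannot make this decision because it only sees IP addresses and TCP ports; the path becomes visible only after the TCP connection with the client is established and the first data segment arrives.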

Page 15:

Layer-7 Two-way Mechanisms

TCP gateway
  – Application level proxy on the web switch mediates the communication between the client and the server
TCP splicing
  – Reduce the overhead in TCP gateway by forwarding directly by OS

[Figure labels: kernel, user, kernel.]

Page 16:

TCP Splicing

Establish connection with the client
  – Three-way handshake
Choose the server
Establish connection with the server
Splice two connections
Map the sequence for subsequent packets

[Figure: message timeline between Client, Switch, and Server. Client to switch: SYN(C). Switch to client: SYN(D), ACK(C+1). Client to switch: ACK(D+1), Data(C+1). Switch to server: SYN(C). Server to switch: SYN(S), ACK(C+1). Switch to server: ACK(S+1), Data(C+1). Server to client: ACK(C+len+1), Data(S+1), remapped S -> D by the switch into ACK(C+len+1), Data(D+1). Client to server: ACK(D+len+1), remapped D -> S by the switch into ACK(S+len+1).]
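
The sequence mapping in the last step can be sketched as a fixed offset between the initial sequence number D that the switch chose toward the client and the number S chosen by the server. The class and method names below are hypothetical, and a real splicer would also have to update the TCP checksum and handle options, which this sketch ignores.

```python
MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


class Splice:
    """Remap sequence/ack numbers between two spliced TCP connections."""

    def __init__(self, switch_isn: int, server_isn: int):
        # D is the ISN the switch used toward the client,
        # S is the ISN the server used toward the switch.
        self.delta = (switch_isn - server_isn) % MOD

    def server_to_client(self, seq: int, ack: int):
        """Server data is numbered from S; the client expects numbering from D."""
        return (seq + self.delta) % MOD, ack

    def client_to_server(self, seq: int, ack: int):
        """Client acks are in D space; the server expects them in S space."""
        return seq, (ack - self.delta) % MOD


# Usage with made-up initial sequence numbers.
C, D, S, length = 100, 1000, 5000, 200
splice = Splice(switch_isn=D, server_isn=S)
to_client = splice.server_to_client(seq=S + 1, ack=C + length + 1)
to_server = splice.client_to_server(seq=C + length + 1, ack=D + length + 1)
```

Because the switch reuses the client's own initial sequence number C when it opens the server connection, only one direction of numbering needs translation: sequence numbers from the server and acknowledgment numbers from the client.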

Page 17:

Partitioning the Workload

Page 18:

Latency on a Linux-based switch

Latency is reduced by TCP splicing

Page 19:

Latency using NP

[Figure: latency on the switch (ms, axis 0 to 20) versus request file size (KB: 1, 4, 16, 64, 256, 1024) for Linux Splicer and SpliceNP.]

Page 20:

Throughput

[Figure: throughput (Mbps, axis 0 to 800) versus request file size (KB: 1, 4, 16, 64, 256, 1024) for Linux Splicer and SpliceNP.]

Page 21:

NePSim: http://www.cs.ucr.edu/~yluo/nepsim/

Objectives
  – Open-source
  – Cycle-level accuracy
  – Flexibility
  – Integrated power model
  – Fast simulation speed
Challenges
  – Domain specific instruction set
  – Porting network benchmarks
  – Difficulty in debugging multithreaded programs
  – Verification of the functionality and timing

Yan Luo, Jun Yang, Laxmi Bhuyan, Li Zhao, NePSim, IEEE Micro Special Issue on NP, Sept/Oct 2004; Intel IXP Summit, Sept 2004.

Users from UCSD, Univ. of Arizona, Georgia Tech, Northwestern Univ., Tsinghua Univ. Between July 2004 and October 2006, NePSim had 3530 web page visits and 806 downloads.

Page 22:

NePSim Software Architecture

[Figure: NePSim components. Microengine (six); Memory (SRAM/SDRAM); Network Device; Debugger; Statistic; Verification.]

Page 23:

Power Model

H/W component | Model Type | Tool | Configurations
GPR per Microengine | Array | XCacti | 2 64-entry files, one read/write port per file
Control store, scratchpad | Cache w/o tag path | XCacti | 4KB, 4 byte per block, direct mapped, 10-bit address
ALU, shifter | ALU and shifter | Wattch | 32bit
… | … | … | …

Page 24:

Benchmarks

ipfwdr
  – IPv4 forwarding (header validation, IP lookup)
  – Medium SRAM access
nat
  – Network address translation
  – Medium SRAM access
url
  – Examines payload for URL pattern
  – Heavy SDRAM access
md4
  – Compute a 128-bit message “signature”
  – Heavy computation and SDRAM access

Page 25:

Verification of NePSim

[Figure: the benchmarks are run on both NePSim and the IXP1200, and the two sets of performance statistics are checked for equality (?=). Both sides show the same sample trace lines: "23990 inst.(pc=129) executed", "24008 sram req issued", "24009 ….".]

Assertion Based Verification (Linear Temporal Logic / Logic of Constraints)

X. Chen, Y. Luo, H. Hsieh, L. Bhuyan, F. Balarin, "Utilizing Formal Assertions for System Design of Network Processors," Design Automation and Test in Europe (DATE), 2004.
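
The trace comparison pictured on this slide can be approximated with a line-by-line check. The sketch below assumes each trace line has the form "cycle event text", as in the slide's sample lines; the function names are hypothetical and this is not the assertion-based method from the cited paper.

```python
def parse_trace(text: str):
    """Turn lines such as '24008 sram req issued' into (cycle, event) pairs."""
    events = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cycle, _, event = line.partition(" ")
        events.append((int(cycle), event.strip()))
    return events


def first_mismatch(sim, ref):
    """Return (index, sim_event, ref_event) at the first difference, or None."""
    for i, (a, b) in enumerate(zip(sim, ref)):
        if a != b:
            return i, a, b
    if len(sim) != len(ref):
        i = min(len(sim), len(ref))
        a = sim[i] if i < len(sim) else None
        b = ref[i] if i < len(ref) else None
        return i, a, b
    return None


# The first two lines are the slide's samples; the shifted cycle is made up.
reference = parse_trace("23990 inst.(pc=129) executed\n24008 sram req issued\n")
simulated = parse_trace("23990 inst.(pc=129) executed\n24010 sram req issued\n")
mismatch = first_mismatch(simulated, reference)
```

Reporting the first differing event, rather than only a pass or fail, points directly at the cycle where the simulator's timing starts to drift from the reference.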

Page 26:

Performance-Power Trend

Power consumption increases faster than performance

[Figure: four panels (url, ipfwdr, md4, nat), each plotting Power and Performance.]

Page 27:

Dynamic Voltage Scaling

Reduce PE voltage and frequency when PE has idle time

[Figure labels: Voltage, Frequency.]

$\text{Power} = C \cdot \alpha \cdot V^{2} \cdot f$
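
As a worked illustration of the formula above, the sketch below evaluates dynamic power and the relative saving from scaling voltage and frequency together. The function names and all numeric operating points are hypothetical and are not taken from the slides.

```python
def dynamic_power(c: float, alpha: float, v: float, f: float) -> float:
    """Dynamic power = capacitance * activity factor * voltage^2 * frequency."""
    return c * alpha * v ** 2 * f


def relative_power(v_scale: float, f_scale: float) -> float:
    """Power after scaling, as a fraction of the original (C and alpha fixed)."""
    return v_scale ** 2 * f_scale


# Made-up example: drop voltage to 90% and frequency to 80% of nominal.
fraction = relative_power(v_scale=0.9, f_scale=0.8)
saving = 1.0 - fraction
```

Because voltage enters quadratically, lowering voltage and frequency together saves more power than the frequency reduction costs in performance, which is why idle time on a PE is worth exploiting.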

Page 28:

Power Reduction with DVS

[Figure: Power Reduction and Perf. Reduction for url, ipfwdr, md4, nat, and avg.]

Yan Luo, Jun Yang, Laxmi Bhuyan, Li Zhao, NePSim: A Network Processor Simulator with Power Evaluation Framework, IEEE Micro Special Issue on Network Processors, Sept/Oct 2004

Page 29:

Power Saving by Clock Gating

Shut down unnecessary PEs, re-activate PEs when needed

Clock gating retains PE instructions

Yan Luo, Jia Yu, Jun Yang, Laxmi Bhuyan, Low Power Network Processor Design Using Clock Gating, IEEE/ACM Design Automation Conference (DAC), June 2005. Extended version to appear in ACM Trans. on Architecture and Code Optimization

Page 30:

Challenges of Clock Gating PEs

Terminating threads safely
  – Threads request memory resources
  – Stopping unfinished threads results in resource leakage
Reschedule packets to avoid “orphan” ports
  – Static thread-port mapping prohibits shutting down PEs
  – Dynamically assign packets to any waiting threads
Avoid “extra” packet loss
  – Burst packet arrival can overflow internal buffer
  – Use a small extra buffer space to handle burst

Page 31:

Experiment Results of Clock Gating

<4% reduction on system throughput

Page 32:

Main Contributions

Constructed an execution-driven multiprocessor router simulation framework, proposed a set of benchmark applications and evaluated performance

Built NePSim, the first open-source network processor simulator, ported network benchmarks and conducted performance and power evaluation

Applied dynamic voltage scaling to reduce power consumption

Used clock gating to adapt number of active PEs according to real-time traffic