A Methodology for Evaluating Runtime Support in Network Processors
Department of Electrical and Computer Engineering
University of Massachusetts, Amherst
Xin Huang and Tilman Wolf
{xhuang,wolf}@ecs.umass.edu
Runtime Support in Network Processor
Network processor (NP)
• Multi-core system-on-chip
• Programmability & high packet processing rate
Heterogeneous resources
• Control processors
• Multiple packet processors
• Co-processors
• Memory hierarchy
• Interconnection
Runtime support
• Dynamic task allocation
[Figure: IXP 2800 block diagram with receive and transmit units, scratchpad, hash unit, 16 microengines (μE), SRAM and DRAM interfaces, and an XScale control processor]
General Operation of Runtime Support in NP
Input
• Hardware resources
• Workload
Mapping method
Output
• Task allocation
Dynamic adaptation
• Different runtime support systems
• Difficult to compare
[Figure: runtime mapping takes the NP hardware resources (microengines, XScale control processor, scratchpad, hash unit, SRAM, SDRAM, flash, memory-mapped I/O) and the workload, and produces a task allocation on the processors for applications AP1, AP2, and AP3]
Contributions
Evaluation methodology
• Traffic representation
• Analytical system model based on queuing networks
• Results
Specific: 3 example runtime support systems
I. Ideal Allocation
II. Full Processor Allocation
• R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003
III. Partitioned Application Allocation
• T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005
Outline
Introduction
Evaluation Methodology
• Dynamic Workload Model
• Runtime System Model
Results
Summary
Workload
NP workload is characterized by applications and traffic
How to represent workload?
Dynamic Workload Model
Workload graph: $W = (T, U)$
• Application/Task: $T$
• Traffic: $U_t$, $R_t$
• Processing requirement: $D(t_i)$
Example:
Processing requirement:
• R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
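A minimal sketch of how such a workload graph might be held in code, under stated assumptions: tasks are nodes, edges carry a packet rate, and the processing requirement of a task is an instruction count per packet (the kind of number PacketBench reports). The class name, field names, task names, and all figures are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    """Workload graph at one point in time (hypothetical layout)."""

    # Processing requirement per task: instructions needed for one packet.
    instructions_per_packet: dict[str, float]
    # Directed edges (src_task, dst_task) -> packets per second (assumed meaning).
    edge_rate: dict[tuple[str, str], float] = field(default_factory=dict)
    # Packets per second entering the graph at each task (assumed).
    external_rate: dict[str, float] = field(default_factory=dict)

    def arrival_rate(self, task: str) -> float:
        """Total packet rate reaching a task: external input plus incoming edges."""
        incoming = sum(rate for (_, dst), rate in self.edge_rate.items() if dst == task)
        return self.external_rate.get(task, 0.0) + incoming

    def demand_mips(self, task: str) -> float:
        """Processing demand of a task in millions of instructions per second."""
        return self.arrival_rate(task) * self.instructions_per_packet[task] / 1e6


# Made-up example: all packets are classified, 80% are then forwarded.
w = Workload(
    instructions_per_packet={"classify": 500.0, "forward": 1000.0},
    edge_rate={("classify", "forward"): 80_000.0},
    external_rate={"classify": 100_000.0},
)
for name in w.instructions_per_packet:
    print(name, w.demand_mips(name))
```

Because traffic changes over time, a dynamic workload would be a sequence of such snapshots, one per time step.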
Runtime System Model
Unified approach for all runtime systems
• Queuing networks
• Specific solution for each runtime system
• Runtime mapping: $M_t : T_t \to P$
• Graph: $S = (P, Q)$
• Packet arrival rate: $\lambda_{i,j}^{t}$
• Service time: $D(t_i, p_j)$
Metrics for all runtime systems
• Processor utilization: $\rho$
• Average number of packets in the system: $K$
Three Example Runtime Support Systems
System I: Ideal Allocation
System II: Full Processor Allocation
System III: Partitioned Application Allocation
[Figure: a workload of tasks T1 and T2 mapped onto four processors. Ideal Allocation: every processor runs T1 & T2. Full Processor Allocation: each processor runs either T1 or T2. Partitioned Application Allocation: subtasks T1_1 to T1_4 and T2_1 to T2_4 are spread across the processors.]
Example Evaluation Model – System I
Ideal Allocation
• All processors can process all packets completely
• Unrealistic, but provides a baseline
M/G/m FCFS single station
M/G/m Single Station Queuing System
Cosmetatos approximation

$$W_{M/G/m} \approx c_B^2\, W_{M/M/m} + (1 - c_B^2)\, W_{M/D/m},$$

where

$$W_{M/M/m} = \frac{P_m}{m\mu(1-\rho)}; \quad P_m = \frac{(m\rho)^m}{m!\,(1-\rho)}\,\pi_0; \quad \pi_0 = \left[\sum_{k=0}^{m-1} \frac{(m\rho)^k}{k!} + \frac{(m\rho)^m}{m!}\cdot\frac{1}{1-\rho}\right]^{-1},$$

and

$$W_{M/D/m} = \frac{1}{2}\,\frac{1}{nc_{Dm}}\, W_{M/M/m}; \quad nc_{Dm} = \left(1 + (1-\rho)(m-1)\,\frac{\sqrt{4+5m}-2}{16\rho m}\right)^{-1}$$

Evaluation metrics

$$\rho = \frac{\lambda}{m\mu}; \quad K = \lambda\, W_{M/G/m} + m\rho$$

G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976
G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
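The Cosmetatos approximation can be evaluated numerically with a few lines of Python. This is a sketch following the textbook form of the approximation, not the authors' code; the function names and the sample arrival and service rates are made up for illustration. Here `lam` is the packet arrival rate, `mu` the service rate of one processor, `m` the number of processors, and `cb2` the squared coefficient of variation of the service time.

```python
import math


def mmm_wait(lam: float, mu: float, m: int) -> float:
    """Mean waiting time in an M/M/m FCFS queue."""
    rho = lam / (m * mu)
    if not 0 <= rho < 1:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    a = m * rho
    tail = a**m / (math.factorial(m) * (1 - rho))
    pi0 = 1.0 / (sum(a**k / math.factorial(k) for k in range(m)) + tail)
    p_m = tail * pi0  # probability that an arriving packet has to wait
    return p_m / (m * mu * (1 - rho))


def mgm_wait(lam: float, mu: float, m: int, cb2: float) -> float:
    """Cosmetatos approximation of the mean waiting time in an M/G/m queue."""
    rho = lam / (m * mu)
    w_mmm = mmm_wait(lam, mu, m)
    if rho == 0:
        return 0.0
    nc_dm = 1.0 / (1 + (1 - rho) * (m - 1) * (math.sqrt(4 + 5 * m) - 2) / (16 * rho * m))
    w_mdm = 0.5 * w_mmm / nc_dm
    return cb2 * w_mmm + (1 - cb2) * w_mdm


def packets_in_system(lam: float, mu: float, m: int, cb2: float) -> float:
    """K: mean number waiting (lam * W) plus mean number in service (m * rho)."""
    return lam * mgm_wait(lam, mu, m, cb2) + lam / mu


# Hypothetical numbers: 16 processors, each serving 100 packets per time unit.
print(packets_in_system(lam=1200.0, mu=100.0, m=16, cb2=0.5))
```

For a single server the sketch reduces to the exact M/M/1 result when `cb2` is 1 and to the exact M/D/1 result when `cb2` is 0, which is a convenient sanity check.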
Example Evaluation Model – System II
Full Processor Allocation
• Allocate entire tasks to subsets of processors
• Allocate as few processors as possible to save power
• Each processor runs one type of task
• Reallocation is triggered by queue length
BCMP M/M/1-FCFS model (Jackson network)
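As a rough illustration of "as few processors as possible", the sketch below picks the smallest processor count that keeps each processor below a utilization cap. The cap, the function name, and the sample rates are assumptions made for this example; the slides state only that reallocation is triggered by queue length and do not give the exact rule.

```python
import math


def min_processors(arrival_rate: float, service_rate: float,
                   max_utilization: float = 0.9) -> int:
    """Smallest m such that arrival_rate / (m * service_rate) <= max_utilization.

    max_utilization is a hypothetical tuning knob, not a value from the slides.
    """
    if not 0 < max_utilization < 1:
        raise ValueError("max_utilization must be in (0, 1)")
    if arrival_rate <= 0:
        return 0
    return max(1, math.ceil(arrival_rate / (service_rate * max_utilization)))


# Made-up task: 250k packets/s arriving, one processor handles 100k packets/s.
m = min_processors(250_000.0, 100_000.0)
rho = 250_000.0 / (m * 100_000.0)
print(m, round(rho, 3))
```

Keeping the processor count minimal drives the per-processor utilization toward the cap, and each added or removed processor changes that utilization in a large step.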
BCMP Network
BCMP: Baskett, Chandy, Muntz, and Palacios
Characteristics:
• Open, closed, and mixed queuing networks
• Several job classes
• Four types of nodes: M/M/m-FCFS (class-independent service time), M/G/1-PS, M/G/∞-IS, and M/G/1-LCFS PR
Product-form steady-state solution:

$$\pi(s_1, \ldots, s_N) = \frac{1}{G(K)}\, d(s) \prod_{i=1}^{N} f_i(s_i)$$

Open M/M/1-FCFS BCMP queuing network:

$$\pi(k_1, \ldots, k_N) = \prod_{i=1}^{N} \pi_i(k_i), \quad \pi_i(k_i) = (1 - \rho_i)\,\rho_i^{k_i}$$

• Evaluation metrics:

$$\rho_i = \sum_{r=1}^{C} \rho_{ir}, \quad \rho_{ir} = \lambda_r \frac{e_{ir}}{\mu_i}; \qquad K_i = \sum_{r=1}^{C} K_{ir}, \quad K_{ir} = \frac{\rho_{ir}}{1 - \rho_i}$$

F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248-260, April 1975
15Department of Electrical and Computer Engineering
Example Evaluation Model – System III
Partitioned Application Allocation
• Tasks are partitioned across multiple processors
• Synchronized pipelines
• Allocate tasks equally across all processors to maximize throughput
• Reallocate at fixed time intervals
Equations for evaluation metrics are the same as System II.
BCMP M/M/1-FCFS model (Jackson network)
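To see why an uneven partition hurts, a pipeline can be approximated as a tandem of M/M/1 stages, where the mean number of packets is the sum of rho / (1 - rho) over the stages. This is a simplified sketch that ignores pipeline synchronization overhead; the function name and all rates are invented for illustration.

```python
def pipeline_packets(arrival_rate: float, stage_service_rates) -> float:
    """Mean number of packets in a tandem of M/M/1 stages (Jackson network)."""
    total = 0.0
    for mu in stage_service_rates:
        rho = arrival_rate / mu
        if rho >= 1:
            raise ValueError("a pipeline stage is overloaded")
        total += rho / (1 - rho)
    return total


# Both partitions need 0.4 time units of work per packet in total.
balanced = pipeline_packets(5.0, [10.0, 10.0, 10.0, 10.0])
unbalanced = pipeline_packets(5.0, [6.25, 12.5, 12.5, 12.5])
print(balanced, unbalanced)
```

With the same total work per packet, the unbalanced partition holds roughly 6 packets on average against roughly 4 for the balanced one, because the slowest stage dominates the queuing.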
Setup
System
• 16 processing engines at 100 MIPS each
• Queue lengths are infinite
Workload
Other assumptions
• Partition applications into 7-15 subtasks
Processor Allocation Over Time
Ideal:
• 16 processors
Full Processor:
• Changes with traffic
Partitioned Application:
• 16 processors
[Figure: full processor allocation system]
Processor Utilization Over Time
Ideal:
• Lowest processor utilization
Full Processor:
• Highest processor utilization because fewer processors are used
Partitioned Application:
• Low processor utilization
• Not equal to the ideal case due to unbalanced task allocation and pipeline overhead
Packets in System Over Time
Ideal:
• Fewest packets
Full Processor:
• Packets queue up due to its high processor utilization
Partitioned Application:
• Most packets, due to unbalanced task allocation and pipeline overhead
• More stable performance because of finer processor allocation granularity
Performance for Different Data Rates
Ideal:
• Smooth increase
Full Processor:
• Periodic peaks
Partitioned Application:
• Smooth increase
The maximum data rate supported by the systems
• Ideal: 100%
• Full Processor: 79.6%
• Partitioned Application: 75.1%
Implication of the Results
Ideal Allocation
• Provides a baseline
Full Processor Allocation
• Allocates as few processors as possible to save power
• Uses an entire processor as the allocation granularity
• Good: High processor utilization
• Bad: High performance variance
Partitioned Application Allocation
• Distributes tasks equally across all processors
• Finer processor allocation granularity
• Good: Stable performance
• Bad: Difficult to find an optimized solution => pipeline synchronization overhead
Summary
Analytical methodology for evaluating different runtime support systems for NPs
Dynamic workload model and runtime system model
Results: 3 example runtime support systems
• Quantitative metrics
• Tradeoffs
Questions?