3 codesign chap3 mlp
-
Hardware-Software Co-partitioning for Distributed Embedded Systems
-
Outline
  1. Introduction
  2. Related Work
  3. Distributed Embedded System and System Model
  4. Multi-Level Partitioning
  5. Case Study
-
1. Introduction
  Hardware-Software Codesign
  Distributed Embedded System
  Motivation
  Task Graph
  Physical Restrictions
  Distributed Embedded System Codesign (DESC)
  Object Modeling Technique (OMT)
  Linear Hybrid Automata (LHA)
  SES Models
-
1. Introduction (cont)
  Multi-Level Partitioning
  Partitioning Algorithm
  Sharing, Clustering
  Case Studies
-
2. Related Work
  Target Embedded System
    1-CPU and 1-ASIC Topology
    n-CPU and m-ASIC Topology
  Optimal Codesign
  Heuristic Codesign
-
2. Related Work (cont)
  Codesign of 1-CPU and 1-ASIC Topology
    Kumar et al. 1993
    Kalavade and Lee 1993
    Thomas et al. 1993
    Gupta and De Micheli 1993
    Barros et al. 1994
-
2. Related Work (cont)
  Codesign of n-CPU and m-ASIC Topology
  Optimal Codesign Approaches:
    Mixed integer linear programming: Prakash and Parker 1992
    Exhaustive search: Wolf 1994, Haworth et al. 1993, D'Ambrosio and Hu 1994
-
2. Related Work (cont)
  Heuristic Codesign Approaches: Iterative and Constructive
  Iterative:
    Dick and Jha 1998: MOGAC, CORDS
    Dick and Jha 1999: MOCYN
-
2. Related Work (cont)
  Constructive:
    Wolf 1996: object-oriented
    Yen and Wolf 1996: sensitivity-driven
    Dave, Lakshminarayana, and Jha 1999: COSYN
    Dave and Jha 1999: COFTA
    Dave and Jha 1998: COHRA
  Our proposed approach: Distributed Embedded System Codesign (DESC)
-
3. Distributed Embedded Systems and System Models
  An embedded computer system is a system that uses computers but is not a general-purpose computer.
  In 1971, there were about 142,000 computers worldwide. By 1999, there were some 350 to 400 million personal computers alone, and at least an order of magnitude more embedded devices.
-
3. Distributed Embedded Systems and System Models (cont)
  There are several reasons to build a distributed hardware engine for an embedded system:
    It is cheaper
    It gives faster response times
    The devices being controlled may be physically distributed
-
3. Distributed Embedded Systems and System Models (cont)
  System Models
  Object Modeling Technique (OMT) Models
    Object Model
    Dynamic Model
    Functional Model
-
3. Distributed Embedded Systems and System Models (cont)
  Linear Hybrid Automata (LHA) Models
    Internal system model
    For verifying systems
  SES Models
    SES/workbench is a popular modeling and simulation tool for system performance evaluation
-
4. Multi-Level Partitioning
  Multi-Level Partitioning (MLP)
  Three Main Phases
    Codesign Space Exploration (CSE)
    System Structural Partitioning (SSP)
    Binary Search Copartitioning (BSC)
-
Fig. 5.1 Overall Flow Chart of Multi-Level Partitioning
[Flow chart, not reproduced.
  Levels: CSE level, SSP level, BSC level
  Boxes: Initialization; Explore Design Space (number of CPU and hardware cost); Generate Structural Partition (CPU allocation to distributed subsystems); Copartitioning; CPU Sharing; ASIC Sharing; Hardware Clustering; Software Grouping; Next structural partition; Output Heuristically Optimal Partition
  Decisions (Yes/No): Last Structural Partition?; Last Design Alternative?]
-
Detailed Flow Diagram of Multi-Level Partitioning
[Flow diagram, not reproduced. Recoverable content:]
  Inputs: OMT Models, LHA Models
  Initialization
  CSE level: Select number of CPU and hardware cost (Explore Design Space); decision "Last design alternative?" (Yes/No)
  SSP level: Allocate CPU to distributed subsystem (Generate Structural Partition); Next structural partition; decision "Last structural partition?" (Yes/No)
  BSC level (Copartitioning):
    Place all objects of hardware parts into ILA and all other objects into MLA
    Calculate the CPD ratio of each object in MLA
    Sort all MLA objects in ascending order of their CPD ratios
    Select the object with the median CPD ratio
    Use hardware to implement all objects with CPD ratios less than that of the selected median object
    Use software to implement all objects with CPD ratios not less than that of the selected median object
    Check if the partition result satisfies system constraints:
      Cost and performance constraints are satisfied
      Cost constraint is satisfied, but performance constraints are not satisfied: increase hardware objects
      Cost constraint is not satisfied, but performance constraints are satisfied: increase software objects
      No satisfactory partition
    Branch labels: Cost is more important; Performance is more important
    Check if the partition is a heuristically optimal solution (Yes: store structural partition result and perform sharing and clustering)
  Termination: "Partition found?" Yes: Output least costly partition. No: Print "No partition".
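The copartitioning loop in the flow diagram can be sketched as a binary search over the objects sorted by CPD ratio. This is a minimal sketch, not the deck's algorithm: the function name, the `evaluate` callback, and the choice to return the first partition that meets both constraints are assumptions, and the "heuristically optimal" check is left out. The CPD values in the usage lines are the ones listed in the VPMS case study.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (hardware objects, software objects)
Partition = Tuple[List[str], List[str]]


def binary_search_copartition(
    cpd: Dict[str, float],
    evaluate: Callable[[List[str], List[str]], Tuple[bool, bool]],
) -> Optional[Partition]:
    """Hypothetical sketch of the BSC level.

    cpd maps each MLA object to its CPD ratio. evaluate(hardware, software)
    is an assumed callback returning (cost_ok, performance_ok).
    """
    ordered = sorted(cpd, key=lambda name: cpd[name])  # ascending CPD ratio
    lo, hi = 0, len(ordered)  # the split index may range from 0 to len
    while lo <= hi:
        mid = (lo + hi) // 2  # position of the median object in the window
        # Objects with a CPD ratio below the median go to hardware,
        # the median object and everything above it go to software.
        hardware, software = ordered[:mid], ordered[mid:]
        cost_ok, perf_ok = evaluate(hardware, software)
        if cost_ok and perf_ok:
            return hardware, software
        if cost_ok and not perf_ok:
            lo = mid + 1  # increase hardware objects
        elif perf_ok and not cost_ok:
            hi = mid - 1  # increase software objects
        else:
            return None  # no satisfactory partition
    return None


vpms_cpd = {"Sensor Driver": 7.622, "Counter": 32.533, "Motor Driver": 202.381}
first_split = binary_search_copartition(vpms_cpd, lambda hw, sw: (True, True))
```

With the three VPMS objects, the first split puts the Sensor Driver in hardware and the Counter and Motor Driver in software, which has the same shape as partition D(SC, HS, SM) in the case study.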
-
4. Multi-Level Partitioning (cont)
  CPD: Cost-Performance Difference
  [Equation defining CPD(x), not reproduced], where x is an object
-
4. Multi-Level Partitioning (cont)
  CPU/ASIC Sharing
  Sharing Threshold Distance (STD)
  SLI: Subsystem Location Inter-distance
  [Figure, not reproduced: an SLI axis starting at 0, divided at STD into a "Sharing" region and a "No Sharing" region]
-
4. Multi-Level Partitioning (cont)
Interconnect Cost (IC) Model
  IC(X1, X2) = α × SLI(S1, S2) × #Link(X1, S2) × BW(X1, S2) + EC(X1)
  SLI: Subsystem Location Inter-distance
  S1 and S2: subsystems
  X1 and X2: components (PE or ASIC)
  α: a parameter that depends on the interconnection technology
  #Link(X1, S2): the number of links between X1 and S2
  BW(X1, S2): the communication bandwidth between X1 and S2
  EC(X1): the cost of enhancing X1 so that both S1 and S2 can use X1
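The interconnect cost model can be sketched as a one-line function. The operators between the terms and the symbol for the technology parameter did not survive in the transcript, so this sketch assumes the distance, link count, and bandwidth are multiplied together and calls the parameter `alpha`. The function name and the numbers in the usage line are hypothetical.

```python
def interconnect_cost(
    alpha: float,
    sli: float,
    n_links: int,
    bandwidth: float,
    enhancement_cost: float,
) -> float:
    """Hypothetical sketch of IC(X1, X2).

    alpha            -- interconnection-technology parameter (assumed name)
    sli              -- SLI(S1, S2), distance between the two subsystems
    n_links          -- #Link(X1, S2)
    bandwidth        -- BW(X1, S2)
    enhancement_cost -- EC(X1), cost of enhancing X1 so both subsystems can use it
    """
    # Assumption: the lost operators are multiplications.
    return alpha * sli * n_links * bandwidth + enhancement_cost


# Illustrative values only; they do not come from the slides.
example_cost = interconnect_cost(alpha=2.0, sli=0.5, n_links=3, bandwidth=10.0, enhancement_cost=40.0)
```

Under this reading the wiring term grows linearly with distance, so two subsystems that are far apart pay more to share a component than two that sit next to each other.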
-
4. Multi-Level Partitioning (cont)
Algorithm 5.2 Share Components Algorithm
Share_Components(s)
{
  /* s = (s1, s2, …), si = (si1, si2), where si1 is the number of PE in
     subsystem Si and si2 is the number of ASIC in subsystem Si.
     si1, si2 ∈ {0, 1, …} */
  for (i = 1; i ≤ |s|; i++) {
    for (j = i; j ≤ |s|; j++) {
      if (SLI(si, sj) ≤ STD) {
        if (si1 > 0 ∧ sj1 > 0) Share_PE(Si, Sj);     /* Refer to Algorithm 5.3 */
        if (si2 > 0 ∧ sj2 > 0) Share_ASIC(Si, Sj);   /* Refer to Algorithm 5.4 */
      }
    }
  }
}
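A runnable sketch of the Share Components loop follows. Several things are assumptions: the comparison is taken to be SLI ≤ STD, only distinct pairs of subsystems are visited, and `share_pe` and `share_asic` are placeholder callbacks standing in for Algorithms 5.3 and 5.4, which are not shown in the deck. The distances in the usage lines are the VPMS-2 values from the sharing case study, and the component counts follow the unshared VPMS-1 result.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple


def share_components(
    subsystems: Dict[str, Tuple[int, int]],
    sli: Dict[FrozenSet[str], float],
    std: float,
    share_pe: Callable[[str, str], None],
    share_asic: Callable[[str, str], None],
) -> None:
    """Hypothetical sketch of Algorithm 5.2.

    subsystems maps a subsystem name to (number of PE, number of ASIC).
    sli maps an unordered pair of names to their inter-distance.
    """
    for a, b in combinations(subsystems, 2):  # assumption: distinct pairs only
        if sli[frozenset((a, b))] <= std:     # assumption: "<=" was the lost operator
            pe_a, asic_a = subsystems[a]
            pe_b, asic_b = subsystems[b]
            if pe_a > 0 and pe_b > 0:
                share_pe(a, b)
            if asic_a > 0 and asic_b > 0:
                share_asic(a, b)


pe_pairs, asic_pairs = [], []
share_components(
    subsystems={"ENTRY": (1, 1), "EXIT": (1, 1), "DISPLAY": (1, 0)},
    sli={
        frozenset(("ENTRY", "EXIT")): 0.5,
        frozenset(("DISPLAY", "EXIT")): 3.0,
        frozenset(("DISPLAY", "ENTRY")): 3.0,
    },
    std=1.0,
    share_pe=lambda a, b: pe_pairs.append((a, b)),
    share_asic=lambda a, b: asic_pairs.append((a, b)),
)
```

In this run only ENTRY and EXIT are close enough to share, which agrees with the VPMS-2 result of one shared ENTRY/EXIT gate-control PE and one shared ENTRY/EXIT sensor ASIC.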
-
4. Multi-Level Partitioning (cont)
Hardware Clustering and Software Grouping
  In DESC, hardware clustering is based on the basic Kernighan-Lin graph partitioning algorithm, enhanced to include DEMS characteristics.
  The software grouping technique is similar to load balancing on multiple processors.
-
4. Multi-Level Partitioning (cont)
  Analysis and Validation of MLP
  Complexity analysis
  [Complexity expression, not reproduced]
  r: the number of objects
  [symbol lost]: the number of subsystems
-
5. Case Studies
  Vehicle Parking Management System (VPMS)
  Examples of Sharing and Clustering in MLP
  Application of MLP to Coal Mine System
-
5. Case Studies (cont)
  Vehicle Parking Management System (VPMS)
  VPMS Specifications
    A VPMS consists of three subsystems: ENTRY management, EXIT management, and DISPLAY.
    An ENTRY (or an EXIT) subsystem consists of three parts: a ticket facility, a gate controlled by a gate-motor, and a pair of sensors.
    A DISPLAY subsystem
-
5. Case Study (cont)
Constraints for the VPMS system
  A maximum cost of $1,300
  A maximum display response time of 14,000 µs
  A maximum ENTRY (EXIT) gate response time of 250 µs
-
5. Case Study (cont)
Specification and Mapping of VPMS
  VPMS is described using OMT models consisting of Object, Dynamic, and Functional models.
-
Object Model of VPMS
[Object diagram, not reproduced.
  Objects: Vehicle Parking Management System; ENTRY Management Subsystem; EXIT Management Subsystem; Display Subsystem; Gate Controller; Ticket Checker; Motor; Control Unit; ENTRY gate; EXIT gate; Sensor; Send/Receive Device; Control Unit; ENTRY Sensor; EXIT Sensor; Display Device; Control System; Counter; Display Interface; 7-Segment; LCD; Dot Matrix; Time Stamp
  Generalization (isa) labels appear at ENTRY gate / EXIT gate and at ENTRY Sensor / EXIT Sensor
  Legend: one symbol represents aggregation, another represents generalization; isa: is a kind of]
-
Dynamic Model of a DISPLAY Subsystem
[State diagram, not reproduced.
  States: Idle; Increment counter; Decrement counter; Read count; Update Display
  Events: Car in; Car out; Push time stamp button
  Guards and actions: Count > 0, send ACK!; Count = 0, out of space]
-
Functional Model of a DISPLAY Subsystem
[Data-flow diagram, not reproduced.
  Sources: ENTRY Sensor; EXIT Sensor
  Processes: Increment Counter; Decrement Counter; Update Display
  Data store: Counter
  Flows: Car in signal; Car out signal; Counter Data]
-
5. Case Study (cont)
LHA Model of VPMS
  Hardware LHA Model
  Software LHA Model
-
Hardware LHA of a DISPLAY Subsystem
[Automaton diagram, not reproduced.
  Locations: Idle; Increment Counter; Decrement Counter; Read Count; Update Display
  Initial condition: Count := 500, t := 0
  Event transitions: Car in, t := 0; Car out, t := 0; Push time stamp button, t := 0
  Timed transitions: t = 42ns, t := 0, Count := Count + 1; t = 42ns, t := 0, Count := Count − 1; t = 18ns; t = 100ns]
-
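Software LHA of a DISPLAY Subsystem
[Automaton diagram, not reproduced.
  Locations: Polling; Increment Counter; Decrement Counter; Read Count; Update Display
  Initial condition: Count := 500, t := 0, x := 0
  Event transitions: Car in, t := 0; Car out, t := 0; Push time stamp button, t := 0
  Timed transitions: t = 3.2s, t := 0, Count := Count + 1; t = 3.2s, t := 0, Count := Count − 1; t = 5.12s, t := 0; t = 10s, t := 0; t = 10ms, t := 0, x := 0
  Several transitions also carry a guard comparing x with 33ms (comparison operator not recoverable), followed by x := 0]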
-
5. Case Study (cont)
SES Models Using SES/workbench
  Model
    A car-simulator
    An ENTRY management subsystem
    An EXIT management subsystem
    A DISPLAY subsystem
-
5. Case Study (cont)
SES Model of a DISPLAY Subsystem
-
5. Case Study (cont)
Applying MLP to VPMS
Calculation of CPD for VPMS Parts
  Part          | Hardware Cost | Software Cost | Hardware Performance | Software Performance | CPD
  Sensor Driver | 115           | 90            | 210                  | 1,030                | 7.622
  Counter       | 120           | 90            | 290                  | 13,200               | 32.533
  Motor Driver  | 260           | 90            | 820                  | 1,030                | 202.381
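The CPD equation itself is not reproduced in this transcript. The three CPD values in the table do match, to three decimals, the ratio sketched below: the extra cost of hardware divided by the time saved, with the time saved expressed as a fraction of the relevant response-time limit (250 for the gate-related parts, 14,000 for the Counter). This is an inference from the numbers, not the authors' stated definition; the function and parameter names are hypothetical.

```python
def cpd_ratio(
    hw_cost: float,
    sw_cost: float,
    hw_time: float,
    sw_time: float,
    time_limit: float,
) -> float:
    """Assumed form of the CPD ratio, inferred from the table values only."""
    extra_cost = hw_cost - sw_cost                 # cost of choosing hardware
    relative_gain = (sw_time - hw_time) / time_limit  # time saved, relative to the limit
    return extra_cost / relative_gain


# Rows of the CPD table, with the response-time limit assumed to apply to each part.
table = {
    "Sensor Driver": cpd_ratio(115, 90, 210, 1030, 250),
    "Counter": cpd_ratio(120, 90, 290, 13200, 14000),
    "Motor Driver": cpd_ratio(260, 90, 820, 1030, 250),
}
```

Under this reading a low CPD marks an object whose hardware version buys a large speed-up for little extra cost, which fits the rule that objects below the median CPD are implemented in hardware.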
-
5. Case Study (cont)
Applying MLP to the VPMS Example
  CSE (Number of CPU) | BSC Partition (SSP)     | Cost ($) | Response time (µs), sensor to display | Response time (µs), sensor to gate | Feasibility
  0                   | A(HC, HS, HM)           | 1,450    | 190                                   | 0.2                                | No
  1                   | B(HC, HS, SM)           | 1,280    | 190                                   | 215.0                              | Yes
  2                   | C(HC, HS, [equation])   | 1,370    | 13,200                                | 820.0                              | No
  2                   | D(SC, HS, SM)           | 1,250    | 13,100                                | 215.0                              | Yes
  3                   | E([equation], HS, SM)   | 1,340    | 13,100                                | 210.0                              | No
  3                   | F(SC, SS, SM)           | 1,225    | 13,200                                | 1,030.0                            | No
  H: hardware, S: software; subscripts: C = Counter, S = Sensor Driver, M = Motor Driver
  Superscripts: 1 = one CPU, 2 = two CPUs, 3 = three CPUs
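The Feasibility column can be reproduced by checking each row against the three VPMS constraints (cost at most $1,300, display response at most 14,000, gate response at most 250), and the least costly feasible row is then the partition to output. The sketch below is a small check of the table rather than part of MLP itself; the data structure and function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (cost in $, sensor-to-display response time, sensor-to-gate response time)
Row = Tuple[float, float, float]

MAX_COST = 1300.0
MAX_DISPLAY_TIME = 14000.0
MAX_GATE_TIME = 250.0

# Values copied from the table of candidate partitions A to F.
PARTITIONS: Dict[str, Row] = {
    "A": (1450, 190, 0.2),
    "B": (1280, 190, 215.0),
    "C": (1370, 13200, 820.0),
    "D": (1250, 13100, 215.0),
    "E": (1340, 13100, 210.0),
    "F": (1225, 13200, 1030.0),
}


def is_feasible(row: Row) -> bool:
    """A partition is feasible when it meets all three constraints."""
    cost, display_time, gate_time = row
    return (
        cost <= MAX_COST
        and display_time <= MAX_DISPLAY_TIME
        and gate_time <= MAX_GATE_TIME
    )


def feasible_partitions(rows: Dict[str, Row]) -> List[str]:
    return [name for name, row in rows.items() if is_feasible(row)]


def least_costly(rows: Dict[str, Row]) -> str:
    """Pick the cheapest feasible partition (raises ValueError if none is feasible)."""
    return min(feasible_partitions(rows), key=lambda name: rows[name][0])
```

Only B and D pass all three checks, and D is the cheaper of the two, which is the partition the emulation prototype is built from.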
-
5. Case Study (cont)
VPMS Emulation Block Diagram for Prototype D(SC, HS, SM)
[Block diagram, not reproduced.
  Blocks: two Single-chip Processors (8751); Entry Sensor & Driver; Exit Sensor & Driver; two Signal Processing blocks; Ticket Checker; Time Stamp Machine; Display Device; two Interface blocks; motors (M) driving the Entry gate and the Exit gate
  Inputs (i): Car in; Car out; Ticket taken; Parking fees paid; Push time stamp button
  Outputs (o): Open or Close (for each gate); Acknowledgment; Display scan data]
-
5. Case Study (cont)
VPMS Emulation Results
-
5. Case Study (cont)
Examples of Sharing and Clustering in MLP
  Sharing and clustering techniques in MLP are illustrated with several variants of the VPMS case study.
  How object-oriented modeling can be advantageous in hierarchical partitioning
  Coal mine control and monitoring system
-
Advantage of Sharing in MLP
Partitioning Results for three VPMS Specifications with and without Sharing
  Specifications               | VPMS-1                                                 | VPMS-2                                  | VPMS-3
  STD (m)                      | 1.0                                                    | 1.0                                     | 1.0
  SLI(ENTRY, EXIT) (m)         | 6.0                                                    | 0.5                                     | 0.8
  SLI(Display, EXIT) (m)       | 7.0                                                    | 3.0                                     | 0.5
  SLI(Display, ENTRY) (m)      | 2.0                                                    | 3.0                                     | 0.5
  Partitioning Results
  Number and locations of PE   | 3: (1) ENTRY gate control, (2) EXIT gate control, (3) Display | 2: (1) ENTRY/EXIT gate control, (2) Display | 1: (1) ENTRY/EXIT/Display Subsystem
  Number and locations of ASIC | 2: (1) ENTRY sensor control, (2) EXIT sensor control   | 1: (1) ENTRY/EXIT sensor                | 1: (1) ENTRY/EXIT/Display Subsystem Interface
  System Cost ($)              | 1,430                                                  | 1,250                                   | 1,180
  Display response time (µs)   | 13,200                                                 | 13,200                                  | 14,020
  Gate response time (µs)      | 210                                                    | 210                                     | 1,030
  MLP Execution Time (sec)     | 0.602                                                  | 3.857                                   | 14.789
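The PE counts in the three columns (3, 2, 1) can be reproduced from the distances alone by merging any two subsystems whose SLI is within STD. The sketch below does this with a small union-find. Treating sharing as transitive and using "≤" for the comparison are assumptions, since Share_PE (Algorithm 5.3) is not shown in the deck; the function name is hypothetical.

```python
from typing import Dict, List, Tuple


def sharing_groups(
    names: List[str],
    sli: Dict[Tuple[str, str], float],
    std: float,
) -> List[List[str]]:
    """Group subsystems that are close enough to share components.

    Assumption: sharing is transitive, so groups are the connected
    components of the "SLI <= STD" relation.
    """
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), distance in sli.items():
        if distance <= std:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


SUBSYSTEMS = ["ENTRY", "EXIT", "DISPLAY"]


def vpms_distances(entry_exit: float, display_exit: float, display_entry: float):
    return {
        ("ENTRY", "EXIT"): entry_exit,
        ("DISPLAY", "EXIT"): display_exit,
        ("DISPLAY", "ENTRY"): display_entry,
    }


vpms_1 = sharing_groups(SUBSYSTEMS, vpms_distances(6.0, 7.0, 2.0), std=1.0)
vpms_2 = sharing_groups(SUBSYSTEMS, vpms_distances(0.5, 3.0, 3.0), std=1.0)
vpms_3 = sharing_groups(SUBSYSTEMS, vpms_distances(0.8, 0.5, 0.5), std=1.0)
```

The three specifications yield three, two, and one group respectively, in line with the number of PEs reported for VPMS-1, VPMS-2, and VPMS-3.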
-
Advantage of Clustering in MLP
Partitioning Results for five VPMS Specifications with and without Clustering
  Specifications               | VPMS-A                          | VPMS-B                                           | VPMS-C                                                | VPMS-D                                                | VPMS-E
  Number of Subsystems         | 1                               | 2                                                | 2                                                     | 2                                                     | 3
  Subsystems                   | (1) ENTRY/EXIT/Display Subsystem | (1) ENTRY/EXIT Subsystem, (2) Display Subsystem | (1) ENTRY/Display Subsystem, (2) EXIT Subsystem       | (1) ENTRY Subsystem, (2) EXIT/Display Subsystem       | (1) ENTRY Subsystem, (2) EXIT Subsystem, (3) Display Subsystem
  Partitioning Results
  Number and locations of PE   | 1: (1) Motor Driver/Counter     | 2: (1) Motor Driver, (2) Counter                 | 2: (1) ENTRY Motor Driver/Counter, (2) EXIT Motor Driver | 2: (1) ENTRY Motor Driver, (2) EXIT Motor Driver/Counter | 3: (1) ENTRY Motor Driver, (2) EXIT Motor Driver, (3) Counter
  Number and locations of ASIC | 1: (1) Sensor Driver            | 1: (1) Sensor Driver                             | 2: (1) ENTRY Sensor, (2) EXIT Sensor                  | 2: (1) ENTRY Sensor, (2) EXIT Sensor                  | 2: (1) ENTRY Sensor, (2) EXIT Sensor
  System Cost ($)              | 1,180                           | 1,250                                            | 1,340                                                 | 1,340                                                 | 1,430
  Display response time (µs)   | 14,020                          | 13,200                                           | 13,100                                                | 13,100                                                | 13,200
  Gate response time (µs)      | 1,030                           | 210                                              | 110                                                   | 110                                                   | 110