Hardware-Software Co-partitioning for Distributed Embedded Systems

Upload: raghu-ram

Post on 16-Dec-2015

  • Hardware-Software Co-partitioning for Distributed Embedded Systems

  • Outline

    1. Introduction
    2. Related Work
    3. Distributed Embedded System and System Model
    4. Multi-Level Partitioning
    5. Case Study

  • 1. Introduction

    Hardware-Software Codesign
    Distributed Embedded System
    Motivation
    Task Graph
    Physical Restrictions
    Distributed Embedded System Codesign (DESC)
    Object Modeling Technique (OMT)
    Linear Hybrid Automata (LHA)
    SES Models

  • 1. Introduction (cont)

    Multi-Level Partitioning
    Partitioning Algorithm
    Sharing, Clustering
    Case Studies

  • 2. Related Work

    Target Embedded System
    1-CPU and 1-ASIC Topology
    n-CPU and m-ASIC Topology
    Optimal Codesign
    Heuristic Codesign

  • 2. Related Work (cont)

    Codesign of 1-CPU and 1-ASIC Topology
    Kumar et al. 1993
    Kalavade and Lee 1993
    Thomas et al. 1993
    Gupta and De Micheli 1993
    Barros et al. 1994

  • 2. Related Work (cont)

    Codesign of n-CPU and m-ASIC Topology
    Optimal Codesign Approaches:
      Mixed integer linear programming: Prakash and Parker 1992
      Exhaustive search: Wolf 1994, Haworth et al. 1993
      D'Ambrosio and Hu 1994

  • 2. Related Work (cont)

    Heuristic Codesign Approaches: Iterative and Constructive

    Iterative:
      Dick and Jha 1998: MOGAC, CORDS
      Dick and Jha 1999: MOCYN

  • 2. Related Work (cont)

    Constructive:
      Wolf 1996: object-oriented
      Yen and Wolf 1996: sensitivity-driven
      Dave, Lakshminarayana, and Jha 1999: COSYN
      Dave and Jha 1999: COFTA
      Dave and Jha 1998: COHRA
    Our proposed approach: Distributed Embedded System Codesign (DESC)

  • 3. Distributed Embedded Systems and System Models

    An embedded computer system is a system that uses computers but is not a general-purpose computer.
    In 1971, there were about 142,000 computers worldwide. In 1999, there are some 350 to 400 million personal computers alone and at least an order of magnitude more embedded devices.

  • 3. Distributed Embedded Systems and System Models (cont)

    There are several reasons to build a distributed hardware engine for an embedded system:
    Cheaper
    Faster response time
    The devices under control may be physically distributed

  • 3. Distributed Embedded Systems and System Models (cont)

    System Models
    Object Modeling Technique (OMT) Models:
      Object Model
      Dynamic Model
      Functional Model

  • 3. Distributed Embedded Systems and System Models (cont)

    Linear Hybrid Automata (LHA) Models:
      Internal system model
      For verifying systems
    SES Models:
      SES/workbench is a popular modeling and simulation tool for system performance evaluation

  • 4. Multi-Level Partitioning

    Multi-Level Partitioning (MLP)
    Three Main Phases:
      Codesign Space Exploration (CSE)
      System Structural Partitioning (SSP)
      Binary Search Copartitioning (BSC)

  • Fig. 5.1 Overall Flow Chart of Multi-Level Partitioning

    [Flow chart. Steps: Initialization; Explore Design Space (number of CPU and hardware cost), CSE level; Generate Structural Partition (CPU allocation to distributed subsystems), SSP level; Copartitioning, BSC level. Further boxes: CPU Sharing, ASIC Sharing, Hardware Clustering, Software Grouping. Decisions: Last Structural Partition? (otherwise: next structural partition); Last Design Alternative? Result: Output Heuristically Optimal Partition.]

  • Detailed Flow Diagram of Multi-Level Partitioning

    [Flow diagram; the recoverable steps, in flow order]

    Inputs: OMT Models, LHA Models
    Initialization
    CSE level: Select number of CPU and hardware cost (Explore Design Space)
    SSP level: Allocate CPU to distributed subsystems (Generate Structural Partition)
    BSC level (Copartitioning):
      Place all objects of hardware parts into ILA and all other objects into MLA
      Calculate CPD ratios of each object in MLA
      Sort all MLA objects in ascending order of their CPD ratios
      Select an object with median CPD ratio
      Use hardware to implement all objects with CPD ratios less than that of the selected median object
      Use software to implement all objects with CPD ratios not less than that of the selected median object
      Check if the partition result satisfies system constraints:
        Cost constraint is satisfied, but performance constraints are not satisfied: increase hardware objects
        Cost constraint is not satisfied, but performance constraints are satisfied: increase software objects
        Cost and performance constraints are satisfied: check if the partition is a heuristically optimal solution
        No satisfactory partition
      Branch labels: Cost is more important; Performance is more important
      Store structural partition result and perform sharing and clustering
    Last structural partition? (No: next structural partition)
    Last design alternative?
    Partition found? Yes: output least costly partition. No: print "No partition".
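The median step of the copartitioning level can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `median_split` is hypothetical, the choice of the upper median for an even number of objects is an assumption, and the CPD values are the ones tabulated for the VPMS parts in the case study.

```python
from typing import Dict, List, Tuple


def median_split(cpd: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """One copartitioning step: split objects around the median CPD ratio."""
    # Sort object names in ascending order of their CPD ratios.
    ordered = sorted(cpd, key=cpd.get)
    # Assumption: for an even number of objects, take the upper median.
    median = cpd[ordered[len(ordered) // 2]]
    # CPD less than the median goes to hardware, the rest to software.
    hardware = [x for x in ordered if cpd[x] < median]
    software = [x for x in ordered if cpd[x] >= median]
    return hardware, software


# CPD values from the "Calculation of CPD for VPMS Parts" table.
vpms_cpd = {"Sensor Driver": 7.622, "Counter": 32.533, "Motor Driver": 202.381}
hardware, software = median_split(vpms_cpd)
print("hardware:", hardware)
print("software:", software)
```

With these values the split puts the Sensor Driver in hardware and the Counter and Motor Driver in software, which is the assignment labelled D(SC, HS, SM) in the case study.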

  • 4. Multi-Level Partitioning (cont)

    CPD: Cost-Performance Difference
    [Equation for CPD(x) not preserved in this transcript], where x is an object

  • 4. Multi-Level Partitioning (cont)

    CPU/ASIC Sharing
    Sharing Threshold Distance (STD)
    SLI: Subsystem Location Inter-distance

    [Figure: SLI axis starting at 0 and divided at STD into a Sharing region and a No Sharing region]

  • 4. Multi-Level Partitioning (cont)

    Interconnect Cost (IC) Model

    IC(X1, X2) = α × SLI(S1, S2) × #Link(X1, S2) × BW(X1, S2) + EC(X1)

    SLI: Subsystem Location Inter-distance
    S1 and S2: Subsystems
    X1 and X2: A component (PE or ASIC)
    α: A parameter that depends on the interconnection technology (its symbol is lost in this transcript; α is a stand-in)
    #Link(X1, S2): The number of links between X1 and S2
    BW(X1, S2): The communication bandwidth between X1 and S2
    EC(X1): The cost for enhancing X1 such that both S1 and S2 can use X1
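As a rough illustration of the interconnect cost model, the sketch below reads the expression as a product of the technology parameter, the inter-distance, the link count and the bandwidth, plus the enhancement cost. That reading is an assumption, since the operators are not visible in the transcript. The function name, the parameter name `alpha`, and all numeric values are hypothetical.

```python
def interconnect_cost(alpha: float, sli: float, links: int,
                      bandwidth: float, enhancement_cost: float) -> float:
    """IC under the assumed product form: alpha * SLI * #Link * BW + EC."""
    return alpha * sli * links * bandwidth + enhancement_cost


# Hypothetical numbers, chosen only to show how the terms combine.
ic = interconnect_cost(alpha=2.0, sli=0.5, links=2,
                       bandwidth=10.0, enhancement_cost=30.0)
print(ic)
```

Under this reading the distance-dependent part grows linearly with SLI, while EC is a fixed charge for making the component usable by both subsystems.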

  • 4. Multi-Level Partitioning (cont)

    Algorithm 5.2 Share Components Algorithm

    Share_Components(s)
    {
      /* s = <s1, ..., sn>, si = (si1, si2), where si1 is the number of PE in subsystem Si
         and si2 is the number of ASIC in subsystem Si; si1, si2 ∈ {0, 1, ...} */
      for (i = 1; i <= n; i++) {
        for (j = i; j <= n; j++) {
          if (SLI(si, sj) <= STD) {
            if (si1 > 0 && sj1 > 0) Share_PE(Si, Sj);     /* Refer to Algorithm 5.3 */
            if (si2 > 0 && sj2 > 0) Share_ASIC(Si, Sj);   /* Refer to Algorithm 5.4 */
          }
        }
      }
    }

  • 4. Multi-Level Partitioning (cont)

    Hardware Clustering and Software Grouping
    In DESC, hardware clustering is based on the basic Kernighan-Lin graph partitioning algorithm, enhanced to include DEMS characteristics.
    Software grouping uses a technique similar to load balancing on multiple processors.

  • 4. Multi-Level Partitioning (cont)

    Analysis and Validation of MLP
    Complexity analysis
    [Complexity expression not preserved in this transcript]

    r: the number of objects
    [symbol not preserved]: the number of subsystems

  • 5. Case Studies

    Vehicle Parking Management System (VPMS)
    Examples of Sharing and Clustering in MLP
    Application of MLP to Coal Mine System

  • 5. Case Studies (cont)

    Vehicle Parking Management System (VPMS)
    VPMS Specifications
    A VPMS consists of three subsystems: ENTRY management, EXIT management, and DISPLAY.
    An ENTRY (or an EXIT) subsystem consists of three parts: a ticket facility, a gate controlled by a gate-motor, and a pair of sensors.
    A DISPLAY subsystem

  • 5. Case Study (cont)

    Constraints for the VPMS system:
    A maximum cost of $1,300
    A maximum display response time of 14,000 µs
    A maximum ENTRY (EXIT) gate response time of 250 µs

  • 5. Case Study (cont)

    Specification and Mapping of VPMS
    VPMS is described using OMT models consisting of Object, Dynamic, and Functional models.

  • Object Model of VPMS

    [Object diagram. Root: Vehicle Parking Management System. Subsystems: ENTRY Management Subsystem, EXIT Management Subsystem, Display Subsystem. Other classes: Gate Controller, Ticket Checker, Motor, Control Unit, ENTRY gate, EXIT gate, Sensor, Send/Receive Device, Control Unit, ENTRY Sensor, EXIT Sensor, Display Device, Control System, Counter, Display Interface, 7-Segment, LCD, Dot Matrix, Time Stamp. Legend: one symbol represents aggregation, another represents generalization; isa: is a kind of.]

  • Dynamic Model of a DISPLAY Subsystem

    [State diagram. States: Idle, Increment counter, Decrement counter, Read count, Update Display. Events: Car in, Car out, Push time stamp button. Guards on Read count: Count > 0, send ACK!; Count = 0, out of space.]

  • Functional Model of a DISPLAY Subsystem

    [Data flow diagram. Sources: ENTRY Sensor, EXIT Sensor. Processes: Increment Counter, Decrement Counter, Update Display. Data store: Counter. Flows: Car in signal, Car out signal, Counter Data.]

  • 5. Case Study (cont)

    LHA Model of VPMS
    Hardware LHA Model
    Software LHA Model

  • Hardware LHA of a DISPLAY Subsystem

    [Automaton figure. Locations: Idle, Increment Counter, Decrement Counter, Read Count, Update Display. Initialization: Count := 500, t := 0. Events: Car in, Car out, Push time stamp button, each with t := 0. Timing labels: t = 18ns; t = 42ns, t := 0, Count := Count + 1; t = 42ns, t := 0, Count := Count − 1; t = 100ns.]

  • Software LHA of a DISPLAY Subsystem

    [Automaton figure. Locations: Polling, Increment Counter, Decrement Counter, Read Count, Update Display. Initialization: Count := 500, t := 0, x := 0. Events: Car in, Car out, Push time stamp button, each with t := 0. Timing labels as printed: t = 5.12s, t := 0; t = 10s, t := 0; t = 3.2s, t := 0, Count := Count + 1; t = 3.2s, t := 0, Count := Count − 1; t = 10ms, t := 0, x := 0; a bound relating x to 33ms with x := 0 on three transitions (the relational symbol is not preserved).]

  • 5. Case Study (cont)

    SES Models Using SES/workbench
    Model:
      A car-simulator
      An ENTRY management subsystem
      An EXIT management subsystem
      A DISPLAY subsystem

  • 5. Case Study (cont)

    SES Model of a DISPLAY Subsystem

  • 5. Case Study (cont)

    Applying MLP to VPMS

    Calculation of CPD for VPMS Parts

    Part            Hardware Cost   Software Cost   Hardware Performance   Software Performance   CPD
    Sensor Driver   115             90              210                    1,030                  7.622
    Counter         120             90              290                    13,200                 32.533
    Motor Driver    260             90              820                    1,030                  202.381
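The CPD formula itself is not preserved in this transcript. The tabulated CPD values are, however, reproduced by dividing the hardware-minus-software cost difference by the software-minus-hardware performance difference normalized to a response-time limit. The sketch below checks that arithmetic. It is an inference from the numbers, not the authors' stated definition, and the pairing of each part with a limit (250 for the Sensor Driver and Motor Driver, 14,000 for the Counter, taken from the VPMS constraints) is likewise an assumption.

```python
# Each row: (hardware cost, software cost,
#            hardware performance, software performance,
#            assumed response-time limit).
parts = {
    "Sensor Driver": (115, 90, 210, 1030, 250),
    "Counter": (120, 90, 290, 13200, 14000),
    "Motor Driver": (260, 90, 820, 1030, 250),
}


def cpd(hw_cost, sw_cost, hw_perf, sw_perf, limit):
    """Inferred form: cost difference over normalized performance difference."""
    return (hw_cost - sw_cost) / ((sw_perf - hw_perf) / limit)


computed = {name: round(cpd(*row), 3) for name, row in parts.items()}
for name, value in computed.items():
    print(f"{name}: {value}")
```

Read this way, a low CPD marks an object whose hardware version buys a large speed-up for little extra cost, which fits the rule of moving low-CPD objects to hardware first.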

  • 5. Case Study (cont)

    Applying MLP to the VPMS Example

    Cost and response times are the Binary Search Copartitioning (BSC) results.

    Number of CPU (CSE)   Partition (SSP)                        Cost ($)   Response time, sensor to display (µs)   Response time, sensor to gate (µs)   Feasibility
    0                     A(HC, HS, HM)                          1,450      190                                     0.2                                  No
    1                     B(HC, HS, SM)                          1,280      190                                     215.0                                Yes
    2                     C(HC, HS, [equation not preserved])    1,370      13,200                                  820.0                                No
    2                     D(SC, HS, SM)                          1,250      13,100                                  215.0                                Yes
    3                     E([equation not preserved], HS, SM)    1,340      13,100                                  210.0                                No
    3                     F(SC, SS, SM)                          1,225      13,200                                  1,030.0                              No

    H: hardware, S: software; subscripts: C = Counter, S = Sensor Driver, M = Motor Driver;
    superscripts: 1 = One CPU, 2 = Two CPUs, 3 = Three CPUs
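The feasibility column can be re-derived by checking each partition against the three VPMS constraints stated earlier (cost at most $1,300, display response at most 14,000, gate response at most 250, in the table's time units). The following is a small sketch of that check. The variable and function names are hypothetical, and treating each limit as inclusive is an assumption.

```python
MAX_COST = 1300      # dollars
MAX_DISPLAY = 14000  # sensor-to-display response time limit
MAX_GATE = 250       # sensor-to-gate response time limit

# Partition name: (cost, sensor-to-display time, sensor-to-gate time), from the table.
partitions = {
    "A": (1450, 190, 0.2),
    "B": (1280, 190, 215.0),
    "C": (1370, 13200, 820.0),
    "D": (1250, 13100, 215.0),
    "E": (1340, 13100, 210.0),
    "F": (1225, 13200, 1030.0),
}


def feasible(cost, display, gate):
    """A partition is feasible when it meets all three constraints."""
    return cost <= MAX_COST and display <= MAX_DISPLAY and gate <= MAX_GATE


feasible_names = [name for name, row in partitions.items() if feasible(*row)]
cheapest = min(feasible_names, key=lambda name: partitions[name][0])
print("feasible:", feasible_names)
print("least costly feasible:", cheapest)
```

The check marks B and D as feasible, in agreement with the table, and the least costly of the two is D, the partition used for the emulation prototype.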

  • 5. Case Study (cont)

    VPMS Emulation Block Diagram for Prototype D(SC, HS, SM)

    [Block diagram. Blocks: two Single-chip Processors (8751), Entry Sensor & Driver, Exit Sensor & Driver, Ticket Checker, Time Stamp Machine, Display Device, two Interface blocks, two Signal Processing blocks, Entry gate and Exit gate, each with a motor (M). Signals, marked (i) or (o): Car in(i), Car out(i), Ticket taken(i), Push time stamp button(i), Parking fees paid(i), Acknowledgment(o), Display scan data(o), Open(o) or Close(o) for each gate.]

  • 5. Case Study (cont)

    VPMS Emulation Results

  • 5. Case Study (cont)

    Examples of Sharing and Clustering in MLP
    Sharing and clustering techniques in MLP are illustrated with several variants of the VPMS case study.
    How object-oriented modeling can be advantageous in hierarchical partitioning.
    Coal mine control and monitoring system

  • Advantage of Sharing in MLP

    Partitioning Results for three VPMS Specifications with and without Sharing

    Specification                VPMS-1   VPMS-2   VPMS-3
    STD (m)                      1.0      1.0      1.0
    SLI(ENTRY, EXIT) (m)         6.0      0.5      0.8
    SLI(Display, EXIT) (m)       7.0      3.0      0.5
    SLI(Display, ENTRY) (m)      2.0      3.0      0.5
    Number of PE                 3        2        1
    Number of ASIC               2        1        1
    System Cost ($)              1,430    1,250    1,180
    Display response time (µs)   13,200   13,200   14,020
    Gate response time (µs)      210      210      1,030
    MLP Execution Time (sec)     0.602    3.857    14.789

    Locations of PE
    VPMS-1: (1) ENTRY gate control, (2) EXIT gate control, (3) Display
    VPMS-2: (1) ENTRY/EXIT gate control, (2) Display
    VPMS-3: (1) ENTRY/EXIT/Display Subsystem

    Locations of ASIC
    VPMS-1: (1) ENTRY sensor control, (2) EXIT sensor control
    VPMS-2: (1) ENTRY/EXIT sensor
    VPMS-3: (1) ENTRY/EXIT/Display Subsystem Interface
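The PE counts in the sharing results follow from the threshold rule alone: subsystems whose inter-distance is within STD share a PE. The sketch below groups the three subsystems accordingly and recovers the counts for the three specifications. It is an illustration rather than the Share_PE algorithm itself (which is not shown in these slides). The function name is hypothetical, and merging groups transitively and treating SLI equal to STD as sharing are assumptions.

```python
from itertools import combinations


def count_shared_groups(subsystems, sli, std):
    """Count groups after merging every pair of subsystems with SLI <= STD."""
    parent = {s: s for s in subsystems}

    def find(s):
        # Follow parent links up to the representative of the group.
        while parent[s] != s:
            s = parent[s]
        return s

    for a, b in combinations(subsystems, 2):
        if sli[frozenset((a, b))] <= std:
            parent[find(a)] = find(b)
    return len({find(s) for s in subsystems})


subsystems = ["ENTRY", "EXIT", "Display"]
# (SLI(ENTRY, EXIT), SLI(Display, EXIT), SLI(Display, ENTRY)) in metres.
specs = {
    "VPMS-1": (6.0, 7.0, 2.0),
    "VPMS-2": (0.5, 3.0, 3.0),
    "VPMS-3": (0.8, 0.5, 0.5),
}
pe_counts = {}
for name, (entry_exit, display_exit, display_entry) in specs.items():
    sli = {
        frozenset(("ENTRY", "EXIT")): entry_exit,
        frozenset(("Display", "EXIT")): display_exit,
        frozenset(("Display", "ENTRY")): display_entry,
    }
    pe_counts[name] = count_shared_groups(subsystems, sli, std=1.0)
print(pe_counts)
```

The resulting group counts (3, 2 and 1) match the Number of PE row. The ASIC counts depend additionally on which subsystems contain an ASIC at all, which this sketch does not model.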

  • Advantage of Clustering in MLP

    Partitioning Results for five VPMS Specifications with and without Clustering

    Specification                VPMS-A   VPMS-B   VPMS-C   VPMS-D   VPMS-E
    Number of Subsystems         1        2        2        2        3
    Number of PE                 1        2        2        2        3
    Number of ASIC               1        1        2        2        2
    System Cost ($)              1,180    1,250    1,340    1,340    1,430
    Display response time (µs)   14,020   13,200   13,100   13,100   13,200
    Gate response time (µs)      1,030    210      110      110      110

    Subsystems
    VPMS-A: (1) ENTRY/EXIT/Display Subsystem
    VPMS-B: (1) ENTRY/EXIT Subsystem, (2) Display Subsystem
    VPMS-C: (1) ENTRY/Display Subsystem, (2) EXIT Subsystem
    VPMS-D: (1) ENTRY Subsystem, (2) EXIT/Display Subsystem
    VPMS-E: (1) ENTRY Subsystem, (2) EXIT Subsystem, (3) Display Subsystem

    Locations of PE
    VPMS-A: (1) Motor Driver/Counter
    VPMS-B: (1) Motor Driver, (2) Counter
    VPMS-C: (1) ENTRY Motor Driver/Counter, (2) EXIT Motor Driver
    VPMS-D: (1) ENTRY Motor Driver, (2) EXIT Motor Driver/Counter
    VPMS-E: (1) ENTRY Motor Driver, (2) EXIT Motor Driver, (3) Counter

    Locations of ASIC
    VPMS-A: (1) Sensor Driver
    VPMS-B: (1) Sensor Driver
    VPMS-C: (1) ENTRY Sensor, (2) EXIT Sensor
    VPMS-D: (1) ENTRY Sensor, (2) EXIT Sensor
    VPMS-E: (1) ENTRY Sensor, (2) EXIT Sensor