
Samar Abdi (slides courtesy of A. Gerstlauer, D. Gajski and R. Doemer)

Assistant Professor

Electrical and Computer Engineering

Concordia University

http://www.ece.concordia.ca/~samar

COEN 691B: Embedded System Design

Lecture 3: Computation Modeling

System Design Flow

[Figure: system modeling graph with a computation axis and a communication axis, each graded un-timed, approximate-timed, and cycle-timed; models A to F are placed on the graph.]

A. System specification model
B. Timed functional model
C. Transaction-level model (TLM)
D. Bus cycle-accurate model (BCAM)
E. Computation cycle-accurate model (CCAM)
F. Cycle-accurate model (CAM)

• Abstraction based on level of detail & granularity
  – Computation and communication
• System design flow
  – Path from model A to model F
• Design methodology and modeling flow
  – Set of models and transformations between models

Source: L. Cai, D. Gajski. "Transaction level modeling: An overview", ISSS 2003

Lecture 3: Outline

• Profiling

• Timing estimation using processor model

• RTOS modeling

• Hardware abstraction layer modeling


Profiling

• Input specification MoC
  – Hierarchy
  – Computation & communication
• Multi-dimensional analysis
  – Multi-entities
    • Behavior, channel, port, variable
  – Multi-metrics
    • Operation, traffic, storage
    • Static, dynamic
  – Multi-levels
    • Application, transaction, bus-functional

[Figure: profiling flow (Application, Instrumentation, instrumented application with counters, Simulation, Static Analysis, Profiled App.); example behavior B containing sub-behaviors B1 and B2, channel c, and variable v.]

Profiling

• Instrumentation-based profiling
  – $B_b$: the execution count of basic block $b$
    • Enumerate execution paths
  – $C_{b,i,d}$: number of computed characteristics for item type $i$ and data type $d$ in basic block $b$
    • Item type $i$: metric-dependent (e.g. the operation ++)
    • Data type $d$: float, int, ...

Specification metrics:
$R_{i,d} = \sum_b C_{b,i,d} \, B_b$
$R = \sum_i \sum_d R_{i,d}$

Example:
$B_1 = 1$, $B_3 = 3$, $C_{1,++,\mathrm{int}} = 1$, $C_{3,++,\mathrm{int}} = 2$
$R_{++,\mathrm{int}} = \sum_b B_b \, C_{b,++,\mathrm{int}} = 1 \cdot 1 + 3 \cdot 2 = 7$

Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.

int a, b, c;

if (a == 0) {
    b++;        /* block with one ++ on int (C_1 = 1) */
} else {
    b++;        /* block with two ++ on int (C_3 = 2) */
    c++;
}
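To make the metrics concrete, here is a minimal C sketch of $R_{i,d}$ and $R$, assuming the per-block counts $B_b$ and $C_{b,i,d}$ have already been collected by the instrumented simulation. The BlockProfile type and the spec_metric and total_metric functions are hypothetical names for illustration, not part of the cited retargetable profiler.

```c
#include <stddef.h>

/* Hypothetical per-block profile entry for one fixed (item type i, data type d)
   pair, e.g. i = "++" and d = int. */
typedef struct {
    long exec_count; /* B_b: number of times basic block b executed */
    long char_count; /* C_{b,i,d}: occurrences of item i on data type d in block b */
} BlockProfile;

/* R_{i,d} = sum over all basic blocks b of C_{b,i,d} * B_b */
static long spec_metric(const BlockProfile *blocks, size_t n_blocks) {
    long r = 0;
    for (size_t b = 0; b < n_blocks; b++)
        r += blocks[b].char_count * blocks[b].exec_count;
    return r;
}

/* R = sum over i and d of R_{i,d}; r is an n_item x n_data matrix stored row-major. */
static long total_metric(const long *r, size_t n_item, size_t n_data) {
    long total = 0;
    for (size_t k = 0; k < n_item * n_data; k++)
        total += r[k];
    return total;
}
```

With the example counts (B_1 = 1, C_1 = 1; B_3 = 3, C_3 = 2), spec_metric returns 1·1 + 3·2 = 7.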


Retargeting

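• Target machine model
  – $W_{i,d}$: weights of the components to which the entity is mapped
    • Manual
    • Simulation
    • Complex cost function/algorithm
• Implementation estimates
  – $E = \sum_i \sum_d R_{i,d} \, W_{i,d}$
  – Time complexity: O(n)

[Figure: behavior B (sub-behaviors B1 and B2, channel c, variable v) and an architecture with PE1, PE2, and Mem.]

Example:
$R(B1)_{++,\mathrm{int}} = 7$
$W(PE1)_{++,\mathrm{int}} = 1$
$E(B1,PE1)_{++,\mathrm{int}} = 7 \times 1 = 7$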

Source: L. Cai, A. Gerstlauer, D. Gajski, “Retargetable Profiling for Rapid, Early System-Level Design Space Exploration,“ DAC, 2004.
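The implementation estimate is a weighted sum over the same (i, d) entries as the profile, consistent with the O(n) complexity above if n counts those entries. A minimal C sketch, assuming R and W are stored as row-major arrays with one row per item type and one column per data type; impl_estimate is a hypothetical name, and the weights stand in for whatever the manual, simulation-based, or cost-function approach supplies.

```c
#include <stddef.h>

/* E = sum_i sum_d R_{i,d} * W_{i,d}.
   r: profiled metrics of the entity, w: weights of the target component,
   both n_item x n_data, row-major. One pass over all entries. */
static double impl_estimate(const double *r, const double *w,
                            size_t n_item, size_t n_data) {
    double e = 0.0;
    for (size_t i = 0; i < n_item; i++)
        for (size_t d = 0; d < n_data; d++)
            e += r[i * n_data + d] * w[i * n_data + d];
    return e;
}
```

For the single entry of the example, impl_estimate with R = 7 and W = 1 returns 7, matching E(B1, PE1). Because profiling is done once and retargeting only re-weights it, trying a different PE costs one such pass rather than a new simulation.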


Vocoder Example Profiling

Computational complexity of top-level vocoder behaviors:

  LP_Analysis   Open_Loop   Closed_Loop   Codebook    Update
  377.0 MOp     337.1 MOp   478.7 MOp     646.4 MOp   43.6 MOp

Codebook operation mix:

  (x, int)   (+, int)   (-, int)   (/, int)   (others, int)
  46.2%      33.5%      9.1%       7.1%       4.1%

• HW acceleration
  – Floating-point not required
  – Dedicated hardware multipliers

Vocoder Design Space Exploration

• Mapping of 8 top-level encoder behaviors onto ColdFire + DSP + HW
• 85:04h for 6561 alternatives (1.7 s simulation + 3 s refinement each)
• 100% fidelity

[Figure: transcoding delay (10 to 35 ms) versus cost (10 to 170) for the explored alternatives, with the timing constraint marked; the point labeled SW lies at (20.0, 30.73 ms) and the point labeled HW at (144.1, 12.24 ms).]
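The 6561 alternatives are exactly every assignment of the 8 behaviors to the 3 processing elements, since 3^8 = 6561. A minimal C sketch of how an exhaustive explorer might enumerate them, assuming each alternative is numbered by an index whose base-3 digits give the PE of each behavior; the numbering and the function names are hypothetical.

```c
#define N_BEHAVIORS 8 /* top-level encoder behaviors (from the slide) */
#define N_PES 3       /* ColdFire, DSP, HW (from the slide) */

/* Number of alternatives in an exhaustive exploration: N_PES^N_BEHAVIORS. */
static long count_alternatives(void) {
    long n = 1;
    for (int b = 0; b < N_BEHAVIORS; b++)
        n *= N_PES;
    return n;
}

/* Decode an alternative index in [0, count_alternatives()) into a PE index
   per behavior by reading its base-N_PES digits
   (0 = ColdFire, 1 = DSP, 2 = HW is a hypothetical numbering). */
static void decode_mapping(long index, int pe_of[N_BEHAVIORS]) {
    for (int b = 0; b < N_BEHAVIORS; b++) {
        pe_of[b] = (int)(index % N_PES);
        index /= N_PES;
    }
}
```

An exploration loop would decode each index from 0 to 6560, refine and simulate that mapping, and keep the cheapest alternative that meets the timing constraint; that evaluation step is not shown.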


General Processor Micro-Architecture

• Basic computation component is a processor (PE)
  – Programmable, general-purpose software processor (CPU)
  – Programmable special-purpose processor (e.g. DSPs)
  – Application-specific instruction set processor (ASIP)
  – Custom hardware processor

[Figure: functionality and timing of a PE: a controller and a datapath connected by control signals and status lines, with a bus interface and a clock (CLK); ∆t marks the timing.]

Processor Models (1)

• Structural RTL models
  – Sub-cycle accurate

[Figure: structural models of a hardware processor (controller with state, next-state logic, and output logic; datapath with register file, functional unit FU1, memory, and bus interface; CLK) and of a software processor (CPU controller with PC, IR, fetch, and decode; datapath with register file, ALU, load/store unit, and data & program memory; CLK).]

Processor Models (2)

• Behavioral RTL/IS models
  – Cycle accurate

[Figure: hardware processor modeled as an FSMD clocked by HW_CLK; software processor modeled by instruction set simulation (ISS) clocked by CPU_CLK, running the binary of the application, RTOS, and HAL.]

Computation Modeling

• Application modeling
  – Native process execution (C code)
  – Back-annotated execution timing
• Processor modeling
  – Operating system
    • Real-time multi-tasking (RTOS model)
    • Bus drivers (C code)
  – Hardware abstraction layer (HAL)
    • Interrupt handlers
    • Media accesses
  – Processor hardware
    • Bus interfaces (I/O state machines)
    • Interrupt suspension and timing

[Figure: processor model with processes P1 and P2 on the OS, drivers (Drv) and interrupt service routines (ISR) in the HAL, and the CPU hardware with interrupts and the bus.]

Process B1()
{
    waitfor(15000);
    waitfor(25000);
};

Application Layer

• High-level, abstract programming model
  – Hierarchical process graph
    • ANSI C leaf processes
    • Parallel-serial composition
  – Abstract, typed inter-process communication
    • Channels
    • Shared variables
• Timed simulation of application functionality
  – Back-annotate timing
    • Estimation or measurement (trace, ISS)
    • Function or basic block level granularity
  – Execute natively on simulation host
    • Discrete event simulator
    • Fast, native compiled simulation

[Figure: processes B1, B2, B3 and channels C1, C2 of the application running on the CPU, plotted against logical time 0, 5, 10.]

p1.c:
...
void f() {
    waitfor(5);
    ...
}
...
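A minimal C sketch of back-annotated native execution, assuming a single process so that waitfor() can be reduced to advancing a global logical clock; a real discrete event simulator (as in SpecC or SystemC) would also suspend the caller and interleave other processes. The body of f and the sim_time variable are hypothetical; only the waitfor(5) annotation comes from p1.c.

```c
/* Hypothetical single-process stand-in for the SLDL waitfor() primitive: it
   only advances a global logical clock. A real discrete event simulator would
   also suspend the caller and run other processes in between. */
static unsigned long long sim_time = 0;

static void waitfor(unsigned long long delay) {
    sim_time += delay;
}

/* Leaf function compiled and executed natively on the host; the delay of 5
   time units is a back-annotated estimate of its target execution time. */
static int f(int x) {
    int y = x * x + 1; /* functionality executes at host speed */
    waitfor(5);        /* logical time advances by the annotated delay */
    return y;
}
```

Calling f(3) returns 10 at host speed while advancing logical time from 0 to 5; a second call advances it to 10. The accuracy of the model is therefore bounded by how well the annotated delays estimate the target, and by the granularity (function or basic block) at which they are inserted.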

Timing Estimation Input: Application Model

[Figure: application model with processes P1, P2, P3, P4, channels C1 and C2, and variable v1.]

• Application model consists of
  – Processes for computation (e.g. P1, P2, P3, P4)
  – Channels for communication (e.g. C1 between P1 and P3)
  – Variables for storage (e.g. v1)

Application Model Objects

• Processes
  – Symbolic representation of computation
  – Contain C/C++ code imported from reference
• Process ports
  – Symbolic representation of communication services required by processes
  – Provide object orientation by allowing processes to connect to different channels
• Channels
  – Symbolic representation of inter-process communication
  – Implement communication services such as blocking, non-blocking, handshake, FIFO etc.
  – Encapsulation for communication functions
• Variables
  – Symbolic representation of data storage
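To illustrate how ports decouple processes from channels, here is a minimal C sketch that models a port as a pair of function pointers bound to a channel instance, with a bounded FIFO as one channel type. All names are hypothetical, and the send and receive calls are non-blocking (they return false when the FIFO is full or empty) because plain sequential C has no simulator to suspend a blocked process.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical port: the services a process needs, bound to some channel. */
typedef struct {
    void *chan;                           /* the bound channel instance */
    bool (*send)(void *chan, int value);  /* non-blocking send */
    bool (*recv)(void *chan, int *value); /* non-blocking receive */
} Port;

#define FIFO_DEPTH 4

/* One channel type: a bounded FIFO that encapsulates its buffer. */
typedef struct {
    int buf[FIFO_DEPTH];
    size_t head;  /* index of the oldest element */
    size_t count; /* number of stored elements */
} FifoChannel;

static bool fifo_send(void *chan, int value) {
    FifoChannel *c = chan;
    if (c->count == FIFO_DEPTH)
        return false; /* full: a blocking variant would wait here */
    c->buf[(c->head + c->count) % FIFO_DEPTH] = value;
    c->count++;
    return true;
}

static bool fifo_recv(void *chan, int *value) {
    FifoChannel *c = chan;
    if (c->count == 0)
        return false; /* empty: a blocking variant would wait here */
    *value = c->buf[c->head];
    c->head = (c->head + 1) % FIFO_DEPTH;
    c->count--;
    return true;
}

/* Bind a port to a FIFO; another channel type supplies its own functions. */
static Port fifo_port(FifoChannel *c) {
    Port p = {c, fifo_send, fifo_recv};
    return p;
}

/* A process body written only against its port, not a concrete channel. */
static int producer(Port *out, int first, int n) {
    int sent = 0;
    for (int i = 0; i < n; i++)
        if (out->send(out->chan, first + i))
            sent++;
    return sent;
}
```

Binding the producer to a different channel type (for example a handshake channel) only requires building a Port with that channel's functions; the producer code itself does not change.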

Timing Estimation Input: Platform Architecture

[Figure: platform with processing elements CPU1, CPU2, and HW, memory Mem, buses Bus1 and Bus2, an arbiter, transducer TX, and operating systems OS1 and OS2 on the CPUs.]

• Platform consists of
  – Hardware: PEs (e.g. CPU1, HW), buses (e.g. Bus1), memories (e.g. Mem), interfaces (e.g. transducer TX)
  – Software: operating systems (e.g. OS1) on SW PEs

Platform Objects

• Processing element (PE)
  – Symbolic representation of computation resources
  – Different types such as SW processors, HW IPs etc.
• Bus
  – Symbolic representation of communication media
  – Types include shared, point-to-point, link, crossbar etc.
• Memory
  – Symbolic representation of physical storage
  – May contain shared variables or SW program/data
• Transducer
  – For protocol conversion and store-and-forward routing
  – Necessary for PEs with different bus protocols
• Operating system (OS)
  – Software platform for individual PEs
  – Needed for scheduling multiple processes on a PE

Timing Estimation Input: Mapping

[Figure: processes P1 to P4, channels C1 and C2, and variable v1 mapped onto the platform: CPU1, CPU2, HW IP, Mem, Bus1, Bus2, the arbiter, and transducer TX, with an OS on each CPU.]

• Processes → PEs
• Channels → Routes
• Variables → Memories

Mapping Rules

• Processes to PEs
  – Each process in the application must be mapped to a PE
  – Multiple processes may be mapped to a SW PE with OS support
  – Example: P1, P2 → CPU1
• Channels to routes
  – All channels between processes mapped to different PEs are mapped to routes in the platform
  – A route consists of bus segments and interfaces
  – The channel on each bus segment is assigned a unique address
• Variables to memories
  – Variables accessed by processes mapped to different PEs are mapped to shared memories
  – All variables are assigned an address range depending on their size
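The three mapping rules can be checked mechanically. A minimal C sketch, assuming processes, PEs, and variables are identified by array index; the function names are hypothetical, route selection over bus segments is not modeled, and variables are packed into contiguous address ranges without alignment as one simple allocation policy.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PES 16

/* A channel connects two processes, identified by index. */
typedef struct {
    int src, dst;
} Channel;

/* Rule 1: every process has a valid PE, and only SW PEs (which have an OS)
   may host more than one process. */
static bool processes_mapped_ok(const int *pe_of, size_t n_proc,
                                const bool *pe_is_sw, size_t n_pe) {
    int count[MAX_PES] = {0};
    if (n_pe > MAX_PES)
        return false;
    for (size_t p = 0; p < n_proc; p++) {
        if (pe_of[p] < 0 || (size_t)pe_of[p] >= n_pe)
            return false;
        count[pe_of[p]]++;
    }
    for (size_t e = 0; e < n_pe; e++)
        if (count[e] > 1 && !pe_is_sw[e])
            return false;
    return true;
}

/* Rule 2: a channel needs a platform route only if its endpoints are mapped
   to different PEs. */
static bool needs_route(const Channel *c, const int *pe_of) {
    return pe_of[c->src] != pe_of[c->dst];
}

/* Rule 3: give each variable a contiguous address range sized by the
   variable, packed upward from base; returns the next free address. */
static unsigned assign_addresses(const unsigned *size, size_t n_vars,
                                 unsigned base, unsigned *addr) {
    for (size_t v = 0; v < n_vars; v++) {
        addr[v] = base;
        base += size[v];
    }
    return base;
}
```

With P1 to P4 as indices 0 to 3 and pe_of = {0, 0, 1, 2}, where PE 1 is a HW PE, P1 and P2 legally share SW PE 0, and C1 (P1 to P3) needs a route because its endpoints sit on PE 0 and PE 1.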

Computation Timing Estimation

• Stochastic memory delay model
• DFG scheduling to compute basic block delay [DATE 08]
• RTOS model added for PEs with multiple processes

[Figure: timing estimation flow. The process CDFG (basic blocks BB1, BB2, BB3 with an if-branch) and a processor model (datapath elements such as RF, ALU, Mul, Add, CMem, PC, and AG) feed timing estimation, which produces a timed process whose basic blocks are annotated with wait(t1), wait(t2), and wait(t3).]

Stochastic Memory Delay Model

• Assumption
  – Cache and branch prediction hit rates are available in the data model
• Delay estimation
  – Operation access overhead = Nop * ((1.0 - HRi) * (CD + Lmem))
  – Data access overhead = Nld * ((1.0 - HRd) * (CD + Lmem))
  – Branch prediction miss penalty = MPrate * Penalty

Memory/branch model:
  Cache: direct-mapped, 16K; I-cache hit rate 97.79%; D-cache hit rate 69.96%; delay 1
  Memory: delay 8
  Branch prediction policy: taken; penalty 260.00%

LLVM bytecode:
  1: a = $i - 1
  2: t1 = a + 2
  3: t2 = $n * $m
  4: t3 = t1 - t2
  5: load b
  6: t4 = b / 10
  7: jmp

Mem./Br. delay calculation: Mem. overhead = 4.1, Branch delay = 1.2
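The three overhead formulas translate directly into code. A minimal C sketch, assuming HRi and HRd are the I-cache and D-cache hit rates, CD is the cache delay, Lmem is the memory delay from the memory/branch model, and Nop counts all instructions of the block; the function names are hypothetical.

```c
/* Stochastic memory and branch delay model, all results in cycles. */

/* Instruction fetch misses: each of n_op operations misses with
   probability (1 - hr_i) and then pays cache delay + memory latency. */
static double op_access_overhead(int n_op, double hr_i, double cd, double l_mem) {
    return n_op * ((1.0 - hr_i) * (cd + l_mem));
}

/* Data misses: the same cost, applied only to the n_ld loads. */
static double data_access_overhead(int n_ld, double hr_d, double cd, double l_mem) {
    return n_ld * ((1.0 - hr_d) * (cd + l_mem));
}

/* Branch mispredictions: misprediction rate times penalty. */
static double branch_miss_penalty(double mp_rate, double penalty) {
    return mp_rate * penalty;
}
```

For the bytecode shown (7 operations, 1 load) with HRi = 0.9779, HRd = 0.6996, CD = 1, and Lmem = 8, the operation and data terms are 7 × 0.0221 × 9 ≈ 1.39 and 1 × 0.3004 × 9 ≈ 2.70 cycles, about 4.1 cycles in total, which matches the memory overhead reported above. The branch term is not reproduced because the slide does not give the misprediction rate.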

Pipeline Scheduling

LLVM bytecode with the estimated delay:
  1: a = $i - 1
  2: t1 = a + 2
  3: t2 = $n * $m
  4: t3 = t1 - t2
  5: load b
  6: t4 = b / 10
  7: jmp
  8: wait 47*CT

• Assumptions
  – In-order, single-issue processor
  – Optimistic during scheduling (100% cache hit)

Processor data model (operations and datapath):
  Add: IF, ID, EX: int-ALU (IntAdd)
  Sub: IF, ID, EX: int-ALU (IntSub)
  Int-ALU: Qty: 1; IntAdd Lat: 1; IntSub Lat: 1

Processor timing estimation:
  Operation delay = 42
  Total BB delay = Op. + Mem. + Br. = 42 + 4.1 + 1.2 = 47.3 cycles
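A minimal C sketch of the scheduling step under the slide's assumptions (in-order, single-issue, 100% cache hits): each operation issues at the earliest one cycle after its predecessor, and later if an operand is still being computed. The Op type, the latency values, and the two-operand dependency encoding are hypothetical, and only the execute stage is modeled; this is a generic scheduler, not the estimator cited as [DATE 08], so it will not reproduce the 42-cycle figure without that processor data model.

```c
#include <stddef.h>

#define MAX_DEPS 2
#define MAX_OPS 64

/* One operation of a basic block: result latency in cycles and up to two
   producer operations (indices of earlier ops, or -1 for none). */
typedef struct {
    int latency;
    int dep[MAX_DEPS];
} Op;

/* In-order, single-issue schedule with 100% cache hits. Returns the cycle in
   which the last result is available (the operation delay of the block),
   or -1 if the block is too large for this sketch. */
static int schedule_block(const Op *ops, size_t n) {
    int finish[MAX_OPS];
    int last_issue = -1, bb_delay = 0;
    if (n > MAX_OPS)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int issue = last_issue + 1; /* single issue, program order */
        for (int k = 0; k < MAX_DEPS; k++) {
            int d = ops[i].dep[k];
            if (d >= 0 && finish[d] > issue)
                issue = finish[d]; /* stall until the operand is ready */
        }
        finish[i] = issue + ops[i].latency;
        last_issue = issue;
        if (finish[i] > bb_delay)
            bb_delay = finish[i];
    }
    return bb_delay;
}

/* Total basic block delay: operations + memory overhead + branch delay. */
static double total_bb_delay(double op_cycles, double mem_cycles, double br_cycles) {
    return op_cycles + mem_cycles + br_cycles;
}
```

The scheduled operation delay is then combined with the stochastic overheads, e.g. total_bb_delay(42, 4.1, 1.2) gives about 47.3 cycles as on the slide. Scheduling optimistically and adding expected miss costs afterwards keeps the scheduler simple at the price of ignoring overlap between stalls.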

Output: SystemC Timed Model

[Figure: generated SystemC timed model containing CPU1 (processes P1 and P2 on an OS), CPU2, HW IP, Mem, Bus1, Bus2, transducer TX, and processes P3 and P4, with a second OS.]

• Model generation technique
  – Application code → sc_thread
  – Processing element → sc_module
  – OS model → sc_module
  – Bus → sc_channel
  – Memory → array inside sc_module
  – Interface → FIFO channel + sc_process

Operating System Layer

• Scheduling
  – Group processes into tasks
    • Static scheduling
  – Schedule tasks
    • Dynamic scheduling, multitasking
    • Preemption, interrupt handling
    • Task communication (IPC)
• Scheduling refinement
  – Flatten hierarchy
  – Reorder behaviors
• OS refinement
  – Insert OS model
  – Task refinement
  – IPC refinement

[Figure: OS layer added to the application: processes P1, P2, P3 and channels C1, C2 grouped into tasks that run on an OS model with a task scheduler, on top of the SLDL.]

OS Modeling

• High-level RTOS abstraction
  – Specification is fast but inaccurate
    • Native execution, concurrency model
  – Traditional ISS-based validation infeasible
    • Accurate but slow (esp. in multi-processor context), requires full binary
• Model of operating system
  – High accuracy but small overhead at early stages
  – Focus on key effects, abstract unnecessary implementation details
  – Model all concepts: multi-tasking, scheduling, preemption, interrupts, IPC

Source: A. Gerstlauer, H. Yu, D. Gajski. "RTOS Modeling for System-Level Design," DATE03.

[Figure: three levels. Specification: application on SLDL channels. TLM: tasks T1 and T2 on an RTOS model over SLDL channels. Implementation: application and RTOS with a comm. & sync. API on an instruction set simulator.]

Simulated Dynamic Behavior

[Figure: simulated dynamic behavior of processes P1, P2, P3, an interrupt service routine (ISR), channel C1 (c1.send(), c1.recv()), S1, and the bus (bus.recv()) over logical time t0 to t8. Unscheduled: processes advance time with waitfor(). Scheduled: P2 and P3 become tasks that advance time with time_wait(). Annotation: inaccuracy due to timing granularity.]

RTOS Model Implementation

• RTOS model
  – OS, task, event management
    • Descriptors & queues
  – Scheduling
    • Select and dispatch task based on algorithm
    • Block all but active task on SystemC level
  – Preemption
    • Allow rescheduling whenever simulation time increases
  – Event handling
    • Remove task temporarily from OS while waiting for SystemC event
• RTOS model library
  – RTOS models for different scheduling strategies
    • Round robin, priority based
  – Parametrizable
    • Task parameters (priorities)

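channel OS implements OSAPI {
    Task current = 0;   /* currently running task */
    os_queue rdyq;      /* ready queue */

    /* Select the next task and release it */
    void dispatch(void) {
        current = schedule();
        notify(current.event);
    }

    /* Give up the CPU and block until dispatched again */
    void yield() {
        Task task = current;
        dispatch();
        wait(task.event);
    }

    /* Model execution delay, then allow rescheduling (preemption point) */
    void time_wait(sim_time t) {
        waitfor(t);
        yield();
    }

    /* Before an SLDL wait: remove the task from the ready queue */
    Task pre_wait(void) {
        Task t = rdyq.get(current);
        dispatch();
        return t;
    }

    /* After an SLDL wait: make the task ready and wait to be dispatched */
    void post_wait(Task t) {
        rdyq.put(t);
        wait(t.event);
    }
};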
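The listing leaves schedule() abstract. A minimal C sketch of the two selection policies named for the RTOS model library (priority based and round robin), assuming tasks are kept in a descriptor array with a ready flag; TaskDesc and both functions are hypothetical, and larger numbers are assumed to mean higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor for a minimal RTOS-model sketch. */
typedef struct {
    int priority; /* larger value = higher priority (assumed convention) */
    bool ready;   /* true if the task is in the ready queue */
} TaskDesc;

/* Priority-based selection: the highest-priority ready task; ties go to the
   lowest index. Returns -1 if no task is ready (the model would idle). */
static int schedule_priority(const TaskDesc *t, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (t[i].ready && (best < 0 || t[i].priority > t[best].priority))
            best = (int)i;
    return best;
}

/* Round-robin selection: the next ready task after `current` (which may be
   -1 for none), wrapping around; `current` itself is picked last. */
static int schedule_round_robin(const TaskDesc *t, size_t n, int current) {
    for (size_t k = 1; k <= n; k++) {
        size_t i = (size_t)(current + (int)k) % n;
        if (t[i].ready)
            return (int)i;
    }
    return -1;
}
```

Either function could serve as the body of schedule(): dispatch() would then notify the event of the returned task, which is how the model keeps all but one task blocked at the simulator level.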

RTOS Model Interface

interface OSAPI
{
    /* OS management */
    void init();
    void start(int sched_alg);
    void interrupt_return();

    /* Task management */
    Task task_create(char *name, int type, sim_time period);
    void task_terminate();
    void task_sleep();
    void task_activate(Task t);
    void task_endcycle();
    void task_kill(Task t);
    Task par_start();
    void par_end(Task t);

    /* Event handling */
    Task pre_wait();
    void post_wait(Task t);

    /* Delay modeling */
    void time_wait(sim_time nsec);
};

• Canonical, target-independent API

Task Refinement

process task_B2(OSAPI os) {
    void main(void) {
        ...
        /* model execution delay */
        waitfor(BLOCK1_DELAY);
        ...
        send();
        /* model execution delay */
        waitfor(BLOCK2_DELAY);
        ...
    }
    void send() {
        wait(ack);
    }
};

• Convert processes into tasks
  – Task initialization
    • Register task with OS model:
        Task h;
        void task_B2(void) {
            h = os.task_create("B2", APERIODIC, 0);
        }
  – Task activation
    • Wait for task start trigger from OS: os.task_activate(h); at the start of main(), and os.task_terminate(h); at its end
  – Replace delay model
    • Trigger rescheduling in OS (preemption points): waitfor(BLOCK1_DELAY) becomes os.time_wait(BLOCK1_DELAY), and waitfor(BLOCK2_DELAY) becomes os.time_wait(BLOCK2_DELAY)
  – Communication and synchronization
    • Wrap around SLDL event handling: in send(), t = os.pre_wait(); before wait(ack); and os.post_wait(t); after it

Operating System Layer

• OS model
  – On top of standard SystemC
  – Wrap around SystemC primitives, replace event handling
    • Block all but active task
    • Select and dispatch tasks
  – Target-independent, canonical API
    • Task management
    • Channel communication
    • Timing and all events

[Figure: OS model inserted between the application tasks (Task P2, Task P3, containing processes P1, P2, P3 and channels C1, C2) and the SLDL.]

Hardware Abstraction Layer (HAL)

• External communication
  – Software drivers
    • Presentation, session, network communication layers
    • Synchronization (interrupts)
  – Hardware/software boundary
    • Low-level HW access
    • Bus drivers and interrupt handlers
    • Canonical HW/SW interface
  – External interface
    • Bus transactions (TLM)
    • Interrupt trigger

[Figure: HAL added below the OS model: application tasks (P1, P2, P3, channels C1, C2), drivers, user interrupt handlers UsrInt1 and UsrInt2, interrupt lines INTA to INTD, and the TLM bus.]

App.:
sample.send(v1);

Driver:
void send(…) {
    intr.receive();
    bus.masterWrite(0xA000, &tmp, len);
}

Hardware Layer

• Processor TLM
  – HW interrupt handling
    • Interrupt logic: suspend user code
    • Interrupt scheduling: priority, nesting
  – Peripherals
    • Interrupt controller
    • Timers
  – TLM bus model
    • Bus transactions

[Figure: timelines of tasks TB1 and TB2 and interrupt handler IntA (times t1, t2, t3), once for the HAL model and once for the hardware model; processor layer stack (HW, HAL, OS, App.) with hardware interrupt logic, interrupt handlers IntA to IntD, user interrupts UsrInt1 and UsrInt2, drivers, interrupt lines INTA to INTD, and the TLM bus.]
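A minimal C sketch of interrupt scheduling with priority and nesting, as the interrupt logic of a processor TLM might implement it: a pending bitmask per interrupt line and a stack of active handlers. The IntCtrl type, the four lines, and the convention that line 0 has the highest priority are hypothetical.

```c
#include <stdint.h>

#define NUM_IRQ 4 /* e.g. lines INTA..INTD; index 0 = highest priority (assumed) */

typedef struct {
    uint32_t pending;        /* bit i set: interrupt i requested, not yet served */
    int in_service[NUM_IRQ]; /* stack of nested handlers, innermost last */
    int depth;               /* current nesting depth */
} IntCtrl;

/* A peripheral raises interrupt line i. */
static void irq_raise(IntCtrl *c, int i) {
    c->pending |= 1u << i;
}

/* Highest-priority pending interrupt that may preempt the running code: any
   pending one while user code runs, otherwise only one of strictly higher
   priority than the innermost active handler. Returns -1 if none. */
static int irq_select(const IntCtrl *c) {
    int limit = c->depth > 0 ? c->in_service[c->depth - 1] : NUM_IRQ;
    for (int i = 0; i < limit; i++)
        if (c->pending & (1u << i))
            return i;
    return -1;
}

/* Suspend the running code (user task or lower-priority handler) and enter
   the selected handler; returns its line, or -1 if nothing is accepted. */
static int irq_accept(IntCtrl *c) {
    int i = irq_select(c);
    if (i >= 0) {
        c->pending &= ~(1u << i);
        c->in_service[c->depth++] = i;
    }
    return i;
}

/* The innermost handler finishes and the code it interrupted resumes.
   Returns the finished line, or -1 if no handler was active. */
static int irq_return(IntCtrl *c) {
    return c->depth > 0 ? c->in_service[--c->depth] : -1;
}
```

For example, once line 2 is being served, a request on line 3 has to wait, while a request on line 0 nests on top of it; after both handlers return, line 3 is accepted. Because each nested handler has strictly higher priority, the stack never grows beyond NUM_IRQ entries.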

Hardware Layer

• Bus-functional model (BFM)
  – Pin-accurate processor model
    • Timing-accurate bus and interrupt protocols
  – Bus model
    • Pin- and cycle-accurate
    • Driving and sampling of bus wires

[Figure: bus waveform with signals REQ, GRANT, CNTRL (nonseq., word), ADDR, WDATA, and READY, showing the values 0x27000000, 0xA000 0000, and 0x2F00 9801; processor layer stack (HW, HAL, OS, App.) with interrupt handlers IntA to IntD, user interrupts UsrInt1 and UsrInt2, drivers, interrupt lines INTA to INTD, and the pin-level bus protocol.]

Features

  Model     Features (each model also has the features of the models listed above it)
  Appl.     Target approx. computation timing
  OS        Task mapping, dynamic scheduling; task communication, synchronization
  HAL       Interrupt handlers, low-level SW drivers
  HW-TLM    HW interrupt handling, interrupt scheduling
  HW-BFM    Cycle-accurate communication
  BFM-ISS   Cycle-accurate computation

[Figure: processor model layer stacks for the models above (App., OS, HAL, HW), showing tasks (P1, P2, P3, C1, C2), the OS model, drivers, user interrupt handlers UsrIntr1 and UsrIntr2, interrupt handlers IntA to IntD, interrupt lines INTA to INTD, and the TLM bus.]

• Processor layers
  – Application
    • Native, host-compiled C
    • Annotated timing
  – OS
    • OS model
    • Middleware, drivers
  – HAL
    • Firmware
  – Processor hardware
    • Bus interfaces
    • Interrupt handling & suspension

Source: G. Schirner, A. Gerstlauer, R. Doemer. "Fast and Accurate Processor Models for Efficient MPSoC Design," TODAES, 2009.