Optimizing SystemC for higher speed and coverage Dogan Fennibay

Post on 23-Feb-2016

Page 1: Optimizing SystemC for higher speed and coverage

Optimizing SystemC for higher speed and coverage

Dogan Fennibay

Page 2

Why?
SystemC is becoming the de facto system-level design language.
SystemC emulates parallelism via scheduling.
  Scheduling is an additional element affecting the result.
  We must make up for this hole in coverage.
We want faster simulations.
  To run more executions, or cheaper ones.
  SystemC's flexibility adds up to slowness.

Page 3

Outline
Automatic generation of schedulings for higher coverage
  Introduction, Related work, Definitions, Algorithms, Evaluation
Scoot
  Introduction, Related work, Idea, Evaluation
Conclusion

Page 4

SystemC
Do you know SystemC? No / Yes

Page 5

Introduction
3 different schedulings => 3 different results:
  a; b; a; te; b; a => "Ok"
  a; b; a; te; a; b => "Ko"
  b; a; te; b => deadlock (lost notification)
Including process C: 30 different schedulings => the same 3 different results (equivalence classes).

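void top::A() {
  wait(e);                // block until event e is notified
  wait(20, SC_NS);
  if (x) cout << "Ok\n";  // x is shared with B
  else   cout << "Ko\n";
}

void top::B() {
  e.notify();             // immediate: lost if A is not already waiting on e
  x = 0;
  wait(20, SC_NS);
  x = 1;
}

void top::C() {           // independent process: only consumes time
  sc_time T(20, SC_NS);
  wait(T);
}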

Page 6

Introduction
Scheduling also affects the results, not just the input data.
  We have to test all possible schedulings.
Impossible schedulings (ruled out by synchronization constraints): do not test them.
Equivalent schedulings (e.g. differing only in the order of two reads from a shared variable): test only one.
The focus is on scheduling; input data generation is not considered.

Page 7

Introduction: Dynamic Dependency Graph

void top::P() {
  wait(e);
  wait(20, SC_NS);
  if (x) cout << "Ok\n";
  else   cout << "Ko\n";
}

void top::Q() {
  e.notify();
  x = 0;
  wait(20, SC_NS);
  x = 1;
}

Figure (dynamic dependency graph of P and Q): green = non-permutable, red = non-commutative

Page 8

Related work
Formal model: extract a formal model from the SystemC model and combine it with a formal model of the non-deterministic scheduler => model checking. Drawback: state space explosion.
Partial order reduction: used by model checkers, but no non-abstract uses; the dynamic extension is new.
Test case generation & output checker: assertion-based verification is promising.

Page 9

Definitions
SUTD: System Under Test + one set of test data.
Assumption: an independent test case generator; the generator is always independent of the scheduling.
Process: event or thread; p, q, r, …
Transition: one execution of a process in a scheduling; a, b, c, … or p1, p2, q1, r1, p3, r2, …
Scheduling: a string of transitions and new cycles (delta or te).
Full state: a full memory dump, including the PCs of the processes.

p:  a = x; wait(e);               // p1
    printf("%d\n", a); wait(e2);  // p2
    a = x * 2;                    // p3

q:  e.notify(); b = x; wait(20, SC_NS);  // q1
    x = b * 2;                           // q2

Page 10

Definitions
Permutation: modify a scheduling by changing the order of a and b; other transitions may come in between.
Equivalence: two different schedulings lead to the same full state.
  Example: p: a = x;  q: b = x;  (either order gives the same full state)

Page 11

Definitions
Permutability: a and b in a scheduling can be exchanged:
  an equivalent scheduling with a and b consecutive is available, and
  a and b can be exchanged in this equivalent scheduling.
Commutativity: which permutations are useful?
  Exchanging a and b produces an equivalent scheduling.
  Non-commutative permutations are the interesting ones.

void t1() { … wait(e1); v2 = 2; }
void t2() { … v2 = 1; e1.notify(); wait(); }
void t3() { … printf("%d\n", v1); wait(); }
void t4() { … printf("%d\n", v1 + 1); wait(); }

Variant: t4 with ++v1 instead of v1 + 1 (as on the next page)

Page 12

Definitions
Dependency
  Boolean: $\mathrm{dep}(a,b) = \overline{\mathrm{perm}(a,b)} + \mathrm{perm}(a,b)\cdot\overline{\mathrm{comm}(a,b)}$ (overline = negation)
  a must come before b; otherwise (1) the scheduling is impossible (not permutable) or (2) a different result is produced (permutable but not commutative).
Causal order: the order of permutable transitions with respect to dependency. Equivalent schedulings have the same causal order.

void t1() { … wait(e1); v2 = 2; }
void t2() { … v2 = 1; e1.notify(); wait(); }
void t3() { … printf("%d\n", v1); wait(); }
void t4() { … printf("%d\n", ++v1); wait(); }
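The Boolean definition of dependency above uses prime for negation, + for OR and · for AND, and it simplifies by distributivity. Write p for "a and b are permutable" and c for "a and b are commutative":

$$
\bar{p} + p\,\bar{c} = (\bar{p} + p)(\bar{p} + \bar{c}) = \bar{p} + \bar{c}
$$

So a and b are independent, that is, not dependent, exactly when $p\,c$ holds: they are both permutable and commutative.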

Page 13

Algorithms: Computing commutativity
Non-commutative actions:
  Shared variables:
    read, then modifying write
    modifying write, then read
    write, then modifying write
  Events:
    notification, then wait
    wait, then notification
    caught notification, then notification
All other actions do not harm commutativity.

Page 14

Algorithms: Causal Partial Order (CPO)
Computed step by step, starting with the empty scheduling.
Choose candidates a, b where:
  a or b is a new cycle (delta or te),
  a and b are from the same process, or
  b is woken up by a.
Extend the CPO set:
  add (a, b),
  add the non-commutative transitions of b,
  compute and add the transitive closure of the calculated relations.

Page 15

Algorithms: Generating schedulings
Generating one alternative scheduling:
  choose two non-commutative transitions a and b;
  execute the scheduling until a;
  execute additional transitions not causally ordered to a;
  execute b, then a;
  execute the rest.
Generating all schedulings

Page 16

Evaluation: prototype
Model and kernel instrumented.
Checker: get the scheduling, generate a new one, feed it to the patched kernel,
  until no more schedulings are available.

Page 17

Evaluation: experiments
Is the overhead of calculating schedulings worth it? V·T vs. G·T + O
3 examples:
  Indexer: small, V calculable
  MPEG decoder: 50 KLOC, 4 processes
  Full SoC: 250 KLOC, 57 processes
Indexer: 128-element array for a hash table; n components, each with 2 threads, each thread writing 4 elements.
  G << V: n = 2, V = 3.35e11; n = 3, V = 2.43e25

Page 18

Evaluation: experiments
MPEG decoder:
  Overhead is insignificant: G·T = 50 s, O = 18 s.
  Special structures in the code are not recognized, e.g. persistent events.
Complete SoC: scalability not fully tested because of manual instrumentation.
  Expectation: OK up to 200 transitions.
  Observation: more detailed models produce more constrained schedulings => longer schedulings are testable.

Page 19

Scoot
Helmstetter et al. explore all schedulings: too much time spent.
Let's go in the opposite direction: make SystemC less flexible to get it faster (Blanc et al.).

Page 20

Introduction
Faster execution (up to 5 times!)
Use verification back-ends (CBMC, SATABS) => get a plain C++ model from SystemC => use a C++ front end to support more language constructs.

Page 21

Related work
Work on HW synthesis via model extraction:
  Kostaras & Vergos and Castillo et al.: only for a small subset of C++
  Savoiu et al.: speedup via Petri-net reductions, only 1.5 times
  Pérez et al.: static scheduling, only event processes considered

Page 22

Idea
SystemC is very flexible:
  dynamic run-time binding of ports via polymorphism
  sensitivity lists
  module hierarchy
  => consolidate the hierarchy
Scheduler's inefficiencies:
  run-time memory allocations
  processes triggered via function pointers
  => convert to a static schedule

Page 23

Evaluation
AES encryption/decryption: speedup of up to 5.3 times achieved.

Page 24

Conclusion
Helmstetter et al.:
  eliminated the effect of scheduling
  at a reasonable overhead
  problems with scalability
Scoot:
  significant speedup achieved
  most structures of C++ supported
  preparation for model checking
Further discussion
  Why equivalence classes among schedulings? Shouldn't all schedulings produce the same result?
  Why not use Helmstetter's algorithm for regular software to catch races?
  Uses for Scoot?

Page 25

Extras

Page 26

SystemC Primer
A system-level design language
Used for HW/SW codesign
Based on native C++
Different abstraction levels: TLM to RTL

#include <systemc.h>

SC_MODULE(nand2) {
  sc_in<bool>  A, B;
  sc_out<bool> F;

  void do_nand2()  // a C++ function
  {
    F.write(!(A.read() && B.read()));
  }

  SC_CTOR(nand2) {
    SC_METHOD(do_nand2);  // register do_nand2 as a method process
    sensitive << A << B;  // rerun it whenever A or B changes
  }
};

Page 27

SystemC Primer: Concepts
Modules: containers for other SystemC elements, incl. modules
Channels: communication means of modules
Ports: connection points of modules to channels
Interfaces: connection points of channels to modules
Method processes: non-blocking code parts triggered on events they're sensitive to
Thread processes: independent flows of execution; may call wait
Events: basic means of synchronization
Shared variables: same as C++

Page 28

SystemC Primer: Scheduler
Non-deterministic specification: unspecified order
Non-preemptive
Delta cycles used to imitate concurrency
  True parallelism on the real system

Page 29

Properties of relations
Reflexivity: aRa
Symmetry: aRb => bRa, e.g. "is equal to"
Totality: aRb or bRa
Transitivity: aRb and bRc => aRc, e.g. "is ancestor of"
Transitive closure: e.g. all ancestors in a community

Page 30

References

Blanc, N., Kroening, D. and Sharygina, N., 2008, “Scoot: A Tool for the Analysis of SystemC Models”, TACAS, 2008, 467-470.

Helmstetter, C., Maraninchi, F., Maillet-Contoz, L. and Moy, M., 2006, “Automatic Generation of Schedulings for Improving the Test Coverage of Systems-on-a-Chip”, Verimag Research Report, TR-2006-06.

Helmstetter, C., Maraninchi, F. and Maillet-Contoz, L., 2007, “Test Coverage for Loose Timing Annotations”, Formal Methods: Applications and Technology, 4346/2007, 100-115.