ctos coding tips

Upload: d-jah

Post on 19-Oct-2015

DESCRIPTION

From Cadence

TRANSCRIPT

  • #

    Jan 31, 2011

    Coding Tips for High Quality of Results

    Module 6

    01/31/2011 SystemC Synthesis using C-to-Silicon Compiler 6-2

    Module Objective

    Your objective:

    To code your design for optimal Quality of Results

    Topics:

    Hardcoding compiler optimizations

    Controlling expression size and dynamics

    Facilitating scheduler optimizations

    Current C-to-Silicon known problems and solutions

    Miscellaneous issues

  • #

    General Compiler Optimizations

    Compilers automatically do some of these optimizations:

    Move loop-invariant code out of loop statements

    Reduce operation strength

    You can guarantee these optimizations by coding them yourself!

    The tool may perform such optimizations.

    Move Invariant Expressions out of the Loop

    Move loop-invariant calculations to before or after the loop.

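    Invariant inside the loop (schedules unnecessary operations):

    for (i=0; i<...; i++) {
      max = a > b ? a : b;
      c[i] = max * b;
    }

    Invariant moved before the loop:

    max = a > b ? a : b;
    for (i=0; i<...; i++) {
      c[i] = max * b;
    }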
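As a plain C++ sketch of the same idea (no SystemC required), the two functions below compute the same array, and the second evaluates the loop-invariant maximum only once. The bound `N`, the function names, and the `std::array` return type are assumptions made for illustration; the slide's own loop bound did not survive in this transcript.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t N = 8;  // hypothetical loop bound (not from the slide)

// Invariant inside the loop: the comparison is re-evaluated every iteration.
std::array<int, N> scale_in_loop(int a, int b) {
    std::array<int, N> c{};
    for (std::size_t i = 0; i < N; ++i) {
        int max = a > b ? a : b;  // does not depend on i
        c[i] = max * b;
    }
    return c;
}

// Invariant hoisted: the comparison is evaluated once, before the loop.
std::array<int, N> scale_hoisted(int a, int b) {
    std::array<int, N> c{};
    const int max = a > b ? a : b;
    for (std::size_t i = 0; i < N; ++i) {
        c[i] = max * b;
    }
    return c;
}
```

Both versions return identical results; only the amount of work per iteration differs.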

  • #

    Reduce Operation Strength

    Convert multiplication and division operations to shift operations to the extent practical.

    Before (synthesis infers at least 6-bit ops):

    a = b * 48;
    c = b / 48;

    Reduce strength. After (operation strength reduced):

    a = (b << 5) + (b << 4);
    c = (b >> 4) / 3;
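The arithmetic behind this rewrite can be checked in plain C++. The sketch below assumes unsigned operands, for which 48 = 32 + 16 gives the multiplication and 48 = 16 * 3 gives the division; for negative signed values a right shift and a division round differently, so the equivalence would not carry over unchanged. The function names are illustrative only.

```cpp
#include <cstdint>

// b * 48 == b * 32 + b * 16 (unsigned arithmetic wraps identically on both sides).
constexpr std::uint32_t mul48(std::uint32_t b) {
    return (b << 5) + (b << 4);
}

// b / 48 == (b / 16) / 3 for unsigned b; only a divide-by-3 remains.
constexpr std::uint32_t div48(std::uint32_t b) {
    return (b >> 4) / 3;
}

static_assert(mul48(7) == 7u * 48u, "shift-and-add matches multiplication");
static_assert(div48(1000) == 1000u / 48u, "shift-then-divide matches division");
```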

    Controlling Expression Size and Dynamics

    Synthesis tools automatically do some of these optimizations:

    Explicitly specify constant expressions

    Explicitly size state variables

    Explicitly size expressions

    Control variable dynamics

    Pad array inner dimensions to powers of 2

    You can guarantee these optimizations by coding them yourself!

    The tool may perform such optimizations.

  • #

    Explicitly Specify Constants

    Synthesis cannot always statically determine your design intent.

    Explicitly declare constants to clarify your design intent

    This code infers a barrel shifter:

    sc_int<...> a, b;
    sc_int<...> c;
    c = 10;
    ...
    a = b >> c;

    Use a constant. This code infers a constant shift (wires):

    sc_int<...> a, b;
    const sc_int<...> c = 10;
    ...
    a = b >> c;

    Explicitly Size State Variables

    Synthesis cannot always statically determine your design intent.

    Explicitly size state variables to clarify your design intent

    Synthesis infers 32-bit counter:

    int counter = 0;
    ...
    counter++;
    if (counter == 25)
      counter = 0;

    Size the variable. Synthesis infers 5-bit counter:

    sc_uint<5> counter = 0;
    ...
    counter++;
    if (counter == 25)
      counter = 0;
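To see why 5 bits are enough for this counter, the following plain C++ sketch emulates a 5-bit register by masking a `std::uint8_t` after each increment. It is not SystemC; the names `next_count` and `run_counter` are hypothetical helpers introduced for the illustration.

```cpp
#include <cstdint>

constexpr unsigned kCounterBits = 5;                       // 2^5 = 32 > 25
constexpr std::uint8_t kMask = (1u << kCounterBits) - 1u;  // 0x1F

// Advance a counter that wraps to 0 on reaching 25, keeping only 5 bits.
constexpr std::uint8_t next_count(std::uint8_t counter) {
    return static_cast<std::uint8_t>(
        ((counter + 1u) & kMask) == 25u ? 0u : ((counter + 1u) & kMask));
}

// Run the counter for a number of cycles, starting from 0.
std::uint8_t run_counter(unsigned cycles) {
    std::uint8_t counter = 0;
    for (unsigned i = 0; i < cycles; ++i) {
        counter = next_count(counter);
    }
    return counter;
}
```

The stored value never exceeds 24, so the upper 27 bits of a 32-bit `int` counter would carry no information.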

  • #

    Explicitly Size Expressions

    Synthesis cannot always statically determine your design intent.

    Explicitly size expressions to clarify your design intent

    Synthesis infers 64-bit comparator (for the literal, synthesis assumes maximum width, i.e. long long (1LL)):

    sc_uint<4> a, b;
    ...
    if ((a-1) > b)
    ...

    Size the expression. Synthesis infers 4-bit comparator:

    sc_uint<4> a, b;
    ...
    if ((a-sc_uint<4>(1)) > b)
    ...

    Explicitly size expressions only when needed, and be very careful not to induce errors!
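The warning about inducing errors can be made concrete with a plain C++ emulation. The sketch below models the wide comparison with `std::uint64_t` and the sized comparison by masking to 4 bits; it is an assumption-laden stand-in, not SystemC, and makes no claim about how `sc_uint` or the tool actually promote operands. The function names are hypothetical.

```cpp
#include <cstdint>

constexpr std::uint64_t kMask4 = 0xF;  // keeps the low 4 bits

// Wide evaluation: the subtraction is carried out in 64 bits.
constexpr bool greater_wide(std::uint64_t a, std::uint64_t b) {
    return (a - 1) > b;
}

// Sized evaluation: the subtraction result is truncated to 4 bits first.
constexpr bool greater_4bit(std::uint64_t a, std::uint64_t b) {
    return ((a - 1) & kMask4) > (b & kMask4);
}

static_assert(greater_wide(5, 3) && greater_4bit(5, 3), "typical values agree");
static_assert(greater_wide(0, 15) && !greater_4bit(0, 15), "corner case differs");
```

For `a == 0` the subtraction wraps: the wide result is far larger than any 4-bit `b`, while the 4-bit result is 15, which is not greater than `b == 15`. Sizing the expression therefore changes the outcome in that corner case.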

    Control Variable Dynamics

    Synthesis cannot always statically determine your design intent.

    Explicitly control variable dynamics to clarify your design intent

    32-bit variable shift. 16-bit variable shift.

    sc_in<...> valid_in;
    sc_in<...> word_in;
    ...
    unsigned shift(unsigned data)
    {
      while (!valid_in) wait();
      sc_uint<...> word = word_in;
      return data ...
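The slide's own shift example is cut off in this transcript, so the following is only one possible reading of "controlling variable dynamics", written in plain C++ rather than SystemC: narrowing the range of the shift amount so that fewer shift positions have to be supported. The function names and the choice of masks are assumptions made for the illustration.

```cpp
#include <cstdint>

// Shift amount may be anything from 0 to 31.
constexpr std::uint32_t shift_full(std::uint32_t data, std::uint32_t word) {
    return data << (word & 0x1F);
}

// Shift amount explicitly limited to 0..15 (a 4-bit quantity).
constexpr std::uint32_t shift_limited(std::uint32_t data, std::uint32_t word) {
    return data << (word & 0xF);
}

static_assert(shift_full(1, 19) == (1u << 19), "full range keeps all 5 bits");
static_assert(shift_limited(1, 19) == (1u << 3), "limited range keeps 4 bits");
```

If the design only ever shifts by 0 to 15, stating that range in the code is what lets a tool avoid building the wider shifter.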

  • #

    Pad Array Inner Dimensions to Powers of 2

    Simplify address calculation: concatenate instead of multiply and add.

    If mapped to registers, unused registers are removed

    If mapped to RAM, unused RAM may remain

    Multiply and add: i*9+j

    Concatenate: { i[1:0], j[3:0] }

    int A[3][9]; ...

    for (int i=0; i
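The two address calculations named on this slide can be compared in plain C++. The sketch assumes the 3 x 9 array from the slide and an inner dimension padded to 16, which is what the 4-bit field `j[3:0]` implies; the constant and function names are illustrative.

```cpp
#include <cstddef>

constexpr std::size_t kRows = 3;
constexpr std::size_t kCols = 9;         // inner dimension as declared
constexpr std::size_t kPaddedCols = 16;  // inner dimension padded to a power of 2

// Unpadded layout: the flat address needs a multiply and an add.
constexpr std::size_t index_unpadded(std::size_t i, std::size_t j) {
    return i * kCols + j;
}

// Padded layout: the flat address is the concatenation { i[1:0], j[3:0] }.
constexpr std::size_t index_padded(std::size_t i, std::size_t j) {
    return (i << 4) | j;
}

static_assert(index_unpadded(2, 8) == 26, "last element, unpadded");
static_assert(index_padded(2, 8) == 40, "last element, padded");
static_assert(kRows * kCols == 27 && kRows * kPaddedCols == 48,
              "padding trades 21 unused locations for simpler addressing");
```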

  • #

    Code an Optimal Control Flow

    Eliminate code not reachable in the target operating environment

    Caused by input value constraints synthesis does not know about

    Simplify and compact consecutive or nested if conditions to reduce the number of multiplexors:

    if (cond) do_this();
    if (!cond) do_that();

    becomes

    if (cond)
      do_this();
    else
      do_that();

    Rewrite a cascaded if-else-if statement (priority implementation) as a switch statement (parallel implementation) where applicable:

    if (value==0) do_this();
    else if (value==1) do_that();
    else do_other();

    becomes

    switch (value) {
      case 0: do_this(); break;
      case 1: do_that(); break;
      default: do_other(); break;
    }

    Provide Realistic Timing Constraints

    Do not overconstrain the clock!

    An overconstrained clock unnecessarily increases area and timing

    Can prevent resource sharing that otherwise would occur

    Can prevent operator rescheduling to a less-utilized pipeline state

    If the operator delay exceeds the clock cycle

    The tool will not move the operator

    Potentially leaving it bundled with other ops

    (Figure: clock constraint compared with the realized clock. Callout: constrain latency and ops, not clock.)

  • #

    Fully Describe a Datapath in One Thread

    The scheduler cannot share resources between threads (or by extension, modules).

    Group datapath operations into as few threads as practical

    (cannot always group operations executing at different throughputs)

    Operations that can be grouped:

    void my_module::proc1() {
      wait();
      for (;;) {
        if (cond)
          ya = a1 + a2;
        wait();
      }
    }

    void my_module::proc2() {
      wait();
      for (;;) {
        if (!cond)
          yb = b1 + b2;
        wait();
      }
    }

    Reduce number of threads. Operations grouped into one thread:

    void my_module::proc() {
      wait();
      for (;;) {
        if (cond)
          ya = a1 + a2;
        else
          yb = b1 + b2;
        wait();
      }
    }

    Pass Function Arguments by Value

    Pass-by-Pointer (accepted): may produce suboptimal timing.

    int func(int *in,
             int *out);

    Pass-by-Reference (better): may produce suboptimal area.

    int func(int &in,
             int &out);

    Pass-by-Value (best): most aggressive optimization.

    int func(int in);
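A function with several outputs does not have to fall back on pointers: in ordinary C++ it can return a small struct by value. The sketch below contrasts the two styles; `SumDiff`, `sum_diff_ptr`, and `sum_diff_val` are hypothetical names, and whether a given synthesis tool handles struct returns as well as scalar returns is an assumption to verify against its documentation.

```cpp
// Pass-by-pointer style: results are written through the output pointers.
void sum_diff_ptr(const int *a, const int *b, int *sum, int *diff) {
    *sum = *a + *b;
    *diff = *a - *b;
}

// Pass-by-value style: inputs are copied in, results are copied out.
struct SumDiff {
    int sum;
    int diff;
};

SumDiff sum_diff_val(int a, int b) {
    return SumDiff{a + b, a - b};
}
```

With the by-value form, nothing outside the function can alias its inputs or outputs, which is the property that leaves the most room for optimization.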

  • #

    Move Local Write/Read Arrays to Module Body

    Synthesis initializes non-const function-local variables in accordance with C++ semantics

    Synthesis must schedule initialization of non-const local arrays mapped to RAM

    Local array mapped to RAM (initialized):

    SC_MODULE (my_module) {
      ...
    private:
      ...
    };
    void my_module::foo() {
      int array[100]={};
      ...
    }

    Make array member. Member array mapped to RAM (not initialized):

    SC_MODULE (my_module) {
      ...
    private:
      int array[100]={};
    };
    void my_module::foo() {
      ...
      ...
    }

    Separate I/O and Computation to Facilitate Scheduling

    Synthesis must schedule I/O operations in the cycle where coded.

    Separate I/O and computation to allow scheduling flexibility

    I/O and computation in one cycle:

    while (true) {
      ...
      wait();
      ...
      result.write( subtract.read()
                    ? a - b
                    : a + b );
    }

    Generally separate out I/O ops. Flexible scheduling:

    while (true) {
      bool sub = subtract.read();
      wait();
      ...
      result.write( sub
                    ? a - b
                    : a + b );
    }

  • #

    Combine I/O and Computation to Facilitate Sharing

    Combining I/O and computation in one cycle can reduce resources.

    Opcode registers are not shared. ALU is shared.

    while (true) {
      op1 = opcode1.read();
      op2 = opcode2.read();
      ...
      opN = opcodeN.read();
      result1 = ALU(data,op1);
      result2 = ALU(data,op2);
      ...
      resultN = ALU(data,opN);
      wait(N);
    }

    Opcode register is shared (muxed). ALU is shared.

    while (true) {
      op1 = opcode1.read();
      result1 = ALU(data,op1);
      wait();
      op2 = opcode2.read();
      result2 = ALU(data,op2);
      wait();
      ...
      opN = opcodeN.read();
      resultN = ALU(data,opN);
      wait();
    }

    Forcing Signal Semantics Suppresses Register Sharing

    Prohibiting resource sharing can improve timing by removing multiplexors.

    Not recommended style but sometimes can be useful

    Assume each ALU operation fully utilizes the clock cycle

    Register shared between cycles:

      int result1;
      int result2;
    };
    int module::func(int data_in) {
      result1=ALU(data_in,opcode1);
      wait();
      result2=ALU(result1,opcode2);
      wait();
      return ALU(result2,opcode3);
    }

    Registers not shared between cycles:

      sc_signal<int> result1;
      sc_signal<int> result2;
    };
    int module::func(int data_in) {
      result1=ALU(data_in,opcode1);
      wait();
      result2=ALU(result1,opcode2);
      wait();
      return ALU(result2,opcode3);
    }

  • #

    Known Problems and Solutions

    Tips and current limitations specific to the C-to-Silicon Compiler:

    Declare large classes as SystemC modules

    Limit each pointer to maximum of 16 objects

    Declare Large Classes as SystemC Modules

    The C-to-Silicon Compiler handles modules more efficiently than arbitrary classes.

    Converting arbitrary classes to modules may solve a capacity problem.

    Potential capacity problem:

    class mpeg_decoder { // A really big class
      ...
    };
    SC_MODULE(my_module) {
      ...
    private:
      mpeg_decoder my_decoder;
    };

    Make it a module. May resolve capacity problem:

    SC_MODULE (mpeg_decoder) { // A really big class
      ...
    };
    SC_MODULE(my_module) {
      ...
      SC_CTOR(my_module)
      : my_decoder("my_decoder")
      {...}
    private:
      mpeg_decoder my_decoder;
    };

  • #

    Limit Each Pointer to Maximum of 16 Objects

    The C-to-Silicon Compiler tracks up to 16 objects that a pointer can point to.

    You can assign the pointer any number of times, provided the addresses belong to no more than 16 distinct objects.

    Cannot use 1 pointer for 20 objects:

    SC_MODULE(...) {
      ...
    private:
      int buf00[32];
      int buf01[32];
      ...
      int buf19[32];
      ...
      int *ptr00to19;
    };

    Maximum of 16 objects. Use 1 pointer for maximum 16 objects:

    SC_MODULE(...) {
      ...
    private:
      int buf00[32];
      int buf01[32];
      ...
      int buf19[32];
      ...
      int *ptr00to15;
      int *ptr16to19;
    };

    Coding for High QoR Quiz

    1. Explain how explicitly sizing expressions might cause problems.

    2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.

  • #

    Coding for High QoR Quiz Solution

    1. Explain how explicitly sizing expressions might cause problems.

    While explicitly sizing expressions, you can very easily and inadvertently lose the more significant result bits of operations such as addition and multiplication.

    2. Suggest a reason why synthesis might not be able to remove code representing functionality that your device will never use.

    Synthesis cannot be aware of how the target environment might restrict the value ranges of data inputs and combinations of control inputs, and thus sometimes cannot strip design functionality that will never be used.
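The first quiz answer can be illustrated with a few lines of plain C++. The helper `keep_low_bits` is a hypothetical stand-in for sizing a result to a given width (it assumes a width below 32 bits); it is not part of SystemC or of the C-to-Silicon Compiler.

```cpp
#include <cstdint>

// Keep only the low `bits` bits of a result, as a fixed-width variable would.
// Assumes bits < 32.
constexpr std::uint32_t keep_low_bits(std::uint32_t value, unsigned bits) {
    return value & ((1u << bits) - 1u);
}

// Adding two 4-bit values can need 5 bits; sizing the sum to 4 drops the carry.
static_assert(keep_low_bits(9u + 8u, 4) == 1u, "carry lost in 4 bits");
static_assert(keep_low_bits(9u + 8u, 5) == 17u, "5 bits hold the full sum");

// Multiplying two 4-bit values can need 8 bits.
static_assert(keep_low_bits(5u * 5u, 4) == 9u, "upper product bits lost");
static_assert(keep_low_bits(5u * 5u, 8) == 25u, "8 bits hold the full product");
```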