Synthesis of Custom Processors based on Extensible Platforms

Upload: velma

Post on 08-Jan-2016

DESCRIPTION

Synthesis of Custom Processors based on Extensible Platforms. Fei Sun+, Srivaths Ravi++, Anand Raghunathan++ and Niraj K. Jha+ (+: Dept. of Electrical Engineering, Princeton University; ++: NEC Laboratories America, Inc.). PowerPoint presentation.

TRANSCRIPT

  • Synthesis of Custom Processors based on Extensible Platforms
    Fei Sun+, Srivaths Ravi++, Anand Raghunathan++ and Niraj K. Jha+
    +: Dept. of Electrical Engineering, Princeton University
    ++: NEC Laboratories America, Inc.

  • Outline
    - SoC design constraints
    - Background
    - Previous work in ASIP design
    - Xtensa platform
    - Manual custom instruction generation procedure
    - Automatic custom instruction generation flow
    - Experimental results
    - Conclusions

  • SoC Design Constraints
    - Time to market
    - Cost
    - Performance
    - Power
    - Cost-performance trade-off
    - Flexibility

  • Comparison of Different Approaches

    Criterion           ASIC  ASIP  GPP
    Time to market      --    +     ++
    Cost                ++    +     --
    Performance         ++    +     --
    Power               ++    +     --
    Cost-performance    ++    +     --
    Flexibility         --    +     ++

    ++ Very good   + Good   -- Very bad

  • Flexibility vs. Energy Efficiency

  • Previous Work in ASIP Design
    - ASIP architectures and overall design methodologies: [Huang, 1994], [Adams, 1996], [Fisher, 1999], [Kucukcakar, 1999]
    - Application-specific instruction set selection: [Choi, 1999], [Gschwind, 1999], [Arnold, 1999]
    - Low-power ASIP design: [Kalambur, 1997], [Dougherty, 1999], [Ishihara, 2000], [Sami, 2001]
    - Commercial offerings: Xtensa, ARCtangent, Jazz, SP-5flex, Carmel

  • Xtensa Architecture
    [Figure: Xtensa processor block diagram (source: www.tensilica.com). Blocks are marked as base ISA features, configurable functions, optional functions, configurable & optional functions, or extensible units. Blocks shown: instruction fetch and branch logic; align and decode; ALU & address generation; window register file; MAC 16; designer-defined instruction execution units; coprocessor register file and execution units; instruction memory or cache & tags; data memory or cache & tags; processor interface and write buffer; timers 1 to n; special function register access; instruction and data address watch 0 to n; exception support, interrupt control and memory protection unit; processor controls, TRACE port, JTAG TAP control and on-chip debug.]

  • Xtensa Processor Design Flow (source: www.tensilica.com)
    - Design inputs: processor configuration inputs (configuration file) and designer-defined instruction descriptions
    - Generator outputs: configured GNU C/C++ compiler; configured GNU assembler/disassembler; configured instruction set simulator/emulator; configured processor HDL; area, power and timing estimation
    - Use of the generated data: application source code and sample application data yield optimized software and optimized hardware

  • Manual Custom Instruction Generation Procedure
    - Identify potential new instructions
    - Describe custom instructions
    - Insert custom instructions
    - Verify functional correctness
    The designer must profile and read the source code, understand it, and rewrite it. The process is slow and error-prone.

  • Contributions of Our Work
    - Automatic custom instruction selection: from an application program to an extensible processor with custom instructions
    - Features:
      - Efficient design space search
      - Use of accurate information from the instruction set simulator and synthesis
      - Bridging the gap between automatically synthesized and manually designed architectures

  • Automatic Custom Instruction Generation Flow

    Input: application program (C)
    1. Profile C program (profiler: xt-gprof)
    2. Generate program dependence graphs (Aristotle analysis system)
    3. Rank control blocks
    4. Generate templates
    5. Select templates
    6-13. Generate individual custom instructions
    14-19. Select a custom instruction combination, generate it and synthesize it; if the clock period/area constraints are not met, move on to the next instruction combination; otherwise, profile the C program with the instruction combination and build the processor
    20. Synthesize processor

  • Example Illustration of Template Generation

    c = a & 0xff;       // node 1
    d = b & 0xff + c;   // node 2
    e = d

  • Example Illustration of Template Generation

    [Figure: dependence graph of the example code, with nodes 1 to 4 annotated with weights 0.03, 0.03, 0.03 and 0.06, over variables a to g]

  • Example Illustration of Template Generation

    [Figure: the example dependence graph (nodes 1 to 4, weights 0.03, 0.03, 0.03 and 0.06) with candidate templates grouped into basic templates, dependent templates and independent templates]
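The template classes in this example can be made concrete with a small enumeration. The C sketch below is an illustration under assumptions, not the authors' algorithm. It reads a basic template as a single node, a dependent template as a set of nodes linked by data dependences, and an independent template as a set of nodes with no dependence edge between them. The graph type Dfg and all function names are hypothetical. The exhaustive enumeration is exponential in the number of nodes, which is why the flow prunes candidates.

```c
#include <assert.h>

#define MAX_NODES 16

/* Hypothetical dependence graph: adj[i] is a bitmask of the nodes that
 * share a data dependence with node i (edges stored in both directions). */
typedef struct {
    int n;
    unsigned adj[MAX_NODES];
} Dfg;

typedef enum { BASIC, DEPENDENT, INDEPENDENT, MIXED } TemplateKind;

void add_edge(Dfg *g, int from, int to) {
    assert(from >= 0 && from < g->n && to >= 0 && to < g->n);
    g->adj[from] |= 1u << to;
    g->adj[to] |= 1u << from;
}

/* True if the nodes in `set` are connected using only edges inside `set`. */
static int is_connected(const Dfg *g, unsigned set) {
    unsigned seen = set & (~set + 1u);   /* start from the lowest node */
    unsigned frontier = seen;
    while (frontier) {
        unsigned next = 0;
        for (int i = 0; i < g->n; i++)
            if (frontier & (1u << i))
                next |= g->adj[i] & set;
        frontier = next & ~seen;         /* nodes reached for the first time */
        seen |= next;
    }
    return seen == set;
}

/* True if at least one dependence edge joins two nodes of `set`. */
static int has_internal_edge(const Dfg *g, unsigned set) {
    for (int i = 0; i < g->n; i++)
        if ((set & (1u << i)) && (g->adj[i] & set))
            return 1;
    return 0;
}

TemplateKind classify(const Dfg *g, unsigned set) {
    if ((set & (set - 1u)) == 0) return BASIC;     /* exactly one node */
    if (is_connected(g, set)) return DEPENDENT;
    if (!has_internal_edge(g, set)) return INDEPENDENT;
    return MIXED;                                  /* partly connected */
}

/* Classify every non-empty subset of nodes as a candidate template. */
void count_templates(const Dfg *g, int counts[4]) {
    assert(g->n > 0 && g->n <= MAX_NODES);
    for (int k = 0; k < 4; k++) counts[k] = 0;
    for (unsigned set = 1; set < (1u << g->n); set++)
        counts[classify(g, set)]++;
}
```

For example, take the edges node 1 to node 2 and node 2 to node 3 from the code, plus a hypothetical edge from node 2 to node 4. These are numbered 0 to 3 in the code. With them, count_templates reports 4 basic, 7 dependent and 4 independent templates.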

  • Key Observations for Pruning
    - The higher the weight of a template, the higher its potential for improvement (Amdahl's law).
    - The scope for optimization is determined by computation: the number of cycles needed to execute the template.
    - The scope for optimization is limited by the read/write ports: additional cycles are needed to read or write extra input/output variables.

  • Pruning Algorithm
    Ranking criterion, computed from:
    - OriginalTime: fraction of the original program's total execution time spent in the template (its weight)
    - In, Out: number of inputs and outputs of the template, respectively
    - the number of inputs/outputs that can be encoded in the instruction
    - the number of cycles needed to execute the template
    A higher priority means a greater potential for speedup.
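The exact ranking formula is not preserved in this transcript. The following C sketch therefore only illustrates how the listed quantities could combine into a priority that respects the key observations for pruning. The formula is a hypothetical stand-in, not the authors' criterion: weight divided by the computation cycles plus one extra cycle per operand that does not fit in the instruction encoding. TemplateInfo, extra_io_cycles and priority are illustrative names.

```c
#include <assert.h>

/* Quantities listed on the slide (field names are hypothetical). */
typedef struct {
    double original_time;   /* fraction of total run time spent in template */
    int in, out;            /* inputs/outputs of the template */
    int enc_in, enc_out;    /* inputs/outputs encodable in the instruction */
    int cycles;             /* cycles needed to execute the template */
} TemplateInfo;

static int excess(int needed, int available) {
    return needed > available ? needed - available : 0;
}

/* Assumption: each operand that does not fit in the instruction
 * encoding costs one extra cycle to move through a user register. */
int extra_io_cycles(const TemplateInfo *t) {
    return excess(t->in, t->enc_in) + excess(t->out, t->enc_out);
}

/* Higher weight raises the priority (Amdahl's law); more cycles, for
 * computation or for extra I/O, lower it. Not the paper's formula. */
double priority(const TemplateInfo *t) {
    assert(t->cycles >= 1);
    return t->original_time / (double)(t->cycles + extra_io_cycles(t));
}
```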

  • Template Generation with Pruning
    Panels: ranked pool of seed templates; template set
    Threshold: 0.1
    Priority values shown: 10.51, 7.92, 4.05, 2.13

  • Template Generation with Pruning
    Panels: ranked pool of seed templates; template set
    Threshold: 0.1; label: "Highest priority"
    Priority values shown: 12.73, 1.18, 16.35

  • Template Generation with Pruning
    Panels: ranked pool of seed templates; template set
    Threshold: 0.1; label: "Highest priority"
    Priority values shown: 12.73, 16.35

  • No. of Templates vs. Threshold Ratio
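One plausible reading of the "Threshold: 0.1" label is a ratio relative to the highest priority in the pool. Under that reading, a candidate survives only if its priority is at least 0.1 times the best one. The slides do not spell this out, so treat it as an assumption. Under it, the cut can be sketched in C as follows; prune_by_ratio is a hypothetical name. Raising the ratio keeps fewer templates, which is the trade-off a number-of-templates versus threshold-ratio plot would show.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: larger priorities first. */
static int by_priority_desc(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x < y) - (x > y);
}

/* Sorts `prio` in descending order and returns how many entries survive:
 * a template is kept only if prio >= ratio * (highest priority).
 * The relative-threshold rule is an assumption, not taken from the slides. */
size_t prune_by_ratio(double *prio, size_t n, double ratio) {
    assert(ratio >= 0.0 && ratio <= 1.0);
    if (n == 0) return 0;
    qsort(prio, n, sizeof prio[0], by_priority_desc);
    double cutoff = ratio * prio[0];
    size_t kept = 0;
    while (kept < n && prio[kept] >= cutoff)
        kept++;
    return kept;
}
```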

  • Automatic Custom Instruction Generation Flow

  • Automatic Custom Instruction Generation Flow (Contd.)

    Steps 6-13 (generate individual custom instructions), applied to the templates chosen in step 5 (select templates):
    - Extract the next template
    - Generate the custom instruction
    - Generate RTL Verilog (TIE compiler)
    - Synthesize the Verilog (Synopsys Design Compiler)
    - Clock period constraint met? If not, increase the number of cycles or increase the clock period, then retry
    - Insert the custom instruction
    - Profile the C program with the custom instruction
    - All templates built? If not, continue with the next template
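The clock-period check in steps 6-13 is a small iterative loop, sketched below in C under stated assumptions. The synthesis step is replaced by a hypothetical meets_clock function that divides the template's combinational delay evenly across its cycles. The loop tries more cycles first (up to max_cycles) and only then relaxes the clock period. In the real flow, the timing comes from Synopsys Design Compiler, and the order of the two remedies here is a design choice, not something the slides specify.

```c
#include <assert.h>

/* Outcome of fitting one custom instruction to the processor clock. */
typedef struct {
    int cycles;       /* cycles assigned to the instruction */
    double clock_ns;  /* clock period finally used, in ns */
} Fit;

/* Hypothetical stand-in for "synthesize Verilog, then check timing":
 * assumes the template's combinational delay splits evenly over its
 * cycles. A real flow would read the delay from the synthesis report. */
static int meets_clock(double delay_ns, int cycles, double clock_ns) {
    return delay_ns / cycles <= clock_ns;
}

/* Try more cycles first (up to max_cycles), then relax the clock. */
Fit fit_instruction(double delay_ns, double clock_ns, int max_cycles,
                    double clock_step_ns) {
    assert(delay_ns > 0.0 && clock_ns > 0.0);
    assert(max_cycles >= 1 && clock_step_ns > 0.0);
    Fit f = { 1, clock_ns };
    while (!meets_clock(delay_ns, f.cycles, f.clock_ns)) {
        if (f.cycles < max_cycles)
            f.cycles++;                   /* increase number of cycles */
        else
            f.clock_ns += clock_step_ns;  /* increase clock period */
    }
    return f;
}
```

For instance, a 9 ns template against a 4 ns clock settles at 3 cycles with the clock unchanged. If only 2 cycles are allowed against a 2 ns clock, the loop stretches the clock to 4.5 ns in 0.5 ns steps.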

  • Custom Instruction Insertion
    - Custom instructions must be inserted at appropriate places without affecting the program's functional correctness.
    - If a custom instruction needs extra inputs (outputs), appropriate variables must be selected to write to (read from) user-defined registers.

  • Example Illustration of Custom Instruction Insertion

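    [Figure: dependence graph of statements 1 to 5, and the same graph after statements 1, 2 and 5 are merged into one custom instruction (remaining nodes: 3, 4 and 1,2,5)]

    (a) Original code:
        t = s >> 24;   // 1
        r = t & 0xff;  // 2
        a[5] = t + d;  // 3
        m = b[0];      // 4
        y = x + m;     // 5

    (b) After inserting the custom instruction:
        m = b[0];               // 4
        y = CustomInstr(s, m);  // 1,2,5
        t = RUR(0);             // 1,2,5
        a[5] = t + d;           // 3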
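One way to check that the rewrite in (b) preserves the behavior of (a) is to model both versions in plain C. This is a sketch under assumptions. custom_instr and rur are hypothetical software stand-ins for CustomInstr and the RUR (read user register) operation. Because CustomInstr takes only s and m as inputs, the sketch also assumes that the x in statement 5 is the r produced by statement 2.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of one user-defined register (hypothetical). */
static uint32_t user_reg[1];

/* Model of CustomInstr for statements 1, 2 and 5. Assumes x in
 * statement 5 is r from statement 2. The second result, t, has no
 * slot in the instruction encoding, so it is left in user_reg[0]. */
uint32_t custom_instr(uint32_t s, uint32_t m) {
    uint32_t t = s >> 24;   /* 1 */
    uint32_t r = t & 0xff;  /* 2 */
    user_reg[0] = t;        /* extra output */
    return r + m;           /* 5 */
}

/* Model of RUR: read a user-defined register. */
uint32_t rur(int idx) {
    assert(idx == 0);
    return user_reg[idx];
}

/* (a) Original code, with x taken to be r. */
void original(uint32_t s, uint32_t d, const uint32_t *b,
              uint32_t *a, uint32_t *y) {
    uint32_t t = s >> 24;   /* 1 */
    uint32_t r = t & 0xff;  /* 2 */
    a[5] = t + d;           /* 3 */
    uint32_t m = b[0];      /* 4 */
    *y = r + m;             /* 5 */
}

/* (b) Code after inserting the custom instruction. */
void transformed(uint32_t s, uint32_t d, const uint32_t *b,
                 uint32_t *a, uint32_t *y) {
    uint32_t m = b[0];            /* 4 */
    *y = custom_instr(s, m);      /* 1,2,5 */
    uint32_t t = rur(0);          /* 1,2,5 */
    a[5] = t + d;                 /* 3 */
}
```

Calling original and transformed with the same inputs should yield the same y and a[5]. The extra output t travels through user_reg[0], which is exactly the variable choice the insertion step has to get right.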

  • Example Illustration of Custom Instruction Insertion (Contd.)(a) (b).... offset = t + 1; for (i=0; i
  • Automatic Custom Instruction Generation Flow

  • Custom Instruction Combination Selection --- Problem Statement
    Given a set of non-overlapping custom instructions, each with several versions, find one version for each instruction such that performance is maximized while the total area stays under a given threshold.

  • Custom Instruction Combination Selection --- Flow Chart

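    Recursive search, called once per instruction:
    1. Start. If all instructions have been analyzed, check whether the performance of the current solution is among the best; if so, update the best solutions. Stop.
    2. Otherwise, for each version of the current instruction:
       - Add the current version of the current instruction to the solution.
       - If the performance upper bound is not among the best, go to the next version.
       - If the area does not meet the constraint, go to the next version.
       - Otherwise, analyze the next instruction (recursive call).
    3. Stop when all versions have been considered.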

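The flowchart describes a depth-first branch-and-bound search, sketched below in C. The sketch simplifies in ways that are assumptions rather than details from the slides. It keeps only the single best solution instead of a list of the best. It treats each version's performance gain as an additive cycle saving. It computes the performance upper bound as the current savings plus the best possible savings of the remaining instructions. Problem, Version and select_combination are hypothetical names.

```c
#include <assert.h>

#define MAX_INSTRS 8
#define MAX_VERSIONS 4

/* One implementation version of a custom instruction (hypothetical units). */
typedef struct {
    int area;      /* area cost of this version */
    int savings;   /* cycles saved; assumed additive across instructions */
} Version;

typedef struct {
    int n_instrs;
    int n_versions[MAX_INSTRS];
    Version v[MAX_INSTRS][MAX_VERSIONS];
    int max_area;  /* area threshold */
} Problem;

typedef struct {
    int best_savings;                  /* -1 until a feasible solution is found */
    int best_choice[MAX_INSTRS];
    int choice[MAX_INSTRS];            /* version currently tried per instruction */
    int bound_from[MAX_INSTRS + 1];    /* max possible savings of instrs i.. */
} Search;

static void search(const Problem *p, Search *s, int i, int area, int savings) {
    if (i == p->n_instrs) {                         /* all instrs analyzed */
        if (savings > s->best_savings) {            /* better than the best? */
            s->best_savings = savings;
            for (int k = 0; k < p->n_instrs; k++)
                s->best_choice[k] = s->choice[k];
        }
        return;
    }
    for (int j = 0; j < p->n_versions[i]; j++) {    /* next version */
        int new_area = area + p->v[i][j].area;
        int new_savings = savings + p->v[i][j].savings;
        /* Upper bound: even the best remaining versions cannot win. */
        if (new_savings + s->bound_from[i + 1] <= s->best_savings)
            continue;
        if (new_area > p->max_area)                 /* area constraint */
            continue;
        s->choice[i] = j;
        search(p, s, i + 1, new_area, new_savings); /* next instruction */
    }
}

/* Returns the best total savings and fills best_choice,
 * or returns -1 if no combination fits the area threshold. */
int select_combination(const Problem *p, int best_choice[]) {
    assert(p->n_instrs >= 1 && p->n_instrs <= MAX_INSTRS);
    Search s = { .best_savings = -1 };
    for (int i = p->n_instrs - 1; i >= 0; i--) {
        assert(p->n_versions[i] >= 1 && p->n_versions[i] <= MAX_VERSIONS);
        int best = p->v[i][0].savings;
        for (int j = 1; j < p->n_versions[i]; j++)
            if (p->v[i][j].savings > best)
                best = p->v[i][j].savings;
        s.bound_from[i] = best + s.bound_from[i + 1];
    }
    search(p, &s, 0, 0, 0);
    if (s.best_savings >= 0)
        for (int k = 0; k < p->n_instrs; k++)
            best_choice[k] = s.best_choice[k];
    return s.best_savings;
}
```

Pruning on the upper bound and on area skips whole subtrees. In the worst case, however, the search remains exponential in the number of instructions.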

  • Automatic Custom Instruction Generation Flow

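  • Experimental Methodology
    - Automatic custom instruction generation from the C program, using Aristotle, the Xtensa TIE compiler, Synopsys Design Compiler and the Xtensa GNU profiler
    - Processor generation (Tensilica processor generator) and synthesis (Synopsys Design Compiler), reporting area and clock period
    - Compilation of the modified C program (cross compiler) and simulation (ISS), reporting execution cycles
    - Power estimation with Sente WattWatcher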

  • Experimental Results (Contd.)
    Average:
    - Performance improvement: 3.4X
    - Energy reduction: 3.2X
    - Energy-delay product reduction: 12.6X
    - Area increase: 1.8%
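The average energy-delay reduction is not simply the product of the other two averages (3.4 × 3.2 ≈ 10.9, not 12.6). For a single benchmark, delay is execution time, so the energy-delay reduction factors exactly into the energy reduction times the speedup. Across benchmarks, however, an arithmetic mean of products need not equal the product of the means. The slide does not say how the averages were taken. The following consistency check therefore assumes per-benchmark ratios averaged arithmetically, with s_k the speedup and e_k the energy reduction of benchmark k out of N:

```latex
\frac{E_{\mathrm{orig}}\,D_{\mathrm{orig}}}{E_{\mathrm{new}}\,D_{\mathrm{new}}}
  = \frac{E_{\mathrm{orig}}}{E_{\mathrm{new}}}\cdot\frac{D_{\mathrm{orig}}}{D_{\mathrm{new}}}
  = e_k\, s_k,
\qquad
\frac{1}{N}\sum_{k=1}^{N} s_k e_k
  \;\neq\;
  \Bigl(\frac{1}{N}\sum_{k=1}^{N} s_k\Bigr)\Bigl(\frac{1}{N}\sum_{k=1}^{N} e_k\Bigr)
  \quad \text{in general}
```

A geometric mean, by contrast, would make the two agree exactly, because the geometric mean of a product equals the product of the geometric means.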

  • Conclusions
    - Automatic custom instruction synthesis for ASIPs:
      - Template generation/selection
      - Custom instruction insertion
      - Custom instruction combination selection
    - Experimental results:
      - 3.4X average performance improvement
      - 12.6X average energy-delay product reduction