design automation methodologies for extensible processors platform

54
[email protected] Design Automation Design Automation Methodologies for Methodologies for Extensible Processors Extensible Processors Platform Platform Newton Cheung Newton Cheung School of Computer Science & Engineering School of Computer Science & Engineering The University of New South Wales The University of New South Wales Sydney, Australia Sydney, Australia

Upload: noura

Post on 12-Jan-2016

46 views

Category:

Documents


4 download

DESCRIPTION

Design Automation Methodologies for Extensible Processors Platform. Newton Cheung School of Computer Science & Engineering The University of New South Wales Sydney, Australia. Outline. System-On-Chips Design Challenges Extensible Processors Platform Background – Xtensa - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Design Automation Methodologies for Extensible Processors Platform

[email protected]

Design Automation Methodologies Design Automation Methodologies for Extensible Processors Platformfor Extensible Processors Platform

Newton CheungNewton Cheung

School of Computer Science & EngineeringSchool of Computer Science & EngineeringThe University of New South WalesThe University of New South Wales

Sydney, AustraliaSydney, Australia

Page 2: Design Automation Methodologies for Extensible Processors Platform

[email protected] 2

OutlineOutline

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

Page 3: Design Automation Methodologies for Extensible Processors Platform

[email protected] 3

SoC Design ChallengesSoC Design Challenges

Increasing software contentIncreasing software content

Customizing the architecture to Customizing the architecture to the specific application the specific application (application domain)(application domain)

• Power consumptionPower consumption

• Chip areaChip area

• CostCost

• Application functionalityApplication functionality

• Ever changing nature of embedded productEver changing nature of embedded product

FlexibilityFlexibility

EfficiencyEfficiency

Page 4: Design Automation Methodologies for Extensible Processors Platform

[email protected] 4

Flexibility vs. Efficiency (Energy)Flexibility vs. Efficiency (Energy)

General Purpose Processor

0.1-1 MIPS/mw

1-10 MIPS/mw

50-100 MIPS/mw

500-1000 MIPS/mw

En

erg

y E

ffic

ien

cy

Fle

xib

ilit

y

Application Specific Instruction-set Processor (ASIP)

Fle

xib

ilit

y

Application Specific Integrated Circuit (ASIC)

Domain SpecificProcessor (DSP)

Source: Fei Sun @ ICCAD 2002Source: Fei Sun @ ICCAD 2002

Page 5: Design Automation Methodologies for Extensible Processors Platform

[email protected] 5

Flexibility vs. Efficiency (Performance)Flexibility vs. Efficiency (Performance)

ASICASIP

FPGAGPP

Pe

rfo

rma

nc

e

Flexibility

Page 6: Design Automation Methodologies for Extensible Processors Platform

[email protected] 6

General Purpose ProcessorGeneral Purpose ProcessorApplication Specific ProcessorApplication Specific Processor

Page 7: Design Automation Methodologies for Extensible Processors Platform

[email protected] 7

Optimized Processor Design (ASIP)Optimized Processor Design (ASIP)

• CostCost– Small sizeSmall size

• PerformancePerformance– Application extensions Application extensions

• ProductivityProductivity– Rapid hardware Rapid hardware

and softwareand software

Page 8: Design Automation Methodologies for Extensible Processors Platform

[email protected] 8

Extensible Processors PlatformExtensible Processors Platform

• Represents the state-of-the-art in Represents the state-of-the-art in application specific application specific instruction-set processor (ASIP)instruction-set processor (ASIP)

• Consists of a Consists of a base processorbase processor containing a containing a base base instruction set, instruction set, plus the capability to plus the capability to customize their customize their architecture architecture to replace computationally intensive code to replace computationally intensive code segmentssegments

• The goal of designing extensible processors is The goal of designing extensible processors is typically to typically to maximize the performancemaximize the performance of an of an application, while application, while meeting design constraintsmeeting design constraints

Page 9: Design Automation Methodologies for Extensible Processors Platform

[email protected] 9

Extensible Processors PlatformExtensible Processors Platform• Enables to address Enables to address threethree architectural levels on the architectural levels on the

base processor:base processor: – Inclusion/Exclusion of Inclusion/Exclusion of predefined blockspredefined blocks– InstructionsInstructions extension extension– Parameterizations Parameterizations

I:Instruction Fetch

R:Instruction Decode / Register Fetch

E: Execute / Effective Address

M: Memory Access / Branch Complete

W: Write Back

Instruction ROM / RAM / Cache

Write Back / Exception Resolution

Decode / General Registers

ALU / Address Generation

Data ROM / RAM / Cache

Custom / Specific Instruction

Predefined Blocks (e.g. DSP – Digital Signal Processing, Floating-Point Unit)

Predefined Block Register (i.e. 64-bit)

Page 10: Design Automation Methodologies for Extensible Processors Platform

[email protected] 10

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

OutlineOutline

Page 11: Design Automation Methodologies for Extensible Processors Platform

[email protected] 11

Background (Xtensa)Background (Xtensa)

Source: www.tensilica.com Source: www.tensilica.com

Page 12: Design Automation Methodologies for Extensible Processors Platform

[email protected] 12

Xtensa ArchitectureXtensa ArchitectureSource: www.tensilica.com Source: www.tensilica.com

Page 13: Design Automation Methodologies for Extensible Processors Platform

[email protected] 13

Xtensa Custom Instructions (TIE)Xtensa Custom Instructions (TIE)

Source: www.tensilica.com Source: www.tensilica.com

Page 14: Design Automation Methodologies for Extensible Processors Platform

[email protected] 14

Xtensa’s Design FlowXtensa’s Design Flow

Source: www.tensilica.com Source: www.tensilica.com

Page 15: Design Automation Methodologies for Extensible Processors Platform

[email protected] 15

Extensible Processor (Why?) Extensible Processor (Why?)

Standard Standard ProcessorProcessor

Extensible Extensible ProcessorProcessor

ASIC ASIC (RTL Logic)(RTL Logic)

Application tuned Application tuned data pathsdata paths

NONO Yes: High-level Yes: High-level TIETIE

Low-level RTL Low-level RTL

Task ControlTask Control C/C++C/C++ C/C++C/C++ NoNo

SimulationSimulation Fast Fast simulation or simulation or

boardboard

Fast simulation Fast simulation or boardor board

RTL simulation: RTL simulation: 100x slower100x slower

Multiple EnginesMultiple Engines LimitedLimited Simple Simple directed MP directed MP

interfaceinterface

Possible, but Possible, but hard to design hard to design

and modeland model

Page 16: Design Automation Methodologies for Extensible Processors Platform

[email protected] 16

Example: Digital VideoExample: Digital VideoSource: www.tensilica.com Source: www.tensilica.com

Page 17: Design Automation Methodologies for Extensible Processors Platform

[email protected] 17

Example: Multiple Processors for TOEExample: Multiple Processors for TOE

Source: www.tensilica.com Source: www.tensilica.com

Page 18: Design Automation Methodologies for Extensible Processors Platform

[email protected] 18

Example: Voice GatewayExample: Voice GatewaySource: www.tensilica.com Source: www.tensilica.com

Page 19: Design Automation Methodologies for Extensible Processors Platform

[email protected] 19

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

OutlineOutline

Page 20: Design Automation Methodologies for Extensible Processors Platform

[email protected] 20

Problems in Automation??Problems in Automation??

Source: www.tensilica.com Source: www.tensilica.com

Design Expertise

Required!!

Page 21: Design Automation Methodologies for Extensible Processors Platform

[email protected] 21

Generic Design Flow of Extensible Generic Design Flow of Extensible ProcessorProcessor

Application in C/C++

Compilation

Analysis and Profiling

Identify:1) Predefined blocks2) Extensible instructions3) Parameter settings

Define/Select: 1) Predefined blocks 2) Extensible instructions3) Parameter settings

Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator

Explore extensible processor

design space

OK?

Synthesizable RTL of: Base processor Predefined blocks Extensible instructions

Generate extensible processor

Synthesis and tape out

Proto-typing

Instruction Library

No Yes

Page 22: Design Automation Methodologies for Extensible Processors Platform

[email protected] 22

PreviousPrevious Work Workss

• Profiling/IdentificationProfiling/Identification– [Binh 1998], [Yang 2002], ARC, Xtensa[Binh 1998], [Yang 2002], ARC, Xtensa

• Design methodology for different aspectsDesign methodology for different aspects– [Gupta 2000], [Jain 2001][Gupta 2000], [Jain 2001]

• Instruction generation/selectionInstruction generation/selection– [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao [Brisk 1998], [Kastner 2001], [Sun 2003], [Zhao

2002]2002]

• Overall design flow for extensible processorsOverall design flow for extensible processors– [Kathail 2002], [Lee 2002], [Sun 2002][Kathail 2002], [Lee 2002], [Sun 2002]

• Vendor and academic Vendor and academic – ARC, Lisatek, Xtensa, ASIP-MeisterARC, Lisatek, Xtensa, ASIP-Meister

Page 23: Design Automation Methodologies for Extensible Processors Platform

[email protected] 23

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

OutlineOutline

Page 24: Design Automation Methodologies for Extensible Processors Platform

[email protected] 24

Goal of the ResearchGoal of the Research

• To To automateautomate the design flow of extensible the design flow of extensible processors platformprocessors platform..

• Given an application and design constraints, Given an application and design constraints, the system configures an extensible the system configures an extensible processor that processor that maximizes the performancemaximizes the performance of of an application while an application while satisfying the design satisfying the design constraintsconstraints..

Page 25: Design Automation Methodologies for Extensible Processors Platform

[email protected] 25

Proposed SolutionProposed Solution

Application in C/C++

Compilation

Analysis and Profiling

Identify:1) Predefined blocks2) Extensible instructions3) Parameter settings

Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator

Explore extensible processor

design space

OK?

Synthesizable RTL of: Base processor Predefined blocks Extensible instructions

Generate extensible processor

Synthesis and tape out

Proto-typing

Instruction Library

No Yes

Define/Select: 1) Predefined blocks 2) Extensible instructions3) Parameter settings

Application in C/C++

Compilation

Analysis and Profiling

Identify:1) Predefined blocks2) Extensible instructions3) Parameter settings

Evaluate: Compiler, Linker, Assembler, Debugger, Instruction Set Simulator

Explore extensible processor

design space

OK?

Synthesizable RTL of: Base processor Predefined blocks Extensible instructions

Generate extensible processor

Synthesis and tape out

Proto-typing

Instruction LibraryINSIDE

system

Define: 1) P.B.2) E.I.3) P.S.

Select: 1) P.B.2) E.I.3) P.S.

Performance Estimation Model1) Information from profiling2) Synthesized information of instruction

Use libraries of pre-configured processor and

instructions to limit the design space

Fitting Function:- Finds suitable code segment to implement as extensible instruction

MINCE tool- Match code segment to instruction

No Yes

Page 26: Design Automation Methodologies for Extensible Processors Platform

[email protected] 26

1.1. INSIDEINSIDE system system– Identifies sections of code segments which are Identifies sections of code segments which are

suitable for translation to instructions.suitable for translation to instructions.– A two-level hierarchy approach.A two-level hierarchy approach.– A performance estimatorA performance estimator..ReduceReducess the the design turnaround timedesign turnaround time significantly. significantly.

2.2. MINCEMINCE tool tool– Match code segment with pre-synthesized Match code segment with pre-synthesized

extensible instruction using combinational extensible instruction using combinational equivalence.equivalence.

Enhances Enhances reusabilityreusability of the extensible instructions. of the extensible instructions.

INSIDE / MINCE INSIDE / MINCE

Page 27: Design Automation Methodologies for Extensible Processors Platform

[email protected] 27

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

OutlineOutline

Page 28: Design Automation Methodologies for Extensible Processors Platform

[email protected] 28

INSIDE system (Overview)INSIDE system (Overview)Processor 3

Base

ProcessorDSP

Processor 2

Base

ProcessorFPU

Processor 1

Base

Processor

Compile, Simulate, andProfile

Execution Time EstimationPhase IV

Heuristic Algorithm(Part 1: for Selecting pre-configured processor)

Phase I

Heuristic Algorithm(Part 2: for Selecting pre-designed

extensible instruction)

Phase IIIPre-designed Extensible

Library

Methodology for Identifying Code Segment, ImplementingExtensible Instruction, and Extending the Instruction Library

Phase II

Processor with coprocessor & extensible instructions Execution Time

Page 29: Design Automation Methodologies for Extensible Processors Platform

[email protected] 29

Phase I: Heuristic Algorithm (Part I)Phase I: Heuristic Algorithm (Part I)

• For selecting pre-configured processorFor selecting pre-configured processor

• The The area delay productarea delay product, , EPEPii of processor of processor ii for a for a

certain applicationcertain application

AreaPeriodCCEP

#

1

ProcessorProcessor Clock CycleClock Cycle Clock PeriodClock Period AreaArea EPEP

P1P1 1200012000 6ns6ns 50005000 2.782.78

P2P2 80008000 8ns8ns 80008000 1.951.95

Page 30: Design Automation Methodologies for Extensible Processors Platform

[email protected] 30

Phase II: Identify Code SegmentsPhase II: Identify Code Segments

• Problem: Problem: – Application has Application has millions of lines of C codemillions of lines of C code, how do , how do

we know which code segment is good for we know which code segment is good for converting to extensible instructionconverting to extensible instruction

• Fitting function Fitting function – Identifies code segments which are Identifies code segments which are suitablesuitable for for

translation to extensible instructionstranslation to extensible instructions– Extracts Extracts characteristicscharacteristics of the code segment of the code segment

Page 31: Design Automation Methodologies for Extensible Processors Platform

[email protected] 31

Fitting FunctionFitting Function

• The four characteristics:The four characteristics:– The The frequency of usefrequency of use of a code segment of a code segment– The The number of operandsnumber of operands in a code segment in a code segment– The percentage of integer (short) The percentage of integer (short) type operandstype operands in all the in all the

operandsoperands– The percentage of The percentage of bit operationsbit operations in all the operands in all the operands

• The fitting function:The fitting function:

α α – the ideal number of operands in the code segment.– the ideal number of operands in the code segment.

......

1.. OBOT

ONUFctionFittingFun

Page 32: Design Automation Methodologies for Extensible Processors Platform

[email protected] 32

RelationshipRelationship

• Relationship between the fitting function and the Relationship between the fitting function and the speedup/area ratio of the instructionspeedup/area ratio of the instruction

Voice Encoder

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

FittingFunction

Speedup/area

Page 33: Design Automation Methodologies for Extensible Processors Platform

[email protected] 33

Phase III: Heuristic Algorithm (Part II)Phase III: Heuristic Algorithm (Part II)

• For selecting extensible instructionsFor selecting extensible instructions• The The potential speedup/area ratiopotential speedup/area ratio, , PSARPSAR, of , of

extensible instruction in processor:extensible instruction in processor:

max

_#_%

LatencyArea

SpeedupCCofPSAR

InstructionInstruction % of total % of total clock cycleclock cycle

SpeedupSpeedup AreaArea LatencyLatency PSARPSAR

Inst 1Inst 1 1313 3x3x 15001500 6ns6ns 4333343333

Inst 2Inst 2 1010 6x6x 25002500 6ns6ns 3428634286

Page 34: Design Automation Methodologies for Extensible Processors Platform

[email protected] 34

Phase IV: Performance EstimationPhase IV: Performance Estimation

• For rapidly estimating the execution timeFor rapidly estimating the execution time• The The execution time estimationexecution time estimation, , ETEETE, for an extensible , for an extensible

processor with a set of selected extensible processor with a set of selected extensible instruction:instruction:

max

__ Latency

Speedup

affectedCCUnaffectedCCETE

Page 35: Design Automation Methodologies for Extensible Processors Platform

[email protected] 35

INSIDE: INstruction Selection/Identification &Design Exploration for Extensible Processors

Pre-configured ProcessorLibrary

AreaConstraint

Processor with coprocessor &extensible instructions

Pre-designed InstructionLibrary

Execution Time

Application

ExhaustivelySimulation

All the solutions

Verify the solutionand system

Experimental MethodologyExperimental Methodology

Page 36: Design Automation Methodologies for Extensible Processors Platform

[email protected] 36

ResultsResults

Page 37: Design Automation Methodologies for Extensible Processors Platform

[email protected] 37

Results (Design Turnaround Time)Results (Design Turnaround Time)

100 160 110 90 90 45 150

3100

9600

3840

1800 1800

3600

24000

0

5000

10000

15000

20000

25000

adpcmencoder

gsm encoder gsm decoder g721encoder

g721decoder

mpeg2decoder

voicerecognition

Application

Tim

e (

min

ute

)

INSIDE

Full Exploration

Page 38: Design Automation Methodologies for Extensible Processors Platform

[email protected] 38

Execution Time vs Area

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

0 50,000 100,000 150,000 200,000 250,000 300,000

Area [gates]

Exe

cutio

n T

ime

[se

con

d]

Results (Pareto Points)Results (Pareto Points)

Execution Time vs Area

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

0 50,000 100,000 150,000 200,000 250,000 300,000

Area [gates]

Exe

cutio

n T

ime

[se

con

d]

Page 39: Design Automation Methodologies for Extensible Processors Platform

[email protected] 39

Results of Results of INSIDEINSIDE system system

• Mediabench applicationsMediabench applications• Speedup of the applications:Speedup of the applications:

– On average On average 2.03x2.03x (up to 7x) (up to 7x)

• Hardware overheads:Hardware overheads:– On average On average 25%25% (up to 72%) (up to 72%)

• Pareto points:Pareto points:– Obtained on average Obtained on average 83%83% (up to 100%) (up to 100%)– On average within On average within 5%5% of the execution time of the execution time

Page 40: Design Automation Methodologies for Extensible Processors Platform

[email protected] 40

• System-On-Chips Design ChallengesSystem-On-Chips Design Challenges• Extensible Processors PlatformExtensible Processors Platform• Background – XtensaBackground – Xtensa• Problems in Extensible Processors PlatformProblems in Extensible Processors Platform• The Goal of the researchThe Goal of the research• Proposed SolutionProposed Solution• INSIDE INSIDE • MINCEMINCE• ConclusionsConclusions

OutlineOutline

Page 41: Design Automation Methodologies for Extensible Processors Platform

[email protected] 41

The Aim of the The Aim of the MINCEMINCE

• MMatching atching ININstructions using structions using CCombinational ombinational EEquivalencequivalence

• Automatically Automatically matching code segmentsmatching code segments to pre- to pre-synthesized synthesized specific instructionsspecific instructions..

// Pre-synthesized specific instructions

state total 32

iclass ei EI {out arr, in art, in ars} {in state}

reference EI {

wire [31;0] tmp

assign tmp = TIEmul(art, ars, 1’b0) >> 4;

assign arr = tmp + state;

}

// Application in C

int main() {

… …

// Code segment

for (x = 0; x < 100000; x++) {

tmp = a[x] * b[x] << 4;

total += tmp;

}

… …

} Functional Equivalent

Page 42: Design Automation Methodologies for Extensible Processors Platform

[email protected] 42

MotivationMotivationApplication in C/C++

Compilation

Analysis and Profiling

Identifying the “hotspots” of the application through simulation,

profiling, and trace

Designing extensible instructions for the “hotspots”

Testing/Verifying the functionality, speedup, area, power consumption

of extensible instruction

Explore extensible processor

design space

OK?

Synthesizable RTL of base processor and extensible instructions

Generate extensible processor

Synthesis and tape out

Proto-typing

Designed Extensible Instruction Library

No Yes

Selecting the extensible instructions based on the design

constraints

Design Constraints (Performance, Area, Power Consumption)

MINCE tool- Matching designed extensible instruction to the code segment

Page 43: Design Automation Methodologies for Extensible Processors Platform

[email protected] 43

MINCE ToolMINCE Tool

• An An automated toolautomated tool for matching pre- for matching pre-synthesized extensible instruction to the synthesized extensible instruction to the functional equivalencefunctional equivalence of code segments of code segments using combinational equivalence checking in using combinational equivalence checking in the extensible processors platform.the extensible processors platform.

Page 44: Design Automation Methodologies for Extensible Processors Platform

[email protected] 44

MINCE ToolMINCE Tool

• MINCE consists: MINCE consists: – A A translatortranslator– A A filtering filtering

algorithmalgorithm– A A combinational combinational

equivalence equivalence checkingchecking tool tool

ApplicationSoftware in

C/C++

Functional equivalence implementation

ExtensibleInstruction Library

(Verilog HDL)

Translator

Combinational Equivalence Checking Tool

MINCEtool

I

III

Filtering AlgorithmII

Page 45: Design Automation Methodologies for Extensible Processors Platform

[email protected] 45

Related WorksRelated Works

• Simulation techniques:Simulation techniques:[Stadler 1999][Stadler 1999]

• Graph matching techniques:Graph matching techniques:[Corazao 1993] [Kang 1995] [Liem 1994] [Shu 1996][Corazao 1993] [Kang 1995] [Liem 1994] [Shu 1996]

• Equivalence verifications:Equivalence verifications:[Clarke 2003], [Pnueli 1998], [Semeria 2002][Clarke 2003], [Pnueli 1998], [Semeria 2002]

Page 46: Design Automation Methodologies for Extensible Processors Platform

[email protected] 46

Our ContributionsOur Contributions

• Enhances Enhances reusabilityreusability of the extensible of the extensible instructions.instructions.

• MINCE tool is MINCE tool is superiorsuperior to computation- to computation-intensive and error-prone intensive and error-prone simulation simulation approaches.approaches.

• The usage of functional equivalence checking The usage of functional equivalence checking ensures that the results are largely ensures that the results are largely independentindependent of the programming styleof the programming style of the of the application.application.

Page 47: Design Automation Methodologies for Extensible Processors Platform

[email protected] 47

Phase I: TranslatorPhase I: Translator

• The goal of the translator is to The goal of the translator is to convertconvert the application the application written in C/C++ to a set of code segment in Verilog written in C/C++ to a set of code segment in Verilog HDL using a HDL using a systematic approachsystematic approach..

• To reduce the To reduce the granularitygranularity of the application written in of the application written in C/C++.C/C++.

• The translator consists of The translator consists of fourfour steps: steps:– Separate the application into code segments;Separate the application into code segments;– Compile code segments;Compile code segments;– Convert to register transfer list;Convert to register transfer list;– Map to Verilog HDL file.Map to Verilog HDL file.

Page 48: Design Automation Methodologies for Extensible Processors Platform

[email protected] 48

TranslatorTranslator

Compile (Step II)

Convert (Step III)

Map (Step IV)

Assembler Instruction HardwareLibrary (Verilog HDL)

... …

module mult (product, input1, input2);… …

endmodule

module add (sum, in1, in2);… …

endmodule

module sfr (out1, in1, amt);… …

endmodule

module sfl … ...

module cmpl … ...

Separate (Step I)

Application Software written in C/C++

// High-level code segmentint example (int sum, int input1, int input2) {

total = sum + (input1 * input2) >> 4;return total;

}

// Assembly Codemult R6, R1, R2mov R4, R6sar R4, $4add R5, R4, R3

// Register Transfer ListR6 = R1 * R2;R4 = R6;R4_1 = R4 >> 4;R5 = R3 + R4_1;

// Verilog Codemodule example (total, sum, input1, input2);

output [31:0] total;input [31:0] sum, input1, input2;wire [31:0] r1, r2, r3, r4, r4_1, r5;wire [63:0] r6;

r1 = input1;r2 = input2;r3 = sum;mult (r6, r1, r2);r4 = r6;sfr (r4_1, r4, 4);add (r5, r3, r4_1);total = r5;

endmodule

module mult (product, in1, in2);output [63:0] product;input [31:0] in1;input [31:0] in2;

… …endmodule

module add (sum, in1, in2);… …

endmodule

module sfr (out1, in1, amt);… …

endmodule

Page 49: Design Automation Methodologies for Extensible Processors Platform

[email protected] 49

Phase II: Filtering AlgorithmPhase II: Filtering Algorithm

• The goal of the filtering algorithm is to The goal of the filtering algorithm is to eliminateeliminate the the unnecessary and complex Verilog HDL file into the unnecessary and complex Verilog HDL file into the combinational equivalence checking model.combinational equivalence checking model.

• Verilog HDL files can be pruned as non-match:Verilog HDL files can be pruned as non-match:– Differing Differing number of portsnumber of ports;;– Differing Differing port sizesport sizes;;– Insufficient Insufficient number of base hardware modulesnumber of base hardware modules to to

present complex module. present complex module.

Page 50: Design Automation Methodologies for Extensible Processors Platform

[email protected] 50

Combinational Equivalence CheckingCombinational Equivalence Checking

• Binary Decision DiagramBinary Decision Diagram (BDD) (BDD) • For example,For example,

\\ High-level Code Segment S = (a + b) * 16;

\\ Extensible Instruction S = (a + b) << 4;

ai

bi

cii

cii

bi

cii

cii

10

S4

ai

bi

cii

cii

bi

cii

cii

10

S5

ai

bi

cii

cii

bi

cii

cii

10

ai

bi

cii

cii

bi

cii

cii

10

S7

ai

bi

cii

cii

bi

cii

cii

10

S8

ai

bi

cii

cii

bi

cii

cii

10

S9

ai

bi

cii

cii

bi

cii

cii

10

S10

ai

bi

cii

cii

bi

cii

cii

10

S11 S6 S0…S31……

ai

bi

cii

cii

bi

cii

cii

10

S4

ai

bi

cii

cii

bi

cii

cii

10

S5

ai

bi

cii

cii

bi

cii

cii

10

ai

bi

cii

cii

bi

cii

cii

10

S7

ai

bi

cii

cii

bi

cii

cii

10

S8

ai

bi

cii

cii

bi

cii

cii

10

S9

ai

bi

cii

cii

bi

cii

cii

10

S10

ai

bi

cii

cii

bi

cii

cii

10

S11 S6 S0…S31……

a1

b1

ci1 ci1

b1

ci1 ci1

10

S5

a1

b1

ci1 ci1

b1

ci1 ci1

10

S5

• Using Verification Interfacing with Synthesis (VIS) Using Verification Interfacing with Synthesis (VIS) which was jointly developed by the University of which was jointly developed by the University of California, in Berkeley and the University of Colorado, California, in Berkeley and the University of Colorado, Boulder.Boulder.

Page 51: Design Automation Methodologies for Extensible Processors Platform

[email protected] 51

ResultsResults

3

3

3

3

1

1

1

1

11

EM: Exact matchEM: Exact matchFE: Functional equivalenceFE: Functional equivalenceDM: I/O match onlyDM: I/O match onlyTW: Do not matchTW: Do not match

Simulation vs MINCE (Time on H/W instruction with different kinds of S/W code segment)

0

20

40

60

80

100

120

EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW EM FE

DM

TW

Instruction 1 Instruction 2 Instruction 3 Instruction 4 Instruction 5 Instruction 6 Instruction 7 Instruction 8 Instruction 9 Instruction 10

Time (minute)

SimulationMINCE

EM - Exact MatchFE - Functional EquivalentDM - Do Not Match (I/O only)TW - Totally Wrong (Do Not Match at all)

Page 52: Design Automation Methodologies for Extensible Processors Platform

[email protected] 52

ResultsResults

Matching Time

80 75 74

10595

115

205

25 20 20

40 3521

40

10 8 9

2515 18

25

0

50

100

150

200

250

Adpcmencoder

G721encoder

G721decoder

Gsm encoder Gsm decoder MPEG2encoder

VoiceRecognition

Application

Tim

e (h

ou

r)

Simulation Technique

MINCE (w/o filtering alg.)

MINCE

Page 53: Design Automation Methodologies for Extensible Processors Platform

[email protected] 53

Future Plan Future Plan

Application in C/C++

Compilation

Analysis and Profiling

Identify:1) Predefined blocks2) Extensible instructions3) Parameter settings

OK?

Synthesizable RTL of: Base processor Predefined blocks Extensible instructions

Generate extensible processor

Synthesis and tape out

Proto-typing

Instruction Library

INSIDE system

Define: 1) P.B.2) E.I.3) P.S.

Select: 1) P.B.2) E.I.3) P.S.

Performance Estimation Model1) Information from profiling2) Synthesized information of instruction

Use libraries of pre-configured processor and

instructions to limit the design space

Fitting Function:- Finds suitable code segment to implement as extensible instruction

MINCE tool- Match code segment to instruction

No Yes

Generate Extensible Instruction Automatically

Area, Latency, Power Model

Page 54: Design Automation Methodologies for Extensible Processors Platform

[email protected] 54

ConclusionConclusion

• Extensible processors platform enables to Extensible processors platform enables to address three architectural levels in order to address three architectural levels in order to tunetune for for application specificapplication specific: : – Inclusion/Exclusion of predefined blocksInclusion/Exclusion of predefined blocks– Instructions extensionInstructions extension– ParameterizationsParameterizations

• The goal of the research is to The goal of the research is to automateautomate the the design flow of extensible processor platformdesign flow of extensible processor platform..

• INSIDE system / MINCE toolINSIDE system / MINCE tool