CREC: A Novel Reconfigurable Computing Design Methodology

Octavian Cret, Kalman Pusztai, Cristian Vancea, Balint Szente
Technical University of Cluj-Napoca, Romania


Post on 04-Feb-2016

Page 1: CREC: A Novel Reconfigurable Computing Design Methodology

Octavian Cret, Kalman Pusztai, Cristian Vancea, Balint Szente

Technical University of Cluj-Napoca, Romania

Page 2: Introduction

CREC: a low-cost, general-purpose reconfigurable computer;
Dynamically generated architecture;
Built in a hardware/software co-design manner;
Based on FPGA devices, the VHDL language and a high-level language (Java);
No need for integration in a dedicated VLSI chip.

Page 3: CREC’s Main Features

Reconfigurable RISC computer;
Parallel computer: each register has an associated Execution Unit (EU);
All the EUs have an identical structure, and each one is able to execute any kind of instruction from the CREC Instruction Set;
Having a greater number of EUs has the advantage of introducing Instruction Level Parallelism.

Page 4: CREC Design Flow

[Flow diagram, labelled "Integrated CREC Development System". Stages in order:]
1. Application source code (written in CREC Assembly Language)
2. Parallel Compiler (determination of the number of slices and instruction scheduling)
3. VHDL source code Generator (written in Java)
4. VHDL file Compilation
5. FPGA Configuration Process
6. Application Execution

Page 5: The Parallel Compiler (I.)

Parses the CREC-RISC source code;
Makes important decisions about the execution system that will be generated;
Divides a program written in a sequential manner into portions of code to be executed at the same time;
Determines the minimal number of program slices;
Determines which instructions will be executed in parallel in each slice.

Page 6: The Parallel Compiler (II.)

Uses a set of rules;
An example: each slice can contain at most one Load, Store or Jump instruction;
Reads the application source code (in CREC assembly language) and generates a file in a specific format, giving a description of the tailored CREC;
The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.

Page 8: Results of the Parallel Compiler

The size of the various functional parts;
The subset of instructions involved;
The number of execution units (N);
The sequence of instructions making up the program;
The resulting CREC architecture contains only the hardware needed to execute the subset of instructions used in the program.

Page 9: Slices

The instructions assigned to the EUs for execution at the same moment in time make up a program slice;
The whole program is divided into slices;
The slice’s size depends on the number of execution units designed for program execution.

Page 10: Program Example

Classical, non-optimal multiplication of two integers without overflow check, using three EUs.

Program sequence and instruction scheduling:
[1] MOV R1,2
[2] MOV R2,3
[3] MOV R3,3
[4] ADD R1,R2
[5] DEC R3
[6] JNZ R3,[4]
[7] MOV STORB,R1
[8] STORE [200]
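The slicing idea can be illustrated with a minimal Python sketch. It is not the authors' Parallel Compiler: the greedy packing, the dependency test and the names `Instr`, `schedule`, `ONE_PER_SLICE` and `JUMPS` are assumptions made for illustration, and the scheduling shown on the original slide is not recoverable from the transcript. The sketch packs consecutive instructions into a slice until one of the following holds: the slice already holds one instruction per EU, the next instruction is a jump target, it needs a value written in the same slice, or it would be a second Load/Store/Jump in the slice.

```python
from dataclasses import dataclass

# Assumption: opcodes limited to one per slice (the slides name Load, Store and Jump).
ONE_PER_SLICE = {"LOAD", "STORE", "JMP", "JNZ"}
JUMPS = {"JMP", "JNZ"}


@dataclass(frozen=True)
class Instr:
    label: int          # position in the sequential program
    op: str
    reads: tuple = ()   # registers/buffers read
    writes: tuple = ()  # registers/buffers written


def schedule(program, n_eus, jump_targets=()):
    """Greedily pack a sequential program into slices of at most n_eus instructions."""
    slices, current, written, restricted = [], [], set(), 0

    def close():
        # Finish the slice under construction and start an empty one.
        nonlocal current, written, restricted
        if current:
            slices.append(current)
        current, written, restricted = [], set(), 0

    for ins in program:
        must_split = (
            len(current) >= n_eus                                  # one instruction per EU
            or ins.label in jump_targets                           # a jump target opens a slice
            or any(r in written for r in ins.reads + ins.writes)   # depends on this slice
            or (ins.op in ONE_PER_SLICE and restricted >= 1)       # Load/Store/Jump rule
        )
        if must_split:
            close()
        current.append(ins)
        written.update(ins.writes)
        restricted += ins.op in ONE_PER_SLICE
        if ins.op in JUMPS:
            close()                                                # a jump ends its slice
    close()
    return slices


# The eight-instruction example program from the slide.
program = [
    Instr(1, "MOV", (), ("R1",)),
    Instr(2, "MOV", (), ("R2",)),
    Instr(3, "MOV", (), ("R3",)),
    Instr(4, "ADD", ("R1", "R2"), ("R1",)),
    Instr(5, "DEC", ("R3",), ("R3",)),
    Instr(6, "JNZ", ("R3",)),
    Instr(7, "MOV", ("R1",), ("STORB",)),
    Instr(8, "STORE", ("STORB",)),
]
layout = [[i.label for i in s] for s in schedule(program, n_eus=3, jump_targets={4})]
print(layout)
```

Under these assumed rules the eight instructions fall into five slices: the three initial moves run together, ADD and DEC share a slice, and the jump, the move to the store buffer and the store each get a slice of their own.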

Page 11: VHDL Source Code Generator

The VHDL files contain already-written source code in which the main architecture parameters are given as generics and constants;
The following components can be tailored:
  The number of EUs;
  The register width in all the EUs;
  The size of the Instructions Memory and Operands Memory for each EU;
  The size of the Data Stack and Slice Stack Memory;
  The slice-mapping block, containing instructions.
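As a rough illustration of parameterising pre-written VHDL through constants, the sketch below emits a small VHDL package from the tailoring parameters listed above. The original generator is written in Java and its output format is not shown in the slides, so the package name `crec_config`, the constant names and the function `generate_config_package` are all hypothetical.

```python
def generate_config_package(num_eus, reg_width, instr_mem_depth,
                            operand_mem_depth, data_stack_depth, slice_stack_depth):
    """Return the text of a VHDL package holding the tailoring parameters as constants."""
    # Hypothetical constant names; the real CREC sources may use different ones.
    constants = {
        "NUM_EUS": num_eus,
        "REG_WIDTH": reg_width,
        "INSTR_MEM_DEPTH": instr_mem_depth,
        "OPERAND_MEM_DEPTH": operand_mem_depth,
        "DATA_STACK_DEPTH": data_stack_depth,
        "SLICE_STACK_DEPTH": slice_stack_depth,
    }
    lines = ["package crec_config is"]
    lines += [f"  constant {name} : natural := {value};"
              for name, value in constants.items()]
    lines.append("end package crec_config;")
    return "\n".join(lines) + "\n"


# Example values are made up for the illustration.
vhdl_text = generate_config_package(4, 16, 256, 256, 64, 64)
print(vhdl_text)
```

The pre-written VHDL files would then reference such constants (or receive them as generics), so only this small generated file changes from one application to the next.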

Page 12: CREC General Architecture

[Block diagram. Labelled blocks: EU 1, EU 2, …, EU N, each with its own addressed Instructions Memory and Operand Memory (1, 2, …, N); Slice Memory; Slice Counter; Slice Stack Memory; Data Stack Memory; Load Buffer; Store Buffer; Data Memory.]

Page 13: The Hardware Architecture

The N Execution Units;
Instruction Memories;
Data Stack Memory (for Push and Pop);
Slice Stack Memory (for Call and Return);
A Slice Program Counter;
A Slice-mapping Memory;
Store Buffer and Load Buffer;
Data Memory (external or internal);
Operand Memories.

Page 14: The Instruction Set

Relatively large instruction set, containing more instructions than typical microcontrollers have;
Every instruction operates only on unsigned integers;
Each EU is potentially able to execute any kind of instruction from the CREC Instruction Set.

Page 15: Data Manipulation Instructions

Addition with or without Carry;
Subtraction with or without Borrow, and Compare;
Logical functions: And, Or, Xor, Not and Bit Test;
Shift, arithmetic and logic, to the left/right;
Rotate and rotate through Carry to the left/right;
Increment/Decrement and 2’s Complement.
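A behavioural sketch of addition and subtraction on unsigned words, with the Carry and Zero flags, might look as follows. The function names `alu_add` and `alu_sub`, the default 8-bit width and the convention that the Carry flag doubles as a borrow flag on subtraction are assumptions; the slides do not specify these details.

```python
def alu_add(a, b, carry_in=0, width=8):
    """Unsigned addition, with or without carry: returns (result, carry_flag, zero_flag)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    result = total & mask
    return result, int(total > mask), int(result == 0)


def alu_sub(a, b, borrow_in=0, width=8):
    """Unsigned subtraction, with or without borrow; the carry flag signals a borrow (assumed)."""
    mask = (1 << width) - 1
    diff = (a & mask) - (b & mask) - borrow_in
    result = diff & mask
    return result, int(diff < 0), int(result == 0)


print(alu_add(200, 100))  # overflows an 8-bit word, so the carry flag is set
print(alu_sub(3, 5))      # 3 < 5, so a borrow is signalled
```

A Compare would be the same subtraction with the result discarded and only the flags kept.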

Page 16: Instruction Format and Example

“G” defines the Instruction Group (Data Manipulation);
“Code” is the operation code (e.g. Add, Sub);
“Type” specifies the operation type (e.g. with/without Carry);
“Load” contains the load signals for the register and for the Carry and Zero flags;
“D” is the Register/Data selection for the second operand.

Page 17: Program Control Instructions

Slice counter manipulation: Jump, Call and Return;
Data movement: Move;
Stack manipulation: Push and Pop;
Input from and Output to port: In and Out;
Load from and Store to external memory;
For great flexibility, every instruction also exists in a conditional form: C (Carry), Z (Zero), E (Equal), A (Above), AE (Above or Equal), B (Below), BE (Below or Equal), each also with negation.
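The conditions listed above can be derived from the two flags, Carry (CF) and Zero (ZF). The sketch below assumes the usual unsigned-comparison convention, in which a preceding compare sets CF on borrow and ZF on equality; the slides do not confirm that CREC uses exactly this mapping, and the function name `condition_holds` and the "N" prefix for negated conditions are hypothetical.

```python
def condition_holds(cond, cf, zf):
    """Evaluate a condition code from the Carry and Zero flags (assumed mapping)."""
    negated = cond.startswith("N")   # hypothetical spelling of the negated forms, e.g. "NZ"
    name = cond[1:] if negated else cond
    base = {
        "C": cf,                     # Carry
        "Z": zf,                     # Zero
        "E": zf,                     # Equal
        "A": not cf and not zf,      # Above
        "AE": not cf,                # Above or Equal
        "B": cf,                     # Below
        "BE": cf or zf,              # Below or Equal
    }
    value = bool(base[name])
    return not value if negated else value


# Flags as they would stand after comparing 3 with 5 (borrow, not equal).
print(condition_holds("B", cf=1, zf=0), condition_holds("AE", cf=1, zf=0))
```

With such a mapping, a conditional instruction executes only when its condition field evaluates to true for the current flags.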

Page 18: Instruction Format and Example

“G” defines the Instruction Group (Program Control);
“Code” is the operation code (e.g. Jump, Call);
The “Conditions” field contains the code for validating the execution of a given instruction;
“R” is the load signal for the Register (e.g. Move);
“D” is the Register/Data selection for the second operand.

Page 19: The Execution Unit

Decoding Unit – decodes the instruction code;
Control Unit – generates the control signals for the Program Control Instruction group;
Multiplexer Unit – multiplexes the second operand of the binary instructions;
Operating Unit – performs the data manipulation operations;
Accumulator Unit – stores the instruction result;
Flag Unit – contains the two flag bits: the Carry Flag (CF) and the Zero Flag (ZF).

Page 20: Execution Unit (block diagram)

[Block diagram of the Execution Unit. Labelled blocks: Decoding Unit (Instruction Decoder, fed by the Instruction Code); Control Unit (Control Signal Generator and Condition Generator on the Condition Bus; control signals include JMP, CALL, RET, PUSH, POP, LOAD, STORE, MOV, OUT and REG/DATA); Multiplexer Unit (Register MUX over R1, R2, …, RN; Data MUX over Immediate Operand, Load Buffer, Stack and Input Port; Reg/Data MUX); Operating Unit (Arithmetic Unit ADD/SUB, Logic Unit AND/OR/XOR, Shift Left Unit SHL/ROL/NEG/INC/DEC, Shift Right Unit SHR/ROR/NOT, Carry Generator); Accumulator Register; Flag Unit (ZF, CF). Signals: Register Value, Operand Value.]

Page 21: The Optimized Operating Unit

Symmetrical organization: the binary instruction blocks are on the right side, and the unary operation blocks (operating only on the accumulator) are on the left side;
The blocks use only one level of FPGA slices;
All four subunits use the same number of slices;
Takes advantage of the Fast Carry Lines;
The size of the Operating Unit grows linearly with the word length.

Page 22: Virtex Optimized Arithmetic Unit

The basic 2-bit ADD/SUB cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.
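To show how a word-wide adder/subtractor can be built from 2-bit cells chained through their carries, here is a behavioural Python model. It does not reproduce the LUT contents of the actual cell, which are not recoverable from the transcript; the function names, the two's-complement style of subtraction (invert the second operand, inject a carry of 1) and the even word width are assumptions.

```python
def addsub_cell(a2, b2, carry_in, subtract):
    """One 2-bit ADD/SUB cell: returns (2-bit result, carry out)."""
    if subtract:
        b2 ^= 0b11                  # invert the two operand bits for subtraction
    total = a2 + b2 + carry_in
    return total & 0b11, total >> 2


def addsub(a, b, subtract=False, width=8):
    """Chain width/2 cells through their carries; returns (result, carry out, cells used)."""
    carry = 1 if subtract else 0    # the +1 of the two's complement enters at the lowest cell
    result = 0
    cells = width // 2              # assumes an even word width
    for i in range(cells):
        a2 = (a >> (2 * i)) & 0b11
        b2 = (b >> (2 * i)) & 0b11
        r2, carry = addsub_cell(a2, b2, carry, subtract)
        result |= r2 << (2 * i)
    return result, carry, cells


print(addsub(100, 27))                  # 8-bit addition
print(addsub(100, 27, subtract=True))   # 8-bit subtraction
```

In this model the number of cells is half the word width, so doubling the word length doubles the cell count, which is the sense in which such a unit grows linearly.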

Page 23: Arithmetic and Logic Opcodes

Opcodes of the arithmetic unit
Opcodes of the logic unit
Where L is the “Not Load” signal and S is the “Subtract” signal.

Page 24: Virtex Optimized Shift Left Unit

The basic 2-bit SHL/ROL/NEG/INC/DEC cell using the Fast Carry Lines consumes only one slice.

Page 25: Virtex Optimized Shift Right Unit

The basic 2-bit SHR/ROR/NOT cell using the Fast Carry Lines consumes only one Xilinx VirtexE slice.

Page 26: Shift Left and Right Opcodes

Opcodes of the shift left unit
Where S is the “Shift” signal and D is the “Decrement” signal.

Opcodes of the shift right unit
Where S is the “Shift” signal and N is the “Not” signal.

Page 27: Shift and Rotate Operations

SHL – Shift Left;
SAL – Shift Arithmetic Left;
ROL – Rotate Left;
RCL – Rotate through Carry Left.

SHR – Shift Right;
SAR – Shift Arithmetic Right;
ROR – Rotate Right;
RCR – Rotate through Carry Right.
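The eight operations can be summarised by a small behavioural model for a one-bit shift or rotate. The semantics below follow the common convention in which the bit shifted out lands in the Carry flag and SAR replicates the top bit; whether CREC matches this in every detail is an assumption, as is the function name `shift_rotate`.

```python
def shift_rotate(op, value, cf=0, width=8):
    """Apply a one-bit shift or rotate; returns (result, carry_flag)."""
    mask = (1 << width) - 1
    value &= mask
    msb = (value >> (width - 1)) & 1
    lsb = value & 1
    top = width - 1
    if op in ("SHL", "SAL"):                    # identical for a left shift
        return (value << 1) & mask, msb
    if op == "ROL":                             # top bit re-enters at the bottom
        return ((value << 1) | msb) & mask, msb
    if op == "RCL":                             # old carry enters at the bottom
        return ((value << 1) | cf) & mask, msb
    if op == "SHR":                             # zero enters at the top
        return value >> 1, lsb
    if op == "SAR":                             # top bit is replicated
        return (value >> 1) | (msb << top), lsb
    if op == "ROR":                             # bottom bit re-enters at the top
        return (value >> 1) | (lsb << top), lsb
    if op == "RCR":                             # old carry enters at the top
        return (value >> 1) | (cf << top), lsb
    raise ValueError(f"unknown operation: {op}")


for name in ("SHL", "ROL", "SHR", "SAR"):
    print(name, shift_rotate(name, 0b10010110))
```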

Page 28: Execution Unit Resources

A complete Execution Unit (with all the subunits generated) with an 8-bit wide accumulator consumes 20 CLBs, approximately 0.6% of a Xilinx Virtex600E FPGA chip;
An Execution Unit with a 16-bit wide register consumes 35 CLBs, approximately 1% of the available CLBs.
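The quoted percentages can be checked with a few lines of arithmetic. The total of 3456 CLBs for the Virtex600E (a 48 x 72 CLB array) is taken as an assumption here and should be confirmed against the device data sheet. The linear estimate between the two quoted points is likewise only an illustration, since the slides state linear growth for the Operating Unit rather than for the whole Execution Unit.

```python
TOTAL_CLBS = 3456  # assumption: 48 x 72 CLB array of the Virtex600E


def clb_share(clbs, total=TOTAL_CLBS):
    """Percentage of the device's CLBs used."""
    return 100.0 * clbs / total


def estimate_clbs(width):
    """Straight line through the quoted points (8 bits, 20 CLBs) and (16 bits, 35 CLBs)."""
    slope = (35 - 20) / (16 - 8)
    return 20 + slope * (width - 8)


print(round(clb_share(20), 2), round(clb_share(35), 2))
print(estimate_clbs(12))
```

Under that assumption the shares come out at about 0.58% and 1.01%, consistent with the rounded figures of 0.6% and 1% on the slide.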

Page 29: Experimental Results

Functional Parallel Compiler;
Execution Units optimized for the Xilinx VirtexE device;
Slice Memory and Stack Memory under test;
A CREC architecture having 4 EUs with 4-bit wide registers occupies 4% of the CLBs and 5% of the BlockRAMs in the Virtex600E device;
A CREC architecture having 4 EUs with 16-bit wide registers occupies 18% of the CLBs and 20% of the BlockRAMs in the Virtex600E device;
The operating clock frequency is 100 MHz.

Page 30: Performance Evaluation

The performance indexes show how many times faster a given algorithm executes on an optimised CREC system than with the classical execution flow.

Page 31: Conclusions and Further Work

Creating the possibility of writing high-level programs for CREC;
Extending the functionality of the Parallel Compiler, then creating a C or PASCAL compiler for CREC applications;
Several variants of CREC architectures;
Hardware distributed computing, using FPGA configuration over the Internet.