Tutorial7TutorialonRISC-VDesignandVerification
KevinMcDermott-ImperasSoftwareLtd.ZdenekPrikryl-CodasipLtd.
PeterShields-UltraSoCTechnologiesLtd.
© Accellera Systems Initiative 1
TutorialonRISC-VDesignandVerification• Speakers:
– KevinMcDermott-ImperasSoftwareLtd: – ZdenekPrikryl-CodasipLtd.– PeterShields-UltraSoCTechnologiesLtd.
• RISC-VTutorialoverview1. IntroductiontoRISC-VISA&TheRISC-VFoundation:ISAFreedom&innovation2. Imperas:addingRISC-Vcustominstructionsforsoftwaredevelopment3. Codasip:hardwaredesignflowforRISC-VIPcoreandextensions4. UltraSoC:on-chipAnalyticsforSoC,andheterogeneousarchitectures
© Accellera Systems Initiative 2
IntroductiontoRISC-VISA• https://riscv.org
• RISC-V(pronounced“risk-five”)isanopen,freeISAenablinganeweraofprocessorinnovationthroughopenstandardcollaboration.Borninacademiaandresearch,RISC-VISAdeliversanewleveloffree,extensiblesoftwareandhardwarefreedomonarchitecture,pavingthewayforthenext50yearsofcomputingdesignandinnovation.
© Accellera Systems Initiative 3
RISC-VBackground• In2010,aftermanyyearsandmanyprojectsusingMIPS,SPARC,andx86asbasisofresearch,itwastimefortheComputerScienceteamatUCBerkeleytolookatwhatISAstousefortheirnextsetofprojects
• Obviouschoices:x86andARM– x86impossible–toocomplex,IPissues– ARMmostlyimpossible–complex,IPissues
• SoUCBerkeleystarted“3-monthproject”duringthesummerof2010todeveloptheirownclean-slateISA
© Accellera Systems Initiative 4
RISC-VBackground(cont’d)• Fouryearslater,inMayof2014,UCBerkeleyreleasedfrozenbaseuserspec– manytapeoutsandseveralresearchpublicationsalongtheway
• ThenameRISC-V(pronouncedrisk-five),waschosentorepresentthefifthmajorRISCISAdesigneffortatUCBerkeley– RISC-I,RISC-II,SOAR,andSPURwerethefirstfourprojectswiththeoriginalRISC-Ipublicationsdatingbackto1981
• InAugust2015,articlesofincorporationwerefiledtocreateanon-profitRISC-VFoundationtogoverntheISA
© Accellera Systems Initiative 5
What’sDifferentaboutRISC-V?
© Accellera Systems Initiative 6
• Simple– FarsmallerthanothercommercialISAs
• Clean-slatedesign– ClearseparationbetweenuserandprivilegedISA– Avoidsµarchitectureortechnology-dependentfeatures
• AmodularISA– SmallstandardbaseISA– Multiplestandardextensions
• Designedforextensibility/specialization– Variable-lengthinstructionencoding– Vastopcodespaceavailableforinstruction-setextensions
• Stable– Baseandstandardextensionsarefrozen– Additionsviaoptionalextensions,notnewversions
RISC-VBasePlusStandardExtensions
© Accellera Systems Initiative 7
• FourbaseintegerISAs– RV32E,RV32I,RV64I,RV128I– Only<50hardwareinstructionsneededforbase
• Standardextensions– M:Integermultiply/divide– A:Atomicmemoryoperations(AMOs+LR/SC)– F:Single-precisionfloating-point– D:Double-precisionfloating-point– G=IMAFD,“General-purpose”ISA– Q:Quad-precisionfloating-point– C:compressed16bencodingsfor32binstructions
• AlltheaboveareafairlystandardRISCencodinginafixed32-bitinstructionformat
RISC-V Reference Card ④ Optional Compressed Instructions: RVC
Category Name Fmt RV{32|64|128)I Base Fmt RV mnemonic Fmt RV{F|D|Q} (HP/SP,DP,QP) Category Name Fmt RVCLoads Load Byte I LB rd,rs1,imm R CSRRW rd,csr,rs1 I FL{W,D,Q} rd,rs1,imm Loads Load Word CL C.LW rd′,rs1′,imm
Load Halfword I LH rd,rs1,imm R CSRRS rd,csr,rs1 S FS{W,D,Q} rs1,rs2,imm Load Word SP CI C.LWSP rd,imm
Load Word I L{W|D|Q} rd,rs1,imm R CSRRC rd,csr,rs1 R FADD.{S|D|Q} rd,rs1,rs2 Load Double CL C.LD rd′,rs1′,imm Load Byte Unsigned I LBU rd,rs1,imm R CSRRWI rd,csr,imm R FSUB.{S|D|Q} rd,rs1,rs2 Load Double SP CI C.LWSP rd,imm
Load Half Unsigned I L{H|W|D}U rd,rs1,imm R CSRRSI rd,csr,imm R FMUL.{S|D|Q} rd,rs1,rs2 Load Quad CL C.LQ rd′,rs1′,immStores Store Byte S SB rs1,rs2,imm R CSRRCI rd,csr,imm R FDIV.{S|D|Q} rd,rs1,rs2 Load Quad SP CI C.LQSP rd,imm
Store Halfword S SH rs1,rs2,imm Change Level Env. Call R ECALL R FSQRT.{S|D|Q} rd,rs1 Load Byte Unsigned CL C.LBU rd′,rs1′,immStore Word S S{W|D|Q} rs1,rs2,imm R EBREAK R FMADD.{S|D|Q} rd,rs1,rs2,rs3 Float Load Word CL C.FLW rd′,rs1′,imm
Shifts Shift Left R SLL{|W|D} rd,rs1,rs2 R ERET R FMSUB.{S|D|Q} rd,rs1,rs2,rs3 Float Load Double CL C.FLD rd′,rs1′,imm Shift Left Immediate I SLLI{|W|D} rd,rs1,shamt R MRTS R FMNSUB.{S|D|Q} rd,rs1,rs2,rs3 Float Load Word SP CI C.FLWSP rd,imm
Shift Right R SRL{|W|D} rd,rs1,rs2 R MRTH R FMNADD.{S|D|Q} rd,rs1,rs2,rs3 Float Load Double SP CI C.FLDSP rd,imm
Shift Right Immediate I SRLI{|W|D} rd,rs1,shamt R HRTS R FSGNJ.{S|D|Q} rd,rs1,rs2 Stores Store Word CS C.SW rs1′,rs2′,imm Shift Right Arithmetic R SRA{|W|D} rd,rs1,rs2 Interrupt Wait for Interrupt R WFI R FSGNJN.{S|D|Q} rd,rs1,rs2 Store Word SP CSS C.SWSP rs2,imm Shift Right Arith Imm I SRAI{|W|D} rd,rs1,shamt MMU Supervisor FENCE R SFENCE.VM rs1 R FSGNJX.{S|D|Q} rd,rs1,rs2 Store Double CS C.SD rs1′,rs2′,imm
Arithmetic ADD R ADD{|W|D} rd,rs1,rs2 R FMIN.{S|D|Q} rd,rs1,rs2 Store Double SP CSS C.SDSP rs2,imm
ADD Immediate I ADDI{|W|D} rd,rs1,imm Category Name Fmt R FMAX.{S|D|Q} rd,rs1,rs2 Store Quad CS C.SQ rs1′,rs2′,imm SUBtract R SUB{|W|D} rd,rs1,rs2 Multiply MULtiply R R FEQ.{S|D|Q} rd,rs1,rs2 Store Quad SP CSS C.SQSP rs2,imm
Load Upper Imm U LUI rd,imm MULtiply upper Half R R FLT.{S|D|Q} rd,rs1,rs2 Float Store Word CSS C.FSW rd′,rs1′,imm Add Upper Imm to PC U AUIPC rd,imm MULtiply Half Sign/Uns R R FLE.{S|D|Q} rd,rs1,rs2 Float Store Double CSS C.FSD rd′,rs1′,imm
Logical XOR R XOR rd,rs1,rs2 MULtiply upper Half Uns R R FCLASS.{S|D|Q} rd,rs1 Float Store Word SP CSS C.FSWSP rd,imm
XOR Immediate I XORI rd,rs1,imm Divide DIVide R R FMV.S.X rd,rs1 Float Store Double SP CSS C.FSDSP rd,imm
OR R OR rd,rs1,rs2 DIVide Unsigned R R FMV.X.S rd,rs1 Arithmetic ADD CR C.ADD rd,rs1
OR Immediate I ORI rd,rs1,imm RemainderREMainder R R FCVT.{S|D|Q}.W rd,rs1 ADD Word CR C.ADDW rd',rs2'AND R AND rd,rs1,rs2 REMainder Unsigned R R FCVT.{S|D|Q}.WU rd,rs1 ADD Immediate CI C.ADDI rd,imm
AND Immediate I ANDI rd,rs1,imm R FCVT.W.{S|D|Q} rd,rs1 ADD Word Imm CI C.ADDIW rd,imm
Compare Set < R SLT rd,rs1,rs2 Category Name Fmt R FCVT.WU.{S|D|Q} rd,rs1 ADD SP Imm * 16 CI C.ADDI16SP x0,imm
Set < Immediate I SLTI rd,rs1,imm Load Load Reserved R LR.{W|D|Q} rd,rs1 R FRCSR rd ADD SP Imm * 4 CIW C.ADDI4SPN rd',imm
Set < Unsigned R SLTU rd,rs1,rs2 Store Store Conditional R SC.{W|D|Q} rd,rs1,rs2 R FRRM rd Load Immediate CI C.LI rd,imm
Set < Imm Unsigned I SLTIU rd,rs1,imm Swap SWAP R AMOSWAP.{W|D|Q} rd,rs1,rs2 R FRFLAGS rd Load Upper Imm CI C.LUI rd,imm
Branches Branch = SB BEQ rs1,rs2,imm Add ADD R AMOADD.{W|D|Q} rd,rs1,rs2 R FSCSR rd,rs1 MoVe CR C.MV rd,rs1
Branch ≠ SB BNE rs1,rs2,imm Logical XOR R AMOXOR.{W|D|Q} rd,rs1,rs2 R FSRM rd,rs1 SUB CR C.SUB rd',rs2'
Branch < SB BLT rs1,rs2,imm AND R AMOAND.{W|D|Q} rd,rs1,rs2 R FSFLAGS rd,rs1 SUB Word CR C.SUBW rd',rs2'
Branch ≥ SB BGE rs1,rs2,imm OR R AMOOR.{W|D|Q} rd,rs1,rs2 I FSRMI rd,imm Logical XOR CS C.XOR rd',rs2' Branch < Unsigned SB BLTU rs1,rs2,imm Min/Max MINimum R AMOMIN.{W|D|Q} rd,rs1,rs2 I FSFLAGSI rd,imm OR CS C.OR rd',rs2'
Branch ≥ Unsigned SB BGEU rs1,rs2,imm MAXimum R AMOMAX.{W|D|Q} rd,rs1,rs2 AND CS C.AND rd',rs2'
Jump & Link J&L UJ JAL rd,imm MINimum Unsigned R AMOMINU.{W|D|Q} rd,rs1,rs2 Category Name Fmt RV{F|D|Q} (HP/SP,DP,QP) AND Immediate CB C.ANDI rd',rs2'
Jump & Link Register I JALR rd,rs1,imm MAXimum Unsigned R AMOMAXU.{W|D|Q} rd,rs1,rs2 R FMV.{D|Q}.X rd,rs1 Shifts Shift Left Imm CI C.SLLI rd,imm
Synch Synch thread I FENCE R FMV.X.{D|Q} rd,rs1 Shift Right Immediate CB C.SRLI rd',imm
Synch Instr & Data I FENCE.I R FCVT.{S|D|Q}.{L|T} rd,rs1 Shift Right Arith Imm CB C.SRAI rd',imm
System System CALL I SCALL R FCVT.{S|D|Q}.{L|T}U rd,rs1 Branches Branch=0 CB C.BEQZ rs1′,imm System BREAK I SBREAK 16-bit (RVC) and 32-bit Instruction Formats R FCVT.{L|T}.{S|D|Q} rd,rs1 Branch≠0 CB C.BNEZ rs1′,imm
Counters ReaD CYCLE I RDCYCLE rd R FCVT.{L|T}U.{S|D|Q} rd,rs1 Jump Jump CJ C.J imm
ReaD CYCLE upper Half I RDCYCLEH rd CI Jump Register CR C.JR rd,rs1
ReaD TIME I RDTIME rd CSS R Jump & Link J&L CJ C.JAL imm
ReaD TIME upper Half I RDTIMEH rd CIW I Jump & Link Register CR C.JALR rs1
ReaD INSTR RETired I RDINSTRET rd CL S System Env. BREAK CI C.EBREAK
ReaD INSTR upper Half I RDINSTRETH rd CS SBCB UCJ UJ
Category Name
Convert to Int Unsigned
Swap Rounding Mode ImmSwap Flags Imm
3 Optional FP Extensions: RV{64|128}{F|D|Q}
Move Move from IntegerMove to Integer
Convert Convert from IntConvert from Int Unsigned
Convert to Int
Configuration Read StatRead Rounding Mode
Read FlagsSwap Status Reg
Swap Rounding ModeSwap Flags
REMU{|W|D} rd,rs1,rs2 Convert from Int UnsignedOptional Atomic Instruction Extension: RVA Convert to Int
RV{32|64|128}A (Atomic) Convert to Int Unsigned
DIV{|W|D} rd,rs1,rs2 Move Move from IntegerDIVU rd,rs1,rs2 Move to IntegerREM{|W|D} rd,rs1,rs2 Convert Convert from Int
MULH rd,rs1,rs2 Compare Float <MULHSU rd,rs1,rs2 Compare Float ≤MULHU rd,rs1,rs2 Categorize Classify Type
Optional Multiply-Divide Extension: RV32M Min/Max MINimumRV32M (Mult-Div) MAXimum
MUL{|W|D} rd,rs1,rs2 Compare Compare Float =
Redirect Trap to Hypervisor Negative Multiply-ADDHypervisor Trap to Supervisor Sign Inject SiGN source
Negative SiGN sourceXor SiGN source
Environment Breakpoint Mul-Add Multiply-ADDEnvironment Return Multiply-SUBtract
Trap Redirect to Supervisor Negative Multiply-SUBtract
SUBtractAtomic Read & Set Bit Imm MULtiply
Atomic Read & Clear Bit Imm DIVideSQuare RooT
Category NameCSR Access Atomic R/W Load Load Atomic Read & Set Bit Store Store Atomic Read & Clear Bit Arithmetic ADD
Atomic R/W Imm
① ② ③Base Integer Instructions (32|64|128) RV Privileged Instructions (32|64|128) 3 Optional FP Extensions: RV32{F|D|Q}
RV32I / RV64I / RV128I + M, A, F, D, Q, C
+14 Privileged
+ 8 for M
+ 11 for A
+ 34 for F, D, Q + 46 for C
8
© Accellera Systems Initiative
RISC-V Reference Card ④ Optional Compressed Instructions: RVC
Category Name Fmt RV{32|64|128)I Base Fmt RV mnemonic Fmt RV{F|D|Q} (HP/SP,DP,QP) Category Name Fmt RVCLoads Load Byte I LB rd,rs1,imm R CSRRW rd,csr,rs1 I FL{W,D,Q} rd,rs1,imm Loads Load Word CL C.LW rd′,rs1′,imm
Load Halfword I LH rd,rs1,imm R CSRRS rd,csr,rs1 S FS{W,D,Q} rs1,rs2,imm Load Word SP CI C.LWSP rd,imm
Load Word I L{W|D|Q} rd,rs1,imm R CSRRC rd,csr,rs1 R FADD.{S|D|Q} rd,rs1,rs2 Load Double CL C.LD rd′,rs1′,imm Load Byte Unsigned I LBU rd,rs1,imm R CSRRWI rd,csr,imm R FSUB.{S|D|Q} rd,rs1,rs2 Load Double SP CI C.LWSP rd,imm
Load Half Unsigned I L{H|W|D}U rd,rs1,imm R CSRRSI rd,csr,imm R FMUL.{S|D|Q} rd,rs1,rs2 Load Quad CL C.LQ rd′,rs1′,immStores Store Byte S SB rs1,rs2,imm R CSRRCI rd,csr,imm R FDIV.{S|D|Q} rd,rs1,rs2 Load Quad SP CI C.LQSP rd,imm
Store Halfword S SH rs1,rs2,imm Change Level Env. Call R ECALL R FSQRT.{S|D|Q} rd,rs1 Load Byte Unsigned CL C.LBU rd′,rs1′,immStore Word S S{W|D|Q} rs1,rs2,imm R EBREAK R FMADD.{S|D|Q} rd,rs1,rs2,rs3 Float Load Word CL C.FLW rd′,rs1′,imm
Shifts Shift Left R SLL{|W|D} rd,rs1,rs2 R ERET R FMSUB.{S|D|Q} rd,rs1,rs2,rs3 Float Load Double CL C.FLD rd′,rs1′,imm Shift Left Immediate I SLLI{|W|D} rd,rs1,shamt R MRTS R FMNSUB.{S|D|Q} rd,rs1,rs2,rs3 Float Load Word SP CI C.FLWSP rd,imm
Shift Right R SRL{|W|D} rd,rs1,rs2 R MRTH R FMNADD.{S|D|Q} rd,rs1,rs2,rs3 Float Load Double SP CI C.FLDSP rd,imm
Shift Right Immediate I SRLI{|W|D} rd,rs1,shamt R HRTS R FSGNJ.{S|D|Q} rd,rs1,rs2 Stores Store Word CS C.SW rs1′,rs2′,imm Shift Right Arithmetic R SRA{|W|D} rd,rs1,rs2 Interrupt Wait for Interrupt R WFI R FSGNJN.{S|D|Q} rd,rs1,rs2 Store Word SP CSS C.SWSP rs2,imm Shift Right Arith Imm I SRAI{|W|D} rd,rs1,shamt MMU Supervisor FENCE R SFENCE.VM rs1 R FSGNJX.{S|D|Q} rd,rs1,rs2 Store Double CS C.SD rs1′,rs2′,imm
Arithmetic ADD R ADD{|W|D} rd,rs1,rs2 R FMIN.{S|D|Q} rd,rs1,rs2 Store Double SP CSS C.SDSP rs2,imm
ADD Immediate I ADDI{|W|D} rd,rs1,imm Category Name Fmt R FMAX.{S|D|Q} rd,rs1,rs2 Store Quad CS C.SQ rs1′,rs2′,imm SUBtract R SUB{|W|D} rd,rs1,rs2 Multiply MULtiply R R FEQ.{S|D|Q} rd,rs1,rs2 Store Quad SP CSS C.SQSP rs2,imm
Load Upper Imm U LUI rd,imm MULtiply upper Half R R FLT.{S|D|Q} rd,rs1,rs2 Float Store Word CSS C.FSW rd′,rs1′,imm Add Upper Imm to PC U AUIPC rd,imm MULtiply Half Sign/Uns R R FLE.{S|D|Q} rd,rs1,rs2 Float Store Double CSS C.FSD rd′,rs1′,imm
Logical XOR R XOR rd,rs1,rs2 MULtiply upper Half Uns R R FCLASS.{S|D|Q} rd,rs1 Float Store Word SP CSS C.FSWSP rd,imm
XOR Immediate I XORI rd,rs1,imm Divide DIVide R R FMV.S.X rd,rs1 Float Store Double SP CSS C.FSDSP rd,imm
OR R OR rd,rs1,rs2 DIVide Unsigned R R FMV.X.S rd,rs1 Arithmetic ADD CR C.ADD rd,rs1
OR Immediate I ORI rd,rs1,imm RemainderREMainder R R FCVT.{S|D|Q}.W rd,rs1 ADD Word CR C.ADDW rd',rs2'AND R AND rd,rs1,rs2 REMainder Unsigned R R FCVT.{S|D|Q}.WU rd,rs1 ADD Immediate CI C.ADDI rd,imm
AND Immediate I ANDI rd,rs1,imm R FCVT.W.{S|D|Q} rd,rs1 ADD Word Imm CI C.ADDIW rd,imm
Compare Set < R SLT rd,rs1,rs2 Category Name Fmt R FCVT.WU.{S|D|Q} rd,rs1 ADD SP Imm * 16 CI C.ADDI16SP x0,imm
Set < Immediate I SLTI rd,rs1,imm Load Load Reserved R LR.{W|D|Q} rd,rs1 R FRCSR rd ADD SP Imm * 4 CIW C.ADDI4SPN rd',imm
Set < Unsigned R SLTU rd,rs1,rs2 Store Store Conditional R SC.{W|D|Q} rd,rs1,rs2 R FRRM rd Load Immediate CI C.LI rd,imm
Set < Imm Unsigned I SLTIU rd,rs1,imm Swap SWAP R AMOSWAP.{W|D|Q} rd,rs1,rs2 R FRFLAGS rd Load Upper Imm CI C.LUI rd,imm
Branches Branch = SB BEQ rs1,rs2,imm Add ADD R AMOADD.{W|D|Q} rd,rs1,rs2 R FSCSR rd,rs1 MoVe CR C.MV rd,rs1
Branch ≠ SB BNE rs1,rs2,imm Logical XOR R AMOXOR.{W|D|Q} rd,rs1,rs2 R FSRM rd,rs1 SUB CR C.SUB rd',rs2'
Branch < SB BLT rs1,rs2,imm AND R AMOAND.{W|D|Q} rd,rs1,rs2 R FSFLAGS rd,rs1 SUB Word CR C.SUBW rd',rs2'
Branch ≥ SB BGE rs1,rs2,imm OR R AMOOR.{W|D|Q} rd,rs1,rs2 I FSRMI rd,imm Logical XOR CS C.XOR rd',rs2' Branch < Unsigned SB BLTU rs1,rs2,imm Min/Max MINimum R AMOMIN.{W|D|Q} rd,rs1,rs2 I FSFLAGSI rd,imm OR CS C.OR rd',rs2'
Branch ≥ Unsigned SB BGEU rs1,rs2,imm MAXimum R AMOMAX.{W|D|Q} rd,rs1,rs2 AND CS C.AND rd',rs2'
Jump & Link J&L UJ JAL rd,imm MINimum Unsigned R AMOMINU.{W|D|Q} rd,rs1,rs2 Fmt RV{F|D|Q} (HP/SP,DP,QP) AND Immediate CB C.ANDI rd',rs2'
Jump & Link Register I JALR rd,rs1,imm MAXimum Unsigned R AMOMAXU.{W|D|Q} rd,rs1,rs2 R FMV.{D|Q}.X rd,rs1 Shifts Shift Left Imm CI C.SLLI rd,imm
Synch Synch thread I FENCE R FMV.X.{D|Q} rd,rs1 Shift Right Immediate CB C.SRLI rd',imm
Synch Instr & Data I FENCE.I R FCVT.{S|D|Q}.{L|T} rd,rs1 Shift Right Arith Imm CB C.SRAI rd',imm
System System CALL I SCALL R FCVT.{S|D|Q}.{L|T}U rd,rs1 Branches Branch=0 CB C.BEQZ rs1′,imm System BREAK I SBREAK 16-bit (RVC) and 32-bit Instruction Formats R FCVT.{L|T}.{S|D|Q} rd,rs1 Branch≠0 CB C.BNEZ rs1′,imm
Counters ReaD CYCLE I RDCYCLE rd R FCVT.{L|T}U.{S|D|Q} rd,rs1 Jump Jump CJ C.J imm
ReaD CYCLE upper Half I RDCYCLEH rd CI Jump Register CR C.JR rd,rs1
ReaD TIME I RDTIME rd CSS R Jump & Link J&L CJ C.JAL imm
ReaD TIME upper Half I RDTIMEH rd CIW I Jump & Link Register CR C.JALR rs1
ReaD INSTR RETired I RDINSTRET rd CL S System Env. BREAK CI C.EBREAK
ReaD INSTR upper Half I RDINSTRETH rd CS SBCB UCJ UJ
Category Name
Category Name
Convert to Int Unsigned
Swap Rounding Mode ImmSwap Flags Imm
3 Optional FP Extensions: RV{64|128}{F|D|Q}
Move Move from IntegerMove to Integer
Convert Convert from IntConvert from Int Unsigned
Convert to Int
Configuration Read StatRead Rounding Mode
Read FlagsSwap Status Reg
Swap Rounding ModeSwap Flags
REMU{|W|D} rd,rs1,rs2 Convert from Int Unsigned
Optional Atomic Instruction Extension: RVA Convert to IntRV{32|64|128}A (Atomic) Convert to Int Unsigned
DIV{|W|D} rd,rs1,rs2 Move Move from IntegerDIVU rd,rs1,rs2 Move to IntegerREM{|W|D} rd,rs1,rs2 Convert Convert from Int
MULH rd,rs1,rs2 Compare Float <MULHSU rd,rs1,rs2 Compare Float ≤MULHU rd,rs1,rs2 Categorize Classify Type
Optional Multiply-Divide Extension: RV32M Min/Max MINimumRV32M (Mult-Div) MAXimum
MUL{|W|D} rd,rs1,rs2 Compare Compare Float =
Redirect Trap to Hypervisor Negative Multiply-ADDHypervisor Trap to Supervisor Sign Inject SiGN source
Negative SiGN sourceXor SiGN source
Environment Breakpoint Mul-Add Multiply-ADDEnvironment Return Multiply-SUBtract
Trap Redirect to Supervisor Negative Multiply-SUBtract
SUBtractAtomic Read & Set Bit Imm MULtiply
Atomic Read & Clear Bit Imm DIVideSQuare RooT
Category NameCSR Access Atomic R/W Load Load Atomic Read & Set Bit Store Store Atomic Read & Clear Bit Arithmetic ADD
Atomic R/W Imm
① ② ③Base Integer Instructions (32|64|128) RV Privileged Instructions (32|64|128) 3 Optional FP Extensions: RV32{F|D|Q}
+ 4 for 64M/128M
RV32I / RV64I / RV128I + M, A, F, D, Q, C
+ 12 for 64I/128I
+ 11 for 64A/128A
+ 6 for 64{F|D|Q}/ 128{F|D|Q}
9
© Accellera Systems Initiative
RISC-VinEducation,newbooks!
© Accellera Systems Initiative 10
RISC-VFoundationOverview• IncorporatedAugust,2015asa501c6non-profitFoundation• MembershipAgreement&BylawsratifiedDecember2016• TheRISC-VISAandrelatedstandardsshallremainopenandlicense-freetoall
parties– RISC-VISAspecificationsshallalwaysbepubliclyavailableasanonlinedownload
• Thecompliancetestsuitesshallalwaysbepubliclyavailableasasourcecodedownload
• Toprotectthestandard,onlymembers(withcommercialRISC-Vproducts)oftheFoundationingoodstandingcanuse“RISC-V”andassociatedtrademarks,andonlyfordevicesthatpassthetestsintheopen-sourcecompliancesuitesmaintainedbytheFoundation
© Accellera Systems Initiative 11
FoundationOrganization• TheBoardofDirectorsconsistsofseven+members,whosereplacementsare
electedbythemembership• TheBoardcanamendtheBy-LawsoftheRISC-Vfoundationviaatwo-thirds
affirmativevote• TheBoardappointschairsofad-hoccommitteestoaddressissuesconcerning
RISC-V,andhasthefinalvoteofapprovaloftherecommendationofthead-hoccommittees.– TechnicalCommitteeChair–YunsupLee,SiFive– SecurityStandingCommitteeChair-HelenaHandschuh,Rambus– MarketingCommitteeChair–TedMarena,WesternDigital
• AllmembersofcommitteesmustbemembersoftheRISC-VFoundation
© Accellera Systems Initiative 12
RISC-VISA&FoundationSummary• ThefreeandopenRISC-VISAisenablinganewinnovationfrontierforallcomputingdevices
• StrongIndustrySupport– 150+members;Broadcommercialandacademicinterest
• RISC-VSummitregistrationisopen– December3-62018,SantaClara,CA
• RISC-VWorkshop– June2019,Zurich–furtherdetailstobeannouncedsoon
© Accellera Systems Initiative 13
RISC-VextendedwithCustomInstructions,VirtualPlatformforDesignandVerification
DuncanGraham,Sr.ApplicationsEngineerKevinMcDermott,VPMarketing
© Accellera Systems Initiative 14
ImperasIntroduction• Focusonsimulation,modeling,toolsforembeddedsoftware
– SoCdesignersthatneedearlysoftwaredevelopmentfordesignandtest– Softwaredevelopersthatneedplatformsbeforehardwareisavailable
• Founded2008,basedinThame,UK• ManagementBackground:Verilog,VCS,Verisity,Exemplar,Arm,MIPS• Businessmodelbasedontoolsandecosystempartnerships• ActivemembersofRISC-VFoundation
– TechnicalTaskGroups:vector,bitmanip,compliance
© Accellera Systems Initiative 15
MostadoptersofRISC-Vwanttoaddtheirowncustomextensioninstructions
• TraditionalISAchoicehasbeenhardifyouwanttoaddyourowncustomprocessorinstructionstoanISA
• RISC-Vasanopenstandardhasspecificregionsofinstructiondecodespacespecificallyallocatedforuserstoaddtheirowninstructions
• Therearemultipleissueswiththetoolsthatwillbeneeded…– Youneedtoevaluateeffectivenessandperformancegainsofnewinstructions– Challengeofhowtoefficientlyandsafelyaddnewinstructionstoexistingqualitysimulationmodels– Needtobeabletotrace,debugandanalyseapplicationsusingthenewinstructions– CompleteSoCtoolscoveringheterogeneous,multi-coreandmany-corecomputeconfigurations– Oftenneedtoprovidetodevelopers&customersmodels/platformswithoutissuesofGPLlicenses
© Accellera Systems Initiative 16
Inthistutorial…
© Accellera Systems Initiative 17
• SimulatorISSofRISC-Vwhichincludesthemodel+memory– JustliketheISSasusedintheRISCV.orgComplianceTaskGroupGitHubrepository
– ThoughinthistutorialweusetheImperasprofessionalsimulatorwhichallowsmodelandtoolextensions
§ Applicationsoftwareischaracterstreamencoder,basedonChaCha20encryptionalgorithm§ InstructionExtensionstoRISC-VcourtesyofCerberusSecurityLaboratoriesLtd§ https://cerberus-laboratories.com
• Tutorialwillshowaddingextensionforthemodelandanalysisincludingtimingestimation
Semihosted File I/O
RISC-V CPU model variant selection and
configuration
ISS (cpu+memory)
Application <cross>.elf
GDB Debugger
Flow to add new custom instructions
© Accellera Systems Initiative 18
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize C Application
Flow to add new custom instructions
© Accellera Systems Initiative 19
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize C Application • Design Instructions
• Add to Application • Add to Model • Add Timing
Develop New Custom Instructions
Flow to add new custom instructions
© Accellera Systems Initiative 20
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize C Application • Design Instructions
• Add to Application • Add to Model • Add Timing
Develop New Custom Instructions
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize New Instructions in Application
Flow to add new custom instructions
© Accellera Systems Initiative 21
• Instruction Coverage • Line Coverage • Instruction Performance • Generate PDF model doc
Optimize & Document model
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize C Application • Design Instructions
• Add to Application • Add to Model • Add Timing
Develop New Custom Instructions
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize New Instructions in Application
Flow to add new custom instructions
© Accellera Systems Initiative 22
• Check RISC-V Compliance • Use as reference for RTL Design Verification • Use in Imperas/OVP Platforms, EPKs
• Heterogeneous / Homogeneous • Multi-core, Many-core
• Imperas Multi-Processor Debug, VAP tools • Port OS, RTOS (Linux, FreeRTOS…) • Use in many simulation envs (inc. SystemC) • Deliver to end users
Release & Deploy
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize C Application • Design Instructions
• Add to Application • Add to Model • Add Timing
Develop New Custom Instructions
• Instruction Accurate Simulation • Trace / Debug • Timing Simulation • Function Timing / Profiling
Characterize New Instructions in Application
• Instruction Coverage • Line Coverage • Instruction Performance • Generate PDF model doc
Optimize & Document model
ChecklistandTasks• InstructionAccuratesimulationofCapplication• CycleApproximatesimulationofCapplication• ProfiletheCapplication• Addcustominstructionstoapplication• Addcustominstructionstomodel• CycleApproximatesimulationincludingcustominstructions• Profilecustominstructionsapplication• Tracecustominstructions• Debugcustominstructions• Documentingcustominstructions• Furthertoolsformodeldevelopers
© Accellera Systems Initiative 23
InstructionAccuratesimulationCapplication
© Accellera Systems Initiative 24
• CrosscompiledCapplicationtargetingRV32IM– Characterstreamencoder,withChaCha20encryption
algorithm• IAsimulation
– ImperasRISC-VISSwithconfigurablemodelofRISC-VspecificationselectingRV32IM
• Semihosting– Enablesbaremetalapplicationtoverysimplyaccess
hostI/O
Ø runsfast– Over1billioninstructionsasecond(standardPC)
• LinuxandWindowssupportedhostOS
ProfileCApplication
© Accellera Systems Initiative 25
• SameCapplication• IAsimulation+timingannotation• Withsampledprofilingwithcallstackanalysis
Ø Showscycleapproximatetimingofeachapplicationfunction
CycleApproximatesimulationincludingcustominstructions
© Accellera Systems Initiative 26
• IAsimulation+timingannotation+custominstructions– IncludestimingestimationforRV32IMprocessor– Needtoaddtimingestimationfornewcustom
instructions• SimulateusingCcodeapplicationwithinline
assemblerofcustomextensions• IAsimulator+timingtool+customextension
instructionlibrary
Ø Seeestimatedimprovementinthroughputofapplicationonnewprocessor
Profilecustominstructionsapplication
© Accellera Systems Initiative 27
• IAsimulation+timingannotation+custominstructionswithsampledprofiling
Ø Showswhereslowestfunctionis– Nowmuchfaster…
Ø Showsbenefitsofusingcustominstructions
Debug&Tracecustominstructions
© Accellera Systems Initiative 28
• ImperasMPDisEclipsebasedsourcecodedebugtool
• Candebugusingsourcelineorinstructionlevel
• Seenewcustominstructionsandanynewadditionalstateregisters
Documentcustominstructions
© Accellera Systems Initiative 29
• ImperastoolsautomaticallygenerateaprocessormodeldocumentPDF
• Includesallbasemodelregistersandanynewregisters
• Providesdetaileddocumentationofnewcustominstructions
Furthertoolsformodeldevelopers
© Accellera Systems Initiative 30
• Modelsourcelinecoverage– Toseehowcompletelythetestsexercisethemodel
CheckingRISC-Vcompliance• ImperasISScanbeusedtocheckRISC-Vspecificationcompliance• ImperasISSisusedintheRISCV.orgComplianceWorkingGroup’scompliancetestsisaversionoftheImperasISS
• WithyournewcustominstructionsmodelledyoucanruntheRISC-VFoundation’scompliancesuiteandensurethatyourprocessorisstillRISC-Vcompliant– TheofficialRISC-VCompliancesuiteisavailableat
• ComplianceTaskGroupGitHubrepositoryhttps://github.com/riscv/riscv-compliance
© Accellera Systems Initiative 31
Summary• Ifyouareaddingnewinstructionsorstatetoaprocessor–thenthetoolsand
designflowwillneedtocover:– Modelling,simulation,timing,tracing,debug,coverage,andprofilingofnewinstructions
– Documentation,andchecklistmethodology/solution• LeveragethetechnicalworkinggroupsoftheRISC-VFoundation
– ComplianceWorkingGroupGitHubrepositoryisausefulstartingpoint– EnsuredesignComplianceandfollowsimilarmethodologyonthenewinstructions
• Softwareteamscanstartportingsoftwaretothenewmodel/instructions• VirtualPlatformsalsoprovideabasisforearlysoftwaredevelopmentat
customersandpartnersaspre-salesevaluationanddevelopment
© Accellera Systems Initiative 32
ContactsandLinks• https://riscv.org
• Visitwww.imperas.comandwww.OVPworld.orgformoreinformation
• RISC-VFoundationComplianceSuite&riscvOVPsimdownloadhttps://github.com/riscv/riscv-compliance
• DuncanGraham,[email protected]• KevinMcDermott,[email protected]
© Accellera Systems Initiative 33
Questions
Finalizeslidesetwithquestionsslide
© Accellera Systems Initiative 34
RISC-VConfigurationandCustomization
ZdeněkPřikryl,ChrisJones
35 © Accellera Systems Initiative
WhoIsCodasip• LeadingproviderofRISC-VprocessorIP
– IntroduceditsfirstRISC-VprocessorinNovember2015– OffersitsownportfolioofRISC-Vprocessors(CodasipBk)– ProvidesuniquedesignautomationtoolsforeasymodificationofRISC-Vprocessors
• FoundingmemberofRISC-VFoundation(www.riscv.org)– MemberofseveralworkinggroupswithintheFoundation
• ActivecontributortoLLVMandotheropen-sourceprojects
© Accellera Systems Initiative 36 © Codasip Ltd.
ConfigurationvsCustomization• RISC-VoffersawiderangeofISAextensions:
– I/Eforintegerinstructions– Mformultiplicationanddivision– Cforcompactinstruction– WIP:B,P,V,…– andothers
• Configuration:SelectingmultipleISAextensions– Enabledbysomevendorsoropen-sourceprojects– Stillinsufficientforsomeapplicationdomains
© Accellera Systems Initiative 37 © Codasip Ltd.
CustomerUseCase
© Accellera Systems Initiative 38
Configuration ClockCycles1 Codesize2 SpeedupAgainstBase
Area(Gates)3 AreaExpansionAgainstBase
Base 1,764,256 232 16.0k
Base+SerialMultiplier 427,561 148 4.12x 19.7k 1.24x
Base+ParallelMultiplier 133,061 148 13.26x 26.2k 1.64x
Base+DSPExtensions 31,371 64 56.24x 38.7k 2.43x
Desig
nIte
ratio
ns
© Codasip Ltd.
1Fewerclockcycles→samesoftwaretakeslesstimetorun.2Smallercodesize(optimizedsoftware)→lessmemorysavesmoney.3Moregatesinadvancedcores→highercost.Here,only2xareaincreaseprovides50xperformancegain.
Performanceimprovementthroughhigh-leveloptimization:FIRimplementationinCwith20016-bitinputsamplesand1616-bitcoefficients
StandardApproachtoCustomization• ManuallyaddingnewinstructionstotheRISC-VISA:
– Modelandsimulatetheinstruction– Modifythecompiler– Addsupportinthedebugger– WriteVerilogtoimplementinhardware– Verify,verify,verify,…
• Challengingandtime-consuming• Automationdesirableforeachstepabove
© Accellera Systems Initiative 39 © Codasip Ltd.
AutomatizationApproachtoCustomization
© Accellera Systems Initiative 40
ProcessorModeling Softwareanalysis SDKSynthesis RTLSynthesis Verification
ImplementationModel
RTLModelsCASimulator,Profiler,Debugger
Application(s)/Programs(s)
C/C++Compiler
Assembler
Linker
IASimulator,Profiler,DebuggerFunctionalModel ReferenceModel
UVMVerification
© Codasip Ltd.
CodasipApproachtoCustomization
© Accellera Systems Initiative 41
CodasipStudio CodAL–processordescriptionlanguageelement i_mac { use reg as dst, src1, src2; assembler { “mac” dst “,” src1 “,” src2 }; binary { OP_MAC:8 dst src1 src2 0:9 }; semantics { rf[dst] += rf[src1] * rf[src2]; }; };
IntegratedprocessordevelopmentenvironmentSDKautomationStandards-basedtools&models
VerificationAutomationVSPandprocessorvalidation
RTLAutomationPowerfulHigh-levelSyntheses
© Codasip Ltd.
CodasipStudio:• Introducedin2014• Silicon-provenbymajor
vendors• Allowsforfast&easy
customizationofbaseinstructionset:– SinglecycleMAC– Floatingpoint– Customcryptofunctions– …
CodALModels
© Accellera Systems Initiative 42
CodAL Description
Semantics InstructionSet Resources μArch(s)
Instruction Accurate (IA)
Cycle Accurate (CA)
/*Multiplyandaccumulate:semantics dst+=src1*src2*/
elementi_mac{
useregasdst,src1,src2;assembler{“mac”dst“,”src1“,”src2};binary{OP_MAC:bit[8]dstsrc1src20:bit[9]};semantics{ rf[dst]+=rf[src1]*rf[src2];};
};
© Codasip Ltd.
• ProcessorIPatHighLevelofAbstraction• EasytounderstandC-likelanguage• Features:
– Canmodelarichsetofprocessorcapabilities
– Canimplementmultiplemicroarchitecturesinasinglemodel
• Usage:– UsedtomodelandverifyallCodasip
processors– ProvidedtoCodasipIPcustomersas
astartingpointfortheirprocessoroptimizationsandmodifications
BISAExtension• Bitmanipulationinstructions,ca30:
– Bitinsertandextract– Byteswapping– Rotations– Bitswapping/shuffling– Zero/onecounters– ...
• Notyetratified– Mustbeimplementedascustomextensions
© Accellera Systems Initiative 43 © Codasip Ltd.
FunctionalModel
• WritteninCodAL– in10daysbyasingleengineer
• 900LOC• Softwaredevelopmentkit(SDK)automaticallygeneratedbyStudio,including– Ccompiler– Instructionsetsimulator(ISS)– Profilerforcheckingtheimpactoftheextensions
© Accellera Systems Initiative 44 © Codasip Ltd.
ImplementationModel• WritteninCodAL
– in3weeksbyasingleengineer• 1500LOC• Hardwaredesignkit(HDK)automaticallygeneratedbyStudio,including– RTL– Testbench– UVM-basedverificationenvironment
© Accellera Systems Initiative 45 © Codasip Ltd.
Verification• Consistencychecker• Randomassemblerprogramgenerator• UVMVerificationEnvironment
– ForcheckingthatRTLcorrespondstospecification(inthiscase,IAmodeldefinition)– EnvironmentinSystemVeriloggeneratedautomaticallyfromCodasipStudio
© Accellera Systems Initiative 46
Equivalence
Cycle-accurateCodALProcessorModel
Instruction-accurateCodALProcessorModel
SynthesizableRTL
ReferenceModel
TestCases
© Codasip Ltd.
Conclusion• Configurability
– Useful,butoftennotsufficient• Customization
– WhenthebestPPAforyourapplicationdomainisrequired• Standard(manual)approachforcustomISAextensions
– Error-proneandtime-consumingCodasipoffersaneasy,automatizedwaytoaddyoursecretsauce:
– CustomISAextensions– Microarchitecturalimprovements
Example:BISAextension,supportedbySDKandRTL,doneinacoupleofweeks.
© Accellera Systems Initiative 47 © Codasip Ltd.
Questions
© Accellera Systems Initiative 48
PostSiliconDebugandAnalyticsforRISC-VBasedSoCs
PeterShieldsUltraSoCTechnologiesLtd.DVConEuropeOctober,2018
© Accellera Systems Initiative 49
CorporateOverview• WeareaproviderofSoCanalyticssolutionsconsistingofon-chipRTLIPand
software• VC-fundedandbasedinCambridgeUK• £4.7MVCroundin2017withadditionofAlbertoSangiovanni-Vincentelli• 25patentsgranted+16pending• Seasonedmanagementteam• Keypartners&ecosystem• Siliconproventechnologyinmultiplecustomerdesigns• Revenue,blue-chipcustomers,repeatbusiness
50 © Accellera Systems Initiative
Acoherentarchitecturetodebug,develop,optimize&secure– FullSoCvisibility,HW&SW– Supportallarchitectures:FreedomofIPselection– Real-time&non-intrusive– Advancedanalytics&forensics– Power/Performanceoptimization– “inlife”analytics&SLAcompliance– SupportsFunctionalSafety– SupportsBareMetalSecurity™– High-speeddebugoverUSBorSerDes
On-chipAnalyticsforSoC
51 © Accellera Systems Initiative
xtensa
AHighLevelViewofSoC
52
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogicDSP
System Block
UltraSoC IP
© Accellera Systems Initiative
xtensa
ProcessorControlandTrace
53
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
TraceReceiver PAM PAM Trace
EncoderPortfolio of
Analytic Modules
DSP
System Block
UltraSoC IP
© Accellera Systems Initiative
xtensa
MessagingSubsystem
54
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
TraceReceiver PAM PAM Trace
Encoder
Message Engine
Message Engine
Portfolio of Analytic Modules
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
© Accellera Systems Initiative
xtensa
TransactionAwareBusMonitoring
55
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder
Message Engine
Message Engine
Portfolio of Analytic Modules
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
© Accellera Systems Initiative
xtensa
AdditionalMonitors
56
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
Portfolio of Analytic Modules
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
© Accellera Systems Initiative
xtensa
ControlandDataOffChip
57
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
JTAGComm
Portfolio of Analytic Modules
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
© Accellera Systems Initiative
xtensa
HighSpeedCommunicators
58
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
JTAGComm
USBComm
UniversalStreamingComm
Portfolio of Analytic Modules
Family of Communicators
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
© Accellera Systems Initiative
xtensa
On-ChipDataStorageandControl
59
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
Portfolio of Analytic Modules
Family of Communicators
Flexible & Scalable Message Fabric
System Block
UltraSoC IP
DSP
SystemMemoryBuffer
© Accellera Systems Initiative
IntelligentAnalyticModules
60
Use filters, cross-triggers and bursting
• TakeaBusMonitorasanexample• Configurablenumberof
– filters– counters– tracebuffersize
• Run-timeprogrammability– Filtermatchingfortriggering– Filtermatchingforcounting– Filteringforbustrace
• Gatherstatistics(best,worst,average)• Onlymeaningfulinformationsent• Reducesbandwidth&datavolume• Focusonwhatisrelevant
AXI Bus
Fullspeedbusmonitoring
Buffering – parameterisable
Communicator, e.g. USB
Very high bandwidth
Low bandwidth
Very low bandwidth
High bandwidth –parameterisable
High bandwidth
Very high bandwidth
Matches Counts Trace
© Accellera Systems Initiative
Softwaretoolsfordata-driveninsights
61
Eclipse based UltraDevelop IDE
RISC-VCPU
MultipleotherCPUs
singlestep&breakpointCPUcode
Real-timeHWData
SW&HWinonetool RISC-V
instructiontrace© Accellera Systems Initiative
RISC-VRunControl• Runcontrolincludeshalt,resume,read/writeofRAMandregisters,andsettingandclearingofbreakpoints.– Extensionsincludewatchpoints,runningarbitrarycode,semi-hostingandreversedebugging.
• ProposeddebugstandardforRISC-V– DebugModule– DebugTransportMechanism
• TransportcouldbeJTAG,Bus-mappedorother(USBetc)• https://github.com/riscv/riscv-debug-spec
© Accellera Systems Initiative 62
RISC-VTraceEncoding• ProcessorTraceTaskGroup
– standardizebothahardwareinterfacetotheRISC-Vcoreandapacket/dataformat– commercialandopensourcetraceencoders– provideenoughinformationforinstructiontrace– in-orderandout-of-ordercoreswithextensions– standardizethedataformatforcompressedbranchtracesothatprogramflowcanbereconstructedbydebuggingtools
• ProposedstandardforRISC-Vconsistsof– TraceInterface– TraceEncodingalgorithm
• Efficiencyandimpactontracebandwidth• https://github.com/riscv/riscv-trace-spec
© Accellera Systems Initiative 63
RISC-VTraceEncoder• Comparatorsandfilters
– Whatandwhentotrace
• TraceEncodingalgorithm• TraceBuffer• Counters
– Statistics– Performance
© Accellera Systems Initiative 64
Message & Event Interface
Message input Message output
Counters
Time Interval Timer
i-trace i/f d-trace i/f
System domain
Debug domain
status gpo
Filters
TraceTrace
ComparatorsComparatorsOptional
Standard
Key:
ExampleDesign
65
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
System Block
UltraSoC IP
DSP
SystemMemoryBuffer
© Accellera Systems Initiative
DDRBandwidthExample
66
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
SystemMemoryBuffer
DSP
WhydosomeDMAtransferstaketoolong?
© Accellera Systems Initiative
DDRBandwidthExample
67
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
SystemMemoryBuffer
DSP
WhydosomeDMAtransferstaketoolong?
ConfigureBusMonitortoreportMin/Max/AvgbandwidthstoDRAM
controllerfrom2CPUs&DSP
© Accellera Systems Initiative
DDRBandwidth
68
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
SystemMemoryBuffer
DSP
WhydosomeDMAtransferstaketoolong?
ConfigureBusMonitortoreportMin/Max/AvgbandwidthstoDRAM
controllerfrom2CPUs&DSP
0.00E+001.00E+082.00E+083.00E+084.00E+085.00E+086.00E+087.00E+088.00E+089.00E+08
Effe
ctiv
e B
/s
Time in ns
Windowed DDR traffic
DSP1 CPU1 CPU2
Aggregate bandwidth within spec (570Mbps avg)
© Accellera Systems Initiative
DDRBandwidth
69
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
SystemMemoryBuffer
DSP
WhydosomeDMAtransferstaketoolong?
ConfigureBusMonitortoreportMin/Max/AvgbandwidthstoDRAM
controllerfrom2CPUs&DSP
0.00E+001.00E+082.00E+083.00E+084.00E+085.00E+086.00E+087.00E+088.00E+089.00E+08
Effe
ctiv
e B
/s
Time in ns
Windowed DDR traffic
DSP1 CPU1 CPU2
Combined peak bandwidth >2Gbps
© Accellera Systems Initiative
CrossTriggering-Deadlock
70
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
DSP
SystemMemoryBuffer
Whatwashappeningwhen
thesystemdeadlocked?
© Accellera Systems Initiative
CrossTriggering-Deadlock
71
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
DSP
SystemMemoryBuffer
Whatwashappeningwhen
thesystemdeadlocked?
ConfigureStatusMonitortodetectillegalscenario(locksequence)
© Accellera Systems Initiative
CrossTriggering-Deadlock
72
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
DSP
SystemMemoryBuffer
Whatwashappeningwhen
thesystemdeadlocked?
ConfigureStatusMonitortodetectillegalscenario(locksequence)
ConfigurePAMstohaltCPUsonevent
ConfigurePAMstohaltCPUsonevent
ConfigureBUSMonitortoreporttrace
“TO”theevent
© Accellera Systems Initiative
CrossTriggering-Deadlock
73
Interconnect (AXI, ACE, ACE-lite, OCP, NoC)
GPUDRAMcontroller
CustomLogic
BusMon
TraceReceiver PAM PAM Trace
Encoder PAM StaticInstrumentation DMA Status
Monitor
Message Engine Message Engine Message Engine
Message Engine
AXIComm
JTAGComm
USBComm
UniversalStreamingComm
DSP
SystemMemoryBuffer
Whatwashappeningwhen
thesystemdeadlocked?
ConfigureStatusMonitortodetectillegalscenario(locksequence)
ConfigurePAMstohaltCPUsonevent
ConfigurePAMstohaltCPUsonevent
ConfigureBUSMonitortoreporttrace
“TO”theevent
© Accellera Systems Initiative
• Heterogeneousarchitectures• Non-intrusive,wire-speed• High-speedanalyticsoverUSB/SerDes• Debug,forensics,optimization
– pre-silicon&post-silicon• In-lifesystemmonitoring• CooperationwithinRISC-Vcommunityiscritical
74
RISC-V Post Silicon Analytics Summary
© Accellera Systems Initiative
Questions
© Accellera Systems Initiative 76