CUSTOMIZATION OF A RISC-V PROCESSOR TO ACHIEVE DSP PERFORMANCE GAIN
CODASIP STUDIO
• Set the best ratio of power consumption and performance
• Easily add optional subsets and features
• Fine-tune the processor for the intended application
• Differentiate and gain a competitive advantage!
CodAL: processor description language
element i_mac {
    use reg as dst, src1, src2;
    assembler { "mac" dst "," src1 "," src2 };
    binary { OP_MAC:8 dst src1 src2 0:9 };
    semantics {
        rf[dst] += rf[src1] * rf[src2];
    };
};
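The behaviour described by the i_mac element can be sketched outside CodAL as a small Python model. This is an illustration only, not Codasip tooling. It assumes 32-bit registers, 5-bit register fields (32 GPRs, so that 8 + 3×5 + 9 = 32 bits), and that the fields of the binary section are packed from the most significant bit down. The opcode value 0x5A is hypothetical; the slide does not give the value of OP_MAC.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def mac(rf, dst, src1, src2):
    """Model of the semantics section: rf[dst] += rf[src1] * rf[src2]."""
    rf[dst] = (rf[dst] + rf[src1] * rf[src2]) & MASK32  # wrap to 32 bits
    return rf


def encode_mac(op_mac, dst, src1, src2):
    """Pack OP_MAC:8 dst src1 src2 0:9 into one 32-bit word.

    Assumes 5-bit register fields and MSB-first field order.
    """
    for reg in (dst, src1, src2):
        if not 0 <= reg < 32:
            raise ValueError("register index out of range")
    return ((op_mac & 0xFF) << 24) | (dst << 19) | (src1 << 14) | (src2 << 9)


rf = [0] * 32
rf[1], rf[2], rf[3] = 10, 3, 4
mac(rf, 1, 2, 3)                       # mac r1, r2, r3
word = encode_mac(0x5A, 1, 2, 3)       # 0x5A is a hypothetical opcode value
print(rf[1], hex(word))
```

The point of the single element is that the assembler syntax, the encoding, and the semantics are described in one place.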
Integrated processor development environment
• Toolchain automation: standards-based tools & models
• Verification automation: VSP and processor validation
• RTL automation: powerful high-level synthesis
Codasip Studio:
• Introduced in 2014
• Silicon-proven by major vendors
• Allows for fast & easy customization of the base instruction set:
  • Single-cycle MAC
  • Floating point
  • Custom crypto functions
  • Non-standard data types
  • … and many others
1. STEP: BK SELECTION
The Berkelium series
Base versions:
• Bk1 – ultra-low cost option
• Bk3 – all-purpose
• Bk5 – best performance
• Bk5-64 – high data bandwidth, energy efficient (64-bit variant, new in November 2017!)
All fully compliant with the RISC-V specification
All fully customizable
[Chart axes: "Performance" (vertical) vs. "Cost, Power, Consumption" (horizontal)]
Selection of low-power, high-performance options for any design
2. STEP: ISA CONFIGURATION
[Figure: available ISA configurations for the Bk1, Bk3, and Bk5 cores, built from the options below]
• I = integer ISA, 32 GPRs
• E = integer ISA, 16 GPRs
• M = multiplication extension
• C = compressed instructions
• F = floating-point ISA
• p = parallel multiplier
• d = JTAG debug
3. STEP: NEW INSTRUCTIONS
4. STEP: AUTOMATED GENERATION
• Processor modeling
• Software analysis
• SDK synthesis
• RTL synthesis
• Verification
Input:
• Codasip IP or user-specified architecture in CodAL
  • Instruction Accurate Model (IA)
  • Cycle Accurate Model (CA)
• Application(s)/Program(s)
SDK:
• C/C++ Compiler
• Assembler
• Linker
• IA Simulator, Profiler, Debugger
• CA Simulator, Profiler, Debugger
RTL:
• RTL
Verification:
• Universal Verification Methodology (UVM) Environment
• Reference Model
THE MICROSEMI USE CASE: AUDIO PROCESSING SOLUTION FOR IOT
Requirements:
• Low power
• Low cost
• Possibility to create derivative designs to meet diverse requirements
• Reduced time-to-market
• Performance improvement
THE MICROSEMI USE CASE: STEPS
1. STEP: Berkelium core selection
  • Bk3
2. STEP: ISA configuration
  • Bk3-I = enabled integer instructions only
  • Bk3-IM = enabled multiplication/division instructions
  • Bk3-IM-p = enabled multiplication/division + parallel HW multiplier
3. STEP: new instructions
  • new DSP instructions
4. STEP: automated generation
  • SDK + RTL + UVM automatically generated for Bk3-IM-p + DSP
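The configuration names used in the steps (Bk3-I, Bk3-IM, Bk3-IM-p) can be read with the legend from the ISA configuration slide. The sketch below assumes the naming scheme is core, then ISA letters, then optional hardware options, separated by hyphens. That scheme is inferred from the three names above and is not a documented Codasip format; the function name describe_config is hypothetical.

```python
# Legend as given on the ISA configuration slide.
ISA_OPTIONS = {
    "I": "integer ISA, 32 GPRs",
    "E": "integer ISA, 16 GPRs",
    "M": "multiplication extension",
    "C": "compressed instructions",
    "F": "floating-point ISA",
}
HW_OPTIONS = {
    "p": "parallel multiplier",
    "d": "JTAG debug",
}


def describe_config(name):
    """Split a name such as 'Bk3-IM-p' into (core, list of features).

    Assumed layout: <core>-<ISA letters>[-<hardware options>].
    Raises KeyError for letters that are not in the legend.
    """
    core, isa, *hw = name.split("-")
    features = [ISA_OPTIONS[letter] for letter in isa]
    features += [HW_OPTIONS[letter] for part in hw for letter in part]
    return core, features


for config in ("Bk3-I", "Bk3-IM", "Bk3-IM-p"):
    core, features = describe_config(config)
    print(core, features)
```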
THE MICROSEMI USE CASE: RESULTS (TABLE)

Codasip RISC-V Processor          Clock Cycles   Code size   Area (Gates)
Bk-3 Base Configuration              1,764,256         232          16.0k
Bk-3 Base + Serial Multiplier          427,561         148          19.7k
Bk-3 Base + Parallel Multiplier        133,061         148          26.2k
Bk-3 Base + DSP Extensions              31,371          64          38.7k

Difference against the lower configuration (speedup / area):

Codasip RISC-V Processor          vs. RV32-I       vs. RV32-IM      vs. RV32-IM-p
Bk-3 Base + Serial Multiplier     4.12x / 1.24x    -                -
Bk-3 Base + Parallel Multiplier   13.26x / 1.64x   3.21x / 1.32x    -
Bk-3 Base + DSP Extensions        56.24x / 2.43x   13.62x / 1.96x   4.24x / 1.48x
Design iterations = 3 days
• Performance improvement
• No advanced manufacturing processes
• No increase in clock frequency
• Took only days
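The speedup and area ratios can be recomputed from the cycle counts and gate counts in the results table. The sketch below assumes that the RV32-I, RV32-IM, and RV32-IM-p baselines correspond to the base, serial multiplier, and parallel multiplier rows, in that order. Because the gate counts are given rounded to 0.1k, recomputed ratios may differ from the published ones in the last digit.

```python
# (configuration, clock cycles, area in thousands of gates), from the table.
RESULTS = [
    ("Bk-3 Base Configuration", 1_764_256, 16.0),
    ("Bk-3 Base + Serial Multiplier", 427_561, 19.7),
    ("Bk-3 Base + Parallel Multiplier", 133_061, 26.2),
    ("Bk-3 Base + DSP Extensions", 31_371, 38.7),
]


def ratios(results):
    """Return {(config, baseline): (speedup, area_ratio)} for every lower baseline."""
    out = {}
    for i, (name, cycles, area) in enumerate(results):
        for base_name, base_cycles, base_area in results[:i]:
            speedup = base_cycles / cycles   # fewer cycles means higher speedup
            area_ratio = area / base_area    # more gates means higher area ratio
            out[(name, base_name)] = (speedup, area_ratio)
    return out


for (name, base), (speedup, area_ratio) in ratios(RESULTS).items():
    print(f"{name} vs. {base}: {speedup:.2f}x speedup, {area_ratio:.2f}x area")
```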
THE MICROSEMI USE CASE: RESULTS (GRAPH)
Performance improvements for FIR filter using Studio*
* Implementing RISC-V for IoT applications, Dan Ganousis & Vijay Subramaniam, Design Automation Conference 2017
[Figure: cycles (vertical axis, 0 to 2,000,000) versus gate count (horizontal axis, 0 to 45,000) for Base Bk3, Bk3 + serial mult, Bk3 + parallel mult, and Bk3 + DSP extensions]
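To show why multiplier and MAC support matter for this benchmark, here is a generic direct-form FIR filter. It is a minimal sketch, not the benchmark code: the slides do not give the filter length, the data types, or the DSP instructions that were added. The inner loop performs one multiply-accumulate per tap.

```python
def fir(samples, taps):
    """Direct-form FIR filter: y[n] = sum over k of taps[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    """
    output = []
    for n in range(len(samples)):
        acc = 0
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate per tap
        output.append(acc)
    return output


# A 2-tap filter with unit coefficients sums each sample with its predecessor.
print(fir([1, 2, 3, 4], [1, 1]))
```

Each output sample costs one multiply-accumulate per tap, so the cost of a multiplication dominates the cycle count.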