A Synthesizeable VHDL Model of the 1750A Integer Subset
Robert B. Reese/ Vince Sanders
Microsystems Prototyping Lab
Mississippi State/NSF Engineering Research Center
[email protected], http://www.erc.msstate.edu/mpl
2/98 BR/VS - MPL -1750A 2
Introduction
Robert Reese, PhD EE, TAMU 1985
– MCC CAD Program '85-88
– Joined MSU ECE faculty 09/89
Microsystems Prototyping Lab
– Associated with MSU/NSF ERC
– 3 faculty, 2 full-time engineers, MS/PhD grad students
– Current projects: Mixed Signal VLSI, VHDL modeling, ECAD via the WWW
Presentation Outline
Goals
1750A Overview
F9450 Implementation
Model Structure
Model Testing
Synthesis Results
What's left...
Goals
Synthesizeable Model of 1750A for Legacy Replacement Experiments
Constraints
– Not much time (start 8/15/97, finish 12/31/97)
– Unfamiliarity with 1750A
Decided on
– Use F9450 Implementation (D. Barker)
– No floating point
– Would not try to duplicate F9450 instruction cycle counts
Environment
Sun SparcStations, Solaris OS
Model Tech VHDL Simulator (Mentor qhdl/qhsim)
Synopsys synthesis
Sparcserver with 8 x 250 MHz CPUs, 2 GB RAM used for regression tests
Mil-Std-1750A
True CISC (Complicated Inst Set Comp)
– Large number of instructions
– Many addressing modes (e.g., load has 8 addressing modes)
– 16 bit (single precision) and 32 bit (double precision) operations
– 16 General Purpose Registers, Status, Fault, Pending Interrupt, Interrupt Mask
– Optional Extended Addressing capability
1750A (cont.)
Separate set of IO instructions
– 13 required, 40 optional
– IO instructions access optional assets such as timers, MMU
Console Mode Operation
– Allows external hardware access to internal registers
– 14 console operations
16 Interrupt Sources
Mil-Std-1750A does not specify bus interface
Fairchild F9450
Complete Mil-Std-1750A implementation
Shared External Addr/Data Bus (IB)
– Arbitrated bus access
– Wait signals for both addr & data
Shortest instruction 4 clocks (logic ops), longest integer op 245 clocks (dbl precision integer divide)
Microprogrammed control
1750A VHDL Model
Implements 173 opcodes
– All integer operations
– Required IO + 2 optional IO
– Console Mode Enter, Examine Register, Continue
Implements F9450 Bus functionality
Instruction cycle counts same or less than F9450
Synthesizeable
Bus Diagram for Model
[Figure: bus diagram. Blocks shown: IB, IBADDR, MDR, IR, IR_exe, Decode Logic, CONTROL FSM, IC, IC_old, MAR, Inc Logic, A, B, C, RF CORE, Constant Gen Logic, ALU, SW, PI, FLT, FMK, IMK, XH, XL, YH, YL. Internal buses: ABUS, BBUS.]
Register Definition (all registers 16 bits)
IR       instruction register, destination for instruction being fetched
IR_exe   instruction currently being executed
MDR      memory data register, data buffer to/from IB
IBADDR   address register, drives IB
IC       instruction counter of next instruction
IC_old   instruction counter for currently executing instruction
MAR      memory address register, operand address
A, B     buffer registers for data read from RF core
C        buffer register for data written to RF core
FLT      fault register
SW       status word register
PI       pending interrupt register
FMK      fault mask register
IMK      interrupt mask register
XH, XL   temporary registers used in Mult, Div, IO ops
YH, YL   temporary registers used in Mult, Div, IO ops
Some smaller miscellaneous registers are not shown on the diagram. They are used for RF address computation, constant block addressing, etc.
Entity Hierarchy
cpu1750a - tristate signals to pads here
cpucore - structural
biu - external bus interface
aproc - structural
incdec - increment,decrement
aproclogic - IC, MAR
dpath - structural
rf - structural
rfcore - 16 GPR regs (latches)
rflogic - buffer regs for in/out data, RF addressing
alulogic - all ALU functions except +/-
addsub - ALU adder/subtractor
constants - constants generation
fault - interrupt logic
ioproc - temp regs for IO, mul/div ops
decode - opcode decode
control - structural
fsm0 - fsm nstate logic
fsm1 - fsm nstate logic
fsm2 - fsm nstate logic
…..
fsm6 - fsm nstate logic
merge - merge for fsm0:fsm6 outputs
cstate - fsm state registers
Comments on Model Hierarchy
Often created separate entities for purposes of hardware mapping
– ALU split into alulogic and addsub
  • alulogic is random logic
  • addsub implementation technology dependent (i.e., X4000 fast carry chain versus standard cell CLA implementation)
– Register file split into rfcore (16 GPRs) and rflogic
  • X4000E CLB DPRAM good for RFCORE impl.
Control (#states = 547)
[Figure: control block diagram. FSM0 through FSM6 feed a MERGE (OR) block. Outputs: Dpath Ctrl (unregistered; 6 to BIU, 2 to Fault), Dpath Ctrl (registered, 83 signals), FSM State and Flags (registered, 53 signals).]
Comments on Control
547 states in FSM (3.2 states average per opcode)
– States NOT distributed equally (logic ops < 1.0 unique states per opcode, VIO required 24 states)
Unregistered signals from MERGE go to shallow logic (BIU fsm) or are immediately registered (FAULT) at destination
Comments on Control (cont.)
FSM split for easier synthesis
Efficient synthesis requires 2 step process
– Synthesis of indiv. blocks to gates
– Flatten gate netlist from CONTROL down and resynthesize to remove gates due to MERGE block
FSM implemented 1-level subroutine capability to increase state sharing
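The split-and-merge arrangement described above can be illustrated with a small behavioral sketch, written in Python rather than VHDL so it can be run directly. Everything named here is hypothetical: the state names, the control-bit assignments, the one-hot state encoding, and the use of only two partitions (the model has seven, fsm0 to fsm6, covering 547 states). The sketch also assumes that a partition which does not own the current state drives all zeros, which is what would let a plain OR recover the active partition's outputs.

```python
# Hypothetical control-signal bit positions (not taken from the model).
LOAD_A = 0b001
LOAD_B = 0b010
ALU_ADD = 0b100

# Each partition maps the states it owns to (next_state, control_bits).
# State names and transitions are illustrative only.
FSM0 = {
    "FETCH": ("DECODE", LOAD_A),
    "DECODE": ("EXEC_ADD", LOAD_B),
}
FSM1 = {
    "EXEC_ADD": ("FETCH", ALU_ADD),
}
PARTITIONS = [FSM0, FSM1]
STATES = ["FETCH", "DECODE", "EXEC_ADD"]


def encode(state):
    """One-hot encode a state name (assumed encoding) so next states can be OR-ed."""
    return 1 << STATES.index(state)


def partition_outputs(partition, state):
    """Return (next_state_bits, control_bits); all zeros if the state is not owned."""
    if state not in partition:
        return 0, 0
    next_state, control = partition[state]
    return encode(next_state), control


def merge(state):
    """OR together the outputs of every partition, in the manner of a MERGE block."""
    next_bits, control = 0, 0
    for partition in PARTITIONS:
        n, c = partition_outputs(partition, state)
        next_bits |= n
        control |= c
    return next_bits, control


def step(state):
    """Advance one clock: decode the merged one-hot next state back to a name."""
    next_bits, control = merge(state)
    return STATES[next_bits.bit_length() - 1], control
```

Because only one partition is non-zero in any given state, the OR adds no behavior of its own, which is consistent with the slide's point that flattening and resynthesizing can remove the gates it contributes.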
Model Testing
VHDL Testbench has cpu1750a + memory + stimulus
Unix-based sim1750/as1750 for producing golden results
K. Hill provided SEAFAC* VSW 1750A assembly tests (circa 1984)
– 272 non-floating point tests
* Systems Engineering Avionics FACility Verification SoftWare
SEAFAC Tests
Separate ASM file for each instruction/addressing mode
– lubi5131.asm: load from upper byte, memory indirect indexed
– lubi5130.asm: load from upper byte, memory indirect
Multiple operand data sets
– lubi5130 contains 18 operand sets
Result, flags, interrupt bits checked
Regression Test System
Perl script which would
– Read original SEAFAC ASM file, convert to be compatible with as1750/sim1750
– Run sim1750 to produce golden result
– Run VHDL simulation to produce test result
– Indicate pass/fail; if fail, indicate operand set(s) which failed
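The final comparison step of such a flow might look like the sketch below. It is written in Python for illustration, not taken from the Perl script, and it covers only the pass/fail reporting: the conversion and simulator runs are not modeled. The record layout (result, flags, interrupt bits per operand set) follows the items the SEAFAC tests check, but the function names and the sample values are hypothetical.

```python
def failed_operand_sets(golden, test):
    """Return 1-based indices of operand sets whose records differ.

    Each record is assumed to be a (result, flags, interrupt_bits) tuple.
    """
    if len(golden) != len(test):
        raise ValueError("golden and test runs have different operand-set counts")
    return [i for i, (g, t) in enumerate(zip(golden, test), start=1) if g != t]


def report(test_name, golden, test):
    """Format a one-line pass/fail summary naming any failing operand sets."""
    failed = failed_operand_sets(golden, test)
    if not failed:
        return f"{test_name}: PASS"
    return f"{test_name}: FAIL (operand sets {', '.join(map(str, failed))})"


# Made-up records for illustration only.
golden_run = [(0x0012, 0x2, 0x0000), (0xFFFF, 0x1, 0x0000), (0x0000, 0x4, 0x0000)]
vhdl_run = [(0x0012, 0x2, 0x0000), (0xFFFF, 0x9, 0x0000), (0x0000, 0x4, 0x0000)]
print(report("example_test", golden_run, vhdl_run))
```

Comparing whole records rather than only the result word means a flag-only mismatch, such as a wrong carry flag, is still reported against the specific operand set.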
Regression Test Results
Regression Tests run against behavioral model and synthesized netlist model
Of the 272 Tests:
– 220 passed
– 48 could not be automatically converted or were incompatible with sim1750
– 4 failed because simulator produced incorrect 'C' flag value (bug identified in 1750A simulator C code)
Sample Execution Times
drqqa110 (dbl prec div), 242 op sets
– 110 min (gate-level)
– 4 m : 50 s (behavioral)
ddxqa220 (dbl prec div), 198 op sets
– 85 min (gate-level)
– 3 m : 39 s (behavioral)
dmrq9210 (dbl prec mul), 198 op sets
– 80 min (gate-level)
– 3 m : 51 s (behavioral)
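As a quick worked calculation on the times above, the gate-level runs come out roughly 21 to 23 times slower than the behavioral runs. The Python sketch below only restates the slide's numbers; the helper names are illustrative.

```python
def seconds(minutes, secs=0):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + secs


# (gate-level time, behavioral time) in seconds, from the figures above.
RUNS = {
    "drqqa110": (seconds(110), seconds(4, 50)),
    "ddxqa220": (seconds(85), seconds(3, 39)),
    "dmrq9210": (seconds(80), seconds(3, 51)),
}


def slowdown(name):
    """Ratio of gate-level to behavioral simulation time for one test."""
    gate_level, behavioral = RUNS[name]
    return gate_level / behavioral


for name in RUNS:
    print(f"{name}: gate-level is {slowdown(name):.1f}x slower")
```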
Synthesis Results
Module        SCMOS Cells   X4000 CLBs
addsub                192           44
alulogic              390          174
aproclogic            202          133
biu                   379          170
constants             244          142
control              2902         1718
decode                563          317
fault                 594          246
inc                    45           12
ioproc                298          168
rfcore                899          362*
rflogic               562          289
Total                7270         3775
* RFCORE could be implemented in < 50 CLBs using X4000E DPRAM
Comments on Synthesis Results
Model synthesized to:
– MSU SCMOS standard cell library
– X4000 CLB netlist
Only SCMOS netlist simulated
Synthesized for area, used a max fanout constraint
No attempt to make use of special X4000 features (ROM, DPRAM, fast carry, etc.)
Comments on Synthesis (cont.)
Synopsys DesignWare Library used for Incrementer, Add/Sub blocks
– CLA architecture specified
Synthesis time for entire design < 2 hours
– Will increase if more constraints specified
Synthesis Tweaking
RFCORE for Xilinx used 1 FF per bit (16x16 bits)
– 128 CLBs for storage, rest for decoding
– If X4000E DPRAM (1 write port, 2 read ports) used, RFCORE < 50 CLBs
Decode/Constant blocks basically ROMs, CLB count can be lower if ROM capability used
Synthesis Tweaking (cont.)
Two stage synthesis for Control significantly reduced cell count
– Xilinx
  • after 1st phase 2653 CLBs
  • after 2nd phase 1718 CLBs
– SCMOS
  • after 1st phase 4649 cells
  • after 2nd phase 2902 cells
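To put a number on "significantly reduced", the short Python calculation below works out the percentage saved by the second synthesis phase from the counts above. It shows a reduction of about 35% for Xilinx and about 38% for SCMOS. The function name is illustrative.

```python
def reduction_percent(before, after):
    """Percentage reduction in cell/CLB count going from `before` to `after`."""
    return 100.0 * (before - after) / before


xilinx = reduction_percent(2653, 1718)   # CLBs after phase 1 vs. phase 2
scmos = reduction_percent(4649, 2902)    # SCMOS cells after phase 1 vs. phase 2
print(f"Xilinx: {xilinx:.1f}%  SCMOS: {scmos:.1f}%")
```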
What is Left?
Hardware mapping improvements
Better testing of bus interface, interrupts
Add floating point
– Estimate control increase by 50%
– After tweaking, addition of FP, estimate approx 4100 CLBs
Also add optional IO, console mode
– 2 timers to datapath
Projected CLBs if FP added, RF/Decode/Constants optimized
Module        CLBs (now)   CLBs (projected)
addsub                44                 44
alulogic             174                174
aproclogic           133                133
biu                  170                170
constants            142                 71
control             1718               2577
decode               317                159
fault                246                246
inc                   12                 12
ioproc               168                168
rfcore               362                 50
rflogic              289                289
Total               3775               4093
constants + decode decreased 50%, rfcore decreased, control + 50%
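The projected column can be reproduced from the current counts with the three stated adjustments. The Python sketch below is a reconstruction of that arithmetic, not the authors' worksheet. It assumes that halved counts are rounded up (317 becomes 159) and takes the rfcore figure as the 50 CLB upper bound quoted for the DPRAM implementation.

```python
CURRENT_CLBS = {
    "addsub": 44, "alulogic": 174, "aproclogic": 133, "biu": 170,
    "constants": 142, "control": 1718, "decode": 317, "fault": 246,
    "inc": 12, "ioproc": 168, "rfcore": 362, "rflogic": 289,
}


def project(clbs):
    """Apply the stated adjustments and return the projected CLB counts."""
    projected = dict(clbs)
    # ROM-style implementation: constants and decode reduced by 50%, rounded up.
    projected["constants"] = (clbs["constants"] + 1) // 2
    projected["decode"] = (clbs["decode"] + 1) // 2
    # Dual-port RAM register file, taken at the quoted 50 CLB bound.
    projected["rfcore"] = 50
    # Floating point support: control grows by 50%.
    projected["control"] = clbs["control"] * 3 // 2
    return projected


projected = project(CURRENT_CLBS)
print(sum(CURRENT_CLBS.values()), sum(projected.values()))
```

The projected total of 4093 matches the "approx 4100 CLBs" estimate, with the growth in control outweighing the savings in the ROM-like blocks and the register file.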
Would Microcode Reduce Control Size?
Using current CLB/State numbers:
Of 1718 Control CLBs, only 83 used for CSTATE
1635 * 32 bits/CLB = 52320 uCode bits
Will guesstimate an average of 3 uWords per instruction (would be based on average # of machine cycles per instruction).
uCode width = datapath control lines + next uAddress selection
            = 91 + (9 bits direct address + 5 bits condition) = 105 bits estimate
178 opcodes * 3 uWords * 105 = 56070 bits
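The estimate above is simple enough to recompute, which makes it easy to see how sensitive it is to the guessed number of microwords per opcode. The Python sketch below uses only the slide's figures. The constant and function names are illustrative, and the break-even calculation is an added observation, not a claim from the slides.

```python
CONTROL_CLBS = 1718
CSTATE_CLBS = 83
BITS_PER_CLB = 32
DATAPATH_CONTROL_LINES = 91
DIRECT_ADDRESS_BITS = 9
CONDITION_BITS = 5
OPCODES = 178
UWORDS_PER_OPCODE = 3  # the slide's "guesstimate"


def fsm_equivalent_bits():
    """Storage equivalent of the CLBs holding FSM next-state and output logic."""
    return (CONTROL_CLBS - CSTATE_CLBS) * BITS_PER_CLB


def ucode_width():
    """Microword width: datapath control lines plus next-address selection."""
    return DATAPATH_CONTROL_LINES + DIRECT_ADDRESS_BITS + CONDITION_BITS


def ucode_bits(uwords_per_opcode=UWORDS_PER_OPCODE):
    """Total microcode store size for a given average microwords per opcode."""
    return OPCODES * uwords_per_opcode * ucode_width()


def break_even_uwords():
    """Average microwords per opcode at which microcode matches the FSM estimate."""
    return fsm_equivalent_bits() / (OPCODES * ucode_width())


print(fsm_equivalent_bits(), ucode_width(), ucode_bits())
print(f"break-even at about {break_even_uwords():.2f} uWords per opcode")
```

With these inputs the microcode store is about 7% larger than the FSM equivalent, and the two are equal at roughly 2.8 microwords per opcode, so the comparison turns on a small change in that guess.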
Would Microcode Reduce Control Size? (cont.)
Not a clear winner between FSM and uCode for Xilinx
Clever use of machine cycles could reduce average microcode words per opcode
Vertical encoding of datapath signals could reduce uCode width (could also reduce gate count in FSM as well)
uCode versus FSM tradeoff is technology dependent. What about other FPGA technologies besides Xilinx? Not clear...
uCode would give more predictable delay path for control. Further investigation may be warranted.
In Closing
CDROM has all VHDL (behavioral and netlist) and regression tests
Regression Perl scripts dependent on qhsim, but conversion to a different simulator should not be difficult
For questions:
– reese,[email protected]