Hardware-Software Interface Specification for Verification in Accelerator-Rich Platforms
Sharad Malik
Princeton University
LATTE ’21, April 15, 2021
This work was supported in part by the Applications Driving Architectures (ADA) Research Center, a JUMP Center co-
sponsored by SRC and DARPA; by the DARPA POSH and SSITH programs; and by NSF XPS Grant No. 1628926.
2
ILA Team + Collaborators
Bo-Yuan HuangPramod Subramanyan
Yakir Vizel Weikun Yang Grigory Fedyukovich
Aarti Gupta
Yue Xing Yu ZengHuaixi LuYi Li
Sharad Malik
Hongce Zhang
Caroline Trippel Yatin A. Manerkar
Margaret Martonosi
3
Hardware-Software Interface
ISA
Software
Hardware
Abstraction for software
Specification for hardware
Pentium® III ProcessorSingle-core uniprocessor
Source: pdcfaculty.org
Instruction-Set Architecture
4
Hardware-Software Interface
ISA + MCM
Software
Hardware
Abstraction for software
Specification for hardware
Intel SkylakeMulti-core uniprocessor
Source: wikichip.org
Instruction-Set Architecture +
Memory Consistency Model
5
Hardware-Software Interface
8-Core GPU
4 Firestorm
Cores + 12MB
L2
4 Icestorm
Core
+ 4MB L2
SLC Cache
16–Core
Neural
Engine
Apple M1 Die Photo
Source: AnandTech
https://www.anandtech.com/show/16226
/apple-silicon-m1-a14-deep-dive
???
Software
Hardware
Abstraction for software
Specification for hardware
Heterogenous System-on-Chip
6
Accelerator Interface
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
// Load instruction
// Store instruction
Triggering operations
Firmware C code
Accelerator
Address Register
0xff00 status
0xff02 enable
… …
Accessing registers
7
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
8
• Merits (similar to ISA)• Software-visible “architectural”
state variables
• Modular: set of instructions• Per-instruction state-update
• More than ISA• Formal
• Hierarchical
• Generalizes ISA to include accelerators
Instruction-Level Abstraction (ILA)
Inte
rface
Length
Key
Counter
State
Address
Inte
rco
nn
ect
AES Block Encryption
Interface Commands ≝ Instructions
Visible State
MMIO accesses Commands
Write, 0xff00, 0x1 START_ENCRYPT
Write, 0xff02, data WRITE_ADDRESS
… …
module Top_rtl (clk, rst, interrupt,if_axi_rd_r_msg, if_axi_rd_r_rdy,if_axi_wr_w_msg, if_axi_wr_w_rdy// other AXI interface ports
);
assign and_192 = and_6& (fsm_out[2]);assign or_117 = (fsm_out[2]) | (fsm_out[5]);
always @(posedge clk or negedge rst_bar) beginif (~rst_bar) beginaddrBound_4_1_equal <= 1’b0;
endelse if (run_w_wen & (~while_case_0)) beginaddrBound_4_1_equal <= addrBound_1_1_tmp_7;
endend
{WR 0x33000100 val (SetWeightAddr val),WR 0x33000120 val (SetTensorSize val),WR 0x33000130 val (SetLSTMConfig val),WR 0x33000200 0x1 (StartLSTM), ...
}
SetWeightAddr 0xff100100SetDataAddr 0xff100200SetOutputAddr 0xff100300SetTensorSize 0x20SetNumTimestep 0x10SetLSTMConfig 0x02cd87f9StartLSTM
(a) Accelerator RTL implementation snippet.
(b) Accelerator instruction set (partial).
Each HW operation triggered by an
MMIO access is modeled as an
individual instruction.
(c) A sequence of accelerator
instructions per- forming an LSTM
layer.
Instruction-Level Abstraction (ILA)
FlexASR Accelerator (Speech Recognition)
Harvard Wei Group
10
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
11
• Formal verification of RTL implementation
Application of ILA: Hardware Verification
Add: rd=rs1+rs2
pc+=4, …
Jump:
pc= rs1+imm, …
Processor or accelerator RTL
ILA specification
SetLength:if !busy: length=axi_wdata
SetKey:if !busy: key=axi_wdata
(for processors or accelerators)
Refinement
Mapping
“instructions”: [{“instruction” : “WR_ADDR”,“ready_signal” : “RTL.xram_ack_delay_1”,“max_bound” : 20 }, …]
“state_mapping”: {“aes_address” : “RTL.aes_reg_opaddr_i.reg_out”,“aes_length” : “RTL.aes_reg_oplen_i.reg_out”,… }
12
• Formal verification of RTL implementation• Auto-generate complete formal properties for each instruction
Application of ILA: Hardware Verification (cont’d)
ILA specification
Instruction i: S’=F(S)
Implementation
Instruction i
State S
Instruction i
State S’
Check: match?Assume: match
Symbolically execute the instruction
13
• Formal verification of RTL implementation• Modular verification — per-instruction checking
• Automating modular verification
Application of ILA: Hardware Verification (cont’d)
f = RTL state transitions
f*f*V’V
U U’𝑖ILA
𝑟 𝑟
𝑖 = 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
𝑟 = refinement relations
RTL
f*
For each instruction, check:
• in all valid RTL starting states (environment)
14
• Formal verification of RTL implementation• Modular verification — per-instruction checking
• Automating modular verification
Application of ILA: Hardware Verification (cont’d)
f = RTL state transitions
f*
V’V
U U’𝑖ILA
𝑟 𝑟
𝑖 = 𝑖𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛
𝑟 = refinement relations
RTL
For each instruction, check:
• in all valid RTL starting states (environment)
• Environment constraint auto-generation
𝑪
15
• Formal verification of RTL implementation
Application of ILA: Hardware Verification (cont’d)
Category Designs
Design Size ILA Size RTL size vs.
ILA size
(ratio of LoC)
Proof Coverage (in 1 hour on
double Xeon 5222
+ 256GB RAM)RTL
LoC
#. of state bit(wo. Mem. Abs)
DSL
LoC
#. of instr./cmd.
Processors
8051 micro-controller 8.9k 0.6k (2.6k) 0.6k 256 15x 88%
Rocket 18.2k 4.6k 0.7k 37 26x 71%
Piccolo 11.4k 7.8k (79.2k) 0.7k 37 16x 80%
Nibbler 2.7k 2.7k 0.6k 15 5x 100%
Accelerators
AES encryption 1.1k 7.5k 0.5k 16 2x 100%
RBM 10.6k 6.3k (184.6k) 1.0k 17 11x 100%
Gaussian blur image processing 6.9k 36.8k 0.5k 2 14x 100%
Memory system OpenPiton L1.5 cache 12.0k 5.9k (104.7k) 1.5k 19 8x 100%
Communication
interfaces
LeWiz Ethernet MAC core (TX) 12.5k 12.9k (428.7k) 2.3k 29 5x 97%
EMesh AXI interfaces 0.9k 0.45k 0.5k 22 2x 100%
16
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
17
• Simulation-based Validation• ILA supports auto-generation of simulation model
Application of ILA: Hardware Verification (cont’d)
RTL executable model (RTEM)RTL Impl.
ILA Spec. ILAtor
Existing RTL
Compiler*
Instruction level executable model (ILEM)
Tool’s inputGenerated
ExecutableILA toolchain e.g. Verilator
18
• Simulation-based validation (traditional)• Activate the RTL Execution Model (RTEM) with test stimuli (a sequence of RTL inputs)
• After the simulation, check if the RTL signal/register value matches the reference result
Application of ILA: Hardware Verification (cont’d)
RTL Design
Execution ModelTest Stimuli Reference
result
High-Level
Functional Model
Manually
constructed
Deficiency: need to run and analyze the full test regardless of when bug
is triggered – instead, should stop when bug is triggered
checka = 6;b = 2;
res = a/b = 3;
res = 3;res = 2;
res = 3;
19
• Simulation-based validation (tandem simulation)• After simulating each instruction, check if the RTEM Architectural Variables (RTAV) match
ILEM Architectural Variables (ILAV)
Application of ILA: Hardware Verification (cont’d)
RTL Design
Execution Model
Instruction
Sequence Stimuli
ISA Execution
Model (ILEM)
RTAV
valuesILAV
values
AV-check
Identify bugs right at the instruction that causes AV deviations
add r1, r2, r3;
jmp;
…
add
add AV-check
jmp
jmp
Use Refinement Map
20
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
21
FW/HW co-verification in SoCs
• Communicating (heterogeneous) IPs • Processor
• Firmware
• Specialized accelerators
• Hard to design, hard to verify• FW/HW co-design
• Heterogeneity
• Security critical
22
• FW/HW interaction• Hardware functions not captured
• RTL models not practical (too complex)
• HW/SW semantics gap
• Scalability• Bit-precise reasoning
• Concurrency
• message handling, synchronization
Challenges
ILA for Hardware
+
SW Verification Tech.
23
Co-verification methodology
•Modeling• ILA for specialized HW
• Source-level (LLVM) modeling of SW
•Verification• Software verification
techniques
Program 1 Program 2
Interacting
program threads
Using ILA Context
bounding
VerifiedbyBoogie/Corral
Single programIP1
HW 1a
FW 1
HW 1b
IP2
FW 2
HW 2a
HW 2b
24
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
25
• Simulation-based• hardware-software co-simulation using ILA as verified abstraction
Hardware-Software Co-Verification (cont’d)
ILA ModelInstruction-level
simulator
Verilog
design
ILAtor
Formally-verified
abstraction
HW/SW co-simulation
(QEMU)
High speed
simulation
Low speed simulation
26
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
27
• Shared memory accesses – memory consistency issues
Hardware-Software Co-Verification with MCM
Driver DeviceCrypto
Accelerator
Shared Memory
System-wide Verification
28
• Shared memory accesses – memory consistency issues• Memory consistency issues: memory access reordering
Hardware-Software Co-Verification with MCM (cont’d)
Thread 1 (T1):
1: st [x], 1
2: ld r1, [y]
Thread 2 (T2):
1’: st [y], 1
2’: ld r2, [x]
r1 == 0 and r2 == 0
Observable under Total Store Order
(TSO: adopted by x86)
Reordering visible to
the other thread
Example:
29
• Shared memory accesses – memory consistency issues• ILA-MCM framework: security and correctness verification using ILA and MCM
Driver DeviceCrypto
Accelerator
Shared Memory
Device Crypto-Acc.Driver
ILA ILA ILA
Memory Consistency Model
Property
State Update Values
State Update Orderings
System-wide Verification w. MCM
Hardware-Software Co-Verification with MCM (cont’d)
30
• Shared memory accesses – memory consistency issues• ILA: operational, state update functions (values)
• MCM: axiomatic & relational, state update re-ordering effects (ordering)
• Integrating ILA and MCM• Facet and facet event
• Captures different entities’ different observations
xx.T1 (observed by T1)
x.T2 (observed by T2)
Shared (memory)
variablesObserved by some entityDenoted as <variable.entity>
Hardware-Software Co-Verification with MCM (cont’d)
31
• Shared memory accesses – memory consistency issues• ILA: operational, state update functions (values)
• MCM: axiomatic & relational, state update re-ordering effects (ordering)
• Integrating ILA and MCM• Facet and facet event
• Captures different entities’ different observations
ILA
Facet
integrate values and orderings
a+b 0
vorderings
MCM
valuesa+b 0 v
Hardware-Software Co-Verification with MCM (cont’d)
32
• Verifying existing hardware-interacting programs
• Synthesize new programs (malicious exploits, buggy trace and etc…)
• Program sketch• Set of ILA instructions
• Partial ordering over instances of
instructions (optional)
IO_WRITE 0x100, 1 (SendResetCmd)
memcpy FwAddr,SM? (WriteImage)
IO_WRITE 0x104, 1 (SendLdFwCmd)
Driver Device Crypto-Acc.
…
…
…
…
…
…
Hardware-Software Co-Verification with MCM (cont’d)
33
Hardware-Software Co-Verification with MCM (cont’d)
• But TOCTOU
exists under
TSO!
SndRst. @1 2Rst. @3
SndRst. @11 12
StFwSM. @7 8
MemCpy. @15 16
SndAuthRq. @16 17
LdCmd. @10
SndLdCmd. @8 9
StFwSM. @12 16
SndLdCmd. @16 17
Rst. @17
LdCmd. @18
MemCpy. @23 26
SndAuthRq. @33 35
Lock.
SndRsp. @34 36
Auth. @21
ChkSig. @25
RdStsBit @38
SndInt. @39
PASS:JMP. @40
HandleRsp. @37
Set the lock of device to
prevent changes to FW image
Overwritten is still
allowed, but visible later
Core problem:
The overwritten could be visible
late due to relaxed MCM, thus it
circumvents the checking
Driver Device Crypto-Acc
Okay in SC
And example of the
security exploit found
by our verification:
34
Summary: Hardware-Software Interface
8-Core GPU
4 Firestorm
Cores + 12MB
L2
4 Icestorm
Core
+ 4MB L2
SLC Cache
16–Core
Neural
Engine
Apple M1 Die Photo
Source: AnandTech
https://www.anandtech.com/show/16226
/apple-silicon-m1-a14-deep-dive
ILA+MCM
Software
Hardware
Abstraction for software
Specification for hardware
Heterogenous System-on-Chip
35
• Accelerator implementation verification
• Formal
• Simulation-based
• Firmware hardware co-verification• Formal
• Simulation-based
• Shared memory accesses and memory consistency
Summary: ILA Based SoC Verification
On-chip Interconnect
CPU GPU Flash
DMA …MMU+DRAM
Microcontroller + Firmware
Memory
HW accelerators
NoC interface
36
The ILAng Tool
GitHub: https://github.com/PrincetonUniversity/ILAng
Wiki: https://bo-yuan-huang.gitbook.io/ilang/
Docker: https://hub.docker.com/r/byhuang/ilang/
37
• ILAng: A Modeling and Verification Platform for SoCs using Instruction-
Level Abstractions. [TACAS19]
• Integrating Memory Consistency Models with Instruction-Level
Abstractions for Heterogeneous System-on-Chip Verification. [FMCAD18]
• Formal Security Verification of Concurrent Firmware in SoCs using
Instruction-Level Abstraction for Hardware. [DAC18]
• Instruction-Level Abstraction (ILA): A Uniform Specification for System-
on-Chip (SoC) Verification. [TODAES18] (Best Paper Award)
Selected Bibliography