Presenter : Ching-Hua Huang
2013/11/4
Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models
Cited count : 3
Dusung Kim; Ciesielski, M.; Dept. of Electr. & Comput. Eng., Univ. of Massachusetts, Amherst, MA, USA
Kyuho Shim; Seiyang Yang; Dept. of Comput. Eng., Pusan National Univ., Busan, Korea
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2011
National Sun Yat-sen University
Embedded System Laboratory
Abstract
Simulation speedup offered by distributed parallel event-driven simulation is known to be seriously limited by synchronization and communication overhead. These limiting factors are particularly severe in gate-level timing simulation. This paper describes a radically different approach to gate-level simulation based on the concept of temporal rather than conventional spatial parallelism. The proposed method partitions the entire simulation run into simulation slices in the temporal domain, and each slice is simulated separately. Because the slices are independent of each other, an almost linear speedup is achievable with a large number of simulation nodes.
Abstract (Cont.)
This concept naturally enables “correct by simulation” methodology that explicitly maintains the consistency between the reference and the target specifications. Experimental results clearly show a significant simulation speed-up.
What’s the problem?
The performance of hardware simulation becomes prohibitively low for complex designs.
Parallel simulation speedup is limited by synchronization and communication overhead.
Proposed method to solve the above problem
A radically different approach to gate-level simulation based on the concept of temporal parallelism.
Related Work
[7] SimCluster
[This paper] Temporal Parallel Simulation: A Fast Gate-level HDL Simulation Using Higher Level Models
[2] TPSim – GL timing simulation
The basic idea of this approach and preliminary results for special cases were introduced.
[6] Parallel Discrete Event Simulation (PDES)
[9] Principles of conservative parallel simulation
Lock-step based synchronization
Partitions the design into separate modules and performs concurrent simulation
Rollback-based synchronization
[12] Performance improvement
[13] Speed-up
Developed the first Verilog distributed simulator
A large gate-level decoder design improvement
Proposed method – TPSim
TPSim (Temporal Parallel Simulation)
(1) Partitions the entire simulation into slices in the temporal domain.
(2) Each slice is simulated separately.
It consists of two major steps:
。Fast reference simulation
Performed on a high-level abstraction of the design, to store essential state information.
。Detailed, fine-grained target simulation
Performed on a lower-level (gate-level) model. It is applied in parallel to each simulation slice.
(1) State checking (2) State matching
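The two steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy 8-bit accumulator "design", the function names, and the use of a thread pool in place of separate simulation nodes. It is not the tool flow used in the paper. The fast reference model runs once and saves the state at each slice boundary; the detailed target model is then run on every slice independently, starting from the saved state.

```python
from concurrent.futures import ThreadPoolExecutor

def reference_step(state, stimulus):
    """Hypothetical fast high-level model: a toy 8-bit accumulator."""
    return (state + stimulus) & 0xFF

def target_step(state, stimulus):
    """Hypothetical detailed model: the same adder, evaluated bit by bit."""
    result, carry = 0, 0
    for bit in range(8):
        a = (state >> bit) & 1
        b = (stimulus >> bit) & 1
        result |= (a ^ b ^ carry) << bit
        carry = (a & b) | (carry & (a ^ b))
    return result

def reference_simulation(stimuli, slice_len, initial_state=0):
    """Run the fast model once and save the state at every slice boundary."""
    checkpoints = {}
    state = initial_state
    for cycle, stim in enumerate(stimuli):
        if cycle % slice_len == 0:
            checkpoints[cycle] = state
        state = reference_step(state, stim)
    return checkpoints

def simulate_slice(start, state, stimuli):
    """Run the detailed model on one slice, starting from a restored state."""
    trace = []
    for stim in stimuli:
        state = target_step(state, stim)
        trace.append(state)
    return start, trace

def temporal_parallel_simulation(stimuli, slice_len, workers=4):
    """Simulate all slices independently and stitch the traces in time order."""
    checkpoints = reference_simulation(stimuli, slice_len)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(simulate_slice, start, state,
                        stimuli[start:start + slice_len])
            for start, state in checkpoints.items()
        ]
        results = sorted(f.result() for f in futures)
    return [value for _, trace in results for value in trace]
```

In this sketch the threads only show that the slices do not depend on each other. An actual speedup would require dispatching each slice to its own simulator process or machine.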
Difficulties in Generalization of Temporal Parallelism
(1) Multiple Asynchronous Clocks
A multiple-clock design may not be 100% cycle-by-cycle consistent with the RTL simulation.
Proposed solution: abstract delay annotation method
Slices are allowed to overlap by a value equal to the longest delay in the design.
[Figure: example signals DataA[N-1:0], ReqB, ClkB]
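As a rough illustration of overlapping slices, the sketch below computes slice intervals in which each slice starts simulating early by the longest delay in the design. The function name, the time units, and the choice to place the overlap before each slice boundary are assumptions made for illustration; the slides do not specify these details.

```python
def overlapped_slices(total_time, slice_len, longest_delay):
    """Return (warmup_start, start, end) for each slice.

    Each slice begins simulating at warmup_start, which is longest_delay
    before its nominal start (clamped at time 0), so it overlaps the
    previous slice by that amount.
    """
    slices = []
    start = 0
    while start < total_time:
        end = min(start + slice_len, total_time)
        warmup_start = max(0, start - longest_delay)
        slices.append((warmup_start, start, end))
        start = end
    return slices
```

For example, a 100-unit run cut into 40-unit slices with a longest delay of 5 yields slices whose simulated ranges begin at 0, 35, and 75.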
Difficulties in Generalization of Temporal Parallelism
(2) State Checkpointing in Event-driven Simulation
Finding the correct placement for checkpoints is more difficult because of the arbitrary delay between event edges.
Proposed solution: checkpoint window
The size of the checkpoint window is one clock-cycle equivalent.
The correct value for Q can be reliably obtained at the end of each window.
The overlap period must be increased accordingly so that it contains the entire target checkpoint window.
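The checkpoint-window idea can be sketched as follows, assuming the signal Q is available as a time-sorted list of value-change events. The function name and the event format are hypothetical. The value of Q is read only at the end of each one-clock-cycle window, so intermediate changes caused by gate delays inside the window are ignored.

```python
from bisect import bisect_right

def sample_at_window_ends(events, clock_period, total_time):
    """Sample a signal at the end of each checkpoint window.

    events: time-sorted list of (time, value) changes on the signal.
    Returns {window_end: value}, using the last change at or before
    each window end (None if the signal has not changed yet).
    """
    times = [t for t, _ in events]
    samples = {}
    window_end = clock_period
    while window_end <= total_time:
        idx = bisect_right(times, window_end) - 1
        samples[window_end] = events[idx][1] if idx >= 0 else None
        window_end += clock_period
    return samples
```

With a clock period of 10, a short glitch on Q at times 12 and 13 does not affect the checkpoint, because only the value that holds at time 20 is stored.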
Difficulties in Generalization of Temporal Parallelism
(3) State Matching
The functional correctness of the restored target state must be maintained.
During synthesis the design undergoes a number of logic transformations:
。Combinational and sequential logic optimization, retiming, and algebraic transformations
Proposed solution: a promising preliminary work in state matching has recently been published in [17].
Handling testbench
The testbench is a sequential process.
。It has no hardware “states”, so it cannot be restarted at an arbitrary point in time.
Proposed solution: testbench forwarding
Saved continuously during the reference simulation.
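One way to picture testbench forwarding is a log that records what the testbench drives during the reference simulation and replays the relevant portion into each target slice. The class and method names below are hypothetical, and the assumption that the saved data is the testbench stimulus is ours; the slides only state that saving happens continuously during the reference simulation.

```python
from bisect import bisect_left

class StimulusLog:
    """Hypothetical log of testbench stimuli recorded during the reference run."""

    def __init__(self):
        self.times = []
        self.values = []

    def record(self, time, value):
        """Append one stimulus; assumes calls arrive in increasing time order."""
        self.times.append(time)
        self.values.append(value)

    def replay(self, start, end):
        """Return the recorded (time, value) pairs with start <= time < end."""
        lo = bisect_left(self.times, start)
        hi = bisect_left(self.times, end)
        return list(zip(self.times[lo:hi], self.values[lo:hi]))
```

A target slice covering times 5 to 20 would call `replay(5, 20)` and apply the returned stimuli instead of re-running the testbench from time zero.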
Before the experiment…
How much can TPSim improve performance?
Slices
Multiple-clock issue?
Tool selection
Synthesis: Design Compiler
Cell library: 65nm technology library
Simulator: NC-Sim 8.2
Experiment 1 – JPEG Encoder
This design is from OpenCores.
Total gate count of the GL design is 0.9M.
Experiment 2 – AES (Advanced Encryption Standard)
This design is from OpenCores.
Total gate count of the GL design is 25K.
Conclusions and My comments
Conclusions
The speed-up is accomplished by performing temporal partitioning of the simulation period.
This paper provides not only a significant performance improvement but also a smarter method for simulation-based verification.
My comments
I have been dealing with a problem concerning the performance gap between RTL and GL timing simulation. This paper gives me another reference in this area.