Latch-based FPGA emulation method for design verification: case study with microprocessor
M. Kim, J. Kong, T. Suh and S.W. Chung
Using latches in a digital design is generally considered poor practice owing to timing issues. Field-programmable gate array (FPGA) vendors also recommend flip-flops instead of latches in emulation. In this reported work, however, the usefulness and benefit of utilising latches in FPGA emulation for processor design verification are demonstrated. The study shows that a latch-based register file provides seamless functional validation, whereas a flip-flop-based one requires modification to the original design, potentially harming the completeness of functional verification. Experimental results with Xilinx and Altera devices show marginal differences in emulation performance and area requirements between the two approaches. This study reveals that replacing SRAM with latches rather than flip-flops is appealing and preferable in emulation with FPGAs.
Introduction: In digital design, one of the most time-consuming processes is verification. Software-based hardware description language (HDL) simulation is beneficial in that internal signals of interest can be observed. However, validating highly complex logic with HDL simulation is impractical because of intolerable simulation times. To remedy this shortcoming, field-programmable gate array (FPGA)-based emulation is widely used; it can validate a design more than 1000 times faster than traditional software-based simulation [1]. However, FPGA-based emulation often requires modification to the original design owing to the restricted internal structures and limited resources of FPGAs. For example, large caches in modern microprocessors typically do not fit into a single FPGA and must be split across several FPGAs.
A microprocessor is one of the most complex digital designs, containing a wide variety of logic and memory blocks. Its validation requires exhaustive coverage of the combinations of all instructions, interrupts and exceptions. Therefore, FPGA-based emulation is typically an unavoidable step in the design process. However, some logic blocks do not map seamlessly onto the FPGA fabric. One of these is the register file, because it is often custom-designed with SRAM [2] and the required number of ports varies with the instruction set architecture (ISA). A simple dual-issue microprocessor usually requires two write ports and four read ports in the register file [2]. In FPGAs, the memory elements (see Note) support a limited number of ports. For example, the memory element in Altera Cyclone II [3] provides only two read ports and one write port. Thus, the register file must be built from logic elements, and there are two implementation options: latches or flip-flops. FPGA vendors recommend flip-flops rather than latches, arguing that latches incur complicated timing problems [4].
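The port mismatch can be made concrete with a small check. This sketch is purely illustrative:
- The `Ports` type and the `fits` and `shortfall` helpers are hypothetical names.
- The port counts are simply the figures quoted above: four read and two write ports for a simple dual-issue core, and two read and one write port for a Cyclone II memory element.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ports:
    """Number of read and write ports (hypothetical helper type)."""
    read: int
    write: int


# Figures quoted in the text above.
DUAL_ISSUE_REGISTER_FILE = Ports(read=4, write=2)
CYCLONE_II_MEMORY_ELEMENT = Ports(read=2, write=1)


def fits(required: Ports, available: Ports) -> bool:
    """True if one memory element offers at least the required ports."""
    return required.read <= available.read and required.write <= available.write


def shortfall(required: Ports, available: Ports) -> Ports:
    """Ports still missing in each direction (zero when satisfied)."""
    return Ports(read=max(0, required.read - available.read),
                 write=max(0, required.write - available.write))


if __name__ == "__main__":
    print(fits(DUAL_ISSUE_REGISTER_FILE, CYCLONE_II_MEMORY_ELEMENT))
    print(shortfall(DUAL_ISSUE_REGISTER_FILE, CYCLONE_II_MEMORY_ELEMENT))
```

A single memory element falls short in both directions, which is why, in the argument above, the register file is instead built from logic elements.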
The operational difference between latches and flip-flops has a direct effect on the design. A flip-flop is an edge-triggered device that performs a write at the rising (or falling) edge of a clock, whereas a latch is a level-triggered device that is written while the clock is high (or low). Therefore, a latch-based register file operates similarly to the original SRAM-based design. Adopting a flip-flop-based register file in emulation, in contrast, requires modifying the original design, potentially affecting the correctness of validation. Specifically, it introduces a Read-After-Write (RAW) hazard, which does not exist in
the original SRAM-based register file. The hazard occurs when the
destination register of a write operation is the same as the source register
of a subsequent read operation. Owing to the edge-triggered nature of a
flip-flop-based register file, the data to be read is not available in the
current clock cycle because the write operation occurs at the end of the
clock period. Therefore, the hazard must be resolved either by adding forwarding paths or by stalling the microprocessor. This
design change impedes the main purpose of emulation and could harm
the completeness of functional verification.
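The behavioural difference can be illustrated with a cycle-level model. This is a minimal sketch under simplifying assumptions, not the authors' design:
- For the latch-based register file, each clock cycle is split into a write half followed by a read half.
- The flip-flop-based register file commits writes only at the closing clock edge.
- The class names and the 16-entry size are illustrative choices.

```python
class FlipFlopRegisterFile:
    """Edge-triggered model: a write is captured at the edge that ends the cycle."""

    def __init__(self, size: int = 16) -> None:
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        # Reads during the cycle see the state captured at the previous edge.
        values = [self.regs[r] for r in reads]
        if write is not None:  # committed only at the closing clock edge
            dest, data = write
            self.regs[dest] = data
        return values


class LatchRegisterFile:
    """Level-sensitive model: written in the first half-cycle, read in the second."""

    def __init__(self, size: int = 16) -> None:
        self.regs = [0] * size

    def cycle(self, write=None, reads=()):
        if write is not None:  # first half: latch is transparent and stores data
            dest, data = write
            self.regs[dest] = data
        # Second half: reads already observe the value written this cycle.
        return [self.regs[r] for r in reads]


if __name__ == "__main__":
    # Same cycle: WB writes r0 = 1 while a later instruction reads r0 and r5.
    for rf in (FlipFlopRegisterFile(), LatchRegisterFile()):
        print(type(rf).__name__, rf.cycle(write=(0, 1), reads=(0, 5)))
```

In that cycle the flip-flop model still returns the stale value of r0, while the latch model returns the new one. This is the RAW hazard that forwarding or stalling must cover in the flip-flop case.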
In this Letter, we implement a microprocessor with the latch-based
register file for validation using FPGA emulation and compare it with
the flip-flop-based one in terms of performance and area. Throughout
the Letter, we show the usefulness and benefit of using latches in validation with FPGAs.
Implemented microprocessors: We compare two versions of a microprocessor in emulation: one with a latch-based register file (Pl) and the other with a flip-flop-based register file (Pff). Note that Pff requires special forwarding paths to overcome the RAW hazard explained earlier. The processor is based on ARM9, which has five pipeline stages: Instruction Fetch (IF), Instruction Decode (ID), Execution (EX), Memory Access (MEM) and Write-Back (WB). It implements the ARMv5 instruction set except for supplementary instructions such as coprocessor, Thumb, and load/store multiple instructions.

The register file in Pl consists of 15 latch-based registers and one flip-flop-based register. The 15 latch-based registers are the general-purpose registers, and the only register built from flip-flops is the program counter (PC). Since latches are level-triggered, data written in the first half of a clock cycle can be read in the second half of the same cycle. Thus, the RAW dependency is naturally resolved without any additional forwarding path. Fig. 1 shows an example of such a dependency. In the latch-based register file, the result of the first instruction (mov r0, #1) is written back to register r0 in the first half of clock cycle 4. In the second half of the same clock cycle, r0 is read by the fourth instruction (add r4, r0, r5). Therefore, the register file in Pl does not need a forwarding path from the WB stage to the input of the ID/EX pipeline register (dotted arrows in Fig. 1). Note that Pff requires this forwarding path to resolve the hazard.
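The timing in Fig. 1 can be checked with a few lines of arithmetic. The sketch assumes, as the figure does, an ideal five-stage pipeline that fetches one instruction per cycle with no stalls. It also assumes that source registers are read in ID. The helpers `stage_cycle` and `same_cycle_writeback_reads` are hypothetical names.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# (instruction, destination register, source registers); instruction i is fetched in cycle i.
PROGRAM = [
    ("mov r0, #1", "r0", ()),
    ("mov r1, #1", "r1", ()),
    ("subs r3, r2, #1", "r3", ("r2",)),
    ("add r4, r0, r5", "r4", ("r0", "r5")),
]


def stage_cycle(fetch_cycle: int, stage: str) -> int:
    """Clock cycle in which an instruction occupies a stage (no stalls assumed)."""
    return fetch_cycle + STAGES.index(stage)


def same_cycle_writeback_reads(program):
    """Return (producer, consumer, cycle) where the consumer reads a register in ID
    during the same cycle in which the producer writes it back in WB."""
    hits = []
    for i, (_, dest, _) in enumerate(program):
        wb = stage_cycle(i, "WB")
        for j in range(i + 1, len(program)):
            _, _, sources = program[j]
            if dest in sources and stage_cycle(j, "ID") == wb:
                hits.append((i, j, wb))
    return hits


if __name__ == "__main__":
    print(same_cycle_writeback_reads(PROGRAM))
```

There is only one match. Instruction 0 (mov r0, #1) writes r0 in WB while instruction 3 (add r4, r0, r5) reads it in ID, both in clock cycle 4, as described above.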
[Fig. 1: pipeline diagram showing instructions 0 (mov r0, #1), 1 (mov r1, #1), 2 (subs r3, r2, #1) and 3 (add r4, r0, r5) passing through IF, ID, EX, MEM and WB over clock cycles 0-7, with register R0 marked]
Fig. 1 Example of forwarding from WB to in front of ID/EX pipeline register
During the actual implementation of Pl, however, the register file suffered from timing errors caused by glitches. To remove the glitches, we used an AND gate whose inputs are a phase-shifted clock signal (90° in our study) and the original write enable. The output of the AND gate drives the write enable of each register. As a result, the write enable is held low for one quarter of a clock cycle, masking the wrong data generated by timing errors, as shown in Fig. 2. Note that the AND gate is located inside the register file and does not affect the original processor design outside the register file. The 90° phase-shifted clock was not specially contrived for the latch-based register file. It was introduced to keep the memory (or cache) access latency in the MEM pipeline stage at one cycle, as in the original design, because the read latency of the memory elements in FPGAs is more than one cycle owing to their input registers (flip-flops).
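The masking effect of the AND gate can be sketched in discrete time. This is an illustrative model, not a timing simulation. Its assumptions are:
- One clock cycle is sampled in eight 45° steps.
- The write enable is asserted for the whole cycle.
- The data is hypothetically wrong (`"bad"`) during the first quarter of the cycle while combinational logic settles.
- All function and variable names are hypothetical.

```python
STEPS_PER_CYCLE = 8  # 45 degrees per step; illustrative resolution only


def clock(step: int, phase_deg: int = 0) -> int:
    """50% duty-cycle clock, high in the first half of its period, delayed by phase_deg."""
    shift = phase_deg * STEPS_PER_CYCLE // 360
    return 1 if (step - shift) % STEPS_PER_CYCLE < STEPS_PER_CYCLE // 2 else 0


def latch_output(enable, data, initial):
    """Level-sensitive latch: output follows data while enable is 1, otherwise holds."""
    q, trace = initial, []
    for en, d in zip(enable, data):
        if en:
            q = d
        trace.append(q)
    return trace


# Hypothetical one-cycle traces: data is wrong until it settles at step 2 (90 degrees).
data = ["bad", "bad", "new", "new", "new", "new", "new", "new"]
write_enable = [1] * STEPS_PER_CYCLE

# AND the original write enable with the 90-degree phase-shifted clock.
gated_enable = [we & clock(s, phase_deg=90) for s, we in enumerate(write_enable)]

if __name__ == "__main__":
    print(latch_output(write_enable, data, "old"))  # wrong data leaks through
    print(latch_output(gated_enable, data, "old"))  # wrong data is never captured
```

With the gate, the enable cannot rise until 90° after the original clock edge. The latch therefore stays opaque while the hypothetical wrong data is present. In this simple model the gated enable is also low in the last quarter of the cycle, because the shifted clock is high only from 90° to 270°.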
[Fig. 2: waveforms of the original clock (0° phase shift), the 90° phase-shifted clock, the register write enable before and after the AND operation, and the data, with the wrong-data interval marked]
Fig. 2 Resolving glitches by using a phase-shifted clock
ELECTRONICS LETTERS 28th April 2011 Vol. 47 No. 9

The register file in Pff consists purely of flip-flops and allows a write only at the rising (or falling) edge of the clock. As a result, a read and a write to the same register cannot take place in the same clock cycle, which leads to the RAW hazard. There are two options for resolving it. The first is forwarding from the WB stage to the input of the ID/EX pipeline register. The second is stalling to prevent the fourth instruction from executing with wrong data. Stalling the processor for one cycle makes the execution time of a program differ from that of the original design with the SRAM-based register file, and stall logic must be added as well. The forwarding option resolves the RAW hazard without changing the number of execution cycles. Nevertheless, the forwarding path is located outside the register file and may cause unexpected side effects, such as functional errors hidden in the extra forwarding path. Thus, Pff requires an extra verification pass after the register file is replaced with the original SRAM-based one and the forwarding paths are removed.
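The two Pff options can be compared in a minimal sketch. The forwarding function models a bypass multiplexer in front of the ID/EX pipeline register. The cycle count assumes an ideal five-stage in-order pipeline in which each WB-to-ID hazard of the kind in Fig. 1 costs exactly one stall cycle. Both helpers are hypothetical and only illustrate the argument.

```python
PIPELINE_DEPTH = 5  # IF, ID, EX, MEM, WB


def read_operand(src, regs, wb_dest=None, wb_data=None):
    """Operand read in Pff with a WB-to-ID/EX forwarding path.

    The flip-flop register file only holds values committed at earlier edges,
    so a result being written back in this same cycle must be bypassed.
    """
    if wb_dest is not None and wb_dest == src:
        return wb_data  # forwarded value (extra logic outside the register file)
    return regs[src]


def total_cycles(n_instructions: int, n_stalls: int = 0) -> int:
    """Cycles to drain an ideal in-order pipeline, plus one cycle per stall."""
    return PIPELINE_DEPTH + (n_instructions - 1) + n_stalls


if __name__ == "__main__":
    regs = [0] * 16
    print(read_operand(0, regs, wb_dest=0, wb_data=1))  # forwarding supplies r0 = 1
    print(total_cycles(4))              # latch-based or forwarding: no extra cycle
    print(total_cycles(4, n_stalls=1))  # stalling once for the hazard in Fig. 1
```

For the four-instruction example in Fig. 1, this gives 8 cycles (clock cycles 0-7) without stalling and 9 cycles with a single stall. That is the execution-time difference from the original SRAM-based design noted above.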
Analysis and discussion: In this Section, we present experimental results on FPGAs (Altera Cyclone II and Xilinx XC3S500E): maximum frequency and area for Pl and Pff. The maximum frequency is obtained by analysing the critical paths of Pl and Pff from the synthesis reports of the design tools for each FPGA (Altera Quartus II 9.1 Web Edition and Xilinx ISE 12.2). The area is obtained from the same reports.
The maximum clock rates of Pl and Pff are similar on both FPGAs, as shown in Table 1. Cyclone II reports a 5 MHz lower frequency for Pl than for Pff, whereas XC3S500E reports exactly the same frequency for both. The difference in clock rates is caused by the characteristics of the storage elements (flip-flops or latches) in each FPGA. Cyclone II has configurable storage elements called dedicated logic registers, located inside each logic element; however, these can only be used as flip-flops. Latches must therefore be implemented by configuring and routing logic elements, which consumes more logic elements. XC3S500E, on the other hand, can configure its storage elements (called slice flip-flops) as latches. Implementing a latch there therefore requires no additional logic element to be configured or routed, compared with a flip-flop implementation. This characteristic of Cyclone II affects area even more significantly: Pl occupies 14.3% more area than Pff on Cyclone II, while Pl uses only 0.2% more area than Pff on XC3S500E.
Table 1: Area and performance of Pl and Pff

FPGA type | Register file type | Area | Performance (clock frequency)
Altera Cyclone II | Flip-flop based (Pff) | 4058 LEs | 55 MHz
Altera Cyclone II | Latch based (Pl) | 4639 LEs | 50 MHz
Xilinx XC3S500E | Flip-flop based (Pff) | 2474 slices | 35 MHz
Xilinx XC3S500E | Latch based (Pl) | 2478 slices | 35 MHz
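The percentages quoted in the text follow directly from Table 1, and the short calculation below reproduces them. The dictionary layout and the `overhead_pct` helper are just one way to organise the numbers.

```python
# (area, max clock in MHz) from Table 1; area in LEs (Cyclone II) or slices (XC3S500E).
RESULTS = {
    "Cyclone II": {"Pff": (4058, 55), "Pl": (4639, 50)},
    "XC3S500E": {"Pff": (2474, 35), "Pl": (2478, 35)},
}


def overhead_pct(latch_value: float, flip_flop_value: float) -> float:
    """Relative increase of Pl over Pff, in percent."""
    return 100.0 * (latch_value - flip_flop_value) / flip_flop_value


if __name__ == "__main__":
    for fpga, r in RESULTS.items():
        (area_ff, f_ff), (area_l, f_l) = r["Pff"], r["Pl"]
        print(f"{fpga}: area {overhead_pct(area_l, area_ff):+.1f}%, "
              f"clock {f_l - f_ff:+d} MHz")
```

On Cyclone II this yields a 14.3% area increase and a 5 MHz lower clock for Pl. On XC3S500E it yields about 0.2% more area with an identical clock.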
Conclusion: We have demonstrated the usefulness and benefit of utilising latches in emulation with FPGAs. In processor emulation, the latch-based register file provides seamless functional validation. The flip-flop-based one, by contrast, requires extra logic in the processor, which potentially harms functional verification. The two approaches show no notable differences in emulation speed or area requirements. Our study shows that the latch-based approach for the register file is appealing and preferable in functional validation with FPGA emulation.
Note: An FPGA usually includes two kinds of elements: memory elements and logic elements. A memory element can only be configured as memory, whereas a logic element can be configured into many different kinds of combinational or sequential logic.
Acknowledgments: This work was supported in part by the Ministry of
Knowledge Economy, Korea, under the Information Technology
Research Centre support programme supervised by the National IT
Industry Promotion Agency (NIPA-2011-C1090-1121-0010).
© The Institution of Engineering and Technology 2011
19 February 2011
doi: 10.1049/el.2011.0462
M. Kim, J. Kong and S.W. Chung (Division of Computer and Communication Engineering, Korea University, Seoul 136-713, Republic of Korea)
E-mail: swchung@korea.ac.kr
T. Suh (Department of Computer Science Education, College of Education, Korea University, Seoul 136-713, Republic of Korea)
References
1 Nakamura, Y., Hosokawa, K., Kuroda, I., Yoshikawa, K., and Yoshimura, T.: A fast hardware/software co-verification method for system-on-chip by using a C/C++ simulator and FPGA emulator with shared register communication, Proc. of 41st Annual Design Automation Conf. (DAC'04), San Diego, CA, USA, 2004, pp. 299-304
2 Homayoun, H., Gupta, A., Veidenbaum, A., Sasan, A., Kurdahi, F., and Dutt, N.: RELOCATE: register file local access pattern redistribution mechanism for power and thermal management in out-of-order embedded processor, Lect. Notes Comput. Sci., 2010, 5952/2010, pp. 216-231
3 Altera Corporation: Cyclone II memory blocks, Cyclone II Device Handbook, Vol. 1, Chapter 8, February 2008
4 Xilinx: Xilinx design reuse methodology for ASIC and FPGA designers, Reuse Methodology Manual for System-on-Chip Designs