Semi design of QDR-II SRAM on Virtex-5 FPGA
CHAPTER 1
INTRODUCTION
1.1 Introduction to Embedded Systems
An embedded system is a special-purpose computer system designed to perform one or a few
dedicated functions, often with real-time computing constraints. It is usually embedded as part of a
complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such
as a personal computer, can do many different tasks depending upon programming. Embedded systems
control many of the common devices in use today. Since the embedded system is dedicated to specific
tasks, design engineers can optimize it by reducing the size and cost of the product, or increasing the
reliability and performance. An embedded system can also be defined as an engineering artefact
involving computation that is subject to physical constraints arising through interactions of
computational processes with the physical world. These physical constraints are divided into reaction
and execution constraints. Reaction constraints originate from the behavioral requirements and specify
the deadlines, throughput and jitter whereas the execution constraints originate from the implementation
requirements and put bounds on available processor speeds, power, memory, and hardware failure
rates.
Some embedded systems are mass-produced, benefiting from economies of scale. In general,
"embedded system" is not an exactly defined term, as many systems have some element of
programmability. Physically, embedded systems range from portable devices such as digital watches and
MP4 players to large stationary installations like traffic lights, factory controllers, or the systems
controlling nuclear power plants/missiles/satellites. Complexity varies from low, with a single
microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large
chassis or enclosure.
Embedded systems range from no user interface at all, dedicated only to one task, to complex
graphical user interfaces that resemble modern computer desktop operating systems. Simple embedded
devices use buttons, LEDs, and small character- or digit-only displays, often with a simple menu system.
Embedded processors can be broken into two broad categories: ordinary microprocessors (µP) and
microcontrollers (µC), which have many more peripherals on chip, reducing cost and size.
A common configuration for very-high-volume embedded systems is the system on a chip
(SOC). A system on chip is an integrated circuit which contains a complete system consisting of
multiple processors, multipliers, caches and interfaces on a single chip. SOCs can be implemented as an
application-specific integrated circuit (ASIC) or using a field-programmable gate array (FPGA).
1.1.1 Embedded System Characteristics
Embedded systems are designed to do some specific task, rather than be a general-purpose
computer for multiple tasks. Some have real-time performance constraints that must be
met, for reasons such as safety and usability; others may have low or no performance
requirements, allowing the system hardware to be simplified to reduce costs.
Embedded systems are not always stand-alone devices. Many embedded systems consist
of small, computerized parts within a larger device that serves a more general purpose.
For example, the Gibson Robot Guitar features an embedded system for tuning the
strings; the overall purpose of the guitar is, of course, to play music. Similarly, an
embedded system in an automobile provides a specific function as a subsystem of the car
itself.
The program instructions written for embedded systems are referred to as firmware, and
are stored in read-only memory or flash memory chips. They run with limited computer
hardware resources: little memory and a small or non-existent keyboard and/or screen.
Embedded systems often reside in machines that are expected to run continuously for
years without errors and, in some cases, to recover by themselves if an error occurs. Therefore the
software is usually developed and tested more carefully than for personal computers, and
unreliable mechanical moving parts such as hard drives, switches or buttons are avoided.
1.2 Introduction to Electronic Warfare
The term Electronic Warfare (EW) refers to any action involving the use of the electromagnetic
spectrum (EMS) or directed energy (DE) to control the EMS or to attack the enemy. EW includes three
major subdivisions: Electronic Attack (EA), Electronic Protection (EP), and Electronic Warfare
Support (ES). The purpose of EW is to deny the opponent an advantage in the EMS and ensure friendly
unimpeded access to the EM spectrum portion of the information environment. EW can be applied from
air, sea, land, and space by manned and unmanned systems.
1.2.1 Description of EW
The term Electronic attack (EA) refers to the usage of electromagnetic energy, directed energy, or
anti-radiation weapons to attack personnel, facilities, or equipment with the intent of degrading,
neutralizing, or destroying enemy combat capability. In case of EM energy, this action is referred to as
jamming and can be performed on communications systems or radar systems.
Electronic Protection (EP), or Electronic Protective Measures (EPM), involves actions taken to protect
personnel, facilities and equipment from any effects of friendly or enemy use of the electromagnetic
spectrum that degrade, neutralize or destroy friendly combat capability.
In military telecommunications, the terms Electronic Support (ES) or Electronic Support Measures
(ESM) describe the division of electronic warfare involving actions taken under direct control of an
operational commander to detect, intercept, identify, locate, record, and/or analyze sources of radiated
electromagnetic energy for the purposes of immediate threat recognition (such as warning that fire
control RADAR has locked on a combat vehicle, ship, or aircraft) or longer-term operational planning.
Thus, Electronic Support provides a source of information required for decisions involving Electronic
Protection (EP), Electronic Attack (EA), avoidance, targeting, and other tactical employment of forces.
Electronic Support data can be used to produce signals intelligence (SIGINT), communications
intelligence (COMINT) and electronics intelligence (ELINT).
Digital communication became important with the expansion of the use of computers and data
processing, and has continued to grow as a major industry providing the interconnection of computer
peripherals and transmission of data between distant sites. With the requirement of higher and higher
speeds of data transmission, the stress on the development of digital communication techniques has
increased. Also, the channel and its characteristics (bandwidth, frequency, noise, distortion, transmission
speed, type of coding, etc.) have improved over time.
Electronic Support Measures gather intelligence through passive "listening" to electromagnetic
radiations of military interest. Electronic support measures can provide:
1. Initial detection or knowledge of foreign systems.
2. A library of technical and operational data on foreign systems.
3. Tactical combat information utilizing that library.
Desirable characteristics for electromagnetic surveillance and collection equipment include:
1. Wide-spectrum or bandwidth capability because foreign frequencies are initially
unknown.
2. Wide dynamic range because signal strength is initially unknown.
3. Narrow band pass to discriminate the signal of interest from other electromagnetic radiation on
nearby frequencies.
4. Good angle-of-arrival measurement for bearings to locate the transmitter.
1.2.2 Electronic Counter Measures
Electronic Counter Measures (ECM) are a subsection of electronic warfare which includes any
sort of electrical or electronic device designed to trick or deceive Radar, Sonar, or other detection
systems like IR (infrared) and Laser. It may be used both offensively and defensively in any method to
deny targeting information to an enemy. The system may make many separate targets appear to the
enemy, or make the real target appear to disappear or move about randomly. It is used effectively to
protect aircraft from guided missiles. Most air forces use ECM to protect their aircraft from attack. That
is also true for military ships and recently on some advanced tanks to fool laser/IR guided missiles.
ECM is frequently coupled with stealth advances so that the ECM system has an easier job. Offensive ECM
often takes the form of jamming. Defensive ECM includes using blip enhancement and jamming of
missile terminal homers.
1.2.3 Electronic Counter-Counter Measures
Electronic Counter-Counter Measures (ECCM) describes a variety of practices which attempt to
reduce or eliminate the effect of Electronic Counter Measures (ECM) on electronic sensors aboard
vehicles, ships and aircraft and weapons such as missiles. ECCM is also known as Electronic Protective
Measures (EPM), chiefly in Europe. Electronic Protection (EP) involves actions taken to protect
personnel, facilities, and equipment from any effects of friendly or enemy use of the electromagnetic
spectrum that degrade, neutralize, or destroy friendly combat capability. While defensive EA actions and
EP both protect personnel, facilities, capabilities, and equipment, EP protects from the effects of EA
(friendly and/or adversary). Some examples of EPM are ECM detection, pulse compression by
"chirping" (linear frequency modulation), frequency hopping, side lobe cancellation, polarization, and
radiation homing.
1.3 Field Programmable Gate Array
A Field Programmable Gate Array (FPGA) is a semiconductor device that can be configured by
the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are
programmed using a logic circuit diagram or a source code in a hardware description language (HDL) to
specify how the chip will work. They can be used to implement any logical function that an application
specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping
offers advantages for many applications. FPGAs contain programmable logic components called "logic
blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together",
somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex
combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic
blocks also include memory elements, which may be simple flip-flops or more complete blocks of
memory.
The cost of an FPGA design is much lower than that of an ASIC (although the ensuing ASIC
components are much cheaper in large production runs). At the same time, implementing design changes
is much easier in FPGAs, and the time-to-market for such designs is much faster. FPGAs are often used
to prototype ASIC designs or to provide a hardware platform on which to verify the physical
implementation of new algorithms. However, their low development cost and short time-to-market mean
that they are increasingly finding their way into final products (some of the major FPGA vendors
actually have devices that they specifically market as competing directly against ASICs).
Fig. 1.1: FPGA Introduction
In order to be programmable, we need some mechanism that allows us to configure (program) a
prebuilt silicon chip.
1.3.1 FPGA Origin
Around the beginning of the 1980s, it became apparent that there was a gap in the digital IC
continuum. At one end, there were programmable devices like SPLDs and CPLDs, which were highly
configurable and had fast design and modification times, but which couldn't support large or complex
functions. At the other end of the spectrum were ASICs. These could support extremely large and
complex functions, but they were painfully expensive and time-consuming to design. Furthermore, once
a design had been implemented as an ASIC it was effectively frozen in silicon.
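Fig. 1.2: The Gap between PLDs and ASICs (figure labels: PLDs, comprising SPLDs and CPLDs; ASICs, comprising gate arrays, structured ASICs, standard cell, and full custom)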
The early devices were based on the concept of a programmable logic block, which comprised a
3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along
with a few other elements that are of little interest here.
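Fig. 1.3: The key elements forming a simple programmable logic block (figure labels: 3-input LUT with inputs a, b, c and output y; mux with additional input d; flip-flop with clock input and output q)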
Each FPGA contained a large number of these programmable logic blocks, as discussed below.
By means of appropriate SRAM programming cells, every logic block in the device could be configured
to perform a different function. Each register could be configured to initialize containing logic 0 or logic
1 and to act as a flip-flop (as shown in Fig. 1.3) or a latch. If the flip-flop option were selected, the
register could be configured to be triggered by a positive- or negative-going clock (the clock signal was
common to all of the logic blocks). The multiplexer feeding the flip-flop could be configured to accept
the output from the LUT or a separate input to the logic block, and the LUT could be configured to
represent any 3-input logical function.
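The behavior of such a logic block can be sketched in a few lines of Python. This is only a behavioral model under stated assumptions: the class name `LogicBlock`, its parameters, and the bit ordering of the truth table are illustrative choices, and the latch option is left out for brevity.

```python
class LogicBlock:
    """Behavioral sketch of a simple logic block: 3-input LUT, mux, register."""

    def __init__(self, lut_bits, use_lut=True, rising_edge=True, init=0):
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        self.lut_bits = list(lut_bits)   # truth table, index = a*4 + b*2 + c (assumed ordering)
        self.use_lut = use_lut           # mux select: LUT output (True) or input d (False)
        self.rising_edge = rising_edge   # clock polarity that triggers the flip-flop
        self.q = init                    # register initializes to logic 0 or logic 1
        self._prev_clock = 0

    def lut(self, a, b, c):
        """Return the configured output for one input combination."""
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def step(self, a, b, c, d, clock):
        """Apply one set of input values and return (y, q)."""
        y = self.lut(a, b, c)
        mux_out = y if self.use_lut else d
        rising = self._prev_clock == 0 and clock == 1
        falling = self._prev_clock == 1 and clock == 0
        if (rising and self.rising_edge) or (falling and not self.rising_edge):
            self.q = mux_out             # register captures the mux output on the active edge
        self._prev_clock = clock
        return y, self.q


# Configure the LUT as a 3-input XOR and clock the result into the register.
xor3 = [a ^ b ^ c for a in (0, 1) for b in (0, 1) for c in (0, 1)]
block = LogicBlock(xor3)
block.step(1, 0, 0, d=0, clock=0)
y, q = block.step(1, 0, 0, d=0, clock=1)
print(y, q)
```

Changing the eight bits in `lut_bits` is the software analogue of reprogramming the SRAM cells behind the LUT: the same structure then implements a different 3-input function.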
1.3.2 FPGA Architecture
The complete FPGA comprised a large number of programmable logic blocks, the "islands",
surrounded by a "sea" of programmable interconnects. The high-level illustration in Fig. 1.4 is merely an abstract
representation. All of the transistors and interconnects would be implemented on the same piece of
silicon using standard IC creation techniques. In addition to the local interconnect reflected in the figure,
there would also be global (high-speed) interconnection paths that could transport signals across the chip
without having to go through multiple local switching elements. The device would also include primary
I/O pins and pads. By means of its own SRAM cells, the interconnect could be programmed such that
the primary inputs to the device were connected to the inputs of one or more programmable logic blocks,
and the outputs from any logic block could be used to drive the inputs to any other logic block, the
primary outputs from the device, or both.
Fig. 1.4: Top-down view of simple, generic FPGA architecture
The end result was that FPGAs successfully bridged the gap between PLDs and ASICs. On the one hand,
they were highly configurable and had the fast design and modification times associated with PLDs. On
the other hand, they could be used to implement large and complex functions that had previously been
the domain only of ASICs. ASICs were still required for the really large, complex, high-performance
designs, but as FPGAs increased in sophistication they started to encroach further and further into ASIC
design space.
1.4 Xilinx Virtex-5 FPGA
Virtex-5 is the newest generation FPGA from Xilinx. The Virtex-5 family contains five distinct
platforms, the most choice offered by any FPGA family. Each platform contains a different ratio of
features to address the needs of a wide variety of advanced logic designs. In addition to the most
advanced, high performance logic fabric, Virtex-5 FPGAs contain many hard-IP system level blocks,
including powerful 36-Kbit block RAM/FIFOs and second-generation 25 x 18 DSP slices. Virtex-5 also
offers the best solution for addressing the needs of high performance logic designers, high performance
DSP designers, and high performance embedded systems designers with unprecedented logic, DSP,
hard/soft microprocessor and connectivity capabilities. The Virtex-5 LXT, SXT, TXT and FXT
platforms include high-speed serial connectivity and link/transaction layer capability.
The 5 platforms are:
Virtex-5 LX: High performance general logic applications.
Virtex-5 LXT: High performance logic with advanced serial connectivity.
Virtex-5 SXT: High performance signal processing applications with advanced serial connectivity.
Virtex-5 TXT: High performance systems with double density advanced serial connectivity.
Virtex-5 FXT: High performance embedded systems with advanced serial connectivity.
1.4.1 Architectural Description
Virtex-5 devices are user-programmable gate arrays with various configurable elements and
embedded cores optimized for high-density and high-performance system designs. Virtex-5
devices implement the following functionality:
• I/O blocks provide the interface between package pins and the internal configurable logic.
Most popular and leading-edge I/O standards are supported by programmable I/O blocks
(IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced source-
synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both
input and output signals), data serializers or deserializers, clock dividers, and dedicated I/O
and local clocking resources.
• Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide
combinatorial and synchronous logic as well as distributed memory and SRL32 shift register
capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and
provide superior capabilities and performance compared to previous generations of
programmable logic.
• Block RAM modules provide flexible 36 Kbit true dual-port RAM blocks that are cascadable to
form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional
programmable FIFO logic for increased device utilization. Each block RAM can also be
configured as two independent 18 Kbit true dual-port RAM blocks, providing memory
granularity for designs needing smaller RAM blocks.
• Clock Management Tile (CMT) blocks provide the most flexible, highest-performance
clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-
calibrating, fully digital), and one PLL block (self-calibrating, analog) for clock distribution
delay compensation, clock multiplication/division, coarse-/fine-grained clock phase shifting,
and input clock jitter filtering.
1.4.2 Virtex-5 FPGA Features
Input/Output Blocks (SelectIO)
IOBs are programmable and can be categorized as follows:
• Programmable single-ended or differential (LVDS) operation
• Input block with an optional single data rate (SDR) or double data rate (DDR) register
• Output block with an optional SDR or DDR register
• Bidirectional block
• Per-bit deskew circuitry
• Dedicated I/O and regional clocking resources
• Built-in data serializer/deserializer
The IOB registers are either edge-triggered D-type flip-flops or level-sensitive latches.
The Digitally Controlled Impedance (DCI) I/O feature can be configured to provide on-chip
termination for each single-ended I/O standard and some differential I/O standards.
Data serializer/deserializer capability is added to every I/O to support source-synchronous
interfaces. A serial-to-parallel converter with associated clock divider is included in the input
path, and a parallel-to-serial converter in the output path.
Configurable Logic Blocks (CLBs)
A Virtex-5 FPGA CLB resource is made up of two slices. Each slice is equivalent and contains:
• Four function generators
• Four storage elements
• Arithmetic logic gates
• Large multiplexers
• Fast carry look-ahead chain
The function generators are configurable as 6-input LUTs or dual-output 5-input LUTs. In addition, the
four storage elements can be configured as either edge-triggered D-type flip-flops or level-sensitive
latches. Each CLB has internal fast interconnect and connects to a switch matrix to access general
routing resources.
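One way to picture the relationship between a 6-input LUT and a pair of 5-input LUTs is to view the 64-entry truth table as two 32-entry tables that share five inputs, with the sixth input selecting between them. The Python sketch below illustrates that idea only; the function names and the bit ordering are assumptions made for the example, and it is not a description of the actual silicon.

```python
def lut(table, inputs):
    """Look up one bit in a truth table; inputs[0] is the least significant address bit."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return table[index]


def lut6_from_two_lut5(table64, inputs6):
    """Evaluate a 6-input function as two 5-input LUTs plus a 2:1 multiplexer."""
    low_half, high_half = table64[:32], table64[32:]
    o5_low = lut(low_half, inputs6[:5])     # 5-input function used when the sixth input is 0
    o5_high = lut(high_half, inputs6[:5])   # 5-input function used when the sixth input is 1
    return o5_high if inputs6[5] else o5_low


# Arbitrary 64-bit configuration, used only to compare both evaluation paths.
table64 = [1 if (i * 37 + 11) % 3 == 0 else 0 for i in range(64)]
mismatches = 0
for value in range(64):
    bits = [(value >> k) & 1 for k in range(6)]
    if lut(table64, bits) != lut6_from_two_lut5(table64, bits):
        mismatches += 1
print("mismatches:", mismatches)
```

In dual-output use, the two 5-input results would be taken as separate outputs instead of being multiplexed, which is why the two functions must share the same five inputs.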
Block RAM
The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in
various depth and width configurations.
In addition, each 36-Kbit block can also be configured to operate as two independent 18-Kbit
dual-port RAM blocks. Each port is totally synchronous and independent, offering three
"read-during-write" modes.
Block RAM is cascadable to implement large embedded storage blocks. Additionally, back-end
pipeline registers, clock control circuitry, built-in FIFO support, ECC, and byte write enable
features are also provided as options.
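The two end points quoted above, 32K x 1 and 512 x 72, can be checked with simple arithmetic. The sketch below assumes that the intermediate widths follow the usual progression 1, 2, 4, 9, 18, 36, 72, where widths that are multiples of nine carry one extra (parity) bit per eight data bits; only the end points come from the text, so the intermediate rows should be confirmed against the device user guide.

```python
DATA_BITS = 32 * 1024   # data capacity of one block, excluding the extra bits

def aspect_ratios(widths=(1, 2, 4, 9, 18, 36, 72)):
    """Return (depth, width, total_bits) for each assumed port width."""
    rows = []
    for width in widths:
        data_width = width - width // 9     # assumed: 1 extra bit per 8 data bits
        depth = DATA_BITS // data_width
        rows.append((depth, width, depth * width))
    return rows


for depth, width, total in aspect_ratios():
    print(f"{depth:>6} x {width:<2} = {total} bits")
```

The narrow configurations use 32 Kbit and the byte-oriented ones use the full 36 Kbit, which is consistent with a 36 Kbit block spanning 32K x 1 to 512 x 72.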
Global Clocking
The CMTs and global-clock multiplexer buffers provide a complete solution for designing high-speed
clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can
be used independently or extensively cascaded. Up to six CMT blocks are available, providing
up to eighteen total clock generator elements. Each DCM provides familiar clock generation
capability.
To generate deskewed internal or external clocks, each DCM can be used to eliminate clock
distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the
output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment in increments
of a fraction of the clock period. Flexible frequency synthesis provides a clock output frequency
equal to a fractional or integer multiple of the input clock frequency.
To augment the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides
reference clock jitter filtering and further frequency synthesis options. Virtex-5 devices have 32
global-clock MUX buffers. The clock tree is designed to be differential. Differential clocking
helps reduce jitter and duty cycle distortion.
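The clocking arithmetic described in this subsection can be made concrete with a short sketch. The multiply/divide values and clock frequencies below are illustrative assumptions only and have not been checked against the legal operating ranges of the DCM or PLL; the function names are hypothetical.

```python
from fractions import Fraction

def clock_generators(cmt_count, dcm_per_cmt=2, pll_per_cmt=1):
    """Total clock generator elements for a given number of CMT blocks."""
    return cmt_count * (dcm_per_cmt + pll_per_cmt)

def synthesized_frequency(f_in_hz, multiply, divide):
    """Output frequency as a fractional or integer multiple of the input frequency."""
    return Fraction(f_in_hz) * Fraction(multiply, divide)

def phase_delay_seconds(f_hz, degrees):
    """Time offset corresponding to a phase shift at a given clock frequency."""
    return Fraction(degrees, 360) / Fraction(f_hz)


print(clock_generators(6))                                  # six CMTs
print(synthesized_frequency(100_000_000, 7, 2))             # 100 MHz x 7/2
print(float(phase_delay_seconds(200_000_000, 90)) * 1e9)    # 90 degrees at 200 MHz, in ns
```

Exact fractions are used so that ratios such as 7/2 do not pick up floating-point rounding before the final conversion for display.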
DSP48E Slices
DSP48E slice resources contain a 25 x 18 two's complement multiplier and a 48-bit
adder/subtracter/accumulator. Each DSP48E slice also contains extensive cascade capability to efficiently
implement high-speed DSP algorithms.
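A multiply-accumulate operation with these operand widths can be modeled in Python as follows. This is a numerical sketch of the arithmetic only (25-bit by 18-bit signed multiply, 48-bit wrapping accumulator); it ignores pipelining, cascading, and the other slice modes, and the helper names are assumptions made for the example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_accumulate(samples, coefficients):
    """Sum of products using a 25 x 18 signed multiplier and a 48-bit accumulator."""
    acc = 0
    for a, b in zip(samples, coefficients):
        a25 = to_signed(a, 25)                 # first operand limited to 25 bits
        b18 = to_signed(b, 18)                 # second operand limited to 18 bits
        acc = to_signed(acc + a25 * b18, 48)   # accumulator wraps at 48 bits
    return acc


print(multiply_accumulate([1, 2, 3], [4, 5, 6]))
print(multiply_accumulate([-1000, 250], [3, -4]))
```

A 25 x 18 product needs at most 43 bits, so the 48-bit accumulator leaves headroom for summing many products before wraparound becomes a concern.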
Routing Resources
All components in Virtex-5 devices use the same interconnect scheme and the same access to the global
routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set of connectivity
in as few hops as possible. Timing models are shared, greatly improving the predictability of the
performance for high-speed designs.
Configuration
Virtex-5 devices are configured by loading the bitstream into internal configuration memory using one
of the following modes:
• Slave-serial mode
• Master-serial mode
• Slave SelectMAP mode
• Master SelectMAP mode
• Boundary-Scan mode (IEEE-1532 and -1149)
• SPI mode (Serial Peripheral Interface standard Flash)
• BPI-up/BPI-down modes (Byte-wide Peripheral Interface standard x8 or x16 NOR Flash)
System Monitor
FPGAs are an important building block in high availability/reliability infrastructure. Therefore,
there is a need to better monitor the on-chip physical environment of the FPGA and its immediate
surroundings within the system.
For the first time, the Virtex-5 family System Monitor facilitates easier monitoring of the FPGA
and its external environment. Every member of the Virtex-5 family contains a System Monitor
block.
The System Monitor is built around a 10-bit 200kSPS ADC (Analog-to-Digital Converter). This
ADC is used to digitize a number of on-chip sensors to provide information about the physical
environment within the FPGA. On-chip sensors include a temperature sensor and power supply
sensors. Access to the external environment is provided via a number of external analog input
channels. These analog inputs are general purpose and can be used to digitize a wide variety of
voltage signal types.
Support for unipolar, bipolar, and true differential input schemes is provided. There is full access
to the on-chip sensors and external channels via the JTAG TAP, allowing the existing JTAG
infrastructure on the PC board to be used for analog test and advanced diagnostics during
development or after deployment in the field.
The System Monitor is fully operational after power up and before configuration of the FPGA.
System Monitor does not require an explicit instantiation in a design to gain access to its basic
functionality. This allows the System Monitor to be used even at a late stage in the design cycle.
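The resolution and timing implied by a 10-bit, 200 kSPS converter follow from basic arithmetic. In the sketch below, the 1.0 V unipolar full-scale range is an assumption chosen for illustration, and the function names are hypothetical; the actual input ranges and sensor transfer functions should be taken from the System Monitor documentation.

```python
ADC_BITS = 10
SAMPLE_RATE_SPS = 200_000

def lsb_volts(full_scale_volts):
    """Voltage step represented by one ADC code."""
    return full_scale_volts / (1 << ADC_BITS)

def code_to_volts(code, full_scale_volts):
    """Convert a raw unipolar ADC code to a voltage."""
    if not 0 <= code < (1 << ADC_BITS):
        raise ValueError("code out of range for a 10-bit converter")
    return code * lsb_volts(full_scale_volts)

def sample_period_seconds():
    """Time between successive conversions at the maximum sample rate."""
    return 1 / SAMPLE_RATE_SPS


FULL_SCALE = 1.0   # assumed full-scale input range in volts
print(lsb_volts(FULL_SCALE) * 1000, "mV per code")
print(code_to_volts(512, FULL_SCALE), "V at mid-scale")
print(sample_period_seconds() * 1e6, "microseconds per sample")
```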
1.4.3 Virtex-5 Ordering Information
Example part number: XC5VFX100T-1FFG1738
XC: Xilinx
5V: Virtex-5
FX100T: Logical capacity
-1: Speed
FF: Flip chip
G: Lead free
1738: Pin count
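The fields of the example part number can be separated programmatically. The regular expression below is a sketch written for part numbers shaped like XC5VFX100T-1FFG1738; it is an assumption that other devices follow the same layout, and the function name `decode_part_number` is hypothetical.

```python
import re

# Pattern written for the example part number only; not an official grammar.
PART_PATTERN = re.compile(
    r"^(?P<vendor>XC)(?P<family>5V)(?P<device>[A-Z]+\d+T?)"
    r"-(?P<speed>\d)(?P<package>FF)(?P<lead_free>G?)(?P<pins>\d+)$"
)

def decode_part_number(part):
    """Split a Virtex-5 style part number into its named fields."""
    match = PART_PATTERN.match(part)
    if match is None:
        raise ValueError(f"unrecognized part number: {part!r}")
    fields = match.groupdict()
    return {
        "vendor_prefix": fields["vendor"],        # XC: Xilinx
        "family": fields["family"],               # 5V: Virtex-5
        "device": fields["device"],               # platform and logical capacity
        "speed_grade": int(fields["speed"]),
        "package": fields["package"],             # FF: flip chip
        "lead_free": fields["lead_free"] == "G",
        "pin_count": int(fields["pins"]),
    }


print(decode_part_number("XC5VFX100T-1FFG1738"))
```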
CHAPTER 2
QDR-II STATIC RAM
2.1 Introduction to Memories
Computer data storage, often called storage or memory, refers to computer components, devices,
and recording media that retain digital data used for computing for some interval of time. Computer data
storage provides one of the core functions of the modern computer, that of information retention.
Memory is directly accessible to the CPU. The CPU continuously reads instructions stored there and
executes them as required. Any data actively operated on is also stored there in a uniform manner. This
memory is mainly of two types: RAM and ROM.
2.1.1 Random Access Memory
Random access memory (RAM) is a form of computer data storage. It takes the form of
integrated circuits that allow the stored data to be accessed in any order (i.e., at random). The word
random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its
physical location and whether or not it is related to the previous piece of data. This contrasts with storage
mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical movement of the
recording medium or a reading head. In these devices, the movement takes longer than the data transfer,
and the retrieval time varies depending on the physical location of the next item. The word RAM is
mostly associated with volatile types of memory, where the information is lost after the power is
switched off.
Modern types of writable RAM generally store a bit of data in either the state of a flip-flop, as in
SRAM (static RAM), or as a charge in a capacitor (or transistor gate), as in DRAM (dynamic RAM),
EPROM, EEPROM and Flash. Some types have circuitry to detect and/or correct random faults called
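memory errors in the stored data, using parity bits or error correction codes. RAM of the read-only type,
ROM, instead uses a metal mask to permanently enable or disable selected transistors, instead of storing
a charge in them.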
As both SRAM and DRAM are volatile, other forms of computer storage, such as disks and magnetic
tapes, have been used as persistent storage in traditional computers.
2.1.2 Read Only Memory
Read-only memory (usually known by its acronym, ROM) is a class of storage media used in
computers and other electronic devices. Because data stored in ROM cannot be modified (at least not
very quickly or easily), it is mainly used to distribute firmware (software that is very closely tied to
specific hardware, and unlikely to require frequent updates). In its strictest sense, ROM refers only to
mask ROM, which is fabricated with the desired data permanently stored in it, and thus can never be
modified. However, more modern types such as EPROM
and flash EEPROM can be erased and re-programmed multiple times; they are still described as "read-
only memory" (ROM) because the reprogramming process is generally infrequent, comparatively slow,
and often does not permit random access writes to individual memory locations. There are different
types of ROM. Classic mask-programmed ROM chips are integrated circuits that physically encode the
data to be stored, and thus it is impossible to change their contents after fabrication. The other types are:
1. Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be
written to or programmed via a special device called a PROM programmer. Typically, this device uses
high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip.
Consequently, a PROM can only be programmed once.
2. Erasable programmable read-only memory (EPROM) can be erased by exposure to strong
ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again requires
application of higher than usual voltage. Repeated exposure to UV light will eventually wear out an
EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming.
EPROM chip packages can often be identified by the prominent quartz "window" which allows UV light
to enter. After programming, the window is typically covered with a label to prevent accidental erasure.
Some EPROM chips are factory erased before they are packaged, and include no window: these are
effectively PROM.
3. Electrically erasable programmable read-only memory (EEPROM) is based on a similar
semiconductor structure to EPROM but allows its entire contents (or selected banks) to be electrically
erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3
player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a
ROM or writing to a RAM (nanoseconds in both cases).
4. Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified
one bit at a time. Writing is a very slow process and again requires higher voltage (usually around 12V)
than is used for read access. EAROMs are intended for applications that require infrequent and only
partial rewriting. EAROM may be used as non-volatile storage for critical system setup information; in
many applications, EAROM has been supplanted by CMOS RAM supplied by mains power and backed-
up with a lithium battery.
5. Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory
can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high
endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area,
resulting in individual ICs with a capacity as high as 16 GB as of 2007; this feature, along with its
endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications
(such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used
as a replacement for older ROM types, but not in applications that take advantage of its ability to be
modified quickly and frequently.
2.2 Introduction to QDRII SRAM
The QDR consortium (Cypress, Renesas, IDT, NEC, and Samsung) defined and developed the
Quad Data Rate (QDR) SRAM technology for high-performance communications applications. The
QDRII SRAM architecture provides dedicated input and output ports that independently operate at
double data rate (DDR). This results in four data transfers per clock cycle and overcomes bus contention
issues. QDR SRAM devices were developed in response to the demand for higher bandwidth memories
targeted at networking and telecommunications applications.
The basic QDR architecture has independent read and write data paths for simultaneous
operation. Both paths use Double Data Rate (DDR) transmission to deliver two words per clock cycle,
one word on the rising clock edge and another on the falling edge. The result is that four bus-widths of
data (two read and two write) are transferred during each clock period, hence the name quad data rate.
QDR memory devices are offered in both 2-word burst and 4-word burst architectures. The 2-word burst
devices transmit two words per read or write request. A DDR address bus is used to allow Read requests
during the first half of the clock period and Write requests during the second half of the clock period. In
contrast, 4-word burst devices transmit four words per Read or Write request, and hence only require a
Single Data Rate (SDR) address bus to maximize data bandwidth. Read and Write operations must be
requested on alternating clock cycles (i.e., non-overlapping), allowing the address bus to be shared.
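The "four transfers per clock" figure and the equivalence of the two burst architectures can be checked numerically. In the sketch below, the 250 MHz clock and 36-bit data width are illustrative assumptions, not values taken from this design, and the function names are hypothetical.

```python
from fractions import Fraction

def qdr_bandwidth_bits_per_second(clock_hz, data_width_bits):
    """Aggregate read plus write bandwidth of a QDR interface."""
    transfers_per_clock_per_port = 2   # DDR: one word on each clock edge
    ports = 2                          # independent read and write ports
    return clock_hz * data_width_bits * transfers_per_clock_per_port * ports

def words_per_clock_per_port(burst_length):
    """Sustained words per clock on one port for a 2-word or 4-word burst device."""
    if burst_length == 2:
        requests_per_clock = Fraction(1)      # read and write both requested every clock
    elif burst_length == 4:
        requests_per_clock = Fraction(1, 2)   # read and write requests alternate clocks
    else:
        raise ValueError("burst length must be 2 or 4")
    return burst_length * requests_per_clock


print(qdr_bandwidth_bits_per_second(250_000_000, 36) / 1e9, "Gbit/s")
print(words_per_clock_per_port(2), words_per_clock_per_port(4))
```

Both burst lengths sustain two words per clock on each port, which is why the 4-word burst device can share a single data rate address bus without losing data bandwidth.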
One of the unique features of the QDRII architecture is the echo-clock (CQ) output that is
frequency locked to the device input clock (K) but edge aligned to the data transmitted on the Read path
outputs (Q). The CQ clock output is retimed to align with the Q data outputs using a delay-locked loop
(DLL) circuit internal to the QDRII memory device. This clock forwarding, or source-synchronous,
method of interface allows greater timing margin. It also enables the simple and elegant direct-clocking
methodology used in this reference design, discussed in detail in this application note. The QDRII
reference design is composed of four main elements:
I. User Interface
II. Physical Interface
III. Read/Write State Machine
IV. Delay Calibration State Machine
The user interface uses a simple protocol based entirely on SDR signals to make Read/Write
requests. This module is constructed primarily from FIFO16 primitives and is used to store the address
and data values for Read/Write operations before and after execution.
The Read/Write state machine is responsible for monitoring the status of the first-in, first-out
(FIFO) buffers within the user interface module, coordinating the flow of data between the user interface
and physical interface, and initiating the actual Read/Write commands to the external memory device. It
ensures execution of Read/Write operations with minimal latency in a concurrent manner as per the
requirements of the QDR II memory specification.
The physical interface is responsible for generating the proper timing relationships and DDR
signaling to communicate with the external memory device in a manner that conforms to its command
protocol and timing requirements.
The delay calibration state machine is an integral component of the direct-clocking methodology
used to achieve maximum performance while greatly simplifying the task of read data capture inside the
FPGA. The delay calibration state machine uses the IDELAY blocks on the read data inputs to adjust the
timing of the read data returning from the memory device so that it can be synchronized directly to the
global FPGA system clock without any complex local-clocking or data recapture techniques.
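The text does not spell out the calibration algorithm, so the following Python sketch only illustrates one common way to center-align data: sweep the delay taps, record which settings capture the data correctly, and select the middle of the longest passing window. The function name, the 64-tap sweep and the pass/fail list are assumptions made for this example.

```python
def centre_tap(valid: list[bool]) -> int:
    """Return the middle index of the longest run of True values."""
    best_start, best_len = 0, 0
    start = None
    for i, ok in enumerate(valid + [False]):   # sentinel closes a trailing run
        if ok and start is None:
            start = i                          # a passing window opens here
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None                       # the window has closed
    if best_len == 0:
        raise ValueError("no valid delay tap found")
    return best_start + (best_len - 1) // 2


# Hypothetical sweep of 64 taps in which taps 10..30 capture the data correctly.
sweep = [False] * 10 + [True] * 21 + [False] * 33
print(centre_tap(sweep))   # prints 20
```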
The block diagram of the QDR II reference design is shown in Fig. 2.1.
[Figure: block diagram. Blocks: User Interface, Read/Write State Machine, Delay Calibration State
Machine, Physical Interface, Memory Device. Internal paths: FIFO Status, Read/Write Control, Address
Path, Write Path, Read Path. Clocks and reset: USER_CLK0, USER_CLK270, CLK_DIV4, USER_RESET.
User-side signals: USER_W_n, USER_R_n, USER_QEN_n, USER_AD_WR, USER_AD_RD, USER_BW_n, USER_DWL,
USER_DWH, USER_QRL, USER_QRH, USER_WR_FULL, USER_RD_FULL, USER_QR_EMPTY.
Memory-side signals: QDR_W_n, QDR_R_n, QDR_SA, QDR_BW_n, QDR_D, QDR_CQ, QDR_Q, QDR_K, QDR_K_n.]
Fig. 2.1: QDR II Reference Design
2.3 Implementation of QDRII SRAM with Virtex-4 PRO FPGA
The QDR II reference design was implemented to take advantage of the unique
capabilities of the Virtex-4 family. Advances in I/O, clocking, and storage element technology
enable the high-performance, turnkey operation of this design. The following sections describe
the design implementation in further detail.
2.3.1 User Interface
The user interface module utilizes six FIFO16 blocks to store the address and data values
for Read/Write operations. For Write commands, three FIFO16 blocks are used, one to store the
Write address (USER_AD_WR) and byte write enable (USER_BW_n) signals, and two to store
the Low (USER_DWL) and High (USER_DWH) 36-bit data words to be written to the memory.
Read commands also use three FIFO16 blocks, one to store the Read address (USER_AD_RD)
and two to store the Low (USER_QRL) and High (USER_QRH) 36-bit data words returning
from the memory as a result of the Read execution. The Read/Write state machine manages the
interleaving of Read and Write requests to the external memory device, relieving the user
interface of this responsibility.
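The six-FIFO arrangement can be pictured with a small behavioural model. The Python sketch below is only an illustration: the class name, the method names and the default depth of 512 entries are assumptions, while the signal names in the comments come from the description above.

```python
from collections import deque


class UserInterface:
    """Behavioural model of the six user-interface FIFOs."""

    def __init__(self, depth: int = 512):   # depth is an assumed value
        self.depth = depth
        self.wr_addr_bw = deque()   # (USER_AD_WR, USER_BW_n)
        self.wr_data_lo = deque()   # USER_DWL
        self.wr_data_hi = deque()   # USER_DWH
        self.rd_addr = deque()      # USER_AD_RD
        self.rd_data_lo = deque()   # USER_QRL
        self.rd_data_hi = deque()   # USER_QRH

    def request_write(self, addr: int, bw_n: int, dwl: int, dwh: int) -> bool:
        """Queue a write; returns False when the write FIFOs are full."""
        if len(self.wr_addr_bw) >= self.depth:   # corresponds to USER_WR_FULL
            return False
        self.wr_addr_bw.append((addr, bw_n))
        self.wr_data_lo.append(dwl)
        self.wr_data_hi.append(dwh)
        return True

    def request_read(self, addr: int) -> bool:
        """Queue a read; returns False when the read address FIFO is full."""
        if len(self.rd_addr) >= self.depth:      # corresponds to USER_RD_FULL
            return False
        self.rd_addr.append(addr)
        return True
```

The user logic only pushes requests and checks the full flags; ordering of the requests on the memory bus is left to the Read/Write state machine.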
2.3.2 Read/Write State Machine
This state machine is responsible for coordinating the flow of data between the user
interface and physical interface. It initiates the Read/Write commands to the external memory
device based on the requests stored in the user interface FIFOs.
A USER_RESET always returns the state machine to the INIT state, where memory
operations are suspended until the delay calibration state machine has completed adjusting the
delay on the IDELAY blocks for all of the QDR_Q inputs to center-align the Read path data to
the FPGA system clock, USER_CLK0. Completion of the calibration operation is signaled by an
active-High DLY_CAL_DONE input that transitions the Read/Write state machine to the Idle
state to await Read/Write requests from the user interface. From the Idle state, Write commands
take precedence on the presumption that a Write to memory must always occur before there is
any valid Read data. When there are no Read or Write requests pending, the state machine loops
in the Idle state.
A Write request pending in the user interface FIFOs causes a transition to the Write state, where a
Write command is initiated via the internal WR_INIT_n strobe. This strobe pulls the Write address and
data values from the FIFOs and results in the initiation of the external QDR_W_n Write control strobe to
the memory device. Assuming there is a pending Read request, the state machine then transitions to the
Read state, where the internal RD_INIT_n strobe is activated. This strobe pulls the Read address from
the FIFOs and launches an external QDR_R_n strobe to the memory device. Capture of the return values
in the Read data FIFOs also occurs as a result of this process.
[Figure: state diagram with states INIT (entered on USER_RESET, START_CAL = 1), IDLE, WRITE
(WR_INIT_n = 0) and READ (RD_INIT_n = 0). DLY_CAL_DONE moves INIT to IDLE; the remaining
transitions are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY and FIFO_QR_FULL.]
Figure 2.2: 4-Word Burst Read/Write State Machine
[Figure: state diagram with states INIT (entered on USER_RESET, START_CAL = 1), IDLE and
READ/WRITE (WR_INIT_n = 0, RD_INIT_n = 0). DLY_CAL_DONE moves INIT to IDLE; the remaining
transitions are conditioned on FIFO_WR_EMPTY, FIFO_RD_EMPTY and FIFO_QR_FULL.]
Fig. 2.3: 2-Word Burst Read/Write State Machine
The Read/Write state machine continuously monitors the user interface FIFO status signals to
determine if there are any pending Read/Write requests. A continuous flow of concurrent Read/Write
requests causes the state machine to simply alternate between the Read and Write states, ensuring
properly interleaved requests to the external memory. A stream of Write requests results in alternating
Idle and Write states, while a stream of Read requests similarly alternates between Idle and Read states.
The operation of the 2-word burst state machine is quite similar to that of the 4-word burst state
machine, with the exception that a single READ_WRITE state manages the Read and Write requests to the
memory. All 2-word burst QDR II memory devices allow Read and Write requests to occur on the same
clock cycle, allowing these operations to be initiated from the same state.
The state diagrams for the 4-word burst and 2-word burst Read/Write state machines are given in
Figure 2.2 and Fig. 2.3, respectively.
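The behaviour of the 4-word burst state machine described in this section can be summarised as a next-state function. The Python sketch below follows the prose only. It assumes that a Read is treated as pending when the Read address FIFO is not empty and the Read data FIFO is not full, which is an interpretation of the FIFO_RD_EMPTY and FIFO_QR_FULL flags and not a statement of the actual design. USER_RESET handling is left to the caller.

```python
from enum import Enum


class State(Enum):
    INIT = 0
    IDLE = 1
    WRITE = 2
    READ = 3


def next_state(state, dly_cal_done, wr_empty, rd_empty, qr_full):
    """Next state of the 4-word burst Read/Write state machine (sketch)."""
    write_pending = not wr_empty
    read_pending = not rd_empty and not qr_full   # assumed interpretation
    if state is State.INIT:
        return State.IDLE if dly_cal_done else State.INIT
    if state is State.IDLE:
        if write_pending:                         # Writes take precedence
            return State.WRITE
        return State.READ if read_pending else State.IDLE
    if state is State.WRITE:
        return State.READ if read_pending else State.IDLE
    # state is State.READ
    return State.WRITE if write_pending else State.IDLE
```

With both request types pending, repeated calls step through WRITE, READ, WRITE, and so on. With only Writes pending the result alternates between IDLE and WRITE, matching the behaviour described above.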
2.3.3 Physical Interface
The Physical Interface of the QDRII reference design generates the actual I/O signaling and
timing relationships for communication of Read/Write commands to the external memory device,
including the DDR data signals. It provides the necessary timing margins and I/O signaling standards
required to meet the overall design performance specifications.
2.4 Functional Description of QDRII SRAM
The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are 1.8 V
synchronous pipelined SRAMs equipped with the QDRII architecture. The QDRII architecture consists of
two separate ports to access the memory array. The Read port has dedicated data outputs to support Read
operations and the Write port has dedicated data inputs to support Write operations. Having
separate data inputs and data outputs completely eliminates the need to "turn around" the data bus,
as is required with common I/O devices. Access to each port is accomplished through a common address bus.
Addresses for Read and Write operations are latched on alternate rising edges of the input (K) clock.
Accesses to the QDRII Read and Write ports are completely independent of one another. In order
to maximize data throughput, both Read and Write ports are equipped with Double Data Rate (DDR)
interfaces. Each address location is associated with four 8-bit words (CY7C1511V18), 9-bit words
(CY7C1526V18), 18-bit words (CY7C1513V18), or 36-bit words (CY7C1515V18) that burst sequentially
into or out of the device. Since data can be transferred into and out of the device on every rising edge of
both pairs of input clocks (K and /K, C and /C, where a leading slash denotes the complementary clock),
memory bandwidth is maximized while simplifying system design by eliminating bus turnarounds.
Fig. 2.4: Logic diagram of CY7C1515V18
Depth expansion is accomplished with Port Selects for each port. Port Selects allow each port to
operate independently. All synchronous inputs pass through input registers controlled by the K or /K
input clocks. All data outputs pass through output registers controlled by the C or /C (or K or /K in a
single clock domain) input clocks. Writes are conducted with on-chip synchronous self-timed write
circuitry.
2.4.1 Pin Definitions
Pin Name | I/O | Pin Description
(/C, /K and /CQ denote the complementary clocks, which are overlined in the device data sheet.)
D[x:0] | Input-Synchronous | Data input signals, sampled on the rising edge of the K and /K clocks during valid write operations. CY7C1511V18: D[7:0]; CY7C1526V18: D[8:0]; CY7C1513V18: D[17:0]; CY7C1515V18: D[35:0].
WPS | Input-Synchronous | Write Port Select, active LOW. Sampled on the rising edge of the K clock. When asserted active, a write operation is initiated. Deasserting will deselect the Write port. Deselecting the Write port will cause D[x:0] to be ignored.
NWS0, NWS1 | Input-Synchronous | Nibble Write Select 0, 1, active LOW (CY7C1511V18 only). Sampled on the rising edge of the K and /K clocks during write operations. Used to select which nibble is written into the device: NWS0 controls D[3:0] and NWS1 controls D[7:4]. All the Nibble Write Selects are sampled on the same edge as the data. Deselecting a Nibble Write Select will cause the corresponding nibble of data to be ignored and not written into the device.
BWS0, BWS1, BWS2, BWS3 | Input-Synchronous | Byte Write Select 0, 1, 2 and 3, active LOW. Sampled on the rising edge of the K and /K clocks during write operations. Used to select which byte is written into the device during the current portion of the write operation. Bytes not written remain unaltered. CY7C1526V18: BWS0 controls D[8:0]. CY7C1513V18: BWS0 controls D[8:0] and BWS1 controls D[17:9]. CY7C1515V18: BWS0 controls D[8:0], BWS1 controls D[17:9], BWS2 controls D[26:18], BWS3 controls D[35:27].
A | Input-Synchronous | Address inputs. Sampled on the rising edge of the K clock during active read and write operations. These address inputs are multiplexed for both Read and Write operations. Internally, the device is organized as 8M x 8 (4 arrays each of 2M x 8) for CY7C1511V18, 8M x 9 (4 arrays each of 2M x 9) for CY7C1526V18, 4M x 18 (4 arrays each of 1M x 18) for CY7C1513V18 and 2M x 36 (4 arrays each of 512K x 36) for CY7C1515V18. Therefore, only 21 address inputs are needed to access the entire memory array of CY7C1511V18 and CY7C1526V18, 20 address inputs for CY7C1513V18 and 19 address inputs for CY7C1515V18. These inputs are ignored when the appropriate port is deselected.
Q[x:0] | Output-Synchronous | Data output signals. These pins drive out the requested data during a Read operation. Valid data is driven out on the rising edge of both the C and /C clocks during Read operations, or K and /K when in single clock mode. When the Read port is deselected, Q[x:0] are automatically tri-stated. CY7C1511V18: Q[7:0]; CY7C1526V18: Q[8:0]; CY7C1513V18: Q[17:0]; CY7C1515V18: Q[35:0].
RPS | Input-Synchronous | Read Port Select, active LOW. Sampled on the rising edge of the Positive Input Clock (K). When active, a Read operation is initiated. Deasserting will cause the Read port to be deselected. When deselected, the pending access is allowed to complete and the output drivers are automatically tri-stated following the next rising edge of the C clock. Each read access consists of a burst of four sequential transfers.
C | Input-Clock | Positive Input Clock for Output Data. C is used in conjunction with /C to clock out the Read data from the device. C and /C can be used together to deskew the flight times of various devices on the board back to the controller. See the application example for further details.
/C | Input-Clock | Negative Input Clock for Output Data. /C is used in conjunction with C to clock out the Read data from the device. C and /C can be used together to deskew the flight times of various devices on the board back to the controller. See the application example for further details.
K | Input-Clock | Positive Input Clock Input. The rising edge of K is used to capture synchronous inputs to the device and to drive out data through Q[x:0] when in single clock mode. All accesses are initiated on the rising edge of K.
/K | Input-Clock | Negative Input Clock Input. /K is used to capture synchronous inputs being presented to the device and to drive out data through Q[x:0] when in single clock mode.
CQ | Echo Clock | CQ is referenced with respect to C. This is a free-running clock and is synchronized to the input clock for output data (C) of the QDR-II. In single clock mode, CQ is generated with respect to K. The timings for the echo clocks are shown in the AC timing table.
/CQ | Echo Clock | /CQ is referenced with respect to /C. This is a free-running clock and is synchronized to the input clock for output data (/C) of the QDR-II. In single clock mode, /CQ is generated with respect to /K. The timings for the echo clocks are shown in the AC timing table.
ZQ | Input | Output Impedance Matching Input. This input is used to tune the device outputs to the system data bus impedance. The CQ, /CQ, and Q[x:0] output impedances are set to 0.2 x RQ, where RQ is a resistor connected between ZQ and ground. Alternately, this pin can be connected directly to VDDQ, which enables the minimum impedance mode. This pin cannot be connected directly to GND or left unconnected.
DOFF | Input | DLL Turn Off, active LOW. Connecting this pin to ground will turn off the DLL inside the device. The timings in the DLL-turned-off operation will be different from those listed in this data sheet.
TDO | Output | TDO pin for JTAG.
TCK | Input | TCK pin for JTAG.
TDI | Input | TDI pin for JTAG.
TMS | Input | TMS pin for JTAG.
NC | N/A | Not connected to the die. Can be tied to any voltage level.
Vss/144M | Input | Address expansion for 144M. Can be tied to any voltage level.
Vss/288M | Input | Address expansion for 288M. Can be tied to any voltage level.
Vref | Input-Reference | Reference Voltage Input. Static input used to set the reference level for HSTL inputs and outputs as well as AC measurement points.
VDD | Power Supply | Power supply inputs to the core of the device.
Table 2.1: Pin definitions
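The address-pin counts quoted in the table follow from the memory organization: each address selects a burst of four words, so the number of addressable locations is the total word count divided by four. A short Python check of this arithmetic is given below; the function name is chosen here for illustration.

```python
def address_bits(total_words: int, burst_length: int = 4) -> int:
    """Number of address inputs needed when each address selects one burst."""
    locations = total_words // burst_length
    return (locations - 1).bit_length()


M = 1 << 20   # "M" in the organization figures means 2**20 words
for part, words in [("CY7C1511V18", 8 * M), ("CY7C1526V18", 8 * M),
                    ("CY7C1513V18", 4 * M), ("CY7C1515V18", 2 * M)]:
    print(part, address_bits(words))
# prints 21, 21, 20 and 19 address inputs respectively
```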
2.5 Functioning Mechanism of QDRII SRAM
The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are synchronous pipelined
burst SRAMs equipped with both a Read port and a Write port. The Read port is dedicated to Read
operations and the Write port is dedicated to Write operations. Data flows into the SRAM through the
Write port and out through the Read port. These devices multiplex the address inputs in order to
minimize the number of address pins required. By having separate Read and Write ports, the QDRII
completely eliminates the need to "turn around" the data bus and avoids any possible data contention,
thereby simplifying system design. Each access consists of four 8-bit data transfers in the case of
CY7C1511V18, four 9-bit data transfers in the case of CY7C1526V18, four 18-bit data transfers in the
case of CY7C1513V18, and four 36-bit data transfers in the case of CY7C1515V18, in two clock cycles.
Accesses for both ports are initiated on the Positive Input Clock (K). All synchronous input
timing is referenced from the rising edge of the input clocks (K and /K) and all output timing is
referenced to the output clocks (C and /C, or K and /K when in single clock mode).
All synchronous data inputs (D[x:0]) pass through input registers controlled by the input
clocks (K and /K). All synchronous data outputs (Q[x:0]) pass through output registers controlled
by the rising edge of the output clocks (C and /C, or K and /K when in single-clock mode).
All synchronous control inputs (RPS, WPS, BWS[x:0]) pass through input registers controlled
by the rising edge of the input clocks (K and /K). The CY7C1513V18 is described in the following sections.
2.5.1 Read Operations
The CY7C1513V18 is organized internally as 4 arrays of 1M x 18. Accesses are completed in a
burst of four sequential 18-bit data words. Read operations are initiated by asserting RPS active at the
rising edge of the Positive Input Clock (K). The address presented to the address inputs is stored in the
Read address register. Following the next K clock rise, the corresponding lowest-order 18-bit word of
data is driven onto Q[17:0] using C as the output timing reference. On the subsequent rising edge of
/C, the next 18-bit data word is driven onto Q[17:0]. This process continues until all four 18-bit data
words have been driven out onto Q[17:0]. The requested data will be valid 0.45 ns from the rising edge
of the output clock (C or /C, or K or /K when in single-clock mode). In order to maintain the internal
logic, each Read access must be allowed to complete. Each Read access consists of four 18-bit data
words and takes 2 clock cycles to complete. Therefore, Read accesses to the device cannot be initiated
on two consecutive K clock rises. The internal logic of the device will ignore the second Read request.
Read accesses can be initiated on every other K clock rise. Doing so will pipeline the data flow such that
data is transferred out of the device on every rising edge of the output clocks (C and /C, or K and /K
when in single-clock mode).
When the Read port is deselected, the CY7C1513V18 will first complete the pending Read
transactions. Synchronous internal circuitry will automatically tri-state the outputs following the next
rising edge of the Positive Output Clock (C). This allows for a seamless transition between devices
without the insertion of wait states in a depth-expanded memory.
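The pipelining argument can be made concrete with a simplified timeline. The Python sketch below assumes, following the description above, that the first word of a burst appears on the C edge of the cycle after the request and that the remaining words follow on alternating /C and C edges. Exact device latency is not modelled, and the function name and tuple layout are chosen for illustration.

```python
def read_output_schedule(read_cycles):
    """List (cycle, clock edge, word index) outputs for reads started on K rises."""
    out = []
    last = None
    for n in sorted(read_cycles):
        if last is not None and n - last < 2:
            continue                     # request on a consecutive K rise is ignored
        last = n
        for w in range(4):               # burst of four words, lowest order first
            out.append((n + 1 + w // 2, "C" if w % 2 == 0 else "/C", w))
    return out


# Reads requested on cycles 0, 1 and 2: the request on cycle 1 is ignored.
for entry in read_output_schedule([0, 1, 2]):
    print(entry)
```

With reads accepted on cycles 0 and 2, the list contains one output on every C and /C edge from cycle 1 through cycle 4, which is the fully pipelined case.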
2.5.2 Write Operations
Write operations are initiated by asserting WPS active at the rising edge of the Positive Input
Clock (K). On the following K clock rise, the data presented to D[17:0] is latched and stored into the
lower 18-bit Write Data register, provided BWS[1:0] are both asserted active. On the subsequent rising
edge of the Negative Input Clock (/K), the information presented to D[17:0] is also stored into the Write
Data register, provided BWS[1:0] are both asserted active. This process continues for one more cycle
until four 18-bit words (a total of 72 bits) of data are stored in the SRAM. The 72 bits of data are then
written into the memory array at the specified location. Therefore, Write accesses to the device cannot
be initiated on two consecutive K clock rises. The internal logic of the device will ignore the second
Write request. Write accesses can be initiated on every other rising edge of the Positive Input Clock (K).
Doing so will pipeline the data flow such that 18 bits of data can be transferred into the device on every
rising edge of the input clocks (K and /K).
When deselected, the Write port will ignore all inputs after the pending Write operations have
been completed.
2.5.3 Byte Write Operations
Byte Write operations are supported by the CY7C1513V18. A write operation is initiated as
described in the Write Operations section above. The bytes that are written are determined by BWS0 and
BWS1, which are sampled with each set of 18-bit data words. Asserting the appropriate Byte Write
Select input during the data portion of a write will allow the data being presented to be latched and
written into the device. Deasserting the Byte Write Select input during the data portion of a write will
allow the data stored in the device for that byte to remain unaltered. This feature can be used to simplify
Read/Modify/Write operations to a Byte Write operation. The CY7C1515V18 also supports byte writes,
which are controlled by BWS0, BWS1, BWS2, and BWS3.
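The effect of the Byte Write Selects on one 18-bit word of the CY7C1513V18 can be sketched as a masked merge. In the Python example below, the function name and argument names are illustrative. The byte mapping (BWS0 for D[8:0], BWS1 for D[17:9]) and the active-LOW polarity follow the pin definitions.

```python
def apply_byte_write(stored: int, incoming: int, bws0_n: int, bws1_n: int) -> int:
    """Merge an incoming 18-bit word into a stored word (active-LOW selects)."""
    mask = 0
    if bws0_n == 0:
        mask |= 0x001FF        # BWS0 asserted: write D[8:0]
    if bws1_n == 0:
        mask |= 0x3FE00        # BWS1 asserted: write D[17:9]
    return (stored & ~mask & 0x3FFFF) | (incoming & mask)


# Only the lower byte is written; the upper byte keeps its stored value.
print(hex(apply_byte_write(0x3FFFF, 0x00000, bws0_n=0, bws1_n=1)))   # prints 0x3fe00
```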
2.5.4 Single Clock Mode
The CY7C1513V18 can be used with a single clock that controls both the input and output
registers. In this mode the device will recognize only a single pair of input clocks (K and /K) that controls
both the input and output registers. This operation is identical to the operation if the device had zero
skew between the K, /K and C, /C clocks. All timing parameters remain the same in this mode. To use this
mode of operation, the user must tie C and /C HIGH at power-on. This function is a strap option and is
not alterable during device operation.
2.5.5 Concurrent Transactions
The Read and Write ports on the CY7C1513V18 operate completely independently of one
another. Since each port latches the address inputs on different clock edges, the user can Read or Write
to any location, regardless of the transaction on the other port. If the ports access the same location when
a Read follows a Write in successive clock cycles, the SRAM will deliver the most recent information
associated with the specified address location. This includes forwarding data from a Write cycle that was
initiated on the previous K clock rise.
Read accesses and Write accesses must be scheduled such that one transaction is initiated on any
clock cycle. If both ports are selected on the same K clock rise, the arbitration depends on the previous
state of the SRAM. If both ports were deselected, the Read port will take priority. If a Read was initiated
on the previous cycle, the Write port will assume priority (since Read operations cannot be initiated on
consecutive cycles). If a Write was initiated on the previous cycle, the Read port will assume priority
(since Write operations cannot be initiated on consecutive cycles). Therefore, asserting both Port Selects
active from a deselected state will result in alternating Read/Write operations being initiated, with the
first access being a Read.
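The arbitration rules above can be expressed as a small decision function. The Python sketch below is a behavioural illustration only; the function names and the "R"/"W"/None encoding are chosen here and do not come from the data sheet.

```python
def arbitrate(rps_n: int, wps_n: int, prev):
    """Operation started on this K rise: 'R', 'W' or None.

    prev is the operation started on the previous K rise (or None).
    """
    read_req = rps_n == 0      # port selects are active LOW
    write_req = wps_n == 0
    if read_req and write_req:
        return "W" if prev == "R" else "R"   # Read wins unless a Read just started
    if read_req:
        return None if prev == "R" else "R"  # consecutive Reads are ignored
    if write_req:
        return None if prev == "W" else "W"  # consecutive Writes are ignored
    return None


def simulate(selects):
    """Apply arbitrate() to a sequence of (rps_n, wps_n) samples."""
    prev, started = None, []
    for rps_n, wps_n in selects:
        prev = arbitrate(rps_n, wps_n, prev)
        started.append(prev)
    return started


print(simulate([(0, 0)] * 4))   # prints ['R', 'W', 'R', 'W']
```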
2.5.6 Depth Expansion
The CY7C1513V18 has a Port Select input for each port. This allows for easy depth expansion.
Both Port Selects are sampled on the rising edge of the Positive Input Clock (K) only. Each Port Select
input can deselect the specified port. Deselecting a port will not affect the other port. All pending
transactions (Read and Write) will be completed prior to the device being deselected.
2.5.7 Programmable Impedance
An external resistor, RQ, must be connected between the ZQ pin on the SRAM and VSS to allow
the SRAM to adjust its output driver impedance. The value of RQ must be 5x the value of the intended
line impedance driven by the SRAM. The allowable range of RQ to guarantee impedance matching with
a tolerance of ±15% is between 175 Ω and 350 Ω, with VDDQ = 1.5 V. The output impedance is
adjusted every 1024 cycles upon power-up to account for drifts in supply voltage and temperature.
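The resistor selection rule amounts to a one-line calculation with a range check. The Python sketch below applies the 5x rule and the 175 Ω to 350 Ω range stated above; the function name and the 50 Ω example line are illustrative.

```python
def rq_for_line_impedance(z0_ohms: float) -> float:
    """Return the RQ value (ohms) for a given line impedance, using RQ = 5 x Z0."""
    rq = 5.0 * z0_ohms
    if not 175.0 <= rq <= 350.0:
        raise ValueError(f"RQ = {rq} ohm is outside the 175-350 ohm range")
    return rq


print(rq_for_line_impedance(50.0))   # prints 250.0
```

The range check implies that line impedances between 35 Ω and 70 Ω can be matched in this way.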
2.5.8 Echo Clocks
Echo clocks are provided on the QDR-II to simplify data capture on high-speed systems. Two
echo clocks are generated by the QDR-II. CQ is referenced with respect to C and /CQ is referenced with
respect to /C. These are free-running clocks and are synchronized to the output clock of the QDR-II. In
the single clock mode, CQ is generated with respect to K and /CQ is generated with respect to /K. The
timings for the echo clocks are shown in the AC timing table.
2.5.9 Delay Lock Loop
These chips utilize a Delay Lock Loop (DLL) that is designed to function between 80 MHz and
the specified maximum clock frequency. During power-up, when DOFF is tied HIGH, the DLL gets
locked after 1024 cycles of stable clock. The DLL can also be reset by slowing or stopping the input
clocks K and /K for a minimum of 30 ns. However, it is not necessary for the DLL to be specifically reset
in order to lock the DLL to the desired frequency. The DLL will automatically lock 1024 clock cycles
after a stable clock is presented. The DLL may be disabled by applying ground to the DOFF pin. For
information refer to the application note "DLL Considerations in QDRII™/DDRII/QDRII+/DDRII+".
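Since the DLL locks after 1024 stable clock cycles, the lock time depends only on the clock frequency. A short Python calculation is given below; the function name and the example frequencies are chosen for illustration.

```python
def dll_lock_time_us(clock_mhz: float, cycles: int = 1024) -> float:
    """Time in microseconds for the DLL to lock at the given clock frequency."""
    if clock_mhz < 80.0:
        raise ValueError("the DLL is designed to function from 80 MHz upwards")
    return cycles / clock_mhz      # cycles divided by MHz gives microseconds


print(dll_lock_time_us(80.0))      # prints 12.8
print(dll_lock_time_us(250.0))     # prints 4.096
```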
CHAPTER 3
XILINX AND MODELSIM
3.1 Xilinx Overview
The Integrated Software Environment (ISE™) is the Xilinx® design software suite that allows
you to take your design from design entry through Xilinx device programming. The ISE Project
Navigator manages and processes your design through the following steps in the ISE design flow.
3.2 Project Navigator Overview
Project Navigator organizes design files and runs processes to move the design from design entry
through implementation to programming the targeted Xilinx® device. Project Navigator is the high-level
manager for Xilinx FPGA and CPLD designs, which allows you to do the following:
1. Add and create design source files, which appear in the Sources window
2. Modify your source files in the Workspace
3. Run processes on your source files in the Processes window
4. View output from the processes in the Transcript window
Optionally, processes can be run from a script or from a command-line prompt.
However, it is recommended to first become familiar with the basic use of the Xilinx Integrated
Software Environment (ISE™) software and with project management.
The Project Navigator main window is divided into five areas, as follows:
1. Tool bar
2. Sources window
3. Processes window
4. Workspace
5. Transcript window
In the figure below, the top left is the Sources window, which hierarchically displays the
elements included in the project. Beneath the Sources window is the Processes window, which displays
available processes for the currently selected source. The third window at the bottom of the Project
Navigator is the Transcript window which displays status messages, errors, and warnings and also
contains interactive tabs for Tcl scripting and the Find in Files function. The fourth window to the right
is a multi-document interface (MDI) window referred to as the Workspace. It enables you to view html
reports, ASCII text files, schematics, and simulation waveforms.
3.2.1 Project Navigator Main Window
Fig 3.1: Project Navigator Main Window
3.3 ISE Design flow
The ISE Project Navigator manages and processes the design through the following steps in the
ISE design flow.
3.3.1 Design Entry
Design entry is the first step in the ISE design flow. During design entry, one creates the source
files based on the design objectives. The top-level design file can be created using a Hardware
Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic. Multiple
formats can be used for the lower-level source files in the design.
If we are working with a synthesized EDIF or NGC/NGO file, we can skip design entry and
synthesis and start with the implementation process.
3.3.2 Synthesis
After design entry and optional simulation, run synthesis. During this step, VHDL, Verilog, or
mixed language designs become netlist files that are accepted as input to the implementation step.
3.3.3 Implementation
After synthesis, run design implementation, which converts the logical design into a physical file
format that can be downloaded to the selected target device. From Project Navigator, run the
implementation process in one step, or run each of the implementation processes separately.
Implementation processes vary depending on whether we are targeting a Field Programmable Gate
Array (FPGA) or a Complex Programmable Logic Device (CPLD).
3.3.4 Verification
The functionality of the design can be verified at several points in the design flow. Simulator
software can be used to verify the functionality and timing of the design or a portion of the design. The
simulator interprets VHDL or Verilog code into circuit functionality and displays logical results of the
described HDL to determine correct circuit operation. Simulation allows creating and verifying complex
functions in a relatively small amount of time. In-circuit verification can also be run after programming
the device.
3.3.5 Device Configuration
After generating a programming file, configure the device. During configuration, generate
configuration files and download the programming files from a host computer to a Xilinx device.
iMPACT Tool Overview
iMPACT is a tool featuring batch and graphical user interface (GUI) operations. It allows you to
perform the following functions: Device Configuration and File Generation.
Device Configuration enables you to directly configure Xilinx® FPGAs or program Xilinx
CPLDs and PROMs with the Xilinx cables (MultiPRO Desktop Tool, Parallel Cable IV, or Platform
Cable USB) in various modes. In the Boundary-Scan mode, Xilinx FPGAs, CPLDs, and PROMs can be
configured or programmed. In the Slave Serial or SelectMAP configuration modes, only FPGAs can be
configured directly. In the Desktop Configuration mode, Xilinx CPLDs or PROMs can be programmed.
In the Direct SPI Configuration mode, select SPI serial flash devices (STMicro: M25P, M25PE, M45PE or
Atmel: AT45DB) can be programmed.
File Generation enables you to create the following types of programming files: System ACE
CF, PROM, SVF, STAPL, and XSVF files.
iMPACT also enables us to do the following:
1. Read back and verify design configuration data
2. Debug configuration problems
3. Execute SVF and XSVF files
Fig 3.2: Hardware interconnection
3.3.6 FPGA Design flow
[Figure: flow chart. Main flow: Design Entry, Design Synthesis, Design Implementation, Xilinx Device
Programming. Design Verification branch: Behavioral Simulation, Functional Simulation, Static Timing
Analysis, Timing Simulation, Back Annotation, In-Circuit Verification.]
Fig 3.3: FPGA Design Flow
3.4 Core Generator
The CORE Generator™ is a design tool that delivers parameterized Intellectual Property (IP)
optimized for Xilinx FPGAs.
The CORE Generator provides ready-made functions which include:
1. FIFOs and memories
2. Reed-Solomon Decoder and Encoder
3. FIR filters
4. FFTs
5. Standard bus interfaces such as PCI and PCI-X
6. Connectivity and networking interfaces (Ethernet, SPI-4.2, RapidIO, CAN and PCI Express)
3.4.1 Memory Interface Generator
The Memory Interface Generator (MIG) is a simple menu-driven tool to generate advanced
memory interfaces. DDR2 SDRAM, DDR SDRAM, DDRII SRAM, QDRII SRAM, and RLDRAM II
are supported. This tool generates HDL and pin placement constraints that help us design our
application.
3.4.2 Interfacing QDRII SRAM with MIG
The figure below shows a top-level block diagram of the QDRII memory controller. One side of
the QDRII memory controller connects to the user interface, denoted as Block Application. The other
side of the controller interfaces to the QDRII memory. The memory interface data width is selectable.
[Figure: block diagram. The Block Application connects to the QDR-II Memory Controller, which connects
to the QDR-II Memory.]
Fig. 3.4: QDR-II Memory Controller
Data is double-pumped to the QDRII SRAM on both the positive and the negative clock edges. The
HSTL_18 Class I/O standard is used for the data, address, and control signals. QDR-II SRAM interfaces
are source-synchronous and double data rate, like DDR SDRAM interfaces. The key advantage of QDR-II
devices is that they have separate data buses for reads and writes to the SRAM. Because reads and
writes do not share a bus, no bus turnaround is needed and bus contention is avoided.
Interface model
The memory interface is layered to simplify the design and make it modular. Fig. 3.5
shows the layered memory interface in the QDRII memory controller. The three layers are the
application layer, the implementation layer, and the physical layer.
The application layer comprises the user interface, which initiates memory
writes and reads by writing data and memory addresses to the User Interface
FIFOs. The implementation layer comprises the infrastructure, datapath, and
control logic:
1. The infrastructure logic consists of the DCM and reset logic generation circuitry.
2. The datapath logic consists of the calibration logic by which the data from the
memory component is captured using the FPGA clock.
3. The control logic determines the type of data transfer, that is, read or write, with
the memory component, depending on the User Interface FIFO status signals.
[Figure: layer stack. User Interface on top; Implementation Layer (Infrastructure, Datapath, Control)
in the middle; Physical Layer at the bottom.]
Fig. 3.5: Interface layering model
The physical layer comprises the I/O elements of the FPGA. The controller
communicates with the memory component using this layer. The I/O elements
(such as IDDRs, ODDRs, and IDELAY elements) are associated with this layer.
Hierarchy
Fig. 3.8 shows the hierarchical structure of the QDRII SRAM design generated by MIG
with a test bench and a DCM. The modules are classified as follows:
1. Design modules
2. Test bench modules
3. Clocks and reset generation modules
MIG can generate QDRII SRAM designs in four different ways:
1. With a test bench and a DCM
2. Without a test bench and with a DCM
3. With a test bench and without a DCM
4. Without a test bench and without a DCM
Design clocks and resets are generated in the infrastructure_top module. When the Use DCM
option is checked in MIG, a DCM primitive and the necessary clock buffers are instantiated in the
infrastructure_top module. The inputs to this module are the differential design clock and a 200 MHz
differential clock required for the IDELAYCTRL module. A user reset is also input to this module.
Using the input clocks and reset signals, the system clocks and the system resets used in the design are
generated in this module. When the Use DCM option is unchecked in MIG, the infrastructure_top
module does not have the DCM and the corresponding clock buffer instantiations; therefore, the system
operates on the user-provided clocks. The system reset is generated in the infrastructure_top module
using the DCM_LOCK signal and the ready signal of the IDELAYCTRL element.
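The reset generation described above can be sketched as follows. This is only an illustration of the idea, not the MIG-generated infrastructure_top code: the entity name, port names, active-high reset polarity and the three-stage synchronizer are all assumptions made for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch: names and polarities are illustrative and are not
-- taken from the MIG-generated infrastructure_top module.
entity sys_reset_gen is
  port (
    clk          : in  std_logic;  -- system clock
    user_reset   : in  std_logic;  -- active-high user reset (assumed polarity)
    dcm_lock     : in  std_logic;  -- '1' once the DCM has locked
    idelay_ready : in  std_logic;  -- '1' once IDELAYCTRL reports ready
    sys_reset    : out std_logic   -- active-high system reset
  );
end entity sys_reset_gen;

architecture rtl of sys_reset_gen is
  signal reset_async : std_logic;
  signal sync_ff     : std_logic_vector(2 downto 0) := (others => '1');
begin
  -- Hold the design in reset until both the DCM and IDELAYCTRL are ready.
  reset_async <= user_reset or (not dcm_lock) or (not idelay_ready);

  -- Assert the reset asynchronously, release it synchronously to clk.
  process (clk, reset_async)
  begin
    if reset_async = '1' then
      sync_ff <= (others => '1');
    elsif rising_edge(clk) then
      sync_ff <= sync_ff(1 downto 0) & '0';
    end if;
  end process;

  sys_reset <= sync_ff(2);
end architecture rtl;
```

With this arrangement the system reset stays asserted while either the DCM is unlocked or IDELAYCTRL is not ready, and is released three clock edges after both conditions are met.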
Fig. 3.8: QDR-II SRAM Controller Hierarchy
3.5 ChipScope Pro
After configuring the device, the FPGA design can be debugged using the ChipScope™ Pro software. From the
Project Navigator Processes tab, double-click Analyze Design Using ChipScope to launch the ChipScope
Pro Analyzer. To use this process, the Xilinx® ChipScope Pro software must be purchased and the design
must be created with debug and verification in mind, as described in the following sections. ChipScope Pro
comprises the ChipScope Pro cores in the CORE Generator, the ChipScope Pro Core Inserter, and the ChipScope
Pro Analyzer.
We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex 5 FPGA, by
analyzing the user interface signals, which include the PPD interface and the PLB interface. These signals are
captured using the FIFOs implemented in the FPGA and sent to the display interface on the PC using JTAG.
3.5.1 ChipScope Pro Design Flow Overview
To use the ChipScope Pro software to perform in-circuit verification, we do the
following:
1. Insert ChipScope Pro cores in the design using the CORE Generator or Core Inserter.
2. Implement the design in Project Navigator and configure the device.
3. Analyze the design using the ChipScope Pro Analyzer.
3.5.2 ChipScope Pro Core Insertion
ChipScope Pro cores can be inserted in the design with the ChipScope Pro tools using one of
the following methods:
1. During design entry, using the CORE Generator.
Using the CORE Generator software, we create the cores and instantiate them in the HDL
source file. This software can generate all of the cores available in the ChipScope Pro system. A
wizard is provided to create NGC netlists with HDL instantiation templates for any of the supported
synthesis tools. The templates are then used to connect the ChipScope Pro cores to the design logic.
2. After the Synthesize process, using the ChipScope Pro Core Inserter.
The ChipScope Pro Core Inserter is used to create the ILA, ATC2, and ICON cores and insert
them in a post-synthesis netlist.
Projects saved in the Core Inserter hold all relevant information about source files, destination
files, core parameters, and core settings. This allows you to store and retrieve information about core
insertion between sessions. The project file (.cdc extension) can also be used as an input to the Analyzer
to import signal names.
Fig. 3.9: Core Inserter as Launched from Project Navigator
3.5.3 ChipScope Pro Cores
ChipScope Pro allows the following cores to be embedded within a design to assist with
on-chip debugging: integrated logic analyzer (ILA), integrated bus analyzer (IBA), and virtual
input/output (VIO) low-profile software cores. These cores allow viewing internal signals and
nodes in the FPGA, including the IBM® CoreConnect™ processor local bus (PLB) that supports the
IBM PowerPC™ 405. The ChipScope Pro cores and their functions are as follows:
1. ICON
The Integrated Controller (ICON) core provides the communication between the
embedded ILA, IBA, and VIO cores and the computer running the ChipScope Pro
Analyzer software.
2. ILA
The ILA core is a customizable logic analyzer core that can be used to monitor the
internal signals in the design. Because the ILA core is synchronous to the design being
monitored, all design clock constraints applied to the design are also applied to the
components inside the ILA core.
3. ATC2
The Agilent Trace Core 2 (ATC2) is a customizable logic analyzer core. It is similar to
the ILA core but does not use on-chip Block RAM resources to store captured trace data. The
ATC2 core synchronizes ChipScope Pro to the Agilent FPGA dynamic probe technology,
delivering the first integrated application for FPGA debug with logic analyzers.
4. VIO
The virtual input/output core is a customizable core that can both monitor and
drive internal FPGA signals in real time. Unlike the ILA and IBA cores, the VIO core
does not require on-chip RAM.
Fig. 3.10: ChipScope Pro Cores
3.5.4 ChipScope Pro Analyzer
The ChipScope Pro Analyzer tool interfaces directly to the ChipScope Pro cores. This
software is used to download designs, set trigger conditions, and display data. The data can be shown
as waveforms, lists, or graphs, and values can be tokenized.
3.6 ModelSim
ModelSim provides a comprehensive simulation and debug environment for complex ASIC and
FPGA designs. Support is provided for multiple languages including Verilog, SystemVerilog, VHDL
and SystemC. Xilinx ISE also provides an integrated flow with the Model Technology ModelSim simulator,
which enables simulation to be run from the Xilinx Project Navigator graphical user interface.
[Figure: ModelSim wave window showing the pathnames pane, values pane and waveform pane]
CHAPTER 4
HARDWARE BOARD DESCRIPTION
4.1 Board Overview
The hardware on which we are working is a subsystem on a single board, which is used to
process intercepted signals. It consists of Xilinx® FPGAs, optical transceivers, a cPCI interface,
memories (DDR SDRAM and QDRII SRAM) and an Ethernet interface.
4.1.1 Requirement of this Board
Before this board was designed, as many as 16 independent boards were used for the purpose.
Owing to advances in VLSI technology, all of these are now integrated onto a single board. This is highly
advantageous, as the resulting board is smaller and operates faster.
4.1.2 Board Block Diagram
[Figure: blocks labelled cPCI Backplane, cPCI Bridge, Virtex-II Pro XC2VP7 (two devices), Virtex-4 XC4LX100, 36MB QDR-II SRAM, 128MB DDR SDRAM, 8MB Flash Memory, 10/100 Ethernet PHY and PPD Main; signal labels: Address, Control & Data, De-interleaved PDW]
Fig. 4.1: Board Block Diagram
4.2 Signal Interception
4.2.1 Block Diagram
Fig. 4.2: Signal Reception Block diagram
4.2.2 Signal Reception
An ESM system comprises a receiver which intercepts pulses from various sources of
emission in the environment and determines the pulse parameters, which mainly include Frequency,
Pulse Width, Direction of Arrival (DOA) and Amplitude. Using these parameters, it builds the Pulse
Descriptor (PD) Word, which is a digitized form of the pulse information. The PD Words are transmitted
by the receiver over optical fiber to the ESM Processor. The function of the ESM Processor is to receive
the interleaved PD Words, de-interleave them, build the emitter file and send the information to the display.
This de-interleaving is done at two levels, which operate independently.
The PD Words received over optical fiber in serial form are converted to parallel form by the Multi
Gigabit Transceiver (MGT) core of the Virtex-II Pro FPGA. The speed of operation of this MGT core is
3.125 Gbits/sec. This Virtex-II Pro FPGA then sends the parallel data to the Virtex-4 FPGA for 1st level
de-interleaving.
4.3 1st level De-interleaving
In first level de-interleaving, the PD Words are de-interleaved by the Virtex-4 FPGA based on
intra-pulse parameters (Frequency, DOA, and Pulse Width) and stored in memory. The Virtex-4
used here is a logic-intensive FPGA. The de-interleaved PD Words are stored in the dual port memory
(SRAM) using the concept of Content Addressable Memory (CAM). This memory has independent write and
read ports, so read and write operations can be performed at the same time.
Here we use quad data rate SRAM, which performs four data transfers in one clock cycle: two
writes and two reads. On each port, one transfer takes place on the rising edge of the clock and
the other on the falling edge.
The simple block diagram of the 1st level of de-interleaving is shown below.
[Figure: labels PPD Rear IO Module, Virtex-II Pro XC2VP7, PDW]
Fig. 4.3: 1st level de-interleaving
4.4 Memory details
4.4.1 Content Addressable Memory
Content-addressable memory (CAM) is a special type of computer memory used in certain very
high speed searching applications. Unlike standard computer memory (RAM), in which the user supplies
a memory address and the RAM returns the data word stored at that address, a CAM is designed such
that the user supplies a data word and the CAM searches its entire memory to see if that data word is
stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses
where the word was found. In case the word is not found, then it stores it in a new location and returns
the location address. Thus, a CAM is the hardware embodiment of what in software terms would be
called an associative array.
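The lookup-or-insert behaviour described above can be illustrated with a small behavioural sketch. It is not the CAM used on the board: the entity name, ports, widths and depth are hypothetical, and as a simplification it returns only the first matching address rather than a list of addresses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical behavioural sketch of a lookup-or-insert CAM.
-- All names and sizes are illustrative assumptions.
entity cam_sketch is
  generic (
    DATA_WIDTH : positive := 32;
    DEPTH      : positive := 16
  );
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;
    lookup : in  std_logic;  -- '1' for one cycle to search for key
    key    : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    addr   : out natural range 0 to DEPTH-1;  -- match or newly allocated address
    hit    : out std_logic;  -- '1' if key was already stored
    full   : out std_logic   -- '1' if key was new and no free entry was left
  );
end entity cam_sketch;

architecture rtl of cam_sketch is
  type mem_t is array (0 to DEPTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal mem   : mem_t := (others => (others => '0'));
  signal valid : std_logic_vector(0 to DEPTH-1) := (others => '0');
begin
  process (clk)
    variable found      : boolean;
    variable free_found : boolean;
    variable match_addr : natural range 0 to DEPTH-1;
    variable free_addr  : natural range 0 to DEPTH-1;
  begin
    if rising_edge(clk) then
      if reset = '1' then
        valid <= (others => '0');
        hit   <= '0';
        full  <= '0';
        addr  <= 0;
      elsif lookup = '1' then
        found      := false;
        free_found := false;
        match_addr := 0;
        free_addr  := 0;
        -- The loop unrolls into one comparator per entry, so every stored
        -- word is compared with the key in the same clock cycle.
        for i in 0 to DEPTH-1 loop
          if valid(i) = '1' and mem(i) = key and not found then
            found      := true;
            match_addr := i;
          end if;
          if valid(i) = '0' and not free_found then
            free_found := true;
            free_addr  := i;
          end if;
        end loop;

        if found then
          -- Key already stored: report its address.
          hit  <= '1';
          full <= '0';
          addr <= match_addr;
        elsif free_found then
          -- Key not stored: write it to the first free entry.
          mem(free_addr)   <= key;
          valid(free_addr) <= '1';
          hit  <= '0';
          full <= '0';
          addr <= free_addr;
        else
          hit  <= '0';
          full <= '1';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The key point is that the search is by content: the caller supplies a data word and receives an address, which is the reverse of a normal RAM access.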
4.4.2 Why is memory required?
Memory is used to solve the speed synchronization problems caused by a mismatch in the
access rates of the 1st and 2nd levels of de-interleaving. The 1st level of de-interleaving uses hardware,
which processes and stores one PD Word at a time; hence it is faster. The 2nd level of de-interleaving uses
software, which reads a group of de-interleaved PD Words at a time and processes them; hence it is slower.
So, a memory is required to store the de-interleaved PD Words output by the 1st stage. For the
storage of the PD Words de-interleaved in the 1st level, dual port memory is used, which is required
to be independently accessed by the two processes. This type of memory improves performance by
reducing the memory access conflicts between the two levels, and thus increases the speed of operation.
Because of the high-speed memory access requirements, Quad Data Rate SRAMs are used. This SRAM
ideally suits the requirement, as it has independent ports for writing and reading.
4.5 2nd level De-interleaving
In second level de-interleaving, the de-interleaved PD Words are read from memory and
processed to extract the emitter parameters which mainly include Frequency, Pulse Width, DoA,
Amplitude, and Pulse Repetition Frequency (PRF). With the help of these parameters, the emitter file is
built and sent to display via Ethernet.
Fig. 4.4: 2nd level de-interleaving
4.6 Scope of the Project
Our project involves the implementation of an algorithm in VHDL to control SRAM
memory access using a Virtex-4 FPGA. We therefore develop logic for interfacing the
Virtex-4 FPGA with the QDR-II SRAM. For designing and simulation testing of the logic, we
use Xilinx™ ISE v10.1.
[Figure: blocks labelled Virtex-4 FPGA, PPD, SRAM Interfacing Logic in VHDL, SRAM]
Fig. 4.5: SRAM interfacing logic
The code is implemented in two phases:
1. Write Cycle: Interfacing QDRII with PPD.
2. Read Cycle: Interfacing QDRII with Emitter Processor Software.
[Figure: blocks labelled Virtex-4, PPD, S/W Interface, QDR-II Memory Controller, QDR-II Memory, Emitter Processor PLB i/f, V2 PRO]
Fig. 4.6: Software interface developed in VHDL
CHAPTER 5
VHDL INTERFACE DESIGN AND STATE DIAGRAM
5.1 Write-Read State Machines
Fig. 5.1: Read-Write Interfaces developed
The above diagram shows the read and write interfaces developed. The software (s/w) interface
is written in VHDL. The PPD interface is used for writing into the QDR-II memory through the QDR-II
memory controller, and the PLB interface is used for reading the data stored in the QDR-II memory.
VHDL Language – s/w interface
PPD – Writing
PLB – Reading
The PowerPC™ 405 core accesses high speed and high performance system resources through
Processor Local Bus (PLB) interfaces on the instruction and data cache controllers. The PLB interfaces
provide separate 32-bit address and 64-bit data buses for the instruction and data sides.
The PLB supports read and write data transfers between master and slave devices equipped with
a PLB bus interface and connected through PLB signals. Bus architecture supports multiple master and
slave devices. Each PLB master is attached to the PLB through separate address, read-data, and write-
data buses. PLB slaves are attached to the PLB through shared, but decoupled, address, read-data, and
write-data buses and a plurality of transfer control and status signals for each data bus.
5.2 State Diagrams
Fig. 5.2: Write cycle state diagram
[Figure: states INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3; transition labels: del_cal=’1’; hw_fifo_empty=’0’ and user_wr_full=’0’; Invalid State]
Fig. 5.3: Read Cycle state diagram
[Figure: states INIT_RD, IDLE_RD, LATCH_RD_ADDR, RD_ADDR_Wr, WAIT_QR_EMPTY, LATCH_EPW_0_1, LATCH_EPW_2_3, ACK_W0_GEN, LT_W1, ACK_W1_GEN, LT_W2, ACK_W2_GEN, LT_W3, ACK_W3_GEN; transition labels: reset = 1; dly_calc = 1; proc_rd = 0; proc_rd = 1; proc_addr(1 downto 0) = 01, 10, 11; test_w_n = 1; user_rd_full = 0; user_qr_empty = 0]
CHAPTER 6
TEST RESULTS, CONCLUSION AND FUTURE SCOPE OF
WORK
6.1 Simulation Results in ModelSim
6.1.1 Write Cycle
The PPD logic and the processor PLB interface operate with an external clock as reference, whereas
the QDRII SRAM memory controller operates at 166 MHz, which is the operating frequency of the QDRII
SRAM device. The reset signal used is synchronous with respect to the QDRII SRAM reference clock.
The signal ‘dly_cal_done’ indicates when the QDRII SRAM device calibration is complete and the
device is ready for access.
The logic uses a FIFO interface to store the processed PD Words, which arrive with a
minimum spacing of 200 ns and are to be written into the QDRII SRAM device. We simulated this
requirement by generating a signal ‘wr_pulse’ every 200 ns. We employed a counter to generate the 128-bit
PD Word. The hardware address and hardware data are written into 5 FIFOs, 1 for the address and 4 for
the data (32 bits each).
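A periodic ‘wr_pulse’ of this kind can be produced by a simple cycle counter. The sketch below is a hypothetical illustration, not the test code from Appendix A: the entity name, the generic and the reset port are invented for the example, and the default of 20 cycles assumes a 100 MHz clock (10 ns period), as the name clk_100 suggests.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical sketch of a periodic write-pulse generator.
-- PERIOD_CYCLES = 20 gives one pulse every 200 ns only if clk runs at
-- 100 MHz, which is an assumption made for this example.
entity wr_pulse_gen is
  generic (
    PERIOD_CYCLES : positive := 20
  );
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;  -- synchronous, active high
    wr_pulse : out std_logic   -- high for one clock cycle per period
  );
end entity wr_pulse_gen;

architecture rtl of wr_pulse_gen is
  signal count : natural range 0 to PERIOD_CYCLES - 1 := 0;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count    <= 0;
        wr_pulse <= '0';
      elsif count = PERIOD_CYCLES - 1 then
        -- Terminal count reached: emit the pulse and restart the counter.
        count    <= 0;
        wr_pulse <= '1';
      else
        count    <= count + 1;
        wr_pulse <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

Making the period a generic keeps the stimulus rate easy to change if a different PD Word spacing needs to be simulated.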
The QDRII SRAM operates at a clock rate of 166 MHz, so we use a user_clk of
166 MHz. The write state machine remains idle until the condition ‘dly_cal_done = 1’ occurs. Once the
data is written into the FIFOs, the ‘hw_fifo_empty’ signal goes low, signifying that there is data present
in the PPD FIFO interface. At the next rising edge of the user clock after it goes low, the state machine
moves into the ‘wrfifo_rd’ state and the hardware read signal ‘hw_rd’ goes high. The data and address are now
read from the FIFOs into ‘qdr_wrdata’ and ‘hw_addr_out’ respectively. ‘qdr_wrdata’, the data
output of the FIFOs, is a 128-bit data line. The state machine next moves into the ‘lt_pdw_0_1’ state and
subsequently into the ‘lt_pdw_2_3’ state. ‘user_w_n_i’ is an active-low signal used to latch the 128-bit PD
Word. ‘lt_pdw’ is a 2-bit vector which is ‘01’ for the lower 64 bits of data and ‘10’ for the higher 64 bits.
‘user_dwl’ and ‘user_dwh’ are two 32-bit data lines. Of each 64 bits of data, the lower 32 bits are latched to
‘user_dwl’ and the higher 32 bits are latched to ‘user_dwh’. ‘test_w_n_i’ is the active-low signal used to
inhibit generation of the active-low ‘user_r_n_i’ signal for a read operation at the same time as the
‘user_w_n_i’ signal is generated.
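As a rough plausibility check, the time the write state machine needs per PD Word can be compared with the 200 ns PD Word spacing. The estimate below assumes four user clock cycles per word (one each in the wrfifo_rd, lt_pdw_0_1 and lt_pdw_2_3 states plus at least one idle cycle), read off the state machine in Appendix A rather than measured, and it ignores FIFO and clock-domain-crossing latency.

```latex
\[
t_{clk} = \frac{1}{166\ \text{MHz}} \approx 6.02\ \text{ns}, \qquad
t_{write} \approx 4\, t_{clk} \approx 24.1\ \text{ns} < 200\ \text{ns}
\]
```

Under these assumptions the write path has a margin of roughly a factor of eight over the PD Word arrival rate.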
6.1.2 Read Cycle
The read cycle is initiated over the Processor Local Bus (PLB), which is a 32-bit data bus. A
read signal is generated every time a read operation is initiated by the embedded PowerPC processor of the
Virtex-II Pro FPGA. These read requests are simulated using VHDL and implemented in the Virtex-4 FPGA.
The read requests are generated every 2 microseconds. The ‘proc_rd’ signal goes high along with the
address ‘proc_addr’, indicating a PLB request to read data from the QDR-II SRAM. Once the user
interface receives the address from the PLB, it starts reading the data from the specified location onto its bus
(the user interface bus). Once the data is present on the user interface bus, it is latched onto the PLB data
bus after the ‘fifo_empty’ signal goes low. An acknowledgement signal is then generated,
indicating that the data has been latched onto the PLB. Since the PLB is a 32-bit data bus, unlike
in the write cycle, only one word at a time is latched onto it. As there are four words (W0 to W3) of the PD
Word (PDW) to be read, it takes 4 read cycles to read them. When the ‘user_qr_empty’ signal goes low, the
first two words (W0 and W1) are already present on ‘user_qrl’ and ‘user_qrh’ respectively. This condition is
known as ‘first word fall through’. So, the words W0 and W1 are latched onto the 128-bit ‘qdr_rddata’ bus
when the ‘user_qr_empty’ signal goes low. In the next clock cycle, W2 and W3 are latched onto the
‘qdr_rddata’ bus, and W0 is latched onto the PLB data bus. Words W1, W2 and W3 are then latched in the next
read request cycles onto the PLB data bus from the ‘qdr_rddata’ bus. An acknowledgement signal ‘rd_ack’ is
generated by the read state machine every time data is latched onto the PLB.
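The read rate implied by this simulation setup can be estimated with simple arithmetic. The figures below assume that each read request returns exactly one 32-bit word and that requests arrive exactly every 2 microseconds, as in the simulation; they describe the simulated request rate, not a limit of the hardware.

```latex
\[
t_{PDW} = 4 \times 2\ \mu\text{s} = 8\ \mu\text{s}, \qquad
R_{read} = \frac{128\ \text{bit}}{8\ \mu\text{s}} = 16\ \text{Mbit/s}
\]
```

This is much lower than the rate at which PD Words are written, which is consistent with the need for a buffering memory between the two levels of de-interleaving.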
6.2 Hardware Verification using ChipScope Pro
We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex 5 FPGA, by
analyzing the user interface signals, which include the PPD interface and the PLB interface. These signals are
captured using the FIFOs implemented in the FPGA and sent to the display interface on the PC using JTAG.
The PPD interface includes the signals hw_data and hw_addr. The PLB interface includes the signals
proc_data, proc_addr and rd_ack. These signals are debugged using the ChipScope Pro Core
Inserter, by creating a definition and connection file (.cdc) for the synthesized VHDL code.
The in-circuit verification of these signals is done using the ChipScope Pro Analyzer (refer Fig. 6.7). The main
window area can display multiple child windows (such as trigger, waveform, listing and plot windows) at
the same time. Each window can be maximized, minimized, resized and moved as needed. The signals
attached to the ChipScope Pro Inserter core are shown in the signal browser. The trigger setup window is
used to specify the conditions for triggering and storing the data. The waveform window displays all the
signals, which are sampled with respect to the system clock of the FPGA. This window is useful for analyzing
the timing of the interface signals. Another window is the listing window (refer Fig. 6.8), which displays
the interface buses that are stored using the storage qualification in the trigger setup.
6.2.1 Write Interface
To capture the data in the listing window for the PPD interface, we added the signals hw_data1,
hw_data2, hw_data3 and hw_data4. The storage condition used is the falling edge of hw_wr. The trigger
condition is an initial value of hw_data, which is common to both the PPD and PLB interfaces. Using these
conditions, we captured the data and exported it into an Excel sheet for future reference.
6.2.2 Read Interface
To capture the data in the listing window for the PLB interface, we added the signals proc_data and
proc_addr. The storage condition used is the falling edge of rd_ack. The trigger is an initial value of
hw_data, which is common to both the PPD and PLB interfaces. Using these conditions, we captured the data
and exported it into an Excel sheet for future reference.
From the two listings, we infer that the data written into the QDR-II memory and the data read from the
QDR-II memory match. Hence, the interface we designed fulfills the requirement of
the project.
6.3 Conclusion
The VHDL code is written for the interface to control the SRAM memory access using the Virtex-4
FPGA. It has been verified using ModelSim simulation waveforms and ChipScope Pro hardware
verification in Xilinx™ ISE 10.1. The results have been studied and verified. This interfacing logic enables
us to access the SRAM at a speed that supports writing a continuous data input
stream at a rate of 640 Mbps. This interface logic can be utilized for interfacing QDR memory devices of
upcoming generations with improved technology. The interface also enables us to attach to the PLB interface
of the embedded PowerPC processor of Virtex family FPGAs with ease.
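The 640 Mbps figure is consistent with the write-side numbers used in the simulation, assuming one 128-bit PD Word is written every 200 ns:

```latex
\[
R_{write} = \frac{128\ \text{bit}}{200\ \text{ns}} = 640 \times 10^{6}\ \text{bit/s} = 640\ \text{Mbit/s}
\]
```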
6.4 Future Scope of Work
The future scope of work for this project includes the development of the read and write interface
between the QDR-II memory controller and the QDR-II memory. Future projects involve the implementation of
an ESM processor that integrates the PPD logic and the EP software on a single chip. This helps in the
system-on-chip implementation of the ESM processor subsystem using a single FPGA (Virtex-5 and above).
APPENDIX-A: Program Code
1.1 Software interface code in VHDL
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use work.QDR2_SRAM_parameters_0.all;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
library UNISIM;
use UNISIM.VComponents.all;

entity qdr_dpif is
  port(
    user_clk0     : in  std_logic;
    user_reset    : in  std_logic;
    dly_cal_done  : in  std_logic;

    --PPD IF--------------------------
    clk_100       : in  std_logic;
    hw_wr         : in  std_logic;
    hw_data       : in  std_logic_vector( 127 downto 0 );
    hw_addr       : in  std_logic_vector( 20 downto 0 );
    al_full       : out std_logic;

    proc_rd       : in  std_logic;
    proc_addr     : in  std_logic_vector( 22 downto 0 );
    proc_data     : out std_logic_vector( 31 downto 0 );
    rd_ack        : out std_logic;

    user_w_n      : out std_logic;
    user_r_n      : out std_logic;
    user_ad_wr    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_bwl_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_bwh_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_dwl      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_dwh      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_ad_rd    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_qen_n    : out std_logic;
    compare_error : out std_logic;
    user_wr_full  : in  std_logic;
    user_rd_full  : in  std_logic;
    user_qrl      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qrh      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qr_empty : in  std_logic
  );
end qdr_dpif;
architecture Behavioral of qdr_dpif is

  component synchro
    port(
      reset   : in  std_logic;
      clock   : in  std_logic;
      sig_in  : in  std_logic;
      sig_out : out std_logic
    );
  end component;

  signal reset_r : std_logic;

  constant unused : std_logic_vector(BW_WIDTH-1 downto 0) := (others => '0');

  -- PPD HWDATA & HWADDR FIFO SIGNALS
  signal qdr_wrdata    : std_logic_vector( 127 downto 0 );
  signal data_al_full  : std_logic_vector(3 downto 0);
  signal addr_al_full  : std_logic;
  signal hw_data_empty : std_logic_vector(3 downto 0);
  signal hw_addr_empty : std_logic;
  signal hw_fifo_empty : std_logic;

  signal hw_addr_in    : std_logic_vector( 31 downto 0 );
  signal hw_addr_out   : std_logic_vector( 31 downto 0 );

  TYPE write_state_type is (
    INIT_WR,
    IDLE_WR,
    WRFIFO_RD,
    LT_PDW_0_1,
    LT_PDW_2_3
  );

  signal write_cs : write_state_type;
  signal write_ns : write_state_type;

  signal hw_rd      : std_logic;
  signal test_w_n_i : std_logic;
  signal user_w_n_i : std_logic;
  signal lt_hwdata  : std_logic_vector(1 downto 0);

  TYPE read_state_type is (
    INIT_RD,
    IDLE_RD,
    LATCH_RD_ADDR,
    RDADDR_WR,
    WAIT_Q_EMPTY,
    LT_EPW_0_1,
    LT_EPW_2_3_W0,
    ACK_W0_GEN,
    LT_W1,
    ACK_W1_GEN,
    LT_W2,
    ACK_W2_GEN,
    LT_W3,
    ACK_W3_GEN
  );

  signal read_cs : read_state_type;
  signal read_ns : read_state_type;

  signal lt_rd_ad     : std_logic;
  signal user_r_n_i   : std_logic;
  signal user_qen_n_i : std_logic;

  signal lt_q_0_1 : std_logic;
  signal lt_q_2_3 : std_logic;
  signal lt_word  : std_logic_vector( 3 downto 0 );

  signal proc_rd_sync   : std_logic;
  signal proc_addr_sync : std_logic_vector( 22 downto 0 );

  signal proc_data_i : std_logic_vector( 31 downto 0 );
  signal rd_ack_i    : std_logic;

  signal qdr_rddata : std_logic_vector(127 downto 0 );

  signal byte_enb     : std_logic_vector(7 downto 0);
  signal user_ad_rd_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
  signal user_ad_wr_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
begin

  compare_error <= '0';

  user_w_n   <= user_w_n_i;
  user_r_n   <= user_r_n_i;
  user_qen_n <= user_qen_n_i;

  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      proc_data <= proc_data_i;
      rd_ack    <= rd_ack_i;
    end if;
  end process;

  byte_enb   <= "00000000";
  user_bwl_n <= byte_enb((BW_WIDTH-1) downto 0);
  user_bwh_n <= byte_enb((BW_WIDTH-1) downto 0);

  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      reset_r <= user_reset;
    end if;
  end process;

  --------------WR_SM--------------
  -- Write state register
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        write_cs <= INIT_WR;
      else
        write_cs <= write_ns;
      end if;
    end if;
  end process;

  -- Write next-state logic
  process (write_cs, dly_cal_done, user_wr_full, hw_fifo_empty )
  begin
    write_ns <= write_cs;

    case write_cs is
      when INIT_WR =>
        if(dly_cal_done = '1') then
          write_ns <= IDLE_WR;
        end if;

      when IDLE_WR =>
        if( user_wr_full = '0' and hw_fifo_empty = '0' ) then
          write_ns <= WRFIFO_RD;
        end if;

      when WRFIFO_RD =>
        write_ns <= LT_PDW_0_1;

      when LT_PDW_0_1 =>
        write_ns <= LT_PDW_2_3;

      when LT_PDW_2_3 =>
        write_ns <= IDLE_WR;

      when others =>
        write_ns <= INIT_WR;
    end case;
  end process;
  -- Write state decoding
  with write_cs select
    hw_rd <= '1' when WRFIFO_RD,
             '0' when others;

  with write_cs select
    test_w_n_i <= '0' when LT_PDW_0_1,
                  '1' when others;

  with write_cs select
    user_w_n_i <= '0' when LT_PDW_2_3,
                  '1' when others;

  with write_cs select
    lt_hwdata <= "01" when LT_PDW_0_1,
                 "10" when LT_PDW_2_3,
                 "00" when others;

  -- Latch the lower and upper 64 bits of the PD Word onto the write data buses
  process(user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        user_dwl <= (others => '0');
        user_dwh <= (others => '0');
      else
        case lt_hwdata is
          when "01" =>
            user_dwl <= X"0" & qdr_wrdata( 31 downto 0 );
            user_dwh <= X"0" & qdr_wrdata( 63 downto 32 );
          when "10" =>
            user_dwl <= X"0" & qdr_wrdata( 95 downto 64 );
            user_dwh <= X"0" & qdr_wrdata( 127 downto 96 );
          when others =>
            null;
        end case;
      end if;
    end if;
  end process;
  --------------RD_SM--------------
  PROC_RD_SYNC_INST : synchro
    port map(
      reset   => reset_r,
      clock   => user_clk0,
      sig_in  => proc_rd,
      sig_out => proc_rd_sync
    );

  PROC_ADDR_SYNC_GEN : for i in 22 downto 0 generate
    PROC_ADDR_SYNC_INST : synchro
      port map(
        reset   => reset_r,
        clock   => user_clk0,
        sig_in  => proc_addr(i),
        sig_out => proc_addr_sync(i)
      );
  end generate PROC_ADDR_SYNC_GEN;

  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        read_cs <= INIT_RD;
      else
        read_cs <= read_ns;
      end if;
    end if;
  end process;
  -- Read next-state logic
  process ( read_cs, dly_cal_done, proc_rd_sync, proc_addr_sync, test_w_n_i, user_rd_full, user_qr_empty )
  begin
    read_ns <= read_cs;

    case read_cs is
      when INIT_RD =>
        if(dly_cal_done = '1') then
          read_ns <= IDLE_RD;
        end if;

      when IDLE_RD =>
        -- The two address LSBs select which 32-bit word is requested
        if proc_rd_sync = '1' then
          case proc_addr_sync(1 downto 0) is
            when "00"   => read_ns <= LATCH_RD_ADDR;
            when "01"   => read_ns <= LT_W1;
            when "10"   => read_ns <= LT_W2;
            when "11"   => read_ns <= LT_W3;
            when others => null;
          end case;
        end if;

      when LATCH_RD_ADDR =>
        if test_w_n_i = '1' and user_rd_full = '0' then
          read_ns <= RDADDR_WR;
        end if;

      when RDADDR_WR =>
        read_ns <= WAIT_Q_EMPTY;

      when WAIT_Q_EMPTY =>
        if(user_qr_empty = '0') then
          read_ns <= LT_EPW_0_1;
        end if;

      when LT_EPW_0_1 =>
        read_ns <= LT_EPW_2_3_W0;

      when LT_EPW_2_3_W0 =>
        read_ns <= ACK_W0_GEN;

      when ACK_W0_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W1 =>
        read_ns <= ACK_W1_GEN;

      when ACK_W1_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W2 =>
        read_ns <= ACK_W2_GEN;

      when ACK_W2_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W3 =>
        read_ns <= ACK_W3_GEN;

      when ACK_W3_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when others =>
        read_ns <= INIT_RD;
    end case;
  end process;
  with read_cs select
    lt_rd_ad <= '1' when LATCH_RD_ADDR,
                '0' when others;

  with read_cs select
    user_r_n_i <= '0' when RDADDR_WR,
                  '1' when others;

  with read_cs select
    user_qen_n_i <= '0' when LT_EPW_0_1 | LT_EPW_2_3_W0,
                    '1' when others;

  with read_cs select
    lt_q_0_1 <= '1' when LT_EPW_0_1,
                '0' when others;

  with read_cs select
    lt_q_2_3 <= '1' when LT_EPW_2_3_W0,
                '0' when others;

  with read_cs select
    lt_word <= "0001" when LT_EPW_2_3_W0,
               "0010" when LT_W1,
               "0100" when LT_W2,
               "1000" when LT_W3,
               "0000" when others;

  with read_cs select
    rd_ack_i <= '1' when ACK_W0_GEN | ACK_W1_GEN | ACK_W2_GEN | ACK_W3_GEN,
                '0' when others;
  -- Assemble the 128-bit read word from two 64-bit read-queue beats
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        qdr_rddata <= (others => '0');
      else
        if lt_q_0_1 = '1' then
          qdr_rddata( 63 downto 0) <= user_qrh(31 downto 0) & user_qrl(31 downto 0);
        end if;
        if lt_q_2_3 = '1' then
          qdr_rddata(127 downto 64) <= user_qrh(31 downto 0) & user_qrl(31 downto 0);
        end if;
      end if;
    end if;
  end process;

  -- Select the 32-bit word presented on the PLB data bus
  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        proc_data_i <= (others => '0');
      else
        case lt_word is
          when "0001" => proc_data_i <= qdr_rddata( 31 downto 0);
          when "0010" => proc_data_i <= qdr_rddata( 63 downto 32);
          when "0100" => proc_data_i <= qdr_rddata( 95 downto 64);
          when "1000" => proc_data_i <= qdr_rddata(127 downto 96);
          when others => null;
        end case;
      end if;
    end if;
  end process;
  --------------ADDR_GEN0--------------
  user_ad_rd <= user_ad_rd_i;
  user_ad_wr <= user_ad_wr_i;

  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        user_ad_wr_i <= (others => '0');
      elsif( test_w_n_i = '0' ) then
        user_ad_wr_i <= hw_addr_out((ADDR_WIDTH_4D-1) downto 0);
      end if;
    end if;
  end process;

  process (user_clk0)
  begin
    if(user_clk0'event and user_clk0 = '1') then
      if(reset_r = '1') then
        user_ad_rd_i <= (others => '0');
      elsif lt_rd_ad = '1' then
        user_ad_rd_i <= proc_addr_sync((ADDR_WIDTH_4D+1) downto 2);
      end if;
    end if;
  end process;

  --------------PPD_FIFO_IF--------------
  PPD_DATA : for I in 3 downto 0 generate
  begin
    DATA_FIFO : FIFO16
      generic map (
        FIRST_WORD_FALL_THROUGH => false,
        ALMOST_FULL_OFFSET      => X"00F",
        DATA_WIDTH              => 36
      )
      port map (
        DI          => hw_data(I*32+31 downto I*32),
        DIP         => byte_enb(3 downto 0),
        RDCLK       => user_clk0,
        RDEN        => hw_rd,
        RST         => reset_r,
        WRCLK       => clk_100,
        WREN        => hw_wr,
        ALMOSTEMPTY => open,
        ALMOSTFULL  => data_al_full(I),
        DO          => qdr_wrdata(I*32+31 downto I*32),
        DOP         => open,
        EMPTY       => hw_data_empty(I),
        FULL        => open,
        RDCOUNT     => open,
        RDERR       => open,
        WRCOUNT     => open,
        WRERR       => open
      );
  end generate PPD_DATA;

  hw_addr_in(20 downto 0)  <= hw_addr;
  hw_addr_in(31 downto 21) <= (others => '0');

  PPD_ADDR_FIFO : FIFO16
    generic map (
      FIRST_WORD_FALL_THROUGH => false,
      ALMOST_FULL_OFFSET      => X"00F",
      DATA_WIDTH              => 36
    )
    port map (
      DI          => hw_addr_in,
      DIP         => byte_enb(3 downto 0),
      RDCLK       => user_clk0,
      RDEN        => hw_rd,
      RST         => reset_r,
      WRCLK       => clk_100,
      WREN        => hw_wr,
      ALMOSTEMPTY => open,
      ALMOSTFULL  => addr_al_full,
      DO          => hw_addr_out,
      DOP         => open,
      EMPTY       => hw_addr_empty,
      FULL        => open,
      RDCOUNT     => open,
      RDERR       => open,
      WRCOUNT     => open,
      WRERR       => open
    );

  al_full <= data_al_full(0) or data_al_full(1) or data_al_full(2) or data_al_full(3) or addr_al_full;

  hw_fifo_empty <= hw_data_empty(0) or hw_data_empty(1) or hw_data_empty(2) or hw_data_empty(3) or hw_addr_empty;

end Behavioral;
1.2 VHDL code to generate input for testing
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;
entity hwdata_sim is
  Port (
    clk_100      : in  std_logic;
    reset        : in  std_logic;
    dly_cal_done : in  std_logic;

    --debug
    hw_test_led  : out std_logic;

    hw_wr        : out std_logic;
    hw_data      : out std_logic_vector(127 downto 0);
    hw_addr      : out std_logic_vector( 20 downto 0);
    b0_full      : in  std_logic
  );
end hwdata_sim;
architecture Behavioral of hwdata_sim is

  signal reset_r        : std_logic;
  signal dly_cal_done_r : std_logic;

  constant INIT     : std_logic_vector(5 downto 0) := "000001";
  constant IDLE     : std_logic_vector(5 downto 0) := "000010";
  constant WR_GEN   : std_logic_vector(5 downto 0) := "000100";
  constant DUMMY_ST : std_logic_vector(5 downto 0) := "010000";
  constant INC_ADDR : std_logic_vector(5 downto 0) := "100000";

  signal current_state : std_logic_vector(5 downto 0);
  signal next_state    : std_logic_vector(5 downto 0);

  signal counter  : std_logic_vector(29 downto 0);
  signal wr_count : std_logic_vector( 7 downto 0);
  signal wr_pulse : std_logic;

  signal hw_data1 : std_logic_vector(31 downto 0);
  signal hw_data2 : std_logic_vector(31 downto 0);
  signal hw_data3 : std_logic_vector(31 downto 0);
  signal hw_data4 : std_logic_vector(31 downto 0);

  signal hw_data_i : std_logic_vector(127 downto 0);
  signal hw_addr_i : std_logic_vector( 20 downto 0);
  signal hw_wr_i   : std_logic;

  --debug
  signal counter_dbg : std_logic_vector(29 downto 0);
  signal hw_wr_dbg   : std_logic;
  signal del_1       : std_logic;

  constant zeroes_23 : std_logic_vector(22 downto 0) := (others => '0');
  constant zeroes_30 : std_logic_vector(29 downto 0) := (others => '0');
  constant zeroes_32 : std_logic_vector(31 downto 0) := (others => '0');

begin
  hw_data <= hw_data_i;
  hw_addr <= hw_addr_i(20 downto 0);
  hw_wr   <= hw_wr_i;

  -- Register the asynchronous control inputs
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      reset_r        <= reset;
      dly_cal_done_r <= dly_cal_done;
    end if;
  end process;

  -- wr_count runs from 0 to X"13" once calibration is done, so wr_pulse
  -- is asserted for one clock in every 20 clk_100 cycles
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if reset_r = '1' or dly_cal_done_r = '0' or wr_pulse = '1' then
        wr_count <= (others => '0');
      else
        wr_count <= wr_count + '1';
      end if;
    end if;
  end process;

  wr_pulse <= '1' when wr_count = X"13" else '0';

  -- State register
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1') then
        current_state <= INIT;
      else
        current_state <= next_state;
      end if;
    end if;
  end process;
  process (current_state, dly_cal_done, b0_full, wr_pulse)
  begin
    next_state <= current_state;

    case current_state is
      when INIT =>
        if (dly_cal_done = '1') then
          next_state <= IDLE;
        end if;

      when IDLE =>
        if wr_pulse = '1' and b0_full = '0' then
          next_state <= WR_GEN;
        end if;

      when WR_GEN =>
        next_state <= DUMMY_ST;

      when DUMMY_ST =>
        next_state <= INC_ADDR;

      when INC_ADDR =>
        next_state <= IDLE;

      when others =>
        next_state <= INIT;
    end case;
  end process;
  -- The write strobe is the one-hot state bit of WR_GEN
  hw_wr_i <= current_state(2);

  -- The counter advances once per write, in INC_ADDR (state bit 5)
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1') then
        counter <= (others => '0');
      elsif (current_state(5) = '1') then
        counter <= counter + '1';
      end if;
    end if;
  end process;

  hw_addr_i <= counter(20 downto 0);
  --hw_addr_i <= counter(2 downto 0) & counter( 21 downto 3 );

  -- Each 32-bit lane is the 30-bit counter with a 2-bit lane index appended
  hw_data1 <= counter & "00";
  hw_data2 <= counter & "01";
  hw_data3 <= counter & "10";
  hw_data4 <= counter & "11";

  hw_data_i <= hw_data4 & hw_data3 & hw_data2 & hw_data1;

  --debug-----------
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      counter_dbg <= counter;
      hw_wr_dbg   <= hw_wr_i;
    end if;
  end process;

  del_1 <= '1' when counter_dbg = zeroes_30 else '0';

  hw_test_led <= del_1 or hw_wr_dbg;
end Behavioral;
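The generator above can be exercised in simulation with a small testbench. The following is only a sketch under stated assumptions and is not part of the original design: the names hwdata_sim_tb, writes_seen and done are hypothetical, the reset and calibration timings are arbitrary, and b0_full is simply tied low. It releases reset, asserts dly_cal_done, and checks on every write strobe that bits 22 downto 2 of the lowest data lane equal hw_addr, which follows from hw_data1 <= counter & "00" and hw_addr_i <= counter(20 downto 0).

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity hwdata_sim_tb is
end hwdata_sim_tb;

architecture sim of hwdata_sim_tb is
  signal clk_100      : std_logic := '0';
  signal reset        : std_logic := '1';
  signal dly_cal_done : std_logic := '0';
  signal b0_full      : std_logic := '0';   -- assumption: FIFO never full
  signal hw_test_led  : std_logic;
  signal hw_wr        : std_logic;
  signal hw_data      : std_logic_vector(127 downto 0);
  signal hw_addr      : std_logic_vector(20 downto 0);
  signal writes_seen  : natural := 0;        -- hypothetical bookkeeping
  signal done         : boolean := false;    -- stops the clock at the end
begin
  -- 100 MHz clock that stops once the stimulus process sets done
  clock_gen : process
  begin
    while not done loop
      clk_100 <= '0'; wait for 5 ns;
      clk_100 <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process;

  dut : entity work.hwdata_sim
    port map (
      clk_100      => clk_100,
      reset        => reset,
      dly_cal_done => dly_cal_done,
      hw_test_led  => hw_test_led,
      hw_wr        => hw_wr,
      hw_data      => hw_data,
      hw_addr      => hw_addr,
      b0_full      => b0_full
    );

  -- Arbitrary timings: release reset, then signal calibration done
  stimulus : process
  begin
    wait for 100 ns;
    reset <= '0';
    wait for 100 ns;
    dly_cal_done <= '1';
    wait for 2 us;
    assert writes_seen > 0
      report "no write strobe observed" severity failure;
    done <= true;
    wait;
  end process;

  -- On each write strobe, lane 0 is counter & "00", so its bits
  -- 22 downto 2 must match the 21-bit write address
  monitor : process (clk_100)
  begin
    if rising_edge(clk_100) then
      if hw_wr = '1' then
        assert hw_data(22 downto 2) = hw_addr
          report "lane 0 does not match hw_addr" severity error;
        writes_seen <= writes_seen + 1;
      end if;
    end if;
  end process;
end sim;
```

With wr_pulse recurring every 20 clock cycles, a 2 us window after calibration should contain several write strobes, so the final assertion only guards against the generator never starting.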
1.3 VHDL code to receive and view Output
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity procdata_sim is
  Port (
    clk_100      : in  std_logic;
    reset        : in  std_logic;
    dly_cal_done : in  std_logic;

    proc_rd      : out std_logic;
    proc_addr    : out std_logic_vector(22 downto 0);
    proc_data    : in  std_logic_vector(31 downto 0);

    test_done    : out std_logic;
    rd_ack       : in  std_logic
  );
end procdata_sim;
architecture Behavioral of procdata_sim is

  component synchro
    port (
      reset   : in  std_logic;
      clock   : in  std_logic;
      sig_in  : in  std_logic;
      sig_out : out std_logic
    );
  end component;

  signal reset_r        : std_logic;
  signal dly_cal_done_r : std_logic;

  signal rd_count : std_logic_vector(12 downto 0);
  signal rd_pulse : std_logic;

  constant INIT       : std_logic_vector(7 downto 0) := "00000001";
  constant WRP_WAIT   : std_logic_vector(7 downto 0) := "00000010";
  constant IDLE       : std_logic_vector(7 downto 0) := "00000100";
  constant RD_GEN     : std_logic_vector(7 downto 0) := "00001000";
  constant WAIT_RDACK : std_logic_vector(7 downto 0) := "00010000";
  constant CNT_CHK    : std_logic_vector(7 downto 0) := "00100000";
  constant INC_ADDR   : std_logic_vector(7 downto 0) := "01000000";
  constant RST_BRST   : std_logic_vector(7 downto 0) := "10000000";

  signal current_state : std_logic_vector(7 downto 0);
  signal next_state    : std_logic_vector(7 downto 0);

  signal word_count : std_logic_vector(4 downto 0);

  signal counter     : std_logic_vector(22 downto 0);
  signal proc_addr_i : std_logic_vector(22 downto 0);
  signal proc_rd_i   : std_logic;

  signal proc_data_sync : std_logic_vector(31 downto 0);
  signal rd_ack_sync    : std_logic;

  signal proc_data_i : std_logic_vector(31 downto 0);

  type ackgen_state_type is (
    IDLE_ACK,
    WAIT_FOR_ACK,
    ACK_GEN
  );

  signal ackgen_cs : ackgen_state_type;
  signal ackgen_ns : ackgen_state_type;

  signal qdr_mem_ack : std_logic;

  --debug
  signal proc_rd_dbg     : std_logic;
  signal proc_addr_dbg   : std_logic_vector(22 downto 0);
  signal rd_ack_sync_dbg : std_logic;
  signal proc_data_dbg   : std_logic_vector(31 downto 0);

  signal del_1 : std_logic;
  signal del_2 : std_logic;

  constant zeroes_23 : std_logic_vector(22 downto 0) := (others => '0');
  constant zeroes_32 : std_logic_vector(31 downto 0) := (others => '0');

begin
  proc_addr <= proc_addr_i(22 downto 0);
  proc_rd   <= proc_rd_i;

  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      reset_r        <= reset;
      dly_cal_done_r <= dly_cal_done;
    end if;
  end process;

  RD_ACK_SYNC_INST : synchro
    port map (
      reset   => reset_r,
      clock   => clk_100,
      sig_in  => rd_ack,
      sig_out => rd_ack_sync
    );
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1' or dly_cal_done_r = '0') then
        ackgen_cs <= IDLE_ACK;
      else
        ackgen_cs <= ackgen_ns;
      end if;
    end if;
  end process;

  process (ackgen_cs, proc_rd_i, rd_ack_sync)
  begin
    ackgen_ns <= ackgen_cs;

    case ackgen_cs is
      when IDLE_ACK =>
        if proc_rd_i = '1' AND rd_ack_sync = '0' then
          ackgen_ns <= WAIT_FOR_ACK;
        end if;

      when WAIT_FOR_ACK =>
        if rd_ack_sync = '1' then
          ackgen_ns <= ACK_GEN;
        end if;

      when ACK_GEN =>
        if proc_rd_i = '0' then
          ackgen_ns <= IDLE_ACK;
        end if;

      when others =>
        ackgen_ns <= IDLE_ACK;
    end case;
  end process;

  with ackgen_cs select
    qdr_mem_ack <= '1' when ACK_GEN,
                   '0' when others;
  PROC_DATA_SYNC_GEN : for i in 31 downto 0 generate
    PROC_DATA_SYNC_INST : synchro
      port map (
        reset   => reset_r,
        clock   => clk_100,
        sig_in  => proc_data(i),
        sig_out => proc_data_sync(i)
      );
  end generate PROC_DATA_SYNC_GEN;
  -- rd_count runs from 0 to X"14D" (333) once calibration is done, so
  -- rd_pulse is asserted for one clock in every 334 clk_100 cycles
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if reset_r = '1' or dly_cal_done_r = '0' or rd_pulse = '1' then
        rd_count <= (others => '0');
      else
        rd_count <= rd_count + '1';
      end if;
    end if;
  end process;

  rd_pulse <= '1' when rd_count = X"14D" else '0';

  -- State register
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1') then
        current_state <= INIT;
      else
        current_state <= next_state;
      end if;
    end if;
  end process;
  process (current_state, dly_cal_done_r, rd_pulse, qdr_mem_ack, word_count)
  begin
    next_state <= current_state;

    case current_state is
      when INIT =>
        if (dly_cal_done_r = '1') then
          next_state <= IDLE;
        end if;

      when IDLE =>
        if rd_pulse = '1' then
          next_state <= RD_GEN;
        end if;

      when RD_GEN =>
        next_state <= WAIT_RDACK;

      when WAIT_RDACK =>
        if qdr_mem_ack = '1' then
          next_state <= CNT_CHK;
        end if;

      when CNT_CHK =>
        if word_count(4) = '1' then
          next_state <= RST_BRST;
        else
          next_state <= INC_ADDR;
        end if;

      when INC_ADDR =>
        next_state <= RD_GEN;

      when RST_BRST =>
        next_state <= IDLE;

      when others =>
        next_state <= INIT;
    end case;
  end process;
  -- proc_rd is asserted in RD_GEN (state bit 3) and WAIT_RDACK (state bit 4)
  proc_rd_i <= current_state(3) or current_state(4);

  -- word_count restarts at 1 in RST_BRST (state bit 7) and increments in
  -- INC_ADDR (state bit 6); CNT_CHK ends the burst when bit 4 is set
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1' or current_state(7) = '1') then
        word_count <= "00001";
      elsif (current_state(6) = '1') then
        word_count <= word_count + '1';
      end if;
    end if;
  end process;

  -- The read address counter advances in INC_ADDR and in RST_BRST
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if (reset_r = '1') then
        counter <= (others => '0');
      elsif (current_state(6) = '1' or current_state(7) = '1') then
        counter <= counter + '1';
      end if;
    end if;
  end process;

  proc_addr_i <= counter(22 downto 0);
  --proc_addr_i <= counter(3 downto 2) & counter( 22 downto 4 ) & counter(1 downto 0);
  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      if reset_r = '1' then
        proc_data_i <= (others => '0');
      elsif proc_rd_i = '1' then
        proc_data_i <= proc_data_sync;
      end if;
    end if;
  end process;

  process (clk_100)
  begin
    if (clk_100'event and clk_100 = '1') then
      proc_rd_dbg     <= proc_rd_i;
      proc_addr_dbg   <= proc_addr_i;
      rd_ack_sync_dbg <= qdr_mem_ack;
      proc_data_dbg   <= proc_data_i;
    end if;
  end process;

  del_1 <= '1' when proc_addr_dbg = zeroes_23 else '0';

  del_2 <= '1' when proc_data_dbg = zeroes_32 else '0';

  test_done <= proc_rd_dbg or proc_rd_dbg or del_1 or rd_ack_sync_dbg or del_2;
end Behavioral;
APPENDIX B: NAMES OF DIFFERENT MODULES AND THEIR
PURPOSE IN QDR-II SRAM CONTROLLER
1) Filename: qdr_sram.vhd
Purpose: This is the main module of the controller. It contains the clock forwarding logic, the delay
calibration logic that captures read data and synchronizes it to the FPGA clock, and the interface
controller logic.
2) Filename: mig_23_idelay_ctrl.vhd
Purpose: This module implements the delay generation for the calibration circuit.
3) Filename: qdr_sram_infrastructure_top.vhd
Purpose: This module incorporates the clock generation module and the reset logic.
4) Filename: qdr_sram_main_0.vhd
Purpose: Top-level example design incorporating the QDR-II memory controller module, an example
clock generation module, and the reset logic.
5) Filename: qdr_sram_top_0.vhd
Purpose: Top-level module of the QDR-II memory controller design. This is the main module that
should be instantiated into a new FPGA design (along with all sub-modules) to implement a QDR-II
interface.
6) Filename: qdr_sram_user_interface_0.vhd
Purpose: Responsible for storing the Read/Write requests made by the user design. Instantiates the
FIFOs for Read and Write address, data, and control storage.
7) Filename: qdr_sram_rd_user_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address, data, and control storage.
8) Filename: qdr_sram_rd_addr_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read address and control storage.
9) Filename: qdr_sram_rd_data_interface_0.vhd
Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs
for Read data storage.
10) Filename: qdr_sram_data_fifo_mem_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
11) Filename: qdr_sram_wr_user_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for Write address, data, and control storage.
12) Filename: qdr_sram_wr_addr_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for Write address and control storage.
13) Filename: qdr_sram_wr_data_interface_0.vhd
Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs
for Write data storage.
14) Filename: qdr_sram_data_fifo_18_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
15) Filename: qdr_sram_data_bw_fifo_0.vhd
Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the
FIFOs for Write/Read data storage.
16) Filename: qdr_sram_qdr_mem_sm_0.vhd
Purpose: Monitors Read/Write queue status from User Interface FIFOs and generates strobe
signals to launch Read/Write requests to QDR II device.
17) Filename: qdr_sram_iobs_0.vhd
Purpose: This module implements the physical interface for the Write data path. It generates the write
path (QDR-II) from the WRITE data FIFOs to the OBUFs.
18) Filename: qdr_sram_clock_forward_0.vhd
Purpose: This module implements the physical interface for the clock path. It generates the forwarded
clocks (K and K#) for the QDR-II SRAM memory device. This scheme is used to match the Clock-to-Out
delays of the data path.
19) Filename: qdr_sram_ctrl_iobs_0.vhd
Purpose: This module implements the physical interface for the memory control signals.
20) Filename: qdr_sram_address_burst_0.vhd
Purpose: This module is part of the physical interface. It describes how the FFs and OBUFTs
need to be instantiated in order to present the address to the external memory.
21) Filename: qdr_sram_qdrJd_enable.vhd
Purpose: This module generates QDR_R_n (Read Enable) and QDR_W_n (Write Enable)
for QDR memory.
22) Filename: qdr_sram_bw_burst_0.vhd
Purpose: This module implements the physical interface for the Byte Write enable path. It generates
the byte write path (BW_n) from the WRITE address FIFO to the OBUFs.
23) Filename: qdr_sram_data_path_iobs_0.vhd
Purpose: This module implements the physical interface for the Write data and Read data paths.
24) Filename: qdr_sram_qdr_d_iob_0.vhd
Purpose: This module transfers the data from memory to the FIFOs.
25) Filename: qdr_sram_qdr_cq_iob_0.vhd
Purpose: This module implements the delaying of echo clock CQ.
26) Filename: qdr_sram_qdr_q_iob_0.vhd
Purpose: This module captures data from memory.
27) Filename: qdr_sram_data_path_0.vhd
Purpose: This module acts as an interface between the user logic and the IOBs.
28) Filename: qdr_sram_read_ctrl_0.vhd
Purpose: This module generates QDR_R_n (Read Enable for QDR memory) and strobe for READ
FIFO.
29) Filename: qdr_sram_tap_logic_0.vhd
Purpose: This module implements the tap generation for the Read path (QDR_Q).
30) Filename: qdr_sram_dly_cal_sm.vhd
Purpose: Calibrates the IDELAY tap values for the QDR_Q inputs to allow direct capture of the
read data into the system clock domain.
31) Filename: qdr_sram_data_tap_inc.vhd
Purpose: This module implements the tap selection controller for data bits associated with a strobe.
32) Filename: qdr_sram_write_burst_0.vhd
Purpose: This module implements the physical interface for the Write path. It generates the write
path (QDR_D) from the WRITE data FIFOs to the OBUFs.
33) Filename: qdr_sram_test_bench_0.vhd
Purpose: This module implements a hardware test bench that will issue interleaved Read and Write
requests to the QDR II memory device.
34) Filename: qdr_sram_wr_rd_sm_0.vhd
Purpose: This module implements a state machine for issuing Read/Write requests to the QDR II
memory device.
35) Filename: qdr_sram_q_sm_0.vhd
Purpose: This module implements a state machine that reads values back from the read data FIFOs
and compares them with the values generated in the test bench. It also serves as an error detection
module to make sure that the data returning from the memory is the same as the data written to it.
36) Filename: qdr_sram_data_gen_0.vhd
Purpose: This module implements a data generator that generates data for Read and Write requests
to the QDR II memory device.
37) Filename: qdr_sram_addr_gen_0.vhd
Purpose: This module is part of the internal test bench. It generates addresses for both read and
write.
APPENDIX C: CHIP SCOPE PRO LISTING OF WRITE CYCLE
Sample in Window   hw_addr_23_2   hw_data1   hw_data2   hw_data3   hw_data4
 1                 000000         00000000   00000001   00000002   00000003
 2                 000001         00000004   00000005   00000006   00000007
 3                 000002         00000008   00000009   0000000A   0000000B
 4                 000003         0000000C   0000000D   0000000E   0000000F
 5                 000004         00000010   00000011   00000012   00000013
 6                 000005         00000014   00000015   00000016   00000017
 7                 000006         00000018   00000019   0000001A   0000001B
 8                 000007         0000001C   00000000   0000001E   0000001F
 9                 000008         00000020   00000021   00000022   00000023
10                 000009         00000020   00000025   00000026   00000027
11                 00000A         00000028   00000029   0000002A   0000002B
12                 00000B         0000002C   0000002D   0000002E   0000002F
13                 00000C         00000030   00000031   00000032   00000033
14                 00000D         00000034   00000035   00000036   00000037
15                 00000E         00000038   00000039   0000003A   0000003B
16                 00000F         0000003C   0000003D   0000003E   0000003F
17                 000010         00000040   00000041   00000042   00000043
18                 000011         00000044   00000045   00000046   00000047
19                 000012         00000048   00000049   0000004A   0000004B
20                 000013         0000004C   0000004D   0000004E   0000004F
21                 000014         00000050   00000051   00000052   00000053
22                 000015         00000054   00000055   00000056   00000057
23                 000016         00000058   00000059   0000005A   0000005B
24                 000017         0000005C   0000005D   0000005E   0000005F
25                 000018         00000060   00000061   00000062   00000063
26                 000019         00000064   00000065   00000066   00000067
27                 00001A         00000068   00000069   0000006A   0000006B
28                 00001B         0000006C   0000006D   0000006E   0000006F
29                 00001C         00000070   00000071   00000072   00000073
30                 00001D         00000074   00000075   00000076   00000077
31                 00001E         00000078   00000079   0000007A   0000007B
32                 00001F         0000007C   0000007D   0000007E   0000007F
33                 000020         00000080   00000081   00000082   00000083
34                 000021         00000084   00000085   00000086   00000087
35                 000022         00000088   00000089   0000008A   0000008B
36                 000023         0000008C   0000008D   0000008E   0000008F
37                 000024         00000090   00000091   00000092   00000093
38                 000025         00000094   00000095   00000096   00000097
39                 000026         00000098   00000099   0000009A   0000009B
40                 000027         0000009C   0000009D   0000009E   0000009F
41                 000028         000000A0   000000A1   000000A2   000000A3
42                 000029         000000A4   000000A5   000000A6   000000A7
43                 00002A         000000A8   000000A9   000000AA   000000AB
44                 00002B         000000AC   000000AD   000000AE   000000AF
45                 00002C         000000B0   000000B1   000000B2   000000B3
46                 00002D         000000B4   000000B5   000000B6   000000B7
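The values in this listing follow from the generator in hwdata_sim: hw_addr is the low 21 bits of the write counter, and each 32-bit lane is the 30-bit counter with a 2-bit lane index appended, so lane k (0 to 3) at address a is expected to carry 4a + k. The following is a minimal self-checking sketch of that relation. The entity name write_pattern_check and the function expected_lane are hypothetical and are not part of the design.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity write_pattern_check is
end write_pattern_check;

architecture sim of write_pattern_check is
  -- Expected 32-bit word on lane k (0..3) for write address a.
  -- Mirrors hw_dataN <= counter & "kk" in hwdata_sim (hypothetical helper).
  function expected_lane(a : natural; k : natural) return std_logic_vector is
  begin
    return std_logic_vector(to_unsigned(a, 30)) &
           std_logic_vector(to_unsigned(k, 2));
  end function;
begin
  process
  begin
    -- First address: lanes 0..3 carry 0, 1, 2, 3
    assert expected_lane(0, 0) = X"00000000"
      report "address 0, lane 0" severity failure;
    assert expected_lane(0, 3) = X"00000003"
      report "address 0, lane 3" severity failure;

    -- Address 1: 4*1 + 0 = 4
    assert expected_lane(1, 0) = X"00000004"
      report "address 1, lane 0" severity failure;

    -- Address 0x2D: 4*45 + 0 = 0xB4 and 4*45 + 3 = 0xB7
    assert expected_lane(16#2D#, 0) = X"000000B4"
      report "address 0x2D, lane 0" severity failure;
    assert expected_lane(16#2D#, 3) = X"000000B7"
      report "address 0x2D, lane 3" severity failure;

    report "write pattern check finished";
    wait;
  end process;
end sim;
```

The same function can be compared row by row against a captured listing; any captured word that differs from expected_lane for its address and lane would then stand out.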
APPENDIX D: CHIP SCOPE PRO LISTING OF READ CYCLE
Sample in Window   proc_addr   proc_data
 1                 000000      00000000
 2                 000001      00000001
 3                 000002      00000002
 4                 000003      00000003
 5                 000004      00000004
 6                 000005      00000005
 7                 000006      00000006
 8                 000007      00000007
 9                 000008      00000008
10                 000009      00000009
11                 00000A      0000000A
12                 00000B      0000000B
13                 00000C      0000000C
14                 00000D      0000000D
15                 00000E      0000000E
16                 00000F      0000000F
17                 000010      00000010
18                 000011      00000011
19                 000012      00000012
20                 000013      00000013
21                 000014      00000014
22                 000015      00000015
23                 000016      00000016
24                 000017      00000017
25                 000018      00000018
26                 000019      00000019
27                 00001A      0000001A
28                 00001B      0000001B
29                 00001C      0000001C
30                 00001D      0000001D
31                 00001E      0000001E
32                 00001F      0000001F
33                 000020      00000020
34                 000021      00000021
35                 000022      00000022
36                 000023      00000023
37                 000024      00000024
38                 000025      00000025
39                 000026      00000026
40                 000027      00000027
41                 000028      00000028
42                 000029      00000029
43                 00002A      0000002A
44                 00002B      0000002B
45                 00002C      0000002C
46                 00002D      0000002D
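In this listing proc_data equals proc_addr. That is what the write pattern predicts if bits 22 downto 2 of the processor address form the 128-bit memory address (as in the assignment to user_ad_rd_i) and the two low bits select one 32-bit lane of the word read back. The sketch below checks that arithmetic under two labelled assumptions: that lt_word is a one-hot decode of proc_addr(1 downto 0), and that ADDR_WIDTH_4D is 21. The entity name read_pattern_check and both functions are hypothetical.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity read_pattern_check is
end read_pattern_check;

architecture sim of read_pattern_check is
  -- 128-bit word stored at memory address a by the write generator:
  -- lane k (bits 32k+31 downto 32k) holds the 30-bit address count & k.
  function stored_word(a : natural) return std_logic_vector is
    variable w : std_logic_vector(127 downto 0);
  begin
    for k in 0 to 3 loop
      w(32*k + 31 downto 32*k) :=
        std_logic_vector(to_unsigned(a, 30)) &
        std_logic_vector(to_unsigned(k, 2));
    end loop;
    return w;
  end function;

  -- Expected proc_data for processor address p.
  -- Assumption: p / 4 is the memory address, p mod 4 selects the lane.
  function expected_proc_data(p : natural) return std_logic_vector is
    constant w    : std_logic_vector(127 downto 0) := stored_word(p / 4);
    constant lane : natural := p mod 4;
  begin
    return w(32*lane + 31 downto 32*lane);
  end function;
begin
  process
  begin
    -- Addresses 0 to 0x2D cover the 46 samples of the listing
    for p in 0 to 16#2D# loop
      assert expected_proc_data(p) = std_logic_vector(to_unsigned(p, 32))
        report "proc_data differs from proc_addr" severity failure;
    end loop;

    report "read pattern check finished";
    wait;
  end process;
end sim;
```

The check reduces to 4 * (p / 4) + (p mod 4) = p, so it holds for every address as long as the lane selection assumed here matches the decode used in the controller.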