esm

CHAPTER 1

INTRODUCTION

1.1 Introduction to Embedded Systems

An embedded system is a special-purpose computer system designed to perform one or a few

dedicated functions, often with real-time computing constraints. It is usually embedded as part of a

complete device including hardware and mechanical parts. In contrast, a general-purpose computer, such

as a personal computer, can do many different tasks depending upon programming. Embedded systems

control many of the common devices in use today. Since the embedded system is dedicated to specific

tasks, design engineers can optimize it by reducing the size and cost of the product, or increasing the

reliability and performance. An embedded system can also be defined as an engineering artefact

involving computation that is subject to physical constraints arising through interactions of

computational processes with the physical world. These physical constraints are divided into reaction

and execution constraints. Reaction constraints originate from the behavioral requirements and specify

the deadlines, throughput and jitter whereas the execution constraints originate from the implementation

requirements and put bounds on available processor speeds, power, memory, and hardware failure

rates.

Some embedded systems are mass-produced, benefiting from economies of scale. In general,

"embedded system" is not an exactly defined term, as many systems have some element of

programmability. Physically, embedded systems range from portable devices such as digital watches and

MP4 players to large stationary installations like traffic lights, factory controllers, or the systems

controlling nuclear power plants/missiles/satellites. Complexity varies from low, with a single

microcontroller chip, to very high with multiple units, peripherals and networks mounted inside a large

chassis or enclosure.

Embedded systems range from no user interface at all, dedicated only to one task, to complex

graphical user interfaces that resemble modern computer desktop operating systems. Simple embedded

devices use buttons, LEDs, and small character- or digit-only displays, often with a simple menu system.

Embedded processors can be broken into two broad categories: ordinary microprocessors (µP) and

microcontrollers (µC), which have many more peripherals on chip, reducing cost and size.

A common configuration for very-high-volume embedded systems is the system on a chip

(SOC). A system on chip is an integrated circuit which contains a complete system consisting of

multiple processors, multipliers, caches and interfaces on a single chip. SOCs can be implemented as an

application-specific integrated circuit (ASIC) or using a field-programmable gate array (FPGA).

of 82

1.1.1 Embedded System Characteristics

Embedded systems are designed to do some specific task, rather than to be a general-purpose

computer for multiple tasks. Some have real-time performance constraints that must be

met, for reasons such as safety and usability; others may have low or no performance

requirements, allowing the system hardware to be simplified to reduce costs.

Embedded systems are not always stand-alone devices. Many embedded systems consist

of small, computerized parts within a larger device that serves a more general purpose.

For example, the Gibson Robot Guitar features an embedded system for tuning the

strings; the overall purpose of the guitar is, of course, to play music. Similarly, an

embedded system in an automobile provides a specific function as a subsystem of the car

itself.

The program instructions written for embedded systems are referred to as firmware, and

are stored in read-only memory or flash memory chips. They run with limited computer

hardware resources, little memory, small or non-existent keyboard and/or screen.

Embedded systems often reside in machines that are expected to run continuously for

years without errors and in some cases recover by themselves if an error occurs. Therefore the

software is usually developed and tested more carefully than for personal computers, and

unreliable mechanical moving parts such as hard drives, switches or buttons are avoided.

1.2 Introduction to Electronic Warfare

The term Electronic Warfare (EW) refers to any action involving the use of the electromagnetic

spectrum (EMS) or directed energy (DE) to control the EMS or to attack the enemy. EW includes three

major subdivisions: Electronic Attack (EA), Electronic Protection (EP), and Electronic Warfare

Support (ES). The purpose of EW is to deny the opponent an advantage in the EMS and ensure friendly

unimpeded access to the EM spectrum portion of the information environment. EW can be applied from

air, sea, land, and space by manned and unmanned systems.

1.2.1 Description of EW

The term Electronic Attack (EA) refers to the use of electromagnetic energy, directed energy, or

anti-radiation weapons to attack personnel, facilities, or equipment with the intent of degrading,

neutralizing, or destroying enemy combat capability. In the case of EM energy, this action is referred to as

jamming and can be performed on communications systems or radar systems.

Electronic Protection (EP), or Electronic Protective Measures (EPM), involves actions taken to protect

personnel, facilities and equipment from any effects of friendly or enemy use of the electromagnetic

spectrum that degrade, neutralize or destroy friendly combat capability.

In military telecommunications, the terms Electronic Support (ES) or Electronic Support Measures

(ESM) describe the division of electronic warfare involving actions taken under direct control of an

operational commander to detect, intercept, identify, locate, record, and/or analyze sources of radiated

electromagnetic energy for the purposes of immediate threat recognition (such as warning that fire

control RADAR has locked on a combat vehicle, ship, or aircraft) or longer-term operational planning.

Thus, Electronic Support provides a source of information required for decisions involving Electronic

Protection (EP), Electronic Attack (EA), avoidance, targeting, and other tactical employment of forces.

Electronic Support data can be used to produce signals intelligence (SIGINT), communications

intelligence (COMINT) and electronics intelligence (ELINT).

Digital communication became important with the expansion of the use of computers and data

processing and has continued to grow as a major industry, providing the interconnection of computer

peripherals and transmission of data between distant sites. With the requirement of higher and higher

speeds of data transmission, the stress on the development of digital communication techniques has

increased. Also, the channel and its characteristics (bandwidth, frequency, noise, distortion, transmission

speed, type of coding, etc.) have improved over time.

Electronic Support Measures gather intelligence through passive "listening" to electromagnetic

radiations of military interest. Electronic support measures can provide:

1. Initial detection or knowledge of foreign systems.

2. A library of technical and operational data on foreign systems.

3. Tactical combat information utilizing that library.

Desirable characteristics for electromagnetic surveillance and collection equipment include:

1. Wide-spectrum or bandwidth capability because foreign frequencies are initially

unknown.

2. Wide dynamic range because signal strength is initially unknown.

3. Narrow band pass to discriminate the signal of interest from other electromagnetic radiation on

nearby frequencies.

4. Good angle-of-arrival measurement for bearings to locate the transmitter.

1.2.2 Electronic Counter Measures

Electronic Counter Measures (ECM) are a subsection of electronic warfare which includes any

sort of electrical or electronic device designed to trick or deceive Radar, Sonar, or other detection

systems like IR (infrared) and Laser. It may be used both offensively and defensively in any method to

deny targeting information to an enemy. The system may make many separate targets appear to the

enemy, or make the real target appear to disappear or move about randomly. It is used effectively to

protect aircraft from guided missiles. Most air forces use ECM to protect their aircraft from attack. That

is also true for military ships and recently on some advanced tanks to fool laser/IR guided missiles.

ECM is frequently coupled with stealth advances so that the ECM system has an easier job. Offensive ECM

often takes the form of jamming. Defensive ECM includes using blip enhancement and jamming of

missile terminal homers.

1.2.3 Electronic Counter-Counter Measures

Electronic Counter-Counter Measures (ECCM) describes a variety of practices which attempt to

reduce or eliminate the effect of Electronic Counter Measures (ECM) on electronic sensors aboard

vehicles, ships and aircraft and weapons such as missiles. ECCM is also known as Electronic Protective

Measures (EPM), chiefly in Europe. Electronic Protection (EP) involves actions taken to protect

personnel, facilities, and equipment from any effects of friendly or enemy use of the electromagnetic

spectrum that degrade, neutralize, or destroy friendly combat capability. While defensive EA actions and

EP both protect personnel, facilities, capabilities, and equipment, EP protects from the effects of EA

(friendly and/or adversary). Some examples of EPM are ECM detection, pulse compression by

"chirping" (linear frequency modulation), frequency hopping, sidelobe cancellation, polarization, and

radiation homing.

1.3 Field Programmable Gate Array

A Field Programmable Gate Array (FPGA) is a semiconductor device that can be configured by

the customer or designer after manufacturing, hence the name "field-programmable". FPGAs are

programmed using a logic circuit diagram or a source code in a hardware description language (HDL) to

specify how the chip will work. They can be used to implement any logical function that an application

specific integrated circuit (ASIC) could perform, but the ability to update the functionality after shipping

offers advantages for many applications. FPGAs contain programmable logic components called "logic

blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together",

somewhat like a one-chip programmable breadboard. Logic blocks can be configured to perform complex

combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic

blocks also include memory elements, which may be simple flip-flops or more complete blocks of

memory.

The cost of an FPGA design is much lower than that of an ASIC (although the ensuing ASIC

components are much cheaper in large production runs). At the same time, implementing design changes

is much easier in FPGAs, and the time-to-market for such designs is much shorter. FPGAs are often used

to prototype ASIC designs or to provide a hardware platform on which to verify the physical

implementation of new algorithms. However, their low development cost and short time-to-market mean

that they are increasingly finding their way into final products (some of the major FPGA vendors

actually have devices that they specifically market as competing directly against ASICs).

Fig. 1.1: FPGA Introduction

In order to be programmable, we need some mechanism that allows us to configure (program) a

prebuilt silicon chip.

1.3.1 FPGA Origin

Around the beginning of the 1980s, it became apparent that there was a gap in the digital IC

continuum. At one end, there were programmable devices like SPLDs and CPLDs, which were highly

configurable and had fast design and modification times, but which couldn't support large or complex

functions. At the other end of the spectrum were ASICs. These could support extremely large and

complex functions, but they were painfully expensive and time-consuming to design. Furthermore, once

a design had been implemented as an ASIC it was effectively frozen in silicon.

Fig. 1.2: The Gap between PLDs and ASICs

The early devices were based on the concept of a programmable logic block, which comprised a

3-input lookup table (LUT), a register that could act as a flip-flop or a latch, and a multiplexer, along

with a few other elements that are of little interest here.

Fig. 1.3: The key elements forming a simple programmable logic block

Each FPGA contained a large number of these programmable logic blocks, as discussed below.

By means of appropriate SRAM programming cells, every logic block in the device could be configured

to perform a different function. Each register could be configured to initialize containing logic 0 or logic

1 and to act as a flip-flop (as shown in Fig. 1.3) or a latch. If the flip-flop option were selected, the

register could be configured to be triggered by a positive- or negative-going clock (the clock signal was

common to all of the logic blocks). The multiplexer feeding the flip-flop could be configured to accept

the output from the LUT or a separate input to the logic block, and the LUT could be configured to

represent any 3-input logical function.
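The behaviour of such a logic block can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the class name `LogicBlock`, the bit ordering of the LUT address (input `a` as the most significant bit), and the way a clock edge is modelled are all hypothetical choices made here, not details taken from any particular device.

```python
class LogicBlock:
    """Toy model of an early FPGA logic block: 3-input LUT, multiplexer, D flip-flop."""

    def __init__(self, lut_bits, use_lut=True, init=0):
        if len(lut_bits) != 8:
            raise ValueError("a 3-input LUT needs exactly 8 configuration bits")
        self.lut_bits = list(lut_bits)  # stands in for the SRAM cells holding the truth table
        self.use_lut = use_lut          # mux select: LUT output or the separate input d
        self.q = init                   # register initialised to logic 0 or logic 1

    def lut(self, a, b, c):
        # The three inputs form an address into the truth table (assumed order: a is the MSB).
        return self.lut_bits[(a << 2) | (b << 1) | c]

    def clock(self, a, b, c, d=0):
        """Model one active clock edge: the mux output is captured by the register."""
        self.q = self.lut(a, b, c) if self.use_lut else d
        return self.q


# Configure the LUT as a 3-input XOR: the output is 1 when an odd number of inputs are 1.
xor3 = LogicBlock([bin(i).count("1") % 2 for i in range(8)])
```

Loading a different list of eight bits turns the same block into any other 3-input function, which is the sense in which the LUT is programmable.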

1.3.2 FPGA Architecture

The complete FPGA comprised a large number of programmable logic blocks, called "islands",

surrounded by a "sea" of programmable interconnects. The high-level illustration in Fig. 1.4 is merely an abstract

representation. All of the transistors and interconnects would be implemented on the same piece of

silicon using standard IC creation techniques. In addition to the local interconnect reflected in the figure,

there would also be global (high-speed) interconnection paths that could transport signals across the chip

without having to go through multiple local switching elements. The device would also include primary

I/O pins and pads. By means of its own SRAM cells, the interconnect could be programmed such that

the primary inputs to the device were connected to the inputs of one or more programmable logic blocks,

and the outputs from any logic block could be used to drive the inputs to any other logic block, the primary outputs from the

device, or both.

Fig. 1.4: Top-down view of simple, generic FPGA architecture

The end result was that FPGAs successfully bridged the gap between PLDs and ASICs. On the one hand,

they were highly configurable and had the fast design and modification times associated with PLDs. On

the other hand, they could be used to implement large and complex functions that had previously been

the domain only of ASICs (which were still required for the really large, complex, high-performance

designs), but as FPGAs increased in sophistication they started to encroach further and further into ASIC

design space.

1.4 Xilinx Virtex-5 FPGA

Virtex-5 is the newest generation of FPGA from Xilinx. The Virtex-5 family contains five distinct

platforms, the most choice offered by any FPGA family. Each platform contains a different ratio of

features to address the needs of a wide variety of advanced logic designs. In addition to the most

advanced, high performance logic fabric, Virtex-5 FPGAs contain many hard-IP system level blocks,

including powerful 36-Kbit block RAM/FIFOs and second-generation 25 x 18 DSP slices. Also, Virtex-5

offers the best solution for addressing the needs of high performance logic designers, high performance

DSP designers, and high performance embedded systems designers with unprecedented logic, DSP,

hard/soft microprocessor and connectivity capabilities. The Virtex-5 LXT, SXT, TXT, and FXT

platforms include high speed serial connectivity and link/transaction layer capability.

The 5 platforms are:

Virtex-5 LX: High performance general logic applications.

Virtex-5 LXT: High performance logic with advanced serial connectivity.

Virtex-5 SXT: High performance signal processing applications with advanced serial connectivity.

Virtex-5 TXT: High performance systems with double density advanced serial connectivity.

Virtex-5 FXT: High performance embedded systems with advanced serial connectivity.

1.4.1 Architectural Description

Virtex-5 devices are user-programmable gate arrays with various configurable elements and

embedded cores optimized for high-density and high-performance system designs. Virtex-5

devices implement the following functionality:

• I/O blocks provide the interface between package pins and the internal configurable logic.

Most popular and leading-edge I/O standards are supported by programmable I/O blocks

(IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced source-

synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both

input and output signals), data serializers or deserializers, clock dividers, and dedicated I/O

and local clocking resources.

• Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide

combinatorial and synchronous logic as well as distributed memory and SRL32 shift register

capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and

provide superior capabilities and performance compared to previous generations of

programmable logic.

• Block RAM modules provide flexible 36 Kbit true dual port RAM that are cascadable to

form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional

programmable FIFO logic for increased device utilization. Each block RAM can also be

configured as two independent 18 Kbit true dual-port RAM blocks, providing memory

granularity for designs needing smaller RAM blocks.

• Clock Management Tile (CMT) blocks provide the most flexible, highest-performance

clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-

calibrating, fully digital), and one PLL block (self-calibrating, analog) for clock distribution

delay compensation, clock multiplication/division, coarse- /fine-grained clock phase shifting,

and input clock jitter filtering.

1.4.2 Virtex-5 FPGA Features

Input/Output Blocks (SelectIO)

IOBs are programmable and can be categorized as follows:

• Programmable single-ended or differential (LVDS) operation

• Input block with an optional single data rate (SDR) or double data rate (DDR) register

• Output block with an optional SDR or DDR register

• Bidirectional block

• Per-bit deskew circuitry

• Dedicated I/O and regional clocking resources

• Built-in data serializer/deserializer

The IOB registers are either edge-triggered D-type flip-flops or level-sensitive latches.

The Digitally Controlled Impedance (DCI) I/O feature can be configured to provide on-chip

termination for each single-ended I/O standard and some differential I/O standards.

Data serializer/deserializer capability is added to every I/O to support source-synchronous

interfaces. A serial-to-parallel converter with associated clock divider is included in the input

path, and a parallel-to-serial converter in the output path.

Configurable Logic Blocks (CLBs)

A Virtex-5 FPGA CLB resource is made up of two slices. Each slice is equivalent and contains:

• Four function generators

• Four storage elements

• Arithmetic logic gates

• Large multiplexers

• Fast carry look-ahead chain

The function generators are configurable as 6-input LUTs or dual-output 5-input LUTs. In addition, the

four storage elements can be configured as either edge-triggered D-type flip-flops or level sensitive

latches. Each CLB has internal fast interconnect and connects to a switch matrix to access general

routing resources.

Block RAM

The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in

various depth and width configurations.

In addition, each 36-Kbit block can also be configured to operate as two independent 18-Kbit

dual-port RAM blocks. Each port is totally synchronous and independent, offering three “read-

during-write” modes.

Block RAM is cascadable to implement large embedded storage blocks. Additionally, back-end

pipeline registers, clock control circuitry, built-in FIFO support, ECC, and byte write enable

features are also provided as options.
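The depth and width settings quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the function name is hypothetical, the intermediate settings between 32K x 1 and 512 x 72 are assumed for the example, and the split of the wider settings into data bits plus one parity bit per byte is an assumption used to explain why 32K x 1 gives 32 Kbit while 512 x 72 gives 36 Kbit.

```python
def block_ram_config(depth, width):
    """Return (data_bits, parity_bits, total_bits) for one depth x width setting."""
    parity_width = width // 9            # assumed: widths 9, 18, 36, 72 carry 1 parity bit per byte
    data_width = width - parity_width
    return depth * data_width, depth * parity_width, depth * width


# End points come from the text; the settings in between are assumed for illustration.
CONFIGS = [(32 * 1024, 1), (16 * 1024, 2), (8 * 1024, 4),
           (4 * 1024, 9), (2 * 1024, 18), (1024, 36), (512, 72)]

for depth, width in CONFIGS:
    data, parity, total = block_ram_config(depth, width)
    print(f"{depth:>6} x {width:<2}  data={data // 1024} Kbit  "
          f"parity={parity // 1024} Kbit  total={total // 1024} Kbit")
```

Under these assumptions every setting stores 32 Kbit of data, and the settings whose width is a multiple of nine add 4 Kbit of parity to reach the full 36 Kbit.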

Global Clocking

The CMTs and global-clock multiplexer buffers provide a complete solution for designing high-

speed clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can

be used independently or extensively cascaded. Up to six CMT blocks are available, providing

up to eighteen total clock generator elements. Each DCM provides familiar clock generation

capability.

To generate deskewed internal or external clocks, each DCM can be used to eliminate clock

distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the

output clocks. Fine-grained phase shifting offers higher resolution phase adjustment with fraction

of the clock period increments. Flexible frequency synthesis provides a clock output frequency

equal to a fractional or integer multiple of the input clock frequency.

To augment the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides

reference clock jitter filtering and further frequency synthesis options. Virtex-5 devices have 32

global-clock MUX buffers. The clock tree is designed to be differential. Differential clocking

helps reduce jitter and duty cycle distortion.
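The figures in this subsection follow from simple arithmetic, sketched below. The function names and the 100 MHz input clock with a 7/4 ratio are hypothetical values chosen only for illustration; the actual permitted multiply and divide ranges are device-specific and are not modelled here.

```python
from fractions import Fraction

DCMS_PER_CMT = 2   # from the text: each CMT contains two DCMs
PLLS_PER_CMT = 1   # and one PLL


def clock_generators(num_cmts):
    """Total number of clock generator elements for a given number of CMTs."""
    return num_cmts * (DCMS_PER_CMT + PLLS_PER_CMT)


def synthesized_frequency(f_in_mhz, multiply, divide):
    """Frequency synthesis: f_out = f_in * M / D, kept exact with Fraction."""
    return Fraction(f_in_mhz) * Fraction(multiply, divide)


def phase_delays_ns(f_mhz):
    """Delay in ns corresponding to the 90, 180 and 270 degree phase-shifted outputs."""
    period_ns = Fraction(1000) / Fraction(f_mhz)
    return {deg: period_ns * deg / 360 for deg in (90, 180, 270)}


print(clock_generators(6))                   # six CMTs
print(synthesized_frequency(100, 7, 4))      # assumed example: 100 MHz input, M/D = 7/4
print(phase_delays_ns(100))                  # assumed example: 100 MHz clock
```

Six CMTs give the eighteen clock generator elements quoted above, and a fractional ratio such as 7/4 is an instance of the "fractional or integer multiple" of the input frequency.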

DSP48E Slices

DSP48E slice resources contain a 25 x 18 two's complement multiplier and a 48-bit

adder/subtractor/accumulator. Each DSP48E slice also contains extensive cascade capability to efficiently

implement high-speed DSP algorithms.
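A multiply-accumulate step with these operand widths can be modelled as follows. This is a behavioural sketch only, with hypothetical function names: it assumes operands are truncated to 25 and 18 bits and that the accumulator wraps around at 48 bits, and it does not model pipelining, the cascade paths, or any other feature of the real slice.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b, wrapped to 48 bits (assumed behaviour)."""
    product = to_signed(a, 25) * to_signed(b, 18)
    return to_signed(acc + product, 48)


def dot_product(xs, ys):
    """Accumulate a sum of products, as in an FIR filter tap loop."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

The 48-bit accumulator is wider than the 43-bit product of a 25-bit and an 18-bit operand, which leaves headroom for summing many products before the result can overflow.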

Routing Resources

All components in Virtex-5 devices use the same interconnect scheme and the same access to the global

routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set of connectivity

in as few hops as possible. Timing models are shared, greatly improving the predictability of the

performance for high speed designs.

Configuration

Virtex-5 devices are configured by loading the bitstream into internal configuration memory using one

of the following modes:

• Slave-serial mode

• Master-serial mode

• Slave SelectMAP mode

• Master SelectMAP mode

• Boundary-Scan mode (IEEE-1532 and -1149)

• SPI mode (Serial Peripheral Interface standard Flash)

• BPI-up/BPI-down modes (Byte-wide Peripheral interface standard x8 or x16 NOR Flash)

System Monitor

FPGAs are an important building block in high availability/reliability infrastructure. Therefore,

there is a need to better monitor the on-chip physical environment of the FPGA and its immediate

surroundings within the system.

For the first time, the Virtex-5 family System Monitor facilitates easier monitoring of the FPGA

and its external environment. Every member of the Virtex-5 family contains a System Monitor

block.

The System Monitor is built around a 10-bit 200kSPS ADC (Analog-to-Digital Converter). This

ADC is used to digitize a number of on-chip sensors to provide information about the physical

environment within the FPGA. On-chip sensors include a temperature sensor and power supply

sensors. Access to the external environment is provided via a number of external analog input

channels. These analog inputs are general purpose and can be used to digitize a wide variety of

voltage signal types.
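The numbers behind a 10-bit, 200 kSPS converter can be illustrated with a short sketch. The 1.0 V full-scale value `v_ref` and the function names are assumptions made for this example; they are not taken from the text, and the sensor transfer functions of the real System Monitor are not modelled.

```python
ADC_BITS = 10               # from the text: 10-bit ADC
SAMPLE_RATE_SPS = 200_000   # from the text: 200 kSPS


def adc_code(voltage, v_ref=1.0):
    """Quantize a unipolar input voltage to a 10-bit code (v_ref is an assumed full scale)."""
    levels = 1 << ADC_BITS
    code = int(voltage / v_ref * levels)
    return max(0, min(levels - 1, code))   # clamp to the valid code range 0..1023


def code_to_voltage(code, v_ref=1.0):
    """Convert a code back to the voltage at the bottom of its quantization step."""
    return code * v_ref / (1 << ADC_BITS)


def sample_period_us():
    """Time between consecutive samples in microseconds."""
    return 1_000_000 / SAMPLE_RATE_SPS
```

With ten bits there are 1024 codes, so under the assumed 1.0 V range one step is a little under 1 mV, and at 200 kSPS a new sample is available every 5 microseconds.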

Support for unipolar, bipolar, and true differential input schemes is provided. There is full access

to the on-chip sensors and external channels via the JTAG TAP, allowing the existing JTAG

infrastructure on the PC board to be used for analog test and advanced diagnostics during

development or after deployment in the field.

The System Monitor is fully operational after power up and before configuration of the FPGA.

System Monitor does not require an explicit instantiation in a design to gain access to its basic

functionality. This allows the System Monitor to be used even at a late stage in the design cycle.

1.4.3 Virtex-5 Ordering Information

XC5VFX100T-1FFG1738

XC: Xilinx

5V: Virtex-5

FX100T: Logical capacity

-1: Speed

FF: Flip chip

G: Lead free

1738: Pin count
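The ordering code XC5VFX100T-1FFG1738 can be split into the fields labelled above with a regular expression. The pattern below is a hypothetical sketch written for codes shaped like this one example (flip-chip package, single-digit speed grade); it is not a complete grammar for every Xilinx ordering code, and the field names are chosen here for illustration.

```python
import re

# Hypothetical pattern for flip-chip Virtex-5 codes shaped like the example in the text.
PART_RE = re.compile(
    r"^(?P<vendor>XC)"            # Xilinx
    r"(?P<family>5V)"             # Virtex-5
    r"(?P<device>[A-Z]+\d+T?)"    # platform and logical capacity, e.g. FX100T
    r"-(?P<speed>\d)"             # speed grade
    r"(?P<package>FF)"            # flip-chip package
    r"(?P<lead_free>G?)"          # G marks a lead-free package
    r"(?P<pins>\d+)$"             # pin count
)


def decode_part(code):
    """Split an ordering code into its fields, raising ValueError if it does not match."""
    match = PART_RE.match(code)
    if match is None:
        raise ValueError(f"unrecognized ordering code: {code!r}")
    fields = match.groupdict()
    fields["lead_free"] = fields["lead_free"] == "G"
    fields["speed"] = int(fields["speed"])
    fields["pins"] = int(fields["pins"])
    return fields


info = decode_part("XC5VFX100T-1FFG1738")
```

For the example code this yields the device FX100T, speed grade 1, a lead-free flip-chip package, and 1738 pins.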

CHAPTER 2

QDR-II STATIC RAM

2.1 Introduction to Memories

Computer data storage, often called storage or memory, refers to computer components, devices,

and recording media that retain digital data used for computing for some interval of time. Computer data

storage provides one of the core functions of the modern computer, that of information retention.

Memory is directly accessible to the CPU. The CPU continuously reads instructions stored there and

executes them as required. Any data actively operated on is also stored there in a uniform manner. This

memory is mainly of two types: RAM and ROM.

2.1.1 Random Access Memory

Random access memory (RAM) is a form of computer data storage. It takes the form of

integrated circuits that allow the stored data to be accessed in any order (i.e., at random). The word

random thus refers to the fact that any piece of data can be returned in a constant time, regardless of its

physical location and whether or not it is related to the previous piece of data. This contrasts with storage

mechanisms such as tapes, magnetic discs and optical discs, which rely on the physical movement of the

recording medium or a reading head. In these devices, the movement takes longer than the data transfer,

and the retrieval time varies depending on the physical location of the next item. The word RAM is

mostly associated with volatile types of memory, where the information is lost after the power is

switched off.

Modern types of writable RAM generally store a bit of data in either the state of a flip-flop, as in

SRAM (static RAM), or as a charge in a capacitor (or transistor gate), as in DRAM (dynamic RAM),

EPROM, EEPROM and Flash. Some types have circuitry to detect and/or correct random faults called

memory errors in the stored data, using parity bits or error correction codes. Memory of the read-only type (ROM) is described in Section 2.1.2.

As both SRAM and DRAM are volatile, other forms of computer storage, such as disks and magnetic

tapes, have been used as persistent storage in traditional computers.

2.1.2 Read Only Memory

Read-only memory (usually known by its acronym, ROM) is a class of storage media used in

computers and other electronic devices. Because data stored in ROM cannot be modified (at least not

very quickly or easily), it is mainly used to distribute firmware (software that is very closely tied to

specific hardware, and unlikely to require frequent updates). Mask ROM is fabricated with the desired data

permanently stored in it, and thus can never be modified. However, more modern types such as EPROM

and flash EEPROM can be erased and re-programmed multiple times; they are still described as "read-

only memory" (ROM) because the reprogramming process is generally infrequent, comparatively slow,

and often does not permit random access writes to individual memory locations. There are different

types of ROM. Classic mask-programmed ROM chips are integrated circuits that physically encode the

data to be stored, and thus it is impossible to change their contents after fabrication. Other types include:

1. Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be

written to or programmed via a special device called a PROM programmer. Typically, this device uses

high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip.

Consequently, a PROM can only be programmed once.

2. Erasable programmable read-only memory (EPROM) can be erased by exposure to strong

ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again requires

application of higher than usual voltage. Repeated exposure to UV light will eventually wear out an

EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming.

EPROM chip packages can often be identified by the prominent quartz "window" which allows UV light

to enter. After programming, the window is typically covered with a label to prevent accidental erasure.

Some EPROM chips are factory erased before they are packaged, and include no window: these are

effectively PROM.

3. Electrically erasable programmable read-only memory (EEPROM) is based on a similar

semiconductor structure to EPROM but allows its entire contents (or selected banks) to be electrically

erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3

player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a

ROM or writing to a RAM (nanoseconds in both cases).

4. Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified

one bit at a time. Writing is a very slow process and again requires higher voltage (usually around 12V)

than is used for read access. EAROMs are intended for applications that require infrequent and only

partial rewriting. EAROM may be used as non-volatile storage for critical system setup information; in

many applications, EAROM has been supplanted by CMOS RAM supplied by mains power and backed-

up with a lithium battery.

5. Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory

can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high

endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area,

resulting in individual ICs with a capacity as high as 16 GB as of 2007; this feature, along with its

endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications

(such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used

as a replacement for older ROM types, but not in applications that take advantage of its ability to be

modified quickly and frequently.

2.2 Introduction to QDRII SRAM

The QDR consortium (Cypress, Renesas, IDT, NEC, and Samsung) defined and developed the

Quad Data Rate (QDR) SRAM technology for high-performance communications applications. The

QDRII SRAM architecture provides dedicated input and output ports that independently operate at

double data rate (DDR). This results in four data transfers per clock cycle and overcomes bus contention

issues. QDR SRAM devices were developed in response to the demand for higher-bandwidth memories

targeted at networking and telecommunications applications.

The basic QDR architecture has independent read and write data paths for simultaneous

operation. Both paths use Double Data Rate (DDR) transmission to deliver two words per clock cycle,

one word on the rising clock edge and another on the falling edge. The result is that four bus-widths of

data (two read and two write) are transferred during each clock period, hence the name quad data rate.

QDR memory devices are offered in both 2-word burst and 4-word burst architectures. The 2-word burst

devices transmit two words per read or write request. A DDR address bus is used to allow Read requests

during the first half of the clock period and Write requests during the second half of the clock period. In

contrast, 4-word burst devices transmit four words per Read or Write request, and hence only require a

Single Data Rate (SDR) address bus to maximize data bandwidth. Read and Write operations must be

requested on alternating clock cycles (i.e., non-overlapping), allowing the address bus to be shared.
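The "four bus-widths per clock" figure, and the way both burst lengths reach it, can be checked with a short calculation. The function names are hypothetical, and the 250 MHz clock and 36-bit bus used in the example are assumed values chosen only to make the arithmetic concrete.

```python
def words_per_clock(burst_length):
    """Words moved per clock period across the read and write ports together."""
    if burst_length == 2:
        requests_per_clock = 2   # DDR address bus: one read and one write request per clock
    elif burst_length == 4:
        requests_per_clock = 1   # SDR address bus: reads and writes on alternating clocks
    else:
        raise ValueError("QDR devices are described here with 2- or 4-word bursts only")
    return requests_per_clock * burst_length


def peak_bandwidth_mbps(clock_mhz, bus_width_bits):
    """Return (per_port, total) peak bandwidth in Mbit/s for a given clock and bus width."""
    per_port = clock_mhz * 2 * bus_width_bits   # two transfers per clock on each port (DDR)
    return per_port, per_port * 2               # independent read and write ports


print(words_per_clock(2), words_per_clock(4))
print(peak_bandwidth_mbps(250, 36))   # assumed example values, not from the text
```

Both burst architectures sustain four words per clock period; the 4-word burst simply trades a slower address bus for longer bursts.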

One of the unique features of the QDRII architecture is the echo-clock (CQ) output that is

frequency locked to the device input clock (K) but edge aligned to the data transmitted on the Read path

outputs (Q). The CQ clock output is retimed to align with the Q data outputs using a delay-locked loop

(DLL) circuit internal to the QDRII memory device. This clock forwarding, or source-synchronous,

method of interface allows greater timing margin. It also enables the simple and elegant direct-clocking

methodology used in this reference design, discussed in detail in this application note. The QDRII

reference design is composed of four main elements:

I. User Interface

II. Physical Interface

III. Read/Write State Machine

IV. Delay Calibration State Machine

The user interface uses a simple protocol based entirely on SDR signals to make Read/Write

requests. This module is constructed primarily from FIFO16 primitives and is used to store the address

and data values for Read/Write operations before and after execution.

The Read/Write state machine is responsible for monitoring the status of the first-in first-out

(FIFO) buffers within the user interface module, coordinating the flow of data between the user interface and

physical interface, and initiating the actual Read/Write commands to the external memory device. It

ensures execution of Read/Write operations with minimal latency in a concurrent manner as per the

requirements of the QDR II memory specification.

The physical interface is responsible for generating the proper timing relationships and DDR

signaling to communicate with the external memory device in a manner that conforms to its command

protocol and timing requirements.

The delay calibration state machine is an integral component of the direct-clocking methodology

used to achieve maximum performance while greatly simplifying the task of read data capture inside the

FPGA. The delay calibration state machine uses the IDELAY blocks on the read data inputs to adjust the timing of the

read data returning from the memory device so that it can be synchronized directly to the global FPGA

system clock without any complex local-clocking or data recapture techniques.

The block diagram of the QDR II reference design is shown in Fig. 2.1.

[Block diagram: User Interface, Read/Write State Machine, Delay Calibration State Machine, and Physical Interface of the QDRII reference design, connected to the external memory device. Labelled paths: FIFO Status, Read/Write Control, Address Path, Write Path, Read Path. Labelled clock: CLK_DIV4.]

User-side signals: USER_CLK0, USER_RESET, USER_W_n, USER_R_n, USER_QEN_n, USER_AD_WR, USER_AD_RD, USER_BW_n, USER_DWL, USER_DWH, USER_QRL, USER_QRH, USER_WR_FULL, USER_RD_FULL, USER_QR_EMPTY.

Clock and reset inputs: USER_CLK0, USER_CLK270, USER_RESET.

Memory-side signals: QDR_W_n, QDR_R_n, QDR_SA, QDR_BW_n, QDR_D, QDR_CQ, QDR_Q, QDR_K, QDR_K_n.

Fig. 2.1: QDR II Reference Design

2.3 Implementation of QDRII SRAM with Virtex-4 PRO FPGA

The QDR II reference design was implemented to take advantage of the unique

capabilities of the Virtex-4 family. Advances in I/O, clocking, and storage element technology

enable the high-performance, turnkey operation of this design. The following sections describe

the design implementation in further detail.

2.3.1 User Interface

The user interface module utilizes six FIFO16 blocks to store the address and data values

for Read/Write operations. For Write commands, three FIFO16 blocks are used, one to store the

Write address (USER_AD_WR) and byte write enable (USER_BW_n) signals, and two to store

the Low (USER_DWL) and High (USER_DWH) 36-bit data words to be written to the memory.

Read commands also use three FIFO16 blocks, one to store the Read address (USER_AD_RD)

and two to store the Low (USER_QRL) and High (USER_QRH) 36-bit data words returning

from the memory as a result of the Read execution. The Read/Write state machine manages the

interleaving of Read and Write requests to the external memory device, relieving the user

interface of this responsibility.

2.3.2 Read/Write State Machine

This state machine is responsible for coordinating the flow of data between the user

interface and physical interface. It initiates the Read/Write commands to the external memory

device based on the requests stored in the user interface FIFOs.

A USER_RESET always returns the state machine to the INIT state, where memory

operations are suspended until the delay calibration state machine has completed adjusting the

delay on the IDELAY blocks for all of the QDR_Q inputs to center align the Read path data to

the FPGA system clock, USER_CLK0. Completion of the calibration operation is signaled by an

active-High DLY_CAL_DONE input that transitions the Read/Write state machine to the Idle

state to await Read/Write requests from the user interface. From the Idle state, Write commands

take precedence on the presumption that a Write to memory must always occur before there is

any valid Read data. When there are no Read or Write requests pending, the state machine loops

in the Idle state.

A Write request pending in the user interface FIFOs causes transition to the Write state where a

Write command is initiated via the internal WR_INIT_n strobe. This strobe pulls the Write address and

data values from the FIFO and results in the initiation of the external QDR_W_n Write control strobe to

the memory device. Assuming there is a pending Read request, the state machine then transitions to the

Read state where the internal RD_INIT_n strobe is activated. This strobe pulls the Read address from

the FIFOs and launches an external QDR_R_n strobe to the memory device. Capture of the return values

in the Read data FIFOs also occurs as a result of this process.

[State diagram: states INIT (entered on USER_RESET, with START_CAL = 1), IDLE, WRITE (WR_INIT_n = 0), and READ (RD_INIT_n = 0). DLY_CAL_DONE moves the machine from INIT to IDLE; the remaining transitions depend on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]

Fig. 2.2: 4-Word Burst Read/Write State Machine

[State diagram: states INIT (entered on USER_RESET, with START_CAL = 1), IDLE, and READ/WRITE (labelled "WR_INIT_n = 0?" and "RD_INIT_n = 0?"). DLY_CAL_DONE moves the machine from INIT to IDLE; the remaining transitions depend on FIFO_WR_EMPTY, FIFO_RD_EMPTY, and FIFO_QR_FULL.]

Fig. 2.3: 2-Word Burst Read/Write State Machine

The Read/Write state machine continuously monitors the user interface FIFO status signals to

determine if there are any pending Read/Write requests. A continuous flow of concurrent Read/Write

requests causes the state machine to simply alternate between the Read and Write states, ensuring

properly interleaved requests to the external memory. A stream of Write requests results in alternating

Idle and Write states, while a stream of Read requests similarly alternates between Idle and Read states.
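The behaviour described above for the 4-word burst case can be sketched as a small software model. This is an illustration of the prose only, not the reference design's HDL: the names `State`, `next_state` and `run` are hypothetical, `wr_pending` stands for "the Write FIFOs are not empty", and `rd_pending` is assumed to mean "the Read address FIFO is not empty and the Read data FIFO is not full".

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    IDLE = "IDLE"
    WRITE = "WRITE"
    READ = "READ"


def next_state(state, cal_done, wr_pending, rd_pending, reset=False):
    """One transition of a simplified 4-word burst Read/Write state machine."""
    if reset:
        return State.INIT                      # USER_RESET always returns to INIT
    if state is State.INIT:
        return State.IDLE if cal_done else State.INIT
    if state is State.IDLE:
        if wr_pending:                         # Writes take precedence from Idle
            return State.WRITE
        return State.READ if rd_pending else State.IDLE
    if state is State.WRITE:
        return State.READ if rd_pending else State.IDLE
    # state is State.READ
    return State.WRITE if wr_pending else State.IDLE


def run(cycles, wr_pending, rd_pending):
    """Trace the states visited after reset, with calibration already done
    and the pending flags held constant (a simplification)."""
    state, trace = State.INIT, []
    for _ in range(cycles):
        state = next_state(state, True, wr_pending, rd_pending)
        trace.append(state.name)
    return trace


print(run(5, wr_pending=True, rd_pending=True))
print(run(5, wr_pending=True, rd_pending=False))
print(run(5, wr_pending=False, rd_pending=True))
```

With both request types pending the trace alternates between WRITE and READ; with only one type pending it alternates between IDLE and that state, as the text describes.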

The operation of a 2-word burst state machine is quite similar to the 4-word burst state machine,

with the exception that a single READ_WRITE state manages the Read and Write requests to the

memory. All 2-word burst QDR II memory devices allow Read and Write requests to occur on the same

clock cycle, allowing these operations to be initiated from the same state.

The state diagrams for the 4-word burst and 2-word burst Read/Write state machines are shown in Fig. 2.2 and Fig. 2.3.

2.3.3 Physical Interface

The Physical Interface of the QDRII reference design generates the actual I/O signaling and

timing relationships for communication of Read/Write commands to the external memory device,

including the DDR data signals. It provides the necessary timing margins and I/O signaling standards

required to meet the overall design performance specifications.

2.4 Functional Description of QDRII SRAM

The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are 1.8V

Synchronous Pipelined SRAMs, equipped with QDRII architecture. QDRII architecture consists of two

separate ports to access the memory array. The Read port has dedicated Data Outputs to support Read

operations and the Write Port has dedicated Data Inputs to support Write operations. QDRII architecture

has separate data inputs and data outputs to completely eliminate the need to "turn-around" the data bus

required with common I/O devices. Access to each port is accomplished through a common address bus.

Addresses for Read and Write operations are latched on alternate rising edges of the input (K) clock.

Accesses to the QDRII Read and Write ports are completely independent of one another. In order

to maximize data throughput, both Read and Write ports are equipped with Double Data Rate (DDR)

interfaces. Each address location is associated with four 8-bit words (CY7C1511V18) or 9-bit words

(CY7C1526V18) or 18-bit words (CY7C1513V18) or 36-bit words (CY7C1515V18) that burst sequentially

into or out of the device. Since data can be transferred into and out of the device on every rising edge of

both input clocks (K and K̄ and C and C̄), memory bandwidth is maximized while simplifying system

design by eliminating bus "turn-around".

Fig. 2.4: Logic diagram of CY7C1515V18

Depth expansion is accomplished with Port Selects for each port. Port Selects allow each port to

operate independently. All synchronous inputs pass through input registers controlled by the K or K̄

input clocks. All data outputs pass through output registers controlled by the C or C̄ (or K or K̄ in a

single clock domain) input clocks. Writes are conducted with on-chip synchronous self-timed write

circuitry.

2.4.1 Pin Definitions

Pin Name | I/O | Pin Description

D[x:0] | Input-Synchronous | Data input signals, sampled on the rising edge of the K and K̄ clocks during valid write operations. CY7C1511V18: D[7:0]; CY7C1526V18: D[8:0]; CY7C1513V18: D[17:0]; CY7C1515V18: D[35:0].

WPS | Input-Synchronous | Write Port Select, active LOW. Sampled on the rising edge of the K clock. When asserted active, a write operation is initiated. Deasserting will deselect the Write port. Deselecting the Write port will cause D[x:0] to be ignored.

NWS0, NWS1 | Input-Synchronous | Nibble Write Select 0, 1, active LOW (CY7C1511V18 only). Sampled on the rising edge of the K and K̄ clocks during write operations. Used to select which nibble is written into the device. NWS0 controls D[3:0] and NWS1 controls D[7:4]. All the Nibble Write Selects are sampled on the same edge as the data. Deselecting a Nibble Write Select will cause the corresponding nibble of data to be ignored and not written into the device.

BWS0, BWS1, BWS2, BWS3 | Input-Synchronous | Byte Write Select 0, 1, 2 and 3, active LOW. Sampled on the rising edge of the K and K̄ clocks during write operations. Used to select which byte is written into the device during the current portion of the write operation. Bytes not written remain unaltered. CY7C1526V18: BWS0 controls D[8:0]. CY7C1513V18: BWS0 controls D[8:0] and BWS1 controls D[17:9]. CY7C1515V18: BWS0 controls D[8:0], BWS1 controls D[17:9], BWS2 controls D[26:18], and BWS3 controls D[35:27].

A | Input-Synchronous | Address Inputs. Sampled on the rising edge of the K clock during active read and write operations. These address inputs are multiplexed for both Read and Write operations. Internally, the device is organized as 8M x 8 (4 arrays each of 2M x 8) for CY7C1511V18, 8M x 9 (4 arrays each of 2M x 9) for CY7C1526V18, 4M x 18 (4 arrays each of 1M x 18) for CY7C1513V18 and 2M x 36 (4 arrays each of 512K x 36) for CY7C1515V18. Therefore, only 21 address inputs are needed to access the entire memory array of CY7C1511V18 and CY7C1526V18, 20 address inputs for CY7C1513V18 and 19 address inputs for CY7C1515V18. These inputs are ignored when the appropriate port is deselected.

Q[x:0] | Outputs-Synchronous | Data Output signals. These pins drive out the requested data during a Read operation. Valid data is driven out on the rising edge of both the C and C̄ clocks during Read operations, or K and K̄ when in single clock mode. When the Read port is deselected, Q[x:0] are automatically tri-stated. CY7C1511V18: Q[7:0]; CY7C1526V18: Q[8:0]; CY7C1513V18: Q[17:0]; CY7C1515V18: Q[35:0].

RPS | Input-Synchronous | Read Port Select, active LOW. Sampled on the rising edge of the Positive Input Clock (K). When active, a Read operation is initiated. Deasserting will cause the Read port to be deselected. When deselected, the pending access is allowed to complete and the output drivers are automatically tri-stated following the next rising edge of the C clock. Each read access consists of a burst of four sequential transfers.

C | Input-Clock | Positive Input Clock for Output Data. C is used in conjunction with C̄ to clock out the Read data from the device. C and C̄ can be used together to deskew the flight times of various devices on the board back to the controller. See application example for further details.

C̄ | Input-Clock | Negative Input Clock for Output Data. C̄ is used in conjunction with C to clock out the Read data from the device. C and C̄ can be used together to deskew the flight times of various devices on the board back to the controller. See application example for further details.

K | Input-Clock | Positive Input Clock Input. The rising edge of K is used to capture synchronous inputs to the device and to drive out data through Q[x:0] when in single clock mode. All accesses are initiated on the rising edge of K.

K̄ | Input-Clock | Negative Input Clock Input. K̄ is used to capture synchronous inputs being presented to the device and to drive out data through Q[x:0] when in single clock mode.

CQ | Echo Clock | CQ is referenced with respect to C. This is a free running clock and is synchronized to the input clock for output data (C) of the QDR-II. In the single clock mode, CQ is generated with respect to K. The timings for the echo clocks are shown in the AC timing table.

C̄Q̄ | Echo Clock | C̄Q̄ is referenced with respect to C̄. This is a free running clock and is synchronized to the input clock for output data (C̄) of the QDR-II. In the single clock mode, C̄Q̄ is generated with respect to K̄. The timings for the echo clocks are shown in the AC timing table.

ZQ | Input | Output Impedance Matching Input. This input is used to tune the device outputs to the system data bus impedance. CQ, C̄Q̄, and Q[x:0] output impedance are set to 0.2 x RQ, where RQ is a resistor connected between ZQ and ground. Alternately, this pin can be connected directly to VDDQ, which enables the minimum impedance mode. This pin cannot be connected directly to GND or left unconnected.

DOFF | Input | DLL Turn Off, active LOW. Connecting this pin to ground will turn off the DLL inside the device. The timing in the DLL turned off operation will be different from that listed in this data sheet.

TDO | Output | TDO for JTAG.

TCK | Input | TCK pin for JTAG.

TDI | Input | TDI pin for JTAG.

TMS | Input | TMS pin for JTAG.

NC | N/A | Not connected to the die. Can be tied to any voltage level.

Vss/144M | Input | Address expansion for 144M. Can be tied to any voltage level.

Vss/288M | Input | Address expansion for 288M. Can be tied to any voltage level.

Vref | Input-Reference | Reference Voltage Input. Static input used to set the reference level for HSTL inputs and outputs as well as AC measurement points.

VDD | Power Supply | Power supply inputs to the core of the device.

Table 2.1: Pin definitions
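The address-input counts quoted for pin A follow from the organization of each device: one address selects a burst of four words, so the number of address inputs is the base-2 logarithm of the depth divided by four. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
BURST_LENGTH = 4  # words delivered per address, as stated in the pin description

# Organization taken from the pin description of A: (depth in words, word width in bits).
DEVICES = {
    "CY7C1511V18": (8 * 2**20, 8),
    "CY7C1526V18": (8 * 2**20, 9),
    "CY7C1513V18": (4 * 2**20, 18),
    "CY7C1515V18": (2 * 2**20, 36),
}


def address_inputs(depth_words, burst_length=BURST_LENGTH):
    """Number of address pins needed when each address selects one burst."""
    locations = depth_words // burst_length
    return locations.bit_length() - 1  # log2 for an exact power of two


for name, (depth, width) in DEVICES.items():
    print(f"{name}: {address_inputs(depth)} address inputs, {depth * width // 2**20} Mbit")
```

All four organizations give the same total capacity, so a wider word trades directly against address depth.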

2.5 Functioning Mechanism of QDRII SRAM

The CY7C1511V18, CY7C1526V18, CY7C1513V18, and CY7C1515V18 are synchronous pipelined

Burst SRAMs equipped with both a Read Port and a Write Port. The Read port is dedicated to Read

operations and the Write Port is dedicated to Write operations. Data flows into the SRAM through the

Write port and out through the Read Port. These devices multiplex the address inputs in order to

minimize the number of address pins required. By having separate Read and Write ports, the QDRII

completely eliminates the need to "turn-around" the data bus and avoids any possible data contention,

thereby simplifying system design. Each access consists of four 8-bit data transfers in the case of

CY7C1511V18, four 9-bit data transfers in the case of CY7C1526V18, four 18-bit data transfers in the

case of CY7C1513V18, and four 36-bit data transfers in the case of CY7C1515V18, in two clock cycles.

Accesses for both ports are initiated on the Positive Input Clock (K). All synchronous input

timing is referenced from the rising edge of the input clocks (K and K̄) and all output timing is referenced

to the output clocks (C and C̄, or K and K̄ when in single clock mode).

All synchronous data inputs (D[x:0]) pass through input registers controlled by the input

clocks (K and K̄). All synchronous data outputs (Q[x:0]) pass through output registers controlled

by the rising edge of the output clocks (C and C̄, or K and K̄ when in single-clock mode).

All synchronous control inputs (RPS, WPS, BWS[x:0]) pass through input registers controlled

by the rising edge of the input clocks (K and K̄). The CY7C1513V18 is described in the following sections.

2.5.1 Read Operations

The CY7C1513V18 is organized internally as 4 arrays of 1M x 18. Accesses are completed in a

burst of four sequential 18-bit data words. Read operations are initiated by asserting RPS active at the

rising edge of the Positive Input Clock (K). The address presented to the Address inputs is stored in the

Read address register. Following the next K clock rise, the corresponding lowest order 18-bit word of

data is driven onto Q[17:0] using C as the output timing reference. On the subsequent rising edge of

C̄ the next 18-bit data word is driven onto Q[17:0]. This process continues until all four 18-bit data

words have been driven out onto Q[17:0]. The requested data will be valid 0.45 ns from the rising edge

of the output clock (C or C̄, or K or K̄ when in single-clock mode). In order to maintain the internal

logic, each read access must be allowed to complete. Each Read access consists of four 18-bit data

words and takes 2 clock cycles to complete. Therefore, Read accesses to the device cannot be initiated

on two consecutive K clock rises. The internal logic of the device will ignore the second Read request.

Read accesses can be initiated on every other K clock rise. Doing so will pipeline the data flow such that

data is transferred out of the device on every rising edge of the output clocks (C and C̄, or K and K̄ when

in single-clock mode).

When the read port is deselected, the CY7C1513V18 will first complete the pending read

transactions. Synchronous internal circuitry will automatically tri-state the outputs following the next

rising edge of the Positive Output Clock (C). This will allow for a seamless transition between devices

without the insertion of wait states in a depth expanded memory.

2.5.2 Write Operations

Write operations are initiated by asserting WPS active at the rising edge of the Positive Input

Clock (K). On the following K clock rise the data presented to D[17:0] is latched and stored into the

lower 18-bit Write Data register, provided BWS[1:0] are both asserted active. On the subsequent rising

edge of the Negative Input Clock (K̄) the information presented to D[17:0] is also stored into the Write

Data Register, provided BWS[1:0] are both asserted active. This process continues for one more cycle

until four 18-bit words (a total of 72 bits) of data are stored in the SRAM. The 72 bits of data are then

written into the memory array at the specified location. Therefore, Write accesses to the device cannot

be initiated on two consecutive K clock rises. The internal logic of the device will ignore the second

Write request. Write accesses can be initiated on every other rising edge of the Positive Input Clock (K).

Doing so will pipeline the data flow such that 18 bits of data can be transferred into the device on every

rising edge of the input clocks (K and K̄).

When deselected, the write port will ignore all inputs after the pending Write operations have

been completed.

2.5.3 Byte Write Operations

Byte Write operations are supported by the CY7C1513V18. A write operation is initiated as

described in the Write Operations section above. The bytes that are written are determined by BWS0 and

BWS1, which are sampled with each set of 18-bit data words. Asserting the appropriate Byte Write

Select input during the data portion of a write will allow the data being presented to be latched and

written into the device. Deasserting the Byte Write Select input during the data portion of a write will

allow the data stored in the device for that byte to remain unaltered. This feature can be used to simplify

Read/Modify/Write operations to a Byte Write operation. The CY7C1515V18 also supports byte writes,

which are determined by BWS0, BWS1, BWS2, and BWS3.
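The effect of the Byte Write Selects on one 18-bit word can be illustrated with a short software model. It is only a sketch of the behaviour described above: the function name `byte_write` is hypothetical, and it assumes the 9-bit lane mapping given in the pin definitions (BWS0 for D[8:0], BWS1 for D[17:9]), with the selects active LOW.

```python
BYTE_BITS = 9  # each Byte Write Select covers a 9-bit lane on the x18 device


def byte_write(stored, data, bws_n, num_bytes=2):
    """Merge `data` into `stored` for every lane whose select is asserted (0).

    bws_n is a sequence of active-LOW select values, bws_n[0] for D[8:0],
    bws_n[1] for D[17:9], and so on.
    """
    result = stored
    for lane in range(num_bytes):
        if bws_n[lane] == 0:  # asserted: this lane is written
            mask = ((1 << BYTE_BITS) - 1) << (lane * BYTE_BITS)
            result = (result & ~mask) | (data & mask)
    return result


old_word = 0x3FFFF  # all 18 bits set
print(hex(byte_write(old_word, 0x00000, (0, 1))))  # only D[8:0] replaced
print(hex(byte_write(old_word, 0x00000, (1, 0))))  # only D[17:9] replaced
```

Calling the same function with `num_bytes=4` and four select values models the x36 organization under the same assumptions.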

2.5.4 Single Clock Mode

The CY7C1513V18 can be used with a single clock that controls both the input and output

registers. In this mode the device will recognize only a single pair of input clocks (K and K̄) that controls

both the input and output registers. This operation is identical to the operation if the device had zero

skew between the K/K̄ and C/C̄ clocks. All timing parameters remain the same in this mode. To use this

mode of operation, the user must tie C and C̄ HIGH at power on. This function is a strap option and not

alterable during device operation.

2.5.5 Concurrent Transactions

The Read and Write ports on the CY7C1513V18 operate completely independently of one

another. Since each port latches the address inputs on different clock edges, the user can Read or Write

to any location, regardless of the transaction on the other port. If the ports access the same location when

a read follows a write in successive clock cycles, the SRAM will deliver the most recent information

associated with the specified address location. This includes forwarding data from a Write cycle that was

initiated on the previous K clock rise.

Read accesses and Write accesses must be scheduled such that one transaction is initiated on any

clock cycle. If both ports are selected on the same K clock rise, the arbitration depends on the previous

state of the SRAM. If both ports were deselected, the Read port will take priority. If a Read was initiated

on the previous cycle, the Write port will assume priority (since Read operations cannot be initiated on

consecutive cycles). If a Write was initiated on the previous cycle, the Read port will assume priority

(since Write operations cannot be initiated on consecutive cycles). Therefore, asserting both port selects

active from a deselected state will result in alternating Read/Write operations being initiated, with the

first access being a Read.
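The arbitration rule can be written down as a small decision function. This is a behavioural sketch of the paragraph above, not device logic: the function name and the string labels are hypothetical, and the port selects are modelled as active-LOW values (0 means selected).

```python
def arbitrate(previous, rps_n, wps_n):
    """Return the access initiated on this K clock rise.

    previous: "none", "read" or "write" (what was initiated on the previous rise).
    rps_n, wps_n: active-LOW Read and Write port selects sampled on this rise.
    """
    read_req = rps_n == 0
    write_req = wps_n == 0
    if read_req and write_req:
        # A Read on the previous cycle gives the Write port priority;
        # otherwise (previous Write, or both deselected) the Read port wins.
        return "write" if previous == "read" else "read"
    if read_req:
        # A second Read on a consecutive rise is ignored.
        return "none" if previous == "read" else "read"
    if write_req:
        # A second Write on a consecutive rise is ignored.
        return "none" if previous == "write" else "write"
    return "none"


# Both port selects held active, starting from a deselected state.
previous, sequence = "none", []
for _ in range(6):
    previous = arbitrate(previous, rps_n=0, wps_n=0)
    sequence.append(previous)
print(sequence)
```

Holding both selects active reproduces the alternating pattern that the text describes, starting with a Read.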

2.5.6 Depth Expansion

The CY7C1513V18 has a Port Select input for each port. This allows for easy depth expansion.

Both Port Selects are sampled on the rising edge of the Positive Input Clock only (K). Each port select

input can deselect the specified port. Deselecting a port will not affect the other port. All pending

transactions (Read and Write) will be completed prior to the device being deselected.

2.5.7 Programmable Impedance

An external resistor, RQ, must be connected between the ZQ pin on the SRAM and VSS to allow

the SRAM to adjust its output driver impedance. The value of RQ must be 5x the value of the intended

line impedance driven by the SRAM. The allowable range of RQ to guarantee impedance matching with

a tolerance of ±15% is between 175Ω and 350Ω, with VDDQ = 1.5V. The output impedance is

adjusted every 1024 cycles upon power up to account for drifts in supply voltage and temperature.
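As a worked example of the sizing rule above, the following sketch computes RQ for a given line impedance and checks it against the quoted 175Ω to 350Ω range. The helper names and the 50Ω example line are assumptions for illustration.

```python
RQ_MIN_OHMS = 175.0
RQ_MAX_OHMS = 350.0


def rq_for_line_impedance(line_impedance_ohms):
    """RQ is five times the intended line impedance; reject values outside the range."""
    rq = 5.0 * line_impedance_ohms
    if not RQ_MIN_OHMS <= rq <= RQ_MAX_OHMS:
        raise ValueError(f"RQ = {rq} ohms is outside {RQ_MIN_OHMS}..{RQ_MAX_OHMS} ohms")
    return rq


def output_impedance(rq_ohms):
    """Output driver impedance that results from a given RQ (one fifth of RQ)."""
    return rq_ohms / 5.0


rq = rq_for_line_impedance(50.0)  # assumed 50 ohm line
print(rq, output_impedance(rq))
```

The range check implies that, under this rule, only line impedances between 35Ω and 70Ω can be matched.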

2.5.8 Echo Clocks

Echo clocks are provided on the QDR-II to simplify data capture on high speed systems. Two

echo clocks are generated by the QDR-II. CQ is referenced with respect to C and C̄Q̄ is referenced with

respect to C̄. These are free running clocks and are synchronized to the output clock of the QDR-II. In

the single clock mode, CQ is generated with respect to K and C̄Q̄ is generated with respect to K̄. The

timings for the echo clocks are shown in the AC timing table.

2.5.9 Delay Lock Loop

These chips utilize a Delay Lock Loop (DLL) that is designed to function between 80 MHz and

the specified maximum clock frequency. During power-up, when the DOFF is tied HIGH, the DLL gets

locked after 1024 cycles of stable clock. The DLL can also be reset by slowing or stopping the input

clocks K and K̄ for a minimum of 30 ns. However, it is not necessary for the DLL to be specifically reset

in order to lock the DLL to the desired frequency. The DLL will automatically lock 1024 clock cycles

after a stable clock is presented. The DLL may be disabled by applying ground to the DOFF pin. For

information, refer to the application note "DLL Considerations in QDRII/DDRII/QDRII+/DDRII+".
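The 1024-cycle lock requirement translates into a wait time that depends on the clock frequency. A minimal sketch of that conversion follows; the helper name and the 250 MHz example are assumptions, and the device's maximum frequency is not checked because it is not given here.

```python
DLL_LOCK_CYCLES = 1024   # stable clock cycles needed for the DLL to lock
DLL_MIN_CLOCK_HZ = 80e6  # lower bound of the DLL operating range


def dll_lock_time_s(clock_hz):
    """Time to wait after the clock is stable before the DLL is locked."""
    if clock_hz < DLL_MIN_CLOCK_HZ:
        raise ValueError("clock is below the DLL operating range")
    return DLL_LOCK_CYCLES / clock_hz


print(f"{dll_lock_time_s(80e6) * 1e6:.2f} us at 80 MHz")
print(f"{dll_lock_time_s(250e6) * 1e6:.3f} us at an assumed 250 MHz")
```

A controller would hold off memory accesses for at least this long after power-up or after a clock change.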

CHAPTER 3

XILINX AND MODEL SIM

3.1 Xilinx Overview

The Integrated Software Environment (ISE™) is the Xilinx® design software suite that allows

you to take your design from design entry through Xilinx device programming. The ISE Project

Navigator manages and processes your design through the following steps in the ISE design flow.

3.2 Project Navigator Overview

Project Navigator organizes design files and runs processes to move the design from design entry

through implementation to programming the targeted Xilinx® device. Project Navigator is the high-level

manager for Xilinx FPGA and CPLD designs, which allows you to do the following:

1. Add and create design source files, which appear in the Sources window

2. Modify your source files in the Workspace

3. Run processes on your source files in the Processes window

4. View output from the processes in the Transcript window

Optionally, we can run processes from a script or from a command line prompt.

However, it is recommended that we first become familiar with the basic use of the Xilinx Integrated

Software Environment (ISE™) software and with project management.

The Project Navigator main window is divided into five sub-windows, as follows:

1. Tool bar

2. Sources window

3. Processes window

4. Workspace

5. Transcript window

In Fig. 3.1, the Sources window at the top left hierarchically displays the

elements included in the project. Beneath the Sources window is the Processes window, which displays

available processes for the currently selected source. The third window at the bottom of the Project

Navigator is the Transcript window which displays status messages, errors, and warnings and also

contains interactive tabs for Tcl scripting and the Find in Files function. The fourth window to the right

is a multi-document interface (MDI) window referred to as the Workspace. It enables you to view html

reports, ASCII text files, schematics, and simulation waveforms.

3.2.1 Project Navigator Main Window

Fig 3.1: Project Navigator Main Window

3.3 ISE Design flow

The ISE Project Navigator manages and processes the design through the following steps in the

ISE design flow.

3.3.1 Design Entry

Design entry is the first step in the ISE design flow. During design entry, we create the source

files based on the design objectives. We can create the top-level design file using a Hardware

Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic. We can use multiple

formats for the lower-level source files in the design.

If we are working with a synthesized EDIF or NGC/NGO file, we can skip design entry and

synthesis and start with the implementation process.

3.3.2 Synthesis

After design entry and optional simulation, run synthesis. During this step, VHDL, Verilog, or

mixed language designs become netlist files that are accepted as input to the implementation step.

3.3.3 Implementation

After synthesis, run design implementation, which converts the logical design into a physical file

format that can be downloaded to the selected target device. From Project Navigator, run the

implementation process in one step, or run each of the implementation processes separately.

Implementation processes vary depending on whether we are targeting a Field Programmable Gate

Array (FPGA) or a Complex Programmable Logic Device (CPLD).

3.3.4 Verification

We can verify the functionality of the design at several points in the design flow, using

simulator software to verify the functionality and timing of the design or a portion of the design. The

simulator interprets VHDL or Verilog code into circuit functionality and displays logical results of the

described HDL to determine correct circuit operation. Simulation allows creating and verifying complex

functions in a relatively small amount of time. We can also run in-circuit verification after programming

the device.

3.3.5 Device Configuration

After generating a programming file, configure the device. During configuration, generate

configuration files and download the programming files from a host computer to a Xilinx device.

iMPACT Tool Overview

iMPACT, a tool featuring batch and graphical user interface (GUI) operations, allows you to

perform the following functions: Device Configuration and File Generation.

The Device Configuration enables you to directly configure Xilinx® FPGAs or program Xilinx

CPLDs and PROMs with the Xilinx cables (MultiPRO Desktop Tool, Parallel Cable IV, or Platform

Cable USB) in various modes. In the Boundary-Scan mode, Xilinx FPGAs, CPLDs, and PROMs can be

configured or programmed. In the Slave Serial or SelectMAP configuration modes only FPGAs can be

configured directly. In the Desktop Configuration mode Xilinx CPLDs or PROMs can be programmed.

In the Direct SPI Configuration mode select SPI serial flash (STMicro: M25P, M25PE, M45PE or

Atmel: AT45DB) can be programmed.

File Generation enables you to create the following types of programming files: System ACE

CF, PROM, SVF, STAPL, and XSVF files.

iMPACT also enables us to do the following:

1. Read back and verify design configuration data

2. Debug configuration problems

3. Execute SVF and XSVF files

Fig 3.2: Hardware interconnection

3.3.6 FPGA Design flow

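[Flow chart: Design Entry, Design Synthesis, Design Implementation, and Xilinx Device Programming, with the corresponding Design Verification steps: Behavioral Simulation, Functional Simulation, Static Timing Analysis, Timing Simulation, Back Annotation, and In-Circuit Verification.]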

Fig 3.3: FPGA Design Flow

3.4 Core Generator

The CORE Generator™ is a design tool that delivers parameterized Intellectual Property (IP)

optimized for Xilinx FPGAs.

The CORE Generator provides ready-made functions which include:

1. FIFOs and memories

2. Reed-Solomon Decoder and Encoder

3. FIR filters

4. FFTs

5. Standard bus interfaces such as PCI and PCI-X

6. Connectivity and networking interfaces (Ethernet, SPI-4.2, RapidIO, CAN and PCI Express)

3.4.1 Memory Interface Generator

The Memory Interface Generator (MIG) is a simple menu-driven tool to generate advanced

memory interfaces. DDR2 SDRAM, DDR SDRAM, DDRII SRAM, QDRII SRAM, and RLDRAM II

are supported. This tool generates HDL and pin placement constraints that will help us design our

application.

3.4.2 Interfacing QDRII SRAM with MIG

Fig. 3.4 shows a top-level block diagram of the QDRII memory controller. One side of

the QDRII memory controller connects to the user interface denoted as Block Application. The other

side of the controller interfaces to QDRII memory. The memory interface data width is selectable.

[Block diagram: the QDR-II Memory Controller sits between the Block Application and the QDR-II Memory.]

Fig. 3.4: QDR-II Memory Controller

Data is double-pumped to QDRII SRAM on both the positive and the negative clock edges. The

HSTL_18 Class I/O standard is used for the data, address, and control signals. QDR-II SRAM interfaces

are source-synchronous and double data rate like DDR SDRAM interfaces. The key advantage of QDR-II

devices is that they have separate data buses for reads and writes to SRAM, so no bus turn-around

cycles are needed and data bus contention is avoided.

Interface model

The memory interface is layered to simplify the design and make the design modular. Fig. 3.5

shows the layered memory interface in the QDRII memory controller. The three layers are the

application layer, the implementation layer, and the physical layer.

The application layer comprises the user interface, which initiates memory

writes and reads by writing data and memory addresses to the User Interface

FIFOs. The implementation layer comprises the infrastructure, datapath, and

control logic.

1. The infrastructure logic consists of the DCM and reset logic generation circuitry.

2. The datapath logic consists of the calibration logic by which the data from the

memory component is captured using the FPGA clock.

3. The control logic determines the type of data transfer that is, read/write with

the memory component, depending on the User Interface FIFO’s status signals.

[Figure: User Interface (application layer); Implementation Layer (infrastructure, datapath, control); Physical Layer]

Fig. 3.5: Interface layering model

The physical layer comprises the I/O elements of the FPGA. The controller

communicates with the memory component using this layer. The I/O elements

(such as IDDRs, ODDRs, and IDELAY elements) are associated with this layer.

Hierarchy

Fig. 3.8 shows the hierarchical structure of the QDRII SRAM design generated by MIG with a test bench and a DCM. The modules are classified as follows:

1. Design modules

2. Test bench modules

3. Clocks and reset generation modules

MIG can generate QDRII SRAM designs in four different ways:

1. With a test bench and a DCM

2. Without a test bench and with a DCM

3. With a test bench and without a DCM

4. Without a test bench and without a DCM

Design clocks and resets are generated in the infrastructure_top module. When the Use DCM option is checked in MIG, a DCM primitive and the necessary clock buffers are instantiated in the infrastructure_top module. The inputs to this module are the differential design clock and a 200 MHz differential clock required for the IDELAYCTRL module. A user reset is also input to this module. Using the input clocks and reset signals, the system clocks and the system resets used in the design are generated in this module. When the Use DCM option is unchecked in MIG, the infrastructure_top module does not have the DCM and the corresponding clock buffer instantiations; therefore, the system operates on the user-provided clocks. The system reset is generated in the infrastructure_top module using the DCM_LOCK signal and the ready signal of the IDELAYCTRL element.
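The way a system reset can be derived from a user reset, the DCM lock indication, and the IDELAYCTRL ready indication may be sketched as follows. This is a minimal illustration under stated assumptions, not the contents of the MIG-generated infrastructure_top module: the entity name `sys_reset_sketch`, the port names, the active-high polarities, and the four-stage release chain are all hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical reset generator: reset is held while the clocks are
-- not yet usable and released synchronously afterwards.
entity sys_reset_sketch is
  port (
    clk         : in  std_logic;  -- system clock
    user_reset  : in  std_logic;  -- active-high user reset (assumed polarity)
    dcm_lock    : in  std_logic;  -- high once the DCM has locked
    idelay_rdy  : in  std_logic;  -- high once IDELAYCTRL reports ready
    sys_reset   : out std_logic   -- active-high system reset
  );
end entity sys_reset_sketch;

architecture rtl of sys_reset_sketch is
  signal rst_async : std_logic;
  signal rst_chain : std_logic_vector(3 downto 0) := (others => '1');
begin
  -- Any of the three conditions forces the design into reset.
  rst_async <= user_reset or (not dcm_lock) or (not idelay_rdy);

  -- Assert immediately, release only after four clean clock edges.
  process (clk, rst_async)
  begin
    if rst_async = '1' then
      rst_chain <= (others => '1');
    elsif rising_edge(clk) then
      rst_chain <= rst_chain(2 downto 0) & '0';
    end if;
  end process;

  sys_reset <= rst_chain(3);
end architecture rtl;
```

Releasing the reset through a short shift register is a common design choice because it keeps the de-assertion aligned to the system clock.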

Fig. 3.8: QDR-II SRAM Controller Hierarchy

3.5 ChipScope Pro

After configuring the device, the FPGA design can be debugged using the ChipScope™ Pro software. From the Project Navigator Processes tab, double-click Analyze Design Using Chipscope to launch the ChipScope Pro Analyzer. To use this process, the Xilinx® ChipScope Pro software must be purchased and the design must be created with debug and verification in mind, as described in the following sections. ChipScope Pro comprises the ChipScope Pro cores in the CORE Generator, the ChipScope Pro Core Inserter, and the ChipScope Pro Analyzer.

We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex-4 FPGA, by analyzing the user interface signals, which include the PPD interface and the PLB interface. These signals are captured using the FIFOs implemented in the FPGA and sent to the display interface on the PC over JTAG.

3.5.1 ChipScope Pro Design Flow Overview

To use the ChipScope Pro software to perform in-circuit verification, we should do the following:

1. Insert ChipScope Pro cores in the design using the CORE Generator or the Core Inserter.

2. Implement the design in Project Navigator and configure the device.

3. Analyze the design using the ChipScope Pro Analyzer.

3.5.2 ChipScope Pro Core Insertion

ChipScope Pro cores are inserted in the design with the ChipScope Pro tools using one of the following methods:

1. During design entry, using the CORE Generator.

Using the CORE Generator software, we create the cores and instantiate them in the HDL source file. This software can generate all of the cores available in the ChipScope Pro system. The wizard creates NGC netlists with HDL instantiation templates for any of the supported synthesis tools. The templates are then used to connect the ChipScope Pro cores to the design logic.

2. After the Synthesize process, using the ChipScope Pro Core Inserter.

The ChipScope Pro Core Inserter is used to create the ILA, ATC2, and ICON cores and insert them in a post-synthesis netlist.

Projects saved in the Core Inserter hold all relevant information about source files, destination

files, core parameters, and core settings. This allows you to store and retrieve information about core

insertion between sessions. The project file (.cdc extension) can also be used as an input to the Analyzer

to import signal names.

Fig. 3.9: Core Inserter as Launched from Project Navigator

3.5.3 ChipScope Pro Cores

ChipScope Pro allows the following cores to be embedded within the design to assist with on-chip debugging: the integrated logic analyzer (ILA), integrated bus analyzer (IBA), and virtual input/output (VIO) low-profile software cores. These cores allow viewing internal signals and nodes in the FPGA, including the IBM® CoreConnect™ processor local bus (PLB) that supports the IBM PowerPC™ 405. The ChipScope Pro cores and their functions are as follows:

1. ICON

The Integrated Controller (ICON) core provides the communication between the embedded ILA, IBA, and VIO cores and the computer running the ChipScope Pro Analyzer software.

2. ILA

The ILA core is a customizable logic analyzer core that can be used to monitor the internal signals in the design. Because the ILA core is synchronous to the design being monitored, all design clock constraints applied to the design are also applied to the components inside the ILA core.

3. ATC2

The Agilent Trace Core 2 (ATC2) is a customizable logic analyzer core. It is similar to the ILA core but does not use on-chip block RAM resources to store captured trace data. The ATC2 core synchronizes ChipScope Pro to the Agilent FPGA dynamic probe technology, delivering the first integrated application for FPGA debug with logic analyzers.

4. VIO

The virtual input/output (VIO) core is a customizable core that can both monitor and drive internal FPGA signals in real time. Unlike the ILA and IBA cores, the VIO core does not require on-chip RAM.

Fig. 3.10: ChipScope Pro Cores

3.5.4 ChipScope Pro Analyzer

The ChipScope Pro Analyzer tool interfaces directly to the ChipScope Pro cores. This software is used to download designs, set trigger conditions, and display data. The data can be shown as waveforms, lists, or graphs, and values can be tokenized.

3.6 ModelSim

ModelSim provides a comprehensive simulation and debug environment for complex ASIC and FPGA designs. Support is provided for multiple languages, including Verilog, SystemVerilog, VHDL and SystemC. Xilinx ISE provides an integrated flow with the Model Technology ModelSim simulator, which enables simulation to be run from the Xilinx Project Navigator graphical user interface.

[Figure: Wave window panes: pathnames pane, values pane, waveform pane, cursor name pane, cursor value pane, cursor pane]

Fig. 3.11: Panes of Wave Window

CHAPTER 4

HARDWARE BOARD DESCRIPTION

4.1 Board Overview

The hardware on which we are working is a subsystem on a single board, which is used in the processing of intercepted signals. It consists of Xilinx® FPGAs, optical transceivers, a cPCI interface, memories (DDR SDRAM and QDRII SRAM), and an Ethernet interface.

4.1.1 Requirement of this Board

Before this board was designed, as many as 16 independent boards were used for the purpose. Due to advances in VLSI technology, all of these are now integrated onto a single board. This is highly advantageous, as the board thus developed is smaller and operates faster.

4.1.2 Board Block Diagram

[Figure: board block diagram. Blocks: cPCI backplane, cPCI bridge, PPD main, two Virtex-II Pro XC2VP7 FPGAs, Virtex-4 XC4VLX100 FPGA, 36MB QDR-II SRAM, 128MB DDR SDRAM, 8MB flash memory, 10/100 Ethernet PHY. Signal labels: address, control and data; de-interleaved PDW]

Fig. 4.1: Board Block Diagram

4.2 Signal Interception

4.2.1 Block Diagram

Fig. 4.2: Signal Reception Block diagram

4.2.2 Signal Reception

An ESM system comprises a receiver, which intercepts pulses from various sources of emission in the environment and determines the pulse parameters, mainly Frequency, Pulse Width, Direction of Arrival (DOA), and Amplitude. Using these parameters, it builds the Pulse Descriptor (PD) Word, which is a digitized form of the pulse information. The PD Words are transmitted by the receiver over optical fiber to the ESM Processor. The function of the ESM Processor is to receive the interleaved PD Words, de-interleave them, build the emitter file, and send the information to the display. This de-interleaving is done at two levels, which operate independently.

The PD Words received over optical fiber in serial form are converted to parallel form by the Multi-Gigabit Transceiver (MGT) core of the Virtex-II Pro FPGA. This MGT core operates at 3.125 Gbit/s. The Virtex-II Pro FPGA then sends the parallel data to the Virtex-4 FPGA for 1st level de-interleaving.

4.3 1st Level De-interleaving

In first level de-interleaving, the PD Words are de-interleaved by the Virtex-4 FPGA based on the intra-pulse parameters (Frequency, DOA, and Pulse Width) and stored in memory. The Virtex-4 device used here is a logic-intensive FPGA. The de-interleaved PD Words are stored in the dual-port memory (SRAM) using the concept of Content Addressable Memory (CAM). This memory has independent write and read ports, so read and write operations can be performed at the same time.

Here we use quad data rate SRAM, which performs four data transfers in one clock cycle: two writes and two reads. On each port, one transfer takes place at the rising edge of the clock and the other at the falling edge.

A simple block diagram of 1st level de-interleaving is shown below.

[Figure labels: PPD Rear IO Module, PDW]

Fig 4.3: 1st level de-interleaving

4.4 Memory details

4.4.1 Content Addressable Memory

Content-addressable memory (CAM) is a special type of computer memory used in certain very

high-speed searching applications. Unlike standard computer memory (RAM), in which the user supplies

a memory address and the RAM returns the data word stored at that address, a CAM is designed such

that the user supplies a data word and the CAM searches its entire memory to see if that data word is

stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses

where the word was found. In case the word is not found, then it stores it in a new location and returns

the location address. Thus, a CAM is the hardware embodiment of what in software terms would be

called an associative array.
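The search-or-store behavior described above can be illustrated with a small register-based model. This is only a sketch under stated assumptions and is not the CAM logic used on the board: the entity name `cam_sketch`, the generics, the port names, and the one-lookup-per-clock interface are hypothetical, and unlike the general description it returns a single matching address.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical content-addressable memory: the user supplies a data
-- word (key) and receives the address at which it is stored.
entity cam_sketch is
  generic (
    DEPTH : positive := 8;   -- number of entries (assumed)
    WIDTH : positive := 16   -- key width in bits (assumed)
  );
  port (
    clk    : in  std_logic;
    reset  : in  std_logic;                           -- synchronous, clears all entries
    lookup : in  std_logic;                           -- search for 'key' this cycle
    key    : in  std_logic_vector(WIDTH-1 downto 0);
    hit    : out std_logic;                           -- key was already stored
    full   : out std_logic;                           -- key missed and no room was left
    index  : out natural range 0 to DEPTH-1           -- address of the key
  );
end entity cam_sketch;

architecture rtl of cam_sketch is
  type key_array is array (0 to DEPTH-1) of std_logic_vector(WIDTH-1 downto 0);
  signal keys  : key_array;
  signal count : natural range 0 to DEPTH := 0;       -- entries 0 .. count-1 are valid
begin
  process (clk)
    variable found : boolean;
  begin
    if rising_edge(clk) then
      hit  <= '0';
      full <= '0';
      if reset = '1' then
        count <= 0;
      elsif lookup = '1' then
        found := false;
        -- Compare the key against every valid entry.
        for i in 0 to DEPTH-1 loop
          if i < count and keys(i) = key then
            found := true;
            index <= i;
          end if;
        end loop;
        if found then
          hit <= '1';
        elsif count < DEPTH then
          -- Not found: store the key in a new location and return it.
          keys(count) <= key;
          index       <= count;
          count       <= count + 1;
        else
          full <= '1';
        end if;
      end if;
    end if;
  end process;
end architecture rtl;
```

The loop unrolls into one comparator per entry, which is what lets a hardware CAM search its whole contents in a single step.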

4.4.2 Why is memory required?

Memory is used to solve the speed synchronization problem caused by a mismatch in the access rates of the 1st and 2nd levels of de-interleaving. 1st level de-interleaving uses hardware, which processes and stores one PD Word at a time; hence it is faster. 2nd level de-interleaving uses software, which reads a group of de-interleaved PD Words at a time and processes them; hence it is slower.

So, a memory is required to store the de-interleaved PD Words output by the 1st stage. To store the PD Words de-interleaved in the 1st level, dual-port memory is used, which must be independently accessible by the two processes. This type of memory improves performance by reducing memory access conflicts between the two levels, and thus increases the speed of operation. Because of the high-speed memory access requirements, Quad Data Rate SRAMs are used; their independent read and write ports suit this requirement well.
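The idea of a memory with independent write and read ports, each on its own clock, can be sketched in a few lines. This is a generic illustration under stated assumptions rather than a model of the QDR-II device: the entity name `dp_ram_sketch`, the generics, and the port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port RAM: one write-only port and one
-- read-only port that never contend for the same bus.
entity dp_ram_sketch is
  generic (
    ADDR_WIDTH : positive := 4;    -- assumed depth of 16 words
    DATA_WIDTH : positive := 32    -- assumed word width
  );
  port (
    wr_clk  : in  std_logic;
    wr_en   : in  std_logic;
    wr_addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    wr_data : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    rd_clk  : in  std_logic;
    rd_addr : in  std_logic_vector(ADDR_WIDTH-1 downto 0);
    rd_data : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity dp_ram_sketch;

architecture rtl of dp_ram_sketch is
  type ram_type is array (0 to 2**ADDR_WIDTH - 1)
    of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal ram : ram_type := (others => (others => '0'));
begin
  -- Write port: used by the faster producer (1st level in the text).
  write_port : process (wr_clk)
  begin
    if rising_edge(wr_clk) then
      if wr_en = '1' then
        ram(to_integer(unsigned(wr_addr))) <= wr_data;
      end if;
    end if;
  end process write_port;

  -- Read port: used by the slower consumer (2nd level in the text).
  read_port : process (rd_clk)
  begin
    if rising_edge(rd_clk) then
      rd_data <= ram(to_integer(unsigned(rd_addr)));
    end if;
  end process read_port;
end architecture rtl;
```

Because each side has its own clock, address, and data path, the producer never has to wait for the consumer to release the memory.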

4.5 2nd level De-interleaving

In second level de-interleaving, the de-interleaved PD Words are read from memory and processed to extract the emitter parameters, which mainly include Frequency, Pulse Width, DOA, Amplitude, and Pulse Repetition Frequency (PRF). With the help of these parameters, the emitter file is built and sent to the display via Ethernet.

Fig. 4.4: 2nd level de-interleaving

4.6 Scope of the Project

Our project involves the implementation of an algorithm in VHDL to control SRAM memory access using a Virtex-4 FPGA. We develop the logic for interfacing the Virtex-4 FPGA with the QDR-II SRAM. For designing and simulation testing of the logic, we use Xilinx™ ISE v10.1.

[Figure: PPD, SRAM interfacing logic in VHDL (Virtex-4 FPGA), SRAM]

Fig. 4.5: SRAM interfacing logic

The code is implemented in two phases:

1. Write Cycle: Interfacing QDRII with PPD.

2. Read Cycle: Interfacing QDRII with Emitter Processor Software.

[Figure blocks: PPD, S/W Interface, QDR-II Memory Controller, QDR-II Memory, Emitter Processor PLB i/f (V2 Pro)]

Fig. 4.6: Software interface developed in VHDL

CHAPTER 5

VHDL INTERFACE DESIGN AND STATE DIAGRAM

5.1 Write-Read State Machines

Fig. 5.1: Read-Write Interfaces developed

The above diagram shows the read and write interfaces developed. The s/w interface is written in VHDL. The PPD interface is used for writing into the QDR-II memory through the QDR-II memory controller, and the PLB interface is used for reading the data stored in the QDR-II memory.

VHDL Language – s/w interface

PPD – Writing

PLB – Reading

The PowerPC™ 405 core accesses high speed and high performance system resources through

Processor Local Bus (PLB) interfaces on the instruction and data cache controllers. The PLB interfaces

provide separate 32-bit address and 64-bit data buses for the instruction and data sides.

The PLB supports read and write data transfers between master and slave devices equipped with

a PLB bus interface and connected through PLB signals. Bus architecture supports multiple master and

slave devices. Each PLB master is attached to the PLB through separate address, read-data, and write-

data buses. PLB slaves are attached to the PLB through shared, but decoupled, address, read-data, and

write-data buses and a plurality of transfer control and status signals for each data bus.


5.2 State Diagrams

[Figure: write cycle state diagram. States: INIT_WR, IDLE_WR, WRFIFO_RD, LT_PDW_0_1, LT_PDW_2_3. Transition labels: invalid state; del_cal = '1'; hw_fifo_empty = '0' and user_wr_full = '0']

Fig. 5.2: Write cycle state diagram

[Figure: read cycle state diagram. States: INIT_RD, IDLE_RD, LATCH_RD_ADDR, RD_ADDR_WR, WAIT_QR_EMPTY, LATCH_EPW_0_1, LATCH_EPW_2_3, ACK_W0_GEN, LT_W1, ACK_W1_GEN, LT_W2, ACK_W2_GEN, LT_W3, ACK_W3_GEN. Transition labels: reset = 1; dly_calc = 1; proc_rd = 1; proc_rd = 0; proc_addr(1 downto 0) = 01, 10, 11; test_w_n = 1; user_rd_full = 0; user_qr_empty = 0]

Fig. 5.3: Read cycle state diagram

CHAPTER 6

TEST RESULTS, CONCLUSION AND FUTURE SCOPE OF

WORK

6.1 Simulation Results in ModelSim

6.1.1 Write Cycle

The PPD logic and the processor PLB interface operate with external clock as reference, whereas

the QDRII SRAM Memory Controller operates at 166MHz which is the operating frequency of QDRII

SRAM device. The reset signal used is synchronous with respect to QDRII SRAM reference clock.

The ‘dly_cal_done’ signal indicates when calibration of the QDRII SRAM device is complete and the device is ready for access.

The logic uses a FIFO interface to store the processed PD Words that are to be written into the QDRII SRAM device. The PD Words arrive with a minimum spacing of 200 ns. We simulated this requirement by generating a ‘wr_pulse’ signal every 200 ns. We employed a counter to generate the 128-bit PD Word and the hardware address, which are written into five FIFOs: one for the address and four for the data (32 bits each).
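A stimulus of this kind can be sketched as a small self-contained test bench. This is an illustrative sketch and not the project's own test code from Appendix A: the entity name `wr_pulse_tb`, the signal name `pd_word`, and the 10 ns pulse width are assumptions; only the 200 ns spacing and the 128-bit counter-generated word come from the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical stimulus generator: one write pulse every 200 ns,
-- with a counter standing in for the 128-bit PD Word.
entity wr_pulse_tb is
end entity wr_pulse_tb;

architecture sim of wr_pulse_tb is
  constant WR_INTERVAL : time := 200 ns;  -- PD Word spacing (from the text)
  constant PULSE_WIDTH : time := 10 ns;   -- assumed width of the write pulse

  signal wr_pulse : std_logic := '0';
  signal pd_word  : unsigned(127 downto 0) := (others => '0');
begin
  stimulus : process
  begin
    wr_pulse <= '0';
    wait for WR_INTERVAL - PULSE_WIDTH;
    pd_word  <= pd_word + 1;              -- next test pattern
    wr_pulse <= '1';
    wait for PULSE_WIDTH;
    -- The process restarts here, so rising edges are 200 ns apart.
  end process stimulus;
end architecture sim;
```

The process never terminates on its own, so the simulator has to be run for a fixed time, for example a few microseconds to observe several write pulses.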

The QDRII SRAM operates at a clock rate of 166 MHz, so we use a user_clk of 166 MHz. The write state machine remains idle until the condition ‘dly_cal_done = 1’ occurs. Once data is written into the FIFOs, the ‘hw_fifo_empty’ signal goes low, signifying that data is present in the PPD FIFO interface. At the next rising edge of the user clock, the state machine moves into the ‘wrfifo_rd’ state and the hardware read signal, ‘hw_rd’, goes high. The data and address are now read from the FIFOs into ‘qdr_wrdata’ and ‘hw_addr_out’ respectively. ‘qdr_wrdata’, the data output of the FIFOs, is a 128-bit data line. The state machine next moves into the ‘lt_pdw_0_1’ state and subsequently into the ‘lt_pdw_2_3’ state. ‘user_w_n_i’ is an active-low signal used to latch the 128-bit PD Word. ‘lt_pdw’ is a 2-bit vector which is ‘01’ for the lower 64 bits of data and ‘10’ for the higher 64 bits. ‘user_dwl’ and ‘user_dwh’ are two 32-bit data lines. Of each 64 bits of data, the lower 32 bits are latched to ‘user_dwl’ and the higher 32 bits to ‘user_dwh’. ‘test_w_n_i’ is an active-low signal used to inhibit generation of the active-low ‘user_r_n_i’ signal for a read operation at the same time as ‘user_w_n_i’ is generated.

Fig. 6.1: Write Cycle Simulation Results (1)

Fig. 6.2: Write Cycle Simulation Results (2)

6.1.2 Read Cycle

The read cycle is initiated over the Processor Local Bus (PLB). This bus is a 32-bit data bus. A read signal is generated every time a read operation is initiated by the embedded PowerPC processor of the Virtex-II Pro FPGA. These read requests are simulated using VHDL and implemented in the Virtex-4 FPGA.

The read requests are generated every 2 microseconds. The ‘proc_rd’ signal goes high along with the address ‘proc_addr’, indicating a PLB request to read data from the QDR-II SRAM. Once the user interface receives the address from the PLB, it starts reading the data from the specified location onto its bus (user interface). Once the data is present on the user interface bus, it is latched onto the PLB data bus after the ‘fifo_empty’ signal goes low. An acknowledgement signal is then generated to indicate that the data has been latched onto the PLB. Since the PLB is a 32-bit data bus, unlike in the write cycle, only one word at a time is latched onto it. As the PD Word (PDW) consists of four 32-bit words, it takes four read cycles to read it. When the ‘user_qr_empty’ signal goes low, the first two words (W0 and W1) are already present on ‘user_qrl’ and ‘user_qrh’ respectively. This condition is known as ‘first word fall through’. So, the words W0 and W1 are latched onto the 128-bit ‘qdr_rddata’ bus when the ‘user_qr_empty’ signal goes low. In the next clock cycle, W2 and W3 are latched onto the ‘qdr_rddata’ bus, and W0 is latched onto the PLB data bus. Words W1, W2 and W3 are then latched from the ‘qdr_rddata’ bus onto the PLB data bus in the subsequent read request cycles. An acknowledgement signal, ‘rd_ack’, is generated by the read state machine every time data is latched onto the PLB.

Fig. 6.3: Read Cycle Simulation Results (1)

Fig. 6.4: Read Cycle Simulation Results (2)

6.2 Hardware Verification using Chip Scope Pro

We use ChipScope Pro to test the interfacing logic on the hardware, i.e., the Virtex-4 FPGA, by analyzing the user interface signals, which include the PPD interface and the PLB interface. These signals are captured using the FIFOs implemented in the FPGA and sent to the display interface on the PC over JTAG. The PPD interface includes the signals hw_wr, hw_data and hw_addr. The PLB interface includes the signals proc_rd, proc_data, proc_addr and rd_ack. These signals are debugged using the ChipScope Pro Core Inserter by creating a definition and connection file (.cdc) for the synthesized VHDL code.

The in-circuit verification of these signals is done using the ChipScope Pro Analyzer (refer to Fig. 6.5). The main window area can display multiple child windows (such as trigger, waveform, listing, and plot windows) at the same time. Each window can be maximized, minimized, resized, and moved as needed. The signals attached to the ChipScope Pro cores are shown in the signal browser. The trigger setup window is used to specify the conditions for triggering and storing the data. The waveform window displays all the signals, which are sampled with respect to the system clock of the FPGA. This window is useful for analyzing the timing of the interface signals. Another window is the listing window (refer to Fig. 6.6), which displays the interface buses that are stored using the storage qualification in the trigger setup.

6.2.1 Write Interface

To capture the data in the listing window for the PPD interface, we added the signals hw_data1, hw_data2, hw_data3 and hw_data4. The storage condition used is the falling edge of hw_wr. The trigger condition is an initial value of hw_data, which is common to both the PPD and PLB interfaces. Using these conditions, we captured the data and exported it to an Excel sheet for future reference.

6.2.2 Read Interface

To capture the data in the listing window for the PLB interface, we added the signals proc_data and proc_addr. The storage condition used is the falling edge of rd_ack. The trigger is an initial value of hw_data, which is common to both the PPD and PLB interfaces. Using these conditions, we captured the data and exported it to an Excel sheet for future reference.

From the two listings, we infer that the data written into the QDR-II memory and the data read from the QDR-II memory match. Hence, the interface we designed fulfills the requirement of the project.

Fig. 6.5: ChipScope Pro verification waveform

Fig. 6.6: ChipScope Pro verification listing of write cycle

Fig. 6.7: ChipScope Pro verification listing of read cycle

6.3 Conclusion

The VHDL code for the interface that controls SRAM memory access using the Virtex-4 FPGA has been written. It has been verified using ModelSim simulation waveforms and ChipScope Pro in-circuit verification in Xilinx™ ISE 10.1. The results have been studied and verified. This interfacing logic enables us to access the SRAM at high speed and supports writing a continuous input data stream at a rate of 640 Mbps (one 128-bit PD Word every 200 ns). This interface logic can be utilized for interfacing QDR memory devices of upcoming generations with improved technology. The interface also enables us to attach to the PLB interface of the embedded PowerPC processor of Virtex family FPGAs with ease.

6.4 Future Scope of Work

The future scope of work for this project includes the development of the read and write interface between the QDR-II memory controller and the QDR-II memory. Future projects involve the implementation of an ESM processor that integrates the PPD logic and the EP software on a single chip. This helps in the system-on-chip implementation of the ESM processor subsystem using a single FPGA (Virtex-5 and above).

APPENDIX-A: Program Code

1.1 Software interface code in VHDL

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
use work.QDR2_SRAM_parameters_0.all;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
library UNISIM;
use UNISIM.VComponents.all;

entity qdr_dpif is
  port(
    user_clk0     : in  std_logic;
    user_reset    : in  std_logic;
    dly_cal_done  : in  std_logic;

    --PPD IF--------------------------
    clk_100       : in  std_logic;
    hw_wr         : in  std_logic;
    hw_data       : in  std_logic_vector( 127 downto 0 );
    hw_addr       : in  std_logic_vector( 20 downto 0 );
    al_full       : out std_logic;

    proc_rd       : in  std_logic;
    proc_addr     : in  std_logic_vector( 22 downto 0 );
    proc_data     : out std_logic_vector( 31 downto 0 );
    rd_ack        : out std_logic;

    user_w_n      : out std_logic;
    user_r_n      : out std_logic;
    user_ad_wr    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_bwl_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_bwh_n    : out std_logic_vector((BW_WIDTH-1) downto 0);
    user_dwl      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_dwh      : out std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_ad_rd    : out std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
    user_qen_n    : out std_logic;
    compare_error : out std_logic;
    user_wr_full  : in  std_logic;
    user_rd_full  : in  std_logic;
    user_qrl      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qrh      : in  std_logic_vector((CNTRL_DATA_WIDTH-1) downto 0);
    user_qr_empty : in  std_logic
  );
end qdr_dpif;

architecture Behavioral of qdr_dpif is

  component synchro
    port(
      reset   : in  std_logic;
      clock   : in  std_logic;
      sig_in  : in  std_logic;
      sig_out : out std_logic
    );
  end component;

  signal reset_r : std_logic;

  constant unused : std_logic_vector(BW_WIDTH-1 downto 0) := (others => '0');

  -- PPD HWDATA & HWADDR FIFO SIGNALS
  signal qdr_wrdata    : std_logic_vector( 127 downto 0 );
  signal data_al_full  : std_logic_vector(3 downto 0);
  signal addr_al_full  : std_logic;
  signal hw_data_empty : std_logic_vector(3 downto 0);
  signal hw_addr_empty : std_logic;
  signal hw_fifo_empty : std_logic;

  signal hw_addr_in  : std_logic_vector( 31 downto 0 );
  signal hw_addr_out : std_logic_vector( 31 downto 0 );

  TYPE write_state_type is (
    INIT_WR,
    IDLE_WR,
    WRFIFO_RD,
    LT_PDW_0_1,
    LT_PDW_2_3
  );

  signal write_cs : write_state_type;
  signal write_ns : write_state_type;

  signal hw_rd      : std_logic;
  signal test_w_n_i : std_logic;
  signal user_w_n_i : std_logic;
  signal lt_hwdata  : std_logic_vector(1 downto 0);

  TYPE read_state_type is (
    INIT_RD,
    IDLE_RD,
    LATCH_RD_ADDR,
    RDADDR_WR,
    WAIT_Q_EMPTY,
    LT_EPW_0_1,
    LT_EPW_2_3_W0,
    ACK_W0_GEN,
    LT_W1,
    ACK_W1_GEN,
    LT_W2,
    ACK_W2_GEN,
    LT_W3,
    ACK_W3_GEN
  );

  signal read_cs : read_state_type;
  signal read_ns : read_state_type;

  signal lt_rd_ad     : std_logic;
  signal user_r_n_i   : std_logic;
  signal user_qen_n_i : std_logic;

  signal lt_q_0_1 : std_logic;
  signal lt_q_2_3 : std_logic;
  signal lt_word  : std_logic_vector( 3 downto 0 );

  signal proc_rd_sync   : std_logic;
  signal proc_addr_sync : std_logic_vector( 22 downto 0 );

  signal proc_data_i : std_logic_vector( 31 downto 0 );
  signal rd_ack_i    : std_logic;

  signal qdr_rddata : std_logic_vector(127 downto 0 );

  signal byte_enb     : std_logic_vector(7 downto 0);
  signal user_ad_rd_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);
  signal user_ad_wr_i : std_logic_vector((ADDR_WIDTH_4D-1) downto 0);

begin

  compare_error <= '0';

  user_w_n   <= user_w_n_i;
  user_r_n   <= user_r_n_i;
  user_qen_n <= user_qen_n_i;

  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      proc_data <= proc_data_i;
      rd_ack    <= rd_ack_i;
    end if;
  end process;

  byte_enb   <= "00000000";
  user_bwl_n <= byte_enb((BW_WIDTH-1) downto 0);
  user_bwh_n <= byte_enb((BW_WIDTH-1) downto 0);

  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      reset_r <= user_reset;
    end if;
  end process;

  --------------WR_SM--------------

  -- Write state register
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        write_cs <= INIT_WR;
      else
        write_cs <= write_ns;
      end if;
    end if;
  end process;

  -- Write next-state logic
  process (write_cs, dly_cal_done, user_wr_full, hw_fifo_empty)
  begin
    write_ns <= write_cs;

    case write_cs is
      when INIT_WR =>
        if (dly_cal_done = '1') then
          write_ns <= IDLE_WR;
        end if;

      when IDLE_WR =>
        if (user_wr_full = '0' and hw_fifo_empty = '0') then
          write_ns <= WRFIFO_RD;
        end if;

      when WRFIFO_RD =>
        write_ns <= LT_PDW_0_1;

      when LT_PDW_0_1 =>
        write_ns <= LT_PDW_2_3;

      when LT_PDW_2_3 =>
        write_ns <= IDLE_WR;

      when others =>
        write_ns <= INIT_WR;
    end case;
  end process;

  with write_cs select
    hw_rd <= '1' when WRFIFO_RD,
             '0' when others;

  with write_cs select
    test_w_n_i <= '0' when LT_PDW_0_1,
                  '1' when others;

  with write_cs select
    user_w_n_i <= '0' when LT_PDW_2_3,
                  '1' when others;

  with write_cs select
    lt_hwdata <= "01" when LT_PDW_0_1,
                 "10" when LT_PDW_2_3,
                 "00" when others;

  -- Latch the 128-bit PD Word onto the write data lines, 64 bits at a time
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        user_dwl <= (others => '0');
        user_dwh <= (others => '0');
      else
        case lt_hwdata is
          when "01" =>
            user_dwl <= X"0" & qdr_wrdata( 31 downto 0 );
            user_dwh <= X"0" & qdr_wrdata( 63 downto 32 );
          when "10" =>
            user_dwl <= X"0" & qdr_wrdata( 95 downto 64 );
            user_dwh <= X"0" & qdr_wrdata( 127 downto 96 );
          when others =>
            null;
        end case;
      end if;
    end if;
  end process;

  --------------RD_SM--------------

  PROC_RD_SYNC_INST : synchro
    port map(
      reset   => reset_r,
      clock   => user_clk0,
      sig_in  => proc_rd,
      sig_out => proc_rd_sync
    );

  PROC_ADDR_SYNC_GEN : for i in 22 downto 0 generate
    PROC_ADDR_SYNC_INST : synchro
      port map(
        reset   => reset_r,
        clock   => user_clk0,
        sig_in  => proc_addr(i),
        sig_out => proc_addr_sync(i)
      );
  end generate PROC_ADDR_SYNC_GEN;

  -- Read state register
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        read_cs <= INIT_RD;
      else
        read_cs <= read_ns;
      end if;
    end if;
  end process;

  -- Read next-state logic
  process (read_cs, dly_cal_done, proc_rd_sync, proc_addr_sync, test_w_n_i, user_rd_full, user_qr_empty)
  begin
    read_ns <= read_cs;

    case read_cs is
      when INIT_RD =>
        if (dly_cal_done = '1') then
          read_ns <= IDLE_RD;
        end if;

      when IDLE_RD =>
        if proc_rd_sync = '1' then
          case proc_addr_sync(1 downto 0) is
            when "00"   => read_ns <= LATCH_RD_ADDR;
            when "01"   => read_ns <= LT_W1;
            when "10"   => read_ns <= LT_W2;
            when "11"   => read_ns <= LT_W3;
            when others => null;
          end case;
        end if;

      when LATCH_RD_ADDR =>
        if test_w_n_i = '1' and user_rd_full = '0' then
          read_ns <= RDADDR_WR;
        end if;

      when RDADDR_WR =>
        read_ns <= WAIT_Q_EMPTY;

      when WAIT_Q_EMPTY =>
        if (user_qr_empty = '0') then
          read_ns <= LT_EPW_0_1;
        end if;

      when LT_EPW_0_1 =>
        read_ns <= LT_EPW_2_3_W0;

      when LT_EPW_2_3_W0 =>
        read_ns <= ACK_W0_GEN;

      when ACK_W0_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W1 =>
        read_ns <= ACK_W1_GEN;

      when ACK_W1_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W2 =>
        read_ns <= ACK_W2_GEN;

      when ACK_W2_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when LT_W3 =>
        read_ns <= ACK_W3_GEN;

      when ACK_W3_GEN =>
        if proc_rd_sync = '0' then
          read_ns <= IDLE_RD;
        end if;

      when others =>
        read_ns <= INIT_RD;
    end case;
  end process;


  with read_cs select
    lt_rd_ad <= '1' when LATCH_RD_ADDR,
                '0' when others;

  with read_cs select
    user_r_n_i <= '0' when RDADDR_WR,
                  '1' when others;

  with read_cs select
    user_qen_n_i <= '0' when LT_EPW_0_1 | LT_EPW_2_3_W0,
                    '1' when others;

  with read_cs select
    lt_q_0_1 <= '1' when LT_EPW_0_1,
                '0' when others;

  with read_cs select
    lt_q_2_3 <= '1' when LT_EPW_2_3_W0,
                '0' when others;

  with read_cs select
    lt_word <= "0001" when LT_EPW_2_3_W0,
               "0010" when LT_W1,
               "0100" when LT_W2,
               "1000" when LT_W3,
               "0000" when others;

  with read_cs select
    rd_ack_i <= '1' when ACK_W0_GEN | ACK_W1_GEN | ACK_W2_GEN | ACK_W3_GEN,
                '0' when others;

  -- Latch the 128-bit word read from the memory, 64 bits at a time
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        qdr_rddata <= (others => '0');
      else
        if lt_q_0_1 = '1' then
          qdr_rddata( 63 downto 0) <= user_qrh(31 downto 0) & user_qrl(31 downto 0);
        end if;
        if lt_q_2_3 = '1' then
          qdr_rddata(127 downto 64) <= user_qrh(31 downto 0) & user_qrl(31 downto 0);
        end if;
      end if;
    end if;
  end process;

  -- Select the 32-bit word presented on the PLB data bus
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        proc_data_i <= (others => '0');
      else
        case lt_word is
          when "0001" => proc_data_i <= qdr_rddata( 31 downto 0);
          when "0010" => proc_data_i <= qdr_rddata( 63 downto 32);
          when "0100" => proc_data_i <= qdr_rddata( 95 downto 64);
          when "1000" => proc_data_i <= qdr_rddata(127 downto 96);
          when others => null;
        end case;
      end if;
    end if;
  end process;


  --------------ADDR_GEN0--------------

  user_ad_rd <= user_ad_rd_i;
  user_ad_wr <= user_ad_wr_i;

  -- Write address register
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        user_ad_wr_i <= (others => '0');
      elsif (test_w_n_i = '0') then
        user_ad_wr_i <= hw_addr_out((ADDR_WIDTH_4D-1) downto 0);
      end if;
    end if;
  end process;

  -- Read address register
  process (user_clk0)
  begin
    if (user_clk0'event and user_clk0 = '1') then
      if (reset_r = '1') then
        user_ad_rd_i <= (others => '0');
      elsif lt_rd_ad = '1' then
        user_ad_rd_i <= proc_addr_sync((ADDR_WIDTH_4D+1) downto 2);
      end if;
    end if;
  end process;

  --------------PPD_FIFO_IF--------------

  PPD_DATA : for I in 3 downto 0 generate
  begin

DATA_FIFO : FIFO16generic map

(FIRST_WORD_FALL_THROUGH => false,ALMOST_FULL_OFFSET => X"00F",DATA_WIDTH => 36

)port map (

DI => hw_data(I*32+31 downto I*32),DIP => byte_enb(3 downto 0),RDCLK => user_clk0,RDEN => hw_rd,RST => reset_r,WRCLK => clk_100,WREN => hw_wr,ALMOSTEMPTY => open,ALMOSTFULL => data_al_full(I),DO => qdr_wrdata(I*32+31 downto I*32),DOP => open,EMPTY => hw_data_empty(I),FULL => open,RDCOUNT => open,RDERR => open,WRCOUNT => open,WRERR => open

of 82

);end generate PPD_DATA;

hw_addr_in(20 downto 0)  <= hw_addr;
hw_addr_in(31 downto 21) <= (others => '0');

PPD_ADDR_FIFO : FIFO16
    generic map (
        FIRST_WORD_FALL_THROUGH => false,
        ALMOST_FULL_OFFSET      => X"00F",
        DATA_WIDTH              => 36
    )
    port map (
        DI          => hw_addr_in,
        DIP         => byte_enb(3 downto 0),
        RDCLK       => user_clk0,
        RDEN        => hw_rd,
        RST         => reset_r,
        WRCLK       => clk_100,
        WREN        => hw_wr,
        ALMOSTEMPTY => open,
        ALMOSTFULL  => addr_al_full,
        DO          => hw_addr_out,
        DOP         => open,
        EMPTY       => hw_addr_empty,
        FULL        => open,
        RDCOUNT     => open,
        RDERR       => open,
        WRCOUNT     => open,
        WRERR       => open
    );

al_full <= data_al_full(0) or data_al_full(1) or data_al_full(2) or data_al_full(3) or addr_al_full;

hw_fifo_empty <= hw_data_empty(0) or hw_data_empty(1) or hw_data_empty(2) or hw_data_empty(3) or hw_addr_empty;

end Behavioral;

1.2 VHDL code to generate input for testing

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity hwdata_sim is
    Port (
        clk_100      : in  std_logic;
        reset        : in  std_logic;
        dly_cal_done : in  std_logic;

        --debug
        hw_test_led  : out std_logic;

        hw_wr        : out std_logic;
        hw_data      : out std_logic_vector(127 downto 0);
        hw_addr      : out std_logic_vector(20 downto 0);
        b0_full      : in  std_logic
    );
end hwdata_sim;

architecture Behavioral of hwdata_sim is

signal reset_r        : std_logic;
signal dly_cal_done_r : std_logic;

constant INIT     : std_logic_vector(5 downto 0) := "000001";
constant IDLE     : std_logic_vector(5 downto 0) := "000010";
constant WR_GEN   : std_logic_vector(5 downto 0) := "000100";
constant DUMMY_ST : std_logic_vector(5 downto 0) := "010000";
constant INC_ADDR : std_logic_vector(5 downto 0) := "100000";

signal current_state : std_logic_vector(5 downto 0);
signal next_state    : std_logic_vector(5 downto 0);

signal counter  : std_logic_vector(29 downto 0);
signal wr_count : std_logic_vector(7 downto 0);
signal wr_pulse : std_logic;

signal hw_data1 : std_logic_vector(31 downto 0);
signal hw_data2 : std_logic_vector(31 downto 0);
signal hw_data3 : std_logic_vector(31 downto 0);
signal hw_data4 : std_logic_vector(31 downto 0);

signal hw_data_i : std_logic_vector(127 downto 0);
signal hw_addr_i : std_logic_vector(20 downto 0);
signal hw_wr_i   : std_logic;

--debug
signal counter_dbg : std_logic_vector(29 downto 0);
signal hw_wr_dbg   : std_logic;
signal del_1       : std_logic;

constant zeroes_23 : std_logic_vector(22 downto 0) := (others => '0');
constant zeroes_30 : std_logic_vector(29 downto 0) := (others => '0');
constant zeroes_32 : std_logic_vector(31 downto 0) := (others => '0');

begin

hw_data <= hw_data_i;
hw_addr <= hw_addr_i(20 downto 0);
hw_wr   <= hw_wr_i;

-- Register the reset and calibration-done inputs.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        reset_r        <= reset;
        dly_cal_done_r <= dly_cal_done;
    end if;
end process;

-- Free-running counter: wr_pulse is asserted once every 20 clock cycles.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if reset_r = '1' or dly_cal_done_r = '0' or wr_pulse = '1' then
            wr_count <= (others => '0');
        else
            wr_count <= wr_count + '1';
        end if;
    end if;
end process;

wr_pulse <= '1' when wr_count = X"13" else
            '0';

-- State register.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1') then
            current_state <= INIT;
        else
            current_state <= next_state;
        end if;
    end if;
end process;

-- Next-state logic.
process (current_state, dly_cal_done, b0_full, wr_pulse)
begin
    next_state <= current_state;

    case current_state is
        when INIT =>
            if (dly_cal_done = '1') then
                next_state <= IDLE;
            end if;

        when IDLE =>
            if wr_pulse = '1' and b0_full = '0' then
                next_state <= WR_GEN;
            end if;

        when WR_GEN =>
            next_state <= DUMMY_ST;

        when DUMMY_ST =>
            next_state <= INC_ADDR;

        when INC_ADDR =>
            next_state <= IDLE;

        when others =>
            next_state <= INIT;
    end case;
end process;

-- The write strobe is high only in the WR_GEN state (bit 2 of the one-hot code).
hw_wr_i <= current_state(2);

-- The counter advances once per write, in the INC_ADDR state (bit 5).
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1') then
            counter <= (others => '0');
        elsif (current_state(5) = '1') then
            counter <= counter + '1';
        end if;
    end if;
end process;

hw_addr_i <= counter(20 downto 0);
--hw_addr_i <= counter(2 downto 0) & counter( 21 downto 3 );

-- Each 32-bit word is the counter value followed by a 2-bit word index.
hw_data1 <= counter & "00";
hw_data2 <= counter & "01";
hw_data3 <= counter & "10";
hw_data4 <= counter & "11";

hw_data_i <= hw_data4 & hw_data3 & hw_data2 & hw_data1;

--debug-----------
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        counter_dbg <= counter;
        hw_wr_dbg   <= hw_wr_i;
    end if;
end process;

del_1 <= '1' when counter_dbg = zeroes_30 else
         '0';

hw_test_led <= del_1 or hw_wr_dbg;

end Behavioral;
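
To exercise the generator above in simulation, a small testbench along the following lines could be used. This is only a sketch: the entity name tb_hwdata_sim, the 10 ns clock period (assumed from the clk_100 name), the reset and calibration timing, and the sim_done flag are all assumptions that do not appear in the original design. It also assumes hwdata_sim has been compiled into the work library.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical testbench; not part of the original design files.
entity tb_hwdata_sim is
end tb_hwdata_sim;

architecture sim of tb_hwdata_sim is
    constant CLK_PERIOD : time := 10 ns;  -- assumption: clk_100 runs at 100 MHz

    signal clk_100      : std_logic := '0';
    signal reset        : std_logic := '1';
    signal dly_cal_done : std_logic := '0';
    signal b0_full      : std_logic := '0';  -- FIFO is never reported full here
    signal hw_test_led  : std_logic;
    signal hw_wr        : std_logic;
    signal hw_data      : std_logic_vector(127 downto 0);
    signal hw_addr      : std_logic_vector(20 downto 0);
    signal sim_done     : boolean := false;
begin
    -- Clock generator; it stops toggling once sim_done is set.
    clk_100 <= not clk_100 after CLK_PERIOD / 2 when not sim_done else '0';

    DUT : entity work.hwdata_sim
        port map (
            clk_100      => clk_100,
            reset        => reset,
            dly_cal_done => dly_cal_done,
            hw_test_led  => hw_test_led,
            hw_wr        => hw_wr,
            hw_data      => hw_data,
            hw_addr      => hw_addr,
            b0_full      => b0_full
        );

    -- Release reset, then signal that delay calibration has finished.
    stimulus : process
    begin
        wait for 5 * CLK_PERIOD;
        reset <= '0';
        wait for 5 * CLK_PERIOD;
        dly_cal_done <= '1';
        wait for 200 * CLK_PERIOD;
        sim_done <= true;
        wait;
    end process;

    -- On every write strobe, word 0 is counter & "00", so bits 22..2 of the
    -- data bus must equal the 21-bit write address.
    checker : process (clk_100)
    begin
        if rising_edge(clk_100) then
            if hw_wr = '1' then
                assert hw_data(22 downto 2) = hw_addr
                    report "word 0 does not carry the write address"
                    severity error;
            end if;
        end if;
    end process;
end sim;
```

With b0_full held low, the generator should issue one write roughly every 20 clock cycles once dly_cal_done is asserted, so the 200-cycle window covers several writes.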

1.3 VHDL code to receive and view Output

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

---- Uncomment the following library declaration if instantiating
---- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

entity procdata_sim is
    Port (
        clk_100      : in  std_logic;
        reset        : in  std_logic;
        dly_cal_done : in  std_logic;

        proc_rd      : out std_logic;
        proc_addr    : out std_logic_vector(22 downto 0);
        proc_data    : in  std_logic_vector(31 downto 0);

        test_done    : out std_logic;
        rd_ack       : in  std_logic
    );
end procdata_sim;

architecture Behavioral of procdata_sim is

component synchro
    port (
        reset   : in  std_logic;
        clock   : in  std_logic;
        sig_in  : in  std_logic;
        sig_out : out std_logic
    );
end component;

signal reset_r        : std_logic;
signal dly_cal_done_r : std_logic;

signal rd_count : std_logic_vector(12 downto 0);
signal rd_pulse : std_logic;

constant INIT       : std_logic_vector(7 downto 0) := "00000001";
constant WRP_WAIT   : std_logic_vector(7 downto 0) := "00000010";
constant IDLE       : std_logic_vector(7 downto 0) := "00000100";
constant RD_GEN     : std_logic_vector(7 downto 0) := "00001000";
constant WAIT_RDACK : std_logic_vector(7 downto 0) := "00010000";
constant CNT_CHK    : std_logic_vector(7 downto 0) := "00100000";
constant INC_ADDR   : std_logic_vector(7 downto 0) := "01000000";
constant RST_BRST   : std_logic_vector(7 downto 0) := "10000000";

signal current_state : std_logic_vector(7 downto 0);
signal next_state    : std_logic_vector(7 downto 0);

signal word_count : std_logic_vector(4 downto 0);

signal counter     : std_logic_vector(22 downto 0);
signal proc_addr_i : std_logic_vector(22 downto 0);
signal proc_rd_i   : std_logic;

signal proc_data_sync : std_logic_vector(31 downto 0);
signal rd_ack_sync    : std_logic;

signal proc_data_i : std_logic_vector(31 downto 0);

TYPE ackgen_state_type is (
    IDLE_ACK,
    WAIT_FOR_ACK,
    ACK_GEN
);

signal ackgen_cs : ackgen_state_type;
signal ackgen_ns : ackgen_state_type;

signal qdr_mem_ack : std_logic;

--debug
signal proc_rd_dbg     : std_logic;
signal proc_addr_dbg   : std_logic_vector(22 downto 0);
signal rd_ack_sync_dbg : std_logic;
signal proc_data_dbg   : std_logic_vector(31 downto 0);

signal del_1 : std_logic;
signal del_2 : std_logic;

constant zeroes_23 : std_logic_vector(22 downto 0) := (others => '0');
constant zeroes_32 : std_logic_vector(31 downto 0) := (others => '0');

begin

proc_addr <= proc_addr_i(22 downto 0);
proc_rd   <= proc_rd_i;

process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        reset_r        <= reset;
        dly_cal_done_r <= dly_cal_done;
    end if;
end process;

RD_ACK_SYNC_INST : synchro
    port map (
        reset   => reset_r,
        clock   => clk_100,
        sig_in  => rd_ack,
        sig_out => rd_ack_sync
    );

process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1' or dly_cal_done_r = '0') then
            ackgen_cs <= IDLE_ACK;
        else
            ackgen_cs <= ackgen_ns;
        end if;
    end if;
end process;

-- Acknowledge handshake: wait for rd_ack to rise after a read request,
-- then hold qdr_mem_ack until the read request is removed.
process (ackgen_cs, proc_rd_i, rd_ack_sync)
begin
    ackgen_ns <= ackgen_cs;

    case ackgen_cs is
        when IDLE_ACK =>
            if proc_rd_i = '1' AND rd_ack_sync = '0' then
                ackgen_ns <= WAIT_FOR_ACK;
            end if;

        when WAIT_FOR_ACK =>
            if rd_ack_sync = '1' then
                ackgen_ns <= ACK_GEN;
            end if;

        when ACK_GEN =>
            if proc_rd_i = '0' then
                ackgen_ns <= IDLE_ACK;
            end if;

        when others =>
            ackgen_ns <= IDLE_ACK;
    end case;
end process;

with ackgen_cs select
    qdr_mem_ack <= '1' when ACK_GEN,
                   '0' when others;

PROC_DATA_SYNC_GEN : for i in 31 downto 0 generate
    PROC_DATA_SYNC_INST : synchro
        port map (
            reset   => reset_r,
            clock   => clk_100,
            sig_in  => proc_data(i),
            sig_out => proc_data_sync(i)
        );
end generate PROC_DATA_SYNC_GEN;

-- Free-running counter: rd_pulse is asserted once every 334 clock cycles.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if reset_r = '1' or dly_cal_done_r = '0' or rd_pulse = '1' then
            rd_count <= (others => '0');
        else
            rd_count <= rd_count + '1';
        end if;
    end if;
end process;

rd_pulse <= '1' when rd_count = X"14D" else
            '0';

-- State register.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1') then
            current_state <= INIT;
        else
            current_state <= next_state;
        end if;
    end if;
end process;

-- Next-state logic: each rd_pulse starts a burst of single-word reads that
-- ends when word_count(4) is set.
process (current_state, dly_cal_done_r, rd_pulse, qdr_mem_ack, word_count)
begin
    next_state <= current_state;

    case current_state is
        when INIT =>
            if (dly_cal_done_r = '1') then
                next_state <= IDLE;
            end if;

        when IDLE =>
            if rd_pulse = '1' then
                next_state <= RD_GEN;
            end if;

        when RD_GEN =>
            next_state <= WAIT_RDACK;

        when WAIT_RDACK =>
            if qdr_mem_ack = '1' then
                next_state <= CNT_CHK;
            end if;

        when CNT_CHK =>
            if word_count(4) = '1' then
                next_state <= RST_BRST;
            else
                next_state <= INC_ADDR;
            end if;

        when INC_ADDR =>
            next_state <= RD_GEN;

        when RST_BRST =>
            next_state <= IDLE;

        when others =>
            next_state <= INIT;
    end case;
end process;

-- The read request is held high in the RD_GEN and WAIT_RDACK states.
proc_rd_i <= current_state(3) or current_state(4);

-- word_count restarts at 1 after each burst (RST_BRST) and advances in INC_ADDR.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1' or current_state(7) = '1') then
            word_count <= "00001";
        elsif (current_state(6) = '1') then
            word_count <= word_count + '1';
        end if;
    end if;
end process;

-- The read address advances after every word, including the last of a burst.
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if (reset_r = '1') then
            counter <= (others => '0');
        elsif (current_state(6) = '1' or current_state(7) = '1') then
            counter <= counter + '1';
        end if;
    end if;
end process;

proc_addr_i <= counter(22 downto 0);
--proc_addr_i <= counter(3 downto 2) & counter( 22 downto 4 ) & counter(1 downto 0);

process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        if reset_r = '1' then
            proc_data_i <= (others => '0');
        elsif proc_rd_i = '1' then
            proc_data_i <= proc_data_sync;
        end if;
    end if;
end process;

--debug
process (clk_100)
begin
    if (clk_100'event and clk_100 = '1') then
        proc_rd_dbg     <= proc_rd_i;
        proc_addr_dbg   <= proc_addr_i;
        rd_ack_sync_dbg <= qdr_mem_ack;
        proc_data_dbg   <= proc_data_i;
    end if;
end process;

del_1 <= '1' when proc_addr_dbg = zeroes_23 else
         '0';

del_2 <= '1' when proc_data_dbg = zeroes_32 else
         '0';

test_done <= proc_rd_dbg or proc_rd_dbg or del_1 or rd_ack_sync_dbg or del_2;

end Behavioral;
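
The listing above instantiates a component named synchro, but its body is not part of this listing. A common way to build such a block is a two-flip-flop synchronizer, sketched below. The architecture name, the internal signal names, and the synchronous active-high reset are assumptions; the actual synchro used in the design may differ.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Port list copied from the component declaration in procdata_sim.
entity synchro is
    port (
        reset   : in  std_logic;
        clock   : in  std_logic;
        sig_in  : in  std_logic;
        sig_out : out std_logic
    );
end synchro;

-- Assumed implementation: two flip-flops in series in the destination
-- clock domain, with a synchronous active-high reset.
architecture two_flop of synchro is
    signal meta : std_logic;  -- first stage, may go metastable
    signal sync : std_logic;  -- second stage, safe to use downstream
begin
    process (clock)
    begin
        if rising_edge(clock) then
            if reset = '1' then
                meta <= '0';
                sync <= '0';
            else
                meta <= sig_in;
                sync <= meta;
            end if;
        end if;
    end process;

    sig_out <= sync;
end two_flop;
```

In this sketch the output follows the input with a latency of two clock cycles. Applying one such synchronizer per bit, as PROC_DATA_SYNC_GEN does for proc_data, is only safe when the bus is held stable while it is sampled.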


APPENDIX B: NAMES OF DIFFERENT MODULES AND THEIR PURPOSE IN QDR-II SRAM CONTROLLER

1) Filename: qdr_sram.vhd

Purpose: This is the main module of the controller. It contains the clock forwarding logic, the delay calibration logic that captures data and synchronizes it to the FPGA clock, and the interface controller logic.

2) Filename: mig_23_idelay_ctrl.vhd

Purpose: This module implements the delay generation for the calibration circuit.

3) Filename: qdr_sram_infrastructure_top.vhd

Purpose: This module incorporates the clock generation module and the reset logic.

4) Filename: qdr_sram_main_0.vhd

Purpose: Top-level example design incorporating the QDR-II memory controller module, an example clock generation module, and the reset logic.

5) Filename: qdr_sram_top_0.vhd

Purpose: Top-level module for the QDR-II memory controller design. This is the main module that should be instantiated into a new FPGA design (along with all sub-modules) to implement a QDR-II interface.

6) Filename: qdr_sram_user_interface_0.vhd

Purpose: Responsible for storing the Read/Write requests made by the user design. Instantiates the FIFOs for Read and Write address, data, and control storage.

7) Filename: qdr_sram_rd_user_interface_0.vhd

Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs for Read address, data, and control storage.

8) Filename: qdr_sram_rd_addr_interface_0.vhd

Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs for Read address and control storage.

9) Filename: qdr_sram_rd_data_interface_0.vhd

Purpose: Responsible for storing the Read requests made by the user design. Instantiates the FIFOs for Read data storage.


10) Filename: qdr_sram_data_fifo_mem_0.vhd

Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the FIFOs for Write/Read data storage.

11) Filename: qdr_sram_wr_user_interface_0.vhd

Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs for Write address, data, and control storage.

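12) Filename: qdr_sram_wr_addr_interface_0.vhd

Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs for Write address and control storage.

13) Filename: qdr_sram_wr_data_interface_0.vhd

Purpose: Responsible for storing the Write requests made by the user design. Instantiates the FIFOs for Write data storage.

14) Filename: qdr_sram_data_fifo_18_0.vhd

Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the FIFOs for Write/Read data storage.

15) Filename: qdr_sram_data_bw_fifo_0.vhd

Purpose: Responsible for storing the Write/Read requests made by the user design. Instantiates the FIFOs for Write/Read data storage.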

16) Filename: qdr_sram_qdr_mem_sm_0.vhd

Purpose: Monitors the Read/Write queue status from the User Interface FIFOs and generates strobe signals to launch Read/Write requests to the QDR-II device.

17) Filename: qdr_sram_iobs_0.vhd

Purpose: This module implements the physical interface for the Write data path. It generates the write path (QDR-II) from the Write data FIFOs to the OBUFs.

18) Filename: qdr_sram_clock_forward_0.vhd

Purpose: This module implements the physical interface for the clock path. It generates the forwarded clocks (K and K#) for the QDR-II SRAM memory device. This scheme is used to match the clock-to-out delays of the data path.

19) Filename: qdr_sram_ctrl_iobs_0.vhd

Purpose: This module implements the physical interface for the memory control signals.


20) Filename: qdr_sram_address_burst_0.vhd

Purpose: This module is part of the physical interface. It describes how the flip-flops and OBUFTs need to be instantiated in order to present the address to the external memory.

21) Filename: qdr_sram_qdr_rd_enable.vhd

Purpose: This module generates QDR_R_n (Read Enable) and QDR_W_n (Write Enable) for the QDR memory.

22) Filename: qdr_sram_bw_burst_0.vhd

Purpose: This module implements the physical interface for the Byte Write enable path. It generates the byte write path (BW_n) from the Write address FIFO to the OBUFs.

23) Filename: qdr_sram_data_path_iobs_0.vhd

Purpose: This module implements the physical interface for the Write data and Read data paths.

24) Filename: qdr_sram_qdr_d_iob_0.vhd

Purpose: This module transfers the data from the memory to the FIFOs.

25) Filename: qdr_sram_qdr_cq_iob_0.vhd

Purpose: This module implements the delaying of the echo clock CQ.

26) Filename: qdr_sram_qdr_q_iob_0.vhd

Purpose: This module captures data from the memory.

27) Filename: qdr_sram_data_path_0.vhd

Purpose: This module acts as an interface between the users and the IOBs.

28) Filename: qdr_sram_read_ctrl_0.vhd

Purpose: This module generates QDR_R_n (Read Enable for the QDR memory) and the strobe for the Read FIFO.

29) Filename: qdr_sram_tap_logic_0.vhd

Purpose: This module implements the tap generation for the Read path (QDR_Q).

30) Filename: qdr_sram_dly_cal_sm.vhd

Purpose: Calibrates the IDELAY tap values for the QDR_Q inputs to allow direct capture of the read data into the system clock domain.

31) Filename: qdr_sram_data_tap_inc.vhd

Purpose: This module implements the tap selection controller for the data bits associated with a strobe.


32) Filename: qdr_sram_write_burst_0.vhd

Purpose: This module implements the physical interface for the Write path. It generates the write path (QDR_D) from the Write data FIFOs to the OBUFs.

33) Filename: qdr_sram_test_bench_0.vhd

Purpose: This module implements a hardware test bench that issues interleaved Read and Write requests to the QDR-II memory device.

34) Filename: qdr_sram_wr_rd_sm_0.vhd

Purpose: This module implements a state machine for issuing Read/Write requests to the QDR-II memory device.

35) Filename: qdr_sram_q_sm_0.vhd

Purpose: This module implements a state machine that reads back values from the Read data FIFOs and compares them with the values generated in the test bench. It also serves as an error detection module, checking that the data returned from the memory is the same as the data written to it.

36) Filename: qdr_sram_data_gen_0.vhd

Purpose: This module implements a data generator that generates data for Read and Write requests to the QDR-II memory device.

37) Filename: qdr_sram_addr_gen_0.vhd

Purpose: This module is part of the internal test bench. It generates addresses for both reads and writes.


APPENDIX C: CHIP SCOPE PRO LISTING OF WRITE CYCLE

Sample in Window   hw_addr_23_2   hw_data1   hw_data2   hw_data3   hw_data4

 1                 000000         00000000   00000001   00000002   00000003
 2                 000001         00000004   00000005   00000006   00000007
 3                 000002         00000008   00000009   0000000A   0000000B
 4                 000003         0000000C   0000000D   0000000E   0000000F
 5                 000004         00000010   00000011   00000012   00000013
 6                 000005         00000014   00000015   00000016   00000017
 7                 000006         00000018   00000019   0000001A   0000001B
 8                 000007         0000001C   00000000   0000001E   0000001F
 9                 000008         00000020   00000021   00000022   00000023
10                 000009         00000020   00000025   00000026   00000027
11                 00000A         00000028   00000029   0000002A   0000002B
12                 00000B         0000002C   0000002D   0000002E   0000002F
13                 00000C         00000030   00000031   00000032   00000033
14                 00000D         00000034   00000035   00000036   00000037
15                 00000E         00000038   00000039   0000003A   0000003B
16                 00000F         0000003C   0000003D   0000003E   0000003F
17                 000010         00000040   00000041   00000042   00000043
18                 000011         00000044   00000045   00000046   00000047
19                 000012         00000048   00000049   0000004A   0000004B
20                 000013         0000004C   0000004D   0000004E   0000004F
21                 000014         00000050   00000051   00000052   00000053
22                 000015         00000054   00000055   00000056   00000057
23                 000016         00000058   00000059   0000005A   0000005B
24                 000017         0000005C   0000005D   0000005E   0000005F
25                 000018         00000060   00000061   00000062   00000063
26                 000019         00000064   00000065   00000066   00000067
27                 00001A         00000068   00000069   0000006A   0000006B
28                 00001B         0000006C   0000006D   0000006E   0000006F
29                 00001C         00000070   00000071   00000072   00000073
30                 00001D         00000074   00000075   00000076   00000077
31                 00001E         00000078   00000079   0000007A   0000007B
32                 00001F         0000007C   0000007D   0000007E   0000007F
33                 000020         00000080   00000081   00000082   00000083
34                 000021         00000084   00000085   00000086   00000087
35                 000022         00000088   00000089   0000008A   0000008B
36                 000023         0000008C   0000008D   0000008E   0000008F
37                 000024         00000090   00000091   00000092   00000093
38                 000025         00000094   00000095   00000096   00000097
39                 000026         00000098   00000099   0000009A   0000009B
40                 000027         0000009C   0000009D   0000009E   0000009F
41                 000028         000000A0   000000A1   000000A2   000000A3
42                 000029         000000A4   000000A5   000000A6   000000A7
43                 00002A         000000A8   000000A9   000000AA   000000AB
44                 00002B         000000AC   000000AD   000000AE   000000AF
45                 00002C         000000B0   000000B1   000000B2   000000B3
46                 00002D         000000B4   000000B5   000000B6   000000B7
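
The generator in section 1.2 builds each 32-bit word as counter & "NN", which is the counter value shifted left by two bits plus the word index. The expected contents of a write can therefore be computed from the address alone, as sketched below. The package and function names are hypothetical, and the relation only holds while the 30-bit counter has not yet exceeded the 21-bit address range.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

-- Hypothetical helper package; not part of the original design files.
package hwdata_pattern_pkg is
    -- lane 0 .. 3 corresponds to hw_data1 .. hw_data4.
    function expected_word (addr : natural; lane : natural range 0 to 3)
        return std_logic_vector;
end package hwdata_pattern_pkg;

package body hwdata_pattern_pkg is
    function expected_word (addr : natural; lane : natural range 0 to 3)
        return std_logic_vector is
    begin
        -- counter & "NN" equals counter * 4 + NN; here counter = addr.
        return std_logic_vector(to_unsigned(addr * 4 + lane, 32));
    end function expected_word;
end package body hwdata_pattern_pkg;
```

For example, expected_word(16#2D#, 3) evaluates to X"000000B7", which is the hw_data4 value shown for address 00002D in sample 46.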


APPENDIX D: CHIP SCOPE PRO LISTING OF READ CYCLE

Sample in Window   proc_addr   proc_data

 1                 000000      00000000
 2                 000001      00000001
 3                 000002      00000002
 4                 000003      00000003
 5                 000004      00000004
 6                 000005      00000005
 7                 000006      00000006
 8                 000007      00000007
 9                 000008      00000008
10                 000009      00000009
11                 00000A      0000000A
12                 00000B      0000000B
13                 00000C      0000000C
14                 00000D      0000000D
15                 00000E      0000000E
16                 00000F      0000000F
17                 000010      00000010
18                 000011      00000011
19                 000012      00000012
20                 000013      00000013
21                 000014      00000014
22                 000015      00000015
23                 000016      00000016
24                 000017      00000017
25                 000018      00000018
26                 000019      00000019
27                 00001A      0000001A
28                 00001B      0000001B
29                 00001C      0000001C
30                 00001D      0000001D
31                 00001E      0000001E
32                 00001F      0000001F
33                 000020      00000020
34                 000021      00000021
35                 000022      00000022
36                 000023      00000023
37                 000024      00000024
38                 000025      00000025
39                 000026      00000026
40                 000027      00000027
41                 000028      00000028
42                 000029      00000029
43                 00002A      0000002A
44                 00002B      0000002B
45                 00002C      0000002C
46                 00002D      0000002D


