ÚSTAV AUTOMATIZACE A MĚŘICÍ TECHNIKY Fakulta elektrotechniky a komunikačních technologií
Vysoké učení technické v Brně
Programming of Digital Signal Processors and Data Transmission via the PCI Bus
(Master Thesis)
Martin BARVA August 2002
CONTENTS
CONTENTS
FIGURES
PREFACE
ABOUT CD
HARDWARE AND SOFTWARE
DSP Part
PCI Part
1. INTRODUCTION TO DSP PART
2. DIGITAL SIGNAL PROCESSING PROCESSORS
3. TMS320C6000 DSP PLATFORM
3.1 TMS320C6000 DSP PROCESSOR ARCHITECTURE
3.1.1 Key Features of TMS320C62x/TMS320C67x Device
3.1.2 Central Processing Unit Core
3.1.3 Memory
3.1.4 Peripherals
3.2 TMS320C6701 EVALUATION MODULE
3.2.1 Key Features of TMS320C6701 Evaluation Module
3.2.2 TMS320C6701 Evaluation Module Hardware Functional Overview
3.3 IMPLEMENTATION OF DSP ALGORITHMS
3.3.1 Low-Level Implementation of DSP Algorithms
3.3.2 High Level Implementation of DSP Algorithms
3.3.3 Comparison of Low- and High-Level Implementation Approach
4. MATHEMATICAL BACKGROUND OF IMPLEMENTED DSP ALGORITHMS
4.1 FINITE IMPULSE RESPONSE (FIR) DIGITAL FILTER
4.1.1 Properties of FIR filter
4.1.2 Coefficients Calculation by means of Window Method
4.2 INFINITE IMPULSE RESPONSE (IIR) DIGITAL FILTER
4.2.1 IIR Filter Implementation
4.2.2 Coefficients Calculation using Bilinear Transform Method
4.3 ADAPTIVE FILTERS
4.3.1 Structure of Adaptive Filter
4.3.2 Least Mean Square (LMS) Adaptive Filter
4.4 FAST FOURIER TRANSFORM
4.4.1 Calculation Cost of DFT
4.4.2 Mathematical Background of FFT - DIT Algorithm
4.4.3 Computational Cost of FFT with Decimation in Time
5. IMPLEMENTATION
5.1 CODEC
5.1.1 Loopback Example
5.1.2 InAndOut example
5.1.3 Generator Example
5.2 DSP ALGORITHMS
5.2.1 Examples of Low-Level Implementation of DSP Algorithms
5.2.2 Examples of High-Level Implementation of DSP Algorithms
6. SUMMARY OF DSP PART
7. INTRODUCTION TO PCI PART
8. PERIPHERAL COMPONENT INTERCONNECT (PCI) BUS
8.1 INTRODUCTION TO COMPUTER BUSES
8.1.1 Division of Computer Buses
8.1.2 Computer Buses before PCI
8.2 INTRODUCTION TO PCI BUS
8.3 KEY FEATURES OF PCI BUS
8.4 PCI SIGNALS
8.4.1 System Signals
8.4.2 Address and Data Signals
8.4.3 Interface Control Signals
8.4.4 Arbitration
8.4.5 Error Reporting
8.4.6 Interrupt Signals
8.4.7 64-bit extension
8.4.8 JTAG Signals
8.5 ARBITRATION
8.5.1 BUS Parking
8.6 BUS PROTOCOL
8.6.1 PCI Bus Command
8.6.2 Byte Enable
8.6.3 Basic PCI Transactions
8.6.4 Latency
8.6.5 Error Detection and Reporting
8.6.6 Target-Initiated Termination of Transaction
8.7 ADVANCED FEATURES OF PCI BUS
8.7.1 Interrupt Handling
8.7.2 Special Cycle
8.7.3 64-bit extension
8.8 PLUG AND PLAY CONFIGURATION
8.8.1 PCI Configuration Space
8.8.2 Structure of Configuration Space
8.8.3 PCI BIOS
9. PLX HARDWARE AND SOFTWARE DEVELOPMENT TOOLS
9.1 PCI 9050 BUS TARGET INTERFACE CHIP
9.1.1 PCI 9050 Main Features
9.1.2 PCI Bus Interface of PCI 9050 Bus Interface Chip
9.1.3 Local Bus Interface of PCI 9050 Bus Interface Chip
9.1.4 Single Cycle Write and Read
9.1.5 PCI Configuration Registers and Local Configuration Registers
9.1.6 Serial EEPROM
9.1.7 Local Chip Select
9.2 PCI 9050 REFERENCE DESIGN KIT (RDK)
9.2.1 Main features of PCI 9050 Reference Design Kit
9.2.2 PCI 9050RDK Subsystems
9.3 PCI 9050 SOFTWARE DESIGN KIT (SDK) AND PLXMON
10. DESIGNED PCI DEVICE
10.1 APPLICATION OVERVIEW
10.2 HARDWARE PART OF DEVICE
10.2.1 Latch Circuitry on PCI 9050RDK
10.2.2 Timing Diagrams
10.2.3 Parallel Port Configuration
10.2.4 Application Registers
10.3 SOFTWARE PART OF DEVICE
10.3.1 Device Driver
10.3.2 Example Software Application
11. SUMMARY OF PCI PART
CONCLUSION
BIBLIOGRAPHY
APPENDIX
A1 EXAMPLE OF EXECUTABLE GENERATION
A2 SCHEME OF LATCH CIRCUITRY
FIGURES
Figure 1.1: Digital signal processing chain.
Figure 1.2: Definition of real-time DSP processing.
Figure 3.1: C62x/C67x block diagram.
Figure 3.2: Central processing unit core.
Figure 3.3: Data paths of 'C67x device.
Figure 3.4: Functional diagram of the 'C6701 EVM.
Figure 3.5: Software development flow.
Figure 3.6: Graphical interface of the Code Composer Studio.
Figure 3.7: C6701 EVM Simulink library blocks.
Figure 3.8: Example of Simulink model designed for executable generation.
Figure 3.9: Build process of the executable.
Figure 3.10: High-level object oriented view of the executable.
Figure 4.1: Periodic transfer function of FIR filter.
Figure 4.2: Coefficients of ideal FIR filter.
Figure 4.3: Block diagram of adaptive filter.
Figure 4.4: Two basic DSP operations.
Figure 4.5: DSP flowchart display of equation 4.22.
Figure 4.6: 8-point DFT expressed with four 2-point DFTs.
Figure 4.7: FFT butterfly topology.
Figure 4.8: Complete 8-point FFT.
Figure 5.1: Loopback example.
Figure 5.2: InAndOut example.
Figure 5.3: Generator example.
Figure 5.4: FFT example.
Figure 5.5: FIR graphical user interface.
Figure 5.6: IIR graphical user interface.
Figure 8.1: Computer bus.
Figure 8.2: PCI bus diagram.
Figure 8.3: PCI read transaction.
Figure 8.4: PCI write transaction.
Figure 8.5: Bus latency.
Figure 8.6: Target disconnect.
Figure 8.7: Target abort.
Figure 8.8: Type 0 configuration header.
Figure 8.9: Structure of capabilities list.
Figure 9.1: PCI 9050 bus interface chip.
Figure 9.2: Single local bus write.
Figure 9.3: Single local bus read.
Figure 9.4: PLX PCI 9050RDK block diagram.
Figure 10.1: Block diagram of the application.
Figure 10.2: Scheme of the latch circuitry.
Figure 10.3: Data transfer between PCI bus and PCI9050 write FIFO.
Figure 10.4: Data transfer between PCI 9050 FIFO and latch circuitry.
Figure 10.5: Data transfer between latch circuitry and parallel port.
Figure 10.6: WritePCI program writes data into the PCITOLPT device.
Figure 10.7: ReadLPT program reads data stored in the PCITOLPT device.
Figure A1.1: Simulink model to be converted into executable.
Figure A1.2: Setting of the solver.
Figure A1.3: Setting the Real-Time Workshop parameters.
Figure A1.4: Press the Build & Run button to execute the build process.
Figure A2.1: Octal latch circuitry.
PREFACE
The aim of this two-part project was to develop applications that students
could use to explore the domains of digital signal processing and the Peripheral
Component Interconnect (PCI) bus.
The digital signal processing (DSP) part, chapters 1 to 6, explains different
approaches to implementing DSP algorithms on signal processors.
The second part, chapters 7 to 11, focuses on the Peripheral Component
Interconnect bus and its possible use for data transfer between two computer
systems.
The project was carried out in a laboratory of the Institut National des Sciences
Appliquées de Lyon, France. I would especially like to thank Dr. Philippe
Delachartre for his valuable advice and technical support.
M. Barva
ABOUT CD
The enclosed CD contains the files for the DSP and PCI parts of the project,
organized in the following directory structure:
• \DOC: Electronic version of the final report in the Microsoft Word 97 format.
• \DSP: Subdirectories with the DSP applications described in chapter 5.
• \PCI\BIN: Two Win9x programs, WritePCI and ReadLPT, described in section
10.3.2.
• \PCI\DRIVER: Win9x device drivers for the PCITOLPT device and parallel port.
• \PCI\SRC: Source files of the WritePCI and ReadLPT applications for Microsoft
Visual C++ ver.6.
HARDWARE AND SOFTWARE
This section describes the hardware and software configuration used to
develop and test the designed applications.
DSP Part
• TMS320C6701 EVM: Texas Instruments evaluation module with a
TMS320C6701 signal processor.
• Windows 98: Operating system.
• Code Composer Studio ver.1.0: Integrated development environment for Texas
Instruments DSP processors.
• Matlab ver.6.1, Real-Time Workshop, Developer's Kit for TI DSP and Simulink:
Used for high-level implementation of DSP algorithms.
PCI Part
• PLX PCI 9050RDK with latch circuitry: This development kit was used to build
the PCI-compliant circuitry.
• Windows 98: Operating system.
• Microsoft Visual C++ ver.6: 32-bit programming environment.
• WinDriver: Tool for device driver development.
1. INTRODUCTION TO DSP PART
Digital Signal Processing (DSP) differs from other areas of computer science
in the type of data it uses: signals. Signals can, for example, originate from
sensors providing information about the real world, such as seismic vibrations,
sound waves or images.
The term "digital signal processing" comprises the mathematical background,
the algorithms and the techniques used to manipulate and process input data.
Signals are processed to achieve a wide variety of goals, e.g. compression of
data for storage and transmission, recognition and generation of speech, image
enhancement and extraction of information encoded in the signal.
Traditionally, signal processing was achieved with analogue components such
as resistors, capacitors and inductors. However, component tolerances and
temperature variations can degrade the performance of analogue circuitry.
The main objective of the DSP part of this project is to familiarize the reader
with the fundamentals of implementing DSP algorithms on DSP processors. Because
the examples were implemented on the TMS320C6701 Evaluation Module (EVM), this
report also contains a more detailed description of the evaluation module and
its 'C67x digital signal processor.
The rest of the DSP part of this report consists of five chapters:
Chapter 2 introduces the term “digital signal processor”. It describes how the
architecture of digital signal processors differs from that of general-purpose
processors and gives typical examples of applications where DSP processors are
used.
Chapter 3 focuses on the TMS320C62x/67x DSP processors by Texas Instruments.
It also describes the TMS320C6701 EVM and, finally, presents different
approaches to implementing DSP algorithms.
Chapter 4 provides the mathematical background of the DSP algorithms
implemented within the framework of this project.
Chapter 5 describes the applications that were created.
Chapter 6 contains a summary of the DSP part.
2. DIGITAL SIGNAL PROCESSING PROCESSORS
In most cases, digital signal processing applications are implemented as
algorithms that run on special processors called digital signal processing (DSP)
processors.
The block diagram in figure 1.1 shows a typical digital signal processing
chain, in which both the input and the output signals are analog.
Figure 1.1: Digital signal processing chain.
The input signal passes through a low-pass (anti-aliasing) filter before it
enters an analog-to-digital converter, where it is sampled at a constant sampling
frequency. Every sample is then processed by the DSP algorithm implemented in the
DSP processor. The result then passes through a digital-to-analog converter and a
low-pass (reconstruction) filter.
DSP processors are often required to provide real-time performance; that is,
the processor must finish processing one sample before the next one arrives. An
example of real-time data processing is shown in figure 1.2.
Figure 1.2: Definition of real-time DSP processing.
A signal is sampled at 40 kHz, so the time between two samples is
1/40000 s = 25 µs. The algorithm applied to the signal needs 100 instructions to
process one sample. For a DSP processor with a 30 ns cycle time, executing one
instruction per cycle, the processing time is 100 × 30 ns = 3 µs. The waiting time
is obtained by subtracting the processing time from the time between samples:
25 µs − 3 µs = 22 µs. If the waiting time is greater than or equal to zero, the
application meets the real-time requirements.
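The real-time check above is simple arithmetic, and a small C sketch makes the budget explicit. It assumes, as the example does, that the processor executes one instruction per cycle; the function names are hypothetical and chosen only for this illustration.

```c
/* Real-time budget check for the example in the text.
 * Assumption: one instruction is executed per processor cycle. */

/* Waiting time in ns = sampling period - processing time. */
double waiting_time_ns(double fs_hz, unsigned instructions, double cycle_ns)
{
    double sample_period_ns = 1e9 / fs_hz;             /* 40 kHz -> 25 000 ns */
    double process_time_ns  = instructions * cycle_ns; /* 100 x 30 ns = 3 000 ns */
    return sample_period_ns - process_time_ns;
}

/* Nonzero if the waiting time is >= 0, i.e. the real-time demands are met. */
int meets_real_time(double fs_hz, unsigned instructions, double cycle_ns)
{
    return waiting_time_ns(fs_hz, instructions, cycle_ns) >= 0.0;
}

/* Largest number of instructions per sample that still fits the budget. */
unsigned max_instructions(double fs_hz, double cycle_ns)
{
    return (unsigned)((1e9 / fs_hz) / cycle_ns);
}
```

With the figures from the text, `waiting_time_ns(40000.0, 100, 30.0)` yields 22 000 ns, and `max_instructions(40000.0, 30.0)` shows that up to 833 instructions per sample would still fit within the 25 µs sampling period.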
A wide variety of digital signal processing algorithms is implemented today.
Among the most common DSP techniques are the Finite Impulse Response (FIR)
filter, the Infinite Impulse Response (IIR) filter, convolution and the Fast
Fourier Transform (FFT). The mathematical theory behind these algorithms shows
that they rely on two basic operations, multiplication and addition, combined in
a sum of products $S = \sum_i a_i b_i$. For this reason, DSP processors, compared
to general-purpose processors, usually have many specialized arithmetic units
that can operate simultaneously. The key features of DSP processors are:
• Arithmetic unit: To calculate sums of products, DSP processors have a
hardware multiplier and accumulator, so the two operations, multiplication and
addition, can be completed in one cycle. Some DSP processors can even execute
a complete FFT butterfly in parallel.
• Bus architecture: DSP processors have a Harvard architecture with two separate
buses, one for program and the other for data. This bus architecture enables the
DSP processor to read an instruction and data from memory simultaneously.
• Addressing: Hardware-supported address generation speeds up address
calculation and thus reduces computation time. Some DSP processors support
bit-reversed addressing, which is convenient for the FFT algorithm.
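The arithmetic-unit and addressing features above can be illustrated in portable C. The sketch below writes the sum of products as a multiply-accumulate (MAC) loop, which is the pattern a hardware multiplier and accumulator execute, and computes bit-reversed indices in software, which is the address sequence a processor with bit-reversed addressing would generate in hardware. The 16-bit inputs, the 64-bit accumulator and the function names are assumptions of this sketch, not details of any particular processor.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of products S = sum_i a[i] * b[i] as a multiply-accumulate loop.
 * Each iteration is one multiplication followed by one addition; the wide
 * 64-bit accumulator stands in for a DSP's extended-precision accumulator. */
int64_t sum_of_products(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];   /* multiply, then accumulate */
    return acc;
}

/* Reverses the lowest `bits` bits of index i, e.g. for an 8-point FFT
 * (bits = 3) index 1 (001) maps to 4 (100). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u);   /* move lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT, calling `bit_reverse(i, 3)` for i = 0 to 7 gives the order 0, 4, 2, 6, 1, 5, 3, 7, the input ordering used by a decimation-in-time FFT.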
Whereas the first 16-bit DSP processors operated at 5 MIPS (Million
Instructions Per Second), the combination of high clock speed and multiple
functional units has since increased performance to as much as 2400 MIPS.
The most important producers of DSP processors are Texas Instruments,
Motorola, Analog Devices, AT&T and NEC.
DSP processors are used in a wide variety of domains such as telecommunications,
control engineering, space technology and medicine.
3. TMS320C6000 DSP PLATFORM
This chapter is divided into several sections that describe, in turn, the
TMS320C62x/TMS320C67x processors, the TMS320C6701 EVM board and, finally,
software development.
3.1 TMS320C6000 DSP PROCESSOR ARCHITECTURE
TMS320C6000 devices are the first DSP processors to use an enhanced Very
Long Instruction Word (VLIW) architecture, which achieves high performance
through instruction-level parallelism.
TMS320C6000 devices fall into two main categories: the fixed-point DSP
processors TMS320C62x ('C62x) and the floating-point DSP processors
TMS320C67x ('C67x). Not only do these two types have very similar architectures,
they are also pin compatible, which means that hardware developers do not have
to make two different board designs to support both the 'C62x and the 'C67x
processor.
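To make the fixed-point/floating-point distinction concrete, the following C sketch shows one common fixed-point convention, Q15, in which a 16-bit integer x represents the fraction x / 32768. Q15 is chosen here only for illustration and is not claimed to be the format used in this project. On a fixed-point processor such as the 'C62x, the programmer manages this scaling explicitly; on a floating-point processor such as the 'C67x, the product of two floats needs no rescaling.

```c
#include <stdint.h>

/* Q15 (illustrative convention): int16_t x represents x / 32768, range [-1, 1). */
int16_t q15_from_double(double v)
{
    if (v >= 32767.0 / 32768.0) return INT16_MAX;  /* saturate at 1 - 2^-15 */
    if (v <= -1.0) return INT16_MIN;               /* saturate at -1 */
    return (int16_t)(v * 32768.0);                 /* truncates toward zero */
}

/* Q15 multiply: the 32-bit product of two Q15 values is in Q30, so shifting
 * right by 15 bits returns it to Q15. The only overflow case, (-1) * (-1),
 * is saturated. Assumes >> on a negative value is an arithmetic shift, as on
 * common compilers (the C standard leaves it implementation-defined). */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * b;
    if (p == 0x40000000) return INT16_MAX;
    return (int16_t)(p >> 15);
}
```

For example, `q15_mul(q15_from_double(0.5), q15_from_double(0.5))` returns 8192, the Q15 representation of 0.25.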
3.1.1 Key Features of TMS320C62x/TMS320C67x Device
The most important features of 'C62x/C67x DSP processors can be
summarized as follows:
• TMS320C62x/TMS320C67x devices operate at 150, 167, 200 and 250 MHz (6.67 ns,
6 ns, 5 ns and 4 ns cycle time).
• Peak 1336 MIPS (Million Instructions Per Second) at 167 MHz. 'C67x has peak
performance of 688 MFLOPS (Million Floating Point Operations Per Second) at
167 MHz.
• Advanced VLIW CPU architecture with eight functional units, including two
multipliers and six arithmetic units. Up to eight 32-bit instructions can be
executed every cycle.
• 8/16/32-bit data support.
• Large on-chip RAM of 2x64 kB for program and data.
• 32-bit external memory interface (EMIF) for connecting external memories.
• Host port interface for host access to 'C62x/C67x memory and peripherals.
• Direct memory access.
• Multichannel buffered serial port.
• 32-bit timers.
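The clock and performance figures in the feature list are related by simple arithmetic: the cycle time is the reciprocal of the clock frequency, and the peak MIPS figure is the clock frequency multiplied by the eight instructions that can be issued per cycle. A minimal C sketch of these two relations (function names are hypothetical):

```c
/* Cycle time in nanoseconds for a clock frequency in MHz: t = 1000 / f. */
double cycle_time_ns(double f_mhz)
{
    return 1000.0 / f_mhz;
}

/* Peak MIPS = clock frequency (MHz) x instructions issued per cycle.
 * This is a theoretical upper bound that assumes every functional unit
 * executes an instruction in every cycle. */
double peak_mips(double f_mhz, unsigned instructions_per_cycle)
{
    return f_mhz * instructions_per_cycle;
}
```

For example, `peak_mips(167.0, 8)` gives the quoted 1336 MIPS, and `cycle_time_ns(250.0)` gives 4 ns.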
As can be seen in figure 3.1, the 'C62x/C67x DSP processor consists of three
main parts:
• CPU core: Executes the instructions.
• Memory: 2x64 kB RAM for program and data.
• Peripherals: External memory and host port interface, direct memory access, serial
ports and timers.
Figure 3.1: C62x/C67x block diagram.
3.1.2 Central Processing Unit Core
As the block diagram in figure 3.2 shows, the Central Processing Unit (CPU)
core is composed of several components, whose functions are described in the
following sections.
Figure 3.2: Central processing unit core.
3.1.2.1 Program Control Unit
The task of the program control unit is to fetch a packet of eight instructions
(a fetch packet), dispatch the instructions to the appropriate functional units
and decode them. The fetch and decode phases can be described as follows:
• PG phase: The CPU generates the program address of the fetch packet.
• PS phase: The generated address is sent to the program memory, which can be
either external or internal.
• PW phase: The CPU waits for the program memory access to complete.
• PR phase: The CPU receives the fetch packet.
• DP phase: The dispatch unit sends each instruction to its functional unit. Each
functional unit is designed only for certain instructions.
• DC phase: The instructions are decoded in the functional units; they are then
executed in the subsequent execute phases.
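The fetch and decode phases form a pipeline: while one fetch packet is being decoded, later packets are already being fetched. The C sketch below models an ideal pipeline with no stalls, in which a new fetch packet enters the first phase every cycle. It uses the phase names from TI's 'C6000 documentation (PG, PS, PW, PR, DP, DC) and omits the execute phases. The one-packet-per-cycle assumption ignores stalls, for example when a fetch packet contains several execute packets, and the function names are hypothetical.

```c
#include <stdio.h>

/* Fetch and decode phases of the 'C6000 pipeline (execute phases omitted). */
enum { NUM_PHASES = 6 };
static const char *const phase_name[NUM_PHASES] = { "PG", "PS", "PW", "PR", "DP", "DC" };

/* Ideal pipeline: packet p enters PG in cycle p and advances one phase per
 * cycle. Returns the phase index of packet p in cycle c (both counted from 0),
 * or -1 if the packet is not in a fetch/decode phase in that cycle. */
int phase_of(int packet, int cycle)
{
    int k = cycle - packet;
    return (k >= 0 && k < NUM_PHASES) ? k : -1;
}

/* Prints one row per cycle and one column per fetch packet. */
void print_pipeline(int packets, int cycles)
{
    for (int c = 0; c < cycles; c++) {
        printf("cycle %2d:", c + 1);
        for (int p = 0; p < packets; p++) {
            int k = phase_of(p, c);
            printf(" %s", k >= 0 ? phase_name[k] : "--");
        }
        putchar('\n');
    }
}
```

Calling `print_pipeline(3, 8)` prints a table whose first rows read `PG -- --`, `PS PG --` and `PW PS PG`, showing that after the pipeline fills, every phase is busy with a different fetch packet in the same cycle.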
3.1.2.2 Data Paths
Figure 3.3 shows the data paths of the 'C62x DSP processor in detail. There
are two data paths, denoted A and B. Each data path contains a register file of
sixteen 32-bit registers and four functional units. In addition, the data paths
include one control register file, which can be accessed only through functional
unit .S2.
3.1.2.2.1 General Purpose Register File
There are two general-purpose register files in the 'C62x/C67x device, each containing sixteen 32-bit registers that are read and written by the functional units. The registers support 32-bit and 40-bit fixed-point data; in the 'C67x they can also store 64-bit double-precision floating-point values. Values wider than 32 bits are held in register pairs.
The main function of the two register files is to store operands for the functional units. As can be seen from figure 3.3, register file A can be read and written by functional units .L1, .S1, .M1 and .D1. Similarly, register file B can be accessed by functional units .L2, .S2, .M2 and .D2. The data cross paths, denoted 1x and 2x in figure 3.3, enable a functional unit to access an operand from the register file on the other side of the CPU.
Paths .ST1, .ST2, .LD1 and .LD2 serve for data transfer between the register files and memory.
Figure 3.3: Data paths of 'C67x device.
3.1.2.2.2 Functional Units
The instructions are finally executed in the functional units, and the results are written to the register files, from where they can be stored in memory. There are eight functional units in total, four in each data path, each with its own read and write ports. This allows the CPU core to execute up to eight instructions in one cycle.
As mentioned above, each functional unit executes a specific set of operations; for example, only two multiplications are possible per cycle, one on each .M unit. The functional units and their supported operations are summarized in table 3.1.
| Functional unit | Fixed-point operations | Floating-point operations ('C67x only) |
|---|---|---|
| .L unit (.L1, .L2) | 32/40-bit arithmetic and compare operations | Arithmetic operations |
| .S unit (.S1, .S2) | 32/40-bit shifts and 32-bit bit-field operations | Absolute value operations |
| .M unit (.M1, .M2) | 16 x 16-bit multiply operations | 32 x 32-bit multiply operations |
| .D unit (.D1, .D2) | Linear and circular address calculation | Load double word with a 5-bit offset |
Table 3.1: Functional units and supported operations.
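Because each data path has its own .M unit, at most two multiplications can be issued per cycle. The minimal C sketch below shows one way source code can expose this parallelism to the compiler: a dot product split into two independent accumulators. It assumes 16-bit operands (matching the 16 x 16 fixed-point multiply of the .M units); the function name is hypothetical, and whether the TI compiler actually schedules the two products on .M1 and .M2 in the same cycle depends on its optimization settings.

```c
#include <stddef.h>

/* Dot product with two independent accumulators, so that two
 * multiplications per loop iteration do not depend on each other. */
long dot_product_2acc(const short *a, const short *b, size_t n)
{
    long acc0 = 0;  /* sum of even-index products */
    long acc1 = 0;  /* sum of odd-index products */
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        acc0 += (long)a[i] * b[i];
        acc1 += (long)a[i + 1] * b[i + 1];
    }
    if (i < n)      /* odd length: one product is left over */
        acc0 += (long)a[i] * b[i];

    return acc0 + acc1;
}
```

For example, with a = {1, 2, 3, 4, 5} and b = {5, 4, 3, 2, 1} the function returns 35.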
3.1.2.2.3 Control Register File
The 'C62x devices have ten control registers, while the 'C67x devices have thirteen. The three additional registers in the 'C67x DSP processor support floating-point operations. The control registers can be accessed only by functional unit .S2.
3.1.2.3 Interrupts
The 'C62x/C67x devices allow the normal program flow to be interrupted by an event that comes from an external peripheral, an internal peripheral, or a special instruction in the program.
There are two types of interrupts: non-maskable interrupts (reset and NMI) and maskable interrupts (INT4 to INT15). The interrupt mechanism is controlled by registers in the control register file, such as the control status register (CSR), the interrupt enable register (IER) and the interrupt flag register (IFR).
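The conditions under which a maskable interrupt is serviced can be modelled in portable C. The sketch below is based on our reading of TI's CPU reference guide and should be checked against it: the GIE bit (bit 0) of CSR enables interrupts globally, the NMIE bit (bit 1) of IER must also be set, and each of INT4 to INT15 has an enable bit in IER and a flag bit in IFR at the bit position equal to the interrupt number. It is a model working on plain integer snapshots, not code that reads the real control registers; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from TI's 'C62x/C67x CPU reference guide. */
#define CSR_GIE  (1u << 0)  /* CSR: global interrupt enable */
#define IER_NMIE (1u << 1)  /* IER: NMI enable, also gates maskable ints */

/* Returns true if maskable interrupt INTn (n = 4..15) would be accepted,
 * given snapshot values of CSR, IER and IFR. Priority between several
 * pending interrupts and pipeline timing are not modelled. */
bool maskable_int_accepted(uint32_t csr, uint32_t ier, uint32_t ifr,
                           unsigned n)
{
    uint32_t mask;

    if (n < 4u || n > 15u)
        return false;                   /* only INT4..INT15 are maskable */
    mask = (uint32_t)1u << n;
    return (csr & CSR_GIE) != 0u        /* interrupts globally enabled */
        && (ier & IER_NMIE) != 0u       /* NMIE set */
        && (ier & mask) != 0u           /* INTn individually enabled */
        && (ifr & mask) != 0u;          /* INTn pending */
}
```

The model makes visible that clearing either GIE or NMIE blocks all maskable interrupts, regardless of the individual enable bits.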
3.1.3 Memory
Because addresses are 32 bits wide, the 'C62x/C67x DSP processors have a 4 GB byte-addressable memory space, which is divided into four regions: internal program memory, internal data memory, internal peripherals and external memory space. The exact location of each region depends on the memory map used.
The internal 64 kB program memory can be used either to store the program or to serve as a cache if the program resides in external memory.
The internal data memory also has a capacity of 64 kB and is used to store data during program execution.
External memory, connected to the CPU through the External Memory Interface (EMIF), can be either synchronous or asynchronous. It extends the storage capacity available for program and data.
3.1.4 Peripherals
Peripherals located on the 'C62x/C67x devices include DMA controllers, multichannel buffered serial ports, timers and interfaces for connecting external memory and external devices such as microprocessors or PCI bridge chips.
3.1.4.1 DMA Controller
The DMA controller controls the data flow between the internal memory and the external memory, the host port interface or external peripherals. Because the DMA controller transfers data without CPU intervention, it can operate independently of the CPU and in parallel with it.
3.1.4.2 Multichannel Buffered Serial Ports
Two multichannel buffered serial ports (McBSPs) support full-duplex communication at a maximum speed of 40 Mb/s per channel. This makes it easy to connect external peripherals such as a codec for real-time analog-to-digital and digital-to-analog conversion.
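To put the 40 Mb/s figure into perspective, the serial bit rate of an uncompressed audio stream can be estimated as sampling rate x channels x bits per sample. The helpers below are a hypothetical sketch; they assume 16-bit samples for each of two stereo channels and ignore any extra framing bits the codec interface may add.

```c
/* Bit rate (in b/s) of an uncompressed PCM stream. */
unsigned long pcm_bit_rate(unsigned long fs_hz, unsigned channels,
                           unsigned bits_per_sample)
{
    return fs_hz * channels * bits_per_sample;
}

/* Fraction of a serial link's capacity used by such a stream. */
double link_utilisation(unsigned long stream_bps, unsigned long link_bps)
{
    return (double)stream_bps / (double)link_bps;
}
```

For the codec's highest sampling rate of 48 kHz, pcm_bit_rate(48000, 2, 16) gives 1,536,000 b/s, and link_utilisation(1536000, 40000000) is about 0.038, i.e. less than 4 % of the capacity of one McBSP under these assumptions.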
3.1.4.3 Timers
Two 32-bit programmable internal timers are available, each of which can trigger an interrupt. The timer counters can be clocked internally or externally.
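As a rough sketch of how an internally clocked timer produces periodic interrupts, the number of timer clock cycles between two interrupts is the timer input clock divided by the desired interrupt rate. The helper below assumes that the internal timer clock is the CPU clock divided by 4, which is our understanding for the 'C62x/C67x; the exact relation between this count and the value written to the timer's period register should be checked in TI's peripheral documentation. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed ratio between the CPU clock and the internal timer clock. */
#define TIMER_CLOCK_DIVIDER 4u

/* Timer clock cycles between two interrupts, rounded to the nearest integer.
 * cpu_hz: CPU clock in Hz; irq_hz: desired interrupt rate in Hz (> 0). */
uint32_t timer_cycles_per_irq(uint32_t cpu_hz, uint32_t irq_hz)
{
    uint32_t timer_hz = cpu_hz / TIMER_CLOCK_DIVIDER;
    return (timer_hz + irq_hz / 2u) / irq_hz;
}
```

With the CPU running at 133 MHz, an 8 kHz interrupt rate corresponds to timer_cycles_per_irq(133000000, 8000) = 4156 cycles, since 33.25 MHz / 8 kHz = 4156.25.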
3.1.4.4 Host Port Interface
The Host Port Interface (HPI) is a parallel interface through which a host processor can directly access the entire memory space of the CPU (internal memory, external memory and memory-mapped peripherals), thus allowing data exchange between the host and the DSP processor.
3.2 TMS320C6701 EVALUATION MODULE
Within the framework of this project, the DSP applications described in chapter 5 were implemented and tested on the TMS320C6701 Evaluation Module (C6701 EVM). The 'C6701 EVM is a demonstration board with a TMS320C67x DSP processor, designed for development and real-time testing of digital signal processing algorithms. External peripherals such as external memory, a codec and a PCI controller are also located on the 'C6701 EVM to allow easy testing of DSP algorithms in real-time conditions.
3.2.1 Key Features of TMS320C6701 Evaluation Module
The C6701 EVM has the following features:
• TMS320C67x floating-point digital signal processor.
• Quad clock support of 25 MHz, 33.25 MHz, 100 MHz and 133 MHz.
• Peripheral Component Interconnect (PCI) interface with master/slave support.
• 256 kB of 133 MHz synchronous burst static random-access memory (SBSRAM).
• 8 MB of 100 MHz synchronous dynamic random-access memory (SDRAM).
• Access to all DSP memory from the PCI bus via the host port interface.
• 16-bit stereo codec.
• Three light-emitting diode (LED) indicators.
• Plug-and-play PCI device.
3.2.2 TMS320C6701 Evaluation Module Hardware Functional Overview
Figure 3.4 shows a basic functional diagram of the 'C6701 EVM.
Figure 3.4: Functional diagram of the 'C6701 EVM.
As figure 3.4 shows, the 'C6701 EVM can be divided into the following functional blocks:
• DSP: The TMS320C6701 evaluation module is built around the 'C67x floating-point digital signal processor. Refer to section 3.1 for more information about the 'C67x DSP device.
• DSP clock: The C6701 EVM supports operation with two different on-board clock sources and two different clock modes (multiply-by-1 and multiply-by-4). As a result, the DSP can operate at four different clock rates: 25 MHz, 33.25 MHz, 100 MHz and 133 MHz.
• External memory: The C6701 EVM provides one bank of 256 kB of 133 MHz SBSRAM and 8 MB of 100 MHz SDRAM. Additional memory can be added using the expansion memory interface.
• Audio interface: The C6701 EVM includes a 16-bit stereo codec that supports sample rates from 5.5 kHz to 48 kHz. The audio codec has two stereo inputs (microphone and line level) and a stereo line-level output, which are located on the board's mounting bracket.
• PCI interface: A PCI local bus revision 2.1 compliant interface allows the host processor to access the whole DSP memory and the control/status registers on the board.
• JTAG emulation: Allows source-level debugging from the host processor via the PCI bus.
• Programmable logic: The C6701 EVM uses programmable logic to control board functions such as reset control, the dual CPU clock oscillator, the PCI controller and DSP interface control.
• User options: With twelve DIP switches, the user can select the boot mode, clock frequency, JTAG mode and memory map.
• LED indicators: The C6701 EVM provides three LED indicators. One LED is lit whenever 5 V is applied to the board; the other two LEDs are user-defined.
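The four CPU clock rates listed for the DSP clock block follow directly from combining the two on-board oscillators (25 MHz and 33.25 MHz, inferred from the listed rates) with the two clock modes (x1 and x4). A trivial sketch that enumerates the combinations; the array and function names are hypothetical:

```c
/* Oscillator frequencies (MHz) and clock-mode multipliers of the C6701 EVM,
 * as inferred from the clock rates listed in the text. */
static const double evm_osc_mhz[2] = {25.0, 33.25};
static const unsigned evm_clock_mult[2] = {1u, 4u};

/* Fills rates[0..3] with every oscillator/multiplier combination, in MHz:
 * rates[2*i + j] = oscillator i multiplied by mode j. */
void evm_clock_rates(double rates[4])
{
    unsigned i, j;

    for (i = 0; i < 2u; i++)
        for (j = 0; j < 2u; j++)
            rates[2u * i + j] = evm_osc_mhz[i] * evm_clock_mult[j];
}
```

The result is {25, 100, 33.25, 133} MHz, i.e. exactly the four rates supported by the board.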
More details about the TMS320C6x device and TMS320C6701 Evaluation Module
can be found in bibliography reference [1].
3.3 IMPLEMENTATION OF DSP ALGORITHMS
Basically, there are two ways of implementing DSP algorithms on the TMS320C6701 EVM. The first approach, described in section 3.3.1, is to write the source code of the DSP algorithm directly in a programming language such as assembly or C/C++ and then build an executable for the 'C67x DSP processor from the source code. This approach may be called low-level implementation of DSP algorithms.
Digital signal processing algorithms can also be implemented using Matlab v.6, which, together with Simulink and Real-Time Workshop, supports executable generation for the TMS320C6701 EVM. This approach can be called high-level implementation of DSP algorithms.
3.3.1 Low-Level Implementation of DSP Algorithms
This section describes the process of implementing DSP algorithms on the 'C67x device using the Code Composer Studio (CCS) environment.
3.3.1.1 Software Development Flow
A typical software development flow consists of the following steps, see figure 3.5.
Figure 3.5: Software development flow.
1. The C compiler accepts C source code of the DSP algorithm and produces assembly language source code.
2. The assembly optimizer allows the programmer to write linear assembly code (assembly code without register assignment) without being concerned with the registers. The assembly optimizer assigns registers and turns the linear assembly into highly parallel assembly code.
3. The assembler translates assembly language source files into machine language object files in the common object file format (COFF).
4. The linker accepts COFF object files and object libraries as input and creates an executable module that can be run on the 'C67x DSP processor.
3.3.1.2 Code Composer Studio (CCS)
The CCS environment supports the whole software development flow shown in figure 3.5 and furthermore provides optional features such as debugging, DSP/BIOS, the JTAG interface and real-time data exchange between the host processor and the 'C67x DSP device.
The graphical environment of the CCS is shown in figure 3.6.
Figure 3.6: Graphical interface of the Code Composer Studio.
3.3.1.2.1 Application Debugging Features
Code Composer Studio supports the following debugging activities:
• Setting breakpoints
• Graphical display of variables in the DSP processor
• Watching variables
• Viewing and editing memory and control registers
• Using probe point tools to stream data to and from the DSP
• Profiling execution statistics
3.3.1.2.2 DSP/BIOS
During the analysis phase of the software development cycle, traditional debugging features are ineffective for problems that arise only during real-time execution. The CCS DSP/BIOS plug-ins provide means for real-time analysis with minimal influence on performance.
Unlike traditional debugging, which is external to the executing program, the DSP/BIOS features require the program to be linked with certain DSP/BIOS API modules, whose functions are declared as external and are called from the source program. Since the analysis itself (formatting and display of the collected data) is performed by the host, these functions have minimal impact on the real-time performance of the DSP application.
3.3.1.2.3 JTAG Emulation
The on-chip emulation enables CCS to control program execution and monitor real-time activity. Communication with the on-chip emulation takes place via the JTAG link. The on-chip emulation handles the communication between the host and the target concerning:
• Starting, stopping and resetting the DSP processor
• Examining the registers and memory of the DSP
• Performance profiling
• Real-time data exchange between the host and the DSP device
3.3.1.2.4 Real-Time Data Exchange (RTDX)
The real-time data exchange feature allows data to be transferred between the host and the DSP processor without stopping the target application. The acquired data can be analyzed and visualized on the host using any Object Linking and Embedding (OLE) client, such as Matlab or Microsoft Excel.
3.3.2 High-Level Implementation of DSP Algorithms
Matlab ver. 6.1, together with Simulink, Real-Time Workshop and the Developer's Kit for TI DSP toolbox, makes it possible to create an executable from a Simulink model that can be run on the C6701 EVM. Furthermore, Matlab provides means for RTDX and for building and loading executables directly into 'C67x DSP processors.
With this set of tools, one can develop and test very complex DSP algorithms in real-time conditions without having to be well acquainted with the architecture of the 'C6701 EVM. As the executable is based on a Simulink model, there is no need to write the source code of the DSP algorithm.
3.3.2.1 The C6701 EVM Library Blocks
The Developer's Kit for TI DSP toolbox provides a Simulink library of four blocks that can be used together with standard blocks in a Simulink model, see figure 3.7.
Figure 3.7: C6701 EVM simulink library blocks.
• C6701 ADC Block: Adding this block to a Simulink model enables the DSP application to access input signals from external sources. These real signals can be used to drive and test the DSP algorithms implemented in the Simulink model.
• C6701 DAC Block: This block sends digital data from a Simulink model to the D/A converter and then to the external output connectors of the 'C6701 EVM.
• C6701 LED Block: There are two LEDs on the EVM, one internal (placed directly on the board) and the other external (on the back of the board). This block can be used to indicate that the algorithm has completed a specific calculation or reached a certain point in the processing.
• C6701 Reset Block: This block is used to reset the 'C6701 and reload the DSP processor with an executable directly from the Simulink model window.
3.3.2.2 Generation of Executable File
The Real-Time Workshop (RTW), which takes care of the build process, accepts a Simulink model as input and converts it into an executable file that can be run on the 'C6701 EVM.
The Simulink model must contain at least two library blocks representing the input and the output of the C6701 EVM. These blocks are part of the Developer's Kit for TI DSP toolbox, together with the other two Simulink blocks, see section 3.3.2.1.
Figure 3.8: Example of Simulink model designed for executable generation.
Figure 3.8 shows an example of a Simulink model that routes the signal from the LineIn connector to the Out connector of the 'C6701 EVM.
The build process of the executable is controlled by three files (system target file, template makefile and make command), which are also included in the Developer's Kit for TI DSP toolbox. As shown in figure 3.9, the build process consists of several steps.
1. Analysis of the model: The build process begins with this step. During this phase, the Real-Time Workshop reads the Simulink model file (model.mdl) created in Simulink and creates an intermediate representation of the model. This description is stored in a target-independent format; the output of this step is a file called model.rtw.
2. Generation of code by the Target Language Compiler: The Target Language Compiler converts the model.rtw file into target-specific code. Its output is a target-specific source code version of the Simulink model.
3. Creation of the executable: In this final step, the build process invokes the make command, which in turn runs the compiler to compile the source files. After successful compilation, the compiled files, libraries and the real-time interface are linked into one executable file.
Figure 3.9: Build process of the executable.
Appendix A1 contains an example that shows step by step the process of
executable generation from a simulink model.
3.3.2.3 Execution of Executable Generated by Real-Time Workshop
The Real-Time Workshop generates model code based on the corresponding Simulink model. It also generates a run-time interface that executes the model code. The run-time interface and the model code are compiled and linked to create an executable. Figure 3.10 shows a high-level object-oriented view of the executable.
Figure 3.10: High-level object oriented view of the executable.
3.3.3 Comparison of Low- and High-Level Implementation Approach
Both approaches, low-level and high-level, can be used to implement DSP algorithms on a DSP processor. Before choosing one of these programming techniques, however, the complexity and speed requirements of the DSP algorithm should be considered.
If speed is the main goal, the low-level approach should be used, since it allows the programmer to optimize the code thoroughly, or even to write critical parts of the code in assembly to further increase the speed.
Although the optimization performed by Matlab v.6 cannot be as advanced, the high-level approach makes it possible to implement very complex DSP algorithms without profound knowledge of the architecture of the DSP processor. On the other hand, Matlab v.6 currently supports executable generation only for the TMS320C6701 EVM, which means that the high-level implementation of DSP algorithms cannot be used for other DSP modules.
4. MATHEMATICAL BACKGROUND OF IMPLEMENTED DSP ALGORITHMS
This chapter contains the mathematical theory of the four DSP algorithms that were implemented within the framework of this project.
4.1 FINITE IMPULSE RESPONSE (FIR) DIGITAL FILTER
Although an FIR filter requires a higher order than an infinite impulse response (IIR) filter to achieve the same performance, it is widely used because it can provide a linear phase characteristic, which neither analog filters nor IIR filters can achieve.
4.1.1 Properties of FIR filter
An FIR filter with N coefficients (order N - 1) can be defined by equation 4.1:
$$y(n) = \sum_{k=0}^{N-1} b_k\, x(n-k) \qquad (4.1)$$
where x(n) is the input, $b_k$ are the coefficients of the filter and y(n) is the output.
In equation 4.2, the input signal is replaced by the Dirac impulse δ(n):
$$y(n) = \sum_{k=0}^{N-1} b_k\, \delta(n-k) = h(n) \qquad (4.2)$$
where δ(n) is defined as:
$$\delta(n-k) = 1 \;\text{ for } n = k; \qquad \delta(n-k) = 0 \;\text{ for } n \neq k \qquad (4.3)$$
As $b_0 = h(0)$, $b_1 = h(1)$, ..., $b_{N-1} = h(N-1)$, the coefficients are equal to the impulse response of the FIR filter.
The frequency response of the FIR filter can be determined by taking the z-transform of h(n):
$$H(z) = \sum_{n=0}^{\infty} h(n)\, z^{-n} \qquad (4.4)$$
To find the frequency response, the parameter z in equation 4.4 is replaced by the expression $e^{j\omega T}$, where T is the sampling period. Therefore, for T = 1 s we obtain:
$$H(e^{j\omega}) = \sum_{n=0}^{\infty} h(n)\, e^{-j\omega n} \qquad (4.5)$$
Since $e^{-j2\pi n} = 1$, it follows that:
$$H\!\left(e^{j(\omega+2\pi)}\right) = \sum_{n=0}^{\infty} h(n)\, e^{-j(\omega+2\pi)n} = \sum_{n=0}^{\infty} h(n)\, e^{-j\omega n}\, e^{-j2\pi n} = H(e^{j\omega}) \qquad (4.6)$$
Equation 4.6 proves that the frequency response of an FIR filter is periodic with period 2π (T = 1 s), see figure 4.1.
Figure 4.1: Periodical transfer function of FIR filter.
4.1.2 Coefficients Calculation by means of Window Method
The ability of an FIR filter to achieve a frequency response "identical" to the specified one depends mainly on the method used to calculate its coefficients. The most common methods are the window, frequency sampling and optimal equiripple methods. In this project, the FIR filter was designed with the window method.
As shown above, the frequency response of an FIR filter is a periodic function. Hence, the Fourier series can be applied to obtain the coefficients of the FIR filter, see equation 4.7.
$$h(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H(e^{j\omega})\, e^{j\omega n}\, d\omega, \quad T = 1; \qquad h(nT) = \frac{1}{\omega_s}\int_{-\omega_s/2}^{\omega_s/2} H(e^{j\omega T})\, e^{j\omega nT}\, d\omega, \quad T \neq 1 \qquad (4.7)$$
where $\omega_s$ is the sampling frequency. The frequency response of an ideal low-pass filter is:
$$H(e^{j\omega T}) = 1 \;\text{ for } |\omega| \le \omega_c; \qquad H(e^{j\omega T}) = 0 \;\text{ for } \omega_c < |\omega| \le \frac{\omega_s}{2} \qquad (4.8)$$
where $\omega_c$ is the cut-off frequency.
From equations 4.7 and 4.8 it follows that the coefficients of the ideal low-pass FIR filter can be calculated as follows:
$$h(nT) = \frac{1}{\omega_s}\int_{-\omega_c}^{\omega_c} e^{j\omega nT}\, d\omega = \frac{2 f_c}{f_s}\cdot\frac{\sin(\omega_c nT)}{\omega_c nT}, \qquad n = 0, \pm 1, \pm 2, \ldots \qquad (4.9)$$
For n = 0 the fraction is replaced by its limit 1, so that $h(0) = 2f_c/f_s$.
The coefficients obtained from equation 4.9 are shown in figure 4.2.
Figure 4.2: Coefficients of ideal FIR filter.
From equation 4.9 and figure 4.2 it follows that the ideal FIR filter is infinitely long and non-causal. To avoid this problem, the impulse response must be shifted and truncated with a window function, which consequently introduces overshoots and ripples. This is known as the Gibbs phenomenon. In order to reduce the overshoots and ripples, many windows have been investigated. The most common windows are the rectangular, Hanning, Hamming and Kaiser windows.
If the window sequence is denoted as w(nT), the final form for the coefficient calculation is:
$$h'(nT) = h\!\left(\left(n - \frac{N-1}{2}\right)T\right) w(nT), \quad n \in \langle 0, N-1\rangle; \qquad h'(nT) = 0, \quad n \notin \langle 0, N-1\rangle \qquad (4.10)$$
4.2 INFINITE IMPULSE RESPONSE (IIR) DIGITAL FILTER
Infinite impulse response filters are computationally more efficient than FIR filters, since they require fewer coefficients because they use feedback (poles). However, this feedback can make the filter unstable if the coefficients deviate from their design values, for example due to coefficient quantization. Furthermore, the phase characteristic of an IIR filter is not linear.
4.2.1 IIR Filter Implementation
The general form of the IIR filter can be expressed as follows:
$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2} + \dots + b_N z^{-N}}{1 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_M z^{-M}} = \frac{\sum_{k=0}^{N} b_k z^{-k}}{1 + \sum_{k=1}^{M} a_k z^{-k}} \qquad (4.11)$$
where $a_k$ and $b_k$ are the coefficients of the IIR filter, which fully describe its properties.
4.2.2 Coefficients Calculation using Bilinear Transform Method
The bilinear transform method, which was used in this project to design a low-pass IIR filter, is based on analog filter design. Other known methods are the pole-zero placement approach and the impulse invariant method.
From the known relation $z = e^{sT}$ (on the frequency axis $s = j\omega$, so that $z = e^{j\omega T}$), the relationship between the s-plane and the z-plane given by equation 4.12 can be established.
The mapping from the s-plane to the z-plane introduces a non-linearity between the analog and digital frequencies. Therefore, the cut-off frequency $\omega_c$ of the analog filter prototype must be adjusted according to the desired cut-off frequency $\omega_p$ using equation 4.13, where T is the sampling period.
When deriving a digital filter with the bilinear transform, the following procedure can be used.
1. Specify the normalized analog filter.
2. Determine the cut-off frequency $\omega_p$ of the digital filter and use equation 4.13 to find its equivalent analog cut-off frequency $\omega_c$. This step is known as pre-warping.
3. De-normalize the analog filter by $\omega_c$. This can be done by replacing s by $s/\omega_c$.
4. Finally, apply the bilinear transform (equation 4.12) to the filter obtained in step 3.
The theory and design procedure concerning FIR and IIR digital filters are described in detail in bibliography reference [2].
The bilinear transform and the pre-warping relation used in section 4.2.2 are:
$$s = \frac{1}{T}\ln z \cong \frac{2}{T}\,\frac{z-1}{z+1} \qquad (4.12)$$
$$\omega_c = \frac{2}{T}\tan\!\left(\frac{\omega_p T}{2}\right) \qquad (4.13)$$
4.3 ADAPTIVE FILTERS
Adaptive filters differ from other filters such as FIR or IIR filters in that the filter coefficients are not fixed but are calculated in real time by an adaptive algorithm.
4.3.1 Structure of Adaptive Filter
Figure 4.3 shows a basic block diagram of an adaptive filter.
Figure 4.3: Block diagram of adaptive filter.
Figure 4.3 shows that the adaptive filter consists of an FIR or IIR filter whose coefficients are calculated in real time by an adaptive algorithm to provide the desired performance.
4.3.2 Least Mean Square (LMS) Adaptive Filter
From figure 4.3, the following equations can be written:
$$y(n) = \sum_{k=0}^{N-1} h(k)\, x(n-k), \qquad e(n) = d(n) - y(n) = d(n) - \sum_{k=0}^{N-1} h(k)\, x(n-k) \qquad (4.14)$$
The LMS algorithm is based on the steepest descent algorithm. The coefficients of the FIR filter are updated as follows:
$$h_n(k) = h_{n-1}(k) + \beta\, \Delta_{n,k} \qquad (4.15)$$
where β is a positive value known as the step size parameter and $\Delta_{n,k}$ is a gradient vector that makes the FIR filter coefficients approach their optimal values. It can be shown that the negative gradient of the instantaneous squared error $e^2(n)$ with respect to h(k) is $2e(n)x(n-k)$; absorbing the factor 2 into β gives:
$$\Delta_{n,k} = e(n)\, x(n-k) \qquad (4.16)$$
Finally,
$$h_n(k) = h_{n-1}(k) + \beta\, e(n)\, x(n-k) \qquad (4.17)$$
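Equations 4.14 to 4.17 translate almost line by line into code. The sketch below performs one LMS iteration for an N-tap adaptive FIR filter; it is an illustration that assumes the input history is kept in a simple shift buffer (x_hist[0] = x(n), x_hist[k] = x(n - k)), and all names are hypothetical. The step size β trades convergence speed against stability: too large a value makes the coefficient update diverge.

```c
#include <stddef.h>

/* One LMS iteration (eqs. 4.14 to 4.17). Updates h[0..N-1] in place and
 * returns the error e(n). x_hist[k] holds x(n-k); d is the desired d(n). */
double lms_step(double *h, const double *x_hist, size_t N, double d,
                double beta)
{
    double y = 0.0, e;
    size_t k;

    for (k = 0; k < N; k++)          /* y(n) = sum h(k) x(n-k)      (4.14) */
        y += h[k] * x_hist[k];
    e = d - y;                       /* e(n) = d(n) - y(n)          (4.14) */
    for (k = 0; k < N; k++)          /* h(k) += beta e(n) x(n-k)    (4.17) */
        h[k] += beta * e * x_hist[k];
    return e;
}

/* Shifts a new input sample x(n) into the history buffer. */
void lms_push(double *x_hist, size_t N, double x_new)
{
    size_t k;

    if (N == 0)
        return;
    for (k = N - 1; k > 0; k--)
        x_hist[k] = x_hist[k - 1];
    x_hist[0] = x_new;
}
```

In a system-identification setting, where d(n) is produced by an unknown FIR system driven by the same input, repeated calls to lms_push and lms_step drive h towards the coefficients of that system.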
4.4 FAST FOURIER TRANSFORM
The Discrete Fourier Transform (DFT) is used to perform frequency analysis of discrete non-periodic signals. The Fast Fourier Transform (FFT) achieves the same result with much less computation.
4.4.1 Calculation Cost of DFT
The N-point DFT is defined by equation 4.18:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j2\pi/N} \qquad (4.18)$$
where $W_N$ is called the twiddle factor. It follows from equation 4.18 that the N-point DFT requires $N^2$ complex multiplications and N(N-1) complex additions. A simple eight-sample signal would thus require 64 complex multiplications and 56 complex additions, but a signal of 1024 samples would require 1,048,576 complex multiplications and 1,047,552 complex additions, i.e. about two million complex operations. The FFT is therefore used to decrease the computational cost. There are several FFT algorithms, but the best known are the two radix-2 methods:
• Decimation in Time (DIT)
• Decimation in Frequency (DIF)
4.4.2 Mathematical Background of FFT - DIT Algorithm
If the number of samples of the signal is an integer power of 2 ($N = 2^v$), the original sum (equation 4.18) can be separated into two sums, one for the even samples and the other for the odd samples, see equation 4.19.
$$X(k) = \sum_{n\ \text{even}} x(n)\, W_N^{nk} + \sum_{n\ \text{odd}} x(n)\, W_N^{nk} \qquad (4.19)$$
Equation 4.20 is obtained by substituting n = 2r for even n and n = 2r + 1 for odd n:
$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_N^{2rk} + \sum_{r=0}^{N/2-1} x(2r+1)\, W_N^{(2r+1)k} \qquad (4.20)$$
Since for the twiddle factor it holds that
$$W_N^2 = e^{-2j2\pi/N} = e^{-j2\pi/(N/2)} = W_{N/2}; \qquad W_N^{2rk} = W_{N/2}^{rk} \qquad (4.21)$$
equation 4.20 can be rewritten as equation 4.22:
$$X(k) = \sum_{r=0}^{N/2-1} x(2r)\, W_{N/2}^{rk} + W_N^{k} \sum_{r=0}^{N/2-1} x(2r+1)\, W_{N/2}^{rk} \qquad (4.22)$$
The original DFT has thus been split into two halves: the first sum is an N/2-point DFT of the even samples and the second sum is an N/2-point DFT of the odd samples.
Now let us compare the computational cost of both forms for 8 samples (N = 8): the original form requires $N^2 = 8^2 = 64$ multiplications, whereas equation 4.22 requires only $2(N/2)^2 + N = 2(8/2)^2 + 8 = 40$ multiplications to calculate the same result.
To develop this concept further, it is convenient to adopt a graphic approach based on the signal flow chart. The two basic DSP operations are addition and multiplication, see figure 4.4.
Figure 4.4: Two basic DSP operation.
Using the signal flow chart, equation 4.22 can be displayed as shown in figure 4.5.
Figure 4.5: DSP flowchart display of equation 4.22.
In equation 4.23,
$$X(k) = G(k) + W_N^{k} H(k) \qquad (4.23)$$
G(k) is an N/2-point DFT of the even samples and H(k) is an N/2-point DFT of the odd samples. As both G(k) and H(k) can be further broken up into N/4-point DFTs, the original 8-point DFT can be viewed as a combination of the results of four 2-point DFTs, see figure 4.6.
Figure 4.6: 8-point DFT expressed with four 2-point DFT.
The expression for the 2-point DFT is
$$X(k) = \sum_{n=0}^{1} x(n)\, W_2^{nk} = \sum_{n=0}^{1} x(n)\, e^{-j2\pi nk/2} \qquad (4.24)$$
For k = 0, 1 we obtain
$$X(0) = x(0) + x(1), \qquad X(1) = x(0) + e^{-j\pi} x(1) = x(0) - x(1) \qquad (4.25)$$
Figure 4.7 shows the 2-point DFT in the signal flow chart.
Figure 4.7: FFT butterfly topology.
The topology in figure 4.7 is referred to as the FFT butterfly. If the 2-point DFTs in figure 4.6 are replaced with FFT butterflies, a complete 8-point FFT with decimation in time is obtained, see figure 4.8.
Figure 4.8: Complete 8-point FFT.
4.4.3 Computational Cost of FFT with Decimation in Time
If N denotes the number of samples to process, then A = log2(N) is the number of columns in the signal flow chart. For example, there are A = log2(8) = 3 columns for the 8-point DFT, see figure 4.8. In each column there are B = N/2 butterflies. As one butterfly requires C = 2 multiplications, the total number of multiplications for the N-point FFT is:
A⋅B⋅C = log2(N)⋅(N/2)⋅2 = N⋅log2(N)
Because $W_N^{k+N/2} = -W_N^{k}$, the two multiplications of a butterfly differ only in sign, so in practice one complex multiplication per butterfly suffices and the count drops to (N/2)⋅log2(N).
5. IMPLEMENTATION
The main objective of the DSP part was to create several examples that would allow students to become familiar with the 'C6701 EVM. The examples should also outline different ways of implementing DSP algorithms. For this reason, the examples are grouped into two categories.
The first part describes the process of analog-to-digital and digital-to-analog conversion specific to the 'C6701 EVM, whereas the second part shows the implementation of the four most common DSP algorithms: FIR filter, IIR filter, LMS adaptive filter and Fast Fourier Transform.
5.1 CODEC
This part, which focuses more on the practical aspects of software development for the 'C6701 EVM, contains three examples of controlling the 16-bit stereo codec and setting its properties.
These examples show the process of converting an input analog signal into digital samples that are processed by the 'C67x DSP, and further demonstrate how the digital samples from the DSP processor are transformed into an output analog signal. Since most DSP algorithms operate on samples acquired from an analog signal, it is important to understand the basics of analog-to-digital and digital-to-analog conversion well.
5.1.1 Loopback Example
This example shows how a signal is modified by changing the codec's parameters (gain, attenuation, sampling frequency), see figure 5.1.
Figure 5.1: Loopback example.
Two stereo signals, from the Mic and LineIn connectors, serve as input signals. Before entering the ADCs, the Mic signal passes through a gain stage controlled by the variable MicGain. After analog-to-digital conversion, the samples pass through the Loopback block into the DACs, where the analog signal is reconstructed.
The other signal, from the LineIn connector, passes through a mixer. The mixer's output is then summed with the output of the DACs, and the resulting signal is routed to the Out connector of the 'C6701 EVM board. The ADCs and DACs use the same sampling frequency.
• Mic slider (variable MicGain): Controls the gain before the ADCs.
• Sample slider (variable SelSmpFreq): Changes the sampling frequency of the ADCs and DACs.
• Loop slider (variable LoopBackAtten): Attenuation of the loopback block.
• LineIn slider (variable LineInGain): Gain of the analog mixer.
5.1.2 InAndOut Example
Figure 5.2: InAndOut example.
Unlike the Loopback example, where the sampled signal does not enter the DSP, the InAndOut example demonstrates data transfer between the DSP and an external source or sink (such as a signal generator or an oscilloscope). An input signal is taken from the LineIn connector and led through the ADCs and the codec's audio data serial port into the DSP's Multichannel Buffered Serial Port (McBSP). Once a complete sample has been received, the McBSP triggers an interrupt, which tells the DSP that data are ready to be read from the MCBSP0_DRR register. In response to the interrupt, the service routine MyISR() reads the sample and writes the same value into the McBSP's transmit register MCBSP0_DXR. As soon as the value has been written, the McBSP starts transmitting the sample to the codec's audio data serial port, which passes the received bits to the DACs, which in turn convert them into an analog signal.
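The receive-and-retransmit step can be sketched in C as below. This is a host-side illustration, not the thesis code: MCBSP0_DRR and MCBSP0_DXR are modeled as ordinary variables (on the 'C6701 they are memory-mapped McBSP registers), MyISR() is a plain C function rather than one declared with the TI compiler's `interrupt` keyword, and simulate_sample() is a hypothetical helper that stands in for the hardware.

```c
#include <stdint.h>

/* Host-side stand-ins for the McBSP0 data registers. On the 'C6701 these
   are memory-mapped registers; here they are ordinary variables so that
   the sketch compiles and runs on a PC. */
static volatile uint32_t MCBSP0_DRR;   /* data receive register  */
static volatile uint32_t MCBSP0_DXR;   /* data transmit register */

/* Service routine: runs once a complete sample has been received and
   copies it unchanged to the transmit register. On the DSP it would be
   declared with the TI compiler's `interrupt` keyword. */
void MyISR(void)
{
    uint32_t sample = MCBSP0_DRR;   /* read the received sample */
    MCBSP0_DXR = sample;            /* transmit it back to the codec */
}

/* Hypothetical test helper: emulates the McBSP receiving `sample` and
   raising the interrupt, then returns what would be sent to the DACs. */
uint32_t simulate_sample(uint32_t sample)
{
    MCBSP0_DRR = sample;   /* "hardware" latches a new sample */
    MyISR();               /* "hardware" raises the receive interrupt */
    return MCBSP0_DXR;     /* value the McBSP shifts out to the codec */
}
```

Because the routine only copies one register to another, it is short enough to finish well within one sample period, which is what keeps the example real-time when no breakpoint is set.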
The functionality of this example can be tested in two ways:
1. With a breakpoint in the MyISR() function, you can view the input signal received by the DSP. In this case the application does not run in real time, since execution stops each time the breakpoint is reached.
2. Without the breakpoint, the output signal should correspond to the input signal, provided the sampling theorem is satisfied (the sampling frequency is set to 5510 Hz, so the input must not contain components above about 2755 Hz).
5.1.3 Generator Example
Figure 5.3: Generator example.
The EVM board can also be programmed to generate basic signals, e.g. sinusoidal, triangular or square signals. This example shows a practical implementation of such a generator and allows the amplitude and frequency of the signal to be changed. Whereas the algorithms for generating the triangular and square signals were derived from their geometry, the sinusoidal signal is generated with a filtering approach based on the following transfer function:
$$H(z) = \frac{R \sin\omega_o \, z^{-1}}{1 - 2R\cos\omega_o \, z^{-1} + R^2 z^{-2}}, \qquad \omega_o = 2\pi\frac{f_o}{f_s} \qquad (5.1)$$

where R = 1 for a pure sinusoidal signal
fo - desired frequency
fs - sampling frequency
The impulse response of this filter is sinusoidal: for R = 1 it is h(n) = sin(ωo·n), i.e. a sinusoid of frequency fo with amplitude equal to 1. This can be verified, e.g., in Matlab.
• Amplitude slider (variable A): Amplitude of the signal, in units of a 16-bit signed integer (max. 32767).
• Frequency slider (variable fo): Frequency fo of the signal.
• Signal slider: Selects a sinusoidal, triangular or square signal.
5.2 DSP ALGORITHMS
Four DSP algorithms have been used to introduce different approaches to implementing DSP algorithms on signal processors. Refer to chapter 4 for the mathematical theory of these algorithms and to section 3.3 for a description of low-level and high-level implementation of DSP algorithms.
5.2.1 Examples of Low-Level Implementation of DSP Algorithms
Two DSP algorithms, the LMS adaptive filter and the Fast Fourier Transform, were used to demonstrate the concept of low-level implementation in practice.
5.2.1.1 Least Mean Square Adaptive Filter Example
This application implements the least mean square adaptive filter algorithm, which adjusts the coefficients of a finite impulse response filter in such a way that the output of the algorithm tracks the desired signal. The mathematical theory of the LMS adaptive filter can be found in section 4.3.2.
The sampling frequency is set to 16 kHz and the order of the FIR filter is 8. The desired signal is identical to the input signal, which is taken from the LineIn connector of the 'C6701 EVM. The Out connector provides the output signal.
After the application has started, an oscilloscope shows how the output signal tracks the input signal as the LMS algorithm adjusts the taps of the FIR filter.
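A floating-point sketch of one LMS iteration is given below. It is not the thesis code: the names lms_t, lms_init() and lms_step(), the step size mu chosen by the caller, and the reading of "order 8" as 8 taps are all assumptions. The update is the standard LMS rule w_k ← w_k + μ·e(n)·x(n−k), with e(n) = d(n) − y(n).

```c
#include <stddef.h>

#define NTAPS 8   /* assumption: "order 8" taken as 8 taps */

typedef struct {
    float w[NTAPS];   /* adaptive FIR coefficients (taps) */
    float x[NTAPS];   /* delay line, x[0] is the newest input sample */
    float mu;         /* step size (hypothetical, chosen by the caller) */
} lms_t;

void lms_init(lms_t *f, float mu)
{
    for (size_t i = 0; i < NTAPS; i++) {
        f->w[i] = 0.0f;
        f->x[i] = 0.0f;
    }
    f->mu = mu;
}

/* Processes one input sample `in` together with the desired sample `d`.
   Returns the filter output y(n); the error e(n) is stored in *err. */
float lms_step(lms_t *f, float in, float d, float *err)
{
    /* shift the delay line and insert the new sample */
    for (size_t i = NTAPS - 1; i > 0; i--)
        f->x[i] = f->x[i - 1];
    f->x[0] = in;

    /* FIR output: y(n) = sum over k of w_k * x(n-k) */
    float y = 0.0f;
    for (size_t i = 0; i < NTAPS; i++)
        y += f->w[i] * f->x[i];

    /* error and coefficient update: w_k += mu * e(n) * x(n-k) */
    float e = d - y;
    for (size_t i = 0; i < NTAPS; i++)
        f->w[i] += f->mu * e * f->x[i];

    if (err != NULL)
        *err = e;
    return y;
}
```

Because the desired signal equals the input in this example, a broadband input drives the taps towards w0 = 1 and all other taps 0, i.e. a pass-through filter, which is why the output on the oscilloscope approaches the input.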
5.2.1.2 Fast Fourier Transform Example
Figure 5.4 shows the result of the Fast Fourier Transform algorithm, whose
mathematical background is described in section 4.4.
Figure 5.4: FFT example.
The sampling frequency is initially set to 44.1 kHz but can be adjusted with the Sample slider to a value between 5.5 kHz and 48 kHz. The application computes a 256-point decimation-in-time FFT of the signal acquired from the LineIn connector of the TMS320C6701 EVM. The magnitude of the input signal's frequency spectrum is displayed in a graph.
To further increase the speed of the algorithm, the twiddle factors are computed off-line, and during execution their values are read from a look-up table.
• Sample slider (variable ActSmpFreq): Changes the sampling frequency of the
ADCs.
5.2.2 Examples of High-Level Implementation of DSP Algorithms
As mentioned earlier, Matlab v.6 can create an executable for the 'C6701 EVM from a Simulink model. This feature of Matlab provides a new approach to implementing and testing DSP algorithms in real time. In this way, two DSP filters, FIR and IIR, were implemented on the 'C6701 EVM. For more details on high-level implementation, refer to section 3.3.2.
5.2.2.1 Finite Impulse Response Digital Filter Example
To simplify generating an executable from a Simulink model, a Matlab Graphical User Interface (GUI) application was created that allows an FIR filter to be designed and implemented on the TMS320C6701 EVM.
Figure 5.5: FIR graphical user interface.
As can be seen from figure 5.5, the order of the filter is set to 17 and the cut-off frequency can vary between 500 and 3500 Hz. The filter is designed with the window method, and three windows are available: Rectangular, Hamming and Kaiser (β = 14). Further options are as follows:
• Transfer functions for defined window: Plots the magnitude of the transfer function for all defined windows.
• Transfer function for selected window: Plots the magnitude of the transfer function for the chosen window.
• Simulink model of FIR filter for select. window: Creates a Simulink model of the FIR filter from the given parameters. At this point, the process of executable generation can be started.
• Info: Provides a short description of the GUI application.
• Exit: Exits the application.
Once the generated executable has been successfully loaded into the DSP processor, the FIR filter can be tested by connecting an input signal to the LineIn connector. The Out connector provides the output signal.
5.2.2.2 Infinite Impulse Response Digital Filter Example
Like the FIR filter example, this application is controlled by a GUI application. Figure 5.6 shows the GUI application through which an IIR filter can be implemented on the TMS320C6701 EVM.
The filter's order is set to 7 and the sampling frequency is 8 kHz. The cut-off frequency can be set to a value between 500 Hz and 3500 Hz. The application designs the filter by applying the bilinear transform to an analog prototype, and the user can choose between a Butterworth and an Elliptic analog prototype. Further options of the application are:
• Transfer functions for def. filter prototypes: Displays the transfer functions of the IIR filters based on the Butterworth and Elliptic analog prototypes.
• Transfer function for sel. filter prototype: Displays the transfer function of the IIR filter based on the chosen analog prototype.
• Simulink model of IIR filter for sel. prototype: Creates a Simulink model of the defined IIR filter, which is then used for executable generation.
• Info: Shows a short description of the GUI application.
• Exit: Exits the application.
Figure 5.6: IIR graphical user interface.
Once the generated executable has been successfully loaded into the DSP processor, the IIR filter can be tested by applying a signal to the LineIn connector. The Out connector provides the output signal.
6. SUMMARY OF DSP PART
The DSP part of the project describes different approaches to implementing DSP algorithms on TMS320C6x signal processors, which are also described in this project.
The introduction to the digital signal processing domain is followed by a chapter describing the TMS320C6x DSP processor and the TMS320C6701 EVM. The TMS320C6701 EVM was used to demonstrate low-level and high-level implementation of DSP algorithms in practice. Within the DSP part, four common DSP algorithms were implemented to show the difference between the classical programming approach and Matlab-supported high-level design of DSP algorithms.
7. INTRODUCTION TO PCI PART
Because computer systems demand a high-bandwidth architecture that can accommodate modern devices such as high-resolution graphics and network controllers, there is a growing need for effective interconnects that also allow devices to be changed or upgraded with minimal effort.
The Peripheral Component Interconnect (PCI) bus, which is the subject of this report, meets most of the requirements imposed by high-performance computer systems. Owing to the advanced features described in the following sections, PCI is currently the most widely used bus in computer systems.
The aim of the PCI part is to design a PCI-based device that demonstrates the elementary features of the PCI specification.
The PCI part is divided into four chapters:
Chapter 8: Explores the PCI specification and its main characteristics.
Chapter 9: Introduces the PCI 9050 bus interface chip, which was used in conjunction with the PLX PCI 9050 Reference Design Kit to design a PCI device.
Chapter 10: Describes the designed PCI device, including timing diagrams of the implemented communication protocol and a description of the software application.
Chapter 11: Contains a summary of the PCI part.
8. PERIPHERAL COMPONENT INTERCONNECT (PCI) BUS
This chapter describes the basics of the PCI bus and summarizes its main features. Furthermore, examples of PCI read and write transactions are included to illustrate the bus protocol in practice. Finally, the plug and play configuration mechanism is introduced.
8.1 INTRODUCTION TO COMPUTER BUSES
As shown in figure 8.1, a computer bus is a set of parallel lanes to which several peripheral boards can be attached, with the processor at one end.
Figure 8.1: Computer bus.
According to their purpose, the lines of a computer bus can be divided into three main groups:
• Address bus: The value on the address bus specifies which bus segment, peripheral and register is being accessed.
• Data bus: Carries the information being transferred.
• Control bus: Controls the data transfer with a set of rules called the bus protocol.
In addition to these signals, others may be present to implement advanced features such as interrupts, DMA or power distribution.
8.1.1 Division of Computer Buses
Computer buses can be divided into the following categories.
8.1.1.1 Timing
• Synchronous buses: All operations occur on a specified edge of the master clock signal.
• Asynchronous buses: Operations are driven by handshake signals on the control bus, without regard to a master clock signal.
8.1.1.2 Architecture
• Non-multiplexed buses: Have separate lanes for address and data.
• Multiplexed buses: Address and data share the same lanes; an address phase is followed by one or more data phases, which are distinguished by control signals.
Further, computer buses can be classified according to the number of address lanes (8, 16, 32, 64), data lanes (1, 8, 16, 32, 64), transfer rate, maximum length, or the number of devices that can be connected to the bus at the same time.
8.1.2 Computer Buses before PCI
The first widely used computer bus was the Industry Standard Architecture (ISA) bus, which, with its maximum transfer rate of 8 MB/s and 16 MB address space, is inadequate for today's computer systems.
The next bus, the VESA Local Bus, increased the transfer rate up to 132 MB/s thanks to its 32-bit data path operating at 33 MHz. Because the bus was attached to the processor's local bus directly or through a bus buffer, it was processor specific (486 CPU), and with the arrival of the Pentium it was no longer relevant.
8.2 INTRODUCTION TO PCI BUS
The Peripheral Component Interconnect bus is described by a set of specifications that are maintained by the PCI Special Interest Group (PCI SIG, www.pcisig.com). This organization provides all necessary information regarding the PCI bus and its implementation.
Four revisions of the PCI specification are available from PCI SIG: revision 1.0, revision 2.0, revision 2.1 and revision 2.2, released in 1999.
8.3 KEY FEATURES OF PCI BUS
The main features of the PCI bus can be summarized as follows.
• Multiplexed, synchronous, 32-bit (64-bit) wide bus operating at 33 MHz (PCI revision 2.2 also defines 66 MHz operation).
• The maximum theoretical transfer rate of the 32-bit bus at 33 MHz is 132 MB/s. The current revision 2.2 (64-bit, 66 MHz) can move data at 528 MB/s, and the most recent PCI-X at up to 1 GB/s.
• Any device on the PCI bus with master capabilities can initiate a data transfer with other devices.
• Blocks of data can be moved in burst transfers.
• PCI implements plug and play configuration: every device in the system is automatically configured each time the system is turned on.
• PCI is a 'green architecture' supporting both 3.3 V and 5 V signaling environments.
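These peak rates follow from the bus width w (in bits) and the clock frequency, assuming one data phase per clock, i.e. an ideal burst with no wait states; the clocks used are the nominal values quoted in the text, plus 133 MHz for PCI-X:

$$\text{peak rate} = \frac{w}{8}\,\text{bytes} \cdot f_{CLK}$$
$$\frac{32}{8} \cdot 33\ \text{MHz} = 132\ \text{MB/s}, \qquad \frac{64}{8} \cdot 33\ \text{MHz} = 264\ \text{MB/s}$$
$$\frac{64}{8} \cdot 66\ \text{MHz} = 528\ \text{MB/s}, \qquad \frac{64}{8} \cdot 133\ \text{MHz} = 1064\ \text{MB/s} \approx 1\ \text{GB/s}$$

Real transfers reach less, since every transaction also spends clocks on the address phase, turnaround and wait states.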
8.4 PCI SIGNALS
Several terms are frequently used in connection with computer buses; the most important are:
• Agent: A device that operates on a computer bus.
• Master: An agent that is capable of initiating a data transfer.
• Transaction: A data transfer, known as a burst transfer, consisting of one address phase followed by one or more data phases.
• Initiator: A master that has requested the bus and has been granted access to it by the central arbiter.
• Target: An agent that recognized its address during the address phase. The target responds to the transaction started by the initiator.
The PCI bus has 98 signals in total; see figure 8.2.
Figure 8.2: PCI bus diagram.
The PCI specification requires a minimum of 47 signals to implement a target device and 49 signals to implement a master device. The remaining signals are optional and serve features such as 64-bit transfers, the JTAG interface, etc.
The PCI signals in figure 8.2 can be divided according to their purpose into several categories: address and data, interface control, error reporting, arbitration, system, 64-bit extension, interrupt and JTAG interface signals.
Note: In the following sections, a # sign at the end of a signal name means that the signal is active low, i.e. asserted when it is at the low voltage level.
8.4.1 System Signals
• CLK: Provides timing for all PCI transactions.
• RST#: Resets the device by setting its registers to initial states.
8.4.2 Address and Data Signals
• AD[31::0]: Multiplexed address and data. During the address phase they carry the address, whereas during data phases they carry data.
• C/BE#[3::0]: Multiplexed bus command and byte enables. During the address phase they carry the bus command, whereas during data phases they carry byte enable information.
• PAR: Even parity across the AD[31::0] and C/BE#[3::0] signals.
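Even parity here means that PAR is chosen so that the total number of 1s on AD[31::0], C/BE#[3::0] and PAR together is even; in other words, PAR is the XOR of the 36 other bits. A short C sketch (the function name pci_par() is hypothetical):

```c
#include <stdint.h>

/* Returns the PAR bit for one address or data phase: the number of 1s on
   AD[31::0], C/BE#[3::0] and PAR together must be even. */
unsigned pci_par(uint32_t ad, uint8_t cbe_n)
{
    /* parity is linear under XOR, so the 4 C/BE# bits can be folded
       into the low bits of AD without changing the result */
    uint32_t v = ad ^ (uint32_t)(cbe_n & 0x0Fu);
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;   /* 1 if the 36 bits hold an odd number of 1s */
}
```

For example, with AD = 0x00000003 and C/BE# = 0001 there are three 1s, so PAR must be 1 to make the total even.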
8.4.3 Interface Control Signals
• FRAME#: Driven by the master to indicate the beginning and duration of a PCI transaction.
• IRDY#: Initiator Ready indicates that the initiator is ready to read or write data.
• TRDY#: Target Ready indicates that the target is ready to write or read data.
• STOP#: The selected target requests the master to terminate the current transaction.
• LOCK#: Indicates an atomic operation that may require multiple transactions to complete.
• IDSEL: Initialization Device Select is used as a chip select during configuration transactions.
• DEVSEL#: Device Select indicates that the target has recognized itself as the target of the current transaction.
8.4.4 Arbitration
• REQ#: Used by a master to indicate to the central arbiter that it wants to use the bus.
• GNT#: Used by the central arbiter to grant the bus to the master.
8.4.5 Error Reporting
• PERR#: Reports data parity errors during all PCI transactions except Special Cycles.
• SERR#: Reports address parity errors or any other serious system error.
8.4.6 Interrupt Signals
INTA# through INTD# are used by a device to signal an event to its driver.
8.4.7 64-bit extension
A device uses this group of signals for 64-bit transactions, which can be executed only if both the initiator and the target support 64-bit transactions.
8.4.8 JTAG Signals
These signals provide means for testing PCI devices.
Other signals not covered in sections 8.4.1 to 8.4.8 may be present on the PCI interface to support advanced features such as power management or the 3.3 V signaling environment.
8.5 ARBITRATION
PCI is a multi-master bus, which means that any agent with master capabilities can act as a bus master and initiate data transfers across the PCI bus.
To become the bus master, an agent must first be granted the bus by the central arbiter. Two signals, REQ# and GNT#, serve this purpose. With the REQ# signal, an agent indicates to the central arbiter that it wants to use the bus. The central arbiter then gives the master permission to use the bus by asserting its GNT# signal. Provided the bus is idle (both FRAME# and IRDY# are de-asserted), the master can start a PCI transaction.
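The PCI specification does not prescribe a particular arbitration algorithm, so the round-robin policy below is an assumption chosen for illustration, as are the names pci_bus_idle() and arbiter_round_robin(). Signals are modeled active low, as on the bus.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signal levels are modeled active low: 0 = asserted, 1 = de-asserted. */

/* The bus is idle when both FRAME# and IRDY# are de-asserted. */
bool pci_bus_idle(unsigned frame_n, unsigned irdy_n)
{
    return frame_n == 1u && irdy_n == 1u;
}

/* Hypothetical round-robin arbiter for up to 8 masters. Bit m of req_n is
   the REQ# line of master m. `last` is the master granted previously
   (-1 if none). Returns the master whose GNT# is asserted next, or -1 if
   no master is requesting the bus. */
int arbiter_round_robin(uint8_t req_n, int last)
{
    for (int i = 1; i <= 8; i++) {
        int m = (last + i) % 8;          /* search starts after `last` */
        if (((req_n >> m) & 1u) == 0u)   /* REQ# asserted (low) */
            return m;
    }
    return -1;
}
```

A granted master would still wait until pci_bus_idle() reports an idle bus before asserting FRAME#; starting the search after the previously granted master keeps one busy master from monopolizing the bus.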
8.5.1 Bus Parking
The PCI specification introduces the notion of "bus parking". This option allows a master to start a transaction without first requesting bus access with the REQ# signal, because the idle bus has already been "parked" on this agent, i.e. its GNT# is already asserted. Although any master can become the default master, it is recommended that GNT# remain asserted for the last master that acquired the bus.
8.6 BUS PROTOCOL
A bus protocol is a set of rules that define how data are moved between the initiator and target by specifying the timing of address, data and control signals. This section explains the bus protocol of PCI transactions.
8.6.1 PCI Bus Command
PCI is a multiplexed bus, so two different phases exist within one transaction: one address phase followed by one or more data phases. During the address phase, the C/BE# lanes carry information about the type of the current transaction, called the bus command. All PCI bus commands are listed in table 8.1.
Although a PCI device is not obliged to support all types of PCI transactions, it is required to respond to configuration read and configuration write, so that the PCI BIOS configuration software can access it during system boot-up.
C/BE#3 C/BE#2 C/BE#1 C/BE#0 COMMAND TYPE
0 0 0 0 Interrupt Acknowledge
0 0 0 1 Special Cycle
0 0 1 0 I/O Read
0 0 1 1 I/O Write
0 1 0 0 Reserved
0 1 0 1 Reserved
0 1 1 0 Memory Read
0 1 1 1 Memory Write
1 0 0 0 Reserved
1 0 0 1 Reserved
1 0 1 0 Configuration Read
1 0 1 1 Configuration Write
1 1 0 0 Memory Read Multiple
1 1 0 1 Dual-Address Cycle
1 1 1 0 Memory Read Line
1 1 1 1 Memory Write and Invalidate
Table 8.1: PCI Bus commands.
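Table 8.1 maps directly onto a lookup table indexed by the 4-bit value on C/BE#[3::0] during the address phase. The names pci_cmd_name() and pci_cmd_is_reserved() are hypothetical; the encodings are those of table 8.1.

```c
/* Bus command names indexed by the value of C/BE#[3::0] (table 8.1). */
static const char *const pci_cmd_names[16] = {
    "Interrupt Acknowledge", "Special Cycle",        /* 0000, 0001 */
    "I/O Read",              "I/O Write",            /* 0010, 0011 */
    "Reserved",              "Reserved",             /* 0100, 0101 */
    "Memory Read",           "Memory Write",         /* 0110, 0111 */
    "Reserved",              "Reserved",             /* 1000, 1001 */
    "Configuration Read",    "Configuration Write",  /* 1010, 1011 */
    "Memory Read Multiple",  "Dual-Address Cycle",   /* 1100, 1101 */
    "Memory Read Line",      "Memory Write and Invalidate" /* 1110, 1111 */
};

/* Returns the command name for the C/BE# value seen in an address phase. */
const char *pci_cmd_name(unsigned cbe)
{
    return pci_cmd_names[cbe & 0xFu];   /* only the 4 low bits are used */
}

/* Nonzero for the reserved encodings 0100, 0101, 1000 and 1001. */
int pci_cmd_is_reserved(unsigned cbe)
{
    cbe &= 0xFu;
    return cbe == 0x4u || cbe == 0x5u || cbe == 0x8u || cbe == 0x9u;
}
```

A target decoding the address phase would typically ignore transactions whose command it does not implement, apart from the configuration commands it must always support.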
Read and write operations can be executed in three address spaces: memory, I/O and configuration space. Configuration space is used primarily at boot-up time to configure all PCI devices. Memory space differs from I/O space in that it can be prefetchable, which means that reads have no side effects, so multiple reads return the same results.
The remaining bus commands are:
• Memory read line, memory read multiple, memory write and invalidate: PCI transactions that are optimized for cache reads and writes.
• Interrupt acknowledge: A read implicitly addressed to the system interrupt controller, which returns the corresponding interrupt vector.
• Special cycle: Message broadcast. All PCI devices that support special cycle transactions receive this message.
• Dual address cycle: With this transaction, 32-bit agents can access the 64-bit address space.
8.6.2 Byte Enable
As 8-, 16- and 32-bit data can be moved between the initiator and target, the initiator has to indicate which byte lanes carry valid data. This is the purpose of the C/BE# signals, which specify during each data phase which bytes are valid.
Even if only some bytes, or none, are enabled, the agent driving the AD bus is required to drive all 32 AD lines to stable values.
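The byte-enable rule can be sketched in C: C/BE#n (active low) controls byte lane n, i.e. AD[8n+7::8n]. The functions below are hypothetical helpers; the first computes C/BE# for a naturally placed 8-, 16- or 32-bit access, the second shows a target merging a write so that only enabled lanes change.

```c
#include <stdint.h>

/* C/BE#[3::0] for an access of `size` bytes (1, 2 or 4) starting at byte
   `offset` (0..3) of a DWORD. Active low: a 0 bit enables that byte lane.
   Assumes the access does not cross the DWORD boundary. */
uint8_t pci_byte_enables(unsigned offset, unsigned size)
{
    unsigned enabled = ((1u << size) - 1u) << offset;   /* 1 = lane used */
    return (uint8_t)(~enabled & 0xFu);                  /* active low */
}

/* Target side: merges the AD value of a write data phase into an existing
   32-bit register, updating only the byte lanes enabled by C/BE#. */
uint32_t pci_apply_write(uint32_t old, uint32_t ad, uint8_t cbe_n)
{
    uint32_t mask = 0u;
    for (unsigned lane = 0; lane < 4u; lane++)
        if (((cbe_n >> lane) & 1u) == 0u)   /* lane enabled (low) */
            mask |= 0xFFu << (8u * lane);
    return (old & ~mask) | (ad & mask);
}
```

For example, a 16-bit write to byte offset 2 drives C/BE# = 0011: lanes 2 and 3 are enabled, so only the upper half of the register changes, even though all 32 AD lines carry stable values.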
8.6.3 Basic PCI Transactions
This section explains basic read/write data transfer between the initiator and
target.
8.6.3.1 Read Transaction
Figure 8.3 shows data transfer from the target towards the initiator.
Figure 8.3: PCI read transaction.
This transaction consists of the following steps:
1. The bus is idle and most signals are tri-stated. The master of the following transaction has received its GNT# and detected that the bus is idle.
2. Address phase: The master drives FRAME# low and places the target address on the AD bus and the bus command on the C/BE# lines.
3. The master asserts the appropriate C/BE# lines (byte enables) and also asserts IRDY# to indicate that it is ready to accept data from the target. The target that recognizes its address on the AD bus asserts DEVSEL#. This is also a turnaround cycle (one wait state between the address phase and the first data phase), because in a read transaction the master drives the AD lines during the address phase and the target drives them during the data phase.
4. The target places data on the AD bus and asserts TRDY#. The master latches the data on the rising edge of clock 4. Data transfer takes place on any clock cycle during which both IRDY# and TRDY# are asserted.
5. The target de-asserts TRDY#, indicating that the next data element is not ready to transfer. Nevertheless, the target is required to continue driving the AD bus. This is a wait cycle.
6. The target has placed the next data item on the AD bus and asserted TRDY#. Both IRDY# and TRDY# are asserted, so the master latches the data.
7. The master has de-asserted IRDY#, indicating that it is not ready for the next data element. This is another wait cycle.
8. The master has re-asserted IRDY# and de-asserted FRAME# to indicate that this is the last data transfer. In response, the target stops driving the AD bus and de-asserts TRDY# and DEVSEL#. The master releases C/BE# and de-asserts IRDY#. This is master-initiated termination.
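The transfer rule of step 4 (data move on every clock on which IRDY# and TRDY# are both asserted) and the end condition of step 8 can be checked with a small C model. The clock-by-clock values in example_read are modeled loosely on steps 1 to 8 as an illustration; they are not a transcription of figure 8.3, and all names are hypothetical.

```c
#include <stddef.h>

/* Signal levels on one rising CLK edge; active low (0 = asserted). */
typedef struct {
    unsigned frame_n, irdy_n, trdy_n;
} pci_clk_t;

/* Counts the data phases of one transaction: data move on every clock on
   which both IRDY# and TRDY# are asserted. The transfer during which FRAME#
   is already de-asserted is the last one, so counting stops there. */
size_t pci_count_transfers(const pci_clk_t *clk, size_t n)
{
    size_t transfers = 0;
    for (size_t i = 0; i < n; i++) {
        if (clk[i].irdy_n == 0u && clk[i].trdy_n == 0u) {
            transfers++;
            if (clk[i].frame_n == 1u)
                break;                  /* final data phase completed */
        }
    }
    return transfers;
}

/* Illustrative read transaction, modeled loosely on steps 1 to 8. */
static const pci_clk_t example_read[] = {
    /* FRAME# IRDY# TRDY# */
    { 1, 1, 1 },   /* clock 1: bus idle                   */
    { 0, 1, 1 },   /* clock 2: address phase              */
    { 0, 0, 1 },   /* clock 3: turnaround, IRDY# asserted */
    { 0, 0, 0 },   /* clock 4: first data transfer        */
    { 0, 0, 1 },   /* clock 5: target wait state          */
    { 0, 0, 0 },   /* clock 6: second data transfer       */
    { 0, 1, 0 },   /* clock 7: master wait state          */
    { 1, 0, 0 },   /* clock 8: last data transfer         */
};
```

Running pci_count_transfers() over example_read counts three data transfers (clocks 4, 6 and 8); the two wait states and the turnaround cycle move no data.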
8.6.3.2 Write Transaction
Figure 8.4 shows details of a typical write transaction, where data move from
the initiator to the target.
Figure 8.4: PCI write transaction.
As can be seen, the main difference between the read and the write transaction is the absence of a turnaround cycle between the address phase and the first data phase, since in a write transaction the initiator drives the AD lines during both phases.
8.6.4 Latency
The PCI specification defines several types of latency: arbitration, acquisition and initial target latency; see figure 8.5.
Figure 8.5: Bus latency.
The latency is influenced by the PCI device parameters described in the following sections.
8.6.4.1 Latency Timer
Each PCI master has an internal countdown latency timer. The timer is loaded with a defined value each time the master asserts the FRAME# signal. The value is decremented on every following clock; once the counter reaches zero and the arbiter has removed the master's GNT#, the master is obliged to terminate its transaction.
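The latency timer rule can be modeled with a small C sketch. The names are hypothetical, and the model only decides when the master must stop; it ignores how the termination itself is carried out on FRAME# and IRDY#.

```c
#include <stdbool.h>

/* Hypothetical model of a master's latency timer. */
typedef struct {
    unsigned count;    /* current value, in PCI clocks */
    unsigned reload;   /* value programmed in the Latency Timer register */
} lat_timer_t;

/* Called when the master asserts FRAME#: reload the timer. */
void lat_timer_start(lat_timer_t *t)
{
    t->count = t->reload;
}

/* Called on every following clock. Returns true when the master must end
   its transaction: the timer has expired and GNT# is no longer asserted.
   While GNT# stays asserted, the master may continue despite expiry. */
bool lat_timer_tick(lat_timer_t *t, bool gnt_asserted)
{
    if (t->count > 0u)
        t->count--;
    return t->count == 0u && !gnt_asserted;
}
```

A large reload value favors long bursts by one master, while a small one bounds how long other masters wait for the bus; this trade-off is why the register belongs to the optimization registers of the configuration header.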
8.6.4.2 DEVSEL# Latency
The selected target is required to assert its DEVSEL# signal within three clocks of the assertion of FRAME#; otherwise, the initiator terminates the transaction (master abort) after the fourth clock from the beginning of the transaction.
8.6.4.3 IRDY# / TRDY# Latency
With the IRDY# and TRDY# signals, the initiator and the target indicate that they are ready to read or write data. However, there are some restrictions. The initiator must assert its IRDY# signal within 8 clocks of the assertion of FRAME# and within 8 clocks between subsequent data phases. Similarly, the target is required to assert its TRDY# signal within 16 clocks of the assertion of FRAME# and within 8 clocks between two data phases.
8.6.5 Error Detection and Reporting
All bus agents are required to generate even parity over the AD and C/BE# lanes. The parity bit is placed on the PAR lane so that the receiving agent can check it. A parity error can be detected during the address phase or a data phase.
If a parity error is detected during a data phase, the receiving agent may assert its PERR# signal. If a parity error is detected during the address phase, any agent can assert the SERR# signal. The assertion of SERR# should be considered a fatal condition and handled appropriately, e.g. with a non-maskable interrupt.
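On the receiving side, the check amounts to counting the 1s across AD, C/BE# and PAR; an odd total is an error, reported on PERR# for a data phase and on SERR# for the address phase. The sketch below uses hypothetical names and is simplified: it ignores the Command register bits that enable or disable parity error response and SERR# reporting.

```c
#include <stdint.h>

typedef enum {
    PCI_PARITY_OK,     /* even number of 1s: no error */
    PCI_ASSERT_PERR,   /* data phase parity error */
    PCI_ASSERT_SERR    /* address phase parity error */
} pci_parity_result_t;

/* 1 if v contains an odd number of 1s. */
static unsigned parity32(uint32_t v)
{
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Checks one phase as seen by a receiving agent. */
pci_parity_result_t pci_check_parity(uint32_t ad, uint8_t cbe_n,
                                     unsigned par, int address_phase)
{
    unsigned odd = parity32(ad ^ (uint32_t)(cbe_n & 0x0Fu)) ^ (par & 1u);
    if (odd == 0u)
        return PCI_PARITY_OK;
    return address_phase ? PCI_ASSERT_SERR : PCI_ASSERT_PERR;
}
```

Since the same single bit protects the address and the data, a single-bit error is always detected, while an even number of flipped bits passes unnoticed.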
8.6.6 Target-Initiated Termination of Transaction
A transaction can be terminated either by the master or by the target. In the first case, shown in section 8.6.3.1, the master uses the FRAME# and IRDY# signals to terminate the transaction. If the target wants to terminate, it asserts the STOP# signal. There are two types of target-initiated termination:
• Target Disconnect (figure 8.6): DEVSEL# and STOP# are asserted at the same time. The target is not ready to execute another data phase.
Figure 8.6: Target disconnect.
• Target Abort (figure 8.7): DEVSEL# is de-asserted and STOP# is asserted. The target has experienced a fatal error condition.
Figure 8.7: Target abort.
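The two cases can be told apart from DEVSEL# and STOP# alone, as in the following C sketch with hypothetical names. It is a simplification: the PCI specification further distinguishes retry and disconnect with or without data depending on TRDY#, which this sketch ignores.

```c
/* Signal levels are active low: 0 = asserted, 1 = de-asserted. */
typedef enum {
    PCI_TERM_NONE,         /* STOP# not asserted: no target termination */
    PCI_TERM_DISCONNECT,   /* DEVSEL# and STOP# asserted together */
    PCI_TERM_ABORT         /* STOP# asserted with DEVSEL# de-asserted */
} pci_term_t;

/* Classifies what the target signals on a clock once it has claimed the
   transaction (simplified; TRDY# is not considered). */
pci_term_t pci_target_termination(unsigned devsel_n, unsigned stop_n)
{
    if (stop_n != 0u)
        return PCI_TERM_NONE;
    if (devsel_n == 0u)
        return PCI_TERM_DISCONNECT;
    return PCI_TERM_ABORT;
}
```

After a disconnect the initiator may try the rest of the transfer again later, whereas a target abort signals a fatal error that the initiator should not retry.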
8.7 ADVANCED FEATURES OF PCI BUS
Besides the basic bus protocol, the PCI specifications define additional features that extend its capabilities. However, these optional features are not required to implement an elementary PCI master/target device.
8.7.1 Interrupt Handling
The PCI bus provides four interrupt signals, INTA# to INTD#; a single-function device uses only INTA#.
Interrupts are defined as active low, level sensitive and asynchronous to the PCI clock. With an interrupt, a PCI device requests attention from its device driver; the interrupt stays asserted until the driver clears the condition that caused it.
The interrupt acknowledge bus command is used to read the interrupt vector (8-bit for x86 processors) from the system interrupt controller.
8.7.2 Special Cycle
The special cycle provides a mechanism for sending information to multiple targets that are enabled to respond to the special cycle bus command. An example of such broadcast information is the processor status, such as halt or shutdown.
8.7.3 64-bit extension
64-bit data transfers are executed if both the target and the master implement the 64-bit extension signals. In this case, the maximum data transfer rate can be increased up to 264 MB/s (33 MHz bus clock).
Another optional mechanism allows 32-bit agents to access the 64-bit address space, i.e. addresses above 4 GB. This is accomplished with two address phases and the dual address cycle bus command.
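The dual address cycle can be sketched as follows: in the first address phase the lower 32 address bits appear on AD[31::0] together with the DAC command (1101, table 8.1) on C/BE#, and in the second phase the upper 32 bits appear together with the actual bus command. When the upper half is zero, a single address phase suffices. The type and function names are hypothetical.

```c
#include <stdint.h>

#define PCI_CMD_DAC 0xDu   /* Dual-Address Cycle, C/BE# = 1101 (table 8.1) */

/* Contents of AD[31::0] and C/BE#[3::0] during one address phase. */
typedef struct {
    uint32_t ad;
    uint8_t  cbe;
} pci_addr_phase_t;

/* Splits a 64-bit address into address phases and returns their count:
   1 when the upper 32 bits are zero (single address cycle), otherwise 2
   (low half with the DAC command, then high half with `cmd`). */
int pci_address_phases(uint64_t addr, unsigned cmd, pci_addr_phase_t ph[2])
{
    uint32_t lo = (uint32_t)addr;
    uint32_t hi = (uint32_t)(addr >> 32);

    if (hi == 0u) {
        ph[0].ad  = lo;
        ph[0].cbe = (uint8_t)(cmd & 0xFu);
        return 1;
    }
    ph[0].ad  = lo;                      /* first phase: low address bits */
    ph[0].cbe = (uint8_t)PCI_CMD_DAC;    /* ...with the DAC command */
    ph[1].ad  = hi;                      /* second phase: high address bits */
    ph[1].cbe = (uint8_t)(cmd & 0xFu);   /* ...with the real bus command */
    return 2;
}
```

The extra address phase costs one clock, which is why it is only used for addresses that do not fit in 32 bits.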
8.8 PLUG AND PLAY CONFIGURATION
Before the PCI bus, devices had to be configured manually to set their resources, such as memory space, I/O space and interrupts. Incorrect device configuration often led to hardware conflicts that were difficult to detect.
To simplify system modification, PCI supports the plug and play feature, which allows a system to be configured automatically at boot time. Each PCI device provides information about its resource requirements, which the PCI BIOS configuration software uses to determine the system topology. Once the configuration software has enough information about the system, it assigns non-conflicting resources to each PCI card.
8.8.1 PCI Configuration Space
The PCI specification defines a third address space, called the configuration space, in which every PCI device (function) gets 256 bytes. By reading from and writing to this space, software can determine the device's current status and change its operating mode.
Reads from and writes to the configuration space are executed via two registers, CONFIG_ADDRESS and CONFIG_DATA, which the host bridge translates into the configuration read and configuration write bus commands; see section 8.6.1.
CONFIG_ADDRESS identifies the bus segment, device, logical function and configuration register. Configuration data are transferred through the CONFIG_DATA register.
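On PC-compatible systems (configuration mechanism #1), CONFIG_ADDRESS and CONFIG_DATA are the 32-bit I/O ports 0xCF8 and 0xCFC. The sketch below only composes the CONFIG_ADDRESS value: bit 31 enables the access, bits 23 to 16 select the bus, bits 15 to 11 the device, bits 10 to 8 the function and bits 7 to 2 the DWORD register, with bits 1 and 0 zero. The port I/O itself is platform specific and omitted; the function name is hypothetical.

```c
#include <stdint.h>

/* Composes the value written to CONFIG_ADDRESS (I/O port 0xCF8 on PCs).
   `reg` is the byte offset of a DWORD register within the 256-byte
   configuration space; its two low bits are forced to zero. */
uint32_t pci_config_address(unsigned bus, unsigned dev, unsigned func,
                            unsigned reg)
{
    return 0x80000000u                         /* bit 31: enable       */
         | ((uint32_t)(bus  & 0xFFu) << 16)    /* bits 23..16: bus     */
         | ((uint32_t)(dev  & 0x1Fu) << 11)    /* bits 15..11: device  */
         | ((uint32_t)(func & 0x07u) << 8)     /* bits 10..8: function */
         | ((uint32_t)reg & 0xFCu);            /* bits 7..2: register  */
}
```

Reading the Vendor ID and Device ID of bus 0, device 2, function 0 would thus mean writing pci_config_address(0, 2, 0, 0x00) to CONFIG_ADDRESS and then reading 32 bits from CONFIG_DATA.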
8.8.2 Structure of Configuration Space
The first 64 bytes of the configuration space are reserved for a configuration header that contains identification data and information affecting the operating mode of the device. The remaining 192 bytes are available for device-specific configuration functions.
8.8.2.1 Configuration Header
Although three different types of configuration header exist, the type 0 configuration header described in this section is used by most PCI devices. The structure of the type 0 configuration header is shown in figure 8.8.
Figure 8.8: Type 0 configuration header.
8.8.2.1.1 Identification Registers
• Vendor ID: Identifies the manufacturer of the device; the value is assigned by the PCI SIG.
• Device ID: Identifies the device; the value is assigned by the vendor.
• Revision ID: Device-specific revision number, assigned by the vendor.
• Subsystem Vendor ID, Subsystem Device ID: Identify the add-in board or subsystem on which the device is used, so that boards built around the same chip can be told apart.
• Class code: Identifies the generic function of the device (e.g., mass storage controller, network controller, multimedia device such as a sound card); its sub-class and programming interface bytes further specify the implementation.
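Software usually reads the identification registers as two dwords, at offsets 0x00 and 0x08, and splits them into fields. The following sketch (names hypothetical) does this and also applies the usual presence test: a read from a slot without a device ends in a master abort and returns all ones, so a Vendor ID of 0xFFFF means that no device is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Identification fields of one PCI function. */
typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  revision_id;
    uint8_t  prog_if;     /* programming interface              */
    uint8_t  sub_class;
    uint8_t  base_class;  /* e.g. 0x02 = network controller     */
} pci_ident;

/* Decode the dwords read at configuration offsets 0x00 and 0x08.
 * Returns false if no function responded (Vendor ID reads 0xFFFF). */
static bool pci_decode_ident(uint32_t dword0, uint32_t dword8, pci_ident *id)
{
    if ((dword0 & 0xFFFFu) == 0xFFFFu)
        return false;
    id->vendor_id   = (uint16_t)(dword0 & 0xFFFFu);
    id->device_id   = (uint16_t)(dword0 >> 16);
    id->revision_id = (uint8_t)(dword8 & 0xFFu);
    id->prog_if     = (uint8_t)((dword8 >> 8) & 0xFFu);
    id->sub_class   = (uint8_t)((dword8 >> 16) & 0xFFu);
    id->base_class  = (uint8_t)(dword8 >> 24);
    return true;
}
```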
8.8.2.1.2 Command Register
The command register controls the behavior of the device: it determines to which PCI cycles the device responds and which PCI cycles it may generate. Its basic bits enable the device to:
• Respond to PCI I/O space accesses.
• Respond to PCI memory space accesses.
• Act as a bus master.
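In the command register, the three capabilities listed above correspond to bits 0, 1 and 2. The sketch below names these and a few other bits and shows the read-modify-write that configuration software typically performs to enable address decoding; the macro and function names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* Selected bits of the 16-bit command register (offset 0x04). */
#define PCI_CMD_IO_SPACE     0x0001u /* bit 0: respond to I/O space accesses    */
#define PCI_CMD_MEMORY_SPACE 0x0002u /* bit 1: respond to memory space accesses */
#define PCI_CMD_BUS_MASTER   0x0004u /* bit 2: may act as a bus master          */
#define PCI_CMD_SPECIAL      0x0008u /* bit 3: monitor special cycles           */
#define PCI_CMD_PARITY       0x0040u /* bit 6: respond to parity errors         */
#define PCI_CMD_SERR         0x0100u /* bit 8: enable the SERR# driver          */

/* New command value that enables decoding of the device's I/O and memory
 * ranges while leaving all other bits unchanged. A target-only device
 * does not need the bus master bit, so it is not set here. */
static uint16_t pci_enable_decoding(uint16_t command)
{
    return (uint16_t)(command | PCI_CMD_IO_SPACE | PCI_CMD_MEMORY_SPACE);
}
```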
8.8.2.1.3 Status Register
The status register reports the current status of the device with respect to events such as target aborts and system or parity errors. It also provides information about the capabilities of the device, e.g. support for 66 MHz operation and the DEVSEL# timing.
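The events and capabilities mentioned above occupy individual bits of the status register. A sketch with our own macro names that extracts the DEVSEL# timing and checks the error bits; note that the error bits are cleared by writing a 1 to them, while writing a 0 leaves them unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Selected bits of the 16-bit status register (offset 0x06). */
#define PCI_STS_CAP_LIST         0x0010u /* bit 4: capabilities list present */
#define PCI_STS_66MHZ            0x0020u /* bit 5: 66 MHz capable            */
#define PCI_STS_DEVSEL_MASK      0x0600u /* bits 10..9: DEVSEL# timing       */
#define PCI_STS_SIG_TARGET_ABORT 0x0800u /* bit 11 */
#define PCI_STS_RCV_TARGET_ABORT 0x1000u /* bit 12 */
#define PCI_STS_RCV_MASTER_ABORT 0x2000u /* bit 13 */
#define PCI_STS_SIG_SYSTEM_ERROR 0x4000u /* bit 14 */
#define PCI_STS_PARITY_ERROR     0x8000u /* bit 15: detected parity error    */

/* DEVSEL# timing: 0 = fast, 1 = medium, 2 = slow. */
static unsigned pci_devsel_timing(uint16_t status)
{
    return (status & PCI_STS_DEVSEL_MASK) >> 9;
}

/* True if any abort or error event is recorded. */
static bool pci_has_error(uint16_t status)
{
    return (status & (PCI_STS_SIG_TARGET_ABORT | PCI_STS_RCV_TARGET_ABORT |
                      PCI_STS_RCV_MASTER_ABORT | PCI_STS_SIG_SYSTEM_ERROR |
                      PCI_STS_PARITY_ERROR)) != 0;
}
```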
8.8.2.1.4 Built-in Self Test Register
This optional register provides a mechanism for self-testing the device. Software can use it to determine whether the device supports a built-in self test (BIST), to start the test and to check its result.
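The BIST register is a single byte at offset 0x0F: bit 7 indicates BIST support, writing a 1 to bit 6 starts the test (the device clears the bit when it finishes), and bits 3 to 0 hold a completion code, 0 meaning success. The sketch below, with hypothetical names, interprets a value read from this register.

```c
#include <assert.h>
#include <stdint.h>

/* BIST register (offset 0x0F). */
#define PCI_BIST_CAPABLE   0x80u /* bit 7: device supports BIST            */
#define PCI_BIST_START     0x40u /* bit 6: 1 while the test is running      */
#define PCI_BIST_CODE_MASK 0x0Fu /* bits 3..0: completion code, 0 = passed  */

typedef enum { BIST_UNSUPPORTED, BIST_RUNNING, BIST_PASSED, BIST_FAILED } bist_state;

static bist_state pci_bist_state(uint8_t bist)
{
    if (!(bist & PCI_BIST_CAPABLE))
        return BIST_UNSUPPORTED;
    if (bist & PCI_BIST_START)
        return BIST_RUNNING;          /* device has not cleared bit 6 yet */
    return (bist & PCI_BIST_CODE_MASK) == 0 ? BIST_PASSED : BIST_FAILED;
}
```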
8.8.2.1.5 Optimization Registers
The Latency Timer, Cache Line Size, Max_Lat and Min_GNT registers form a group of registers that allow designers to tune system performance by modifying values that affect the timing of PCI transactions.
8.8.2.1.6 Base Address Registers (BARs)
The base address registers provide the mechanism that allows the PCI BIOS configuration software to determine the memory and/or I/O address space requirements of a device. Once the system topology is determined, the configuration software writes the corresponding non-conflicting addresses into the base address registers. The type 0 configuration header supports up to six base address registers, each holding the start address of an independent address range.
An address range can lie either in memory space or in I/O space; bit 0 of the base address register tells which. A memory range can be placed anywhere in the 32-bit or 64-bit address space. In addition, a memory range can be marked as prefetchable, which means that reads have no side effects: multiple reads of the same memory location return the same result, so data may be read ahead.
An I/O range can be located only in the 32-bit address space and cannot be marked as prefetchable.
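The requirements mentioned above are discovered by a sizing procedure: configuration software saves a BAR, writes 0xFFFFFFFF to it, reads it back and restores the original value. The device hard-wires the address bits below the size of its range to zero, so the lowest address bit that reads back as one gives the size (and the required alignment). A sketch of this calculation with hypothetical names; the upper half of a 64-bit BAR is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_BAR_IO           0x1u /* bit 0: 1 = I/O space, 0 = memory space */
#define PCI_BAR_MEM_TYPE     0x6u /* bits 2..1: 00 = 32-bit, 10 = 64-bit     */
#define PCI_BAR_PREFETCHABLE 0x8u /* bit 3, memory BARs only                 */

/* Size of the range requested by a BAR, given the value read back after
 * 0xFFFFFFFF was written to it. Returns 0 for an unimplemented BAR. */
static uint32_t pci_bar_size(uint32_t readback)
{
    /* Strip the type bits: 2 bits for I/O BARs, 4 bits for memory BARs. */
    uint32_t mask = (readback & PCI_BAR_IO) ? 0xFFFFFFFCu : 0xFFFFFFF0u;
    uint32_t addr_bits = readback & mask;
    return addr_bits & (~addr_bits + 1u);   /* isolate lowest set bit */
}

static bool pci_bar_is_prefetchable(uint32_t bar)
{
    return !(bar & PCI_BAR_IO) && (bar & PCI_BAR_PREFETCHABLE);
}
```

For instance, a memory BAR reading back 0xFFFE0000 requests 128 KB, and an I/O BAR reading back 0x0000FF81 requests 128 bytes. Taking the lowest set bit rather than inverting the whole value also works for I/O BARs that implement only the lower 16 address bits.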
8.8.2.1.7 Expansion ROM Base Address Register
Similar to the base address registers, the expansion ROM base address register contains the base address of the ROM memory on the device (the expansion ROM).
8.8.2.2 Capabilities List
Revision 2.2 of the PCI specification introduces a new mechanism for providing additional information about a PCI device, the capabilities list. The list resides in the device-specific portion of the configuration space of a function, that is, in the 192 bytes that follow the 64-byte configuration header.
A dedicated bit in the status register indicates whether a capabilities list is present. The CapPntr field in the configuration header (see figure 8.8) contains the offset of the first item of the list.
The capabilities list is implemented as a linked list in which each item consists of an 8-bit capability ID and an 8-bit offset of the next item, followed by additional capability-specific bytes, see figure 8.9. A next offset of zero marks the last item.
Figure 8.9: Structure of capabilities list.
The capabilities list is used to identify new and optional PCI features.
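Walking the list amounts to following the next offsets starting from CapPntr (offset 0x34). The sketch below, with hypothetical names, searches a 256-byte copy of a function's configuration space for a given capability ID; well-known IDs include 0x01 (power management), 0x03 (vital product data) and 0x05 (message signalled interrupts). The iteration bound is a defensive assumption against malformed lists.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_STATUS          0x06u   /* offset of the status register */
#define PCI_STATUS_CAP_LIST 0x0010u /* status bit 4: list is present */
#define PCI_CAP_PTR         0x34u   /* offset of CapPntr             */

/* Return the configuration-space offset of the first capability with the
 * given ID, or 0 if the function has no such capability. cfg is a copy of
 * the function's 256-byte configuration space. */
static uint8_t pci_find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    uint16_t status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
    if (!(status & PCI_STATUS_CAP_LIST))
        return 0;

    /* The two low bits of every pointer are reserved and must be masked. */
    uint8_t ptr = (uint8_t)(cfg[PCI_CAP_PTR] & 0xFCu);

    /* Items live above the 64-byte header and take at least 4 bytes, so at
     * most 48 of them fit; the bound stops a malformed, looping list. */
    for (int i = 0; i < 48 && ptr >= 0x40u; ++i) {
        if (cfg[ptr] == cap_id)                  /* byte 0: capability ID */
            return ptr;
        ptr = (uint8_t)(cfg[ptr + 1] & 0xFCu);   /* byte 1: next, 0 = end */
    }
    return 0;
}
```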
8.8.2.3 Vital Product Data
Vital product data (VPD) is additional information about the device, such as its part number and serial number. It can also be used to store performance and failure data. Vital product data reside in a storage device, such as an EEPROM, on the PCI card.
8.8.3 PCI BIOS
The PCI BIOS provides a standard, hardware-independent means of accessing the configuration space of PCI devices. It is accessible from all operating modes of x86 processors.
The PCI BIOS functions allow software to locate PCI devices (find PCI device, find PCI class code), to access PCI configuration registers and to use other PCI functions (e.g., generate special cycles).
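In real mode, the PCI BIOS is called through software interrupt INT 1Ah with AH = B1h; the function number is passed in AL and the bus/device/function triple in BX. The sketch below lists the function codes as we recall them from the PCI BIOS specification (to be verified against it) and packs the BX value; issuing the interrupt itself needs a real-mode environment such as DOS and is not shown. All names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* PCI BIOS function codes, loaded into AX before INT 1Ah. */
enum pci_bios_function {
    PCI_BIOS_PRESENT       = 0xB101,
    PCI_FIND_DEVICE        = 0xB102,
    PCI_FIND_CLASS_CODE    = 0xB103,
    PCI_GENERATE_SPECIAL   = 0xB106,
    PCI_READ_CONFIG_BYTE   = 0xB108,
    PCI_READ_CONFIG_WORD   = 0xB109,
    PCI_READ_CONFIG_DWORD  = 0xB10A,
    PCI_WRITE_CONFIG_BYTE  = 0xB10B,
    PCI_WRITE_CONFIG_WORD  = 0xB10C,
    PCI_WRITE_CONFIG_DWORD = 0xB10D
};

/* BX for the configuration access functions: BH = bus number,
 * BL bits 7..3 = device number, BL bits 2..0 = function number.
 * (The register offset itself is passed separately, in DI.) */
static uint16_t pci_bios_bx(uint8_t bus, uint8_t device, uint8_t function)
{
    assert(device < 32u && function < 8u);
    return (uint16_t)((bus << 8) | (device << 3) | function);
}
```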
References [4], [5] and [6] of the bibliography give a detailed description of the PCI bus and its use for data transmission.
9. PLX HARDWARE AND SOFTWARE DEVELOPMENT TOOLS
This chapter describes the hardware and software tools used to develop an application capable of data transfer across the PCI bus. For this purpose, the PCI 9050 Reference Design Kit (RDK) and the PCI 9050 Software Design Kit (SDK) were chosen. Both are products of PLX Technology (www.plxtech.com) and together provide comprehensive means for developing PCI-based applications.
9.1 PCI 9050 BUS TARGET INTERFACE CHIP
The PLX PCI 9050RDK is a complete hardware development tool for PCI-based applications. Its core is the PCI 9050 bus interface chip which, together with an I/O daughter card connector, test headers and a breadboard area, allows the user to implement and test new circuitry easily.
Figure 9.1: PCI 9050 bus interface chip.
The PCI 9050 bus interface chip provides a PCI bus slave (target) interface for adapter boards. It is designed to connect a wide variety of local bus designs to the PCI bus and allows local bus circuitry to achieve burst transfers of up to 132 MB/s.
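The 132 MB/s figure is the peak rate of the PCI data path, assuming a 32-bit (4-byte) bus clocked at 33 MHz with one data phase per clock during a burst, and ignoring the address phase and any wait states:

```latex
B_{\max} = f_{\mathrm{PCI}} \cdot w
         = 33 \times 10^{6}\,\mathrm{s^{-1}} \times 4\,\mathrm{B}
         = 132 \times 10^{6}\,\mathrm{B/s} \approx 132\ \mathrm{MB/s}
```

With the nominal 33.33 MHz PCI clock, the same calculation gives about 133 MB/s.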
The PCI 9050 chip contains read and write FIFOs between the PCI bus and the local bus. It also provides five local address spaces and four chip select signals.
9.1.1 PCI 9050 Main Features
The main features of the PCI 9050 chip, shown in figure 9.1, can be summarized as follows:
• Compliance with PCI specification revision 2.1, supporting low-cost slave adapters.
• Burst transfers to memory space.
• Interrupt generation.
• Local bus clock that can run independently of (asynchronously to) the PCI clock.
• Programmable local bus configuration supporting 8-, 16- and 32-bit local buses in multiplexed or non-multiplexed mode.
• Serial EEPROM interface for loading configuration information.
• Five independent local address spaces.
• Four local chip select signals.
• Programmable timing of local bus data transfers.
9.1.2 PCI Bus Interface of PCI 9050 Bus Interface Chip
The PCI 9050 is compliant with PCI specification revision 2.1 and supports all PCI bus functions required of a direct slave (target) interface chip.
As a target, the PCI 9050 chip allows access to its internal registers and to its local address spaces. Data transfers can be 8, 16 or 32 bits wide, and all bus commands listed in table 8.1 are supported.
9.1.3 Local Bus Interface of PCI 9050 Bus Interface Chip
The local bus provides a data path between the PCI bus and a non-PCI device. The PCI 9050 chip, which is the local bus master, mediates communication between the two buses and is responsible for transferring data between them.
The local bus can be viewed as a set of address, data and control lines to which user-specific circuitry can be attached.
The local bus supports 8-, 16- and 32-bit wide data transfers and, depending on the settings, operates in multiplexed or non-multiplexed mode. Each local address space (four general-purpose spaces plus the expansion ROM space) has its own set of configuration registers that determine the local bus characteristics used when that address space is accessed.
9.1.3.1 Local Bus Signals
Similarly to the PCI signals described in section 8.4, the local bus signals can be divided into several groups according to their purpose: address/data, control/status and arbitration signals.
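9.1.3.1.1 Address and Data Signals
• LA[27::2]: Local address lines.
• LAD[31::0]: Local address/data lines; during data phases they carry the data (in multiplexed mode they also carry the address during the address phase).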
9.1.3.1.2 Control and Status Signals
• ADS#, ALE: A local bus access starts when ADS# and ALE are asserted, indicating a valid address on the address lines.
• LBE#[3::0]: Byte enables; they indicate which byte lanes carry valid data.
• LRDY#: If enabled, this input indicates that the local device is ready to be read from or written to.
• LW/R#: Indicates the direction of the data transfer (high for a write, low for a read).
• WAITO#: Reflects the status of the internal wait state generator, whose wait states are used to adjust the timing of local bus transactions.
• RD#, WR#: General-purpose strobes indicating a read or write operation to the local device.
9.1.3.1.3 Local Bus Arbitration
• LHOLD: Asserted by the PCI 9050 to request use of the local bus.
• LHOLDA: Asserted by the local bus arbiter to grant the PCI 9050 control of the local bus in response to LHOLD.
The signals listed above are the main local bus signals; all of them are described in detail in the PCI 9050 data book by PLX Technology.
9.1.3.2 Modification of Local Bus Timing
Read and write cycles can be extended with internally generated wait states and/or by delaying the LRDY# signal.
9.1.4 Single Cycle Write and Read
Figures 9.2 and 9.3 show the details of a single write and a single read on the local bus.
Figure 9.2: Single local bus write.
Figure 9.3: Single local bus read.
9.1.5 PCI Configuration Registers and Local Configuration Registers
PCI configuration registers are grouped in a structure called the configuration
header as described in section 8.8.2.1.
The local configuration registers are a set of twenty-one 32-bit registers whose values determine the local bus characteristics of the five local address spaces, such as the local base address, address range, timing and chip select settings.
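For orientation, the 21 registers can be listed as offsets relative to the address held in Base Address Register0 (the same base used for the local control register in section 10.2.4.1). The offsets below are our reading of the PLX PCI 9050 data book and should be checked against it before use; the sketch only encodes them, with our own names, and verifies that there are twenty-one registers, the last one at offset 0x50.

```c
#include <assert.h>
#include <stdint.h>

/* PCI 9050 local configuration register offsets (assumed from the PLX
 * data book; verify before use). */
enum plx9050_local_reg {
    LAS0RR  = 0x00, LAS1RR  = 0x04, LAS2RR  = 0x08, LAS3RR  = 0x0C, /* ranges        */
    EROMRR  = 0x10,                                                 /* ROM range     */
    LAS0BA  = 0x14, LAS1BA  = 0x18, LAS2BA  = 0x1C, LAS3BA  = 0x20, /* local bases   */
    EROMBA  = 0x24,                                                 /* ROM base      */
    LAS0BRD = 0x28, LAS1BRD = 0x2C, LAS2BRD = 0x30, LAS3BRD = 0x34, /* bus regions   */
    EROMBRD = 0x38,                                                 /* ROM region    */
    CS0BASE = 0x3C, CS1BASE = 0x40, CS2BASE = 0x44, CS3BASE = 0x48, /* chip selects  */
    INTCSR  = 0x4C,                                                 /* interrupts    */
    CNTRL   = 0x50                                                  /* user I/O etc. */
};

/* One 32-bit register every 4 bytes from offset 0x00 to 0x50. */
#define PLX9050_NUM_LOCAL_REGS ((CNTRL - LAS0RR) / 4 + 1)
_Static_assert(PLX9050_NUM_LOCAL_REGS == 21, "twenty-one 32-bit registers");

/* Index of a register in a uint32_t array holding a snapshot of all 21. */
static unsigned plx9050_reg_index(enum plx9050_local_reg reg)
{
    assert(reg % 4 == 0 && reg <= CNTRL);
    return (unsigned)reg / 4u;
}
```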
9.1.6 Serial EEPROM
At power-on, the PCI RST# signal resets the internal registers of the PCI 9050 chip to their default values. In response to RST#, the PCI 9050 asserts the local reset signal LRESET and checks for a serial EEPROM. If a serial EEPROM is present, the internal registers are loaded with the values stored in it; otherwise the default values are used.
9.1.7 Local Chip Select
The PCI 9050 provides four chip select signals that selectively enable devices attached to the local bus. Each signal is programmed through its own chip select base address register, so no external address decoding logic is needed to generate the chip select signals.
9.2 PCI 9050 REFERENCE DESIGN KIT (RDK)
The PCI 9050RDK was used in this project to investigate the capabilities of the PCI bus and its use for data transfer.
PLX Technology, the manufacturer of the PCI 9050RDK, also provides software support, which together with the board forms a complete development environment for PCI applications.
The core of the PCI 9050RDK is the PCI 9050 bus interface chip described in section 9.1. The main purpose of this development board is to let developers convert existing ISA cards (sound cards, network cards, etc.) quickly and easily into PCI-compliant boards that enjoy the advantages of the PCI specification, such as high transfer rates and plug and play. For this reason, the PCI 9050RDK carries a piggyback ISA slot into which a functional 8- or 16-bit ISA card can be plugged.
The PCI 9050RDK also carries test headers and an I/O daughter card connector, which provide most of the local bus signals of the PCI 9050 chip. Together with a breadboard area, these connectors can be used to give user-built circuitry a PCI interface.
9.2.1 Main features of PCI 9050 Reference Design Kit
The main features of the PCI 9050RDK board can be summarized as follows:
• PCI 9050 bus interface chip: PCI specification revision 2.1 compliant board
based on PLX PCI 9050 bus interface chip.
• Generic and ISA bus: On-board PCI-to-ISA conversion logic and a piggyback ISA slot support ISA bus adapters.
• User circuitry support: A large prototyping area, test headers and a daughter card connector simplify circuitry development.
• PLXMon: A comprehensive software tool for PCI bus monitoring and debugging.
Figure 9.4: PLX PCI 9050RDK block diagram.
9.2.2 PCI 9050RDK Subsystems
This section describes subsystems located on the RDK. As can be seen in
figure 9.4, the hardware of the PCI 9050RDK consists of the following subsystems:
• PCI slot interface
• PCI 9050 bus interface chip
• SRAM memory subsystem
• ISA slot and PCI-to-ISA conversion logic to facilitate the conversion of existing
ISA boards into a PCI platform.
• Daughter card connector to which new circuitry can be attached.
• Prototyping area and headers for testing and experimenting.
9.2.2.1 PCI Interface
The PCI 9050RDK, which is fully compliant with PCI specification revision 2.1, can be plugged directly into a PCI slot. As it is a target-only device, the PCI bus mastering signals (REQ#, GNT#) are not used.
9.2.2.2 PCI 9050 Bus Interface Chip
The PCI 9050 bus interface chip, the core of the RDK, interfaces the PCI bus to the RDK subsystems connected to the local bus. See section 9.1 for more details on the PCI 9050 bus interface chip.
Jumpers set operational characteristics of the local bus such as:
• Multiplexed or non-multiplexed local bus.
• Local bus clock frequency: The local bus clock, which runs asynchronously to the PCI clock, can have any frequency up to 40 MHz; the jumpers select 8 MHz, 16 MHz or 33 MHz.
• Local bus interrupts: The two local interrupts can either be user-defined or be routed to the ISA IRQ and ISA bus error signals.
• Local bus chip selects: The four chip selects can be user-mapped or connected to the ISA or SRAM subsystems.
9.2.2.3 SRAM Subsystem
The RDK carries a 32-bit wide static random access memory (SRAM) to demonstrate memory accesses from the PCI bus. The memory is 32 K DWORDs (128 KB) deep, operates at 33 MHz and supports zero-wait-state reads and writes.
9.2.2.4 ISA Subsystem
The main purpose of the PCI 9050RDK is to help upgrade existing ISA bus designs into their PCI counterparts. For this reason, the board provides a piggyback ISA slot into which an ISA card can be plugged. A MACH210 programmable logic device (PLD) performs the signal conversion between the PCI and ISA buses.
9.2.2.5 Daughter Card Connector and Prototyping Area
Together with the daughter card connector, test headers provide most of the local bus signals and allow easy signal monitoring. Combined with the prototyping area, this subsystem is well suited for developing and testing simple devices.
9.3 PCI 9050 SOFTWARE DESIGN KIT (SDK) AND PLXMON
PLXMon by PLX Technology is an interactive program designed for working with PCI cards of the PLX family; it can also be used for low-level control of PCI cards that are not based on PLX chips.
Both versions, for DOS and for Windows 95, allow the user to do the following with generic PCI cards:
• Select a PCI device.
• Examine and modify the configuration registers of the device.
• Examine and modify memory on the device (32-bit addressing).
• Examine and modify the I/O space of the device (16-bit addressing).
For devices of the PLX family, PLXMon can also examine and modify:
• Local configuration registers
• Serial EEPROM
10. DESIGNED PCI DEVICE
To show a practical use of the PCI bus, a simple device capable of data transfer via the PCI bus was developed. The device, described in the following sections, allows data exchange between two computers: data are written into the device via the PCI bus of one computer and read from it by another computer via its parallel port (LPT). Two programs demonstrate the functionality of the device.
As the purpose of this elementary example is to introduce the basic features of the PCI bus in practice, more advanced techniques are not employed.
10.1 APPLICATION OVERVIEW
The block diagram of the application is shown in figure 10.1.
Figure 10.1: Block diagram of the application.
As mentioned above, the example can be used for data transfer between two computers. The data transfer can be divided into the following steps:
• In the software layer of PC1, a program sends data over the PCI bus.
• In the hardware layer of PC1, the data are taken from the PCI bus and stored by the PCITOLPT device, which is the PCI 9050RDK with added circuitry that latches data from the PCI bus.
• The parallel port of PC2 acquires the data from the PCITOLPT device.
• A program in the software layer of PC2 displays and/or processes the data received from PC1.
The communication is one-way, from PC1 (PCI) to PC2 (LPT), and the transferred data are 8 bits wide.
10.2 HARDWARE PART OF DEVICE
As data written to the PCI bus are valid only for a period of time determined by the PCI control signals, the PCI device must recognize when the data are valid and store them, so that they are available for further processing. The PCI 9050RDK together with a simple additional circuit was used to accomplish this task.
10.2.1 Latch Circuitry on PCI 9050RDK
The circuitry described in this section stores the 8-bit data received from the PCI bus.
A simplified schematic of the circuitry, which is located between the local bus of the PCI 9050 chip and the parallel port, is shown in figure 10.2. See appendix A2 for the complete schematic of the latch circuitry.
Figure 10.2: Scheme of the latch circuitry.
The PCITOLPT device has to meet the following requirements:
• When its address appears on the PCI bus, it must recognize it and latch the 8-bit data during the subsequent data phase.
• It must set a busy flag indicating that data are ready and notify the parallel port.
• It must allow the parallel port to clear the busy flag once the data have been read from the PCITOLPT device.
• It must be able to set its data output pins to the high-impedance state.
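The four requirements can be summarized as a small state machine. The following is a software model only, with hypothetical names, and not a description of the circuit in appendix A2: the latch, the busy flag that drives IRQ and the output enable are plain variables, and refusing a second write while the flag is set stands for the assumption that the sender waits for the receiver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software model of the PCITOLPT handshake. */
typedef struct {
    uint8_t latch;         /* last byte captured from the local bus   */
    bool    busy;          /* set on capture, drives IRQ to the LPT   */
    bool    output_enable; /* false = latch outputs in high impedance */
} pcitolpt_model;

/* PCI side: a write with CS# active latches the byte and sets the busy
 * flag. Returns false, keeping the old byte, if it has not been read yet. */
static bool model_pci_write(pcitolpt_model *d, uint8_t byte)
{
    if (d->busy)
        return false;
    d->latch = byte;
    d->busy = true;            /* IRQ asserted towards the parallel port */
    return true;
}

/* LPT side: the interrupt service routine enables the latch outputs,
 * reads the byte, releases the data lines and pulses AckIrq. */
static bool model_lpt_read(pcitolpt_model *d, uint8_t *byte)
{
    if (!d->busy)
        return false;          /* no IRQ pending, nothing to read */
    d->output_enable = true;
    *byte = d->latch;
    d->output_enable = false;  /* back to high impedance */
    d->busy = false;           /* AckIrq de-asserts IRQ  */
    return true;
}
```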
10.2.2 Timing Diagrams
The communication protocol for data transfer between the PCI bus and the parallel port can be divided into three phases: the PCI 9050 chip first takes the data into its write FIFO, then moves them to the local bus, where they are captured by the octal latch, and finally the latched data are read by the parallel port of the other computer.
10.2.2.1 PCI Bus to PCI 9050 Write FIFO Phase
During this phase, data are moved from the PCI bus into the write FIFO of the PCI 9050 chip, see figure 10.3.
1. The PCI bus is idle, so the bus master can start the transaction.
2. The bus master drives the address of the PCI 9050 device and the bus command and asserts the FRAME# signal.
3. The PCI 9050 chip recognizes its address and asserts its DEVSEL# signal.
4. The bus master drives the data and asserts IRDY#. When the PCI 9050 chip is ready to accept the data into its internal FIFO, it asserts TRDY#. The data are transferred on the clock edge at which both IRDY# and TRDY# are asserted; with this clock, the PCI bus to PCI 9050 FIFO transaction is complete and the data are in the FIFO.
Figure 10.3: Data transfer between PCI bus and PCI9050 write FIFO.
10.2.2.2 PCI 9050 Write FIFO to Octal Latch Phase
During this phase, data are moved from the write FIFO to the local bus and captured by the latch circuitry, see figure 10.4. This phase is synchronous with the local bus clock (8 MHz), which runs asynchronously to the PCI clock.
1. A valid address is put on the local address bus.
2. The ADS# signal indicates a valid address on the local bus. At the same time, the PCI 9050 chip decodes the address with its internal address decoder and, as a result, asserts the chip select signal CS#.
3. Valid data are presented and the WR# signal is asserted. At the rising edge of the local bus clock, the data are stored in the octal latch. The IRQ signal is asserted to indicate to the parallel port that data are ready to be read from the latch.
4. The data from the write FIFO are now held in the latch circuitry, and an interrupt request to the parallel port is pending.
Figure 10.4: Data transfer between PCI 9050 FIFO and latch circuitry.
10.2.2.3 Octal Latch to Parallel Port Phase
During this asynchronous phase, data are moved from the octal latch to the parallel port, see figure 10.5.
Figure 10.5: Data transfer between latch circuitry and parallel port.
1. Once the IRQ signal is asserted, the parallel port raises an interrupt that tells its driver that data are ready to be read. The interrupt service routine then reads the data from the parallel port.
2. The interrupt service routine signals with AckIrq that the data have been read from the PCITOLPT device. AckIrq de-asserts the IRQ signal.
10.2.3 Parallel Port Configuration
In the present configuration, the parallel port works in bi-directional mode and two pins are used for data flow control. The input signal IRQ triggers the interrupt, whereas the output signal AckIrq clears it after the data have been read.
10.2.4 Application Registers
The application uses a set of registers of the PCI 9050 and of the parallel port to read and write data and to obtain status information.
10.2.4.1 PCITOLPT Registers
• Base Address Register2: Contains the start address of the first local address space of the PCI 9050 chip. A write into this address space asserts the chip select signal CS#, which controls the octal latch.
• Base Address Register0 + offset 0x50: Local control register, which controls the pins User 0 and User 1. Through this register, the application reads the status of the input pin User 0 and controls the output pin User 1.
10.2.4.2 LPT Registers
• BASE (0x378): Base address of the parallel port. In bi-directional mode, this register contains the values present on data pins 0 to 7.
• BASE + offset 0x2: Control register of the parallel port. Through this register, the interrupt is enabled and the parallel port is set into bi-directional mode.
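The control register bits involved are those of the standard PC parallel port: bit 4 enables the interrupt, which is triggered by the nACK input (pin 10), and bit 5 turns the data lines around for input on ports configured for bi-directional operation. The sketch below, with our own macro and function names, computes the control value for the receiving side; that the IRQ signal of the PCITOLPT device is wired to nACK is an assumption, since the wiring is given only in appendix A2.

```c
#include <assert.h>
#include <stdint.h>

#define LPT1_BASE      0x378u
#define LPT_DATA(b)    (b)         /* data register, data pins 0..7 */
#define LPT_STATUS(b)  ((b) + 1u)  /* status register               */
#define LPT_CONTROL(b) ((b) + 2u)  /* control register              */

/* Control register bits of a standard / bi-directional parallel port. */
#define LPT_CTRL_IRQ_ENABLE 0x10u  /* bit 4: interrupt on the nACK input */
#define LPT_CTRL_DATA_INPUT 0x20u  /* bit 5: data pins become inputs     */

/* Control value for the receiving PC: interrupt enabled and data port
 * switched to input; the low four bits (strobe, autofeed, init,
 * select-in) keep their current state. */
static uint8_t lpt_receiver_control(uint8_t current)
{
    return (uint8_t)(current | LPT_CTRL_IRQ_ENABLE | LPT_CTRL_DATA_INPUT);
}
```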
10.3 SOFTWARE PART OF DEVICE
This section describes the software application that was used to demonstrate
communication between two computers.
10.3.1 Device Driver
As the software application had to run under the Windows operating system, which does not allow applications direct access to hardware registers, device drivers had to be developed for the PCITOLPT device and the parallel port. The WinDriver tool by KRFTech (www.krftech.com) was chosen for this task.
Creating a device driver with the WinDriver tool consists of several steps:
1. After WinDriver is started, a device is chosen from a list. In this step it is also possible to create an *.inf file, which Windows uses to register the device.
2. WinDriver detects all accessible registers of the device.
3. The user then chooses which registers are to be read or written from the Windows application and can assign names to them; these names are used in the function calls to the device driver. WinDriver also supports interrupts.
4. WinDriver then creates two files, *_lib.h and *_lib.c, which contain functions for accessing the registers defined in step 3. Once these files are included in a software project, the registers can be controlled from a Windows application.
WinDriver was used to create the following functions for the PCI2LPT device (PCI2LPT_lib.*):
• PCI2LPT_WritePciByteLatch: Writes data to the address stored in Base Address Register2.
• PCI2LPT_WriteControl, PCI2LPT_ReadControl: Write/read data to/from the local control register. These functions control both pins, User 0 and User 1.
WinDriver was also used to create the following functions for the parallel port (LPT_lib.*):
• LPT_ReadData: Reads data from the base register of the parallel port.
• LPT_WriteControl: Writes into the control register and can thereby set the operating mode of the parallel port (interrupt, bi-directional mode).
10.3.2 Example Software Application
A software application demonstrating data transfer between two computers was developed under Windows using Microsoft Visual C++ 6.0. To make the source code easy for others to follow, the application was built as a Win32 console program.
The software application consists of two stand-alone programs, WritePCI and ReadLPT. The WritePCI program writes byte data into the PCITOLPT device, and the ReadLPT program reads the data from the PCITOLPT device via the parallel port. The programs communicate through the PCITOLPT device and the parallel port in such a way that text typed into WritePCI appears in the window of the ReadLPT program, see figures 10.6 and 10.7.
Figure 10.6: WritePCI program writes data into the PCITOLPT device.
Figure 10.7: ReadLPT program reads data stored in the PCITOLPT device.
The WritePCI program runs on the computer equipped with the PCITOLPT device, whereas the ReadLPT program runs on the other computer, which has a bi-directional parallel port. The two computers are connected as shown in figure 10.1.
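A minimal sketch of a sending loop that WritePCI could use. The signatures of the WinDriver-generated functions are not given in this report, so device access is hidden behind two hypothetical hooks, one standing for PCI2LPT_WritePciByteLatch and one for reading the busy flag (e.g. through the User 0 pin with PCI2LPT_ReadControl). Polling the busy flag before each write is an assumption about the program, not a description of its source code; the test doubles at the end let the loop run without hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks standing in for the WinDriver-generated functions. */
typedef struct {
    bool (*busy)(void *ctx);                  /* busy flag, e.g. via User 0 */
    void (*write_byte)(void *ctx, uint8_t b); /* write into the latch space */
    void *ctx;
} pci2lpt_io;

/* Send a NUL-terminated string byte by byte, waiting before each byte until
 * the receiver has acknowledged the previous one. max_polls bounds each
 * wait so a disconnected receiver cannot hang the sender.
 * Returns the number of bytes actually written. */
static size_t send_text(const pci2lpt_io *io, const char *text,
                        unsigned long max_polls)
{
    size_t sent = 0;
    for (; text[sent] != '\0'; ++sent) {
        unsigned long polls = 0;
        while (io->busy(io->ctx)) {
            if (++polls > max_polls)
                return sent;              /* give up: receiver not reading */
        }
        io->write_byte(io->ctx, (uint8_t)text[sent]);
    }
    return sent;
}

/* Test doubles: a receiver that consumes every byte at once, and one that
 * never does. */
typedef struct { char buf[64]; size_t len; } fake_link;

static bool fake_idle(void *ctx)  { (void)ctx; return false; }
static bool fake_stuck(void *ctx) { (void)ctx; return true; }

static void fake_write(void *ctx, uint8_t b)
{
    fake_link *link = ctx;
    if (link->len + 1 < sizeof link->buf) {
        link->buf[link->len++] = (char)b;
        link->buf[link->len] = '\0';
    }
}
```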
11. SUMMARY OF PCI PART
The PCI part of this project introduces the Peripheral Component
Interconnect bus and explores its possibilities for data transfer.
To show the use of the PCI bus in practice, circuitry was built on the PLX PCI 9050RDK board which, together with a parallel port, allows simplex data transfer between two computer systems. The functionality of this configuration was successfully verified with a Windows-based application that sends typed text from one computer system to the other.
CONCLUSION
The purpose of the DSP part of the project was to develop several examples
that would allow students to familiarize themselves with the domain of digital signal
processing.
One group of examples focuses on practical aspects of the TMS320C6701 Evaluation Module, which was used in this project to design and test DSP algorithms. The applications in this group make it possible to change the operational characteristics of the 16-bit codec and thus to investigate thoroughly the process of analog-to-digital and digital-to-analog conversion. This is important because in many cases both the input and the output signals are analog, so they have to be converted from analog to digital form and back.
The examples in the second group introduce different approaches to implementing DSP algorithms on digital signal processors. There are basically two ways: low-level and high-level implementation. To introduce the low-level approach, the Least Mean Square (LMS) adaptive filter and the Fast Fourier Transform (FFT) algorithms were implemented in the C language. For the high-level approach, Matlab and its ability to generate an executable for the TMS320C6701 EVM from a Simulink model were used to implement finite and infinite impulse response digital filters. A simple graphical user interface was designed to simplify the conversion of a Simulink model into an executable that can be run on the TMS320C6701 EVM.
The mathematical theory of the implemented DSP algorithms, together with a description of the TMS320C6701 EVM and the TMS320C6701 DSP processor, is also part of the project.
In the future, more complex DSP algorithms could be tested with Matlab to explore the possibilities of high-level implementation fully. Furthermore, optional features of the 'C6701 digital signal processor could be exploited. For instance, DSP/BIOS and its real-time data exchange might be used to transfer data between the DSP core and an OLE client such as Matlab or Microsoft Excel.
The main goal of the PCI part of the project was to design a PCI-based device for investigating the possibilities of data transfer via the PCI bus. Furthermore, the application handling the PCI device was required to run under Windows. For this reason, the report also explains the process of device driver development with the WinDriver tool.
The PCI part starts with a short introduction to the PCI bus, which presents PCI bus features such as bus arbitration, the bus protocol and the plug and play mechanism. This chapter is followed by a description of the PCI 9050 bus interface chip, which, together with the PLX PCI 9050RDK development board, was used in this project to develop a PCI-compliant device.
The designed PCI device uses the PCI bus for simplex byte data transfer between two computers. Data are passed through the PCI 9050 chip and latched in the added circuitry, from where they can be read via the parallel port of another computer. A communication protocol implemented between the PCI device and the parallel port ensures proper data flow between the two computer systems.
To demonstrate the possibilities of the designed PCI device, two Windows-based programs were created. The first program transfers typed characters into the PCI device, whereas the second reads them out via the parallel port.
The developed PCI device is a rather elementary example of PCI use that exploits only the basic functionality of the PCI bus. It would therefore be quite possible to enhance this application considerably, or even to start a new project that would put advanced features of the PCI bus into practice, e.g. burst data transfer, the 64-bit extension, special cycles or bus mastering. These features would, however, require a different development kit, since the PLX PCI 9050 Reference Design Kit does not support all advanced PCI bus techniques. On the other hand, the PLX PCI 9050RDK is equipped with an ISA subsystem that could be used, without additional circuitry, to convert an ISA card into its PCI counterpart.
BIBLIOGRAPHY
[1] Dahnoun N.: “Digital Signal Processing Implementation Using the TMS320C6000 DSP Platform”, Prentice Hall, 2000, ISBN 0-201-61916-4.
[2] Vích R., Smékal Z.: “Číslicové filtry” (Digital Filters), Academia, 2000, ISBN 80-200-0761-X.
[3] Chapman S. J.: “Matlab Programming for Engineers”, Brooks/Cole, 2002, ISBN 0-534-95151-1.
[4] Solari E., Willse G.: “PCI Hardware and Software”, Annabooks, 1996.
[5] Shanley T., Anderson D.: “PCI System Architecture”, Addison-Wesley, 1998.
[6] Abbott D.: “PCI Bus Demystified”, LLH Technology, 2000, ISBN 1-878707-60-4.
APPENDIX
A1 EXAMPLE OF EXECUTABLE GENERATION
This section shows, step by step, how a simple Simulink model is converted into an
executable file that can be loaded and run on the 'C6701 EVM.
1. Create a Simulink model: Figure A1.1 shows a Simulink model that can be
converted into a DSP executable by Real-Time Workshop.
Figure A1.1: Simulink model to be converted into an executable.
2. In the Solver tab of the Simulation Parameters dialog (Simulation menu), set
Stop time to inf and Type to Fixed-step, as shown in Figure A1.2.
Figure A1.2: Setting of the solver.
3. Open the Real-Time Workshop tab of the Simulation Parameters dialog
(Simulation menu).
4. In the Target configuration category, press the Browse button and select the
system target file ti_c6701evm.tlc. This step sets the proper compile and build
parameters for the 'C6701 EVM target (see Figure A1.3).
Figure A1.3: Setting the Real-Time Workshop parameters.
5. In the Category list, choose TI C6701EVM runtime and set the Build action menu
to Build_and_Execute. With this option, Real-Time Workshop generates C source
files from the Simulink model and invokes Code Composer Studio, where the source
files are compiled and linked into a single executable file. Real-Time Workshop
then loads the executable onto the 'C6701 EVM and runs it.
Figure A1.4: Press the Build & Run button to execute the build process.
6. Start the build process by clicking the Build & Run button, as shown in
Figure A1.4.
Correct operation of this simple application can be verified by applying a
signal to the LineIn input of the EVM board. The same signal should appear at the
Out connector, provided the sampling theorem is satisfied, i.e. the input signal
contains no frequency components at or above half the sampling frequency.
A2 SCHEME OF LATCH CIRCUITRY
Figure A2.1 shows the complete schematic of the octal latch circuitry connected
to the local bus of the PCI 9050 bus interface chip. See Section 10.2 for more details
about the purpose of this circuitry.
The latch circuitry is controlled by the local bus signals of the PCI 9050 chip and
by one signal from the parallel port. In the following list, in/out denotes an
input/output signal with respect to the circuitry.
• D[7::0]: Connected to the LAD[7::0] data lines of the PCI 9050 local bus. (in)
• Q[7::0]: Latched data connected to the data pins of the parallel port. (out/tristate)
• LCLK: Local bus clock of the PCI 9050 chip. (in)
• CS#: Local bus chip select 0. Asserted when local address space 0 of the
PCI 9050 chip is accessed. (in)
• WR#: Indicates that a write operation is in progress. (in)
• IRQ: Asserted when data are latched; it triggers an interrupt on the parallel port.
(out)
• AckIrq: Signal from the parallel port that clears the pending interrupt. (in)
• User 0: Follows the IRQ signal. It can be used to determine whether the data have
already been read from the PCITOLPT device. (out)
• User 1: Sets the outputs of the octal latch to the high-impedance state. (in)
Table A2.1 contains a list of the components that were used to build the latch circuit.
QUANTITY | NAME     | TYPE        | VALUE         | DESCRIPTION
8        | R1-R8    | R-EU 0204/5 | 39 kΩ         | Resistor
8        | R9-R16   | R-EU 0204/5 | 100 Ω         | Resistor
8        | D1-D8    | LED 5 mm    | 1.5 V / 20 mA | Light-emitting diode
8        | T1-T8    | BC237       | -             | Bipolar NPN transistor
1        | IC1      | M74LS573B   | -             | Three-state octal transparent latch
1        | IC2      | MC14043B    | -             | Three-state quadruple R/S latch
1        | V1       | SN74LS02    | -             | Quad 2-input NOR gate
1        | V2       | M74LS08     | -             | Quad 2-input AND gate
Table A2.1: List of components for the latch circuitry.
Figure A2.1: Octal latch circuitry.