EE4OI4 Engineering Design
Programmable Logic Technology
2
Evolution of Silicon Chip
• We often measure the size of an IC by the number of logic gates or the number of transistors that the IC contains.
• Example: a 100k-gate IC contains the equivalent of 100,000 two-input NAND gates.
• Small-scale integration (SSI) ICs: contain a few (1 to 10) logic gates (often simple gates such as NAND, NOT, AND)
• Medium-scale integration (MSI): increased the range to counters and similar larger-scale logic functions
• Large-scale integration (LSI): packed even larger logic functions, such as the first microprocessors, into a single chip
• Very-large-scale integration (VLSI): 64-bit microprocessors with cache memory and floating-point arithmetic units (over a million transistors on a single silicon chip)
3
Evolution of Silicon Chip
• Some digital logic ICs are standard parts.
• These ICs can be selected from catalogs and data books, then bought and used in different systems
• With the advent of VLSI in the 1980s, engineers began to realize the advantages of designing an IC that was customized or tailored to a particular system or application rather than using standard ICs.
4
Digital logic technology
• ICs are made on a thin silicon wafer
• The transistors and wiring are made from many layers (between 10 and 15) built on top of one another
• The first half-dozen or so layers define the logic cells (AND, OR, flip-flop). The last half-dozen or so layers define the wires between the logic cells (mask layers or interconnect)
5
[Figure: Digital logic technologies. Digital logic divides into four branches: standard logic (TTL 74xx, CMOS 4xxx), programmable logic or FPLDs (PLDs, CPLDs, FPGAs), ASICs (gate arrays, standard cell), and full custom (microprocessor & RAM).]
6
Digital logic technology
• In a full-custom IC some (or all) logic cells are customized and all the mask layers are also customized
• Example: a microprocessor is a full-custom IC
• The designer does not use pre-tested, pre-characterized cells
• Why?
– No suitable existing cell library is available (not fast enough, not small enough, consumes too much power), or no cell library is available at all (new application)
• Full-custom ICs are the most expensive to design and manufacture
• Design time is long
• Fewer and fewer full-custom ICs are being designed because of the above problems
7
Digital logic technology
• Traditional integrated circuit chips: perform a fixed operation defined by the device manufacturer
• Internal functional operation is defined by the user:
– Application Specific Integrated Circuits (ASIC)
– Field Programmable Logic Devices (FPLD)
8
Digital logic technology
• ASIC:
– Gate arrays
– Standard cells
• Gate array: an array of pre-manufactured logic cells
– A final manufacturing step is required to interconnect the logic cells in a pattern created by the designer to implement a particular design
• Standard cell: no fixed internal structure
– The manufacturer builds the chip based on the user’s selection of devices from the manufacturer’s standard cell library
9
Digital logic technology
• Programmable logic devices (PLDs) are standard ICs that may be configured or programmed to create a part customized to a specific application
• Features:
– No customized layers or cells
– Fast design time
10
Digital logic technology
              Full custom   Semi-custom    Programmable
Logic cell    Customized    Pre-designed   Pre-designed, programmed by the user
Mask layers   Customized    Customized     Pre-designed, programmed by the user
11
[Figure: Digital logic technology tradeoffs. One axis shows speed, density, complexity, and market volume needed for the product; the other shows engineering cost and time to develop the product. Technologies plotted: PLDs, CPLDs, FPGAs, ASICs, full-custom VLSI design.]
12
Programmable Logic Technology
• Simple programmable logic devices (PLDs) such as programmable logic array (PLA) and programmable array logic (PAL) have been in use for over 20 years.
• PLA: the idea is that logic functions can be realized in sum-of-products form
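The sum-of-products idea can be sketched in a few lines of Python. This is only an illustrative model, not any vendor's programming format: the function name `pla`, the dictionary encoding of the AND plane, and the particular product terms chosen are all assumptions made for the example.

```python
from itertools import product

def pla(inputs, and_plane, or_plane):
    """Evaluate a PLA model for one input vector.

    and_plane: list of product terms; each term maps an input index to the
               value it requires (1 = true literal, 0 = complemented literal).
    or_plane:  list of outputs; each output lists the indices of the product
               terms connected to its OR gate.
    """
    # AND plane: a product term is 1 only if every connected literal matches.
    p = [all(inputs[i] == v for i, v in term.items()) for term in and_plane]
    # OR plane: an output is 1 if any connected product term is 1.
    return [int(any(p[k] for k in terms)) for terms in or_plane]

# Hypothetical programming (indices 0, 1, 2 stand for x1, x2, x3):
# P1 = x1.x2   P2 = x1.x3'   P3 = x1'.x2'.x3   P4 = x1.x3
AND_PLANE = [{0: 1, 1: 1}, {0: 1, 2: 0}, {0: 0, 1: 0, 2: 1}, {0: 1, 2: 1}]
# f1 = P1 + P2 + P3        f2 = P1 + P3 + P4
OR_PLANE = [[0, 1, 2], [0, 2, 3]]

for x in product((0, 1), repeat=3):
    print(x, pla(x, AND_PLANE, OR_PLANE))
```

Changing the entries of `AND_PLANE` and `OR_PLANE` plays the role of programming the two planes; the evaluation code itself stays fixed, just as the silicon does.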
13
[Figure: General structure of a PLA. Inputs x1 ... xn pass through input buffers and inverters into the AND plane, which produces product terms P1 ... Pk; the OR plane combines them into outputs f1 ... fm.]
14
[Figure: Gate-level diagram of a PLA. Inputs x1, x2, x3 feed an AND plane with programmable connections that produces product terms P1 to P4; the OR plane produces outputs f1 and f2.]
15
[Figure: Customary schematic of a PLA. Inputs x1, x2, x3; AND plane producing product terms P1 to P4; OR plane producing outputs f1 and f2.]
16
[Figure: An example of a PLA. Inputs x1, x2, x3; AND plane producing product terms P1 to P4; outputs f1 and f2.]
17
Programmable Logic Technology
• Programmable connections (switches) are difficult to fabricate and reduce the speed of the circuit
• In PALs the AND plane is programmable but the OR plane is fixed.
• To compensate for the reduced flexibility, PALs are manufactured in a range of sizes
18
Programmable Logic Technology
• On many PLAs and PALs the output of the OR gate is connected to a flip-flop whose output can then be fed back as an input into the AND gate array.
• In this way, simple state machines can be implemented
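A minimal sketch of the feedback idea, assuming a hypothetical device with two registered outputs and one external input `en`: the AND-OR logic computes the next value of each flip-flop from the current flip-flop outputs, giving a 2-bit counter with enable. The equations and function names are illustrative assumptions, not taken from a particular part.

```python
def next_state(q1, q0, en):
    """Sum-of-products logic driving the two D inputs.

    q1 and q0 are the flip-flop outputs fed back into the AND plane.
    """
    # D0 = en'.q0 + en.q0'
    d0 = (not en and q0) or (en and not q0)
    # D1 = en'.q1 + q1.q0' + en.q1'.q0
    d1 = (not en and q1) or (q1 and not q0) or (en and not q1 and q0)
    return int(d1), int(d0)

def run(enables, state=(0, 0)):
    """Apply one clock edge per entry in `enables` and record each state."""
    trace = []
    for en in enables:
        state = next_state(*state, en)   # the flip-flops load D on the clock edge
        trace.append(state)
    return trace

print(run([1, 1, 1, 1]))   # counts through the four states
print(run([1, 0, 0, 1]))   # holds its value while en is 0
```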
19
[Figure: Output circuitry. Labels: D flip-flop (D, Q) driven by Clock, Select and Enable controls, output f1, and a feedback path to the AND plane.]
20
FPLD
• CPLDs and FPGAs are the highest-density and most advanced programmable logic devices.
• These devices are collectively called field programmable logic devices (FPLDs).
• Characteristics:
– None of the mask layers are customized
– The core is a regular array of programmable logic cells that can implement combinational as well as sequential logic
– A matrix of programmable interconnect surrounds the basic logic cells
– Programmable I/O cells
• For all but the most time-critical design applications, CPLDs and FPGAs have adequate speed (clock range 50-400 MHz)
21
FPLD
• CPLDs and FPGAs typically contain multiple copies of a basic programmable logic element (LE) or logic cell (LC).
• Logic element: can implement a network of several logic gates that feed into 1 or 2 flip-flops
• Logic elements are arranged in a column or matrix on the chip
22
FPLD
• To perform complex operations, logic elements are connected using a programmable interconnection network
• Interconnection network contains row and/or column chip-wide interconnections.
• Interconnection network often contains shorter and faster programmable interconnects limited only to neighboring logic elements
23
FPLD
• FPLDs contain:
1. Programmable logic cells
2. Programmable interconnection
3. Programmable I/O cells
24
[Figure: Structure of a CPLD. Four cells, each with its own I/O block, joined by interconnection wires.]
25
[Figure: A section of a CPLD. One PAL-like block is shown with three D flip-flops (D, Q); a second PAL-like block is drawn with its details not shown.]
26
[Figure: Structure of an FPGA. An array of logic blocks and interconnection switches, bordered by four I/O blocks.]
27
FPLD
• In large FPLDs the clock arrives at different times at different flip-flops if it is routed through the chip like a normal signal
• This situation, in which the clock signal arrives at different times at different flip-flops, is known as clock skew.
• Clock signals in large FPLDs are normally distributed using an internal high-speed bus (global clock line)
• Using the global clock line, the clock is distributed to all flip-flops in the device at the same time.
28
Figure 10.44 An H tree clock distribution network
[Figure: a single Clock source drives 16 flip-flops (ff) through an H-shaped tree of wires.]
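As a rough illustration of why an H tree limits clock skew, the sketch below builds a two-level H tree and measures the wire length from the clock source to each flip-flop. The geometry (coordinates and the `half` arm length) and the function name `h_tree` are assumptions for illustration only, and wire length is used as a stand-in for delay.

```python
def h_tree(x, y, half, levels, dist=0.0):
    """Return (x, y, wire_length_from_source) for every leaf of an H tree.

    Each level draws one 'H': a horizontal bar of half-width `half`, then a
    vertical stroke of half-height `half` at each end, reaching four corners.
    """
    if levels == 0:
        return [(x, y, dist)]
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            # Path from the centre to a corner: along the bar, then up or down.
            leaves += h_tree(x + dx, y + dy, half / 2, levels - 1,
                             dist + abs(dx) + abs(dy))
    return leaves

leaves = h_tree(0.0, 0.0, half=4.0, levels=2)    # 4 ** 2 = 16 flip-flops
lengths = [d for _, _, d in leaves]
skew = max(lengths) - min(lengths)               # zero when all paths match
print(len(leaves), set(lengths), skew)
```

Because every leaf is reached through the same sequence of segment lengths, the model gives identical path lengths; in a real device, loading and process variation would still leave some residual skew.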
29
UP2
30
UP3
31
Altera MAX7000
• MAX7000 is a CPLD family with 600 to 20,000 gates.
• Configured by an internal electrically erasable programmable read-only memory (EEPROM)
• Configuration is retained when power is removed
• The 7000 family contains from 32 to 256 macrocells.
• An individual macrocell contains five programmable AND gates.
• The AND/OR network is designed to implement Boolean equations expressed in sum-of-products form.
32
[Figure: MAX 7000 macrocell. Labels: LAB local array (36 signals from PIA, 16 expander product terms), shared logic expanders, parallel logic expanders (from other macrocells), product-term select matrix, clear select, clock/enable select, global clear, global clock, VCC, programmable register (D, Q, PRN, CLRN, ENA), outputs to the I/O control block and to the PIA. One symbol in the figure represents a multiplexer controlled by the configuration program.]
33
Altera MAX7000
• Macrocells are combined into groups of 16 called a logic array block (LAB)
• Inputs to the AND gates include product terms from other macrocells in the same block or signals from the chip-wide programmable interconnect array (PIA)
34
Altera MAX7000
• Each I/O pin contains a programmable tri-state output buffer.
• An I/O pin can be programmed as an input, an output, an output with a tri-state driver, or a tri-state bi-directional pin.
35
Altera MAX7000
• If more than five product terms are required, additional product terms are generated using the following methods:
1. Parallel expander: product terms can be shared between macrocells. A macrocell can borrow up to 15 product terms from its neighbors
2. Shared expander: one of the product terms in a macrocell is inverted and fed back to the shared pool of product terms.
• The inputs to this product term are used in complement form, so by DeMorgan’s theorem a sum term is produced.
• Since there are 16 macrocells in an LAB, the shared logic expander pool has up to 16 terms
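The DeMorgan step behind the shared expander can be checked exhaustively. The sketch below assumes a three-input product term purely for illustration; the function name `shared_expander` is hypothetical.

```python
from itertools import product

def shared_expander(a, b, c):
    """Inverted product term whose inputs are taken in complement form."""
    product_term = (not a) and (not b) and (not c)    # a'.b'.c'
    return int(not product_term)                      # (a'.b'.c')'

# DeMorgan: (a'.b'.c')' equals a + b + c for every input combination.
for a, b, c in product((0, 1), repeat=3):
    assert shared_expander(a, b, c) == int(a or b or c)
print("inverted product term behaves as the sum term a + b + c")
```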
36
[Figure: MAX 7000 CPLD architecture. Four LABs (LAB A: macrocells 1-16, LAB B: macrocells 17-32, LAB C: macrocells 33-48, LAB D: macrocells 49-64) connect to the central PIA; each LAB has an I/O control block serving 6 to 16 I/O pins. Dedicated inputs: Input/GCLK1, Input/OE2/GCLK2, Input/OE1, Input/GCLRn.]
37
FLEX 10K
• FLEX 10K: an FPGA family with 10,000 to 250,000 gates.
• Configured by loading internal static random access memory (SRAM).
• The configuration is lost whenever power is removed
• Gate logic is implemented using a look-up table (LUT)
38
FLEX 10K
• The LUT is a high-speed 16 by 1 SRAM.
• Four inputs are used to address the LUT’s memory
• The truth table for the desired gate network is loaded into the LUT’s SRAM.
• A single LUT can model any network of gates with 4 inputs and one output.
39
[Figure: Using a lookup table (LUT) to model a gate network. A gate network with inputs A, B, C, D and output F is replaced by a 4-input LUT (16 x 1 RAM) with the same inputs and output.]
RAM Contents
Address    Data
A B C D    F
0 0 0 0    0
0 0 0 1    0
0 0 1 0    1
0 0 1 1    0
0 1 0 0    0
0 1 0 1    0
0 1 1 0    1
0 1 1 1    0
1 0 0 0    0
1 0 0 1    0
1 0 1 0    1
1 0 1 1    0
1 1 0 0    1
1 1 0 1    1
1 1 1 0    1
1 1 1 1    1
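The LUT mechanism can be modelled as a 16-entry list addressed by the four inputs. The RAM contents listed above are those of F = A.B + C.D', so the sketch uses that function; the helper names `make_lut` and `lut_read` are assumptions for illustration and do not correspond to any tool's API.

```python
def make_lut(func):
    """Build the 16 x 1 RAM contents for a 4-input function (A is the MSB)."""
    return [func((addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1)
            for addr in range(16)]

def lut_read(ram, a, b, c, d):
    """A, B, C, D form the 4-bit address; the stored bit is the output F."""
    return ram[(a << 3) | (b << 2) | (c << 1) | d]

# Gate network matching the RAM contents listed above: F = A.B + C.D'
ram = make_lut(lambda a, b, c, d: int((a and b) or (c and not d)))
print(ram)
print(lut_read(ram, 0, 1, 1, 0))   # reads the entry at address 0110
```

Loading a different list into `ram` changes the implemented function without changing `lut_read`, which is the sense in which a single LUT can model any 4-input, 1-output gate network.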
40
[Figure: FLEX 10K Logic Element (LE). Labels: inputs DATA1 to DATA4 into the look-up table (LUT), carry chain (Carry In, Carry Out), cascade chain (Cascade In, Cascade Out), programmable register (D, Q, PRN, CLRN, ENA) with register bypass, clear/preset logic, clock select, control signals LABCTRL1 to LABCTRL4, chip-wide reset, outputs to FastTrack interconnect and to LAB local interconnect.]
41
FLEX 10K
• Two dedicated high-speed paths are provided in FLEX 10K: carry chain and cascade chain
• They both connect adjacent LEs without using the general-purpose interconnect path
• Carry chain: supports high-speed adders and counters (carry-forward function between LEs)
• Cascade chain: can implement functions with more than 4 inputs.
• Adjacent LUTs compute portions of the function in parallel and the cascade chain serially connects the intermediate values
• Cascade chain uses a logic AND or OR to connect the outputs of adjacent LEs.
42
Carry chain
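A behavioural sketch of an adder built on a carry chain: each LE produces one sum bit and hands its carry directly to the next LE. The function names and the 8-bit default width are assumptions for illustration, and the model does not attempt to reproduce how the LUT inside an LE is actually arranged.

```python
def le_adder_bit(a, b, carry_in):
    """One LE of the adder: a sum bit plus a carry for the next LE."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out

def carry_chain_add(x, y, width=8):
    """Add two unsigned numbers bit by bit, least significant LE first."""
    carry, result = 0, 0
    for i in range(width):
        s, carry = le_adder_bit((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry          # the final carry is the chain's carry-out

print(carry_chain_add(100, 55))
print(carry_chain_add(200, 100))  # overflows 8 bits, so carry-out is set
```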
43
Cascade chain
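A small sketch of a cascade chain building an 8-input function from two 4-input LUTs. The example function (detecting the byte value 0xA5) and the names `lut4`, `LUT_HI`, `LUT_LO`, and `match_a5` are assumptions chosen for illustration.

```python
def lut4(contents, nibble):
    """4-input LUT: the 4-bit input value addresses a 16 x 1 table."""
    return contents[nibble]

# The LUT in the first LE outputs 1 only for input 0xA,
# the LUT in the adjacent LE only for input 0x5.
LUT_HI = [int(v == 0xA) for v in range(16)]
LUT_LO = [int(v == 0x5) for v in range(16)]

def match_a5(byte):
    """8-input function: 1 when the byte equals 0xA5."""
    hi = lut4(LUT_HI, (byte >> 4) & 0xF)   # both LUTs evaluate in parallel
    lo = lut4(LUT_LO, byte & 0xF)
    return hi & lo                         # cascade chain configured as AND

print(match_a5(0xA5), match_a5(0xA4))
```

Configuring the cascade as OR instead would combine the two partial results with `hi | lo`, which suits functions expressed as a sum of partial terms.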
44
[Figure: FLEX 10K Logic Array Block (LAB). Eight logic elements (LE1 to LE8) linked by carry-in and cascade-in at the top and carry-out and cascade-out at the bottom, with connections to the row interconnect, the column-to-row interconnect, and the dedicated inputs and global signals.]
45
FLEX 10K CPLD architecture.
46
FLEX 10K
• The chip also contains embedded array blocks (EABs).
• EABs are SRAM blocks that can be configured to provide memory blocks of various aspect ratios.
• An EAB contains 2048 SRAM cells which can be used to provide memory blocks with a range of aspect ratios: 256x8, 512x4, 1024x2, 2048x1.
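The aspect ratios all use the same 2048 bits, which a short calculation makes explicit. The helper `eabs_needed` is a hypothetical estimate of how many EABs a larger memory would take if blocks are simply tiled side by side; it is an assumption for illustration, not a description of how a vendor tool allocates EABs.

```python
EAB_BITS = 2048

def eab_configs(widths=(8, 4, 2, 1)):
    """Return (depth, width) pairs that use all 2048 bits of one EAB."""
    return [(EAB_BITS // w, w) for w in widths]

def eabs_needed(depth, width):
    """Fewest EABs for a depth x width memory over the listed shapes."""
    best = None
    for d, w in eab_configs():
        count = -(-depth // d) * -(-width // w)   # ceiling division on both axes
        best = count if best is None else min(best, count)
    return best

print(eab_configs())
print(eabs_needed(512, 8))    # e.g. two 256x8 blocks stacked in depth
```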
47
FLEX 10K
48
Cyclone
• Cyclone: Configured by loading internal static random access memory (SRAM).
• The configuration is lost whenever power is removed
• Cyclone’s logic array consists of LABs, with 10 Logic Elements (LEs) in each LAB.
• An LE is a small unit of logic providing efficient implementation of user logic functions
• Cyclone devices have between 2,910 and 20,060 LEs
49
Cyclone
• RAM blocks are embedded in Cyclone devices
• These blocks are dual-port memory blocks with 4K bits of memory plus parity (4,608 bits)
• These blocks provide dual-port or single-port memory from 1 to 36 bits wide at up to 200 MHz.
• These blocks are grouped into columns across the device in between certain LABs
• The Cyclone EP1C6 and EP1C12 contain 92K and 239K bits of embedded RAM, respectively
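The block size and device totals can be related with a little arithmetic. The exact totals used below (92,160 and 239,616 bits) are an assumption chosen to match the rounded 92K and 239K figures; check them against the device datasheet before relying on them.

```python
RAM_DATA_BITS = 4096     # "4K bits of memory"
RAM_TOTAL_BITS = 4608    # data plus parity, per block

# Assumed exact totals corresponding to the rounded 92K / 239K figures.
DEVICE_RAM_BITS = {"EP1C6": 92_160, "EP1C12": 239_616}

def ram_blocks(total_bits):
    """Number of whole 4,608-bit blocks that make up a device total."""
    blocks, remainder = divmod(total_bits, RAM_TOTAL_BITS)
    if remainder:
        raise ValueError("total is not a whole number of blocks")
    return blocks

parity_bits = RAM_TOTAL_BITS - RAM_DATA_BITS   # one parity bit per data byte
for name, bits in DEVICE_RAM_BITS.items():
    print(name, ram_blocks(bits), "blocks")
```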
50
Cyclone
51
Cyclone
52
Cyclone
[Figure: logic element diagram. Labels: look-up table (LUT) with inputs DATA1 to DATA4; carry chain with addnsub, LAB Carry In, Carry In0, Carry In1; synchronous load and clear logic (LAB-wide synchronous load, LAB-wide synchronous clear); asynchronous clear/preset/load logic; clock and clock enable select; control signals LABCTRL1, LABCTRL2, LABPRE/ALOAD, LABCLKENA1, LABCLKENA2, chip-wide reset; programmable register (PRN/ALD, D, Q, ADATA, ENA, CLRN) with register bypass, register feedback, and packed register select; register chain routing from previous LE; register chain output; LUT chain routing to next LE; outputs to row, column, and direct link routing and to local routing.]
53
Cyclone
• Gate logic is implemented using a look-up table (LUT)
• The LUT is a high-speed 16 by 1 SRAM
• Four inputs are used to address the LUT’s memory
• The truth table for the desired gate network is loaded into the LUT’s SRAM during programming
54
Cyclone
• The output of the LUT can be fed into a D flip-flop and then to the interconnection network.
• More complex gate networks require interconnection with neighboring logic elements.
• A logic array block (LAB) is composed of ten logic elements (LEs)
• Both programmable local LAB and chip-wide row and column interconnects are available
• Carry chains are also provided to support faster addition operations
55
Cyclone
[Figure: A LAB with its LAB local interconnect, the row interconnect, and direct link interconnect from and to the adjacent blocks.]