cs 194-6 digital systems project laboratory lecture 1...
Post on 27-Jul-2018
215 Views
Preview:
TRANSCRIPT
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
2008-9-6John Lazzaro
(www.cs.berkeley.edu/~lazzaro)
CS 194-6 Digital Systems Project Laboratory
Lecture 1 – Virtex-5 Microarchitecture
www-inst.eecs.berkeley.edu/~cs194-6/
And also, an introduction to the project.
Special Guest: Greg Gibeling
Not a professor. “John” is OK.
1
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Topics for today’s lecture
Project ideas (Greg Gibeling)
Class organization, semester schedule, administrivia ...
Introduction to the project platform (Xilinx ML505-110 evaluation board).
2
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Project platform: Xilinx ML505-110
3
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
FPGA: Xilinx Virtex-5 XC5VLX110TVirtex-5 “die photo”
A die is an unpackaged part
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
4
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
From die to PC board ...
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
2 www.xilinx.com XAPP426 (v1.3) March 3, 2006
Implementing Xilinx Flip-Chip BGA PackagesR
*Xilinx flip-chip packages are not hermetically sealed and exposure to cleaning solvents/ chemicals or excessive moisture during boards assembly could pose serious package reliability concerns. Small vents are kept by design between the heatspreader (lid) and the organic substrate to allow for outgassing and moisture evaporation. Solvents or other corrosive chemicals could seep through these vents and attack the organic materials and components inside the package and hence are strongly discouraged during board assembly of Xilinx flip-chip BGA packages.
Recommended PCB Design Rules
For Xilinx BGA packages, NSMD (Non Solder Mask Defined) pads on the board are suggested. This allows a clearance between the land metal (diameter L) and the solder mask opening (diameter M) as shown in Figure 3. The space between the NSMD pad and the solder mask, and the actual signal trace widths depends on the capability of the PCB vendor. The cost of the PCB is higher when the line widths and spaces are tighter.
Figure 2: Package Construction with Type II Lid
Copper Heatspreader
Solder Ball
Flip Chip Solder Bump
Organic Build-Up Substrate
Underfill Epoxy
Silicon Die
Thermal Interface Material
Adhesive Epoxy*
Figure 3: Suggested Board Layout of Soldered Pads for BGA
Ball Grid Array (BGA)
Flip-Chip Package
5
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Tracing a die pad through a design ...What a pad looks like on the die ...
216 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 6: SelectIO ResourcesR
SelectIO Resources IntroductionAll Virtex-5 FPGAs have configurable high-performance SelectIO™ drivers and receivers, supporting a wide variety of standard interfaces. The robust feature set includes programmable control of output strength and slew rate, and on-chip termination using Digitally Controlled Impedance (DCI).
Each IOB contains both input, output, and 3-state SelectIO drivers. These drivers can be configured to various I/O standards. Differential I/O uses the two IOBs grouped together in one tile.
! Single-ended I/O standards (LVCMOS, LVTTL, HSTL, SSTL, GTL, PCI)
! Differential I/O standards (LVDS, HT, LVPECL, BLVDS, Differential HSTL and SSTL)
! Differential and VREF dependent inputs are powered by VCCAUX
Each Virtex-5 I/O tile contains two IOBs, and also two ILOGIC blocks and two OLOGIC blocks, as described in Chapter 7, “SelectIO Logic Resources.”
Figure 6-2 shows the basic IOB and its connections to the internal logic and the device Pad.
Each IOB has a direct connection to an ILOGIC/OLOGIC pair containing the input and output logic resources for data and 3-state control for the IOB. Both ILOGIC and OLOGIC can be configured as ISERDES and OSERDES, respectively, as described in Chapter 8, “Advanced SelectIO Logic Resources.”
SelectIO Resources General GuidelinesThis section summarizes the general guidelines to be considered when designing with the Virtex-5 SelectIO resources.
Figure 6-2: Basic IOB Diagram
ug190_6_02_021306
PADOUT
DIFFO_IN
DIFFO_OUT
I
T
O
DIFFI_IN
OUTBUFINBUF
PAD
What a pad looks like on an I/O Block (IOB)
schematic ...
Which pad goes to which package pin?
6
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Banks of I/O placed on chip floor plan
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Virtex-5 FPGA User Guide www.xilinx.com 217UG190 (v4.2) May 9, 2008
SelectIO Resources General GuidelinesR
Virtex-5 I/O Bank RulesIn Virtex-5 devices, with some exceptions in the center column, an I/O bank consists of 40 IOBs (20 CLBs high and a single clock region). There are always four half-sized banks (20 IOBs) and a single configuration bank in the center column. The number of banks depends upon the device size, and in larger devices, there are additional full-sized banks in the center column. In the Virtex-5 Overview the total number of I/O banks is listed by device type. The XC5VLX30 has 12 usable I/O banks and one configuration bank. Figure 6-3 is an example of a columnar floorplan showing the XC5VLX30 I/O banks.
Reference Voltage (VREF) Pins
Low-voltage, single-ended I/O standards with a differential amplifier input buffer require an input reference voltage (VREF). VREF is an external input into Virtex-5 devices. Within each I/O bank, one of every 20 I/O pins is automatically configured as a VREF input, if using a single-ended I/O standard that requires a differential amplifier input buffer.
Output Drive Source Voltage (VCCO) Pins
Many of the low-voltage I/O standards supported by Virtex-5 devices require a different output drive voltage (VCCO). As a result, each device often supports multiple output drive source voltages.
Output buffers within a given VCCO bank must share the same output drive source voltage. The following input buffers use the VCCO voltage: LVTTL, LVCMOS, PCI, LVDCI and other DCI standards.
Figure 6-3: Virtex-5 XC5VLX30 I/O Banks
ug190_6_03_021306
BANK40 I/O
BANK20 I/OBANK20 I/O
BANK20 I/O
BANK20 I/O
BANK40 I/O
BANK40 I/O
BANK40 I/O
CONFIG
BANK40 I/O
BANK40 I/O
BANK40 I/O
BANK40 I/O
Virtex-5 FPGA Packaging and Pinout Specification www.xilinx.com 275UG195 (v4.3) June 18, 2008
FF676 Package—LX30R
Figure 3-8: FF676 Package—LX30 SelectIO Bank Diagram
12
34
56
78
910
1112
1314
1516
1718
1920
2122
2324
2526
ug195_c3_06_031507
12
34
56
78
910
1112
1314
1516
1718
1920
2122
2324
2526
ABCDEFGHJKLMNPRTUVWY
AAABACADAEAF
ABCDEFGHJKLMNPRTUVWYAAABACADAEAF
16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 16 16 15 15 15 15 15 15 15 15 15 1516 16 16 16 16 16 16 16 16 3 3 3 15 15 15 15 15 15 15 15 1514 14 14 12 12 12 3 3 3 3 3 3 3 3 3 11 11 11 11 13 13
14 14 12 12 12 3 3 3 3 1 3 1 3 3 3 11 11 11 13 1314 14 12 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 13 13 1314 14 14 12 12 12 1 1 1 1 1 1 1 1 1 11 11 11 11 13 1314 14 12 12 12 11 11 11 11 13 13 1314 14 14 12 12 12 11 11 11 11 13 13
14 12 12 12 12 11 11 11 11 13 1314 14 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 11 11 11 11 13 1314 12 12 12 12 11 11 11 11 13 13 1314 14 12 12 12 12 17 17 11 11 13 13
14 12 18 18 18 17 17 17 17 13 1314 14 18 18 18 18 17 17 17 17 13 13 1314 14 18 18 18 18 17 17 17 17 17 1314 18 18 18 18 17 17 17 17 17 13 1314 14 18 18 18 18 2 2 2 2 2 2 17 17 17 17 13 13
14 18 18 18 18 2 2 2 2 2 2 2 2 2 17 17 17 17 17 1314 14 18 18 18 18 4 4 2 4 2 2 2 2 17 17 17 17 17 13 1314 14 18 18 18 4 4 4 4 4 4 4 4 4 4 17 17 17 17 17 1314 18 18 18 18 4 4 4 4 4 4 4 1714 14 18 18 18
14 18 18 18
Colors on this package pinout map to banks.
7
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Virtex-5 Top-Down
8
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Colors represent different types of resources:LogicBlock RAMDSP (ALUs)ClockingI/OSerial I/O + PCI
A routing fabric runs throughout the chip to wire everything together.
9
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Routing fabric requires many interconnect layers.
10
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Logic Resources
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
11
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
Configurable Logic Blocks (CLBs)
Virtex-5 FPGA User Guide www.xilinx.com 171UG190 (v4.2) May 9, 2008
R
Chapter 5
Configurable Logic Blocks (CLBs)
CLB OverviewThe Configurable Logic Blocks (CLBs) are the main logic resources for implementing sequential as well as combinatorial circuits. Each CLB element is connected to a switch matrix for access to the general routing matrix (shown in Figure 5-1). A CLB element contains a pair of slices. These two slices do not have direct connections to each other, and each slice is organized as a column. Each slice in a column has an independent carry chain. For each CLB, slices in the bottom of the CLB are labeled as SLICE(0), and slices in the top of the CLB are labeled as SLICE(1).
The Xilinx tools designate slices with the following definitions. An “X” followed by a number identifies the position of each slice in a pair as well as the column position of the slice. The “X” number counts slices starting from the bottom in sequence 0, 1 (the first CLB column); 2, 3 (the second CLB column); etc. A “Y” followed by a number identifies a row of slices. The number remains the same within a CLB, but counts up in sequence from one CLB row to the next CLB row, starting from the bottom. Figure 5-2 shows four CLBs located in the bottom-left corner of the die.
Figure 5-1: Arrangement of Slices within the CLB
SwitchMatrix
Slice(1)
COUTCOUT
CINCIN
Slice(0)
CLB
UG190_5_01_122605
Slices define regular connections to the switching fabric, and to slices in
CLBs above and below it on the die.
The LX110T has 17,280 slices.
12
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
X-Y naming convention for slices
172 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Slice DescriptionEvery slice contains four logic-function generators (or look-up tables), four storage elements, wide-function multiplexers, and carry logic. These elements are used by all slices to provide logic, arithmetic, and ROM functions. In addition to this, some slices support two additional functions: storing data using distributed RAM and shifting data with 32-bit registers. Slices that support these additional functions are called SLICEM; others are called SLICEL. SLICEM (shown in Figure 5-3) represents a superset of elements and connections found in all slices. SLICEL is shown in Figure 5-4.
Figure 5-2: Row and Column Relationship between CLBs and Slices
SliceX1Y1
COUTCOUT
CINCIN
SliceX0Y1
CLB
UG190_5_02_122605
SliceX1Y0
COUTCOUT
SliceX0Y0
CLB
SliceX3Y1
COUTCOUT
CINCIN
SliceX2Y1
CLB
SliceX3Y0
COUTCOUT
SliceX2Y0
CLB
Lower-left corner of the die.
X0, X2, ... are lower CLB slices.X1, X3, ... are upper CLB slices.
Y0, Y1, ... are CLB column positions.
13
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Atoms: 5-input Look Up Tables (LUTs)ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5A[6:2] D000000000100010
....
101
111011111011111
001
Q
Q
Q
Q
Q
Q
(1)
(1)
(1)
(0)
(0)
(0)
....D
A[6:2]
Computes any 5-input logic
function.
Timing is independent of function.
Flip-flops set during
configuration.14
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Virtex-5 6-LUTs: Composition of 5-LUTsMay be used
as one 6-input LUT (D6 out) ...
Combinational logic
(post configuration)
... or as two 5-input LUTS (D6 and D5)
ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5
The LX110T has 69,120 6-LUTs6-LUT delay is 0.9 ns
15
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
The simplest view of a slice
194 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Designing Large Multiplexers
4:1 Multiplexer
Each Virtex-5 LUT can be configured into a 4:1 MUX. The 4:1 MUX can be implemented with a flip-flop in the same slice. Up to four 4:1 MUXes can be implemented in a slice, as shown in Figure 5-21.
Figure 5-21: Four 4:1 Multiplexers in a Slice
UG190_5_21_050506
(D[6:1])
(C[6:1])
(B[6:1])
(A[6:1])
(CLK)CLK
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(D)
(DQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(C)
(CQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(B)
(BQ)
RegisteredOutput
4:1 MUX Output
(Optional)
D Q
(A)
(AQ)
6 A[6:1]
O6
6 A[6:1]
O6
SEL D [1:0], DATA D [3:0]Input
SEL C [1:0], DATA C [3:0]Input
SEL B [1:0], DATA B [3:0]Input
SEL A [1:0], DATA A [3:0]Input
Four 6-LUTs
Four Flip-Flops
Switching fabric may see combinational and registered outputs.
An actual Virtex-5 slice adds many small features to this
simplified diagram. We show them one by one ...
16
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Two 7-LUTs per slice ...
Extra multiplexers(F7AMUX, F7BMUX)
Virtex-5 FPGA User Guide www.xilinx.com 195UG190 (v4.2) May 9, 2008
CLB OverviewR
8:1 Multiplexer
Each slice has an F7AMUX and an F7BMUX. These two muxes combine the output of two LUTs to form a combinatorial function up to 13 inputs (or an 8:1 MUX). Up to two 8:1 MUXes can be implemented in a slice, as shown in Figure 5-22.
Figure 5-22: Two 8:1 Multiplexers in a Slice
UG190_5_22_090806
(D[6:1])
(C[6:1])
(CX)
(B[6:1])
(A[6:1])
(AX)
SELF7(1)(CLK)
CLK
SELF7(2)
SEL D [1:0], DATA D [3:0]Input (1)
SEL C [1:0], DATA C [3:0]Input (1)
SEL B [1:0], DATA B [3:0]Input (2)
SEL A [1:0], DATA A [3:0]Input (2)
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6 RegisteredOutput
8:1 MUXOutput (1)
(Optional)
D Q
(CMUX)
(CQ)
RegisteredOutput
8:1 MUXOutput (2)
(Optional)
D Q
(AMUX)
(AQ)
6 A[6:1]
O6
6 A[6:1]
O6
F7BMUX
F7AMUX
Extra inputs (AX and CX)
17
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Or one 8-LUTs per slice ...
Third multiplexer(F8MUX)
Third input (BX)
196 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
16:1 Multiplexer
Each slice has an F8MUX. F8MUX combines the outputs of F7AMUX and F7BMUX to form a combinatorial function up to 27 inputs (or a 16:1 MUX). Only one 16:1 MUX can be implemented in a slice, as shown in Figure 5-23.
It is possible to create multiplexers wider than 16:1 across more than one SLICEM. However, there are no direct connections between slices to form these wide multiplexers.
Fast Lookahead Carry Logic
In addition to function generators, dedicated carry logic is provided to perform fast arithmetic addition and subtraction in a slice. The Virtex-5 CLB has two separate carry chains, as shown in Figure 5-1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 5-2.
The carry chain in the Virtex-5 device is running upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding/subtracting the operands with a selected carry bits. The dedicated carry path and
Figure 5-23: 16:1 Multiplexer in a Slice
UG190_5_23_050506
(D[6:1])
(C[6:1])
(CX)
(B[6:1])
(A[6:1])
(AX)(BX)
(CLK)
SELF7
SELF7SELF8
CLK
6
SLICE
LUT
LUT
LUT
LUT
A[6:1]
O6
6 A[6:1]
O6
RegisteredOutput
16:1 MUXOutput
(Optional)
D Q
(BMUX)
(B)
6 A[6:1]
O6
6 A[6:1]
O6
F7BMUX
F8MUX
F7AMUX
SEL D [1:0], DATA D [3:0]Input
SEL C [1:0], DATA C [3:0]Input
SEL B [1:0], DATA B [3:0]Input
SEL A [1:0], DATA A [3:0]Input
Configuring the “n” of an n-LUT ...
18
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Extra muxes to chose LUT option ...
Virtex-5 FPGA User Guide www.xilinx.com 199UG190 (v4.2) May 9, 2008
CLB / Slice Timing ModelsR
General Slice Timing Model and ParametersA simplified Virtex-5 slice is shown in Figure 5-25. Some elements of the Virtex-5 slice are omitted for clarity. Only the elements relevant to the timing paths described in this section are shown.
Figure 5-25: Simplified Virtex-5 Slice
UG190_5_25_050506
LUT
O6
O5
6 D
FE/LAT
DCECLK
SR REV
Q
F7BMUX
F8MUX
DMUX
DQ
D Inputs
LUT
O6
O5
6 C
FE/LAT
DCECLK
SR REV
Q CQ
CMUX
C Inputs
DX
CX
LUT
O6
O5
6 B
FE/LAT
DCECLK
SR REV
Q BQ
BMUX
B Inputs
BX
FE/LAT
DCECLK
SR REV
Q AQ
F7AMUXLUT
O6
O5
6 A
AMUX
A Inputs
AX
CE
CLK
SRREV(DX)
From eight 5-LUTs ... to one 8-LUT.
Combinational or registered outs.
Flip-flops unused by LUTs can be used
standalone.
Flip-flops ...
19
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Slice flip-flop properties ...
Virtex-5 FPGA User Guide www.xilinx.com 177UG190 (v4.2) May 9, 2008
CLB OverviewR
Table 5-3: Truth Table when SRLOW is Used (Default Condition)
SR REV Function
0 0 No Logic Change
0 1 1
1 0 0
1 1 0
Table 5-4: Truth Table when SRHIGH is Used
SR REV Function
0 0 No Logic Change
0 1 0
1 0 1
1 1 0
Figure 5-5: Register/Latch Configuration in a SliceUG190_5_05_071207
DX
CX
BX
CE
AX
DQ
CQ
BQ
AQ
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
DFFLUT D Output
LUT C Output
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
Q
Q
Q
SR
LUT B Output
LUT A Output AFF
BFF
CFF
CLK
Reset Type
Sync
Async
Each state element may be edge-triggered or latching.
Each state element may respond
differently to set/reset signal.
Clock enable, clock polarity, and set/reset
lines in a slice are shared.
Next: The vertical dimension ...20
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Reminder: arithmetic addition ...
One-bit full adder
[Co,S] = A + B + Ci
A, B, Ci: 1-bit number inputs.[Co,S]: 2-bit number output.“+”: Arithmetic addition.
Simplest multi-bit adder
“Ripple-carry adder”
21
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Virtex-5 vertical logic ...
Virtex-5 FPGA User Guide www.xilinx.com 197UG190 (v4.2) May 9, 2008
CLB OverviewR
carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions.
Figure 5-24 illustrates the carry chain with associated logic elements in a slice.
The carry chains carry lookahead logic along with the function generators. There are ten independent inputs (S inputs – S0 to S3, DI inputs – DI1 to DI4, CYINIT and CIN) and eight independent outputs (O outputs – O0 to O3, and CO outputs – CO0 to CO3).
The S inputs are used for the “propagate” signals of the carry lookahead logic. The “propagate” signals are sourced from the O6 output of a function generator. The DI inputs are used for the “generate” signals of the carry lookahead logic. The “generate” signals are sourced from either the O5 output of a function generator or the BYPASS input (AX, BX, CX, or DX) of a slice. The former input is used to create a multiplier, while the latter is used
Figure 5-24: Fast Carry Logic Path and Associated Elements
UG190_5_24_050506
O6 From LUTD
DMUX/DQ*
DMUX
DQO5 From LUTD
DX
S3MUXCY
DI3
CO3
O3
COUT (To Next Slice)
Carry Chain Block(CARRY4)
(Optional)
D Q
O6 From LUTC
CMUX/CQ*
CMUX
CQO5 From LUTC
CX
S2MUXCY
DI2
CO2
CO1
CO0
O2
(Optional)
D Q
O6 From LUTB
BMUX/BQ*
BMUX
BQO5 From LUTB
BX
S1MUXCY
DI1
O1
(Optional)
D Q
O6 From LUTA
AMUX/AQ*
AMUX
AQO5 From LUTA
AX
S0MUXCY
DI0
CIN
CIN (From Previous Slice)
* Can be used ifunregistered/registeredoutputs are free.
CYINIT
10
O0
(Optional)
D Q
We can map ripple-carry addition
onto carry-chain block.
The carry-chain block also supports the faster “carry-save” chain logic.
22
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Putting it all together ... a SLICEL.
174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.
Figure 5-4: Diagram of SLICEL
A6LUTROM
COUT
D
DX
C
CX
B
BX
A
AX
O6O5
UG190_5_04_032606
A5A4A3A2A1
D6
DMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
DX
D5D4D3D2D1
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CIN
0/1
A6LUTROM
O6O5
A5A4A3A2A1
C6
CX
C5C4C3C2C1
A6LUTROM
O6O5
A5A4A3A2A1
B6
BX
B5B4B3B2B1
A6LUTROM
O6O5
A5A4A3A2A1
A6
AXSRCE
CLK
A5A4A3A2A1
Q
Q
Q
Reset Type
Sync
Async The previous slides explain all
SLICEL features.
About 50% of the 17,280 slices in an LX110T are
SLICELs.
The other slices are SLICEMs, and have extra
features.23
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Recall: 5-LUT architecture ...ExpressFabric Technology
WP245 (v1.1.1) July 7, 2006 www.xilinx.com 5
R
The 6-input LUT leads to several benefits:• As it implements wider functions directly in the LUT, the number of logic levels
between registers is reduced, leading to higher performance.• It implements significantly more logic than a LUT with four inputs.• Power consumption is reduced because the larger LUT reduces the amount of
required interconnect (routing resources).The Virtex-5 family SLICEM LUTs also provide additional benefits:• New aspect ratios for distributed RAM. Every LUT can be configured as a 64 x 1 or
32 x 2 distributed RAM. Benefits for the designer are a much denser and faster implementation of distributed RAM with increased flexibility.
• Longer SRL chains. A single LUT supports a 32-bit SRL. A slice can thus implement a shift register of up to 128 bits, providing significant area savings and reduced routing resources in comparison to previous architectures. Shift registers are features available only in Xilinx devices. The Xilinx ISE™ software packer automatically packs two 16-bit SRLs with common addressing but different data. In other words, if the application needs a 16-bit deep, 8-bit-wide shift register, it can be implemented in a single slice.
Routing and Interconnect ArchitectureWith process technology advancements, interconnect timing delays can account for more than 50% of the critical path delay. A new diagonally symmetric interconnect pattern, developed for the Virtex-5 family, enhances performance by reaching more places in fewer hops. The new pattern allows for more logic connections to be made within two or three hops. Moreover, the more regular routing pattern makes it easier for the Xilinx ISE software to find the most optimal routes. All of the interconnect features are transparent to FPGA designers, but translate to higher overall
Figure 3: Block Diagram of a Virtex-5 6-Input LUT
WP245_03_051006
LUT5
A1
A2
A3
A4 D
A5
A6
A2
A3
A4 D
D6
D5
A5
A6
A2
A3
A4
A5
A6
LUT6
LUT5A[6:2] D000000000100010
....
101
111011111011111
001
Q
Q
Q
Q
Q
Q
(1)
(1)
(1)
(0)
(0)
(0)
....D
A[6:2]
32 Flip-flops. Configured to 1 or 0.
Some parts of a logic design need many flip-flops.
SLICEMs replace normal 5-LUTs with circuits that can act like 5-LUTs, but can alternatively use the 32 Flip-Flops as RAM, ROM, shift registers.
24
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
A SLICEM 6-LUT ...
Normal 6-LUT inputs.
Normal 5/6-LUT outputs.
Memory write
address
Memory data input
Memory data input.
Control output for chaining LUTs to
make larger memories.
Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008
CLB OverviewR
Figure 5-3: Diagram of SLICEM
A6DI2
COUT
D
DX
C
CX
B
BX
A
AX
O6
DI1MC31
O5
UG190_5_03_041006
A5A4A3A2A1
D6
DIDMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
Reset Type
DX
D5D4D3D2D1
WA1-WA6WA7WA8
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CLKWSGEN
CIN
0/1
WE
Sync
Async
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
C6
CI
CX
C5C4C3C2C1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
B6
BI
BX
B5B4B3B2B1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
A6
AI
AXSRCE
CLK
WE
A5A4A3A2A1
Q
Q
Q
WA1-WA6WA7WA8
WA1-WA6WA7WA8
WA1-WA6WA7WA8
A 1.1 Mb distributed RAM can be made if all SLICEMs of an LX110T are used as RAM.
25
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Many RAM configurations possible ...
178 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
SRHIGH and SRLOW can be set individually for each storage element in a slice. The choice of synchronous (SYNC) or asynchronous (ASYNC) set/reset (SRTYPE) cannot be set individually for each storage element in a slice.
The initial state after configuration or global initial state is defined by separate INIT0 and INIT1 attributes. By default, setting the SRLOW attribute sets INIT0, and setting the SRHIGH attribute sets INIT1. Virtex-5 devices can set INIT0 and INIT1 independent of SRHIGH and SRLOW.
The configuration options for the set and reset functionality of a register or a latch are as follows:
! No set or reset
! Synchronous set
! Synchronous reset
! Synchronous set and reset
! Asynchronous set (preset)
! Asynchronous reset (clear)
! Asynchronous set and reset (preset and clear)
Distributed RAM and Memory (Available in SLICEM only)
Multiple LUTs in a SLICEM can be combined in various ways to store larger amount of data.
The function generators (LUTs) in SLICEMs can be implemented as a synchronous RAM resource called a distributed RAM element. RAM elements are configurable within a SLICEM to implement the following:
! Single-Port 32 x 1-bit RAM
! Dual-Port 32 x 1-bit RAM
! Quad-Port 32 x 2-bit RAM
! Simple Dual-Port 32 x 6-bit RAM
! Single-Port 64 x 1-bit RAM
! Dual-Port 64 x 1-bit RAM
! Quad-Port 64 x 1-bit RAM
! Simple Dual-Port 64 x 3-bit RAM
! Single-Port 128 x 1-bit RAM
! Dual-Port 128 x 1-bit RAM
! Single-Port 256 x 1-bit RAM
Distributed RAM modules are synchronous (write) resources. A synchronous read can be implemented with a storage element or a flip-flop in the same slice. By placing this flip-flop, the distributed RAM performance is improved by decreasing the delay into the clock-to-out value of the flip-flop. However, an additional clock latency is added. The distributed elements share the same clock input. For a write operation, the Write Enable (WE) input, driven by either the CE or WE pin of a SLICEM, must be set High.
Virtex-5 FPGA User Guide www.xilinx.com 187UG190 (v4.2) May 9, 2008
CLB OverviewR
Distributed RAM configurations greater than the provided examples require more than one SLICEM. There are no direct connections between slices to form larger distributed RAM configurations within a CLB or between slices.
Figure 5-14: Distributed RAM (RAM256X1S)
UG190_5_14_050506
DI1D
A[7:0]
WCLKWE
(CLK)(WE/CE)
68
SPRAM64
RAM256X1S
A[6:1]WA[8:1]CLKWE
O6
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6F7BMUX
F8MUXRegisteredOutput
Output
(Optional)
D Q
O
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6
DI1
68
SPRAM64
A[6:1]WA[8:1]CLKWE
O6F7AMUX
A6 (CX)
A6 (AX)
A7 (BX)
Example configuration: Single-port 256b x 1,
registered output.
A complete list:
A 128 x 32b LUT RAM has a 1.1ns access time.
26
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
SLICEM shift register (one of many).
Virtex-5 FPGA User Guide www.xilinx.com 189UG190 (v4.2) May 9, 2008
CLB OverviewR
or flip-flop is available to implement a synchronous read. In this case, the clock-to-out of the flip-flop determines the overall delay and improves performance. However, one additional cycle of clock latency is added. Any of the 32 bits can be read out asynchronously (at the O6 LUT outputs) by varying the 5-bit address. This capability is useful in creating smaller shift registers (less than 32 bits). For example, when building a 13-bit shift register, simply set the address to the 13th bit. Figure 5-15 is a logic block diagram of a 32-bit shift register.
Figure 5-16 illustrates an example shift register configuration occupying one function generator.
Figure 5-15: 32-bit Shift Register Configuration
Figure 5-16: Representation of a Shift Register
ug190_5_15_050506
Output (Q)
RegisteredOutput
(Optional)
(AQ)
DI1
D Q
(AX)
SHIFTIN (MC31 of Previous LUT)
SHIFTIN (D)
A[4:0]
CLKCE
(A[6:2])
(CLK)(WE/CE)
SRL32
SRLC32E
A[6:2]
CLKCE
O6
MC31 SHIFTOUT (Q31)5
UG190_5_16_050506
SHIFTIN (D)
SHIFTOUT(Q31)WE
CLK
Address (A[4:0])
32-bit Shift Register
MUX
Q
5
See Virtex-5 User Guide for an complete list of shift-register types.
27
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
SLICEL vs SLICEM ...
Virtex-5 FPGA User Guide www.xilinx.com 173UG190 (v4.2) May 9, 2008
CLB OverviewR
Figure 5-3: Diagram of SLICEM
A6DI2
COUT
D
DX
C
CX
B
BX
A
AX
O6
DI1MC31
O5
UG190_5_03_041006
A5A4A3A2A1
D6
DIDMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
Reset Type
DX
D5D4D3D2D1
WA1-WA6WA7WA8
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
DPRAM64/32SPRAM64/32SRL32SRL16LUTRAMROM
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CLKWSGEN
CIN
0/1
WE
Sync
Async
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
C6
CI
CX
C5C4C3C2C1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
B6
BI
BX
B5B4B3B2B1
A6DI2
O6
DI1
MC31
O5
A5A4A3A2A1
A6
AI
AXSRCE
CLK
WE
A5A4A3A2A1
Q
Q
Q
WA1-WA6WA7WA8
WA1-WA6WA7WA8
WA1-WA6WA7WA8
174 www.xilinx.com Virtex-5 FPGA User GuideUG190 (v4.2) May 9, 2008
Chapter 5: Configurable Logic Blocks (CLBs)R
Each CLB can contain zero or one SLICEM. Every other CLB column contains a SLICEMs. In addition, the two CLB columns to the left of the DSP48E columns both contain a SLICEL and a SLICEM.
Figure 5-4: Diagram of SLICEL
A6LUTROM
COUT
D
DX
C
CX
B
BX
A
AX
O6O5
UG190_5_04_032606
A5A4A3A2A1
D6
DMUX
D
DQ
C
CQ
CMUX
B
BQ
BMUX
A
AQ
AMUX
DX
D5D4D3D2D1
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
CECK
D
FFLATCHINIT1INIT0SRHIGHSRLOW
SR REV
Q
CECK
CIN
0/1
A6LUTROM
O6O5
A5A4A3A2A1
C6
CX
C5C4C3C2C1
A6LUTROM
O6O5
A5A4A3A2A1
B6
BX
B5B4B3B2B1
A6LUTROM
O6O5
A5A4A3A2A1
A6
AXSRCE
CLK
A5A4A3A2A1
Q
Q
Q
Reset Type
Sync
Async
SLICEMSLICEL
SLICEM adds memory features to LUTs, + muxes.
28
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
!"#$%&'()*$$+,,-$$$).'/0$1
!"#$%&'&()*+#',$#-$./0123.445+6)*+#'7/4&6+-+6$0#895)($12#6: .(6;+*&6*9(&<
!"#$%&'()*!"#$%&'( !"#$%&')
To be continued ...
Throughout the semester, we will look at different Virtex-5 features in-depth.
Switch fabricBlock RAMDSP (ALUs)ClockingI/OSerial I/O + PCI
29
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture!"#$%&'()*$$+,,-$$$).'/0$+1
!""#$%&'$()*+,&-"#./*
).0*1$%2(3#&4. 567
2 !"##"$%&'"%()%*"+%,-.'%/ 0123%,-.4'%5*%65$#"7849%:!%4;<
/ 04<<%,-.2'%%5*%65$#"78=9%:!%=;<
2 )$(>%?%'#@A"%8B%=%'#@A"%C5C"D5*"
2 *"+%C$(E"''($F%)$(>%<;31%G:5C'H:IJ%#(%0;04%G:5C'H:IJ
2 0K<:IJ%8B%1<0%:IJ
2 022%8B%1?<%GL$M'#(*"%:5C'
-'"%*"+%%2%,-.9%1%'#@A"%N""C"$%C5C"9%%0<O%>($"%:IJ9%?3O%P"##"$%C"$)($>@*E"-'"%*"+%%2%,-.9%1%'#@A"%N""C"$%C5C"9%%0<O%>($"%:IJ9%?3O%P"##"$%C"$)($>@*E"
!"#$!"#$
!%&$!%&$
!'()*+,)-.'/(-01
2+(3-')1*45,1
!'()*+,)-.'/(-01
2+(3-')1*45,1
65)5/(-01
2+(3-')1*45,1
65)5/(-01
2+(3-')1*45,1
6"#$6"#$
6%&$6%&$
$+(
!7
$+($+(
!7!7
#*.8*59
:.+')1*
#*.8*59#*.8*59
:.+')1*:.+')1*
!'()*+,)-.'
$+441*
!'()*+,)-.'
$+441*
!'()*+,)-.'
61,.01
!'()*+,)-.'!'()*+,)-.'
61,.0161,.01
;18-()1*37-<1
=>?=>2
;18-()1*37-<1
=>?=>2
$+(
!7
$+($+(
!7!7
@00AB+2
BC-4)A%.8-,5<BC-4)A%.8-,5<BC-4)A%.8-,5<
&+<)-D<E&+<)-D<E
The payoff for “Design to the part” ...
30
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Class Organization
31
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
CS 194-6: Pure project course.
Project teams: 1-to-3 person groups. This Friday: setting project topics. 10-noon, 125 Cory (not 119 Cory!).
No exams.
Mondays: Lectures teach topics useful for the project
No homework.
Fridays: Project
check-ups. 32
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
CS 150 is a “hard” prerequisite for CS194
Exceptions considered only for graduate students ...
33
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
CS 194-6: Project red-letter days.
Grade is based on revised versions of presentation slides (email PDFs to Greg and John after presentation).
Specification Design ReviewMon, Sept 29
Implementation Design Review
Mon, Nov 3
Final Presentation
Fri, Dec 5
34
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Text, Printing, Honor Code ...Recommended Text: “Computer Organization and Design”, David Patterson and John Hennessy.
We expect you to obey the EECS Policy on Academic Dishonesty. See “Course Info” on website for info.
Printing: The first 200+ pages are free (count includes cover sheets). Then,$12 per 200 pages. Plan ahead ...
Class web site coming soon ...
Account forms coming soon ...
35
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Lab access, remote work, lab rulesFor card key access to 119/125 Cory:apply at 253 Cory.
Also work remotely -- see Remote Desktop info on Resources page.
Eat/drink at round tables only, log equipment failures in log book (and email TAs), don’t share account, logoff after each session. Be crime/safety aware.
Let us know if this doesn’t work yet ...
36
UC Regents Fall 2008 © UCBCS 194-6 L1: Virtex-5 Microarchitecture
Next Monday’s Lecture:
Single-cycle CPU design
37
top related