vlsi programming
TRANSCRIPT
-
7/30/2019 VLSI Programming
1/61
VLSI Programming: Lecture 1
Course 2IN35
Course: Kees van Berkel [email protected]
Rudolf Mak [email protected]
Lab: Kees van Berkel, Rudolf Mak
2/7/20121
Wei Tong, Hrishikesh Salunkhe
www: http://www.win.tue.nl/~cberkel/2IN35/
Lecture 1 Introduction
-
7/30/2019 VLSI Programming
2/61
Introduction to VLSI Programming: goals
to acquire insight in the description, design, andoptimization of fine-grained parallel computations;
to acquire insight in the (future) capabilities of VLSIas an implementation medium of parallel computations;
2/7/20122
to acquire skills in the design of parallel computations andin their implementation on FPGAs.
-
7/30/2019 VLSI Programming
3/61
Contents
Massive parallelism is needed to exploit the huge and stillincreasing computational capabilities of Very Large ScaleIntegrated (VLSI) circuits:
we focus on fine-grained parallelism(not on networks of computers);
2/7/20123
we assume that parallelism is by design(not by compilation);
we draw inspiration from consumer applications, such asdigital TV, 3D TV, image processing, mobile phones, etc.;
we will use Field Programmable Arrays (FPGA) as fine-grained abstraction of VLSI for practical implementation.
-
7/30/2019 VLSI Programming
4/61
FPGA IC on the Xilinx XUP Board
2/7/20124
XC2VP30
FPGA
-
7/30/2019 VLSI Programming
5/61
Lab work prerequisites
Notebook
Exceed (can be obtained through the software distributionof the university)
Access to UNIX server Dept. W&I
2/7/20125
,
Lab work is by teams of two students.
Have FPGA tools (SW) installed on your machine by Feb 28
-
7/30/2019 VLSI Programming
6/61
VLSI Programming: time table 2012
date class | lab subject (provisional)
Feb 7 2 | 0 hours introduction, DSP representations, bounds
Feb 14 2 | 0 hours pipelining, retiming, transposition, J-slow, unfolding
Feb 21 no lecture (carnaval) (have FPGA tools installed)
Feb 28 2 | 0 hours unfolding (cntd), look-ahead, strength reduction
Mar 6 1 | 3 hours Introductions FPGA, Verilog, Lab (A1)
2/7/20126
Mar 20 1 | 3 hours folding; FPGAs: pipelining, retiming (A2)
Mar 27 1 | 3 hours DSP processors; FPGAs: parallelism, strength reduc. (A3)
Apr 3 1 | 3 hours FPGAs: sample-rate conversion (A4)
Apr10,17 2x no lecture (examinations)
Apr 24 1 | 3 hours FPGAs: video scaler (A5); hand-in report A3
May 1 1 | 3 hours FPGAs: video scaler (A5 cntd)
May 22 deadline final report A4, A5
-
7/30/2019 VLSI Programming
7/61
Course grading (provisional)
Your course grade is based on:
the quality of your programs/designs [30%];
your final report on the design and evaluationof these programs (guidelines will follow) [30%];
2/7/20127
programs, the report and the lecture notes [20%];
intermediate assignments [20%].
Credits: 5 points = based on 140 hours on your side
-
7/30/2019 VLSI Programming
8/61
Note on course literature
First half of VLSI programming course is based on:
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design andImplementation. Wiley Inter-Science 1999.
This book is recommended.
Accompanying slides can be found on:
2/7/20128
http://www.ece.umn.edu/users/parhi/slides.html http://www.win.tue.nl/~cberkel/2IN35/
Mandatory reading:
Keshab K. Parhi. High-Level Algorithm and ArchitectureTransformations for DSP Synthesis. Journal of VLSI SignalProcessing, 9, 121-143 (1995), Kluwer Academic Publishers.
-
7/30/2019 VLSI Programming
9/61
Introduction
Some inspiration from the technology sideVLSIFPGAs
Some inspiration from the application sideMachine Intellli ence
2/7/20129
Bee, SKA, SETIDigital Signal Processing (Software Defined Radio)
Parhi, Chapters 1, 2DSP Representation MethodsIteration bounds
-
7/30/2019 VLSI Programming
10/61
Some inspiration
2/7/201210
rom t e tec no ogy s e
-
7/30/2019 VLSI Programming
11/61
Vertical cut through VLSI circuit
2/7/201211
-
7/30/2019 VLSI Programming
12/61
Intel 4004 processor [1970]
1970
2/7/201212
4-bit
2300transis
tors
-
7/30/2019 VLSI Programming
13/61
Intel Itanium 2 6M Processor [2004]
2/7/201213
-
7/30/2019 VLSI Programming
14/61
Intel Itanium 2 6M Processor [2004]
Power dissipation 130 W (107 W typical) 3 Levels of Cache: 16k + 16k, 256K, 6M
CMOS technology: 130nm Clock frequency1.5 GHz 7,877 pins, 95 percent of the are for power
2/7/201214
1,322 Specint base2000 for a single processor! Next generation: 1.7GHz and 9MByte cache!
Rusu et. al [Intel], Itanium 2 Processor 6M: Higher Frequency andLarger L3 Cache, IEEE Micro April 2004, vol. 24, Issue 2, pp. 10-18.
-
7/30/2019 VLSI Programming
15/61
Moores Law
2/7/201215
-
7/30/2019 VLSI Programming
16/61
Rule of two [Hu, 1993]
Every 2 generations of IC technology (6 years)
device feature size 0.5 x chip size 2 x clock frequency 2 x (no longer true)
2/7/201216
number of i/o pins 2 x DRAM capacity 16 x logic-gate density 4 x
-
7/30/2019 VLSI Programming
17/61
ITRS: INTERNATIONAL TECHNOLOGY
ROADMAP FOR SEMICONDUCTORSThe overall objective of the ITRS is to present industry-wide
consensus on the best current estimate of the industrysresearch and development needs out to a 15-year horizon.
As such, it provides a guide to the efforts of companies,universities, governments, and other research providers/funders.
The ITRS has improved the quality of R&D investment decisionsmade at all levels and has helped channel research efforts toareas that most need research breakthroughs.
Involves over 1000 technical experts, world wide.
a self-fulfilling prophecy? or wishful thinking?
2/7/2012ST-Ericsson confidential17
-
7/30/2019 VLSI Programming
18/61
2009 ITRSMPU/high-performance ASIC
Half Pitch and Gate Length Trends
2/7/2012ST-Ericsson confidential18
-
7/30/2019 VLSI Programming
19/61
2009 ITRS Functions/chip and chip size
2/7/201219
-
7/30/2019 VLSI Programming
20/61
Transistor counts [M]/ Micro Processor Unit
2/7/2012ST-Ericsson confidential20
-
7/30/2019 VLSI Programming
21/61
FPGA, the last 10 years
Price: 300x100
1000
Capacity
Speed
Virtex-II Pro
Memory
Bandwidth
2/7/201221
1/91 1/92 1/93 1/94 1/95 1/96 1/97 1/98 1/99 1/00 1/01 1/02 1/03 1/04
Year
Speed: 20x
Capacity: 200x
Source: Xilinx
1
10
Price
Virtex-II Pro
10x
1x
Spartan-3
-
7/30/2019 VLSI Programming
22/61
Virtex 4 FPGA: 4VSX55
500MHz clock
Flexible Logic
6,144 CLBs
2/7/201222
multi-port RAM320 18 kbit
Programmable
512 DSP slides
450MHz
PowerPC1Gbps
Differential I/O
0.6-11.1Gbps
Serial Trx
-
7/30/2019 VLSI Programming
23/61
Some inspiration
2/7/201223
rom t e app cat on s e
-
7/30/2019 VLSI Programming
24/61
All things grand and small [Moravec 98]
2/7/201224
-
7/30/2019 VLSI Programming
25/61
Chess Machine Performance [Moravec 98]
2/7/201225
-
7/30/2019 VLSI Programming
26/61
brain power equivalentper $1000 of computer
Evolution computer power/cost [Moravec 98]
2/7/201226
-
7/30/2019 VLSI Programming
27/61
The Square Kilometer Array (SKA)
... the ultimate exploration tool
... and the ultimate
software defined radio
MPSoC -- 2010, June3027
-
7/30/2019 VLSI Programming
28/61
The Square Kilometer Array (SKA)
antenna surface: 1 km2 (sensitivity 50)large physical extent (3000+ km)wide frequency range: 70 MHz 30 GHzfull design by 2010; phase 1: 2017; phase 2: 2022
1000- 1500 dishes (15m) in the central 5 km (2000-3000 total)
MPSoC -- 2010, June3028
+ dense and/or sparse aperture arraysconnected to a massive data processor by an optical fibre network
Software Defined RadioSoftware Defined RadioSoftware Defined RadioSoftware Defined Radio Astronomy
computational load (on-line) 1 exa MAC1 exa MAC1 exa MAC1 exa MAC (1018 MAC/s)power budget = 30 MW30 MW30 MW30 MW 30 pJ/MAC 30 pJ/MAC 30 pJ/MAC 30 pJ/MAC allallallall----inininin
-
7/30/2019 VLSI Programming
29/61
References
Chip fotos: http://www-vlsi.stanford.edu/group/chips.html
ITRS Roadmap http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf
2/7/201229
en w compu er ar ware ma c e uman ra n http://www.jetpress.org/volume1/moravec.htm
BEE & Square Kilometer Array
http://bwrc.eecs.berkeley.edu/Research/BEE/ http://seti.berkeley.edu/casper/papers/BEE2_ska2004_poster.pdf http://www.skatelescope.org/
-
7/30/2019 VLSI Programming
30/61
VLSI Digital Signal ProcessingSystems
2/7/201230
Parhi, Chapters 1&2
-
7/30/2019 VLSI Programming
31/61
DSP applications classes
10G
1G
100M10M
1M video
HDTVradio
radar
erate
2/7/201231
100k
10k
1k
100
10
1
speechaudio modems
controlseismic
modeling
complexity
Samp
# instructions/sample
-
7/30/2019 VLSI Programming
32/61
Typical DSP algorithms
speech (de-)coding
speech recognitionspeech synthesisspeaker identification
sound synthesisecho cancellationmodem: (de-)modulationvision
2/7/201232
Hi-fi audio en/decodingnoise cancellationaudio equalization
ambient acousticemulation.
image (de-)compressionimage compositionbeam cancellation
spectral estimationetc.
-
7/30/2019 VLSI Programming
33/61
Typical DSP algorithms: FIR Filters
Filters reduce signal noise and enhance image or signal qualityby removing unwanted frequencies.
Finite Impulse Response (FIR) filters compute y(n):
)(*)()()()(1
nxnhkixkhiyN
==
2/7/201233
wherex is the input sequencey is the output sequence
h is the impulse response (filter coefficients)N is the number of taps (coefficients) in the filter
Output sequence depends only on input sequence and impulseresponse.
=
-
7/30/2019 VLSI Programming
34/61
Typical DSP algorithms: IIR Filters
Infinite Impulse Response (IIR) filters compute:
=
=
+=
1
0
1
1)()()()()(
N
k
M
kkixkbkiykaiy
2/7/201234
Output sequence depends on input sequence, impulseresponse,as well as previous outputs
Adaptive filters (FIR and IIR) update their coefficients tominimize the distance between the filter output and thedesired signal.
-
7/30/2019 VLSI Programming
35/61
Typical DSP Algorithms: DFT and FFT
The Discrete Fourier Transform (DFT) supports frequencydomain (spectral) analysis:
for k = 0, 1, , N-1, where
x is the input sequence in the time domain (real or complex)
1)()(
21
0
===
=
jeWnxWky Nj
N
N
n
nk
N
2/7/201235
y is an output sequence in the frequency domain (complex)
The Inverse Discrete Fourier Transform (IDFT) is computed as
The Fast Fourier Transform (FFT) and its inverse (IFFT) providean efficient method for computing the DFT and IDFT.
1-n,...1,0,nfor,)()(1
0
==
=
N
k
nk
N kyWnx
-
7/30/2019 VLSI Programming
36/61
Typical DSP Algorithms: DCT
The Discrete Cosine Transform (DCT) and its inverse IDCT arefrequently used in video (de-) compression (e.g., MPEG-2):
1-N...1,0,kfor,)(]2
)12(cos[)()(1
0
=+
=
=
N
n
nxN
knkeky
2/7/201236
where e(k) = 1/sqrt(2)ifk = 0; otherwise e(k) = 1.
A N-Point, 1D-DCT requires N2MAC operations.
1-N...1,0,kfor,)(]2cos[)()( 0=
+
=
=knyN
n
keNnx
-
7/30/2019 VLSI Programming
37/61
Typical DSP Algorithms: distance calcul.
Distance calculations are typically used in patternrecognition, motion estimation, and coding.
Problem: chose the vector rkwhose distance (see below) fromthe input vector xis minimum.
2/7/201237
|)()(|11
0
=
=
N
ik irix
Nd
=
=
1
0
2)]()([1N
ik irix
Nd
Mean Absolute Difference (MAD) Mean Square Error (MSE)
-
7/30/2019 VLSI Programming
38/61
Typical DSP Algorithms:Matrix Computs
Matrix computations are typically used to estimate parametersin DSP systems.
Matrix vector multiplication Matrix-matrix multiplication
2/7/201238
Matrix inversion Matrix triangulization
Matrices often have (band) structures or may be sparse.
-
7/30/2019 VLSI Programming
39/61
Computation Rates
To estimate the hardware resources required, we can usethe equation:
where
SSCNRR =
2/7/201239
c
Rs is the sampling rateNs is the (average) number of operations per sample
For example, a 1-D FIR has NS = 2Nand a 2-D FIR has NS = 2N
2.
C i l R f FIR Fil i
-
7/30/2019 VLSI Programming
40/61
Computational Rates for FIR Filtering
Signal type Frequency # taps Performance
Speech 8 kHz N =128 20 MOPs
Music 48 kHz N =256 240 MOPs
Video phone 6.75 MHz N*N = 81 1,090 MOPs
2/7/201240
TV 27 MHz N*N = 81 4,370 MOPs
HDTV 144 MHz N*N = 81 23,300 MOPs
-
7/30/2019 VLSI Programming
41/61
2/7/201241
-
7/30/2019 VLSI Programming
42/61
2/7/201242
-
7/30/2019 VLSI Programming
43/61
2/7/201243
-
7/30/2019 VLSI Programming
44/61
2/7/201244
-
7/30/2019 VLSI Programming
45/61
2/7/201245
z-k = k units delay
edge labeled n= multiplication by n
-
7/30/2019 VLSI Programming
46/61
Graphical representations: typical usage
block diagram
data flow graph
signal flow graph
2/7/201246
general
signal processing
LTI systems
Linear Systems
-
7/30/2019 VLSI Programming
47/61
Linear Systems
input x, output y:
discrete system:
x(n) y(n)
linear system:
results in
2/7/201247
x
1(n) + x
2(n) y
1(n) + y
2(n)
c1 x1(n) + c2 x2(n) c1 y1(n) + c2 y2(n)
for arbitrary c1 and c2
Most of our examples will be linear systems
results in
results in
Linear Time-Invariant Systems
-
7/30/2019 VLSI Programming
48/61
Linear Time Invariant Systems
input x, output y:
x(n+k) = x(n) shifted by integer k sample periods
time-invariant system
x(n) =x(n+k) y(n) = y(n+k)results in
2/7/201248
Most of our examples will be linear time-invariant systems, orLTI systems
Commutativity of LTI systems
-
7/30/2019 VLSI Programming
49/61
Commutativity of LTI systems
LTISystem A
LTISystem Bx(n) y(n)
f(n)
2/7/201249
LTI
System B
LTI
System A
x(n) y(n)g(n)
is equivalent to
Iteration of a Synchronous Flow Graph
-
7/30/2019 VLSI Programming
50/61
Iteration of a Synchronous Flow Graph
Each actor fires the minimum number of times to return thegraph to a particular state
Example of amulti-rate DFG:A
1
B
2
C
32 2 1
# firings for 1 iteration
2/7/201250
2 2 3
# tokens per edge for 1 iteration
A A B B C C
2 4 6 3
Iteration period
-
7/30/2019 VLSI Programming
51/61
Iteration period
Iteration period =the time required for the execution of one iteration of the SFG
Example:
a
2/7/201251
Let
Tm= 10 = multiplication timeTa= 4 = addition time
Iteration period = Tm+Ta= 14
= minimum sample period Ts; that is: Ts Tm+Ta
+ D y n-x n
-
7/30/2019 VLSI Programming
52/61
2/7/201252
-
7/30/2019 VLSI Programming
53/61
2/7/201253
-
7/30/2019 VLSI Programming
54/61
2/7/201254
-
7/30/2019 VLSI Programming
55/61
2/7/201255
4 types of delay paths
-
7/30/2019 VLSI Programming
56/61
1
2
3 outputsinputs
combinationalfunctions
4 types of delay paths
from to
2/7/201256
delay elements
= state
4
Finite state machine (FSM) representation of a DSP system
1 inputs state
2 state outputs
3 inputs outputs
4 state state
-
7/30/2019 VLSI Programming
57/61
2/7/201257
DSP references
-
7/30/2019 VLSI Programming
58/61
Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design andImplementation. Wiley Inter-Science 1999.
Richard G. Lyons. Understanding Digital Signal Processing(2ndedition). Prentice Hall 2004.
2/7/201258
John G. Proakis and Dimitris K Manolakis. Digital Signal Processing(4th edition), Prentice Hall, 2006.
Simon Haykin. Neural Networks, a Comprehensive Foundation(2ndedition). Prentice Hall 1999.
Computer Architecture and DSP references
-
7/30/2019 VLSI Programming
59/61
Hennessy and Patterson, Computer Architecture, aQuantitative Approach. 3rd edition. Morgan Kaufmann, 2002.
Phil Lapsley, Jeff Bier, Amit Sholam, Edward Lee. DSPProcessor Fundamentals, Berkeley Design Technology, Inc,1994-199
2/7/201259
Jennifer Eyre, Jeff Bier, The Evolution of DSP Processors, IEEESignal Processing Magazine, 2000.
Kees van Berkel et al. Vector Processing as an Enabler forSoftware-Defined Radio in Handheld Devices, EURASIP Journalon Applied Signal Processing 2005:16, 2613-2625.
VLSI Programming: next lecture
-
7/30/2019 VLSI Programming
60/61
Parhi, Chapters 2, 3
Representations of DSP algorithms Data flow graphs Loop bounds and iteration bounds
2/7/201260
Pipelining of digital filters Parallel processing Retiming techniques
-
7/30/2019 VLSI Programming
61/61
THANK YOU