neta peled & hillel mendelson supervisor: mike sumszyk final presentation of part b annual...
![Page 1: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/1.jpg)
Neta Peled & Hillel Mendelson
Supervisor: Mike Sumszyk
Real Time Video Filtering
Final Presentation of Part B
Annual project
![Page 2: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/2.jpg)
The algorithm
Part A overview
Part B challenges
Blocks implementation
Conclusions
Real Time Video Filtering
![Page 3: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/3.jpg)
The algorithm: Nonlinear Diffusion
• Uses an iterative numerical solution to solve the diffusion equation
Why use it for image processing?
• Image noise is smoothed
• Edges remain sharp
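The slides do not spell out the discretisation. The large time step used later (dt = 30 in a single iteration), together with the transposes and the reverse-order loop listed among the difficulties, would be consistent with a semi-implicit scheme that solves a tridiagonal system along each row and column, but that reading is an assumption. Under that assumption, a minimal one-dimensional sketch looks as follows; the names `diffusivity`, `semi_implicit_step` and the contrast parameter `k` are hypothetical and do not come from the project.

```python
def diffusivity(gradient, k):
    """Perona-Malik style diffusivity: near 1 in flat areas, near 0 across strong edges."""
    return 1.0 / (1.0 + (gradient / k) ** 2)


def semi_implicit_step(u, dt, k):
    """Solve (I - dt*A(u)) x = u for a 1-D signal u with the Thomas algorithm."""
    n = len(u)
    # Diffusivity between each pair of neighbouring samples.
    g = [diffusivity(u[i + 1] - u[i], k) for i in range(n - 1)]
    lower = [0.0] + [-dt * gi for gi in g]   # coupling to the left neighbour
    upper = [-dt * gi for gi in g] + [0.0]   # coupling to the right neighbour
    diag = [1.0 - lower[i] - upper[i] for i in range(n)]

    # Forward sweep: eliminate the lower diagonal.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = u[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (u[i] - lower[i] * d[i - 1]) / m

    # Back substitution: runs from the last sample to the first.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# A noisy step edge: noise of a few grey levels, edge of about 100 grey levels.
noisy_step = [10, 12, 9, 11, 10, 110, 108, 111, 109, 110]
smoothed = semi_implicit_step(noisy_step, dt=30.0, k=5.0)
```

In this sketch the implicit solve stays stable for a large `dt`, the noise on each side of the step is flattened, and the step itself survives because the diffusivity across it is close to zero. The backward loop in the substitution is one plausible origin of a "reverse order loop" in hardware.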
Project Recap
![Page 4: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/4.jpg)
Original image
![Page 5: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/5.jpg)
dt = 30 !!! One iteration
Look at the edges (sharp!)
Look at the hat (smoothed)
![Page 6: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/6.jpg)
Part A overview
Difficulties with the algorithm:
• Very complex design, makes real time almost impossible
• Transpose entire image
• Reverse order loop
• Huge memory bandwidth required
So why use this model?
• Good results even after a single iteration (Yoni & Zion needed at least 20 iterations => need for multiple FPGAs)
![Page 7: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/7.jpg)
Part A overview
Exploring different architecture solutions in Matlab
• Comparing “sub-frames” processing vs. entire frame processing
Fixed-point analysis of the algorithm in Matlab
Learning about memory resources:
• Internal memory: MRAM, M4K, M512
• External memory: DDR
Analyzing the memory bandwidth requirements of the algorithm
DVI signal generators
Implementation of real-time streaming of pixels through DDR double buffering:
• DVI in => DDR write => DDR read => DVI out
![Page 8: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/8.jpg)
Part B
Transpose image implementation
• First transpose (800x525 => 525x800)
• Second transpose (525x800 => 800x525)
• Each transpose implies synchronization between internal memories and external memories using dedicated controllers and FIFOs
Detection of frame first pixel
• Needed because each transpose block should start operating only at the first pixel of a frame
• Also needed because the pipeline of Sergey & Roman needs to get a starting signal when the first pixel of a frame enters the pipeline.
Implementation of frame rate converters
• Down rate converter at the input (60 fps => 15 fps)
• Up rate converter at the output (15 fps => 60 fps)
CORRECT DVI synchronization!
• PLL fixed location at input and output pins.
• Registered input/output pins.
Fixed-point analysis of the algorithm in Quartus
![Page 9: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/9.jpg)
Part A Implementation
[Block diagram (Stratix II). Blocks: DVI IN, DVI OUT, PLL, reset detector, DVI control signals generator, two internal memory blocks, and DDR (2 banks) behind Gidel’s memory controller. Signals: 24-bit RGB data, 3-bit DVI sync, DVI clk (25.2 MHz), ¼ DVI clk, 180 MHz.]
![Page 10: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/10.jpg)
The Final architecture (PART B)
[Block diagram (Stratix II). Blocks: DVI IN, DVI OUT, two “Freq controller: 4F to F” blocks (one combined with T’), T’ blocks, PIPE blocks (labelled “columns” and “lines”), PLL, reset detector, DVI control signals generator, and DDR (2 banks) behind Gidel’s memory controller. Signals: 24-bit RGB data, 3-bit DVI sync, DVI clk (25.2 MHz), ¼ DVI clk, 180 MHz.]
![Page 11: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/11.jpg)
The Final architecture (PART B)
[Block diagram: same blocks and signals as on the previous architecture slide, with the DDR now labelled “8 Double Buffers”.]
![Page 12: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/12.jpg)
Fundamental DDR controller
[Figure: the final architecture block diagram (DDR: 8 double buffers), repeated from the “The Final architecture (PART B)” slide.]
![Page 13: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/13.jpg)
Fundamental DDR controller
There are 4 bidirectional communication channels to/from DDR
Each channel requires its own controller, which is a variation of a fundamental controller:
• Up rate
• Down rate
• First transpose (800x525 => 525x800)
• Second transpose (525x800 => 800x525)
Each one has asymmetric behavior for read and write
![Page 14: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/14.jpg)
Fundamental DDR controller
[State diagrams: WRITE controller and READ controller, with their synchronization states.]
![Page 15: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/15.jpg)
DDR double buffer
[Diagram labels: pipe, dual-clock FIFO, DDR WR controller, DDR RD controller, dual-clock FIFO, pipe. The two controllers exchange “wr fin”, “rd fin” and “continue” signals.]
When finishing a frame:
• Each controller calculates its new address and waits for the other controller to finish.
• While waiting, the controller keeps sending a “continue” signal to the other controller.
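The end-of-frame handshake described above can be modelled in a few lines. This is a behavioral sketch only, not the project's HDL: the class name `Controller`, the cycle counts and the two-buffer addressing are assumptions, and the `finished` flag stands in for the “wr fin” / “rd fin” signalling, whose exact semantics the slides do not give.

```python
class Controller:
    """One side (write or read) of a DDR double buffer."""

    def __init__(self, cycles_per_frame, start_buffer):
        self.cycles_per_frame = cycles_per_frame
        self.buffer = start_buffer   # which half of the double buffer is in use
        self.progress = 0
        self.finished = False        # raised at end of frame, held while waiting

    def tick(self):
        # Advance one clock; once the frame is done, just wait.
        if not self.finished:
            self.progress += 1
            if self.progress == self.cycles_per_frame:
                self.finished = True

    def swap(self):
        # New address for the next frame: move to the other buffer.
        self.buffer ^= 1
        self.progress = 0
        self.finished = False


def simulate(total_cycles, wr_cycles=5, rd_cycles=8):
    """Return the (write buffer, read buffer) pair seen on every clock."""
    wr = Controller(wr_cycles, start_buffer=0)
    rd = Controller(rd_cycles, start_buffer=1)
    trace = []
    for _ in range(total_cycles):
        wr.tick()
        rd.tick()
        trace.append((wr.buffer, rd.buffer))
        # Both sides swap only when both have finished their frame.
        if wr.finished and rd.finished:
            wr.swap()
            rd.swap()
    return trace


trace = simulate(24)
```

Because neither side swaps until both have finished, the faster controller idles at the frame boundary and the two never touch the same buffer at once.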
![Page 16: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/16.jpg)
Bloody signals
Flush:
• According to Gidel’s manual, the flush signal is used to force writing the data to the memory when the last word is incomplete.
• BUT, even when using a port size equal to the memory width, one must use the ‘flush’ signal.
Write empty:
• When performing write bursts at different addresses, one must wait for the write_empty signal before starting a new burst. Without waiting, the data is lost.
• NOT in Gidel’s manual!
![Page 17: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/17.jpg)
Down rate DDR controllers
[Figure: the final architecture block diagram (DDR: 8 double buffers), repeated from the “The Final architecture (PART B)” slide.]
![Page 18: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/18.jpg)
Down rate controllers
Write controller:
• Writes to DDR only one frame out of every 4 frames.
• Frame rate: 15 frames/sec, pixel rate: 6.2 MHz
• Data loss is almost unnoticeable
• Algorithm performance is not affected!
• Actual bandwidth: 25 MHz (DVI clock)
Read controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz
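The rates on this slide follow from the frame geometry used elsewhere in the deck (800x525 pixels per frame including blanking). A short sketch of the arithmetic and of the one-in-four frame selection, with hypothetical helper names `pixel_rate_hz` and `down_rate`:

```python
def pixel_rate_hz(width, height, fps):
    """Pixels per second for a frame of width x height at the given frame rate."""
    return width * height * fps


def down_rate(frames, keep_one_in=4):
    """Keep the first frame of every group of `keep_one_in`, drop the rest."""
    return [frame for index, frame in enumerate(frames) if index % keep_one_in == 0]


full_rate = pixel_rate_hz(800, 525, 60)      # DVI side, 60 fps
quarter_rate = pixel_rate_hz(800, 525, 15)   # processing side, 15 fps
kept = down_rate(list(range(8)))             # frames 0 and 4 survive
```

The full rate comes out at 25.2 MHz, matching the DVI clock, and the quarter rate at about 6.3 MHz, close to the figure quoted on the slide.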
![Page 19: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/19.jpg)
Down rate controllers
[State diagrams: a “normal” READ controller, and a WRITE controller that writes 1 frame to DDR, then counts 3 more frames and cleans the pipe.]
![Page 20: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/20.jpg)
Up rate DDR controllers
[Figure: the final architecture block diagram (DDR: 8 double buffers), repeated from the “The Final architecture (PART B)” slide.]
![Page 21: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/21.jpg)
Up rate controllers
Write controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz
Read controller:
• Reads the same frame from the DDR 4 times, to meet DVI data rate requirements
• Actual bandwidth: 25 MHz
![Page 22: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/22.jpg)
Up rate controllers
[State diagrams: WRITE controller and READ controller. The READ controller’s main “loop” reads the same frame 4 times, then syncs with the WR controller and swaps addresses.]
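The read side of the up rate controller can be pictured as a loop over buffer addresses. The sketch below is an illustrative model, assuming a two-buffer scheme indexed 0 and 1; the function name `up_rate_reads` is hypothetical.

```python
def up_rate_reads(num_input_frames, repeat=4):
    """Return the buffer index read for every output frame."""
    read_buffer = 0
    reads = []
    for _ in range(num_input_frames):
        # Main loop: the same frame is read `repeat` times.
        reads.extend([read_buffer] * repeat)
        # After the last read, sync with the write side and swap addresses.
        read_buffer ^= 1
    return reads


reads = up_rate_reads(3)
```

Three input frames at 15 fps produce twelve output frames, which restores the 60 fps the DVI output expects.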
![Page 23: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/23.jpg)
Transpose DDR controller
[Figure: the final architecture block diagram (DDR: 8 double buffers), repeated from the “The Final architecture (PART B)” slide.]
![Page 24: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/24.jpg)
Transpose
A reminder of how it works:
[Diagram, inside the Stratix II: M-RAM write, M-RAM read, DDRII T’ write, DDRII T’ read. Annotations: “Penalty every row skip”, “Sequential read from DDR”, “Penalty all the time!”]
![Page 25: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/25.jpg)
Transpose challenges
Two different transposes:
• The first transpose: 800x525
• Transpose back: 525x800
• Debugging difficulty…
Synchronization to the beginning of the frame is required
Transpose counters:
• “Heavy” sequential combinational logic causes timing problems
Transpose on read or on write?
![Page 26: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/26.jpg)
Transpose - memory configuration settings
Mram:
• Max number of rows (minimum penalty)
• Number must divide 800 or 525 (no remainder)
• Number must agree with Gidel controller
• We chose 50 and 35 lines respectively
DDR:
• Load balancing
• Gidel requirements
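One way to see why the number of buffered rows must divide the frame dimension, and why more rows mean less penalty, is a strip-based transpose. This is a software sketch under assumptions: the internal memory is modelled as a strip of `strip_rows` lines, each write of `strip_rows` consecutive pixels stands for one DDR burst, and the names `transpose_in_strips` and `burst_count` are hypothetical. The slides do not state which strip height belongs to which of the two transposes.

```python
def transpose_in_strips(frame, strip_rows):
    """Transpose a frame (list of rows) by buffering strip_rows lines at a time."""
    rows, cols = len(frame), len(frame[0])
    if rows % strip_rows != 0:
        raise ValueError("strip_rows must divide the number of rows")
    out = [[None] * rows for _ in range(cols)]
    for top in range(0, rows, strip_rows):
        strip = frame[top:top + strip_rows]   # lines held in internal memory
        for c in range(cols):
            # One burst: strip_rows consecutive pixels of output row c.
            for r in range(strip_rows):
                out[c][top + r] = strip[r][c]
    return out


def burst_count(rows, cols, strip_rows):
    """Number of separate bursts needed; fewer bursts means fewer penalties."""
    return cols * (rows // strip_rows)


small = [[r * 5 + c for c in range(5)] for r in range(4)]   # 4 rows x 5 columns
transposed = transpose_in_strips(small, strip_rows=2)
restored = transpose_in_strips(transposed, strip_rows=5)
```

The chosen values satisfy the divisibility rule (800 % 50 == 0 and 525 % 35 == 0), and doubling the strip height halves `burst_count`, which is the motivation for taking the largest strip the internal memory allows.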
![Page 27: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/27.jpg)
Transpose’s synchronization blocks: Mram
Write and read address counters
Beginning of frame detection unit
Delaying the data
3 Mrams for RGB
![Page 28: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/28.jpg)
Transpose’s synchronization blocks: DDR
Synchronization on the WR controller:
• New “Data in” port
• Designated states to deal with the first pixel of the frame after reset.
• “Cleans” the DCFIFO until detecting the first pixel of a new frame.
The WR controller sends a reset signal to the RD controller.
![Page 29: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/29.jpg)
Transpose counters
DDR and Mram counters:
• The “heaviest” combinational logic (CL) of the entire design
If (a) and (not b) and (not c) then
If (a) and (b) and (not c) then
If (a) and (b) and (c) then
Long CL paths result in timing problems!
No code reuse and more HW (but we have enough!) guarantees shorter, parallel CL
If (a) then
  If (b) then
    If (c) then
![Page 30: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/30.jpg)
Debugging difficulties
Can’t easily “divide and conquer”: the result is available only after 2 transposes
We used SignalTap and built verification units
[Diagram: the first T’ and the second T’, each shown with an Mram stage and a DDR stage, their address counters and sync blocks, and a dual-clock FIFO.]
![Page 31: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/31.jpg)
Debugging difficulties
Can’t simulate DDR’s behavior in ModelSim
• We don’t have a reliable model of the external memory’s behavior
• Gidel’s controller is NOT “transparent” to the users. We know nothing about:
  • Gidel’s internal implementation
  • Gidel’s policy for handling DDR requests
We can read from the DDR through PCI, but it changes the data path…
![Page 32: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/32.jpg)
Transpose on read
Read and Write protocols are different
WRITE:
• Wait 16 clks after start
• Wait ~100 clks after flush
• Wait for signal write_empty
READ:
• Wait for signal almost_empty_RD
Looks like the READ loop is shorter!
We successfully implemented transpose on read. However, the improvement is not good enough to avoid using down/up rate controllers.
With the combined up rate and transpose, the read loop is more “busy”, so it is better to perform T’ on write!
![Page 33: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/33.jpg)
Can we avoid the loss of data?
2 iterations:
• Only 2 transposes are needed!
• 2 FPGAs
DDR configuration (for each FPGA):
• 1 transpose on bank A (19 MHz)
• 1 transpose on bank B (19 MHz)
For each bank: 180 x 0.75 / 3 = 45 > 25.2 !!!
Add more memory:
• 1 T’ on bank A, 1 on bank B, 1 on additional memory
For each bank: 180 x 0.75 / 3 = 45 > 25.2 !!!
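The per-bank budget on this slide is a one-line calculation. The slides do not say what the factors 0.75 and 3 stand for; the sketch below treats them as an efficiency factor and a number of shares per bank, which is an assumption, as are the parameter names.

```python
def per_share_bandwidth_mhz(clock_mhz=180.0, efficiency=0.75, shares=3):
    """Bandwidth available to each share of one DDR bank, in MHz."""
    # `efficiency` and `shares` are assumed meanings for the slide's 0.75 and 3.
    return clock_mhz * efficiency / shares


budget = per_share_bandwidth_mhz()   # 180 x 0.75 / 3
required = 25.2                      # full DVI pixel rate in MHz
fits = budget > required
```

With these numbers the budget exceeds the full 25.2 MHz pixel rate, which is the slide's argument that splitting the transposes across banks would remove the need to drop frames.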
![Page 34: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/34.jpg)
Timing Problems
[Figure: the final architecture block diagram (DDR: 8 double buffers), repeated from the “The Final architecture (PART B)” slide.]
![Page 35: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/35.jpg)
Timing Problems
Problems:
• Inconsistent compilation results
• Jittery image
• Lost data
• Timing problems
Solutions:
• Registered I/Os
• PLL
• Fixed placing
![Page 36: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/36.jpg)
Additional Issues
Multiport
• Data loss at end of burst
• Long penalties
• I/O strength
• ProcII vs. ProcIII (no DVI)
Sync
• Waiting for signal from second group
[Figure: a 4x5 matrix and its 5x4 transpose]
 1  2  3  4  5        1  6 11 16
 6  7  8  9 10        2  7 12 17
11 12 13 14 15        3  8 13 18
16 17 18 19 20        4  9 14 19
                      5 10 15 20
![Page 37: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/37.jpg)
Additional Issues SignalTap
![Page 38: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/38.jpg)
Summary
Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse
External memory:
• Double buffer on DDR
• Up/down rate controller
DVI synchronization
![Page 39: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/39.jpg)
Questions?
![Page 40: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project](https://reader035.vdocuments.site/reader035/viewer/2022062422/56649ee75503460f94bf8c21/html5/thumbnails/40.jpg)
We invite you to join us in the lab for a short demonstration