LZRW3 Data Compression Core
Mid presentation, Part A project
By: Netanel Yamin & Shahar Zuta
Advisor: Moshe Porian
Dual semester project, November 2012

Contents
Project Overview
Project goals
Requirements
Architecture
Micro architecture
Problems & solutions
Conclusions
Testability
Methodology
Schedule
Algorithm overview
[Figure: INPUT FILE (literal items only) -> LZRW3 COMPRESSOR -> OUTPUT FILE (groups of items, literal/copy)]
A copy item consists of two bytes that represent 3 to 18 bytes of input.
A literal item consists of one byte that represents itself.
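The 3 to 18 byte range follows from the two-byte budget: if 12 of the 16 bits select one of the 4096 hash table rows, the remaining 4 bits can encode 16 lengths, stored as length minus 3. The sketch below is only an illustration. The bit layout is an assumption taken from the public LZRW3 reference C code, not from these slides, and the function names are hypothetical.

```python
from typing import Tuple

MIN_LEN, MAX_LEN = 3, 18  # copy lengths stated in the slides


def pack_copy_item(index: int, length: int) -> bytes:
    """Pack a 12-bit table index and a 3..18 length into two bytes.

    Assumed layout: first byte = high index nibble + (length - 3),
    second byte = low 8 bits of the index.
    """
    if not 0 <= index <= 0xFFF:
        raise ValueError("index must fit in 12 bits")
    if not MIN_LEN <= length <= MAX_LEN:
        raise ValueError("length must be between 3 and 18")
    first = ((index & 0xF00) >> 4) | (length - MIN_LEN)
    second = index & 0xFF
    return bytes([first, second])


def unpack_copy_item(item: bytes) -> Tuple[int, int]:
    """Inverse of pack_copy_item: return (index, length)."""
    first, second = item[0], item[1]
    index = ((first & 0xF0) << 4) | second
    length = (first & 0x0F) + MIN_LEN
    return index, length
```

For example, `unpack_copy_item(pack_copy_item(0xABC, 18))` gives back the pair `(0xABC, 18)`.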
Mechanism demonstration
[Figure: the HASH FUNCTION maps three input bytes to an INDEX (0 to 4095) of the table, where the offset of those bytes in the input file is stored. "Exp" (offset 0) and "res" (offset 3) are hashed and their offsets are entered in the table. Output so far: "Exp" and "res" as literal items (L.I).]
NOTE: The next 3 bytes should be "x p r", then "p r e" and only then "r e s"; we didn't demonstrate all the actions, for simplicity.
"L.I" stands for "Literal Item".
Mechanism demonstration (continued)
[Figure: "sio" (offset 6) and "n_c" (offset 9) are hashed and their offsets are entered in the table, which now holds offsets 0, 3, 6 and 9. Output so far: "Exp", "res", "sio" and "n_c" as literal items (L.I).]
Mechanism demonstration (continued)
[Figure: "omp" (offset 12) is hashed and its offset is entered in the table, which now holds offsets 0, 3, 6, 9 and 12. Output so far: "Exp", "res", "sio", "n_c" and "omp" as literal items (L.I).]
Mechanism demonstration (continued)
[Figure: the second "res" (offset 15) hashes to the same index (XXX) as the first "res" (offset 3). The stored offset is read, the bytes match, and the match length is extended beyond the first 3 bytes. Output: the five literal items (L.I) followed by a copy item (C.I) that refers to XXX.]
"C.I" stands for "Copy Item".
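Algorithm flow chart
START
Hash 3 bytes and read Hash table [index].
Is the index an empty field?
  yes: enter offset, O.F. - literal item, FWD 1 byte.
  no: are the 3 bytes at the stored offset the same?
    no: enter offset, O.F. - literal item, FWD 1 byte.
    yes: get offset; while there are more same bytes, Length++; O.F. - copy item, FWD 3 + Length bytes.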
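The flow chart can be read as the loop below. This is a minimal software sketch, not the VHDL core, and it rests on several assumptions: items are returned as Python tuples instead of packed groups, the copy tuple carries the matched offset directly (the demonstration slides show the table index being stored instead) so that the result can be checked with a trivial expander, and trailing bytes that are too short to hash are emitted as literals. The names `hash3` and `compress_items` are hypothetical.

```python
def hash3(a: int, b: int, c: int) -> int:
    """Table index formula from the hash function slide."""
    return ((40543 * ((a << 8) ^ (b << 4) ^ c)) >> 4) & 0xFFF


def compress_items(data: bytes):
    """Return a list of ("literal", byte) and ("copy", offset, length) items."""
    table = [None] * 4096            # None marks an empty field
    items = []
    pos = 0
    while pos < len(data):
        if pos + 3 > len(data):      # fewer than 3 bytes left: cannot hash
            items.append(("literal", data[pos]))
            pos += 1
            continue
        index = hash3(data[pos], data[pos + 1], data[pos + 2])
        candidate = table[index]     # get offset
        table[index] = pos           # enter offset
        same = (candidate is not None
                and data[candidate:candidate + 3] == data[pos:pos + 3])
        if same:
            length = 3
            # Length++ while more bytes are the same, up to 18 bytes in total
            while (length < 18 and pos + length < len(data)
                   and data[candidate + length] == data[pos + length]):
                length += 1
            items.append(("copy", candidate, length))
            pos += length            # FWD over the whole match
        else:
            items.append(("literal", data[pos]))
            pos += 1                 # FWD 1 byte
    return items
```

A copy item is expanded by re-reading `length` bytes from the already expanded output, starting at `offset`.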
Project Goals
Implementation of the LZRW3 data compression algorithm
Implementing strong debugging capabilities via GUI
Requirements
VHDL implementation
DE2 development board that features an Altera Cyclone II FPGA
FPGA – Host communication via UART protocol
Use internal memory on FPGA, no interface to external memory
Adapted to data templates of 2 Kbyte to 32 Kbyte
High performance: data transfer of 1 Gbps
Requirements
VHDL implementation
XUPV5 development board that features a Xilinx Virtex-5 FPGA
FPGA – Host communication via UART protocol
Use internal memory on FPGA, no interface to external memory
Adapted to data templates of 2 Kbyte to 32 Kbyte
High performance: data transfer of 1 Gbps
Architecture
[Figure: block diagram of the design on the Xilinx Virtex-5 (XUPV505 board). Blocks: GUI, UART (x2), Rx PATH, INPUT BLOCK memory, LZRW3 COMPRESSOR CORE, COMPRESSED FILE memory, Tx PATH.]
LZRW3 COMPRESSOR CORE (interface signals)
Lzrw3_go
Lzrw3_mode
data_input_byte (7..0)
data_input_valid
data_input_taken
clk
Lzrw3_busy
Lzrw3_done
Lzrw3_output_group_size (4..0)
data_output_valid
data_output_taken
data_output_last
reset
data_output_bytes(13..0)
End_of_file
STAGE 1 – three-byte buffer
Signals of the 3 BYTES BUFFER block:
enable
reset
New_byte(7..0)
clk
Newer_byte(7..0)
Mid_byte(7..0)
Older_byte(7..0)
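The signal names suggest a three-deep shift register that always presents the last three input bytes to the hash stage. The following is a minimal behavioural sketch under that reading. The reset value of zero and the exact effect of `enable` are assumptions, and the class name is hypothetical.

```python
class ThreeByteBuffer:
    """Software model of a 3-byte shift register (assumed behaviour)."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Assumed reset value: all three bytes cleared to zero.
        self.older_byte = 0
        self.mid_byte = 0
        self.newer_byte = 0

    def clock(self, new_byte, enable=True):
        """Model one rising clock edge and return (older, mid, newer)."""
        if enable:
            # Shift by one position: mid becomes older, newer becomes mid.
            self.older_byte = self.mid_byte
            self.mid_byte = self.newer_byte
            self.newer_byte = new_byte & 0xFF
        return self.older_byte, self.mid_byte, self.newer_byte
```

After clocking in the bytes of "Exp" one at a time, the buffer holds "E" as the older byte, "x" as the middle byte and "p" as the newer byte.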
STAGE 2- hash function
enable
HASH FUNCTION
middle_byte(7..0)
clk
Table_index(11..0)
older_byte(7..0)
Newer_byte(7..0)
reset
TABLE INDEX = (((40543*(((*(PTR))<<8)^((*((PTR)+1))<<4)^(*((PTR)+2))))>>4) & 0xFFF)
PTR points to the first of the three bytes. TABLE INDEX range: 0 to 4095.
[Figure: bit alignment of the three bytes before the multiplication, with a = *PTR, b = *(PTR+1), c = *(PTR+2).]
a << 8 : aaaa aaaa 0000 0000
b << 4 : 0000 bbbb bbbb 0000
c      : 0000 0000 cccc cccc
XOR    : a7..a4, a3^b7, a2^b6, a1^b5, a0^b4, b3^c7, b2^c6, b1^c5, b0^c4, c3..c0
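As a cross-check of the formula and the bit alignment, here is a short software model of the table index. It is a sketch for reference only. The function names are hypothetical, and the argument names follow the stage 2 port list on the assumption that `older` is the byte PTR points to.

```python
def mix_bytes(older: int, middle: int, newer: int) -> int:
    """Overlay the three bytes into the 16-bit value shown in the bit diagram."""
    return (older << 8) ^ (middle << 4) ^ newer


def table_index(older: int, middle: int, newer: int) -> int:
    """12-bit hash table index, as given by the TABLE INDEX formula."""
    return ((40543 * mix_bytes(older, middle, newer)) >> 4) & 0xFFF
```

Because the mixed value never exceeds 16 bits, the product with 40543 stays below 2^32, so a 32-bit multiplier result is wide enough and Python's unbounded integers give the same answer as the C expression.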
STAGE 2- RTL view
STAGE 3 – hash table
Signals of the HASH TABLE block:
enable
Data_out_valid
Table_index(0..11)
clk
Offset(19..0)
Current_offset(19..0)
reset
clear
Offset counter
[Figure: the hash table is a memory of 4096 rows, 21 bits wide, with a column of valid bits. INDEX drives ADDRESS, the Offset counter drives DATA_IN with Current_offset, and the outputs are Offset and Data_out_valid.]
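One way to read this block is as a table that returns the previously stored offset for an index, together with its valid bit, and then stores the current offset in the same row. The sketch below models that reading. The read-before-write order and the effect of `clear` (resetting only the valid bits) are assumptions, not details confirmed by the slides, and the class and method names are hypothetical.

```python
class HashTable:
    """Behavioural model of a 4096-row offset table with valid bits."""

    ROWS = 4096

    def __init__(self):
        self.valid = [False] * self.ROWS
        self.offset = [0] * self.ROWS

    def clear(self):
        # Assumed: clear invalidates every row without erasing the offsets.
        self.valid = [False] * self.ROWS

    def access(self, table_index, current_offset):
        """Return (data_out_valid, offset) for the row, then overwrite it."""
        data_out_valid = self.valid[table_index]
        offset = self.offset[table_index]
        self.offset[table_index] = current_offset
        self.valid[table_index] = True
        return data_out_valid, offset
```

With this model, the first access to a row reports an invalid entry, and the second access to the same row returns the offset stored by the first.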
STAGE 4 – input file memory
Stage 4 implementation
The input file memory should supply three bytes at the same time.
How to choose the bank when a byte arrives?
Bank # = current_offset % 3
Address in bank = current_offset / 3 (integer division)
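The two expressions amount to a single division with remainder. A minimal sketch, with a hypothetical function name:

```python
def bank_and_address(current_offset: int):
    """Map a byte offset to (bank number, address inside that bank)."""
    address, bank = divmod(current_offset, 3)
    return bank, address
```

Any three consecutive offsets land in three different banks, which is what allows three bytes to be read at the same time. The cost is a divide-by-3 on every byte, which is awkward to build in hardware.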
SOLUTION
Instead of counting in stage 3 and dividing in stage 4, we increment the address by one only after every three clock cycles.
In this configuration we extend the offset by 2 bits (a tag) that select the bank the data needs to be written into.
The hash table size is now 4096 x (19+2) bits.
Example entry: 1001010101001110011 (19 bits) 10 (2 bits)
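A counter of this kind can be modelled as a 2-bit tag that cycles 0, 1, 2 and a 19-bit address that advances each time the tag wraps. The sketch below is an assumption about the behaviour rather than the authors' RTL. The class name is hypothetical, and the packing (address in the upper 19 bits, tag in the lower 2) follows the example entry above.

```python
class OffsetCounter:
    """Offset counter that avoids division: 19-bit address plus 2-bit tag."""

    def __init__(self):
        self.address = 0  # address inside a bank
        self.tag = 0      # bank selector: 0, 1 or 2

    def value(self) -> int:
        """21-bit word: address in the upper 19 bits, tag in the lower 2."""
        return (self.address << 2) | self.tag

    def step(self):
        """Advance by one input byte (one clock cycle)."""
        if self.tag == 2:
            self.tag = 0
            self.address = (self.address + 1) & 0x7FFFF  # wrap at 19 bits
        else:
            self.tag += 1
```

After n steps the pair (address, tag) equals (n / 3, n % 3), so the bank and the address are available without any divider.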
Solution costs (memory units)
Memory usage at stage 3, from Synplify Pro: same as before.
20 bit x 4096 = 81920 bit = 80 Kbit, which takes 3 RAM blocks of 36 Kbit = 108 Kbit
LUT usage:
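The block count can be checked with a capacity-only estimate. The sketch below assumes that a table simply needs enough 36 Kbit blocks to hold all of its bits and ignores the width and depth configurations that a real block RAM imposes. The function name is hypothetical.

```python
import math

BLOCK_BITS = 36 * 1024  # one 36 Kbit RAM block


def ram_blocks(width_bits: int, rows: int = 4096):
    """Return (total bits, number of 36 Kbit blocks) by capacity alone."""
    total = width_bits * rows
    return total, math.ceil(total / BLOCK_BITS)
```

Under this estimate a 20-bit row needs 81920 bits and a 21-bit row needs 86016 bits. Both fit in 3 blocks, which is consistent with the memory usage being the same as before.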
Back to stage 4
[Figure: stage 4 block diagram. Blocks: input file memory banks (bank 0, 1, 2 addresses), addresses alignment, tentative next address, counter, comparator. Signals: offset, TAG, Offset_tag, Tentative_tag, Tentative_taken, Offset_valid, Comprison_valid, Compare_success, Compare_success_P, Item_length_p, Older_byte_P, Continue, INDEX, clk.]
The TAG indicates the byte order of the banks.
Problem (1): In stage 4, we first implemented the counter that counts the number of successful comparisons inside the comparator, which is an asynchronous process. It passed simulation but was not synthesizable.
Solution (1): We changed the architecture of the units so that the counter is implemented in a synchronous unit. It receives a signal from the asynchronous comparator when a comparison succeeds and responds accordingly.
Problem (2): In stage 4, to compare the current 3 bytes in the pipe with three bytes from the RAM, we need to read three consecutive bytes from different addresses in one clock period.
Solution (2): We split the single memory into 3 RAM banks that hold consecutive addresses, so three consecutive bytes can be read by taking one byte from each bank.
Problem (3): In stage 4, the current pipe bytes that arrive at the comparator are arranged in their arrival order, but the three bytes read from the banks are not necessarily in the right order.
Reading configurations:
1. Same addresses
2. Different address
3. Different address #2
Solution (3): We used the TAG, which represents the addresses of the extracted bytes, to determine which extracted byte is compared with which current pipe byte.
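The banked read and the TAG-based reordering can be modelled together. The sketch below assumes that the byte at offset n lives in bank n % 3 at address n / 3, and that the three reading configurations correspond to tag values 0, 1 and 2. The function names are hypothetical, and this is a software model rather than the authors' RTL.

```python
def split_into_banks(data: bytes):
    """Distribute consecutive bytes over three banks (bank = offset % 3)."""
    return [list(data[b::3]) for b in range(3)]


def read_three(banks, address: int, tag: int) -> bytes:
    """Read 3 consecutive bytes starting at offset 3 * address + tag."""
    # Banks below the tag hold bytes that belong to the next address.
    # tag 0: same addresses; tag 1: bank 0 differs; tag 2: banks 0 and 1 differ.
    bank_addresses = [address + 1 if b < tag else address for b in range(3)]
    raw = [banks[b][bank_addresses[b]] for b in range(3)]  # bank order 0, 1, 2
    ordered = [raw[(tag + k) % 3] for k in range(3)]        # oldest byte first
    return bytes(ordered)
```

For every offset n with at least three bytes remaining, `read_three(banks, n // 3, n % 3)` returns the same bytes as the slice `data[n:n+3]`, with one read per bank.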
Problem (4): In stage 4, the RAM banks must receive the address to read on the next clock before the end of the current clock.
Solution (4): We created two units that hold the two possible next addresses (the tentative address unit and the address align unit).
Conclusions
Writing code for synthesis is different from writing code for simulation.
In an asynchronous implementation, all the signals need to be in the sensitivity list.
Reset should not pass through any logic.
Think hardware when writing VHDL code for synthesis.
Keep it simple to achieve more flexibility.
Testability
[Figure: a random input generator (or an input file) drives the same three bytes A, B, C into both the synthesizable hash function block and an unsynthesizable simulation function. The testbench asserts the comparison of the two results and reports to the console.]
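The same self-checking idea can be shown in software. In the sketch below a shift-and-add version of the hash stands in for the synthesizable block and the plain arithmetic formula plays the role of the simulation function. Both are illustrative assumptions, not the project's VHDL, and all names are hypothetical.

```python
import random

MULTIPLIER = 40543
# Bit positions that are set in the constant, used for shift-and-add.
SHIFTS = [k for k in range(16) if (MULTIPLIER >> k) & 1]


def reference_index(a: int, b: int, c: int) -> int:
    """Reference model: direct evaluation of the TABLE INDEX formula."""
    return ((MULTIPLIER * ((a << 8) ^ (b << 4) ^ c)) >> 4) & 0xFFF


def shift_add_index(a: int, b: int, c: int) -> int:
    """Stand-in for the unit under test: constant multiply by shift-and-add."""
    mixed = (a << 8) ^ (b << 4) ^ c
    product = sum(mixed << k for k in SHIFTS)
    return (product >> 4) & 0xFFF


def run_random_test(trials: int = 1000, seed: int = 0) -> int:
    """Feed random byte triples to both models and assert that they agree."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (rng.randrange(256) for _ in range(3))
        assert shift_add_index(a, b, c) == reference_index(a, b, c), (a, b, c)
    print(f"{trials} random comparisons passed")
    return trials
```

A fixed seed keeps a failing case reproducible, and the failing triple is included in the assertion message.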
Methodology
Stage data flow review.
Writing VHDL code.
Writing VHDL testbench.
Code review and debugging.
Synthesis check (Synplify).
Check RTL view.
Check CLK constraints.
Commit SVN folders and update data flow if needed.
Next stage data flow review.
Simulation & debugging
Schedule 1/2
Date | Goals
24/4/2012 – 1/5/2012 | Project characterization & algorithm interpretation
2/5/2012 | Characterization presentation
2/5/2012 – 16/5/2012 | Full characterization of all blocks
17/5/2012 – 1/7/2012 | System blocks VHDL design
1/7/2012 – 27/7/2012 | Work on project paused for exams
29/7/2012 – 11/11/2012 | System blocks VHDL design (cont.); writing a simulation testbench for every unit
Schedule 2/2
Date | Goals
12/11/2012 | Mid presentation
13/11/2012 – 19/12/2012 | System blocks VHDL design (cont.); writing a simulation testbench for every unit
20/1/2012 | Part A final: core simulation vs. golden model
21/1/2012 – 15/2/2012 | Assemble all units and FPGA synthesis
16/2/2012 – 28/2/2012 | GUI implementation
1/3/2012 – 10/3/2012 | Final overall tests & debug
11/3/2012 – 31/3/2012 | Editing and finishing project portfolio
1/4/2012 | Final presentation