CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
High-Throughput, Lossless Data Compression and Decompression
On FPGAs
A graduate project submitted in partial fulfillment of the requirements
for the degree of Master of Science
in Electrical Engineering
By
Vikas Udayashekar
in collaboration with
Spoorthi Suresh
May 2012
The graduate project of Vikas Udayashekar is approved:
________________________________________________ ____________
Dr. Somnath Chattopadhyay Date
_________________________________________________ ____________
Dr. Ahmad Sarfaraz Date
_________________________________________________ ____________
Dr. Ramin Roosta, Chair Date
California State University, Northridge
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompanies the successful completion of any
task would be incomplete without the mention of the people who made it possible and
whose constant encouragement and guidance has been a source of inspiration throughout
the course of the project.
We express our sincere gratitude to Dr. Ramin Roosta, our project committee
chairperson. His invaluable assistance is one of the main reasons that this project has been
successfully completed. We also wish to thank the other members of our graduate project
committee, Dr. Somnath Chattopadhyay and Dr. Ahmad Sarfaraz, for their suggestions
and support. We would like to extend our profound gratitude to our Department Chair Dr.
Ali Amini for facilitating and helping us.
It is by God’s grace and the continuous support of our parents and friends that we
have been able to complete our MS program. Our family’s invaluable support in
providing us with a high quality of education has helped us achieve our goals. We want
to also express our appreciation to the Electrical and Computer Engineering Department
at California State University, Northridge, including all the professors whose classes we
had the pleasure to take.
TABLE OF CONTENTS
SIGNATURE PAGE………………………………………………………………...…. ii
ACKNOWLEDGEMENT……...………………………….………………………..…. iii
LIST OF FIGURES…….....……..…………………………………………..…….......vi
ABSTRACT………………………..……………………….…………………………...vii
CHAPTER 1 INTRODUCTION………….………………………………………….....1
1 Introduction and Background………………...…………...………..…..1
1.1 How does compression work? ………...…………………….2
1.2 Text and Signals: lossless and lossy compression…………...2
CHAPTER 2 842B ALGORITHM………………………………………………..…... 4
2.1 Introduction……………………………………………………….…4
CHAPTER 3 842B FPGA DESIGN…………………………………………………....6
3.1 Introduction ………………………………………...………………..6
3.2 FPGA Compression Pipeline……………………………..…………..6
3.3 FPGA Decompression Pipeline ……………………….…………….10
CHAPTER 4 SOFTWARE LANGUAGE/ HARDWARE IMPLEMENTATION…...12
4.1 Programmable Devices………………………………….…………...12
4.1.1 Programmable Logic Devices …………….……………..12
4.1.2 Complex Programmable Devices ……………………….13
4.1.3 Field Programmable Gate Arrays(FPGA)………..……...14
4.1.3.1 Advantages of FPGA……………………….…...16
4.2 Hardware Design and Development………………………………..17
4.2.1 Design Entry………………………………………….....17
4.2.2 Synthesis………………………….……………….…...18
4.2.3 Simulation…………………………………….……......18
4.2.4 Implementation………………………….……………..18
4.2.4.1 Translate…………………………….……......18
4.2.4.2 Map…………………………………………..18
4.2.4.3 Place and Route……………………………...19
4.3 Device Programming……………………………..……………... 19
4.4 Verilog HDL……………………………………………………...20
4.4.1 Importance of HDLs…………………..………………20
4.4.2 Why Verilog? ………………………………….……...20
CHAPTER 5 Result And Discussion………………………………………………..22
Verification in ModelSim (Xilinx)………….…………………………….…..22
Compression………….……………………………………………….……..22
Decompression………….……………………………….…………………..30
HDL Synthesis Report- Compression.….……………….…………………..38
HDL Synthesis Report- Decompression.………………….………….……..39
CHAPTER 6 Conclusion……………………………………………………………40
REFERENCE..………………………………………………………..…………….41
APPENDIX …………………………………………………………………….…..42
LIST OF FIGURES
Figure 1: 8 Byte Input Split into 7 Phrases ……………….………………………...…5
Figure 2: FPGA Compression Pipeline……………………………………………….. 7
Figure 3: FPGA Decompression Pipeline ……………………………………………. 10
Figure 4: Internal Structure of a CPLD……………………………………………….. 13
Figure 5: Internal Structure of an FPGA ………………...…………………………… 15
Figure 6: Internal architecture of CLB …………………………………...…………... 16
Figure 7: Design Flow ……………………………………………………………..…. 17
Figure 8: Simulation result 1 for Compression................................................................22
Figure 9: Simulation result 2 for Compression …………………….…………..…….. 23
Figure 10: Simulation result 3 for Compression ………………….….………………..24
Figure 11: Simulation result 4 for Compression …………………….………………...25
Figure 12: Simulation result 5 for Compression ….…………………………………..26
Figure 13: Simulation result 6 for Compression..……………………………..……….27
Figure 14: Simulation result 7 for Compression.……………………………………....28
Figure 15: Simulation result 8 for Compression …………………………….…..…….29
Figure 16: Simulation result 1 for Decompression ………………………...………….30
Figure 17: Simulation result 2 for Decompression.………………………..…………..31
Figure 18: Simulation result 3 for Decompression …………………………………....32
Figure 19: Simulation result 4 for Decompression ……………………………………33
Figure 20: Simulation result 5 for Decompression ……..……………….………….....34
Figure 21: Simulation result 6 for Decompression …………………………...…….....35
Figure 22: Simulation result 7 for Decompression …………………………..…….....36
Figure 23: Simulation result 8 for Decompression ………………………..……….....37
ABSTRACT
High-Throughput, Lossless Data Compression and Decompression
on FPGAs
By
Vikas Udayashekar
Master of Science in Electrical Engineering
Lossless compression is often applied to data before writing it to a storage medium or transmitting it across a transmission medium. Compression saves storage space or transmission bandwidth; when the data is subsequently read, a decompression operation is performed.
Though this scheme has clear benefits, the execution time of compression and decompression is critical to its application in real-time systems. Software compression utilities are often slow, leading to degraded system performance. Hardware-based solutions, on the other hand, often carry large resource requirements and are not amenable to supporting future algorithmic changes.
We present a high-throughput, streaming, lossless compression algorithm and its
efficient implementation on FPGAs. The proposed solution provides a peak throughput of 1 GB/sec per engine, with a sustained overall measured throughput of 2.66 GB/sec on a PCIe-based FPGA board carrying both compression and decompression engines. This result represents an overall speedup of 13.6x over the reference software implementation. With multiple engines running in parallel, the proposed design provides a path to potential speedups of up to two orders of magnitude. In the current implementation, the achievable overall throughput is limited only by the available PCIe bus bandwidth.
CHAPTER 1: INTRODUCTION
1. Introduction and Background
Lossless data compression is often used to save storage space or to reduce the required transmission bandwidth. Data compression algorithms are often implemented in software. Although this approach saves valuable real estate on processor chips and allows for later modifications to the algorithm, it can sometimes be a performance bottleneck. Most of the existing work on data compression has concentrated on achieving the best compression efficiency possible, aiming for the entropy of the data. However, in a number of applications the execution speed of the compression/decompression operation is more important than the compression efficiency. In such applications, hardware-based fast compression algorithms may be used. Many such algorithms exist, such as the custom-hardware-based ALDC [1], MXT [2] and 842 [3], and the FPGA-based XMatchPRO [4]. However, most of these solutions implement their history windows (dictionaries) with expensive CAM (content-addressable memory) structures and achieve throughputs in the range of 100 MB/sec to 400 MB/sec. In this design, we present the 842B algorithm, a hardware-friendly, lossless compression algorithm derived from the original 842 algorithm [3], and its FPGA implementation. Instead of expensive CAMs, the proposed algorithm uses hashing-based dictionary lookups and offers a throughput of 1 GByte/sec per engine. The low-latency streaming architecture allows compression and decompression of arbitrary-size data blocks and can be placed directly on the transmission channels. Additionally, multiple overlapping sliding compression windows (dictionaries) of different lengths yield better compression efficiency. The compressor and decompressor designs presented here are very lean and require very modest FPGA resources. The designs are therefore suitable for use as small modules wherever FPGA-based systems can be applied, including signal and image processing, network routers and transmitter-receiver systems, effectively increasing the CPU cycles and bandwidth resources available for other purposes in such systems.
1.1 How does compression work?
Compression relies on the fact that data is redundant: to some extent it was generated following some rules, and we can learn those rules and thus predict the data accurately. A compressor can reduce the size of a file by deciding which data is more frequent and assigning it fewer bits than less frequent data. Compression thus has two parts: one that guesses which are the most frequent symbols, and another that outputs the "decision" of the first one.
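The two parts described above, a model that decides which symbols are frequent and a coder that assigns them fewer bits, can be sketched in a few lines. The text does not commit to a particular coder; Huffman coding is used here purely as one familiar, illustrative choice:

```python
import heapq
from collections import Counter

def huffman_code_lengths(data: bytes) -> dict:
    """Assign a code length to each symbol: frequent symbols get fewer bits."""
    freq = Counter(data)                       # part 1: model symbol frequencies
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:                       # part 2: derive the code lengths
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: n + 1 for s, n in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

lengths = huffman_code_lengths(b"aaaaaaabbbc")
# 'a' appears most often, so it receives the shortest code
assert lengths[ord("a")] < lengths[ord("b")] <= lengths[ord("c")]
```

For this input the frequent symbol 'a' ends up with a 1-bit code while 'b' and 'c' share 2-bit codes, which is exactly the "fewer bits for frequent data" idea in the text.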
1.2 Text and signals: lossless and lossy compression
We have seen that we may want to compress different kinds of data such as text, databases, binary programs, sound, images and video. In practice we distinguish between text compression and signal compression. We make this separation because databases and binary programs have the same characteristics as text; likewise, sound, images and video are signals and thus share properties. Text and image data, on the other hand, have nothing in common, which is why they do not belong to the same group.
We also make this separation because we use different kinds of compression for the two groups. That follows from the nature of the data. Digital signals are an imperfect representation of an analog signal, so when compressing them we can discard some of the information to achieve more compression. This is done with transformation and quantization algorithms.
Say we have a byte from an image whose value is 65, representing the quantity of red in a given pixel. If after decompression this byte is 66, we would not notice the difference between red and very slightly more red. However, if that were a text file, 65 would be 'A' (assuming ASCII), and there is a big difference if we decompress a 66, which would be a 'B' instead of an 'A'. Due to the nature of text we cannot afford errors.
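The byte-level example can be made concrete. The snippet below uses Python's standard zlib merely as a stand-in lossless codec (the thesis targets FPGA hardware, not zlib):

```python
import zlib

# Lossless: the decompressed output must match the input bit for bit.
text = b"AAAB"
assert zlib.decompress(zlib.compress(text)) == text

# Why text tolerates no error: a one-unit change in a pixel byte is a slightly
# different shade of red, but in ASCII it is a different character entirely.
assert chr(65) == "A" and chr(66) == "B"
```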
So we use lossless compression for text, where the decompressed file must match the original bit for bit, and lossy compression for signals, where some error is acceptable and in most cases goes undetected. Note, however, that signals can also be compressed losslessly, though the compression achieved is then far worse than with lossy compression. In most cases, signal compression comes down to discarding as much data as possible while retaining as much quality as possible.
CHAPTER 2: 842B ALGORITHM
2.1 Introduction
The 842B algorithm identifies repeating patterns of size 8, 4 and 2 bytes in the input data stream and replaces them with 6 to 8 bit pointers to previously seen data. The algorithm follows the same principles as the original 842 algorithm [3]. Every 8-byte chunk of the input data is divided into 7 phrases (Figure 1), which are compared against previously seen phrases. Dictionaries store the input phrases for lookup when processing subsequent inputs. For constant-time phrase look-up, the address of each phrase in the dictionary (the pointer) is stored in a hash table, at a location given by the hash value of the phrase. During compression, the 7 sub-phrases of the 8B input are hashed into 7 keys, which are used to read the pointers from the hash tables. Using these pointers, 7 phrases are read from the dictionaries and compared against the input sub-phrases. The compressed output is generated as the smallest possible combination of the pointers and the data, together with a template indicating the composition of the compressed data. Decompression involves decoding the template, extracting the pointers and raw phrases from the compressed data, reading the remaining phrases from the dictionaries, and reconstructing the uncompressed data. Reading the phrases from the dictionaries requires reconstructing the dictionary contents. The dictionary is reconstructed on the fly by simply writing the post-decompression phrases back into the dictionaries, much as in the compression operation. Note that during decompression, since the pointers are already present in the compressed data, no hashing and no hash tables are required. Unlike the original 842 algorithm, which uses a single phrase dictionary, the 842B algorithm uses three separate dictionaries, representing three different sliding history windows, one for each of the 8, 4 and 2 byte phrases. The three dictionaries redundantly store the 7 sub-phrases of the current input.
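Assuming the split shown in Figure 1 follows the original 842 layout (one 8-byte phrase, two 4-byte halves and four 2-byte quarters), the 7 phrases of a chunk can be sketched as:

```python
def split_phrases(chunk: bytes):
    """Split an 8-byte input chunk into the 7 phrases that are compared
    against the dictionaries: one 8-byte, two 4-byte and four 2-byte phrases."""
    assert len(chunk) == 8
    return ([chunk] +                              # 1 x 8-byte phrase
            [chunk[i:i + 4] for i in (0, 4)] +     # 2 x 4-byte phrases
            [chunk[i:i + 2] for i in (0, 2, 4, 6)])  # 4 x 2-byte phrases

phrases = split_phrases(b"ABCDEFGH")
assert len(phrases) == 7
assert phrases[1] == b"ABCD" and phrases[6] == b"GH"
```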
Figure 1: 8-Byte Input Split into 7 Phrases
While this might seem wasteful, redundant storage of the data has two benefits. Multi-port RAM arrays are expensive to implement as the port count increases. A 7-port RAM array is replaced with 1-, 2- and 4-port RAM arrays for the 8, 4 and 2 byte phrases, respectively, trading RAM ports for RAM capacity. Once we make that tradeoff, the optimal dictionary sizes can be chosen for each of the 3 phrase lengths independently. In addition to the basic phrase comparisons, the 842B algorithm also incorporates performance enhancers such as detecting multiple consecutive 8-byte repeats and replacing them with just a 5-bit template and a repeat count. Long strings of zeros are also detected as a special case of repeats.
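A minimal sketch of the repeat enhancer; the interface below is hypothetical, since the text only states that consecutive 8-byte repeats are collapsed into a template plus a count:

```python
def count_repeats(blocks, previous):
    """Count leading 8-byte blocks equal to the previous block, so the run
    can be emitted as just a 5-bit template and a repeat count."""
    n = 0
    for b in blocks:
        if b != previous:
            break
        n += 1
    return n

zero = b"\x00" * 8   # a long string of zeros is a repeat of the all-zero block
assert count_repeats([zero, zero, b"ABCDEFGH"], zero) == 2
```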
CHAPTER 3: 842B FPGA DESIGN
3.1 Introduction
We present the FPGA design for the 842B compressor and decompressor. Both the compressor and the decompressor are designed as multi-stage pipelines that stream the input data, processing one block of input per cycle. The compression operation is feed-forward and lends itself well to pipelining. The dictionary write-back in the decompression pipeline, on the other hand, introduces a feedback loop and results in multiple data hazard conditions.
As stated earlier, the 842B algorithm operates on 7 different phrases for each 8-byte input. These phrases are independent of each other and can be processed in parallel, yet software implementations of the 842B algorithm process them sequentially. To exploit this parallelism and achieve improved performance, our FPGA pipelines include seven parallel data-paths, one for each phrase.
3.2 FPGA Compression Pipeline
Figure 2 shows the different stages of the compression pipeline as implemented in the FPGA. The compression pipeline takes 8 bytes of input per cycle and outputs one compressed data word and a template.
Every 8-byte input is broken into 7 phrases; the hashing, hash-table look-up, dictionary look-up and phrase comparison for all these phrases are performed in parallel, and a 7-bit match/mismatch status is generated. Based on the match/mismatch status, the encoder encodes the pointers and the raw input phrases into the smallest possible output. A template indicating the composition of the compressed output is also generated. The mapping from the match/mismatch status to the corresponding smallest output combination (and thus the template) is determined statically. This mapping is implemented on the FPGA using look-up tables; given a match/mismatch status, the 5-bit template can be read directly from the look-up table.
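The hash-table and dictionary mechanics of one phrase datapath can be modelled in software. Everything here is a sketch: the sizes and names are illustrative, and Python's built-in hash stands in for the XOR-tree hash described later, not the FPGA design's actual function:

```python
class PhraseDictionary:
    """Hash-addressed sliding dictionary for one phrase size (sketch)."""
    def __init__(self, entries, hash_bits):
        self.dictionary = [None] * entries
        self.hash_table = [0] * (1 << hash_bits)
        self.next_addr = 0                      # sequential address counter
        self.hash_bits = hash_bits

    def _hash(self, phrase: bytes) -> int:
        # stand-in for the XOR-tree hash; many phrases may share a hash
        return hash(phrase) & ((1 << self.hash_bits) - 1)

    def lookup(self, phrase):
        """Constant-time match check: returns (matched, pointer)."""
        ptr = self.hash_table[self._hash(phrase)]
        return self.dictionary[ptr] == phrase, ptr

    def insert(self, phrase):
        addr = self.next_addr
        self.dictionary[addr] = phrase          # dictionary write
        self.hash_table[self._hash(phrase)] = addr  # pointer write (may overwrite)
        self.next_addr = (addr + 1) % len(self.dictionary)  # wrap-around window

d = PhraseDictionary(entries=256, hash_bits=10)
d.insert(b"ABCDEFGH")
matched, ptr = d.lookup(b"ABCDEFGH")
assert matched and ptr == 0
matched, _ = d.lookup(b"12345678")
assert not matched
```

Note how an overwritten hash-table entry simply "forgets" the older phrase, exactly the direct-mapped-cache behaviour the text describes.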
Figure 2: FPGA Compression Pipeline
In addition to the above operation, the compression pipeline also performs hash table and dictionary writes. During the hash table read/write cycle, the next dictionary address, generated sequentially using a counter, is stored in the hash table at a location given by the hash value of the phrase. Since many input phrases might hash to the same value, pointers may be overwritten by the latest pointer hashing to the same location. The implementation is analogous to a direct-mapped cache memory. This dictionary and hash-table-based design uses regular RAM arrays instead of the CAMs typically used in other hardware compressors, and hence is more area-efficient and simpler to implement.
During the dictionary read/write cycle, the input phrase is written into the dictionary at a location given by the output of the counter. Note that since all 7 phrases are processed simultaneously, up to 4 simultaneous reads and 4 writes from and to the memory banks are needed. Because the FPGA block RAMs used to implement the dictionaries and the hash tables provide only two read/write ports, we duplicate the dictionaries to support multiple read ports. This increases the memory requirements four-fold for a 4-port dictionary. Multiple writes are supported simply by performing a single wide write operation and thus do not demand further dictionary replication.
The compressor’s performance depends on many factors. The first is the dictionary size, which represents the amount of data that is “remembered”. The larger the dictionary, the more phrases are remembered, and hence, the higher the probability of finding a phrase match. On the other hand, a larger dictionary requires longer pointers, which in turn increase the size of the compressed data. Larger dictionaries also require more FPGA resources. Thus, there exists a tradeoff between allocated hardware resources and algorithm performance. Our simulations indicate a dictionary size-performance sweet spot which yields the best average compression ratio. This sweet spot occurs at different dictionary sizes for different phrase sizes. In our FPGA design, the dictionary sizes are 2KB, 2KB and 512B for the 8-byte, 4-byte and 2-byte phrase dictionaries, respectively.
The dictionaries also include a wrap-around mechanism: they wrap to the beginning when they fill up. This behavior represents windows, of sizes equal to the dictionary sizes, sliding over the data to be compressed. This wrap-around approach results in better compression than one which flushes the dictionaries once filled.
Another factor affecting the compression efficiency is the hashing scheme and the sizes of the hash tables. Efficient and effective hashing is one of the keys to achieving good compression performance. A hashing function generates the address of the hash table location where the pointer to a dictionary entry is stored. Since hashing is a many-to-one function, i.e. many different phrases hash to the same value, dictionary pointers may be overwritten. It is therefore important to have a hashing scheme that spreads the hashes evenly across the entire hash table.
For an efficient hardware implementation, the hashing scheme must be (i) lightweight (requiring few resources) and (ii) simple, enabling a high operating frequency. The simplest hashing scheme is the modulo operation, which simply selects the lower-order bits of the input phrase as the hash. This scheme, however, results in poor hash quality, creating multiple conflicts in certain table locations while leaving the rest untouched.
Our scheme involves creating XOR trees: the input phrase is bitwise ANDed with a constant, and the bits of the result are XORed together to generate one bit of the hash. Generating an N-bit hash out of an M-bit input requires N M-bit constants. Hashing requires a total of MxN 2-input AND operations and N(M-1) 2-input XOR operations. The optimal values of the constants are determined experimentally and hard-coded into the FPGA, thus reducing the AND operations to simple bit selection. The selected bits are then XORed using an XOR tree.
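A software model of the XOR-tree hash: each of the N hash bits is the parity (XOR-fold) of the input bits selected by one M-bit constant. The constants below are made up for illustration; the actual design hard-codes experimentally chosen RIBM values:

```python
def xor_tree_hash(phrase: int, constants) -> int:
    """N-bit hash of an M-bit phrase: AND with each constant, then XOR all
    bits of the result together to produce one output bit per constant."""
    h = 0
    for c in constants:                           # one M-bit constant per hash bit
        parity = bin(phrase & c).count("1") & 1   # XOR-fold = parity of selected bits
        h = (h << 1) | parity
    return h

# Toy 3-bit hash of a 16-bit phrase with made-up constants.
consts = [0b1010_0110_0101_1001, 0b0110_1001_1010_0101, 0b1001_0110_0110_1010]
assert 0 <= xor_tree_hash(0xBEEF, consts) < 8    # N = 3 bits -> values 0..7
assert xor_tree_hash(0, consts) == 0             # all-zero input hashes to 0
```

Because the constants are fixed, `phrase & c` reduces to simple bit selection in hardware, leaving only the XOR tree, just as the text describes.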
The selection of appropriate hash constants is critical for effective hashing. We use the Random Invertible Binary Matrix (RIBM) approach to generate the XOR tree’s hash constants [5, 6]. The Random Invertible Binary Matrix is produced off-line by filling a matrix randomly with 1s and 0s and checking it for invertibility, to ensure maximum dispersion in the output bits.
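The off-line RIBM generation can be sketched as "fill randomly, test invertibility over GF(2), retry". The Gaussian-elimination check below is a generic implementation, not the specific procedure of [5, 6]:

```python
import random

def random_invertible_binary_matrix(n: int, seed=None):
    """Fill an n x n binary matrix randomly and retry until it is
    invertible over GF(2), checked by Gaussian elimination."""
    rng = random.Random(seed)
    while True:
        rows = [rng.getrandbits(n) for _ in range(n)]  # each row as a bitmask
        work = rows[:]
        ok = True
        for col in range(n):
            # find a pivot row with a 1 in this column
            pivot = next((r for r in range(col, n) if (work[r] >> col) & 1), None)
            if pivot is None:
                ok = False          # singular: retry with a new random fill
                break
            work[col], work[pivot] = work[pivot], work[col]
            for r in range(n):
                if r != col and (work[r] >> col) & 1:
                    work[r] ^= work[col]   # XOR = row addition over GF(2)
        if ok:
            return rows

m = random_invertible_binary_matrix(8, seed=42)
assert len(m) == 8
```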
Even with very good hashing techniques, hash conflicts occur, overwriting a current pointer with a new one. Conflicts result in “forgetting” a previously written phrase, as the pointer to that phrase is lost. Although an N-way set-associative hash table could be used to increase the hash hit rate, we chose a large direct-mapped organization for its simplicity. Larger hash tables reduce the probability of pointers being overwritten and hence increase the chances of finding a previously written phrase in the dictionary. Larger hash tables, however, require more FPGA resources. Like regular caches, hash table sizes can be optimized against performance, and increasing the table size beyond a certain point yields diminishing performance gains. For our design, a hash table with roughly 4 times the number of entries in the corresponding dictionary achieves good performance.
The pattern encoding for our design is shown in the following table, adapted from the C-Pack compression and decompression algorithm [7].
Table: Pattern encoding table for compression and decompression
3.3 FPGA Decompression Pipeline
Figure 3 shows the 842B decompression pipeline in the FPGA. One set of compressed data and its template is fed to the pipeline per cycle. The data decoder decodes the template and extracts the various pointers and raw phrases from the compressed data. These extracted pointers are used to read the phrases from the three dictionaries.
Figure 3: FPGA Decompression pipeline
The data generator produces the 8-byte uncompressed data by selecting each 2-byte phrase from one of four sources, namely the extracted phrases or data read from one of the three dictionaries. This module thus simply contains four 4:1 multiplexors. The select lines for these multiplexors are read directly from a look-up table using the 5-bit template, much as in the compressor design.
The decompressed output is written into the three dictionaries to reconstruct them on the fly. The dictionary write-back introduces a feedback path in the pipeline, which leads to possible data hazards. In other words, compressed data might contain a pointer to a dictionary location that has not been written yet. Four possible scenarios can lead to a data hazard. The first is where a pointer points to a phrase that arrived one cycle earlier (1-ahead). In this case, the dictionary data being read is still in the data-gen stage and has not yet been written into the dictionary. This data, however, is required in the data-gen stage in the next cycle, and hence can be forwarded. A hazard detection and data forwarding unit added to the data-gen stage detects this situation and forwards the data appropriately. Since there are three separate dictionaries, three forwarding units are required.
The other three data hazards occur when the read and write requests to the same address arrive during the same cycle (4-ahead), or when a read request arrives one or two cycles earlier than the write request (3-ahead and 2-ahead, respectively). The 4-ahead hazard is a true read-during-write condition, whereas the 2-ahead and 3-ahead hazards occur due to pipelining, as a consequence of the read/write operations requiring more than one cycle. To address these hazards, a dictionary bypass unit is added at the output of each dictionary, which bypasses the dictionary and forwards the dictionary write data as the response to the read request.
The hazard detection logic in the decompressor could have been avoided by disallowing near pointers, i.e. pointers between phrases less than or equal to 4 cycles apart, during compression. This approach would have yielded simpler pipeline logic, but reduced compression efficiency.
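The bypass/forwarding idea common to all four hazards can be summarized in one function: if the pointed-to entry was written 1 to 4 cycles ago and is still in flight in the pipeline, forward the pending data instead of reading the stale RAM contents. The names and the dict-based model below are hypothetical:

```python
def resolve_read(ptr, in_flight, dictionary):
    """Dictionary read with hazard bypass. `in_flight` maps dictionary
    addresses whose writes have not yet committed to their pending data."""
    if ptr in in_flight:        # 1- to 4-ahead hazard: forward pending data
        return in_flight[ptr]
    return dictionary[ptr]      # safe: the write committed long ago

dictionary = {0: b"OLDPHRAS"}
in_flight = {1: b"PENDING1", 2: b"PENDING2"}   # writes still in the pipeline
assert resolve_read(2, in_flight, dictionary) == b"PENDING2"  # bypassed
assert resolve_read(0, in_flight, dictionary) == b"OLDPHRAS"  # normal read
```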
CHAPTER 4: SOFTWARE LANGUAGE/HARDWARE IMPLEMENTATION
This chapter gives details of Programmable Logic Devices and Verilog HDL. Programmable devices such as PLDs, CPLDs and FPGAs are explained. Finally, the history of Verilog HDL, the importance of HDLs and the advantages of Verilog HDL are discussed.
4.1 Programmable Devices
Programmable devices are those devices which can be programmed by the user. Various programmable devices include PLDs, CPLDs, ASICs and FPGAs.
4.1.1 Programmable Logic Devices
At the low end of the spectrum are the original Programmable Logic Devices
(PLDs). A programmable logic device is an IC that is user configurable and is capable of
implementing logic functions. These were the first chips that could be used to implement
a flexible digital logic design in hardware. In other words, one could remove a couple of
the 7400-series TTL parts (ANDs, ORs, and NOTs) from the board and replace them
with a single PLD. Other names for this class of device are Programmable Logic Array
(PLA), Programmable Array Logic (PAL), and Generic Array Logic (GAL).
PLDs have several clear advantages over the 7400-series TTL parts they replaced. First, of course, a single chip requires less board area, power, and wiring. Another advantage is that the design inside the chip is flexible, so a change in the logic doesn't require any rewiring of the board. Rather, the decoding logic can be altered by simply replacing that one PLD with another part programmed with the new design.
Inside each PLD is a set of fully connected macro cells. These macro cells are typically
comprised of some amount of combinatorial logic (AND and OR gates) and a flip-flop.
In other words, a small Boolean logic equation can be built within each macro cell.
Hardware designs for these simple PLDs are generally written in languages like ABEL or
PALASM (the hardware equivalents of assembly) or drawn with the help of a schematic
capture tool.
4.1.2 Complex Programmable Devices
As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, but not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows implementing either more logic equations or a more complicated design. In fact, these chips are large enough to replace dozens of those 7400-series parts.
Figure 4 contains a block diagram of a CPLD. Each of the four logic blocks shown is equivalent to one PLD. However, an actual CPLD may have more or fewer than four logic blocks. These logic blocks are themselves comprised of macro cells and interconnect wiring, just like an ordinary PLD.
Figure 4: Internal Structure of a CPLD
Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macro cells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available.
Because CPLDs can hold larger designs than PLDs, their potential uses are more
varied. They are still sometimes used for simple applications like address decoding, but
more often they contain high-performance control logic or complex finite state machines.
At the high end (in terms of gate count), there is also considerable overlap in potential
applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever
high-performance logic is required: because of its less flexible internal architecture, the
delay through a CPLD (measured in nanoseconds) is more predictable and usually
shorter.
4.1.3 Field Programmable Gate Arrays (FPGA)
'Field Programmable' means that the FPGA's function is defined by a user's
program rather than by the manufacturer of the device. A typical integrated circuit
performs a particular function defined at the time of manufacture. In contrast, a program
written by someone other than the device manufacturer defines the FPGA’s function.
Depending on the particular device, the program is either ’burned’ in permanently or
semi-permanently as part of a board assembly process, or is loaded from an external
memory each time the device is powered up. This user programmability gives the user
access to complex integrated designs without the high engineering costs associated with
application specific integrated circuits (ASIC). The FPGA is an integrated circuit that
contains many (64 to over 10,000) identical logic cells that can be viewed as standard
components. The individual cells are interconnected by a matrix of wires and
programmable switches.
The logic cell architecture varies between different device families. Generally
speaking, each logic cell combines a few binary inputs (typically between 3 and 10) to
one or two outputs according to a Boolean logic function specified in the user program.
The cell's combinatorial logic may be physically implemented as a small look-up table
memory (LUT) or as a set of multiplexers and gates. LUT devices tend to be a bit more
flexible and provide more inputs per cell than multiplexer cells at the expense of
propagation delay.
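As a rough sketch (not taken from this project's design), a LUT-based logic cell of the kind described above can be modeled in Verilog as a small truth-table memory indexed by the cell's inputs, followed by an optional flip-flop. The module name, the 4-input width, and the INIT parameter are illustrative assumptions:

```verilog
// Hypothetical sketch of a 4-input LUT-based logic cell.
// INIT holds the 16-entry truth table that the "user program" would
// configure; the cell's inputs select one bit of it.
module lut4_cell #(parameter [15:0] INIT = 16'h0000) (
    input        clk,
    input  [3:0] in,       // the cell's binary inputs
    output       comb_out, // combinatorial LUT output
    output reg   reg_out   // registered output (the cell's flip-flop)
);
    assign comb_out = INIT[in];   // look-up: index into the truth table

    always @(posedge clk)
        reg_out <= comb_out;      // optional register on the cell output
endmodule
```

Setting INIT to, say, 16'h8000 would make the cell behave as a 4-input AND gate, since only input combination 4'b1111 selects a 1 bit.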
Figure 5: Internal Structure of an FPGA
The development of the FPGA was distinct from the PLD/CPLD evolution. There
are three key parts of its structure: logic blocks, interconnect, and I/O blocks. The I/O
blocks form a ring around the outer edge of the part. Each of these provides individually
selectable input, output, or bi-directional access to one of the general-purpose I/O pins on
the exterior of the FPGA package. Inside the ring of I/O blocks lies a rectangular array of
logic blocks. The wiring that connects logic blocks to each other and to the I/O blocks is
called the programmable interconnect.
Figure 6: Internal Architecture of CLB
The logic blocks within an FPGA can be as small and simple as the macro cells in
a PLD (a so-called fine-grained architecture) or larger and more complex (coarse-grained).
However, they are never as large as an entire PLD, as the logic blocks of a CPLD are.
The logic blocks of a CPLD contain multiple macro cells, whereas the logic blocks in an
FPGA are generally nothing more than a couple of logic gates or a look-up table and a
flip-flop.
4.1.3.1 Advantages of FPGA
Because of all the extra flip-flops, the density is higher (from several thousand
gates to a few million gates) and the architecture of an FPGA is much more flexible than
that of a CPLD. This makes FPGAs better suited to register-heavy applications. They are
also often used where input data streams must be processed at a very fast pace. In
addition, FPGAs are usually denser (more gates in a given area) and cost less than
CPLDs, so they are the best choice for larger logic designs. FPGAs use static memory,
so they are reprogrammable.
4.2 Hardware Design and Development
A description of the hardware's structure and behavior is written in a high-level
hardware description language (usually VHDL or Verilog), and that code is then compiled
and downloaded prior to execution. Of course, schematic capture is also an option for
design entry, but it has become less popular as designs have become more complex and
the language-based tools have improved. The overall process of hardware development
for programmable logic is shown in Figure 7.
Figure 7: Design Flow
4.2.1 Design Entry
In the design entry step, the behavior of the circuit is written in a hardware
description language such as VHDL or Verilog.
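For illustration (a minimal sketch, not part of this project's compression design), design entry in Verilog amounts to describing the circuit's behavior in text rather than drawing a schematic. A 2-to-1 multiplexer might be entered as:

```verilog
// Hypothetical design-entry example: a 2-to-1 multiplexer
// described behaviorally instead of as a schematic.
module mux2 (
    input  sel,  // select line
    input  a,    // chosen when sel == 0
    input  b,    // chosen when sel == 1
    output y
);
    assign y = sel ? b : a;  // y follows b when sel is high, else a
endmodule
```

The synthesis tool, not the designer, decides how this description maps onto LUTs or gates in the target device.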
4.2.2 Synthesis
First, an intermediate representation of the hardware design is produced. This step
is called synthesis, and the result is a representation called a netlist. In this step, syntax
and semantic errors are checked, and a synthesis report is created that details any errors
and warnings. The netlist is device independent, so its contents do not depend on the
particulars of the FPGA or CPLD; it is usually stored in a standard format called the
Electronic Design Interchange Format (EDIF).
4.2.3 Simulation
A simulator is a software program used to verify the functionality of a circuit.
Inputs are applied and the corresponding outputs are checked; if the expected outputs are
obtained, the circuit design is correct. Simulation gives the output waveforms in the form
of zeros and ones. Although problems with the size or timing of the hardware may still
crop up later, the designer can at least be sure that the logic is functionally correct before
going on to the next stage of development.
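The apply-inputs-and-check-outputs loop described above is usually written as a self-checking testbench. As a sketch (the add4 module and all names here are hypothetical, not part of the project's code), a design and its testbench might look like:

```verilog
// Hypothetical example: a small design plus a self-checking testbench.
// A simulator such as ModelSim compiles both and runs tb_add4.
module add4 (input [3:0] a, b, output [4:0] sum);
    assign sum = a + b;            // design under test: 4-bit adder
endmodule

module tb_add4;
    reg  [3:0] a, b;
    wire [4:0] sum;

    add4 dut (.a(a), .b(b), .sum(sum));  // instantiate the design

    initial begin
        a = 4'd9; b = 4'd8;        // apply inputs
        #10;                       // let signals settle
        if (sum !== 5'd17)         // compare against the expected output
            $display("FAIL: got %d", sum);
        else
            $display("PASS: 9 + 8 = %d", sum);
        $finish;
    end
endmodule
```

If the displayed outputs match the expected values, the logic is functionally correct, even though size and timing issues may still appear later in the flow.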
4.2.4 Implementation
Device implementation puts the verified design onto the FPGA. The various steps
in design implementation are:
Translate
Map
Place and route
4.2.4.1 Translate
Translate converts the EDIF file to an NGD (Native Generic Description) file,
meaning the code is converted to gates, or netlists. The translate process generates a
translate report that lists any errors and warnings. This report also gives the device and
I/O utilization, which helps the designer select the best device.
4.2.4.2 Map
Mapping converts the NGD (Native Generic Description) file obtained from the
translate process to an NCD (Native Circuit Description) file, meaning the gates are
converted to physical components such as flip-flops and multiplexers.
4.2.4.3 Place and Route
Place is the process of selecting the specific logic blocks in the FPGA where the
design's gates will reside. Route is the physical routing of the interconnect between logic
blocks. In other words, logic blocks (CLBs) and I/O blocks are assigned to specific
locations on the die, and interconnections are made between them. This step maps the
logical structures described in the netlist onto actual macro cells, interconnections, and
input and output pins. The process is similar to the equivalent step in the development of
a printed circuit board, and it may likewise allow for either automatic or manual layout
optimizations. The result of the place & route process is a bitstream. This name is used
generically, even though each CPLD or FPGA (or family) has its own, usually
proprietary, bitstream format. The bitstream is the binary data that must be loaded into
the FPGA or CPLD to make the chip implement a particular hardware design.
4.3 Device Programming
Once the bitstream file is created for a particular FPGA or CPLD, it is downloaded
to the device. The details of this process depend upon the chip's underlying
process technology. Programming technologies used are PROM (for one-time
programmable), EPROM, EEPROM, and Flash. Just like their memory counterparts,
PROM and EPROM based logic devices can only be programmed with the help of a
separate piece of lab equipment called a device programmer. On the other hand, many of
the devices based on EEPROM or Flash technology are in-circuit programmable. In other
words, the additional circuitry that's required to perform device reprogramming is
provided within the FPGA or CPLD silicon as well. This makes it possible to erase and
reprogram the device internals via a JTAG interface or from an on-board embedded
processor. In addition to non-volatile technologies, there are also programmable logic
devices based on SRAM technology. In such cases, the contents of the device are
volatile. This has both advantages and disadvantages. The obvious disadvantage is that
the internal logic must be reloaded after every system or chip reset, which means some
sort of additional memory chip is needed to hold the bitstream. But it also means that
the contents of the logic device can be changed.
4.4 Verilog HDL
The history of the Verilog HDL[8] goes back to the 1980s, when a company
called Gateway Design Automation developed a logic simulator, Verilog-XL, and with it
a hardware description language. Cadence Design Systems acquired Gateway in 1989
and with it the rights to the language and the simulator. In 1990, Cadence put the
language (but not the simulator) into the public domain, with the intention that it should
become a standard, nonproprietary language. The Verilog HDL is now maintained by a
nonprofit organization, Accellera, which was formed from the merger of Open
Verilog International (OVI) and VHDL International. OVI had the task of taking the
language through the IEEE standardization procedure. In December 1995, Verilog HDL
became IEEE Std. 1364-1995. A significantly revised version was published in 2001:
IEEE Std. 1364-2001. There was a further revision in 2005, but it added only a few
minor changes. Accellera has also developed a new standard, SystemVerilog, which
extends Verilog; SystemVerilog became an IEEE standard (1800-2005) in 2005. There is
also a draft standard for analog and mixed-signal extensions to Verilog, Verilog-AMS.
4.4.1 Importance of HDLs
HDLs have many advantages compared to traditional schematic-based design:
- Designs can be described at a very abstract level. Designers can write their RTL description without choosing a specific fabrication technology; logic synthesis tools can automatically convert the design to any fabrication technology, so if a new technology emerges, designers do not need to redesign their circuit.
- Functional verification of the design can be done early in the design cycle.
- Better representation of the design, due to the simplicity of HDLs compared to gate-level schematics.
- Modification and optimization of the design become easy with HDLs.
- The design cycle time is cut down significantly, because the chance of a functional bug at a later stage in the design flow is minimal [8].
4.4.2 Why Verilog?
Verilog HDL has evolved as a standard hardware description language and offers many useful features for hardware design:
- Easy to learn and easy to use, due to the similarity of its syntax to that of the C programming language.
- Different levels of abstraction can be mixed in the same design.
- Availability of Verilog HDL libraries for post-logic-synthesis simulation.
- Most synthesis tools support Verilog HDL.
- The Programming Language Interface (PLI) is a powerful feature that allows the user to write custom C code to interact with the internal data structures of Verilog. Designers can customize a Verilog HDL simulator to their needs with the PLI [8].
CHAPTER 5: RESULTS AND DISCUSSION
Verification in ModelSim (Xilinx)
Compression
Figure 8: Simulation result 1 for Compression
Figure 9: Simulation result 2 for Compression
Figure 10: Simulation result 3 for Compression
Figure 11: Simulation result 4 for Compression
Figure 12: Simulation result 5 for Compression
Figure 13: Simulation result 6 for Compression
Figure 14: Simulation result 7 for Compression
Figure 15: Simulation result 8 for Compression
Decompression
Figure 16: Simulation result 1 for Decompression
Figure 17: Simulation result 2 for Decompression
Figure 18: Simulation result 3 for Decompression
Figure 19: Simulation result 4 for Decompression
Figure 20: Simulation result 5 for Decompression
Figure 21: Simulation result 6 for Decompression
Figure 22: Simulation result 7 for Decompression
Figure 23: Simulation result 8 for Decompression
HDL Synthesis Report- Compression
HDL Synthesis Report- Decompression
CHAPTER 6: CONCLUSION
Compression saved storage space or transmission bandwidth; when the data was
subsequently read, a decompression operation was performed. A high-throughput,
streaming, lossless compression algorithm and its efficient implementation on FPGAs
were achieved. The proposed solution delivered peak throughput on a PCIe-based FPGA
board with compression and decompression engines, representing an overall speedup
over the reference software implementation. With multiple engines running in parallel,
the proposed design provides a path to potential speedups of up to two orders of
magnitude. In the current implementation, the achievable overall throughput was limited
only by the available PCIe bus bandwidth. Our FPGA pipelines included seven parallel
data paths, one for each phrase, to exploit parallelism and achieve improved performance.
The wraparound approach resulted in better compression than the approach that flushes
the dictionaries once they are filled.
REFERENCES
[1] Craft, D. J., “A fast hardware data compression algorithm and some algorithmic extensions”, IBM Journal of Research and Development, 42(6), 733-745, November 1998.
[2] Tremaine, R. B., et al., “IBM Memory Expansion Technology (MXT)”, IBM Journal of Research and Development, 45(2), 271-285, March 2001.
[3] Franaszek, P. A., Lastras, L. A., Peng, S., and Robinson, J. T., “Data Compression with Restricted Parsings”, 203-212, Data Compression Conference (DCC'06), 2006.
[4] Núñez, J. L., et al., “X-MatchPRO: A ProASIC-Based 200 Mbytes/s Full-Duplex Lossless Data Compressor”, Lecture Notes in Computer Science, 2147/2001, 613-617, January 2001.
[5] Qureshi, M. K., et al., “Enhancing Lifetime and Security of PCM-Based Main Memory with Start-Gap Wear Leveling”, 42nd International Symposium on Microarchitecture (MICRO 2009), December 2009.
[6] Vandierendonck, H. and De Bosschere, K., “XOR-based hash functions”, IEEE Transactions on Computers, 54(7), 800-812, July 2005.
[7] Xi Chen, Lei Yang, Robert P. Dick, Li Shang, and Haris Lekatsas, “C-Pack: A High-Performance Microprocessor Cache Compression Algorithm”, IEEE, August 2010.
[8] Samir Palnitkar, Verilog HDL: A Guide to Digital Design and Synthesis, SunSoft Press, 1996.
APPENDIX
VERILOG HDL FOR COMPRESSION AND DECOMPRESSION,
VERIFICATION AND REPORT
Compression
Top module:
module top_comp_new(sel,clk,reset,dataout);
input [3:0] sel;
input clk;
input reset;
output [7:0] dataout;
wire [3:0] sel;
wire clk;
wire reset;
wire [63:0] datain;
wire [7:0] dataout;
wire [67:0] en_data_out;
wire [63:0] key8;
wire [31:0] key4_1;
wire [31:0] key4_2;
wire [15:0] key2_1;
wire [15:0] key2_2;
wire [15:0] key2_3;
wire [15:0] key2_4;
wire [3:0] addr8;
wire [3:0] addr4_1;
wire [3:0] addr4_2;
wire [3:0] addr2_1;
wire [3:0] addr2_2;
wire [3:0] addr2_3;
wire [3:0] addr2_4;
wire mis8;
wire mis4_1;
wire mis4_2;
wire mis2_1;
wire mis2_2;
wire mis2_3;
wire mis2_4;
wire [63:0] data8;
wire [31:0] data4_1;
wire [31:0] data4_2;
wire [15:0] data2_1;
wire [15:0] data2_2;
wire [15:0] data2_3;
wire [15:0] data2_4;
wire pout8;
wire pout4_1;
wire pout4_2;
wire pout2_1;
wire pout2_2;
wire pout2_3;
wire pout2_4;
assign key8 = datain;
assign key4_1 = datain[63:32];
assign key4_2 = datain[31:0];
assign key2_1 = datain[63:48];
assign key2_2 = datain[47:32];
assign key2_3 = datain[31:16];
assign key2_4 = datain[15:0];
assign datain = 64'haabbccdd12345678;
Hash8 c0(key8,clk,reset,addr8);
Hash4 c1(key4_1,key4_2,clk,reset,addr4_1,addr4_2);
Hash2 c2(key2_1,key2_2,key2_3,key2_4,clk,reset,addr2_1,addr2_2,addr2_3,addr2_4);
dict8 c3(addr8,key8,mis8,clk,reset,data8);
dict4 c4(addr4_1,addr4_2,key4_1,key4_2,mis4_1,mis4_2,clk,reset,data4_1,data4_2);
dict2 c5(addr2_1,addr2_2,addr2_3,addr2_4,key2_1,key2_2,key2_3,key2_4,
         mis2_1,mis2_2,mis2_3,mis2_4,clk,reset,data2_1,data2_2,data2_3,data2_4);
phase_comp c6(datain,data8,data4_1,data4_2,data2_1,data2_2,data2_3,data2_4,clk,reset,
              mis8,mis4_1,mis4_2,mis2_1,mis2_2,mis2_3,mis2_4,
              pout8,pout4_1,pout4_2,pout2_1,pout2_2,pout2_3,pout2_4);
encoder c7(clk,reset,datain,pout8,pout4_1,pout4_2,pout2_1,pout2_2,pout2_3,pout2_4,addr8,
addr4_1,addr4_2,addr2_1,addr2_2,addr2_3,addr2_4,en_data_out);
assign dataout = (sel == 4'b0000) ? en_data_out[7:0]:
(sel == 4'b0001) ? en_data_out[15:8]:
(sel == 4'b0010) ? en_data_out[23:16]:
(sel == 4'b0011) ? en_data_out[31:24]:
(sel == 4'b0100) ? en_data_out[39:32]:
(sel == 4'b0101) ? en_data_out[47:40]:
(sel == 4'b0110) ? en_data_out[55:48]:
(sel == 4'b0111) ? en_data_out[63:56]:
(sel == 4'b1000) ? en_data_out[67:64]: 8'b0;
endmodule
2 byte hash:
module Hash2(key2_1,key2_2,key2_3,key2_4,clk,reset,addr2_1,addr2_2,addr2_3,addr2_4);
input [15:0] key2_1;
input [15:0] key2_2;
input [15:0] key2_3;
input [15:0] key2_4;
input clk;
input reset;
output [3:0] addr2_1;
output [3:0] addr2_2;
output [3:0] addr2_3;
output [3:0] addr2_4;
wire [15:0] key2_1;
wire [15:0] key2_2;
wire [15:0] key2_3;
wire [15:0] key2_4;
wire clk;
wire reset;
reg [3:0] addr2_1;
reg [3:0] addr2_2;
reg [3:0] addr2_3;
reg [3:0] addr2_4;
reg [19:0] hashtable2_1 [7:0];
reg [19:0] hashtable2_2 [7:0];
reg [19:0] hashtable2_3 [7:0];
reg [19:0] hashtable2_4 [7:0];
reg match_1;
reg match_2;
reg match_3;
reg match_4;
reg [3:0] count_1;
reg [3:0] count_2;
reg [3:0] count_3;
reg [3:0] count_4;
reg [2:0] i_1;
reg [2:0] i_2;
reg [2:0] i_3;
reg [2:0] i_4;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable2_1[0] <= 20'd0;
hashtable2_1[1] <= 20'd0;
hashtable2_1[2] <= 20'd0;
hashtable2_1[3] <= 20'd0;
hashtable2_1[4] <= 20'd0;
hashtable2_1[5] <= 20'd0;
hashtable2_1[6] <= 20'd0;
hashtable2_1[7] <= 20'd0;
count_1 <= 4'd0;
i_1 <= 3'd0;
end
else if(match_1 == 1'b0)
begin
hashtable2_1[i_1] <= {key2_1,count_1};
count_1 <= count_1 + 1;
i_1 <= i_1 + 1;
end
end
always@( reset or key2_1)
begin
if(reset == 1'b1)
begin
addr2_1 = 4'd0;
match_1 = 1'd0;
end
else if(key2_1 == hashtable2_1[0][19:4])
begin
addr2_1 = hashtable2_1[0][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[1][19:4])
begin
addr2_1 = hashtable2_1[1][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[2][19:4])
begin
addr2_1 = hashtable2_1[2][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[3][19:4])
begin
addr2_1 = hashtable2_1[3][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[4][19:4])
begin
addr2_1 = hashtable2_1[4][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[5][19:4])
begin
addr2_1 = hashtable2_1[5][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[6][19:4])
begin
addr2_1 = hashtable2_1[6][3:0];
match_1 = 1'd1;
end
else if(key2_1 == hashtable2_1[7][19:4])
begin
addr2_1 = hashtable2_1[7][3:0];
match_1 = 1'd1;
end
else
begin
addr2_1 = addr2_1;
match_1 = 1'd0;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable2_2[0] <= 20'd0;
hashtable2_2[1] <= 20'd0;
hashtable2_2[2] <= 20'd0;
hashtable2_2[3] <= 20'd0;
hashtable2_2[4] <= 20'd0;
hashtable2_2[5] <= 20'd0;
hashtable2_2[6] <= 20'd0;
hashtable2_2[7] <= 20'd0;
count_2 <= 4'd0;
i_2 <= 3'd0;
end
else if(match_2 == 1'b0)
begin
hashtable2_2[i_2] <= {key2_2,count_2};
count_2 <= count_2 + 1;
i_2 <= i_2 + 1;
end
end
always@( reset or key2_2 )
begin
if(reset == 1'b1)
begin
addr2_2 = 4'd0;
match_2 = 1'd0;
end
else if(key2_2 == hashtable2_2[0][19:4])
begin
addr2_2 = hashtable2_2[0][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[1][19:4])
begin
addr2_2 = hashtable2_2[1][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[2][19:4])
begin
addr2_2 = hashtable2_2[2][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[3][19:4])
begin
addr2_2 = hashtable2_2[3][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[4][19:4])
begin
addr2_2 = hashtable2_2[4][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[5][19:4])
begin
addr2_2 = hashtable2_2[5][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[6][19:4])
begin
addr2_2 = hashtable2_2[6][3:0];
match_2 = 1'd1;
end
else if(key2_2 == hashtable2_2[7][19:4])
begin
addr2_2 = hashtable2_2[7][3:0];
match_2 = 1'd1;
end
else
begin
addr2_2 = addr2_2;
match_2 = 1'd0;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable2_3[0] <= 20'd0;
hashtable2_3[1] <= 20'd0;
hashtable2_3[2] <= 20'd0;
hashtable2_3[3] <= 20'd0;
hashtable2_3[4] <= 20'd0;
hashtable2_3[5] <= 20'd0;
hashtable2_3[6] <= 20'd0;
hashtable2_3[7] <= 20'd0;
count_3 <= 4'd0;
i_3 <= 3'd0;
end
else if(match_3 == 1'b0)
begin
hashtable2_3[i_3] <= {key2_3,count_3};
count_3 <= count_3 + 1;
i_3 <= i_3 + 1;
end
end
always@( reset or key2_3)
begin
if(reset == 1'b1)
begin
addr2_3 = 4'd0;
match_3 = 1'd0;
end
else if(key2_3 == hashtable2_3[0][19:4])
begin
addr2_3 = hashtable2_3[0][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[1][19:4])
begin
addr2_3 = hashtable2_3[1][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[2][19:4])
begin
addr2_3 = hashtable2_3[2][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[3][19:4])
begin
addr2_3 = hashtable2_3[3][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[4][19:4])
begin
addr2_3 = hashtable2_3[4][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[5][19:4])
begin
addr2_3 = hashtable2_3[5][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[6][19:4])
begin
addr2_3 = hashtable2_3[6][3:0];
match_3 = 1'd1;
end
else if(key2_3 == hashtable2_3[7][19:4])
begin
addr2_3 = hashtable2_3[7][3:0];
match_3 = 1'd1;
end
else
begin
addr2_3 = addr2_3;
match_3 = 1'd0;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable2_4[0] <= 20'd0;
hashtable2_4[1] <= 20'd0;
hashtable2_4[2] <= 20'd0;
hashtable2_4[3] <= 20'd0;
hashtable2_4[4] <= 20'd0;
hashtable2_4[5] <= 20'd0;
hashtable2_4[6] <= 20'd0;
hashtable2_4[7] <= 20'd0;
count_4 <= 4'd0;
i_4 <= 3'd0;
end
else if(match_4 == 1'b0)
begin
hashtable2_4[i_4] <= {key2_4,count_4};
count_4 <= count_4 + 1;
i_4 <= i_4 + 1;
end
end
always@( reset or key2_4 )
begin
if(reset == 1'b1)
begin
addr2_4 = 4'd0;
match_4 = 1'd0;
end
else if(key2_4 == hashtable2_4[0][19:4])
begin
addr2_4 = hashtable2_4[0][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[1][19:4])
begin
addr2_4 = hashtable2_4[1][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[2][19:4])
begin
addr2_4 = hashtable2_4[2][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[3][19:4])
begin
addr2_4 = hashtable2_4[3][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[4][19:4])
begin
addr2_4 = hashtable2_4[4][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[5][19:4])
begin
addr2_4 = hashtable2_4[5][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[6][19:4])
begin
addr2_4 = hashtable2_4[6][3:0];
match_4 = 1'd1;
end
else if(key2_4 == hashtable2_4[7][19:4])
begin
addr2_4 = hashtable2_4[7][3:0];
match_4 = 1'd1;
end
else
begin
addr2_4 = addr2_4;
match_4 = 1'd0;
end
end
endmodule
4 byte hash:
module Hash4(key4_1,key4_2,clk,reset,addr4_1,addr4_2);
input [31:0] key4_1;
input [31:0] key4_2;
input clk;
input reset;
output [3:0] addr4_1;
output [3:0] addr4_2;
wire [31:0] key4_1;
wire [31:0] key4_2;
wire clk;
wire reset;
reg [3:0] addr4_1;
reg [3:0] addr4_2;
reg [35:0] hashtable4_1 [7:0];
reg match_1;
reg [3:0] count_1;
reg [2:0] i_1;
reg [35:0] hashtable4_2 [7:0];
reg match_2;
reg [3:0] count_2;
reg [2:0] i_2;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable4_1[0] <= 36'd0;
hashtable4_1[1] <= 36'd0;
hashtable4_1[2] <= 36'd0;
hashtable4_1[3] <= 36'd0;
hashtable4_1[4] <= 36'd0;
hashtable4_1[5] <= 36'd0;
hashtable4_1[6] <= 36'd0;
hashtable4_1[7] <= 36'd0;
count_1 <= 4'd0;
i_1 <= 3'd0;
end
else if(match_1 == 1'b0)
begin
hashtable4_1[i_1] <= {key4_1,count_1};
count_1 <= count_1 + 1;
i_1 <= i_1 + 1;
end
end
always@( reset or key4_1)
begin
if(reset == 1'b1)
begin
addr4_1 = 4'd0;
match_1 = 1'd0;
end
else if(key4_1 == hashtable4_1[0][35:4])
begin
addr4_1 = hashtable4_1[0][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[1][35:4])
begin
addr4_1 = hashtable4_1[1][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[2][35:4])
begin
addr4_1 = hashtable4_1[2][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[3][35:4])
begin
addr4_1 = hashtable4_1[3][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[4][35:4])
begin
addr4_1 = hashtable4_1[4][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[5][35:4])
begin
addr4_1 = hashtable4_1[5][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[6][35:4])
begin
addr4_1 = hashtable4_1[6][3:0];
match_1 = 1'd1;
end
else if(key4_1 == hashtable4_1[7][35:4])
begin
addr4_1 = hashtable4_1[7][3:0];
match_1 = 1'd1;
end
else
begin
addr4_1 = addr4_1;
match_1 = 1'd0;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable4_2[0] <= 36'd0;
hashtable4_2[1] <= 36'd0;
hashtable4_2[2] <= 36'd0;
hashtable4_2[3] <= 36'd0;
hashtable4_2[4] <= 36'd0;
hashtable4_2[5] <= 36'd0;
hashtable4_2[6] <= 36'd0;
hashtable4_2[7] <= 36'd0;
count_2 <= 4'd0;
i_2 <= 3'd0;
end
else if(match_2 == 1'b0)
begin
hashtable4_2[i_2] <= {key4_2,count_2};
count_2 <= count_2 + 1;
i_2 <= i_2 + 1;
end
end
always@(*)   // combinational hash lookup
begin
if(reset == 1'b1)
begin
addr4_2 = 4'd0;
match_2 = 1'd0;
end
else if(key4_2 == hashtable4_2[0][35:4])
begin
addr4_2 = hashtable4_2[0][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[1][35:4])
begin
addr4_2 = hashtable4_2[1][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[2][35:4])
begin
addr4_2 = hashtable4_2[2][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[3][35:4])
begin
addr4_2 = hashtable4_2[3][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[4][35:4])
begin
addr4_2 = hashtable4_2[4][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[5][35:4])
begin
addr4_2 = hashtable4_2[5][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[6][35:4])
begin
addr4_2 = hashtable4_2[6][3:0];
match_2 = 1'd1;
end
else if(key4_2 == hashtable4_2[7][35:4])
begin
addr4_2 = hashtable4_2[7][3:0];
match_2 = 1'd1;
end
else
begin
addr4_2 = addr4_2;
match_2 = 1'd0;
end
end
endmodule
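Each hash bank above behaves like a small content-addressable memory: a combinational lookup over eight stored {key, address} entries, with a miss writing the key at a wrapping 3-bit slot pointer and handing out the next 4-bit dictionary address. A minimal software sketch of one bank (Python; the class and method names are illustrative, not from the project, and unlike the RTL the model returns the newly assigned address on a miss):

```python
class HashBank:
    """Software sketch of one hash bank: 8 slots of (key, 4-bit address),
    a wrapping slot pointer, and a wrapping address counter."""

    def __init__(self, slots=8):
        self.slots = [None] * slots   # each entry: (key, addr)
        self.count = 0                # next dictionary address (4 bits)
        self.i = 0                    # next slot to overwrite (3 bits)

    def lookup(self, key):
        """Return (match, addr); a miss also records the key."""
        for entry in self.slots:
            if entry is not None and entry[0] == key:
                return True, entry[1]
        addr = self.count & 0xF
        self.slots[self.i] = (key, addr)
        self.count = (self.count + 1) & 0xF
        self.i = (self.i + 1) % len(self.slots)
        return False, addr
```

Because the slot pointer wraps after eight misses, old keys are silently evicted, mirroring the fixed-size `hashtable4_*` arrays above.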
8-byte hash:
module Hash8(key8,clk,reset,addr8);
input [63:0] key8;
input clk;
input reset;
output [3:0] addr8;
wire [63:0] key8;
wire clk;
wire reset;
reg [3:0] addr8;
reg [67:0] hashtable8 [7:0];
reg match;
reg [3:0] count;
reg [2:0] i;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
hashtable8[0] <= 68'd0;
hashtable8[1] <= 68'd0;
hashtable8[2] <= 68'd0;
hashtable8[3] <= 68'd0;
hashtable8[4] <= 68'd0;
hashtable8[5] <= 68'd0;
hashtable8[6] <= 68'd0;
hashtable8[7] <= 68'd0;
count <= 4'd0;
i <= 3'd0;
end
else if(match == 1'b0)
begin
hashtable8[i] <= {key8,count};
count <= count + 1;
i <= i + 1;
end
end
always@(*)   // combinational hash lookup
begin
if(reset == 1'b1)
begin
addr8 = 4'd0;
match = 1'd0;
end
else if(key8 == hashtable8[0][67:4])
begin
addr8 = hashtable8[0][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[1][67:4])
begin
addr8 = hashtable8[1][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[2][67:4])
begin
addr8 = hashtable8[2][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[3][67:4])
begin
addr8 = hashtable8[3][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[4][67:4])
begin
addr8 = hashtable8[4][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[5][67:4])
begin
addr8 = hashtable8[5][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[6][67:4])
begin
addr8 = hashtable8[6][3:0];
match = 1'd1;
end
else if(key8 == hashtable8[7][67:4])
begin
addr8 = hashtable8[7][3:0];
match = 1'd1;
end
else
begin
addr8 = addr8;
match = 1'd0;
end
end
endmodule
2-byte dictionary:
module dict2(addr2_1,addr2_2,addr2_3,addr2_4,key2_1,key2_2,key2_3,key2_4,
mis2_1,mis2_2,mis2_3,mis2_4,clk,reset,data2_1,data2_2,data2_3,data2_4);
input [3:0] addr2_1;
input [3:0] addr2_2;
input [3:0] addr2_3;
input [3:0] addr2_4;
input clk;
input reset;
input mis2_1;
input mis2_2;
input mis2_3;
input mis2_4;
input [15:0] key2_1;
input [15:0] key2_2;
input [15:0] key2_3;
input [15:0] key2_4;
output [15:0] data2_1;
output [15:0] data2_2;
output [15:0] data2_3;
output [15:0] data2_4;
wire [3:0] addr2_1;
wire [3:0] addr2_2;
wire [3:0] addr2_3;
wire [3:0] addr2_4;
wire [15:0] key2_1;
wire [15:0] key2_2;
wire [15:0] key2_3;
wire [15:0] key2_4;
wire clk;
wire reset;
wire mis2_1;
wire mis2_2;
wire mis2_3;
wire mis2_4;
wire [15:0] data2_1;
wire [15:0] data2_2;
wire [15:0] data2_3;
wire [15:0] data2_4;
reg [15:0] dictionary2_1 [15:0];
reg [15:0] dictionary2_2 [15:0];
reg [15:0] dictionary2_3 [15:0];
reg [15:0] dictionary2_4 [15:0];
reg [3:0] count2_1;
reg [3:0] count2_2;
reg [3:0] count2_3;
reg [3:0] count2_4;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_1[0] <= 16'd0;
dictionary2_1[1] <= 16'd0;
dictionary2_1[2] <= 16'd0;
dictionary2_1[3] <= 16'd0;
dictionary2_1[4] <= 16'd0;
dictionary2_1[5] <= 16'd0;
dictionary2_1[6] <= 16'd0;
dictionary2_1[7] <= 16'd0;
dictionary2_1[8] <= 16'd0;
dictionary2_1[9] <= 16'd0;
dictionary2_1[10] <= 16'd0;
dictionary2_1[11] <= 16'd0;
dictionary2_1[12] <= 16'd0;
dictionary2_1[13] <= 16'd0;
dictionary2_1[14] <= 16'd0;
dictionary2_1[15] <= 16'd0;
count2_1 <= 4'd0;
end
else if(mis2_1 == 1'b1)
begin
dictionary2_1[count2_1] <= key2_1;
count2_1 <= count2_1 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_2[0] <= 16'd0;
dictionary2_2[1] <= 16'd0;
dictionary2_2[2] <= 16'd0;
dictionary2_2[3] <= 16'd0;
dictionary2_2[4] <= 16'd0;
dictionary2_2[5] <= 16'd0;
dictionary2_2[6] <= 16'd0;
dictionary2_2[7] <= 16'd0;
dictionary2_2[8] <= 16'd0;
dictionary2_2[9] <= 16'd0;
dictionary2_2[10] <= 16'd0;
dictionary2_2[11] <= 16'd0;
dictionary2_2[12] <= 16'd0;
dictionary2_2[13] <= 16'd0;
dictionary2_2[14] <= 16'd0;
dictionary2_2[15] <= 16'd0;
count2_2 <= 4'd0;
end
else if(mis2_2 == 1'b1)
begin
dictionary2_2[count2_2] <= key2_2;
count2_2 <= count2_2 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_3[0] <= 16'd0;
dictionary2_3[1] <= 16'd0;
dictionary2_3[2] <= 16'd0;
dictionary2_3[3] <= 16'd0;
dictionary2_3[4] <= 16'd0;
dictionary2_3[5] <= 16'd0;
dictionary2_3[6] <= 16'd0;
dictionary2_3[7] <= 16'd0;
dictionary2_3[8] <= 16'd0;
dictionary2_3[9] <= 16'd0;
dictionary2_3[10] <= 16'd0;
dictionary2_3[11] <= 16'd0;
dictionary2_3[12] <= 16'd0;
dictionary2_3[13] <= 16'd0;
dictionary2_3[14] <= 16'd0;
dictionary2_3[15] <= 16'd0;
count2_3 <= 4'd0;
end
else if(mis2_3 == 1'b1)
begin
dictionary2_3[count2_3] <= key2_3;
count2_3 <= count2_3 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_4[0] <= 16'd0;
dictionary2_4[1] <= 16'd0;
dictionary2_4[2] <= 16'd0;
dictionary2_4[3] <= 16'd0;
dictionary2_4[4] <= 16'd0;
dictionary2_4[5] <= 16'd0;
dictionary2_4[6] <= 16'd0;
dictionary2_4[7] <= 16'd0;
dictionary2_4[8] <= 16'd0;
dictionary2_4[9] <= 16'd0;
dictionary2_4[10] <= 16'd0;
dictionary2_4[11] <= 16'd0;
dictionary2_4[12] <= 16'd0;
dictionary2_4[13] <= 16'd0;
dictionary2_4[14] <= 16'd0;
dictionary2_4[15] <= 16'd0;
count2_4 <= 4'd0;
end
else if(mis2_4 == 1'b1)
begin
dictionary2_4[count2_4] <= key2_4;
count2_4 <= count2_4 + 1;
end
end
assign data2_1 = dictionary2_1[addr2_1];
assign data2_2 = dictionary2_2[addr2_2];
assign data2_3 = dictionary2_3[addr2_3];
assign data2_4 = dictionary2_4[addr2_4];
endmodule
4-byte dictionary:
module dict4(addr4_1,addr4_2,key4_1,key4_2,mis4_1,mis4_2,clk,reset,data4_1,data4_2);
input [3:0] addr4_1;
input [3:0] addr4_2;
input [31:0] key4_1;
input [31:0] key4_2;
input clk;
input reset;
input mis4_1;
input mis4_2;
output [31:0] data4_1;
output [31:0] data4_2;
wire [3:0] addr4_1;
wire [3:0] addr4_2;
wire [31:0] key4_1;
wire [31:0] key4_2;
wire clk;
wire reset;
wire mis4_1;
wire mis4_2;
wire [31:0] data4_1;
wire [31:0] data4_2;
reg [31:0] dictionary4_1 [15:0];
reg [31:0] dictionary4_2 [15:0];
reg [3:0] count1;
reg [3:0] count2;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary4_1[0] <= 32'd0;
dictionary4_1[1] <= 32'd0;
dictionary4_1[2] <= 32'd0;
dictionary4_1[3] <= 32'd0;
dictionary4_1[4] <= 32'd0;
dictionary4_1[5] <= 32'd0;
dictionary4_1[6] <= 32'd0;
dictionary4_1[7] <= 32'd0;
dictionary4_1[8] <= 32'd0;
dictionary4_1[9] <= 32'd0;
dictionary4_1[10] <= 32'd0;
dictionary4_1[11] <= 32'd0;
dictionary4_1[12] <= 32'd0;
dictionary4_1[13] <= 32'd0;
dictionary4_1[14] <= 32'd0;
dictionary4_1[15] <= 32'd0;
count1 <= 4'd0;
end
else if(mis4_1 == 1'b1)
begin
dictionary4_1[count1] <= key4_1;
count1 <= count1 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary4_2[0] <= 32'd0;
dictionary4_2[1] <= 32'd0;
dictionary4_2[2] <= 32'd0;
dictionary4_2[3] <= 32'd0;
dictionary4_2[4] <= 32'd0;
dictionary4_2[5] <= 32'd0;
dictionary4_2[6] <= 32'd0;
dictionary4_2[7] <= 32'd0;
dictionary4_2[8] <= 32'd0;
dictionary4_2[9] <= 32'd0;
dictionary4_2[10] <= 32'd0;
dictionary4_2[11] <= 32'd0;
dictionary4_2[12] <= 32'd0;
dictionary4_2[13] <= 32'd0;
dictionary4_2[14] <= 32'd0;
dictionary4_2[15] <= 32'd0;
count2 <= 4'd0;
end
else if(mis4_2 == 1'b1)
begin
dictionary4_2[count2] <= key4_2;
count2 <= count2 + 1;
end
end
assign data4_1 = dictionary4_1[addr4_1];
assign data4_2 = dictionary4_2[addr4_2];
endmodule
8-byte dictionary:
module dict8(addr8,key8,mis8,clk,reset,data8);
input [3:0] addr8;
input [63:0] key8;
input clk;
input reset;
input mis8;
output [63:0] data8;
wire [3:0] addr8;
wire [63:0] key8;
wire clk;
wire reset;
wire mis8;
wire [63:0] data8;
reg [63:0] dictionary8 [15:0];
reg [3:0] count;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary8[0] <= 64'd0;
dictionary8[1] <= 64'd0;
dictionary8[2] <= 64'd0;
dictionary8[3] <= 64'd0;
dictionary8[4] <= 64'd0;
dictionary8[5] <= 64'd0;
dictionary8[6] <= 64'd0;
dictionary8[7] <= 64'd0;
dictionary8[8] <= 64'd0;
dictionary8[9] <= 64'd0;
dictionary8[10] <= 64'd0;
dictionary8[11] <= 64'd0;
dictionary8[12] <= 64'd0;
dictionary8[13] <= 64'd0;
dictionary8[14] <= 64'd0;
dictionary8[15] <= 64'd0;
count <= 4'd0;
end
else if(mis8 == 1'b1)
begin
dictionary8[count] <= key8;
count <= count + 1;
end
end
assign data8 = dictionary8[addr8];
endmodule
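Each compression-side dictionary above is a 16-entry memory: it is written at a wrapping 4-bit counter whenever the phase comparator flags a miss, and read asynchronously at the hash-supplied address. A software sketch (Python; the class and method names are illustrative):

```python
class CompDictionary:
    """Sketch of a compression-side dictionary: write-on-miss at a
    wrapping counter, asynchronous read at a 4-bit address."""

    def __init__(self, entries=16):
        self.mem = [0] * entries
        self.count = 0

    def update(self, miss, key):
        """Mirror of the clocked write: store the key only on a miss."""
        if miss:
            self.mem[self.count] = key
            self.count = (self.count + 1) % len(self.mem)

    def read(self, addr):
        return self.mem[addr]
```

Because both the compressor and the decompressor apply the same insert-on-miss rule, the two dictionaries stay synchronized without transmitting any table contents.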
Phase comparator:
module phase_comp(data8,
d8,
d4_1,
d4_2,
d2_1,
d2_2,
d2_3,
d2_4,
clk,
reset,
mis8,
mis4_1,
mis4_2,
mis2_1,
mis2_2,
mis2_3,
mis2_4,
pout8,
pout4_1,
pout4_2,
pout2_1,
pout2_2,
pout2_3,
pout2_4);
input [63:0] data8;
input [63:0] d8;
input [31:0] d4_1;
input [31:0] d4_2;
input [15:0] d2_1;
input [15:0] d2_2;
input [15:0] d2_3;
input [15:0] d2_4;
input clk;
input reset;
output mis8;
output mis4_1;
output mis4_2;
output mis2_1;
output mis2_2;
output mis2_3;
output mis2_4;
output pout8;
output pout4_1;
output pout4_2;
output pout2_1;
output pout2_2;
output pout2_3;
output pout2_4;
wire [63:0] data8;
wire [63:0] d8;
wire [31:0] d4_1;
wire [31:0] d4_2;
wire [15:0] d2_1;
wire [15:0] d2_2;
wire [15:0] d2_3;
wire [15:0] d2_4;
wire clk;
wire reset;
wire mis8;
wire mis4_1;
wire mis4_2;
wire mis2_1;
wire mis2_2;
wire mis2_3;
wire mis2_4;
wire pout8;
wire pout4_1;
wire pout4_2;
wire pout2_1;
wire pout2_2;
wire pout2_3;
wire pout2_4;
wire [63:0] data8_temp;
assign data8_temp = data8;
assign mis8 = (reset == 1'd1)?1'd0:
(data8_temp == d8)?1'd0:1'd1;
assign pout8 = (reset == 1'd1)?1'd0:
(data8_temp == d8)?1'd1:1'd0;
assign mis4_1 = (reset == 1'd1)?1'd0:
(data8_temp[63:32] == d4_1)?1'd0:1'd1;
assign pout4_1 = (reset == 1'd1)?1'd0:
(data8_temp[63:32] == d4_1)?1'd1:1'd0;
assign mis4_2 = (reset == 1'd1)?1'd0:
(data8_temp[31:0] == d4_2)?1'd0:1'd1;
assign pout4_2 = (reset == 1'd1)?1'd0:
(data8_temp[31:0] == d4_2)?1'd1:1'd0;
assign mis2_1 = (reset == 1'd1)?1'd0:
(data8_temp[63:48] == d2_1)?1'd0:1'd1;
assign pout2_1 = (reset == 1'd1)?1'd0:
(data8_temp[63:48] == d2_1)?1'd1:1'd0;
assign mis2_2 = (reset == 1'd1)?1'd0:
(data8_temp[47:32] == d2_2)?1'd0:1'd1;
assign pout2_2 = (reset == 1'd1)?1'd0:
(data8_temp[47:32] == d2_2)?1'd1:1'd0;
assign mis2_3 = (reset == 1'd1)?1'd0:
(data8_temp[31:16] == d2_3)?1'd0:1'd1;
assign pout2_3 = (reset == 1'd1)?1'd0:
(data8_temp[31:16] == d2_3)?1'd1:1'd0;
assign mis2_4 = (reset == 1'd1)?1'd0:
(data8_temp[15:0] == d2_4)?1'd0:1'd1;
assign pout2_4 = (reset == 1'd1)?1'd0:
(data8_temp[15:0] == d2_4)?1'd1:1'd0;
endmodule
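The phase comparator checks the incoming 64-bit word against the dictionary read-back at three granularities: the whole word, the two 32-bit halves, and the four 16-bit quarters, most-significant first. With reset deasserted, each `pout*` flag is a match indication and each `mis*` flag is its complement. A software sketch (Python; the argument packing and dictionary keys are assumptions of this model):

```python
def phase_compare(data8, d8, d4, d2):
    """data8/d8: 64-bit ints; d4: (hi32, lo32); d2: four 16-bit halfwords,
    most-significant first.  Returns (pout, mis) flag dicts mirroring
    pout8..pout2_4 and mis8..mis2_4 with reset deasserted."""
    pout = {
        '8':   data8 == d8,
        '4_1': ((data8 >> 32) & 0xFFFFFFFF) == d4[0],
        '4_2': (data8 & 0xFFFFFFFF) == d4[1],
        '2_1': ((data8 >> 48) & 0xFFFF) == d2[0],
        '2_2': ((data8 >> 32) & 0xFFFF) == d2[1],
        '2_3': ((data8 >> 16) & 0xFFFF) == d2[2],
        '2_4': (data8 & 0xFFFF) == d2[3],
    }
    mis = {k: not v for k, v in pout.items()}
    return pout, mis
```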
Encoder:
module encoder(clk,
reset,
data8,
pout8,
pout4_1,
pout4_2,
pout2_1,
pout2_2,
pout2_3,
pout2_4,
addr8,
addr4_1,
addr4_2,
addr2_1,
addr2_2,
addr2_3,
addr2_4,
dataout);
input clk;
input reset;
input [63:0] data8;
input pout8;
input pout4_1;
input pout4_2;
input pout2_1;
input pout2_2;
input pout2_3;
input pout2_4;
input [3:0] addr8;
input [3:0] addr4_1;
input [3:0] addr4_2;
input [3:0] addr2_1;
input [3:0] addr2_2;
input [3:0] addr2_3;
input [3:0] addr2_4;
output [67:0] dataout;
wire [63:0] data8;
wire clk;
wire reset;
wire pout8;
wire pout4_1;
wire pout4_2;
wire pout2_1;
wire pout2_2;
wire pout2_3;
wire pout2_4;
wire [3:0] addr8;
wire [3:0] addr4_1;
wire [3:0] addr4_2;
wire [3:0] addr2_1;
wire [3:0] addr2_2;
wire [3:0] addr2_3;
wire [3:0] addr2_4;
reg [67:0] dataout;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dataout <= 68'd0;
end
else if(data8 == 64'd0)
begin
dataout <= 68'd0;
end
else if(pout8 == 1'd1)
begin
dataout <= {4'b0001,addr8};
end
else if(pout4_1 == 1'd1)
begin
dataout <= {data8[31:0],4'b0010,addr4_1};
end
else if(pout4_2 == 1'd1)
begin
dataout <= {data8[63:32],4'b0011,addr4_2};
end
else if(pout2_1 == 1'd1)
begin
dataout <= {data8[47:0],4'b0111,addr2_1};
end
else if(pout2_2 == 1'd1)
begin
dataout <= {data8[63:48],data8[31:0],4'b0110,addr2_2};
end
else if(pout2_3 == 1'd1)
begin
dataout <= {data8[63:32],data8[15:0],4'b0101,addr2_3};
end
else if(pout2_4 == 1'd1)
begin
dataout <= {data8[63:16],4'b0100,addr2_4};
end
else
begin
dataout <= {data8,4'b1000};
end
end
endmodule
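The encoder selects the best match by fixed priority (full word, then either half, then each quarter) and packs the leftover literal bytes above an 8-bit header: a 4-bit prefix code at bits [7:4] and a 4-bit dictionary address at bits [3:0], except in the no-match case, where the raw word sits above a bare 4-bit prefix. A software model of the packing (Python; the `pout`/`addr` dict keys are names assumed by this sketch, not from the RTL):

```python
def encode(data8, pout, addr):
    """Return the 68-bit code word as an int, mirroring the priority
    chain and concatenations of the encoder module."""
    if data8 == 0:
        return 0
    if pout['8']:                       # full-word match: header only
        return (0b0001 << 4) | addr['8']
    if pout['4_1']:                     # high half matched, low half literal
        return ((data8 & 0xFFFFFFFF) << 8) | (0b0010 << 4) | addr['4_1']
    if pout['4_2']:                     # low half matched, high half literal
        return ((data8 >> 32) << 8) | (0b0011 << 4) | addr['4_2']
    if pout['2_1']:                     # top quarter matched
        return ((data8 & 0xFFFFFFFFFFFF) << 8) | (0b0111 << 4) | addr['2_1']
    if pout['2_2']:
        lit = ((data8 >> 48) << 32) | (data8 & 0xFFFFFFFF)
        return (lit << 8) | (0b0110 << 4) | addr['2_2']
    if pout['2_3']:
        lit = ((data8 >> 32) << 16) | (data8 & 0xFFFF)
        return (lit << 8) | (0b0101 << 4) | addr['2_3']
    if pout['2_4']:                     # bottom quarter matched
        return ((data8 >> 16) << 8) | (0b0100 << 4) | addr['2_4']
    return (data8 << 4) | 0b1000        # no match: raw word over prefix
```

A full-word match compresses a 64-bit word to the 8-bit header alone; partial matches save only the matched half or quarter.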
Decompression
Top module:
module top_dcomp(datain,clk,reset,dataout);
input clk;
input reset;
input [67:0] datain;
output [63:0] dataout;
wire clk;
wire reset;
wire [67:0] datain;
wire [63:0] dataout8;
wire [63:0] Dataout8;
wire [63:0] dataOut8;
wire [67:0] data_out;
wire [63:0] dataout;
wire [3:0] addr8;
wire [3:0] addr4;
wire [3:0] addr2;
decoder d1(clk,reset,datain,data_out,addr8,addr4,addr2);
dict8 d2(addr8,
dataout8,
clk,
reset,
dataout);
dict4 d3(addr4,
Dataout8,
clk,
reset,
dataout);
dict2 d4(addr2,
dataOut8,
clk,
reset,
dataout);
data_gen d5(clk,reset,data_out,dataout8,dataout);
endmodule
Decoder:
module decoder( clk,
reset,
datain,
dataout,
addr8,
addr4,
addr2);
input clk;
input reset;
input [67:0] datain;
output [67:0] dataout;
output [3:0] addr8;
output [3:0] addr4;
output [3:0] addr2;
wire clk;
wire reset;
wire [67:0] datain;
wire [67:0] dataout;
wire [3:0] addr8;
wire [3:0] addr4;
wire [3:0] addr2;
assign dataout = datain;
assign addr8 = (datain[7:4] == 4'd0)?4'd0:datain[3:0];
assign addr4 = (datain[7:4] == 4'd0)?4'd0:datain[3:0];
assign addr2 = (datain[7:4] == 4'd0)?4'd0:datain[3:0];
endmodule
2-byte dictionary:
module dict2(addr8,dataout8,clk,reset,datain8);
input [3:0] addr8;
input clk;
input reset;
input [63:0] datain8;
output [63:0] dataout8;
wire [3:0] addr8;
wire clk;
wire reset;
wire [63:0] datain8;
wire [63:0] dataout8;
wire [15:0] datain2_1;
wire [15:0] datain2_2;
wire [15:0] datain2_3;
wire [15:0] datain2_4;
reg [15:0] dictionary2_1 [15:0];
reg [15:0] dictionary2_2 [15:0];
reg [15:0] dictionary2_3 [15:0];
reg [15:0] dictionary2_4 [15:0];
reg [3:0] count2_1;
reg [3:0] count2_2;
reg [3:0] count2_3;
reg [3:0] count2_4;
assign datain2_1 = datain8[15:0];
assign datain2_2 = datain8[31:16];
assign datain2_3 = datain8[47:32];
assign datain2_4 = datain8[63:48];
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_1[0] <= 16'd0;
dictionary2_1[1] <= 16'd0;
dictionary2_1[2] <= 16'd0;
dictionary2_1[3] <= 16'd0;
dictionary2_1[4] <= 16'd0;
dictionary2_1[5] <= 16'd0;
dictionary2_1[6] <= 16'd0;
dictionary2_1[7] <= 16'd0;
dictionary2_1[8] <= 16'd0;
dictionary2_1[9] <= 16'd0;
dictionary2_1[10] <= 16'd0;
dictionary2_1[11] <= 16'd0;
dictionary2_1[12] <= 16'd0;
dictionary2_1[13] <= 16'd0;
dictionary2_1[14] <= 16'd0;
dictionary2_1[15] <= 16'd0;
count2_1 <= 4'd0;
end
else if(datain2_1 != dictionary2_1[0] && datain2_1 != dictionary2_1[1] &&
datain2_1 != dictionary2_1[2] && datain2_1 != dictionary2_1[3] &&
datain2_1 != dictionary2_1[4] && datain2_1 != dictionary2_1[5] &&
datain2_1 != dictionary2_1[6] && datain2_1 != dictionary2_1[7] &&
datain2_1 != dictionary2_1[8] && datain2_1 != dictionary2_1[9] &&
datain2_1 != dictionary2_1[10] && datain2_1 != dictionary2_1[11] &&
datain2_1 != dictionary2_1[12] && datain2_1 != dictionary2_1[13] &&
datain2_1 != dictionary2_1[14] && datain2_1 != dictionary2_1[15])
begin
dictionary2_1[count2_1] <= datain2_1;
count2_1 <= count2_1 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_2[0] <= 16'd0;
dictionary2_2[1] <= 16'd0;
dictionary2_2[2] <= 16'd0;
dictionary2_2[3] <= 16'd0;
dictionary2_2[4] <= 16'd0;
dictionary2_2[5] <= 16'd0;
dictionary2_2[6] <= 16'd0;
dictionary2_2[7] <= 16'd0;
dictionary2_2[8] <= 16'd0;
dictionary2_2[9] <= 16'd0;
dictionary2_2[10] <= 16'd0;
dictionary2_2[11] <= 16'd0;
dictionary2_2[12] <= 16'd0;
dictionary2_2[13] <= 16'd0;
dictionary2_2[14] <= 16'd0;
dictionary2_2[15] <= 16'd0;
count2_2 <= 4'd0;
end
else if(datain2_2 != dictionary2_2[0] && datain2_2 != dictionary2_2[1] &&
datain2_2 != dictionary2_2[2] && datain2_2 != dictionary2_2[3] &&
datain2_2 != dictionary2_2[4] && datain2_2 != dictionary2_2[5] &&
datain2_2 != dictionary2_2[6] && datain2_2 != dictionary2_2[7] &&
datain2_2 != dictionary2_2[8] && datain2_2 != dictionary2_2[9] &&
datain2_2 != dictionary2_2[10] && datain2_2 != dictionary2_2[11] &&
datain2_2 != dictionary2_2[12] && datain2_2 != dictionary2_2[13] &&
datain2_2 != dictionary2_2[14] && datain2_2 != dictionary2_2[15])
begin
dictionary2_2[count2_2] <= datain2_2;
count2_2 <= count2_2 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_3[0] <= 16'd0;
dictionary2_3[1] <= 16'd0;
dictionary2_3[2] <= 16'd0;
dictionary2_3[3] <= 16'd0;
dictionary2_3[4] <= 16'd0;
dictionary2_3[5] <= 16'd0;
dictionary2_3[6] <= 16'd0;
dictionary2_3[7] <= 16'd0;
dictionary2_3[8] <= 16'd0;
dictionary2_3[9] <= 16'd0;
dictionary2_3[10] <= 16'd0;
dictionary2_3[11] <= 16'd0;
dictionary2_3[12] <= 16'd0;
dictionary2_3[13] <= 16'd0;
dictionary2_3[14] <= 16'd0;
dictionary2_3[15] <= 16'd0;
count2_3 <= 4'd0;
end
else if(datain2_3 != dictionary2_3[0] && datain2_3 != dictionary2_3[1] &&
datain2_3 != dictionary2_3[2] && datain2_3 != dictionary2_3[3] &&
datain2_3 != dictionary2_3[4] && datain2_3 != dictionary2_3[5] &&
datain2_3 != dictionary2_3[6] && datain2_3 != dictionary2_3[7] &&
datain2_3 != dictionary2_3[8] && datain2_3 != dictionary2_3[9] &&
datain2_3 != dictionary2_3[10] && datain2_3 != dictionary2_3[11] &&
datain2_3 != dictionary2_3[12] && datain2_3 != dictionary2_3[13] &&
datain2_3 != dictionary2_3[14] && datain2_3 != dictionary2_3[15])
begin
dictionary2_3[count2_3] <= datain2_3;
count2_3 <= count2_3 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary2_4[0] <= 16'd0;
dictionary2_4[1] <= 16'd0;
dictionary2_4[2] <= 16'd0;
dictionary2_4[3] <= 16'd0;
dictionary2_4[4] <= 16'd0;
dictionary2_4[5] <= 16'd0;
dictionary2_4[6] <= 16'd0;
dictionary2_4[7] <= 16'd0;
dictionary2_4[8] <= 16'd0;
dictionary2_4[9] <= 16'd0;
dictionary2_4[10] <= 16'd0;
dictionary2_4[11] <= 16'd0;
dictionary2_4[12] <= 16'd0;
dictionary2_4[13] <= 16'd0;
dictionary2_4[14] <= 16'd0;
dictionary2_4[15] <= 16'd0;
count2_4 <= 4'd0;
end
else if(datain2_4 != dictionary2_4[0] && datain2_4 != dictionary2_4[1] &&
datain2_4 != dictionary2_4[2] && datain2_4 != dictionary2_4[3] &&
datain2_4 != dictionary2_4[4] && datain2_4 != dictionary2_4[5] &&
datain2_4 != dictionary2_4[6] && datain2_4 != dictionary2_4[7] &&
datain2_4 != dictionary2_4[8] && datain2_4 != dictionary2_4[9] &&
datain2_4 != dictionary2_4[10] && datain2_4 != dictionary2_4[11] &&
datain2_4 != dictionary2_4[12] && datain2_4 != dictionary2_4[13] &&
datain2_4 != dictionary2_4[14] && datain2_4 != dictionary2_4[15])
begin
dictionary2_4[count2_4] <= datain2_4;
count2_4 <= count2_4 + 1;
end
end
assign dataout8 = {dictionary2_4[addr8],dictionary2_3[addr8],dictionary2_2[addr8],dictionary2_1[addr8]};
endmodule
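The decompression-side dictionaries keep themselves synchronized with the compressor without any side channel: every reconstructed value not already stored is inserted at the same wrapping counter position the compressor used. A software sketch of one bank (Python; names are illustrative):

```python
class DecompDictionary:
    """Sketch of a decompression-side dictionary bank: insert any value
    not already present, at a wrapping 4-bit counter."""

    def __init__(self, entries=16):
        self.mem = [0] * entries
        self.count = 0

    def update(self, value):
        if value not in self.mem:
            self.mem[self.count] = value
            self.count = (self.count + 1) % len(self.mem)

    def read(self, addr):
        return self.mem[addr]
```

The not-already-present test plays the role of the sixteen-way `!=` chain in the RTL above.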
4-byte dictionary:
module dict4(addr8,dataout8,clk,reset,datain8);
input [3:0] addr8;
input clk;
input reset;
input [63:0] datain8;
output [63:0] dataout8;
wire [3:0] addr8;
wire clk;
wire reset;
wire [63:0] datain8;
wire [63:0] dataout8;
wire [31:0] datain4_1;
wire [31:0] datain4_2;
reg [31:0] dictionary4_1 [15:0];
reg [31:0] dictionary4_2 [15:0];
reg [3:0] count4_1;
reg [3:0] count4_2;
assign datain4_1 = datain8[31:0];
assign datain4_2 = datain8[63:32];
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary4_1[0] <= 32'd0;
dictionary4_1[1] <= 32'd0;
dictionary4_1[2] <= 32'd0;
dictionary4_1[3] <= 32'd0;
dictionary4_1[4] <= 32'd0;
dictionary4_1[5] <= 32'd0;
dictionary4_1[6] <= 32'd0;
dictionary4_1[7] <= 32'd0;
dictionary4_1[8] <= 32'd0;
dictionary4_1[9] <= 32'd0;
dictionary4_1[10] <= 32'd0;
dictionary4_1[11] <= 32'd0;
dictionary4_1[12] <= 32'd0;
dictionary4_1[13] <= 32'd0;
dictionary4_1[14] <= 32'd0;
dictionary4_1[15] <= 32'd0;
count4_1 <= 4'd0;
end
else if(datain4_1 != dictionary4_1[0] && datain4_1 != dictionary4_1[1] &&
datain4_1 != dictionary4_1[2] && datain4_1 != dictionary4_1[3] &&
datain4_1 != dictionary4_1[4] && datain4_1 != dictionary4_1[5] &&
datain4_1 != dictionary4_1[6] && datain4_1 != dictionary4_1[7] &&
datain4_1 != dictionary4_1[8] && datain4_1 != dictionary4_1[9] &&
datain4_1 != dictionary4_1[10] && datain4_1 != dictionary4_1[11] &&
datain4_1 != dictionary4_1[12] && datain4_1 != dictionary4_1[13] &&
datain4_1 != dictionary4_1[14] && datain4_1 != dictionary4_1[15])
begin
dictionary4_1[count4_1] <= datain4_1;
count4_1 <= count4_1 + 1;
end
end
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary4_2[0] <= 32'd0;
dictionary4_2[1] <= 32'd0;
dictionary4_2[2] <= 32'd0;
dictionary4_2[3] <= 32'd0;
dictionary4_2[4] <= 32'd0;
dictionary4_2[5] <= 32'd0;
dictionary4_2[6] <= 32'd0;
dictionary4_2[7] <= 32'd0;
dictionary4_2[8] <= 32'd0;
dictionary4_2[9] <= 32'd0;
dictionary4_2[10] <= 32'd0;
dictionary4_2[11] <= 32'd0;
dictionary4_2[12] <= 32'd0;
dictionary4_2[13] <= 32'd0;
dictionary4_2[14] <= 32'd0;
dictionary4_2[15] <= 32'd0;
count4_2 <= 4'd0;
end
else if(datain4_2 != dictionary4_2[0] && datain4_2 != dictionary4_2[1] &&
datain4_2 != dictionary4_2[2] && datain4_2 != dictionary4_2[3] &&
datain4_2 != dictionary4_2[4] && datain4_2 != dictionary4_2[5] &&
datain4_2 != dictionary4_2[6] && datain4_2 != dictionary4_2[7] &&
datain4_2 != dictionary4_2[8] && datain4_2 != dictionary4_2[9] &&
datain4_2 != dictionary4_2[10] && datain4_2 != dictionary4_2[11] &&
datain4_2 != dictionary4_2[12] && datain4_2 != dictionary4_2[13] &&
datain4_2 != dictionary4_2[14] && datain4_2 != dictionary4_2[15])
begin
dictionary4_2[count4_2] <= datain4_2;
count4_2 <= count4_2 + 1;
end
end
assign dataout8 = {dictionary4_1[addr8],dictionary4_2[addr8]};
endmodule
8-byte dictionary:
module dict8(addr8,dataout8,clk,reset,datain8);
input [3:0] addr8;
input clk;
input reset;
input [63:0] datain8;
output [63:0] dataout8;
wire [3:0] addr8;
wire clk;
wire reset;
wire [63:0] datain8;
wire [63:0] dataout8;
reg [63:0] dictionary8 [15:0];
reg [3:0] count;
always@(posedge clk)
begin
if(reset == 1'b1)
begin
dictionary8[0] <= 64'd0;
dictionary8[1] <= 64'd0;
dictionary8[2] <= 64'd0;
dictionary8[3] <= 64'd0;
dictionary8[4] <= 64'd0;
dictionary8[5] <= 64'd0;
dictionary8[6] <= 64'd0;
dictionary8[7] <= 64'd0;
dictionary8[8] <= 64'd0;
dictionary8[9] <= 64'd0;
dictionary8[10] <= 64'd0;
dictionary8[11] <= 64'd0;
dictionary8[12] <= 64'd0;
dictionary8[13] <= 64'd0;
dictionary8[14] <= 64'd0;
dictionary8[15] <= 64'd0;
count <= 4'd0;
end
else if(datain8 != dictionary8[0] && datain8 != dictionary8[1] &&
datain8 != dictionary8[2] && datain8 != dictionary8[3] &&
datain8 != dictionary8[4] && datain8 != dictionary8[5] &&
datain8 != dictionary8[6] && datain8 != dictionary8[7] &&
datain8 != dictionary8[8] && datain8 != dictionary8[9] &&
datain8 != dictionary8[10] && datain8 != dictionary8[11] &&
datain8 != dictionary8[12] && datain8 != dictionary8[13] &&
datain8 != dictionary8[14] && datain8 != dictionary8[15])
begin
dictionary8[count] <= datain8;
count <= count + 1;
end
end
assign dataout8 = dictionary8[addr8];
endmodule
Data generator:
module data_gen( clk,
reset,
datain,
datain8,
dataout);
input clk;
input reset;
input [67:0] datain;
input [63:0] datain8;
output [63:0] dataout;
wire clk;
wire reset;
wire [67:0] datain;
wire [63:0] datain8;
wire [63:0] dataout;
assign dataout = (datain[7:4] == 4'b0001)? datain8:
(datain[7:4] == 4'b0010)? {datain8[63:32],datain[39:8]}:
(datain[7:4] == 4'b0011)? {datain[39:8],datain8[31:0]}:
(datain[7:4] == 4'b0100)? {datain[55:8],datain8[15:0]}:
(datain[7:4] == 4'b0101)? {datain[55:24],datain8[31:16],datain[23:8]}:
(datain[7:4] == 4'b0110)? {datain[55:40],datain8[47:32],datain[39:8]}:
(datain[7:4] == 4'b0111)? {datain8[63:48],datain[55:8]}:
(datain[7:4] == 4'b1000)? {datain[67:4]}:64'd0;
endmodule
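The data generator reverses the encoder's packing: the prefix nibble at datain[7:4] selects which field of the 68-bit code word is literal and which slice of the dictionary read-back word fills the matched position. A software model of the splicing (Python; the field layout is taken from the `assign` above, and the helper names are this sketch's own):

```python
def regenerate(datain, datain8):
    """Reassemble the 64-bit word from a 68-bit code word (datain) and
    the dictionary read-back word (datain8)."""
    def lit(hi, lo):   # literal field from the code word
        return (datain >> lo) & ((1 << (hi - lo + 1)) - 1)
    def dic(hi, lo):   # slice of the dictionary read-back word
        return (datain8 >> lo) & ((1 << (hi - lo + 1)) - 1)
    prefix = (datain >> 4) & 0xF
    if prefix == 0b0001:                 # full-word match
        return datain8
    if prefix == 0b0010:                 # high half from dictionary
        return (dic(63, 32) << 32) | lit(39, 8)
    if prefix == 0b0011:                 # low half from dictionary
        return (lit(39, 8) << 32) | dic(31, 0)
    if prefix == 0b0100:                 # bottom quarter from dictionary
        return (lit(55, 8) << 16) | dic(15, 0)
    if prefix == 0b0101:                 # quarter 2 from dictionary
        return (lit(55, 24) << 32) | (dic(31, 16) << 16) | lit(23, 8)
    if prefix == 0b0110:                 # quarter 3 from dictionary
        return (lit(55, 40) << 48) | (dic(47, 32) << 32) | lit(39, 8)
    if prefix == 0b0111:                 # top quarter from dictionary
        return (dic(63, 48) << 48) | lit(55, 8)
    if prefix == 0b1000:                 # uncompressed word
        return lit(67, 4)
    return 0
```

Note that each branch is the exact inverse of the corresponding encoder concatenation: for example, prefix 0b0010 restores the dictionary's high 32 bits above the literal low 32 bits that the encoder carried in datain[39:8].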