Paper Review
Avelino Zepeda Martinez
High Performance Reconfigurable Pipelined
Matrix Multiplication Module Designer
• Usage
  – Communication Systems
  – Signal and Video Processing
• Issues
  – The number of operations for square matrices grows as a function of n^3
  – Area, Speed, Power
2.- Background
• For the multiplication of an m×r matrix (ai,j) and an r×n matrix (bi,j), each element of the result is found with: ci,j = Σ(k = 1..r) ai,k · bk,j
• The number of multiplications, M, and additions, A, increases with matrix size
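To make the growth of M and A concrete, here is a minimal Python sketch of the direct (row-by-column) method that counts scalar operations as it goes. The function name `matmul_with_counts` is illustrative and not from the paper. For the direct method, an m×r by r×n product needs M = m·r·n multiplications and A = m·(r−1)·n additions, which for square matrices is on the order of n^3.

```python
def matmul_with_counts(a, b):
    """Multiply an m x r matrix by an r x n matrix (lists of lists).

    Returns (c, mults, adds), where mults and adds are the numbers of
    scalar multiplications and additions performed by the direct method.
    """
    m, r, n = len(a), len(b), len(b[0])
    if any(len(row) != r for row in a):
        raise ValueError("inner dimensions must agree")

    mults = adds = 0
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            # First product needs no addition; each later one adds once.
            acc = a[i][0] * b[0][j]
            mults += 1
            for k in range(1, r):
                acc += a[i][k] * b[k][j]
                mults += 1
                adds += 1
            c[i][j] = acc
    return c, mults, adds


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]        # 2 x 3
    b = [[7, 8], [9, 10], [11, 12]]   # 3 x 2
    c, mults, adds = matmul_with_counts(a, b)
    print(c, mults, adds)
```

For the 2×3 by 3×2 case in the usage block, the counts come out to M = 2·3·2 = 12 and A = 2·2·2 = 8.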
3.- Matrix Multiplication
• Basic Matrix Multiplication block using dt
• Can perform any matrix multiplication
  – Inefficient
5.- Matrix Multiplication (Cont.)
• Three types of errors
  – Number Representation: ADCs Sampling Rate, Available Bits
  – Rounding Error: Round to Nearest Even (RNE); Round Towards Zero, or Truncation (TRA); Round Down (Floor); Round Up (Ceiling); Round Away from Zero
  – Algorithm/Design Error
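The five rounding modes listed above differ mainly in how they treat ties and negative values. The sketch below uses Python's standard `decimal` module to show the difference on decimal values. It is only an illustration of the modes themselves: the hardware rounds binary fixed-point words, and the helper name `round_all` and the mode labels are assumptions made here.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_UP)

# Map each mode named on the slide to the matching decimal-module constant.
MODES = {
    "RNE": ROUND_HALF_EVEN,   # round to nearest, ties to even
    "TRA": ROUND_DOWN,        # round toward zero (truncation)
    "Floor": ROUND_FLOOR,     # round toward minus infinity
    "Ceiling": ROUND_CEILING, # round toward plus infinity
    "Away": ROUND_UP,         # round away from zero
}


def round_all(value):
    """Round a decimal string to an integer under every mode."""
    d = Decimal(value)
    return {name: int(d.quantize(Decimal("1"), rounding=mode))
            for name, mode in MODES.items()}


if __name__ == "__main__":
    for v in ("2.5", "-2.5", "3.5"):
        print(v, round_all(v))
```

Ties such as 2.5 and -2.5 are the interesting inputs: RNE sends both toward the even neighbour, while floor and ceiling move them in a fixed direction regardless of sign.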
6.- Error Analysis
• Reconfigurable Matrix Multiplication Module Designer (RMD)
  – Designed in the Perl scripting language
  – Outputs: RTL of the multiplication module, testbench, MATLAB files, ModelSim verification files
– Designed to output RTL for FPGA and VLSI
8.- Design Overview
• Three main sections
  – Module Designer
  – Area, Speed, and Error Analysis
  – High Speed Memory Interface
9.- RMD FPGA Design Flow
• Main Design
• Outputs RTL
  – Matrix Multiplication Processing Unit (MMPU)
  – Memory Interface
  – Control Unit (CU)
10.- Module Designer
• RTL created for 2x2 matrix to 2048x2048 matrix
• Composed of:
  – Matrix Multiplier Block (p-MMB)
  – Internal Logic
11.- Module Designer (MMPU)
• Bottom-Up Design Approach
• Start with 2-MMB, or 2x2, which is the pipelined version of dt
• Insert adders after 2-MMB blocks
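The bottom-up idea can be modelled in software as a block decomposition: a p×p product is assembled from half-size products followed by matrix adders, down to a 2×2 base block. The Python sketch below is a behavioural model under that assumption only. The slides do not say exactly how RMD wires its blocks, nothing here is pipelined, and the names `mmb2`, `mmb`, `split`, `join`, and `add` are hypothetical.

```python
def mmb2(a, b):
    """Base 2x2 block: 8 multiplications followed by 4 additions."""
    return [
        [a[0][0] * b[0][0] + a[0][1] * b[1][0],
         a[0][0] * b[0][1] + a[0][1] * b[1][1]],
        [a[1][0] * b[0][0] + a[1][1] * b[1][0],
         a[1][0] * b[0][1] + a[1][1] * b[1][1]],
    ]


def split(m):
    """Split a square matrix into four equal quadrants."""
    h = len(m) // 2
    return ([row[:h] for row in m[:h]], [row[h:] for row in m[:h]],
            [row[:h] for row in m[h:]], [row[h:] for row in m[h:]])


def add(x, y):
    """Element-wise matrix adder placed after the smaller blocks."""
    return [[p + q for p, q in zip(rx, ry)] for rx, ry in zip(x, y)]


def join(c11, c12, c21, c22):
    """Reassemble four quadrants into one matrix."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def mmb(a, b):
    """p x p product (p a power of two, p >= 2) built from half-size blocks."""
    if len(a) == 2:
        return mmb2(a, b)
    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)
    return join(
        add(mmb(a11, b11), mmb(a12, b21)),
        add(mmb(a11, b12), mmb(a12, b22)),
        add(mmb(a21, b11), mmb(a22, b21)),
        add(mmb(a21, b12), mmb(a22, b22)),
    )
```

Each level of the recursion doubles the block size and adds one layer of adders, which is the software analogue of placing adders after the smaller blocks.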
12.- Module Designer (p-MMB)
• Inputs
  – Stored in DDR2 memory of FPGA
  – Normalized to -2 < ai,j, bi,j < 2
• Multiplied values in range of -4n < ci,j < 4n
• Include a 1-bit sign extension block in adders
  – If inputs are r-bit, then the output is (2r+k)-bit
  – k is 0 after multipliers, and is incremented by 1 after each adder
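The word-length rule can be checked with a few lines of Python. The sketch assumes the n products of one output element are summed in a balanced adder tree, so k equals the number of adder levels. The slides do not describe the tree, so that structure and the function names `output_bits` and `result_bound` are assumptions.

```python
def output_bits(r, n):
    """Word length after summing n products of r-bit inputs.

    Each multiplier yields 2r bits (k = 0); every adder level adds one
    sign-extension bit. A balanced tree over n terms has ceil(log2(n))
    levels, computed here with integer arithmetic.
    """
    if r < 1 or n < 1:
        raise ValueError("r and n must be positive")
    k = (n - 1).bit_length()  # ceil(log2(n)) for n >= 1
    return 2 * r + k


def result_bound(n):
    """Magnitude bound on c[i][j]: |a|, |b| < 2 gives |a*b| < 4, so |c| < 4n."""
    return 4 * n


if __name__ == "__main__":
    print(output_bits(16, 2))   # one adder level
    print(output_bits(16, 8))   # three adder levels
    print(result_bound(8))
```

With 16-bit inputs, a 2-term sum needs 33 bits and an 8-term sum needs 35 bits under this accounting.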
13.- Module Designer (p-MMB Cont.)
• Input values are represented with Integer bits and Fraction bits
• Inputs are fixed point and normalized, therefore
• Output is stored in memory
  – (2r+10)-bit, (2x+10, 2y)
  – Rounded back to r-bit using RNE
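Rounding a wide fixed-point result back to fewer fraction bits with RNE amounts to a right shift with a tie-to-even correction. The Python sketch below models only that step on signed integers holding fixed-point values. It does not model saturation of the integer bits, and the function name `rne_shift` is an assumption, not something taken from the RMD output.

```python
def rne_shift(value, shift):
    """Drop `shift` low-order bits from a signed integer using
    round-to-nearest, ties-to-even (RNE)."""
    if shift <= 0:
        return value << -shift
    floor = value >> shift                # arithmetic shift rounds toward -inf
    remainder = value - (floor << shift)  # discarded bits, 0 <= remainder < 2**shift
    half = 1 << (shift - 1)
    # Round up when above the midpoint, or exactly at it with an odd result.
    if remainder > half or (remainder == half and (floor & 1)):
        floor += 1
    return floor


if __name__ == "__main__":
    # Values carry 2 fraction bits; drop both of them.
    for raw in (10, 14, 11, -10, -14):
        print(raw / 4, "->", rne_shift(raw, 2))
```

For a product with 2y fraction bits that must return to y fraction bits, the shift amount would be y.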
14.- Module Designer (Memory Interface)
• Can be created for Fixed or Variable Operation Size
• Designed to use Finite State Machine
• For variable size, each operation size has its own sub-FSM
15.- Module Designer (Control Unit)
• RMD also generates MATLAB and Testbench files
  – Improves accuracy of output matrix
  – Reduces design and verification time
• MATLAB creates data files for the Testbench
  – Maximum input values supported: Bit size: 64 bits; Matrix size: 2048x2048; Test vectors: 100
• Data tested on Testbench using ModelSim
16.- Area, Speed, and Error Analysis
• Output data is compared with actual values obtained from MATLAB
• Errors obtained:
  – Absolute Error (ε)
  – Relative Error (η)
  – Rounding Error (μ)
  – Total Rounding Error (μtotal)
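The transcript does not reproduce the paper's error formulas, so the Python sketch below uses the textbook definitions of absolute and relative error as an assumption: ε = |reference − output| and η = ε / |reference|. It compares a module output against a reference matrix such as the one produced by MATLAB. The function names are illustrative, and the rounding-error terms μ and μtotal are not modelled.

```python
def absolute_error(reference, measured):
    """Textbook absolute error: |reference - measured|."""
    return abs(reference - measured)


def relative_error(reference, measured):
    """Textbook relative error: |reference - measured| / |reference|."""
    if reference == 0:
        raise ValueError("relative error is undefined for a zero reference")
    return abs(reference - measured) / abs(reference)


def max_errors(ref_matrix, out_matrix):
    """Largest absolute and relative error over all matrix elements.

    Elements whose reference value is zero are skipped for the
    relative error only.
    """
    max_abs = max_rel = 0.0
    for ref_row, out_row in zip(ref_matrix, out_matrix):
        for ref, out in zip(ref_row, out_row):
            max_abs = max(max_abs, absolute_error(ref, out))
            if ref != 0:
                max_rel = max(max_rel, relative_error(ref, out))
    return max_abs, max_rel


if __name__ == "__main__":
    reference = [[1.0, 2.0], [4.0, 8.0]]
    output = [[1.0, 1.5], [4.5, 8.0]]
    print(max_errors(reference, output))
```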
17.- Area, Speed, and Error Analysis (Cont.)
• RMD calculates the estimated area
  – Area = Matrix Multiplier Block + Memory + Control
• These calculations use:
  – n: Maximum Matrix Multiplication Size
  – r: Input bits
  – p: Matrix Multiplier Block Size
  – Mr: r-bit Multiplier
  – Ar: r-bit Adder
  – Rr: r-bit Register
  – Muxr: r-bit (2-1) Mux
  – RNE: (2r+kmax to r)-bit Rounding
  – HA: Half Adder
  – Memr: r-bit Memory
  – FF: Flip Flops
  – Fmax: Maximum Frequency
18.- Area, Speed, and Error Analysis (Cont.)
• Two Native Port Interfaces
  – Interface with DDR2 memory
  – Width of 64 bits
  – Supports Back-to-Back Transfers
  – Transfer Sizes: Byte; Half-word; Word; 4-word and 8-word cache line; 16-word, 32-word, and 64-word bursts
22.- High Speed Memory Interface