Dynamically Parameterized Architectures for Power Aware Video Coding: Motion Estimation and DCT
Wayne Burleson ([email protected])
Prashant Jain ([email protected])
Subramanian Venkatraman ([email protected])
Dept. of Electrical and Computer Engineering, University of Massachusetts Amherst
This work was partially supported by NSF-9988238
Outline
Introduction
Video Content Variation
Dynamic Parameterization to achieve Power-Aware Video Coding
Motion Estimation & DCT
On-Going Work
Introduction
Video content and processing are non-uniform in space and time.
Video processing can gracefully degrade in power-constrained environments.
Exploits perceptual tolerance.
MPEG-4.
High-level algorithm changes affect power efficiency the most.
Recent Work
Configurable FPGA-based architectures [Villasenor ‘95].
Heterogeneous architecture with programmable processors [Kneip ‘98].
Heterogeneous configurable architecture with on-chip low-power FPGA [Zhang ‘00].
FPGAs: slow, with high power dissipation.
Adaptive System-On-a-Chip (aSOC)
Partially Predefined Configuration Architecture
Heterogeneous tiles with statically scheduled interconnection switches
Tiles can be reconfigured internally as well as from an external source
[Figure: aSOC block diagram. Labels: uP, DSP, RISC, RAM, ME/DCT Core, SRAM, FPGA, Switch, Switch Memory]
Ref. J. Liang et al., “aSOC: A Scalable, Single-Chip Communications Architecture,” in Proceedings of the IEEE International Conference on Parallel Architectures and Compilation Techniques, 2000
Content Variation across sequences
Content Variation in Time
Horizontal component of the motion vectors
Content Variation in Space
Background: not much variation
High variation
Dynamic Parameterization
Functional parameters vary the output of a computation.
Architectural parameters allow trade-offs in area, performance, power and reliability.
Parameters can be bound at varying stages.
[Figure: timeline of parameter binding stages. Stage labels: Standard Time, Design Time, IP Time, Compile/Boot Time, Config. Time, Run-Time. Time-scale labels: Years…, Months…, Secs…, msecs…, secs…]
Dynamic Parameter Adjustment
System Requirements and Constraints
Signal statistics from the Input Signals
Algorithm statistics from the post processing of the Input Signals
[Figure: block diagram of a signal processing system (Algorithm and Architecture blocks) driven by a Predictor. Predictor Inputs: Signal Stats.; Algo. & Archi. Stats.; Precision, Quality, Compress.; Area, Speed, Power; Area, Latency, Power. Predictor Outputs: Architectural and Functional Parameters (Archi. Para., Function. Para.). Other labels: Signals]
Functional Parameter Adjustment: Algorithms
[Figure panels: Full Search, Logarithmic]
Algorithm      Compression   Frames encoded/sec (fps)
Full Search    70:1          0.2
Logarithmic    50:1          2.76
Functional Parameter Adjustment: Search Space
Larger search space improves chances of a good match.
A good match leads to high compression.
Increasing search space is effective up to a point.
Larger search space increases computations.
[Figure: plot of bpp for a specific sequence]
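The cost side of this trade-off can be made concrete with a small count. The sketch below assumes a square search window of +/-p pixels around the block position and an n x n block; p, n and the function names are illustrative and do not come from the slides.

```python
def candidate_count(p):
    """Number of candidate positions in a square window of +/-p pixels."""
    return (2 * p + 1) ** 2


def full_search_abs_diffs(p, n=16):
    """Absolute-difference operations needed to match one n x n block
    exhaustively against every candidate in the window."""
    return candidate_count(p) * n * n


if __name__ == "__main__":
    # Doubling p roughly quadruples the work per block.
    for p in (4, 8, 16):
        print(p, candidate_count(p), full_search_abs_diffs(p))
```

Under these assumptions the work grows quadratically with the search range, which is why enlarging the window stops paying off once good matches are already being found.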
Power versus Search Area
Memories – Major contributors to Power dissipation.
Algorithms presented reduce memory accesses and computations.
Our novel architecture reconfigures to different algorithms with reduced memory accesses and computations, thus saving power.
Power Consumption in Video Coding
Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”
[Figure: bar chart of computation (%) per coding step, y-axis 0 to 60. Steps: Motion Estimation (ME), DCT, IDCT, Other (VLC, Quantization, etc.)]
Functional Parameter: Full Search
Selects the most representative block from an exhaustive set of candidate blocks within a search window.
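A minimal sketch of exhaustive block matching, assuming frames are lists of rows of integer pixel values, a square window of +/-p pixels, and SAD as the cost. The function names and the frame representation are illustrative, not the authors' implementation.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    total = 0
    for r in range(n):
        for c in range(n):
            total += abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Try every displacement in the +/-p window; return (sad, dx, dy)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > w or y + n > h:
                continue  # candidate falls outside the reference frame
            cost = block_sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every candidate is visited regardless of the data, so the run time is fixed by n and p alone.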
Functional Parameter: Spiral Search
Performs a Spiral Search for the matching block.
The algorithm's run-time behavior is data-dependent.
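One way to picture the data dependence is a search that visits candidates outward from the centre and stops as soon as a match is good enough. The sketch below is an assumption-laden illustration: the ring-by-ring visiting order approximates a spiral, and the early-exit `threshold` is a hypothetical parameter that the slides do not specify.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the block of `cur` at (bx, by) and `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))


def spiral_offsets(p):
    """Yield (dx, dy) offsets ring by ring, starting at the centre."""
    yield (0, 0)
    for r in range(1, p + 1):
        for dx in range(-r, r + 1):       # top and bottom edges of the ring
            yield (dx, -r)
            yield (dx, r)
        for dy in range(-r + 1, r):       # left and right edges, corners excluded
            yield (-r, dy)
            yield (r, dy)


def spiral_search(cur, ref, bx, by, n, p, threshold):
    """Return (sad, dx, dy, candidates_visited); stops early once the SAD
    is at or below `threshold` (an assumed stopping rule)."""
    h, w = len(ref), len(ref[0])
    best = None
    visited = 0
    for dx, dy in spiral_offsets(p):
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or x + n > w or y + n > h:
            continue
        visited += 1
        cost = block_sad(cur, ref, bx, by, dx, dy, n)
        if best is None or cost < best[0]:
            best = (cost, dx, dy)
        if cost <= threshold:
            break
    return best + (visited,)
```

With little motion the loop exits after a handful of candidates; with large motion it degenerates towards the full window, so the work per block follows the content.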
Functional Parameter: 3-Step Search
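The slide gives only the title, so the following is a sketch of the textbook three-step search rather than the authors' variant: evaluate the eight neighbours of the current best position at step sizes 4, 2 and 1, re-centring after each step. Names, the step schedule and the frame representation are assumptions.

```python
def block_sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the block of `cur` at (bx, by) and `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + r][bx + c] - ref[by + dy + r][bx + dx + c])
               for r in range(n) for c in range(n))


def three_step_search(cur, ref, bx, by, n, first_step=4):
    """Return (sad, dx, dy, sad_evaluations) for the block at (bx, by)."""
    h, w = len(ref), len(ref[0])
    best_dx, best_dy = 0, 0
    best_cost = block_sad(cur, ref, bx, by, 0, 0, n)
    evaluations = 1
    step = first_step
    while step >= 1:
        cx, cy = best_dx, best_dy          # centre for this step
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue               # centre cost is already known
                dx, dy = cx + ox, cy + oy
                x, y = bx + dx, by + dy
                if x < 0 or y < 0 or x + n > w or y + n > h:
                    continue
                cost = block_sad(cur, ref, bx, by, dx, dy, n)
                evaluations += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
        step //= 2
    return best_cost, best_dx, best_dy, evaluations
```

With the default schedule at most 25 candidates are evaluated, against 225 for a full search over the same +/-7 range; the price is that the search can settle in a local minimum of the SAD surface.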
Functional Parameter: Pel Subsampling
[Figure: 16x16 pixel array shown with 2:1 subsampling and 4:1 subsampling]
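Pel subsampling evaluates the matching cost on only a subset of the block's pixels. The sampling patterns below are assumptions made for illustration (a checkerboard for 2:1, every second pixel in both directions for 4:1); the exact patterns in the slide's figure are not recoverable from the transcript.

```python
def sample_positions(n, ratio):
    """Pixel positions of an n x n block kept at the given subsampling ratio."""
    if ratio == 1:
        keep = lambda r, c: True
    elif ratio == 2:
        keep = lambda r, c: (r + c) % 2 == 0            # checkerboard (assumed)
    elif ratio == 4:
        keep = lambda r, c: r % 2 == 0 and c % 2 == 0   # every 2nd row and column (assumed)
    else:
        raise ValueError("ratio must be 1, 2 or 4")
    return [(r, c) for r in range(n) for c in range(n) if keep(r, c)]


def subsampled_sad(cur_block, ref_block, ratio):
    """SAD computed only over the retained pixel positions."""
    n = len(cur_block)
    return sum(abs(cur_block[r][c] - ref_block[r][c])
               for r, c in sample_positions(n, ratio))
```

For a 16x16 block this cuts the absolute-difference operations per candidate from 256 to 128 or 64, at the cost of a less reliable match.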
Functional Parameter: Half-Pel ME
Current and Previous block data can be filtered to Half-Pel resolution.
Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”
[Figure: integer-pel positions A, B, C, D and interpolated half-pel positions a, b, c]
a = (A+B+C+D)/4
b = (B+D)/2
c = (C+D)/2
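A small sketch of the interpolation, taking a as the average of the four integer pels and b and c as two-pel averages. Integer arithmetic with round-half-up is an assumption made here; the slide does not state a rounding rule.

```python
def half_pel(A, B, C, D):
    """Return the half-pel values (a, b, c) interpolated from integer pels.

    Rounding to the nearest integer (half rounds up) is assumed.
    """
    a = (A + B + C + D + 2) // 4   # centre position: four-pel average
    b = (B + D + 1) // 2           # average of B and D
    c = (C + D + 1) // 2           # average of C and D
    return a, b, c


if __name__ == "__main__":
    print(half_pel(10, 20, 30, 40))
```

Each interpolated value stays within the range of its inputs, so the filtered block can be fed to the same SAD datapath as integer-pel data.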
I/O Re-use
[Figure labels: Current Block, Candidate Blocks]
Candidate blocks differ by a single row of pixels
Can reuse the previous rows of pixels
Previous rows are stored in FIFOs
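The row reuse can be sketched in software with a fixed-length FIFO. This assumes vertically adjacent candidates, so each new candidate needs only one new row; the function name and the `deque`-based FIFO are illustrative stand-ins for the hardware FIFOs.

```python
from collections import deque


def candidate_blocks_with_reuse(ref, x, y0, n, count):
    """Yield (block, rows_fetched_so_far) for `count` vertically adjacent
    n x n candidate blocks whose top-left corners start at (x, y0)."""
    fifo = deque(maxlen=n)          # holds the n most recent rows
    fetched = 0
    for row in range(y0, y0 + n - 1):
        fifo.append(ref[row][x:x + n])   # prefill with the first n-1 rows
        fetched += 1
    for k in range(count):
        # One new row completes the next candidate; the oldest row drops out.
        fifo.append(ref[y0 + n - 1 + k][x:x + n])
        fetched += 1
        yield [list(r) for r in fifo], fetched
```

For `count` candidates this fetches n - 1 + count rows instead of n * count, which is the memory-access saving the slide refers to.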
Matching Criteria
The matching criterion used is the Sum of Absolute Differences (SAD).
Ref. Peter Kuhn, “Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation”
\[ SAD(dx, dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m, n) - I_{k-1}(m+dx, n+dy) \right| \]
Proposed Architecture for Dynamically Parameterized ME
[Figure: block diagram. Labels: 16x16 PE Array; PE; Address Generator Unit; SRAM external to PE array; Memory Block; Summing Block; RAM Addresses; PE Control; 307,200 bytes/frame storage]
Architecture: Processing Element (PE)
[Figure: processing element. Labels: |c-p|; Local Control; Sum of Absolute Differences; Half-Pel; FIFO; Current Pixel & 256 bytes]
Discrete Cosine Transform
Integral part of any still-image or video compression system.
Compute-intensive: second only to motion estimation.
Amenable to VLSI implementation – “Decomposition” property and “Distributed Arithmetic”.
Decomposition Property
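\[
\begin{bmatrix} Y_0 \\ Y_2 \\ Y_4 \\ Y_6 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} A & A & A & A \\ B & C & -C & -B \\ A & -A & -A & A \\ C & -B & B & -C \end{bmatrix}
\begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix}
\]
\[
\begin{bmatrix} Y_1 \\ Y_3 \\ Y_5 \\ Y_7 \end{bmatrix}
= \frac{1}{2}
\begin{bmatrix} D & E & F & G \\ E & -G & -D & -F \\ F & -D & G & E \\ G & -F & E & -D \end{bmatrix}
\begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix}
\]
1D DCT in matrix notation
2D DCT ~ two 1D DCTs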
Ref. W.H. Chen et al., “A Fast Computational Algorithm for the Discrete Cosine Transform”, IEEE Trans. Commun.,
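The decomposition splits the 8-point 1D DCT into two 4x4 matrix-vector products, one applied to sums and one to differences of mirrored inputs. The sketch below checks that split against the direct DCT-II definition. The coefficient values (A = cos(pi/4), B = cos(pi/8), C = sin(pi/8), D = cos(pi/16), E = cos(3pi/16), F = sin(3pi/16), G = sin(pi/16)) are the standard ones for this factorisation and are assumed here, since the slides do not list them.

```python
import math

A = math.cos(math.pi / 4)
B = math.cos(math.pi / 8)
C = math.sin(math.pi / 8)
D = math.cos(math.pi / 16)
E = math.cos(3 * math.pi / 16)
F = math.sin(3 * math.pi / 16)
G = math.sin(math.pi / 16)

EVEN = [[A, A, A, A], [B, C, -C, -B], [A, -A, -A, A], [C, -B, B, -C]]
ODD = [[D, E, F, G], [E, -G, -D, -F], [F, -D, G, E], [G, -F, E, -D]]


def dct8_decomposed(x):
    """8-point DCT via two 4x4 products on input sums and differences."""
    s = [x[i] + x[7 - i] for i in range(4)]
    d = [x[i] - x[7 - i] for i in range(4)]
    y = [0.0] * 8
    for k in range(4):
        y[2 * k] = 0.5 * sum(EVEN[k][i] * s[i] for i in range(4))
        y[2 * k + 1] = 0.5 * sum(ODD[k][i] * d[i] for i in range(4))
    return y


def dct8_direct(x):
    """Reference 8-point DCT-II straight from the definition."""
    y = []
    for k in range(8):
        ck = 1 / math.sqrt(2) if k == 0 else 1.0
        y.append(0.5 * ck * sum(x[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                                for n in range(8)))
    return y
```

The decomposed form needs 32 multiplications per 8-point transform instead of 64, and each 4-element inner product maps naturally onto one distributed-arithmetic unit.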
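Distributed Arithmetic
[Figure: RAC unit. Input bits X00 to X33 drive a 4-to-16 address decoder that selects a precomputed coefficient sum (entries shown include A0, A1, A1+A0, A2, A3+A2+A1, A3+A2+A1+A0); the selected value feeds an adder (+) with an x2 feedback path to produce the Result]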
Bit-serial arithmetic using Read Accumulate Computation (RAC) unit
Inner product computation of coefficient vector A and input vector X
Facilitates variable-precision processing
Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000
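A software sketch of the bit-serial inner product, assuming four coefficients, two's-complement inputs and MSB-first processing. The lookup table stands in for the stored coefficient sums; function names are illustrative, and this is a behavioural model rather than the cited chip's circuit.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[i] for every bit i that is set in addr."""
    n = len(coeffs)
    return [sum(coeffs[i] for i in range(n) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, xs, bits):
    """Compute sum(coeffs[i] * xs[i]) one bit-plane at a time.

    xs are signed integers that fit in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    acc = 0
    for j in range(bits - 1, -1, -1):          # MSB first
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> j) & 1) << i        # bit j of every input forms the address
        if j == bits - 1:
            acc = -lut[addr]                   # the sign bit carries negative weight
        else:
            acc = 2 * acc + lut[addr]          # shift (x2) and accumulate
    return acc


if __name__ == "__main__":
    print(da_inner_product([3, -1, 4, 2], [5, -3, 7, -8], 4))
```

The loop runs once per input bit, so narrowing the input precision directly shortens the computation; no multiplier is involved.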
Exploiting Content Variation
Most Significant Bit Rejection (MSBR): RAC operation disabled in the presence of spatial correlation.
Row Column Classification (RCC): reduction in overall arithmetic activity by imposing an upper bound on RAC cycles.
Replication of Arithmetic Units (RAU): replication of the RAC units; a trade-off between power and performance.
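A simplified reading of MSBR can be sketched as follows: when the inputs to a RAC are small in magnitude (as sums and differences of correlated pixels often are), their leading bits are pure sign extension and the corresponding bit-serial cycles can be skipped without changing the result. The helper names and this interpretation are assumptions; the cited chip's detection logic is not described in the slides.

```python
def min_twos_complement_bits(v):
    """Smallest two's-complement width that can represent v."""
    return (v.bit_length() if v >= 0 else (~v).bit_length()) + 1


def rac_cycles_after_msbr(xs, full_bits):
    """Bit-serial cycles still needed once common sign-extension bits are skipped."""
    return min(max(min_twos_complement_bits(x) for x in xs), full_bits)


def bit_serial_inner_product(coeffs, xs, bits):
    """MSB-first bit-serial inner product on `bits`-bit two's-complement inputs."""
    acc = 0
    for j in range(bits - 1, -1, -1):
        plane = sum(c for c, x in zip(coeffs, xs) if (x >> j) & 1)
        acc = -plane if j == bits - 1 else 2 * acc + plane
    return acc


if __name__ == "__main__":
    coeffs, xs = [3, -1, 4, 2], [3, -2, 1, 0]
    cycles = rac_cycles_after_msbr(xs, 12)
    print(cycles, bit_serial_inner_product(coeffs, xs, cycles))
```

In this model the shortened computation returns the same value as the full-width one, so the saved cycles translate into reduced switching activity with no loss of precision.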
Energy Efficiency Comparison Among DCT/IDCT
Chip                     Sw-Cap/sample
Matsui et al.            375 pF
Bhattacharya et al.      479 pF
Kuroda et al.            417 pF
T. Xanthopoulos et al.   128 pF
Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000
Architecture of DCT Core
Ref. T. Xanthopoulos et al., “A Low-Power DCT Core Using Adaptive Bitwidth and Arithmetic Activity Exploiting Signal Correlations and Quantization”, IEEE JSSC 2000
On-Going Work
Implementations at the RTL, netlist and physical levels.
Power estimation at the various levels mentioned above.
Techniques for statistically tracking content variation.
Full prototyping based on actual video workloads using a logic emulator from IKOS Systems, and
Extensions to other parameterized multimedia computations (e.g. 3D Graphics, natural and synthetic audio).
Conclusions
Content variation and dynamic parameterization can be used to achieve power-aware video coding.
Proposed Motion Estimation & DCT architectures to be implemented to achieve the above.
http://vsp2.ecs.umass.edu/vspg/publication.html