15群( )-8編member.ieice-hbkb.org/files/upload/backup_final data/10... · web view3章...
15()-8
10 - 5LSI
3
10 - 5 - 3
3-1
JPEGJPEG2000LSI
3-1-1 JPEG[20091 ]
JPEG 1) 88/3188DCT/DCT//JPEGI/FCPU/DSPI/F
31JPEG
LSI
JPEG/DCT/DCT//11
(1)2DCT/IDCT
2DCT8888DCT4096DCTDCT(1)(2) 1DCT1DCT22DCT1024
$$F_{uv} = \frac{1}{4} C_u C_v \sum_{x=0}^{7} \sum_{y=0}^{7} f_{xy} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \tag{1}$$
$$f_{xy} = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C_u C_v F_{uv} \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16} \tag{2}$$
$$C_u, C_v = \begin{cases} 1/\sqrt{2} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases} \tag{3}$$
1DCT88DCTChen 2) 256416DCTCPU DSPDCT
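The operation counts quoted above (4096 products for a direct 8×8 2-D DCT, 1024 when it is split into 1-D DCTs) can be illustrated with a minimal sketch. The function names `dct2d_direct`, `dct1d` and `dct2d_separable` are hypothetical helpers introduced here, not part of any cited LSI design; the sketch only assumes the 8×8 transform of Eq. (1) and the scale factors of Eq. (3).

```python
import math

N = 8

def c(k):
    # Scale factor of Eq. (3): 1/sqrt(2) for index 0, 1 otherwise.
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2d_direct(f):
    # Direct evaluation of Eq. (1): 64 products per coefficient, 64 coefficients.
    F = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (f[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            F[u][v] = 0.25 * c(u) * c(v) * s
    return F

def dct1d(vec):
    # 8-point 1-D DCT: 8 outputs x 8 products = 64 products.
    return [0.5 * c(k) * sum(vec[n] * math.cos((2 * n + 1) * k * math.pi / 16)
                             for n in range(N))
            for k in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2d_separable(f):
    # Row pass, transposition, column pass: 16 one-dimensional DCTs (1024 products).
    rows = [dct1d(row) for row in f]               # transform along y
    cols = [dct1d(col) for col in transpose(rows)] # transform along x
    return transpose(cols)                         # back to F[u][v] order
```

The transposition between the two passes plays the role of the transposition RAM placed between two 1-D DCT units in a hardware implementation.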
2DCT
LSI2DCT21DCTRAM2DCT1DCT1DCTASIC32IDCT 3)
DCT8DCT41IDCTIDCT4881IDCT8IDCTRAM8DCT1IDCT8881IDCTRAMRAM21IDCT2IDCT8882IDCT1IDCT34
322DCT
(2)
2DCT1DC63ACDCSSSS2 DCAC2DCT1RUNRRRRLEVELLLLLRRRRLLLLLLLL2DCTDCT
DCDCDCACRRRR/LLLLRUN/LEVELRUNLEVELACRUNACACIQLSI11616
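The RUN/LEVEL (RRRR/SSSS) symbol formation described above can be sketched as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the block is assumed to be already quantized, DC differential coding and the Huffman tables themselves are omitted, and only the AC symbols are produced.

```python
def zigzag_order(n=8):
    # Raster indices of an n x n block in zigzag scan order.
    # Odd anti-diagonals run top to bottom, even ones bottom to top.
    return sorted(range(n * n),
                  key=lambda k: (k // n + k % n,
                                 k // n if (k // n + k % n) % 2 else -(k // n)))

def size_category(level):
    # SSSS: number of bits needed to represent |level| (0 for level == 0).
    return abs(level).bit_length()

def ac_run_size_symbols(block):
    # block: 64 quantized coefficients in raster order.
    # Returns a list of (RRRR, SSSS, level) tuples for the 63 AC coefficients.
    zz = [block[k] for k in zigzag_order()]
    symbols = []
    run = 0
    for level in zz[1:]:
        if level == 0:
            run += 1
            continue
        while run > 15:
            symbols.append((15, 0, 0))  # ZRL: a run of 16 zeros
            run -= 16
        symbols.append((run, size_category(level), level))
        run = 0
    if run > 0:
        symbols.append((0, 0, 0))       # EOB: only zeros remain
    return symbols
```

For example, a block whose only nonzero AC values are 5 at zigzag position 1 and -3 at zigzag position 3 yields the symbols (0, 3, 5), (1, 2, -3) and an end-of-block marker.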
3-1-2 JPEG2000[20091 ]
JPEG2000 4, 5, 6) JPEG DCT DWT (Discrete Wavelet Transform) EBCOT (Embedded Block Coding with Optimized Truncation) DCTJPEG200033JPEG20002DWT2D-DWTLL; LH; HL; HHLL2D-DWT 7) QTZEBCOTJPEG20002D-DWTJPEG20002D-DWTEBCOTEBCOTDWT
33JPEG2000
(1)2DWT
2D-DWT1D-DWTRAMJPEGVLSI DWT 9) 34 (9-7)91V-filter2H-filter(L)H-filter(H)V-filter9p0-p82LV, HVLV H-filter(L) HV H-filter(H) V-filter92H-filter(L) L0-L8LL, HLH-filter(H) H0-H8LH, HHDWT
342D-DWT
35DWT 10) 72D-DWTDWT2D-DWT1DWT systolic DWT 12) 8) 11)
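The lifting approach cited above 8), 11) can be illustrated with a minimal one-level 1-D sketch. Note the substitution: the text discusses the (9-7) filter, whereas this sketch uses the simpler reversible 5/3 lifting filter of JPEG2000, chosen only because its integer arithmetic makes perfect reconstruction easy to check. The function names and the restriction to even-length inputs are assumptions made for illustration.

```python
def dwt53_forward(x):
    # One level of the reversible 5/3 lifting transform on an even-length integer list.
    n = len(x)
    assert n % 2 == 0 and n >= 2
    half = n // 2
    # Predict step: high-pass (detail) samples from odd positions.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # symmetric extension
        d.append(x[2 * i + 1] - ((left + right) >> 1))
    # Update step: low-pass samples from even positions.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]                   # symmetric extension
        s.append(x[2 * i] + ((prev + d[i] + 2) >> 2))
    return s, d

def dwt53_inverse(s, d):
    # Undo the lifting steps in reverse order.
    half = len(s)
    x = [0] * (2 * half)
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - ((prev + d[i] + 2) >> 2)
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < 2 * half else x[2 * i]
        x[2 * i + 1] = d[i] + ((left + right) >> 1)
    return x
```

Applying the same routine first along rows and then along columns gives the LL, LH, HL and HH subbands of a 2D-DWT; because each lifting step is inverted exactly, the integer transform is lossless.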
35DWT
DWT 13) DWT364BH42
2
[(0,0),(1,0)] - [(0,1),(1,1)] - [(0,2),(1,2)] [(2n,i),(2n+1,i)]
2CnDnEnFn2WHWLCn+1Dn+1En+1Fn+1WHWLH-CnDnEnFnCn+1Dn+1En+1Fn+1
2WH: $C_n^H, D_n^H, E_n^H, F_n^H$
2WHHWLH: $C_{n+1}^H, D_{n+1}^H, E_{n+1}^H, F_{n+1}^H$
2WL: $C_n^L, D_n^L, E_n^L, F_n^L$
2WLLWHL: $C_{n+1}^L, D_{n+1}^L, E_{n+1}^L, F_{n+1}^L$
LHLH2DWT
36DWT
(2)
DWTQTZ1RAMEBCOT
(3)EBCOT
EBCOTBPC(1) DWT (2) (3) 3Significance passRefinement passCleanup pass2MQCMQ-Coder37EBCOTBP & MQ CoderEBCOT BP & MQ Coder3MQC4
37EBCOT
1)ISO/IEC IS 10918-1, Information Technology - Digital compression and coding of continuous-tone still images: Requirements and guidelines, ISO/IEC JTC1/SC9/WG10, Feb. 1994.
2)W. H. Chen, C. H. Smith and S. C. Fralick, A fast computational algorithm for the discrete cosine transform, IEEE Trans. on Communications, Vol. 25, No. 9, pp.1004-1009, 1977.
3), , , , 4MPEGLSI, , No.639, pp.107-121, 1995.
4)ISO/IEC IS 15444-1, Information Technology - JPEG2000 image coding system - Part 1: Core coding system, ISO/IEC JTC1/SC29/WG1, Dec. 2000.
5)ISO/IEC IS 15444-1, Information Technology - JPEG2000 image coding system - Part 1: Core coding system, AMENDMENT 1: Codestream restrictions, Mar. 2002.
6)David S. Taubman and Michael W. Marcellin, JPEG2000 Image Compression Fundamentals, Standards and Practice, KAP, 2002.
7)S. G. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. PAMI, Vol.11, pp.674-693, 1989.
8)I. Daubechies and W. Sweldens, Factoring Wavelet Transforms into Lifting Steps, J. Fourier Anal. Appl., Vol.4, pp.247-269, 1998.
9)C. Chrysafis and A. Ortega, Line-based, reduced memory, wavelet image compression, IEEE Trans. on Image Processing, Vol. 9, issue 3, pp.378-389, Mar. 2000.
10)T. C. Denk and K. K. Parhi, Calculation of minimum number of registers in 2-D discrete wavelet transform using lapped block processing, in Proc. IEEE ISCAS94, Vol. 3, pp.77-80, May 1994.
11)K. Andra and T. Acharya, A VLSI Architecture for Lifting-based Forward and Inverse Wavelet Transform, IEEE Trans. on Signal Processing, Vol. 50, No. 4, pp.966-977, April 2002.
12)T. Acharya and P. Chen, VLSI of a DWT Architecture, Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, May 31-June 3, 1998.
13)H. Yamauchi et al., 1440×1080 pixel, 30 frames per second motion-JPEG 2000 codec for HD-movie transmission, IEEE Journal of Solid-State Circuits, Vol.40, pp.331-341, 2005.
3-1-3 MPEG-2[200911 ]
(1)
MPEG-2DCT1616MBMPEG-2(1)38DCTQVLCVLDDCTIDCTIQMCIDCTIQ
38MPEG-2
MBMBMEMPEG-20.5ME
(2)
MVMEMBMBMBSADSum of Absolute DifferenceBMPMBNNm, nSAD
$$SAD(m,n) = \sum_{i=1}^{N} \sum_{j=1}^{N} Distortion(i,j,m,n), \quad -P \le m, n < P \tag{1}$$
$$Distortion(i,j,m,n) = |cur(i,j) - ref(i+m,\, j+n)| \tag{2}$$
cur (i, j)ref (i+m, j+n) MBMB(m, n) Distortion (i, j, m, n) SAD (m ,n)
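The SAD definition above maps directly onto a full-search block-matching loop. The following is a minimal software sketch, not a model of any particular PE array: `sad` and `full_search` are hypothetical names, images are assumed to be lists of rows, and candidates that fall outside the reference image are simply skipped.

```python
def sad(cur, ref, bx, by, m, n, N):
    # Sum of absolute differences between the N x N current block at (bx, by)
    # and the reference block displaced by the candidate vector (m, n).
    total = 0
    for j in range(N):
        for i in range(N):
            total += abs(cur[by + j][bx + i] - ref[by + j + n][bx + i + m])
    return total

def full_search(cur, ref, bx, by, N, P):
    # Evaluate every candidate with -P <= m, n < P and keep the minimum SAD.
    # Returns (best_sad, m, n), or None if no candidate lies inside the image.
    height, width = len(ref), len(ref[0])
    best = None
    for n in range(-P, P):
        for m in range(-P, P):
            inside = (0 <= bx + m and bx + m + N <= width
                      and 0 <= by + n and by + n + N <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, m, n, N)
            if best is None or cost < best[0]:
                best = (cost, m, n)
    return best
```

The two inner loops of `sad` are what a systolic array distributes over its PEs; the outer loops over (m, n) correspond to the candidate vectors that are streamed through the array.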
a bPE
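clk | PE0 Diff | PE0 SAD | PE1 Diff | PE1 SAD | PE2 Diff | PE2 SAD | PE3 Diff | PE3 SAD
0   | C00-R00  |         |          |         |          |         |          |
1   | C10-R10  |         | C00-R10  |         |          |         |          |
2   | C20-R20  |         | C10-R20  |         | C00-R20  |         |          |
3   | C30-R30  |         | C20-R30  |         | C10-R30  |         | C00-R30  |
4   | C01-R01  |         | C30-R40  |         | C20-R40  |         | C10-R40  |
--  | ---      |         | ---      |         | ---      |         | ---      |
14  | C23-R23  |         | C13-R23  |         | C03-R23  |         | C32-R62  |
15  | C33-R33  | SAD00   | C23-R33  |         | C13-R33  |         | C03-R33  |
16  | C00-R01  |         | C33-R43  | SAD10   | C23-R43  |         | C13-R43  |
17  | C00-R11  |         | C00-R11  |         | C33-R53  | SAD20   | C23-R53  |
18  | C00-R21  |         | C00-R21  |         | C00-R21  |         | C33-R63  | SAD30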
c
39ME
BMMBPESIMDSAVLSISASABMPEPESADDistortionPE 2)
39(a) 3) PEPEMB(b) PE(c) NNSADPE
310 3)(a) MBPE4)(b) MBPE 5)(a)
[Figure residue: 3×3 PE array P0,0 to P2,2 with input pixel streams Xi,j; panels (a) and (b)]
310
(3)
HD2001012/(2) LSI
31VLSISAD10.55) 13) 6) SAD 7), 10)SAD 8) 9)
31
11)12), 13) MBMVMBMV 13)
(4)
(a)11)
M = 3P22DD311
2210.531MB411/16PSNR0.3 dB
.
.
311
(b) 13)
221MB21/4312
312
PSNRMPEG0.40.8 dB
210.553SAD0.53topbottom
MPEG-2DCT3-1-1JPEG
3-1-4 H.264[200911 ]
(1)
H.264 14) 313MPEG-2DCTCABAC1/4RDMPEG-2210ME/MCCABACVLSI
ME/MC:
IPD:
Trans.:Q:
Inv. Trans.:
IQ: RC:
DBF:
EC:
IQ
Inv.
Trans,
EC
FIFO
RC
ME
MC
Q
Frame M
DBF
IPD
Trans.
313H.264
(2)ME/MC
MPEG-216161/2ME/MCH.26431416161688168848448443741MV61/4ME/MCMRFMultiple Reference FramesMPEG-2IMEInteger ME1/4FMEFractional pixel MEFMEMV
16×16, 16×8, 8×16, 8×8
8×8, 8×4, 4×8, 4×4
7 block sizes; MV count: 1 + 2 + 2 + 4 + 4×(2 + 2 + 4) = 41
314H.264
IMELSIBM1616BMSA4416SADSAD41SAD
315SA41MEBMPE 15)32SAPESAD81621844SAD262SADPE261541SAD16PE161641SADSADBUSSADME1616BM2561640962SABM 16), 17)
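The reuse of sixteen 4×4 SADs to obtain all 41 block SADs of a macroblock, as described above, can be sketched as follows. This is only an illustration of the merging arithmetic for one candidate vector; `merge_sads` is a hypothetical name, and the grid `s4[r][c]` is assumed to hold the SAD of the 4×4 block in row r and column c of the 16×16 macroblock.

```python
def merge_sads(s4):
    # s4: 4 x 4 grid of 4x4-block SADs for one macroblock and one candidate vector.
    out = {"4x4": [s4[r][c] for r in range(4) for c in range(4)]}
    # 8 wide x 4 tall: two horizontally adjacent 4x4 blocks.
    out["8x4"] = [s4[r][c] + s4[r][c + 1] for r in range(4) for c in (0, 2)]
    # 4 wide x 8 tall: two vertically adjacent 4x4 blocks.
    out["4x8"] = [s4[r][c] + s4[r + 1][c] for r in (0, 2) for c in range(4)]
    # 8x8: four 4x4 blocks.
    s8 = [[s4[r][c] + s4[r][c + 1] + s4[r + 1][c] + s4[r + 1][c + 1]
           for c in (0, 2)] for r in (0, 2)]
    out["8x8"] = [s8[0][0], s8[0][1], s8[1][0], s8[1][1]]
    out["16x8"] = [s8[0][0] + s8[0][1], s8[1][0] + s8[1][1]]  # upper, lower
    out["8x16"] = [s8[0][0] + s8[1][0], s8[0][1] + s8[1][1]]  # left, right
    out["16x16"] = [sum(out["8x8"])]
    return out
```

The result holds 16 + 8 + 8 + 4 + 2 + 2 + 1 = 41 values, so a single pass of 4×4 SAD computation serves every block size.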
BM3232LSISDTVTVMVHDHD3-1-2(3)SDTV88ME
AD
D
MuxA
MuxC
MuxB
D0D1
D7
D8D9D13
MuxDMuxFMuxE
C
R
SAD BUS
PE
0
D
PE
1
D
PE
15
D
Contro
l
SADBUS
0
SADBUS
1
5
R0
R1
Min
Min
MV
MV
.
Stage2
accumulator
Stage1
accumulator
12388810,11,14,15130258
2411616Full MB81261
1408162,3,6,7,10,11,14,15111260
0391688,9,10,11,12,13,14,1580259
11378414,15130257
4544313363
3444213259
23440,113256
1244113155
0144013051
SAD
Bus
VectorSizeBLKStage
2 reg
Stage
1 reg
CLK
315MV
FMEIME41MV1/21/41MVMBPMVMVMVMCMBMBMVFME
(3)ME/MC
(a)IME 18)
MBAFFIME2VLSICRCSComplementary Recursive Cross SearchMVMB
IME4CRCS1-DSSADRCS2SAD1/2field-168field-88frame-1616frame-168frame-816frame-886SAD
field-168Top FieldBottom fieldMB PairMB8MVMBFSMPFSSBFSSB3IME3RBM4FSFSMPMB16324SAD
IME128H64VJMUMHS41 %HDPSNR0.06 dB
from
lower
(a)SA(b)PESRE
PU0
SRU0PU1
SRU1
SRU2
PU2
PU3
SRU3
SRU5
SRU7
PEPEPE
PEPEPE
PE
PEPE
SRESRE
SRE
SRESRE
SRE
SRESRE
SRE
REG_VS
SWRAM
Cross path
MUX
Reg.
Abs.
Reg.
PE
from
right
from
right
for
right
for
right
for
upper
from
SWRAM
SAD
from
lower
from
template
MUX
Reg.
SRE
from
right
from
right
for
right
for
right
for
upper
from
SWRAM
from
lower
316IME
IMEVLSIRRSARAMSWRAM316(a) RRSARRSASASBSA(b) SBSAPESRSBSA88PE88SR88SAD33(a)SARRSA24SBSASBSA88SADSWRAM1-DSRBMFS
(b)ME/MC 19)
422HDTVHDME/MCBM
317(a) IME2488BMMVIME33(b) SA44 PE2
(1)
(3)
(2)
(4)
MB
88
a24
88 (1)
88 (2)
88 (3)
88 (4)
16 16
16 16
16 16
16 16
168
168
168
168
816
816
816
816
WP
16
PA
PA
PA
TQ
SIMD
MC SIMD
RISC
b88FMEcFME
317ME/MC
IMEMV317(b) 1616168816881/21/4FME168IME88 (1)88 (2)MV168FME2FME317(c)
IME1/2SIMD8PE1/21/49PE1682161616881688L0L1MESADMVMV
MVMCMCSIMD8PEFME318IMEFMEMC1 MB
HQHQ
H
Q
HQHQ
H
Q
16
PA
PA
MC
1
2
3
4
1
2
3
4
L0
Direct
Mode
L1
168U
1616
168D
816L
816R
88UL
88UR
88DL
88DR
L0L0L0
L1
L1L1
L0
L1
L0L0L0
L1L1
L1
L0
L1
L0L0L0
L1L1
L1
L0
L0
L1
L1
L0
L0
L1
L1
1616
168U
1616
168D
1616
816L
816R
88
UL
168
U
816
L
88
UR
168
D
816
R
88
DL
1616
88
DR
H
Q1/4
L0List 0
L1List 1
UD
LR
ULUR
DLDR
IMEMV
318MVMC
ME/MCSIMDPEPAFFMBAFF422Intra/Inter 271.75 / +199.75 109.75 / +145.75JMPSNR
(4)
H.264MPEG-2DCTICD16
44161644916164MCMB
(5)DF
DFDCT361 MB32164821MB1921DF 20), 21)
(6)CABAC
CABAC3202222 22), 23)
22Bin2
2Bin01RangeLowRangeLow9100x1FE0x000MPSLPSLPSrLPSstateRangeMPSRangerLPS
312LPSRangerLPSLowMPSState0MPSLPSMPSLowRangeMPSMPSstateRange= 0x200
R < 0x100
Low < 0x100
Low >= 0x200
Low = Low-0x100
Low= Low-0x200
Rang e= Range 0
B
Y
N
Renorm
Sub
Renorm
Sub
Renorm
Sub
Renorm
Sub
Renorm
Sub
Renorm
Sub
Range
Low
Range#0
Low#0
Range#1
Low#1
xRange
xLow
d#1
d#2
n
d#n
n
Range
Low
xd
Range
Low
xd
Range#2
Low#2
Range#n
Low#n
xd
xd#1
xd#m
xvalid
Selector
313CABAC
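The renormalization flow in the figure (tests on Range < 0x100, Low < 0x100 and Low >= 0x200) can be sketched in software as below. This is a simplified illustration under stated assumptions: `BitSink` and `renorm_encoder` are hypothetical names, the special handling of the very first output bit is omitted, and the interval subdivision by rLPS that precedes renormalization is not shown.

```python
class BitSink:
    # Collects output bits and resolves pending "outstanding" bits.
    def __init__(self):
        self.bits = []
        self.outstanding = 0

    def put_bit(self, b):
        # Emit b, followed by the inverse bit once per outstanding count.
        self.bits.append(b)
        self.bits.extend([1 - b] * self.outstanding)
        self.outstanding = 0

def renorm_encoder(rng, low, sink):
    # Shift Range and Low left until the 9-bit Range is at least 0x100 again.
    while rng < 0x100:
        if low < 0x100:
            sink.put_bit(0)
        elif low >= 0x200:
            low -= 0x200
            sink.put_bit(1)
        else:
            # Carry still undecided: remember one outstanding bit.
            low -= 0x100
            sink.outstanding += 1
        rng <<= 1
        low <<= 1
    return rng, low
```

Because the loop may iterate several times for one bin, hardware implementations typically unroll it into a chain of Renorm/Sub stages and select the stage at which Range reaches 0x100.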
bin
3-1-5 [200911 ]
CBRVBR
MBSAD212
GOPMB3GOPGOP1MB2MBMBMB3VBRCBRFIFO
3-1-6 LSI[200911 ]
LSI3-1-33-1-4DHBMPUDSPMPAMPADSPSIMDHMAVPACPADHBMPA
(1)DHB
MPEG-2LSI314 24)MEVLCMBMBFIFOSDRAM
VIF
M
E
MC
Host I/F
Memory Interface
DCTQ
IDCT!Q
VLC
SDRAM
II:
I:
III:
RISC
-CPU
-FIFOMB
A
B
MB
: MBFIFO
:
:
314DHBMPEG-2LSI315
315RCRISCLSIDHB
H.264LSI316 25)M-CoreME/MCIPDTQMBC-CoreLFECMB32bit RISCMRISCCRISCTRISC2DRAMeDRAMDDR
TRISC
TRISC
TS
Host I/F
Host I/F
C-core
C-core
M-core
M-core
VIF
VIF
Video
Audio/User
Data
MUX
MUX
MIF
MIF
FME
TQ
SMEIPD
MRISC
EC
CRISC
eDRAM
eDRAM
Host CPU
TME
IR
MBP
RIT
IR
MBP
RIT
LF
MDT
MDT
Mobile DDR
From/to
upper/lower chip
316DHBH.264/MPEG-2LSI
(2)HMA
LSI3.17 26)MIPSSPARC32RISCDSPMEDCTIPDMCVLIWSIMDH.264MB 27)MECABAC
CPUPCIHDTVIPCDDR
LSIMPEG-2H.264VC14221080/60pCPUDSP
MIPS
Processor core
MIPS
Processor core
Video
I/O
Scaling
Video
I/O
Scaling
Video DSP
Video DSP
Motion
Estimation
Motion
Estimation
Entropy
Coding
Entropy
Coding
Memory
Controller
Memory
Controller
Transport
I/O
Transport
I/O
IPC
I/F
IPC
I/F
Audio
I/O
Audio
I/O
Serial
I/O
Serial
I/O
Host
I/F
Host
I/F
SPARC
Processor core
SPARC
Processor core
TS
Video
Audio
Host CPU
stream
Adjacent
chip
317H.264LSI
(3)VLIWVPA
318 28)VP TileSP Tile2PEPEVLIW2DMALSIAHBOCPAXIVP TileSIMDSP Tile
[Block diagram labels: System Bus connecting VP Tiles (DMA, Vector Processor 1, 2) and SP Tiles (DMA, Scalar Processor), together with System Controller, Memory Subsystem and Display Driver. VP Tile: VLIW Processor (Program Counter, VLIW Instruction, Scalar Processing, Vector Processing, Load/Store, ctrl), Program Memory, Data Memory (Scalar Access), Data Memory (Vector Access), 2D Vector Memory, DMA. SP Tile: VLIW Processor (Program Counter, Scalar Processing, Load/Store, ctrl), Program Memory, Data Memory (Scalar Access), DMA.]
318VLIW
(4)CPA
RISCSIMD 30) 29)
319RISC128SIMDSIMDSIMDSIMDDCTMCMECABACSIMDDMASIMDSIMDMPEG-4H.26410
Hardware Platform
SIMD MP #1,128-bit
SIMD MP #2, 128-bit
Local Scratchpad Memory (SDM1)
ARC 700 CPU
Instruction
Cache
Data
Cache
Entropy
Encode
(EE)
Entropy
Decode
(ED)
Motion
Estimation
(ME)
Video
Optimized
DMA
Engine
Channel
Configurable Ports, AHB, AXI, BVCI
319
(5)GPPGeneral Purpose Processor
SIMDMMXMulti Media eXtension31) SSEStreaming SIMD Extension
MMXSIMDx86IBMPowerPCAltiVec(1)(4)GPPMMXSSEMPEG-2H.264HDTVPC 32)
MMXGPPRSVPReconfigurable Streaming Vector Processor33) GPUGPPGPUGPP
(6)
Time-to-MarketGPPDHBDHBGPPH.264DHBGPPMPAH.264IPGPPDHBCPUMPACPAVPADHBGPPMPAHDTVH.264GPPMPADHBRTLDHBDHB
GPPMPAGPPDHBHMACPA
1)ISO/IEC 13818-1/2/3 International Standard, Information technology - Generic Coding of Moving Pictures and Associated Audio: Systems/Visual/Audio, Nov. 1994.
2)Ching-Yeh Chen, Shao-Yi Chien, Yu-Wen Huang, Tung-Chien Chen, Tu-Chih Wang, and Liang-Gee Chen, Analysis and Architecture Design of Variable Block-Size Motion Estimation for H.264/AVC, IEEE Trans. Circuits and Systems, vol. 53, no. 2, pp.578-593, Feb. 2006.
3)T. Minami, T. Kondo, K. Suguri, and R. Kasai, A Proposal of a one-dimensional Array Architecture for Full-search Block Matching Algorithm, IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J78-D-1, no. 12, pp. 913-925, Dec. 1995.
4)S. Uramoto, A. Takabatake, M. Suzuki, H. Sakurai, and M. Yoshimoto, A Half-pel Precision motion estimation processor for NTSC-resolution video, IEICE Trans. Electron., vol. E77-C, no.12, pp.1930-1936, Dec. 1994.
5), , S. Scotznivosky, , , MPEG-2LSIME3, , ICD98-116, pp.29-36, 1998-08.
6)E. Ogura, M. Takashima, D. Hiranaka, T. Ishikawa, Y. Yanagita, S. Suzuki, T. Fukuda, and T. Ishii, A 1.2W Single-Chip MPEG-2 MP@ML Video Encoder LSI including Wide Search Range Motion Estimation and 81MOPS Controller, Digest of ISSCC, pp.32-33, Feb. 1998.
7)L. K. Liu and E. Feig, A block-based gradient descent search algorithm for block motion estimation in video coding, IEEE Trans. Circuits Syst. Video Technology, vol. 6, pp. 419-423, Aug. 1996.
8), , , , , , ICD97-94, pp.1-8, 1997-8.
9)Z. He and M.-I. Liou, Reducing hardware complexity of motion estimation algorithms using truncated pixels, in Proc. IEEE Int. Symp. Circuits Syst., pp. 2809-2812, 1997.
10)ISO/IEC ITU-T VCEG, Fast Integer Pel and Fractional Pel Motion Estimation for JVT, JVT-F017, 2002.
11)Kazuhito Suguri, Toshihiro Minami, Hiroaki Matsuda, Ritsu Kusaba, Toshio Kondo, Ryota Kasai, et.al , A real-time motion estimation and compensation LSI with wide search range for MPEG2 video encoding, IEEE Micro, vol. 16, no. 2, pp. 51-58, April 1996.
12), , , , 1MPEG-2LSI, , ICD98-50, pp.1-8, 1998-06.
13), , , , , , , MPEG-2LSI, , ICD97-163, pp.33-40, 1997-10.
14)ISO/IEC 14496-10:2003, Information technology Coding of audio-visual objects Part 10: Advanced Video Coding, Dec. 2003.
15)S. Y. Yap and J. V. McCanny, A VLSI Architecture for Variable Block Size Video Motion Estimation, IEEE Trans. Circuits and Systems, vol. 51, no. 7, pp.384-389, July 2004.
16)Tung-Chien Chen, Yu-Wen Huang, and Liang-Gee Chen, Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP04), V-9-12, vol. 5, May 2004.
17)Y. W. Huang, T. C. Chen, C. H. Tsai, C. Y. Chen, T. W. Chen, C. S. Chen, C. F. Shen, S. Y. Ma, T. C. Wang, B. Y. Hsieh, H. C. Fang, and L. G. Chen, A 1.3 tops H.264/AVC single-chip encoder for HDTV applications, in Proc. Int. Solid-State Circuits Conf., 2005, pp. 128-129.
18)Y. Murachi, J. Miyakoshi, M. Hamamoto, T. Iinuma, T. Ishihara, F. Yin, J. Lee, H. Kawaguchi, and M Yoshimoto, A Sub 100mW H.264 [email protected] Integer-pel Motion Estimation Processor core MBAFF Encoding with Reconfigurable Ring-Connected Systolic Array and Segmentation-Free, Rectangle Access Search-Window Buffer, IEICE Trans. Electron., Vol. E91-C, No.4, pp.465-478, April 2008.
19)Takayuki Onishi, Takashi Sano, Koyo Nitta, Mitsuo Ikeda, and Jiro Naganuma, Multi-Reference and Multi-Block-Size Motion Estimation with Flexible Mode Selection for Professional 4:2:2 H.264/AVC Encoder LSI, IEEE International Symposium on Circuits and Systems (ISCAS 2008), pp.800-803, May 2008.
20)L. Li, S. Goto, and T. Ikenaga, A Highly Parallel Architecture for Deblocking Filter in H.264/AVC, IEICE Trans. on Information and System, vol. E88-D, no. 7, pp.1623-1629, 2005.
21)S. Y. Shih, C. R. Charng, and Y. L. Lin, A Near Optimal Deblocking Filter for H.264 Advanced Video Coding, Asia South Pacific Design Automation Conference, 2006, pp. 170-175
22)D. Marpe, T. Wiegand, and H. Schwarz, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Transactions on Circuits and Systems for Video Technology, 13(7):620-636, July 2003.
23), , 22007-214068, 20078.
24)Mitsuo Ikeda, Toshio Kondo, Koyo Nitta, Kazuhito Suguri, Takeshi Yoshitome, Toshihiro Minami, Hiroe Iwasaki, Katsuyuki Ochiai, Jiro Naganuma, Makoto Endo, Yutaka Tashiro, Hiroshi Watanabe, Naoki Kobayashi, Tsuneo Okubo, Takeshi Ogura, and Ryota Kasai, SuperENC MPEG-2 Video Encoder Chip, IEEE Micro, vol. 19, no. 4, pp. 56-65, July 1999.
25)Koyo Nitta, Mitsuo Ikeda, Hiroe Iwasaki, Takayuki Onishi, Takashi Sano, Atsushi Sagata, Yasuyuki Nakajima, Minoru Inamori, Takeshi Yoshitome, Hiroaki Matsuda, Ryuichi Tanida, Atsushi Shimizu, Ken Nakamura, and Jiro Naganuma, An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures, 2008 Symposium on VLSI Circuits Digest of Technical Papers, pp.106-107, June 2008.
26)S. Bewick, An Advanced Media Processor Architecture for Consumer Applications, Microprocessor Forum 2006, Oct. 2006.
27)J.W. van de Waerdt, S. Vassiliadis, et.al., The TM3270 media-processor, In MICRO '05 Proceedings of the 38th International Symposium on Micro-architecture, pp. 331-342, Nov. 2005.
28)J. Leijten, New Core Enables Programmable Camera Sensor Signal Processing, Microprocessor Forum 2006, Oct. 200
29)Nigel Topham, Efficient Multi-Processing for Media Applications, Spring Processor Forum 2006, May 2006.
30)Dennis Moolenaar, A Dual-Core Video Decoder/Encoder, Microprocessor Forum 2007, May 2007.
31)A. Peleg and U. Weiser, MMX Technology extension to the Intel Architecture, IEEE Micro, vol. 16, No. 4, pp.42-50, Aug. 1996.
32)Iwasaki, H., Naganuma, J., Ochiai, K., Endo, M. and Ogura, T., "Real-Time Software MPEG2 Video Encoder on Parallel and Distributed Computer System, International Conference on Computer Communication (ICCC) 1999, pp.361-369, 1999.
33)J. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Morris, M. Schuette, and A. Saidi, The Reconfigurable Streaming Vector Processor (RSVP), In MICRO'03: Proceedings of the 36th International Symposium on Micro-architecture, pp.141-150, Dec. 2003.
10 - 5 - 3
3-2
[20094 ]
2
3-2-1
2image processingimage recognitionTV3-2-1
816
1632
1-3)32
3-1
3-2-2 LSI
31
31
816
1632
32
DoG, Haar
FFT
SVM 2)GLVQ 3)Adaboost 1)
32LSI
3-2 LSI
LSI
LSI
32
816
1632
VLIW
416
SIMD
SIMDSIMD
3-2-3 LSI
LSISIMDLSISIMDMIMDSIMDSIMD+MIMD
4) 5) 6)
PE7), 8)PE9)10)
SIMDPEPEPE11), 12)PEK13), 14)
MIMDPEPE15), 16)SIMDPESIMD17)PE18)
SIMDSIMD19), 20)
SIMDSIMDMIMDSIMD21)SIMDMIMD222)SIMDMIMDMIMD23)
TVTISVPSerial Video Processor11)NECIMAPCARIntegrated Memory Array Processor for CAR14) 3-2
SVPSIMDSVP3-2(a) DIRDOR102440102424PE102412801RF0RF11ALU4M, A, B, C1024PE0PE1023PE4PEPE1282RF0RF1DIRDOR
32
IMAPCAR3-2(b) 2 KBPE128PEPECPDMAEIFSVPSIMDIMAPCARPEPE233(a)PE24)331(b)(d)SIMD
33
IMAPCAR34
34
3-2-4
LSILSILSILSI
1) P.Viola, M.Jones: Rapid Object Detection using a Boosted Cascade of Simple Features, IEEE Computer Vision and Pattern Recognition Conference (CVPR), pp.I-511-518 (2001)
2) C. Cortes, V. Vapnik: "Support-Vector Networks, Machine Learning, Vol.20, No.3, pp.273-297(1995)
3) A. Sato, K. Yamada: "Generalized Learning Vector Quantization", Advances in Neural Information Processing Systems, Vol.8, pp.423-429, MIT Press (1996)
4) , , : ,, ICD2003-28, pp. 13-18 (2003).
5) , , , , , , : IP5000 , , pp.45-50 (1999)
6) Y. Hanai, Y. Hori, J. Nishimura and T. Kuroda: A Versatile Recognition Processor Employing Haar-Like Feature and Cascaded Classifier, IEEE International Solid-State Circuits Conference (ISSCC), pp.148-149 (2009)
7) P.P.Jonker, E.R.Komen: A scalable real-time image processing pipeline, Proc. of IAPR Conf. on Pattern Recognition (ICPR92), Vol.4, pp.142-146 (1992)
8) S.M.Chai, D.S.Wills: Systolic opportunities for multidimensional data streams, IEEE Transactions on Parallel and Distributed Systems, Vol.13, No.4, pp.388-398 (2002)
9) ,, , , , , : ,(D), Vol.J73-D2 , No.10, pp.1751-1760 (1990)
10)T.Temma, T.Iwashita, K.Matsumoto, H.Kurokawa, T.Nukiyama: Data Flow Processor Chip for Image Processing, IEEE Trans. on Electron Devices, Vol. ED-32, No.9, pp.1784-1791(1985)
11) J.Childers, P.Reinecke, H.Miyaguchi, S.Yamamoto, Y.Takahashi, Y.Yaguchi, M. Takeyasu: SVP: Serial Video Processor, IEEE 1990 Custom Integrated Circuits Conference, 17.3, pp.1-4 (1990)
12) R.P.Kleihorst, A.A.Abbo, A. van der Avoird, M.J.R.Op de Beeck, L. Sevat, P. Wielage: Xetal: A Low-power High-Performance Smart Camera Processor, IEEE Int. Symp. Circuits System, Vol.5, pp.215-218 (2001)
13) D.W.Hammerstrom and D.P.Lulich: Image Processing Using One-Dimensional Processor Arrays, Proceedings of the IEEE, Vol.84, No.7, pp.1005-1018 (1996)
14)S.Kyo, S.Okazaki, T.Koga, F.Hidano: A 100 GOPS In-vehicle Vision Processor for Pre-crash Safety Systems Based on a Ring Connected 128 4-Way VLIW Processing Elements, Digest of Technical Papers of Symp. on VLSI Circuits, pp.28-29 (2008).
15) K.Deguchi, K.Tago, I.Morishita: Integrated parallel image processings on a pipelined MIMD multi-processor system PSM, Proc. of 10th International Conference on Pattern Recognition, Vol.2, pp.442-444 (1990)
16) T.Matsuyama, N.Asada, M.Aoyama: Parallel Image Analysis on Recursive Torus Architecture, Proc. of Second IEEE Workshop on Computer Architecture for Machine Perception (CAMP), pp.202-214 (1993)
17) S.K.Raman, V.Pentkovski, J.Keshava: Implementing streaming SIMD extensions on the Pentium III processor, IEEE Micro, Vol.20, No.4, pp.47-57 (2000)
18)Y. Kondo et al.: 4 GOPS 3 way-VLIW image recognition processor based on a configurable media-processor, ISSCC Digest of Technical Papers, pp.148-149 (2001)
19) Chih-Chi Cheng, Chia-Hua Lin, Chung-Te Li, Samuel Chang, Chia-Jung Hsu, Liang-Gee Chen: iVisual: An Intelligent Visual Sensor SoC with 2790fps CMOS Image Sensor and 205GOPS/W Vision Processor, ISSCC Digest of Technical Papers, pp.306-307 (2008)
20) Kwanho Kim, Seungjin Lee, Joo-Young Kim, Minsu Kim, Donghyun Kim, Jeong-Ho Woo, Hoi-Jun Yoo: A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual-Attention Engine, ISSCC Digest of Technical Papers, pp.308-309 (2008)
21) H.J.Siegel, L.J.Siegel, F.C.Kemmerer, P.T.Mueller Jr., H.E.Smalley Jr., S.D.Smith: PASM: A partitionable SIMD/MIMD system for image processing and pattern recognition, IEEE Trans. on Computer, Vol.C-30, pp.934-947 (1981)
22) C.C.Jr.Weems: The Second Generation Image Understanding Architecture, Proc. of Second IEEE Workshop on Computer Architecture for Machine Perception (CAMP), pp.276-285 (1993)
23) S. Kyo, T. Koga, H. Lieske, S. Nomoto, S. Okazaki: A Mixed-Mode Parallel Processor Architecture for Embedded Systems, ACM Int. Conf. on Supercomputing(ICS), pp. 253-262 (2007)
24) S. Kyo, S. Okazaki, T. Arai: An Integrated Memory Array Processor for Embedded Image Recognition Systems, IEEE Trans. on Computers, Vol.56, No.5, May 2007.
10 - 5 - 3
3-3
[200912 ]
3-3-1
CELPCode Excited Linear Prediction coding31CELP
31CELP
MIPSLSIDSP
VSELP11.2 kbpsPsi-CELP5.6 kbps2.5G3GAMREVRC2G
32CPUDSPDSP
I/F
I/F
IrDAGPSFelica
CODEC
CODEC
I/F
I/F
IrDAGPSFelica
CODEC
CODEC
32
IPIPG.711PCMG.723ADPCMG.729CS-ACELPIPPCPCIPIPLANLANIP32LAN
3-3-2
2001iPodMP333MP3
PCM
MDCT
IMDCT
PCM
PCM
MDCT
IMDCT
PCM
33MP3
MDCTmmLSI34
I/F
I/F
I/F
I/F
34
MP3AACAdvanced Audio CodingATRACAdaptive Transform Audio CodingWMAWindows Media AudioLSI34LSILSITVDVDBDHD-DVDHD-Audio5.1 ch7.1 ch11.2 chDTSDigital Theater SystemTrueHDMHzDSP
3-3-3
ECNCTVECM
(1)
35
1stPass
2ndPass
1stPass
2ndPass
35
353521219902000PC 1PC 1352,30%2GDistributed Speech RecognitionETSIThe European Telecommunications Standard Institute2000200636
36
37CPU1990PC37CPUPCCPUCPUPCMBMBMB
37
(2)
IC.2ch
1999200438DSPDSPCODECLSICPULSI
FIR
FIR
38
(3)
TVTV16 kHz22 kHzTVTV32 ch12 chDSPLSIDSPDSPLSIFPGA
(4)
2 ch 4 ch ECM (Electret Condenser Microphone) MEMS (Micro Electro Mechanical System) A/D 1 mm ECM
3-3-4
VADVoice Activity DetectionVAD
CPUDSPLSI
10 - 5 - 3
3-4
[20094 ]
3-4-1
graphicCGComputer Graphics333D3 dimensionCG3renderingLSIGPUGraphics Processing Unit
(1)
CGpointlinesurface2line segment
3trianglepolygontriangle strip
(2)
CGCGframe buffer
:pixel2double buffer
(3)rasterize
2XY2rasterize
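Rasterization, the conversion of a primitive given by 2-D vertex coordinates into the set of pixels it covers, can be sketched with edge functions. This is a minimal illustration rather than a description of any particular GPU: the names are hypothetical, pixels are sampled at their centers, and the tie-breaking (top-left) rule that real rasterizers apply to pixels lying exactly on a shared edge is omitted.

```python
def edge(ax, ay, bx, by, px, py):
    # Signed area test: positive on one side of edge a->b, negative on the other.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize_triangle(v0, v1, v2, width, height):
    # Return the (x, y) pixels whose centers lie inside or on the triangle.
    covered = []
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Accept either vertex winding order.
            if (w0 >= 0 and w1 >= 0 and w2 >= 0) or (w0 <= 0 and w1 <= 0 and w2 <= 0):
                covered.append((x, y))
    return covered
```

Each pixel test is independent of the others, which is why this step parallelizes well in hardware.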
(4)APIApplication Programming Interface
APIAPIAPIAPI
3-4-2 2D
22 dimension graphics2
(1)
(a) bitmap
2
(b) vector
scalable
(2)font
bitmap fontoutline fontFreeType
(3)2CGAPI
API
(a) SVGScalable Vector Graphics
2DW3CXML
(b) OpenVG
2DAPIThe Khronos Groupopenvg
(4)
PC2DCPUGPUCPULSIGPU2
3-4-3 3D
33 dimension computer graphics3DCGCGCG
(1)3D
331/301/602CG
332
polygonpixel
(2)
MPEG-23CGLSI
(3)
3D CG1963Sketchpad19701980LSI3CADWS3CGGWSGWSCGCG
19903CG3CGPCCPU19903PCGPU3PCGWSPC3CG3CG
(4)
3D CGCG
(a) vertex processing
T&Ltransform and lightinggeometry processing
(b)
(c) pixel processing
pixelfragment
(d)
(5)
GWS
(a) LSI
GWSLSILSIGPU2LSICPU
LSI1999GPULSIAPIDSPAPI
(b)
2001Programmable ShaderLSIVertex ShaderPixel ShaderFragment ShaderLSILSIGPU
(c)
Unified Shader Model
Unified Shading Architecture(a)
(6)CG
CG
(a)
depth bufferZ buffer
(b) Texture Mapping
GPUCGGPU
(c)
(d) shadeshadow
CPU
(e)
(7)GPU
2009GPU LSIGPUCPUGPU2
NVIDIAGeForce GTX 2851455 nm470 mm2LSI183 W1.5 GHz2401 Tflops
AMDRadeon HD 48709.5655 nm282 mm2TDPThermal Design Power160 WVLIW750 MHz8001 Tflops
GPU
GPU
IntelLarrabeeGPUGPUCPUx86GPUCPUCPUGPUCG
(8)
3CGDDRGDDRHDCG2
(9)CGAPI
CGAPIRenderManShading LanguageCCG
OpenGLDirectX GraphicsCGAPI
(a) OpenGL
OpenGLSGIIrisGLAPIThe Khronos GroupOpenGLOSCADOpenGL 1.xOpenGL 2.0OpenGLGLSLOpenGL Shading LanguageOpenGL 3.0GPUAPI
OpenGL APIMesa3DGallium3DOpenGL ESOpenGL for Embedded SystemsOpenGL ES 2.0OpenGL 2.0
(b) DirectX Graphics
DirectXWindows OSAPIDirectX3APIDirect3DDirectX GraphicsDirectX GraphicsHLSLHigh Level Shading Language
(c)
nVIDIACgC for GraphicsGPUOpenGLDirectX
(10)CG
CGScene Graph LibraryGame Engine3ShadowCGCGCGAPIPCPC
(1) OpenSG
(2) OpenSceneGraph
3-4-4 GPGPUGeneral Purpose computing on GPU
GPUCGGPGPU
(1)
GPU43CGRGB333x3RGBA4444
(2)GPGPUAPI
CGGPU
(a) CUDACompute Unified Device Architecture
NVIDIAGPGPUGPUPhysXGPUCUDA
(b) OpenCLOpen Computing Language
CGPUCPUDSPThe Khronos Group
(c)
AMD ATI Stream CUDA MS DirectX Direct3D DirectX Compute Shader
(3)
GPGPU
CPUGPU
(a)
(b)
CPUHDMPEG-4 AVCH.264GPU
(c)
GPUGPU
(d)
3-4-5 3D
CPUGPU
(a) Global Illumination
3Ray TracingRadiosityPhoton MappingGlobal Illumination3
(b) Non Photo Realistic
32
(c) Image Based Rendering
(d) Volume Rendering
bitmap33
3-4-6
3GPU3CGCGCGGPUCPUCG
1)Real-Time Rendering
http://www.realtimerendering.com/
2)Khronos Group
http://www.khronos.org/
3)OpenGL
http://www.opengl.org/
http://www.khronos.org/opengl/
4)OpenGL ES
http://www.khronos.org/opengles/
5)OpenCL
http://www.khronos.org/opencl/
6)Mesa3D
http://www.mesa3d.org/
7)SVG
http://www.w3.org/Graphics/SVG/
8)Wikipedia
GPU
http://ja.wikipedia.org/wiki/Graphics_Processing_Unit
10 - 5 - 3
3-5
3-5-1[20098 ]
(1)
1)LSILSI1ACSadd-compare-select
(a) 31
ACSLIFOLast In First Out
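The add-compare-select (ACS) recursion and the last-in-first-out traceback mentioned above can be sketched for a small code. The choice of a rate-1/2, constraint-length-3 convolutional code with generators (7, 5) in octal, hard-decision Hamming metrics, and two zero tail bits is an assumption made purely for illustration; the function names are hypothetical.

```python
G = (0b111, 0b101)   # generator polynomials (7, 5) octal, assumed for this example
NSTATES = 4          # 2 memory bits

def parity(v):
    return bin(v).count("1") & 1

def step(state, bit):
    # Shift one input bit into the register; return (next_state, output pair).
    reg = (bit << 2) | state
    return reg >> 1, tuple(parity(reg & g) for g in G)

def encode(bits):
    state, out = 0, []
    for b in list(bits) + [0, 0]:        # two tail bits return the encoder to state 0
        state, o = step(state, b)
        out.extend(o)
    return out

def viterbi_decode(received):
    INF = float("inf")
    metric = [0] + [INF] * (NSTATES - 1)
    history = []
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metric = [INF] * NSTATES
        decision = [None] * NSTATES
        for s in range(NSTATES):
            if metric[s] == INF:
                continue
            for b in (0, 1):
                ns, out = step(s, b)
                cand = metric[s] + sum(x != y for x, y in zip(out, r))  # add
                if cand < new_metric[ns]:                               # compare
                    new_metric[ns] = cand                               # select
                    decision[ns] = (s, b)
        metric = new_metric
        history.append(decision)
    # Traceback from state 0: bits come out last-first and are then reversed.
    state, bits = 0, []
    for decision in reversed(history):
        state, b = decision[state]
        bits.append(b)
    bits.reverse()
    return bits[:-2]                      # drop the tail bits
```

In hardware, the inner loop over states becomes a bank of parallel ACS units, and `history` corresponds to the path memory that is read back in reverse order.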
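$$\mathbf{H}^H \mathbf{r} \cong (\mathbf{H}^H \mathbf{H})\,\mathbf{x}$$
$$\mathbf{x} \cong (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \mathbf{r}$$
$$\mathbf{W}_{ZF} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H$$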
$$\tilde{x}_3 = \arg\min_{x_3} \left| y_3 - R_{33} x_3 - R_{34} \tilde{x}_4 \right|^2$$
(b) 32
kk1k
(2)
19932), 3)RSCRecursive Systematic Convolutional33r0RSCr1r2RSC
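$$\tilde{x}_4 = \arg\min_{x_4} \left| y_4 - R_{44} x_4 \right|^2$$
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} & R_{13} & R_{14} \\ 0 & R_{22} & R_{23} & R_{24} \\ 0 & 0 & R_{33} & R_{34} \\ 0 & 0 & 0 & R_{44} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$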
341r0r1e2e1102e1r0r2e2
1
1
1
1
1
1
1
1
1
1
1
,
'
'
'
'
-
-
-
-
-
-
-
-
-
-
-
+
=
-
=
=
-
-
=
CA
BE
A
A
F
B
CA
D
E
D
C
B
A
E
CA
E
BE
A
F
C
W
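$$\mathbf{H} = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,M} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N,1} & h_{N,2} & \cdots & h_{N,M} \end{bmatrix}$$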
MAPMaximum A Posteriori4)Log-MAPMax-Log-MAPSOVASoft-Output Viterbi Algorithm5)31kRSCRSCMAXLUEXPMAPLog-MAPMax-Log-MAPSOVAMAPLog-MAPMax-Log-MAPSOVA
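The difference between Log-MAP and Max-Log-MAP comes down to how ln(e^a + e^b) is evaluated. The following minimal sketch shows the exact form (the Jacobian logarithm, whose correction term is commonly stored in a small lookup table in hardware) next to the max-only approximation; the function names are hypothetical.

```python
import math

def max_star(a, b):
    # Log-MAP: ln(e^a + e^b) = max(a, b) + ln(1 + e^(-|a - b|)).
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def max_log(a, b):
    # Max-Log-MAP: drop the correction term and keep only the maximum.
    return max(a, b)
```

The dropped correction term is at most ln 2 (when a equals b) and decays quickly as |a - b| grows, which is why Max-Log-MAP trades a small loss in decoding performance for a simpler ACS-like datapath.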
$$\mathbf{W} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$
(a) Sliding Window
SRAMLSIN35N
Sliding Window6) NLLK156NSliding Window36372
(b) Block-Interleaved Pipelining
Block-Interleaved PipeliningBIP7)BIP1N/212/N+1N212238BIP2
39122D1f2ff 1/2310
1
N
1
N
L
ACS
BIPImproved Sliding Window8)BIP
1)Andrew J. Viterbi: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Information Theory, Vol. IT-13, No. 2, pp.260-269 (1967)
2)Claude Berrou, Alain Glavieux and Punya Thitimajshima: Near Shannon Limit error-correcting coding and decoding: Turbo-codes, Proc. IEEE Int. Conf. Comm., pp.1064-1071 (1993)
3)Claude Berrou and Alain Glavieux: Near Optimum Error Correcting Coding and Decoding: Turbo-Codes, IEEE Trans. Communications, Vol. 44, No. 10, pp.1261-1271 (1996)
4)L. R. Bahl, J. Cocke, F. Jelinek and J. Raviv: Optimal Decoding of Linear Codes for minimizing symbol error rate, IEEE Trans. Information Theory, Vol. IT-20, No. 2, pp.284-287 (1974)
5)Branka Vucetic and Jinhong Yuan: Turbo Codes: Principles and Applications, Kluwer Academic Publishers
6)S. Benedetto, D. Divsalar, G. Montorsi and F. Pollara: Soft-Output Decoding Algorithms in Iterative Decoding of Turbo Codes, The Telecommunications and Data Acquisition Progress Report 42-124, NASA Jet Propulsion Laboratory, pp.63-87 (1996)
7)Seok-Jun Lee, Naresh R. Shanbhag and Andrew C. Singer: A 285-MHz Pipelined MAP Decoder in 0.18-um CMOS, IEEE J. Solid-State Circuits, Vol. 40, No. 8, pp.1718-1725 (2005)
8)Hiroshi Gambe, Yoshinori Tanaka, Kazuhisa Ohbuchi, Teruo Ishihara and Jifeng Li: An Improved Sliding Window Algorithm for Max-Log-MAP Turbo Decoder and Its Programmable LSI Implementation, IEICE Trans. Electron., Vol. E88-C, No. 3, pp.403-412 (2005)
3-5-2 MIMO LSI [December 2009]
MIMOMulti Input Multi OutputLSIMIMO2
A. STC-MIMO
Space Time CodingSTC1)MIMOSISO
B. SDM-MIMO
Space Division MultiplexingSDMMIMOSDM-MIMO
LANIEEE 802.11nWiMAXIEEE 802.16eMANLTESDM-MIMO100 Mbps1 GbpsSTC-MIMO
STC-MIMOMRCLSILSISDM-MIMOInterference CancellerMIMOMIMO Decoder
3-5-1MIMO3-5-2SDM-MIMO3-5-3MIMOZF/MMSELSI3-5-4MLDMIMOLSI
(1)MIMOMIMO
(a) MIMO
1(a)MNMIMO(b)
1MIMO
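Eq. (1) defines the M-dimensional transmit signal vector:
$$\mathbf{x} = [x_1\ x_2\ \cdots\ x_M]^T$$ (1)
where x_i (i = 1, 2, ..., M) is a QAM symbol.
Eq. (2) defines the N×M channel matrix:
$$\mathbf{H} = [h_{nm}] = \begin{bmatrix} h_{1,1} & h_{1,2} & \cdots & h_{1,M} \\ h_{2,1} & h_{2,2} & \cdots & h_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ h_{N,1} & h_{N,2} & \cdots & h_{N,M} \end{bmatrix}$$ (2)
The elements h_nm are i.i.d.
Eq. (3) defines the N-dimensional received signal vector:
$$\mathbf{r} = [r_1\ r_2\ \cdots\ r_N]^T$$ (3)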
[Figure 1 (a): transmitter with Mapper, S/P, steering vector A, and transmit antennas 1 to M; receiver with receive antennas 1 to N (signals r_1 to r_N), Interference Canceller, P/S, and De-Mapper. (b): channel model linking inputs x_1 to x_M to outputs r_1 to r_N through the coefficients h_{1,1} to h_{N,M}.]
Fig. 1 MIMO
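The MIMO system is expressed as
$$\mathbf{r} = \mathbf{H}\mathbf{x} + \mathbf{n}$$ (4)
where n is the noise vector with variance σ².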
MIMO1xiMnLayered Space Time CodingLSTSISOMNMSTCLST
1MapperQPSKQAMSPxSPMMASPAASDMLSTASDM2) LSTMMILSTSDM
nxMIMOyyxMIMO
MIMO3) MatlabMIMOMIMOMIMOLAN4)
(2)MIMO
MIMOMIMOZero ForcingZFMinimum Mean SquaredMMSESNV-BLAST
(a) Zero Forcing (ZF)
With I denoting the identity matrix, the ZF weight matrix W satisfies
$$\mathbf{W}\mathbf{H} = \mathbf{I}$$ (5)
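Multiplying Eq. (4) by the M×N matrix W gives the estimate y of x:
$$\mathbf{y} = \mathbf{W}\mathbf{r} = \mathbf{W}(\mathbf{H}\mathbf{x} + \mathbf{n}) = \mathbf{x} + \mathbf{W}\mathbf{n}$$ (6)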
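When H is not square, W is the Moore-Penrose pseudo-inverse of H. For N ≥ M the ZF weight W_ZF is given by Eq. (7), where H^H is the Hermitian transpose of H:
$$\mathbf{W}_{ZF} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$$ (7)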
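To make Eq. (7) concrete, the following sketch computes the ZF weight for a small channel and checks that it recovers the transmitted symbols in the noise-free case. It is a minimal pure-Python illustration restricted to M = 2 transmit streams so that a 2×2 inverse suffices; the channel values, the QPSK symbols, and the helper names are assumptions made for the example.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hermitian(a):
    """Conjugate (Hermitian) transpose."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def zf_weight(h):
    """W_ZF = (H^H H)^-1 H^H for an N x 2 channel matrix (M = 2 assumed)."""
    hh = hermitian(h)
    return matmul(inv2(matmul(hh, h)), hh)

# Hypothetical channel with N = 3 receive and M = 2 transmit antennas.
H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j], [0.3j, -0.4 + 0j]]
x = [[1 + 1j], [-1 + 1j]]      # transmitted QPSK symbols
r = matmul(H, x)               # noise-free received vector r = Hx
W_zf = zf_weight(H)
y = matmul(W_zf, r)            # y = W r, equal to x when n = 0
WH = matmul(W_zf, H)           # should be the 2x2 identity, as in Eq. (5)
```

With noise present, the same weight also multiplies n, which is the noise enhancement visible in the last term of Eq. (6).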
WWInterference cancellerMIMOMIMO Decoder(7) WZF Zero ForcingZFZF(6)WHZero ForcingZFMIMO(6)
(b) Minimum Mean Squared Error (MMSE)
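MMSE chooses the weights so that the mean squared error between x and y is minimized:
$$E\left[\,|x_i(n) - y_i(n)|^2\,\right] \rightarrow \min, \quad i = 1, 2, \ldots, M$$ (8)
where x_i and y_i are the i-th elements of x and y, and E denotes expectation.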
MMSE(8)Signal to Noise Interference RatioSINRZFSNR
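The MMSE weight matrix W for the N×M channel is given by 5)
$$\mathbf{W}_{MMSE} = (\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}_M)^{-1}\mathbf{H}^H$$ (9)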
where σ² is the variance of the AWGN.
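The effect of the σ² term in Eq. (9) can be seen numerically. The sketch below, again limited to M = 2 so that only a 2×2 inverse is needed, computes the MMSE weight for two noise variances and compares the squared Frobenius norm of the weights, which is proportional to the noise power after detection. The channel values, the noise variance 0.1, and the helper names are assumptions for illustration only.

```python
def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hermitian(a):
    """Conjugate (Hermitian) transpose."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def mmse_weight(h, sigma2):
    """W_MMSE = (H^H H + sigma^2 I)^-1 H^H for an N x 2 channel (M = 2 assumed)."""
    hh = hermitian(h)
    g = matmul(hh, h)
    g[0][0] += sigma2              # add sigma^2 on the diagonal
    g[1][1] += sigma2
    return matmul(inv2(g), hh)

def frobenius_sq(w):
    """Sum of squared magnitudes of all entries."""
    return sum(abs(v) ** 2 for row in w for v in row)

H = [[1 + 0j, 0.5j], [0.2 + 0j, 1 + 0j], [0.3j, -0.4 + 0j]]
W_zero = mmse_weight(H, 0.0)       # sigma^2 = 0 gives the ZF weight of Eq. (7)
W_mmse = mmse_weight(H, 0.1)       # hypothetical noise variance
WH_zero = matmul(W_zero, H)
gain_zero = frobenius_sq(W_zero)
gain_mmse = frobenius_sq(W_mmse)
```

With σ² = 0 the weight satisfies WH = I exactly; with σ² > 0 the weights shrink, which limits noise enhancement at the cost of some residual interference.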
=
=
M
I
H
H
0
r
r
2
,
]
[
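From Eqs. (6) and (9), as σ² approaches 0 the estimate becomes
$$\mathbf{y} = (\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H\mathbf{r}$$ (10)
$(\mathbf{H}^H\mathbf{H})^{-1}\mathbf{H}^H$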
Moore-PenroseMMSEZFW(9)2LAN
2MMSEZFBERMMSE(8)SINRZFSNR4
(c) MMSE
LSIMMSE6)
(1)
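The 4×4 matrix to be inverted is partitioned into 2×2 submatrices A to D:
$$\mathbf{W} = \begin{bmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{bmatrix}$$ (11)
Its inverse is
$$\mathbf{W}^{-1} = \begin{bmatrix} \mathbf{A}' & \mathbf{B}' \\ \mathbf{C}' & \mathbf{D}' \end{bmatrix} = \begin{bmatrix} \mathbf{F} & -\mathbf{A}^{-1}\mathbf{B}\mathbf{E}^{-1} \\ -\mathbf{E}^{-1}\mathbf{C}\mathbf{A}^{-1} & \mathbf{E}^{-1} \end{bmatrix}, \quad \mathbf{E} = \mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B}, \quad \mathbf{F} = \mathbf{A}^{-1} + \mathbf{A}^{-1}\mathbf{B}\mathbf{E}^{-1}\mathbf{C}\mathbf{A}^{-1}$$ (12)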
$\mathbf{C} = \mathbf{B}^H$, $\mathbf{B}' = \mathbf{C}'^H$, $\mathbf{A}^{-1}\mathbf{B} = (\mathbf{C}\mathbf{A}^{-1})^H$
1
1
-
-
-
BE
A
F
CA
BE
A
A
1
1
1
-
-
-
+
222A, B, C, D1(12)
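The block-wise inversion behind Eq. (12) can be sketched as follows, using only 2×2 operations on the submatrices A, B, C, and D. The code assumes the usual Schur-complement form, with E = D - C A^-1 B and F = A^-1 + A^-1 B E^-1 C A^-1; the real symmetric test matrix and the helper names are assumptions chosen for illustration.

```python
def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b, sign=1.0):
    """a + sign * b for 2x2 matrices."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]

def neg(a):
    return [[-v for v in row] for row in a]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def block_inverse(A, B, C, D):
    """Invert [[A, B], [C, D]] block-wise; returns the blocks A', B', C', D'."""
    Ai = inv2(A)
    AiB = mul(Ai, B)
    CAi = mul(C, Ai)
    E = add(D, mul(CAi, B), -1.0)      # E = D - C A^-1 B
    Ei = inv2(E)
    AiBEi = mul(AiB, Ei)
    F = add(Ai, mul(AiBEi, CAi))       # F = A^-1 + A^-1 B E^-1 C A^-1
    return F, neg(AiBEi), neg(mul(Ei, CAi)), Ei

# Hypothetical real symmetric 4x4 matrix split into 2x2 blocks (C = B^T).
A = [[4.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.5, 1.0]]
C = [[1.0, 0.5], [0.0, 1.0]]
D = [[5.0, 2.0], [2.0, 6.0]]
Ap, Bp, Cp, Dp = block_inverse(A, B, C, D)

W = [A[0] + B[0], A[1] + B[1], C[0] + D[0], C[1] + D[1]]
W_inv = [Ap[0] + Bp[0], Ap[1] + Bp[1], Cp[0] + Dp[0], Cp[1] + Dp[1]]
product = [[sum(W[i][k] * W_inv[k][j] for k in range(4)) for j in range(4)]
           for i in range(4)]
```

Because the example matrix is symmetric, B' comes out as the transpose of C', so in a circuit only one of the two off-diagonal blocks would need to be computed.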
2MMSE MIMO6)
(d) V-BLAST
Vertical-Bell Laboratory Space-Time CodingV-BLASTSN9)ZFMMSE
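From Eq. (6), the noise power remaining in the i-th output after ZF is
$$N_{ZF} = E\left\{(\mathbf{W}_{ZF}\mathbf{n})_i\,(\mathbf{W}_{ZF}\mathbf{n})_i^{*}\right\} = \sigma^2\,\|(\mathbf{W}_{ZF})_i\|^2$$ (13)
where E{·} denotes expectation and (W_ZF)_i is the i-th row of W_ZF.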
$$k = \arg\min_{i = 1, \cdots, M} \|(\mathbf{W}_{ZF})_i\|^2$$ (14)
SN
k1, k2, , kM
(1)
i
k
w
(13)
i
k
i
k
y
(2)
i
k
i
k
x
(3)
i
i
k
k
x
x
=
ri
(1)(3)k1, k2, , kMri
k1, k2, , kMZFki
[
]
=
=
)
(
1
)
(
0
i
j
i
j
j
i
k
k
H
w
(15)
[
]
j
k
ki
i
k
w
ri(15)
+
-
1
i
k
H
ki
k
H
Hk
(e) MIMOLSI
44 MMSE1
Table 1 MIMO LSI 6)
Reference    7)             8)              6)               10)
Method       ZF             MMSE            MMSE             MMSE V-BLAST
bps                                         2.6 G            212 M
ASIC         90 nm, 43 k    0.25 µm, 89 k   90 nm, 1.86 M    0.18 µm, 1 mm2
ns           180            600             187.5
K (ns)       180K           600K            187.5
* MIMO-OFDM
(3)MIMO
Maximum Likelihood DecoderMLDMIMO
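Based on Eq. (4), MLD forms the replica HX_j for every candidate transmit vector X_j and selects the candidate closest to the received vector r:
$$\hat{\mathbf{x}} = \arg\min_{\mathbf{X}_j} \|\mathbf{r} - \mathbf{H}\mathbf{X}_j\|^2$$ (16)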
I
Q
3MLD
316QAMRMLDMIMO464QAM224QRM-MLDSphere DecodingK-best
(a) QR
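QR decomposition factors the channel matrix H into a unitary matrix Q and an upper triangular matrix R:
$$\mathbf{H} = \mathbf{Q}\mathbf{R}$$ (17)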
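Using the QR decomposition of Eq. (17) and multiplying the received signal by Q^H gives Eq. (18):
$$\mathbf{y} = \mathbf{Q}^H\mathbf{r} = \mathbf{Q}^H\mathbf{Q}\mathbf{R}\mathbf{x} + \mathbf{Q}^H\mathbf{n} = \mathbf{R}\mathbf{x} + \mathbf{n}'$$ (18)
where n' = Q^H n.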
QRRxxMxMx
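For 4×4 MIMO, Eq. (18) is written as follows (noise term omitted):
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} R_{11} & R_{12} & R_{13} & R_{14} \\ 0 & R_{22} & R_{23} & R_{24} \\ 0 & 0 & R_{33} & R_{34} \\ 0 & 0 & 0 & R_{44} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$$ (19)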
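Because x4 is a 64QAM symbol, it has 64 candidates:
$$\tilde{x}_4 = \arg\min_{x_4} |y_4 - R_{44}x_4|^2$$ (20)
With 64 candidates for x4 and 64 for x3, the next row involves 64^2 combinations:
$$\tilde{x}_3 = \arg\min_{x_3} |y_3 - R_{33}x_3 - R_{34}\tilde{x}_4|^2$$ (21)
Continuing in the same way for x2 and x1, the number of combinations reaches 64^4 at x1.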
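The row-by-row structure of Eqs. (18) to (21) can be tried on a small case. The sketch below performs a QR decomposition by classical Gram-Schmidt, rotates the received vector by Q^T, and evaluates the triangular metric for every candidate. To stay small it assumes a real-valued 3×3 channel with BPSK symbols (8 candidates) instead of 4×4 with 64QAM; for the latter the same exhaustive search would visit 64^4 = 16,777,216 candidates. All names and values are assumptions for illustration.

```python
import itertools
import math

def qr_gram_schmidt(h):
    """QR decomposition of a real matrix with independent columns.

    Returns the columns of Q (as a list of vectors) and the upper triangular R.
    """
    n, m = len(h), len(h[0])
    q_cols = []
    r = [[0.0] * m for _ in range(m)]
    for j in range(m):
        v = [h[i][j] for i in range(n)]
        for k, q in enumerate(q_cols):
            r[k][j] = sum(q[i] * h[i][j] for i in range(n))
            v = [v[i] - r[k][j] * q[i] for i in range(n)]
        r[j][j] = math.sqrt(sum(t * t for t in v))
        q_cols.append([t / r[j][j] for t in v])
    return q_cols, r

def metric(y, r, x):
    """Sum over rows of |y_i - sum_{j >= i} R_ij x_j|^2."""
    m = len(x)
    return sum((y[i] - sum(r[i][j] * x[j] for j in range(i, m))) ** 2
               for i in range(m))

H = [[1.0, 0.4, 0.2], [0.3, 1.0, 0.5], [0.1, 0.2, 1.0]]   # hypothetical channel
x_true = [1, -1, 1]                                        # BPSK symbols
rx = [sum(H[i][j] * x_true[j] for j in range(3)) for i in range(3)]  # noise-free
Q, R = qr_gram_schmidt(H)
y = [sum(q[i] * rx[i] for i in range(3)) for q in Q]       # y = Q^T r
candidates = list(itertools.product([-1, 1], repeat=3))
x_hat = min(candidates, key=lambda x: metric(y, R, x))
```

Because R is upper triangular, the last row depends on the last symbol only, which is what allows the metric to be accumulated one symbol at a time.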
(b) Sphere Decoder
Consider M = 3 transmit antennas and BPSK with the two symbols S1 and S2.
(1)
MLD evaluates all 2^3 = 8 candidates, whereas the Sphere Decoder limits the search with an SC (Sphere Constraint).
44s1,2,1,8SC
SCSCSC4s2,1,1, s2,1,2, s2,2,1, s2,2,2
4Sphere decoding algorithm
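A depth-first search with a sphere constraint, as outlined above for M = 3 and BPSK, might look like the following sketch. It assumes a real-valued upper triangular R and a received vector y already rotated as in Eq. (18); the radius is tightened whenever a complete candidate is found. The matrix values, the initial squared radius of 4.0, and the function name are assumptions for illustration, not the architecture of the cited chip.

```python
def sphere_decode(y, r, symbols, radius_sq):
    """Depth-first tree search that prunes branches outside the sphere."""
    m = len(y)
    best = {"x": None, "d": radius_sq}
    visited = 0

    def search(level, partial, ped):
        nonlocal visited
        for s in symbols:
            cand = [s] + partial      # symbols for rows level .. m-1
            resid = y[level] - sum(r[level][level + k] * cand[k]
                                   for k in range(len(cand)))
            d = ped + resid * resid   # partial Euclidean distance (PED)
            visited += 1
            if d > best["d"]:         # sphere constraint violated: prune
                continue
            if level == 0:
                best["x"], best["d"] = cand, d   # full candidate: shrink radius
            else:
                search(level - 1, cand, d)

    search(m - 1, [], 0.0)
    return best["x"], best["d"], visited

R = [[1.0, 0.5, 0.2], [0.0, 1.2, 0.4], [0.0, 0.0, 0.9]]   # hypothetical R
y = [0.75, -0.85, 0.8]     # R applied to [1, -1, 1] plus a small disturbance
x_hat, dist, visited = sphere_decode(y, R, (-1, 1), 4.0)
```

In this example fewer than the 14 nodes of the full three-level binary tree are evaluated, because subtrees whose PED already exceeds the current radius are skipped.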
(2)
Sphere Decoder10)5
Critical path
5Sphere decoding architecture
B) A) SCSphere ConstraintSCSC
PED Storage4PED Storage4
PED StorageSCPED Storage1s1,2,1PED Storages1,1PED Storage
5Mbps
(c) K-best
(1)
K-best(10)QRKK-bestSphere Decoder6K62
K-BestKModified K-Best
[Figure 6: K-best search tree for BPSK with 3 transmit antennas and K = 2. Stage 1: s1, s2. Stage 2: s1,1, s2,1, s1,2, s2,2. Stage 3: s1,1,1 through s2,2,2.]
Fig. 6 K-best algorithm
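The breadth-first K-best search of the figure (BPSK, 3 transmit antennas, K = 2) can be sketched as below. At each stage every survivor is extended by all symbols, the children are sorted by partial Euclidean distance, and only the K best are kept. The triangular matrix, the received vector, and the function name are assumptions for illustration and are unrelated to the metric values printed in the original figure.

```python
def k_best(y, r, symbols, k):
    """Breadth-first tree search keeping the k best partial candidates per stage."""
    m = len(y)
    survivors = [(0.0, [])]                  # (PED, symbols decided so far)
    for level in range(m - 1, -1, -1):       # from the last row upwards
        children = []
        for ped, partial in survivors:
            for s in symbols:
                cand = [s] + partial
                resid = y[level] - sum(r[level][level + j] * cand[j]
                                       for j in range(len(cand)))
                children.append((ped + resid * resid, cand))
        children.sort(key=lambda c: c[0])    # order by PED
        survivors = children[:k]             # keep the K best
    return survivors[0][1], survivors[0][0]

R = [[1.0, 0.5, 0.2], [0.0, 1.2, 0.4], [0.0, 0.0, 0.9]]   # hypothetical R
y = [0.75, -0.85, 0.8]     # R applied to [1, -1, 1] plus a small disturbance
x_hat, ped = k_best(y, R, (-1, 1), 2)
```

Unlike the depth-first sphere search, the number of nodes evaluated per stage is fixed by K, which gives a constant throughput that is convenient for a pipelined circuit.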
(2)
7K-best1Computation Unit4KMCUMetric Computation Unit1KBUK-best UnitPEDsPEDsKLKMCUKBUKKBULKKKComputation Unit
Computation UnitKK
7K-best
(3) LSI
MLDMIMOLSI2
Table 2 MLD-MIMO LSI
Algorithm             Sphere Decoder 11)   K-Best 13)
Number of antennas    4×4                  4×4
Modulation            16QAM                16QAM
Throughput (Mbps)     80                   424
Latency (cycle)       11.2                 5
Process Technology    0.25 µm              0.25 µm
Number of gates       117 k                68 k
Area (mm2)            3.55
(4)
MIMOBER
(a) BER
3OFDM-SDM
BER1Viterbi
Table 3
Data modulation           QPSK
Number of sub-carriers    512
Number of data-carriers   480
Guard Interval length     64
Channel estimation        Ideal
Channel model             Quasi-static multipath fading (18-path)
Number of iterations      10000
(1)
8ZFMMSEBLASTMLDMLD
844 MIMOBER
(b) MLDBER
9MLDMLDASSESS 14) ViterbiK-BestMLDBERZF
9MLDBER
(c)
4MIMOV-BLASTO(M3)O(M4)MLK-best13)
4SDMOFDM-SDM1OFDM-SDML4L
4MIMO
M = N
NM+1
O(M3)
V-BLAST
NM+1+
O(M4)
M
MLD(K-Best)
N
O(M6)
SDM-MIMO489
(5)
MIMOSDMMIMOZFMMSEV-BLASTMLD
MIMO-SDM
Space Time Coding: STC1)
STBC15)
Water Filling 16)
17)
1 GbpsMIMO18)
1)S. M. Alamouti, A simple transmit diversity technique for wireless communications, IEEE Journal on Selected Areas in Communications, vol.16, no.8, pp.1451-1458, October 1998.
2), , .
3)J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mogensen, and F. Frederiksen, A stochastic MIMO radio channel model with experimental validation, IEEE Jour. Selec. Areas Commun., vol.20, no.6, pp.1211-1226, 2002.
4)http://www.radrix.com/?page_id=158
5), , MIMO, , 2009.
6)S. Yoshizawa, Y. Yamauchi and Y. Miyanaga, VLSI Implementation of a Complete Pipeline MMSE Decoder for a 4x4 MIMO-OFDM Receiver, IEICE Trans. Fundamentals, vol.E91-A, no.7, pp.1757-1762, 2005.
7)J. Eilert, D. Wu, and D. Liu, Efficient complex inversion for MIMO software defined radio, IEEE ISCAS, pp.2610-2613, May 2007.
8)A. Burg, and et.al, Algorithm and VLSI architecture for linear MMSE detection in MIMO-OFDM systems, IEEE ISCAS pp.4102-4105, May 2006.
9)G. D. Golden, G. J. Foschini, R. A. Valenzuela and P. W. Wolniansky, Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture, Electronics Letters, vol.35, no.1, pp.14-15, January 1999.
10)F. Sobhanmanesh, S. Nooshabadi, K. Kiseon, A 212 Mb/s Chip for 4×4 16-QAM V-BLAST decoder, Proc. of MWSCAS 2007, pp.1437-1440, 5-8 Aug. 2007.
11)A. Burg, et.al. , VLSI implementation of the sphere decoding algorithm, ESSCIRC 2004. Proceeding of the 30th European Solid-State Circuits Conference, 2004. pp. 303-306, Sept. 2004.
12), , , Sphere DecodingMLD, , vol. J87-B, no.12, December 2004.
13)M. Wenk, M. Zellweger, A. Burg, N. Felber, W. Fichtner, K-best MIMO detection VLSI architectures achieving up to 424 Mbps, Proc. of ISCAS 2006, 4 pp. - 1154.
14)K. Higuchi, H. Kawai, N. Maeda, M. Sawahashi, Adaptive selection of surviving symbol replica candidates based on maximum reliability in QRM-MLD for OFCDM MIMO multiplexing, IEEE Proc. of GLOBECOM 2004, vol.4, pp.2480-2486, 29 Nov.-3 Dec. 2004.
15)S. Wahyul, M. Kurosaki and H. Ochi, Design of 600 Mbps MIMO Wireless LAN System using GLST Coding and Its FPGA Implementation, 2009 IEEE Radio and Wireless Symposium, San Diego, Jan. 2009.
16)S. Haykin, M. Moher, Modern Wireless Communications, Prentice-Hall, 2005.
17)Mohinder Jankiraman, Space-time Codes And MIMO Systems, Artech House Universal Personal Communications, 2004.
18)W. A. Syafei, Y. Nagao, M. Kurosaki, B. Sai, and H. Ochi, Design of 1.2 Gbps MIMO WLAN for 4K Digital Cinema Transmission, The 20th IEEE Personal Indoor and Mobile Radio Communications Symposium (PIMRC) 2009, Tokyo, Japan, September 13-16, 2009.
3-6
[January 2009]
3-6-1
1970
3131
31
31
64128RSA
3-6-2
NN > 1
(1) Fully Loop Unrolled (FLU)
(2) Loop
(3) Pipeline
(1) Fully Loop Unrolled (FLU)
32NN
(2) Loop
33N11FLU1/N1FLU1/NNFLU1/NLoop2N/2
32FLU33Loop34Pipeline
(3) Pipeline
34FLUN11FLUFPGAPipeline
111FLUN
inner-Pipelineinner-PipelineLoop
32
FLU
Loop
Pipeline
1
1/N
1
1
1/N
1/N
1
1/N1
N
5
5
ECBCTR
LU1
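ISO standardizes five modes of operation: ECB (Electronic CodeBook), CBC (Cipher Block Chaining), OFB (Output FeedBack), CFB (Cipher FeedBack), and CTR (Counter). CBC, OFB, and CFB feed the result of one block into the processing of the next, so they suit the FLU and Loop architectures; the Pipeline architecture is effective for ECB and CTR.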
ICTPM1LoopS-box
S-boxS-boxAESMISTYS-boxGF(2m)
1), 2), 3) S-boxGF(2m)
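As an illustration of an S-box built on GF(2^m) arithmetic, the sketch below computes the AES S-box from multiplicative inversion in GF(2^8) followed by the AES affine transformation. The reduction polynomial 0x11B and the constant 0x63 are those of the AES specification; the function names are hypothetical, and the bit-serial multiplication shown here is for clarity rather than a model of the compact circuits discussed in the references.

```python
AES_POLY = 0x11B   # x^8 + x^4 + x^3 + x + 1

def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) (shift-and-XOR with reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a           # addition in GF(2^m) is XOR
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY         # reduce modulo the field polynomial
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """Multiplicative inverse as a^254; 0 is mapped to 0."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exp >>= 1
    return result

def aes_sbox(a: int) -> int:
    """Inversion in GF(2^8) followed by the AES affine transformation."""
    b = gf_inv(a)
    rot = lambda v, n: ((v << n) | (v >> (8 - n))) & 0xFF
    return b ^ rot(b, 1) ^ rot(b, 2) ^ rot(b, 3) ^ rot(b, 4) ^ 0x63

sbox = [aes_sbox(v) for v in range(256)]
```

A table of 256 bytes such as `sbox` corresponds to a look-up implementation, whereas computing `gf_inv` directly corresponds to the arithmetic style of implementation.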
3-6-3
(1) GF(p)GF(2m)
(2)
(3) /2
(1)GF(2m) XORGF(p) GF(2m) RSA
(2)
(3)CPU4) CPU3-6-4
(1)
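Modular multiplication computes A·B mod M for multi-precision integers A, B, and M. RSA encryption and decryption are modular exponentiations of the form Y = X^E mod M.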
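One common way to reduce Y = X^E mod M to a sequence of modular multiplications is the left-to-right binary (square-and-multiply) method, sketched below. The original text at this point does not say which method it assumes, so this is an illustration only; the function name `mod_exp` and the sample operands are hypothetical.

```python
def mod_exp(x: int, e: int, m: int) -> int:
    """Compute x**e mod m by scanning the exponent bits from MSB to LSB."""
    y = 1
    for bit in bin(e)[2:]:
        y = (y * y) % m          # square for every exponent bit
        if bit == "1":
            y = (y * x) % m      # multiply when the bit is 1
    return y

result = mod_exp(4, 13, 497)
```

For an exponent of n bits this needs n squarings and at most n multiplications, each of which is a modular multiplication of the form A·B mod M.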
1
0