15群( )-8編member.ieice-hbkb.org/files/upload/backup_final data/10... · web view3章...

Click here to load reader

Upload: truonglien

Post on 17-Mar-2018

226 views

Category:

Documents


2 download

TRANSCRIPT

15()-8

10 - 5LSI

3

10 - 5 - 3

3-1

JPEGJPEG2000LSI

3-1-1 JPEG[20091 ]

JPEG 1) 88/3188DCT/DCT//JPEGI/FCPU/DSPI/F

31JPEG

LSI

JPEG/DCT/DCT//11

(1)2DCT/IDCT

2DCT8888DCT4096DCTDCT(1)(2) 1DCT1DCT22DCT1024

+

+

=

=

=

16

)

1

2

(

cos

16

)

1

2

(

cos

4

1

DCT

7

0

7

0

p

p

v

y

u

x

f

C

C

F

x

xy

y

v

u

uv

F

(1)

+

+

=

=

=

16

)

1

2

(

cos

16

)

1

2

(

cos

4

1

DCT

7

0

7

0

p

p

u

x

u

x

F

C

C

f

u

uv

u

v

v

xy

F

t

(2)

1/

2

for u, v = 0

=

v

u

C

C

C

(3)

1 for otherwise

1DCT88DCTChen 2) 256416DCTCPU DSPDCT

2DCT

LSI2DCT21DCTRAM2DCT1DCT1DCTASIC32IDCT 3)

DCT8DCT41IDCTIDCT4881IDCT8IDCTRAM8DCT1IDCT8881IDCTRAMRAM21IDCT2IDCT8882IDCT1IDCT34

322DCT

(2)

2DCT1DC63ACDCSSSS2 DCAC2DCT1RUNRRRRLEVELLLLLRRRRLLLLLLLL2DCTDCT

DCDCDCACRRRR/LLLLRUN/LEVELRUNLEVELACRUNACACIQLSI11616

3-1-2 JPEG2000[20091 ]

JPEG2000 4, 5, 6) JPEGDCTDWTDiscreet Wavelet TransformationEBCOTEmbedded Block Coding with Optimized TruncationDCTJPEG200033JPEG20002DWT2D-DWTLL; LH; HL; HHLL2D-DWT 7) QTZEBCOTJPEG20002D-DWTJPEG20002D-DWTEBCOTEBCOTDWT

33JPEG2000

(1)2DWT

2D-DWT1D-DWTRAMJPEGVLSI DWT 9) 34 (9-7)91V-filter2H-filter(L)H-filter(H)V-filter9p0-p82LV, HVLV H-filter(L) HV H-filter(H) V-filter92H-filter(L) L0-L8LL, HLH-filter(H) H0-H8LH, HHDWT

342D-DWT

35DWT 10) 72D-DWTDWT2D-DWT1DWTsystlicDWT 12) 8) 11)

35DWT

DWT 13) DWT364BH42

2

[(0,0),(1,0)] - [(0,1),(1,1)] - [(0,2),(1,2)] [(2n,i),(2n+1,i)]

2CnDnEnFn2WHWLCn+1Dn+1En+1Fn+1WHWLH-CnDnEnFnCn+1Dn+1En+1Fn+1

2WH

H

n

H

n

H

n

H

n

F

E

D

C

,

,

,

2WHHWLH

H

n

H

n

H

n

H

n

F

E

D

C

1

1

1

1

,

,

,

+

+

+

+

2WL

,

,

L

n

L

n

D

C

L

n

L

n

F

E

,

2WLLWHL

,

,

,

1

1

1

L

n

L

n

L

n

E

D

C

+

+

+

L

n

F

1

+

LHLH2DWT

36DWT

(2)

DWTQTZ1RAMEBCOT

(3)EBCOT

EBCOTBPC(1) DWT (2) (3) 3Significance passRefinement passCleanup pass2MQCMQ-Coder37EBCOTBP & MQ CoderEBCOT BP & MQ Coder3MQC4

37EBCOT

1)ISO/IEC IS 10918-1, Information Technology - Digital compression and coding of continuous-tone still images: Requirement and guidlines, ISO/IEC JTC1/SC9/WG10, Feb. 1994.

2)W. H. Chen, C. H. Smith and S. C. Fraick, Afast computation algorithm for the discrete cosine transform, IEEE Trans. On Communication, Vol. 25, No.9, pp.1004-1009, 1977.

3), , , , 4MPEGLSI, , No.639, pp.107-121, 1995.

4)ISO/IEC IS 15444-1, Information Technology - JPEG2000 image coding system - Part 1: Core coding system, ISO/IEC JTC1/SC29/WG1, Dec. 2000.

5)ISO/IEC IS 15444-1, Information Technology - JPEG2000 image coding system - Part 1: Core coding system, AMENDMENT 1: Codestream restrictions, Mar. 2002.

6)David S. Taubman and MichaelW. Marcellin, JPEG2000 Image Compression Fundamentals, Standard and Practice, KAP, 2002.

7)S. G. Mallat, A theory for multiresolution signal decompositopn: The wavelet representation, IEEE Trans. PAMI, Vol.11, pp.674-693, 1989.

8)I. Daubechies and W.Sweldens, Factoring Wavelet Transforms into Lifting Steps, J.Fourier.Appl., Vol.4, pp.247-269, 1998.

9)C. Chrysafis and A. Ortega, Line-basedreduced memory, wavelet image compression, IEEE Trans. on Image Processing, Vol. 9, issue 3, pp.378-389, Mar. 2000.

10)T. C. Denk and K. K. Parhi, Calculation of mini-mum number of registers in 2-D discrete wavelet trans-form using lapped block processing, in Proc. IEEE ISCAS94, Vol. 3, pp.77-80, May 1994.

11)K. Andraand T. Acharya, A VLSI Architrcture for Lifting-based Forward and Inverse Wavelet transform, IEEE Trans. on Signal processing, Vol. 50, No. 4, pp.966-977, April 2002.

12)T. Acharya and P. Chen, VLSI of a DWT Arhiteture, Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, May 31-June 3, 1998.

13)H. Yamauchiet al., 14401080 pixel, 30 frames per second motion-JPEG 2000 codec for HD-movie transmission, IEEE Journal of Solid-State Circuits, Vol.40, pp.331-341, 2005.

3-1-3 MPEG-2[200911 ]

(1)

MPEG-2DCT1616MBMPEG-2(1)38DCTQVLCVLDDCTIDCTIQMCIDCTIQ

38MPEG-2

MBMBMEMPEG-20.5ME

(2)

MVMEMBMBMBSADSum of Absolute DifferenceBMPMBNNm, nSAD

=

=

=

N

j

N

i

n

m

j

i

Distortion

n

m

SAD

1

1

)

,

,

,

(

)

,

(

-P m,n < P

(1)

Distortion (i,j,m,n) = |cur(i,j) ref(i+m, j+n)|

(2)

cur (i, j)ref (i+m, j+n) MBMB(m, n) Distortion (i, j, m, n) SAD (m ,n)

a bPE

PE0

PE1

PE2

PE3

clk

Diff

SAD

Diff

SAD

Diff

SAD

Diff

SAD

0

C00-R00

1

C10-R10

C00-R10

2

C20-R20

C10-R20

C00-R20

3

C30-R30

C20-R30

C10-R30

C00-R30

4

C01-R01

C30-R40

C20-R40

C10-R40

--

---

---

---

---

14

C23-R23

C13-R23

C03-R23

C32-R62

15

C33-R33

SAD00

C23-R33

C13-R33

C03-R33

16

C00-R01

C33-R43

SAD10

C23-R43

C13-R43

17

C00-R11

C00-R11

C33-R53

SAD20

C23-R53

18

C00-R21

C00-R21

C00-R21

C33-R63

SAD30

c

39ME

BMMBPESIMDSAVLSISASABMPEPESADDistortionPE 2)

39(a) 3) PEPEMB(b) PE(c) NNSADPE

310 3)(a) MBPE4)(b) MBPE 5)(a)

P

0,0

P

0,1

P

0,2

P

1,0

P

1,1

P

1,2

P

2,0

P

2,1

P

2,2

X

1,7

X

1,6

X

1,5

X

1,4

X

1,3

X

0,7

X

0,6

X

0,5

X

0,4

X

0,3

X

2,7

X

2,6

X

2,5

X

2,4

X

2,3

P

0,0

P

0,1

P

0,2

X

1,0

X

1,1

X

1,2

P

1,1

P

1,2

X

2,0

X

2,1

X

2,2

P

2,0

P

2,1

P

2,2

X

3,2

X

3,1

X

3,0

X

2,2

X

2,1

X

2,0

X

1,2

X

1,1

X

1,0

X

0,2

X

0,1

X

0,0

X

2,4

X

2,3

X

1,4

X

1,3

X

0,4

X

0,3

0

P

i,j

: : :

X

0,0

X

0,2

P

1,0

P

0,0

P

0,1

P

0,2

P

1,0

P

1,1

P

1,2

P

2,0

P

2,1

P

2,2

X

1,7

X

1,6

X

1,5

X

1,4

X

1,3

X

0,7

X

0,6

X

0,5

X

0,4

X

0,3

X

2,7

X

2,6

X

2,5

X

2,4

X

2,3

P

0,0

P

0,1

P

0,2

X

1,0

X

1,1

X

1,2

P

1,1

P

1,2

X

2,0

X

2,1

X

2,2

P

2,0

P

2,1

P

2,2

X

3,2

X

3,1

X

3,0

X

2,2

X

2,1

X

2,0

X

1,2

X

1,1

X

1,0

X

0,2

X

0,1

X

0,0

X

2,4

X

2,3

X

1,4

X

1,3

X

0,4

X

0,3

0

P

i,j

: : :

X

0,0

X

0,2

P

1,0

310

(3)

HD2001012/(2) LSI

31VLSISAD10.55) 13) 6) SAD 7), 10)SAD 8) 9)

31

11)12), 13) MBMVMBMV 13)

(4)

(a)11)

M = 3P22DD311

2210.531MB411/16PSNR0.3 dB

.

.

311

(b) 13)

221MB21/4312

312

PSNRMPEG0.40.8 dB

210.553SAD0.53topbottom

MPEG-2DCT3-1-1JPEG

3-1-4 H.264[200911 ]

(1)

H.264 14) 313MPEG-2DCTCABAC1/4RDMPEG-2210ME/MCCABACVLSI

ME/MC:

IPD:

Trans.:Q:

Inv. Trans.:

IQ: RC:

DBF:

EC:

IQ

Inv.

Trans,

EC

FIFO

RC

ME

MC

Q

Frame M

DBF

IPD

Trans.

ME/MC:

IPD:

Trans.:Q:

Inv. Trans.:

IQ: RC:

DBF:

EC:

IQ

Inv.

Trans,

EC

FIFO

RC

ME

MC

Q

Frame M

DBF

IPD

Trans.

313H.264

(2)ME/MC

MPEG-216161/2ME/MCH.26431416161688168848448443741MV61/4ME/MCMRFMultiple Reference FramesMPEG-2IMEInteger ME1/4FMEFractional pixel MEFMEMV

16 16

16 8

8 16

8 8

8 8

8 4

4 8

4 4

7MV1+2+2+4+4(2+2+4) = 41

16 16

16 8

8 16

8 8

8 8

8 4

4 8

4 4

7MV1+2+2+4+4(2+2+4) = 41

314H.264

IMELSIBM1616BMSA4416SADSAD41SAD

315SA41MEBMPE 15)32SAPESAD81621844SAD262SADPE261541SAD16PE161641SADSADBUSSADME1616BM2561640962SABM 16), 17)

BM3232LSISDTVTVMVHDHD3-1-2(3)SDTV88ME

AD

D

MuxA

MuxC

MuxB

D0D1

D7

D8D9D13

MuxDMuxFMuxE

C

R

SAD BUS

PE

0

D

PE

1

D

PE

15

D

Contro

l

SADBUS

0

SADBUS

1

5

R0

R1

Min

Min

MV

MV

.

Stage2

accumulator

Stage1

accumulator

12388810,11,14,15130258

2411616Full MB81261

1408162,3,6,7,10,11,14,15111260

0391688,9,10,11,12,13,14,1580259

11378414,15130257

4544313363

3444213259

23440,113256

1244113155

0144013051

SAD

Bus

VectorSizeBLKStage

2 reg

Stage

1 reg

CLK

AD

D

MuxA

MuxC

MuxB

D0D1

D7

D8D9D13

MuxDMuxFMuxE

C

R

SAD BUS

PE

0

D

PE

1

D

PE

15

D

Contro

l

SADBUS

0

SADBUS

1

5

R0

R1

Min

Min

MV

MV

.

Stage2

accumulator

Stage1

accumulator

12388810,11,14,15130258

2411616Full MB81261

1408162,3,6,7,10,11,14,15111260

0391688,9,10,11,12,13,14,1580259

11378414,15130257

4544313363

3444213259

23440,113256

1244113155

0144013051

SAD

Bus

VectorSizeBLKStage

2 reg

Stage

1 reg

CLK

12388810,11,14,15130258

2411616Full MB81261

1408162,3,6,7,10,11,14,15111260

0391688,9,10,11,12,13,14,1580259

11378414,15130257

4544313363

3444213259

23440,113256

1244113155

0144013051

SAD

Bus

VectorSizeBLKStage

2 reg

Stage

1 reg

CLK

315MV

FMEIME41MV1/21/41MVMBPMVMVMVMCMBMBMVFME

(3)ME/MC

(a)IME 18)

MBAFFIME2VLSICRCSComplementary Recursive Cross SearchMVMB

IME4CRCS1-DSSADRCS2SAD1/2field-168field-88frame-1616frame-168frame-816frame-886SAD

field-168Top FieldBottom fieldMB PairMB8MVMBFSMPFSSBFSSB3IME3RBM4FSFSMPMB16324SAD

IME128H64VJMUMHS41 %HDPSNR0.06 dB

from

lower

(a)SA(b)PESRE

PU0

SRU0PU1

SRU1

SRU2

PU2

PU3

SRU3

SRU5

SRU7

PEPEPE

PEPEPE

PE

PEPE

SRESRE

SRE

SRESRE

SRE

SRESRE

SRE

REG_VS

SWRAM

Cross path

MUX

Reg.

Abs.

Reg.

PE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

SAD

from

lower

from

template

MUX

Reg.

SRE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

from

lower

(a)SA(b)PESRE

PU0

SRU0PU1

SRU1

SRU2

PU2

PU3

SRU3

SRU5

SRU7

PEPEPE

PEPEPE

PE

PEPE

SRESRE

SRE

SRESRE

SRE

SRESRE

SRE

REG_VS

SWRAM

Cross path

MUX

Reg.

Abs.

Reg.

PE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

SAD

from

lower

from

template

MUX

Reg.

SRE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

PU0

SRU0PU1

SRU1

SRU2

PU2

PU3

SRU3

SRU5

SRU7

PEPEPE

PEPEPE

PE

PEPE

SRESRE

SRE

SRESRE

SRE

SRESRE

SRE

REG_VS

SWRAM

Cross path

MUX

Reg.

Abs.

Reg.

PE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

SAD

from

lower

from

template

MUX

Reg.

SRE

from

right

from

right

for

right

for

right

for

upper

from

SWRAM

316IME

IMEVLSIRRSARAMSWRAM316(a) RRSARRSASASBSA(b) SBSAPESRSBSA88PE88SR88SAD33(a)SARRSA24SBSASBSA88SADSWRAM1-DSRBMFS

(b)ME/MC 19)

422HDTVHDME/MCBM

317(a) IME2488BMMVIME33(b) SA44 PE2

(1)

(3)

(2)

(4)

MB

88

(1)

(3)

(2)

(4)

MB

88

a24

88 (1)

88 (2)

88 (3)

88 (4)

16 16

16 16

16 16

16 16

168

168

168

168

816

816

816

816

WP

16

PA

PA

PA

TQ

SIMD

MC SIMD

RISC

88 (1)

88 (2)

88 (3)

88 (4)

16 16

16 16

16 16

16 16

168

168

168

168

816

816

816

816

WP

WP

16

PA

PA

PA

TQ

SIMD

MC SIMD

RISC

b88FMEcFME

317ME/MC

IMEMV317(b) 1616168816881/21/4FME168IME88 (1)88 (2)MV168FME2FME317(c)

IME1/2SIMD8PE1/21/49PE1682161616881688L0L1MESADMVMV

MVMCMCSIMD8PEFME318IMEFMEMC1 MB

HQHQ

H

Q

HQHQ

H

Q

16

PA

PA

MC

1

2

3

4

1

2

3

4

L0

Direct

Mode

L1

168U

1616

168D

816L

816R

88UL

88UR

88DL

88DR

L0L0L0

L1

L1L1

L0

L1

L0L0L0

L1L1

L1

L0

L1

L0L0L0

L1L1

L1

L0

L0

L1

L1

L0

L0

L1

L1

1616

168U

1616

168D

1616

816L

816R

88

UL

168

U

816

L

88

UR

168

D

816

R

88

DL

1616

88

DR

H

Q1/4

L0List 0

L1List 1

UD

LR

ULUR

DLDR

IMEMV

HQHQ

H

Q

HQHQ

H

Q

16

PA

PA

MC

1

2

3

4

1

2

3

4

L0

Direct

Mode

L1

168U

1616

168D

816L

816R

88UL

88UR

88DL

88DR

L0L0L0

L1

L1L1

L0

L1

L0L0L0

L1L1

L1

L0

L1

L0L0L0

L1L1

L1

L0

L0

L1

L1

L0

L0

L1

L1

1616

168U

1616

168D

1616

816L

816R

88

UL

168

U

816

L

88

UR

168

D

816

R

88

DL

1616

88

DR

H

Q1/4

L0List 0

L1List 1

UD

LR

ULUR

DLDR

IMEMV

318MVMC

ME/MCSIMDPEPAFFMBAFF422Intra/Inter 271.75 / +199.75 109.75 / +145.75JMPSNR

(4)

H.264MPEG-2DCTICD16

44161644916164MCMB

(5)DF

DFDCT361 MB32164821MB1921DF 20), 21)

(6)CABAC

CABAC3202222 22), 23)

22Bin2

2Bin01RangeLowRangeLow9100x1FE0x000MPSLPSLPSrLPSstateRangeMPSRangerLPS

312LPSRangerLPSLowMPSState0MPSLPSMPSLowRangeMPSMPSstateRange= 0x200

Low = Low-0x100

Low= Low-0x200

Rang e= Range 0

B

Y

N

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Range

Low

Range#0

Low#0

Range#1

Low#1

xRange

xLow

d#1

d#2

n

d#n

n

Range

Low

xd

Range

Low

xd

Range#2

Low#2

Range#n

Low#n

xd

xd#1

xd#m

xvalid

Selector

R < 0x100

Low < 0x100

Low >= 0x200

Low = Low-0x100

Low= Low-0x200

Rang e= Range 0

B

Y

N

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Renorm

Sub

Range

Low

Range#0

Low#0

Range#1

Low#1

xRange

xLow

d#1

d#2

n

d#n

n

Range

Low

xd

Range

Low

xd

Range#2

Low#2

Range#n

Low#n

xd

xd#1

xd#m

xvalid

Selector

313CABAC

bin

3-1-5 [200911 ]

CBRVBR

MBSAD212

GOPMB3GOPGOP1MB2MBMBMB3VBRCBRFIFO

3-1-6 LSI[200911 ]

LSI3-1-33-1-4DHBMPUDSPMPAMPADSPSIMDHMAVPACPADHBMPA

(1)DHB

MPEG-2LSI314 24)MEVLCMBMBFIFOSDRAM

VIF

M

E

MC

Host I/F

Memory Interface

DCTQ

IDCT!Q

VLC

SDRAM

II:

I:

III:

RISC

-CPU

-FIFOMB

A

B

MB

: MBFIFO

:

:

VIF

M

E

MC

Host I/F

Memory Interface

DCTQ

IDCT!Q

VLC

SDRAM

II:

I:

III:

RISC

-CPU

-FIFOMB

A

B

MB

: MBFIFO

:

:

314DHBMPEG-2LSI315

315RCRISCLSIDHB

H.264LSI316 25)M-CoreME/MCIPDTQMBC-CoreLFECMB32bit RISCMRISCCRISCTRISC2DRAMeDRAMDDR

TRISC

TRISC

TS

Host I/F

Host I/F

C-core

C-core

M-core

M-core

VIF

VIF

Video

Audio/User

Data

MUX

MUX

MIF

MIF

FME

TQ

SMEIPD

MRISC

EC

CRISC

eDRAM

eDRAM

Host CPU

TME

IR

MBP

RIT

IR

MBP

RIT

LF

MDT

MDT

Mobile DDR

From/to

upper/lower chip

TRISC

TRISC

TS

Host I/F

Host I/F

C-core

C-core

M-core

M-core

VIF

VIF

Video

Audio/User

Data

MUX

MUX

MIF

MIF

FME

TQ

SMEIPD

MRISC

EC

CRISC

eDRAM

eDRAM

Host CPU

TME

IR

MBP

RIT

IR

MBP

RIT

LF

MDT

MDT

Mobile DDR

From/to

upper/lower chip

316DHBH.264/MPEG-2LSI

(2)HMA

LSI3.17 26)MIPSSPARC32RISCDSPMEDCTIPDMCVLIWSIMDH.264MB 27)MECABAC

CPUPCIHDTVIPCDDR

LSIMPEG-2H.264VC14221080/60pCPUDSP

MIPS

Processor core

MIPS

Processor core

Video

I/O

Scaling

Video

I/O

Scaling

Video DSP

Video DSP

Motion

Estimation

Motion

Estimation

Entropy

Coding

Entropy

Coding

Memory

Controller

Memory

Controller

Transport

I/O

Transport

I/O

IPC

I/F

IPC

I/F

Audio

I/O

Audio

I/O

Serial

I/O

Serial

I/O

Host

I/F

Host

I/F

SPARC

Processor core

SPARC

Processor core

TS

Video

Audio

Host CPU

stream

Adjacent

chip

MIPS

Processor core

MIPS

Processor core

Video

I/O

Scaling

Video

I/O

Scaling

Video DSP

Video DSP

Motion

Estimation

Motion

Estimation

Entropy

Coding

Entropy

Coding

Memory

Controller

Memory

Controller

Transport

I/O

Transport

I/O

IPC

I/F

IPC

I/F

Audio

I/O

Audio

I/O

Serial

I/O

Serial

I/O

Host

I/F

Host

I/F

SPARC

Processor core

SPARC

Processor core

TS

Video

Audio

Host CPU

stream

Adjacent

chip

317H.264LSI

(3)VLIWVPA

318 28)VP TileSP Tile2PEPEVLIW2DMALSIAHBOCPAXIVP TileSIMDSP Tile

VP Tile

DMA

Vector

Processor

1

VP Tile

DMA

Vector

Processor

2

System

Controller

Memory

Subsystem

Display

Driver

SP Tile

DMA

Vector

Processor

System Bus

VLIW Processor

Program Memory

Scalar P

Processing

Vector P

Processing

Data Memory

(Scalar Access)

Data Memory

(Vector Access)

2D Vector

Memory

DMA

Program Counter

VLIW Instruction

Load/Store

Load/Store

Load/Store

ctrl

VP Tile

VLIW

Processor

Program

Memory

Scalar P

Data Memory

(Scalar Access)

DMA

Program Counter

Load/Store

ctrl

SP Tile

SP Tile

DMA

Vector

Processor

SP Tile

DMA

Scalar

Processor

VP Tile

DMA

Vector

Processor

1

VP Tile

DMA

Vector

Processor

2

VP Tile

DMA

Vector

Processor

1

VP Tile

DMA

Vector

Processor

2

VP Tile

DMA

Vector

Processor

1

VP Tile

DMA

Vector

Processor

2

System

Controller

Memory

Subsystem

Display

Driver

SP Tile

DMA

Vector

Processor

SP Tile

DMA

Vector

Processor

System Bus

VLIW Processor

Program Memory

Scalar P

Processing

Vector P

Processing

Data Memory

(Scalar Access)

Data Memory

(Vector Access)

2D Vector

Memory

DMA

Program Counter

VLIW Instruction

Load/Store

Load/Store

Load/Store

ctrl

VP Tile

VLIW

Processor

Program

Memory

Scalar P

Data Memory

(Scalar Access)

DMA

Program Counter

Load/Store

ctrl

SP Tile

SP Tile

DMA

Vector

Processor

SP Tile

DMA

Vector

Processor

SP Tile

DMA

Scalar

Processor

VP Tile

DMA

Vector

Processor

1

VP Tile

DMA

Vector

Processor

2

318VLIW

(4)CPA

RISCSIMD 30) 29)

319RISC128SIMDSIMDSIMDSIMDDCTMCMECABACSIMDDMASIMDSIMDMPEG-4H.26410

Hardware Platform

SIMD MP #1,128-bit

SIMD MP #2, 128-bit

Local Scratchpad Memory (SDM1)

ARC 700 CPU

Instruction

Cache

Data

Cache

Entropy

Encode

(EE)

Entropy

Decode

(ED)

Motion

Estimation

(ME)

Video

Optimized

DMA

Engine

Channel

Configurable Ports, AHB, AXI, BVCI

Hardware Platform

SIMD MP #1,128-bit

SIMD MP #2, 128-bit

Local Scratchpad Memory (SDM1)

ARC 700 CPU

Instruction

Cache

Data

Cache

Entropy

Encode

(EE)

Entropy

Decode

(ED)

Motion

Estimation

(ME)

Video

Optimized

DMA

Engine

Channel

Configurable Ports, AHB, AXI, BVCI

319

(5)GPPGeneral Purpose Processor

SIMDMMXMulti Media eXtension31) SSEStreaming SIMD Extension

MMXSIMDx86IBMPowerPCAltiVec(1)(4)GPPMMXSSEMPEG-2H.264HDTVPC 32)

MMXGPPRSVPReconfigurable Streaming Vector Processor33) GPUGPPGPUGPP

(6)

Time-to-MarketGPPDHBDHBGPPH.264DHBGPPMPAH.264IPGPPDHBCPUMPACPAVPADHBGPPMPAHDTVH.264GPPMPADHBRTLDHBDHB

GPPMPAGPPDHBHMACPA

1)ISO/IEC 13818-1/2/3 International Standard, Information technology Generic Coding of Moving Pictures and Ass0ciated Audio: Systems/Visual/Audio, Nov. 1994.

2)Ching-Yeh Chen, Shao-Yi Chien, Yu-Wen Huang, Tung-Chien Chen, Tu-Chih Wang, and Liang-Gee Chen, Analysis and Architecture Design of Variable Block-Size Motion Estimation for H.264/AVC, IEEE Trans. Circuits and Systems, vol. 53, no. 2, pp.578-593, Feb. 2006.

3)T.Minami, T. Kondo, K. Suguri, and R. Kasai, A Proposal of a one-dimensional Array Architecture for Full-search Block Matching Algorothm, IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J78-D-1, no. 12, pp. 913-925, Dec. 1995.

4)S. Uramoto, A. Takabatake, M. Suzuki, H.Sakurai, and M. yoshimoto, A Half-pel Precision motion estimation processor for NTSC-resolution video, IEICE Trans. Electron., vol. E77-C, no.12, pp.1930-1936, Dec. 1994.

5), , S. Scotznivosky, , , MPEG-2LSIME3, , ICD98-116, pp.29-36, 1998-08.

6)E. Ogura, M. Takashima, D. Hiranaka, T. Ishikawa, Y. Yanagita, S. Suzuki, T. Fukuda, and T. Ishii, A 1.2W Single-Chip MPEG-2 MP@ML Video Encoder LSI including Wide Search Range Motion Estimation and 81MOPS Controller, Digest of ISSCC, pp.32-33, Feb. 1998.

7)L. K. Liu and E. Feig, A block-based gradient descent search algorithm for block motion estimation in video coding, IEEE Trans. Circuits Syst. Video Technology, vol. 6, pp. 419423, Aug. 1996.

8), , , , , , ICD97-94, pp.1-8, 1997-8.

9)Z. He and M.-I. Liou, Reducing hardware complexity of motion estimation algorithms using truncated pixels, in Proc. IEEE Int. Symp.Circuits Syst., pp. 28092812, 1997.

10)ISO/IEC ITU-T VCEG, Fast Integer Pel and Fractional Pel Motion Estimation for JVT, JVT-F017, 2002.

11)Kazuhito Suguri, Toshihiro Minami, Hiroaki Matsuda, Ritsu Kusaba, Toshio Kondo, Ryota Kasai, et.al , A real-time motion estimation and compensation LSI with wide search range for MPEG2 video encoding, IEEE Micro, vol. 16, no. 2, pp. 51-58, April 1996.

12), , , , 1MPEG-2LSI, , ICD98-50, pp.1-8, 1998-06.

13), , , , , , , MPEG-2LSI, , ICD97-163, pp.33-40, 1997-10.

14)ISO/IEC 14496-10:2003, Information technology Coding of audio-visual objects Part 10: Advanced Video Coding, Dec. 2003.

15)S. Y. Yap and J. V. McCanny, A VLSI Architecture for Variable Block Size Video Motion Estimation, IEEE Trans. Circuits and Systems, vol. 51, no. 7, pp.384-389, July 2004.

16)Tung-Chien Chen, Yu-Wen Huang, and Liang-Gee Chen, Fully utilized and reusable architecture for fractional motion estimation of H.264/AVC, Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP04), V-9-12, vol. 5, May 2004.

17)Y.W. Huang, T. C. Chen, C. H. Tsai, C. Y. Chen, T.W. Chen, C. S. Chen, C. F. Shen, S.Y. Ma, T. C.Wang, B.Y. Hsieh, H. C. Fang, and L. G. Chen, A 1.3 tops H.264/AVC signle-chip encoder for HDTV applications, in Proc. Int. Solid-State Circuits Conf., 2005, pp. 128129.

18)Y. Murachi, J. Miyakoshi, M. Hamamoto, T. Iinuma, T. Ishihara, F. Yin, J. Lee, H. Kawaguchi, and M Yoshimoto, A Sub 100mW H.264 [email protected] Integer-pel Motion Estimation Processor core MBAFF Encoding with Reconfigurable Ring-Connected Systolic Array and Segmentation-Free, Rectangle Access Search-Window Buffer, IEICE Trans. Electron., Vol. E91-C, No.4, pp.465-478, April 2008.

19)Takayuki Onishi, Takashi Sano, Koyo Nitta, Mitsuo Ikeda, and Jiro Naganuma, Multi-Reference and Multi-Block-Size Motion Estimation with Flexible Mode Selection for Professional 4:2:2 H.264/AVC Encoder LSI, IEEE International Symposium on Circuits and Systems (ISCAS 2008), pp.800-803, May 2008.

20)L. Li, S. Goto, and T. Ikenaga, A Highly Parallel Architecture for Deblocking Filter in H.264/AVC, IEICE Trans. on Information and System, vol. E88-D, no. 7, pp.1623-1629, 2005.

21)S. Y. Shih, C. R. Charng, and Y. L. Lin, A Near Optimal Deblocking Filter for H.264 Advanced Video Coding, Asia South Pacific Design Automation Conference, 2006, pp. 170-175

22)D. Marpe, T. Wiegand, and H. Schwarz, Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard, IEEE Transactions on Circuits and Systems for Video Technology, 13(7):620-636, July 2003.

23), , 22007-214068, 20078.

24)Mitsuo Ikeda, Toshio Kondo, Koyo Nitta, Kazuhito Suguri, Takeshi Yoshitome, Toshihiro Minami, Hiroe Iwasaki, Katsuyuki Ochiai, Jiro Naganuma, Makoto Endo, Yutaka Tashiro, Hiroshi Watanabe, Naoki Kobayashi, Tsuneo Okubo, Takeshi Ogura, and Ryota Kasai, SuperENC MPEG-2 Video Encoder Chip, IEEE Micro, vol. 19, no. 4, pp. 56-65, July 1999.

25)Koyo Nitta, Mitsuo Ikeda, Hiroe Iwasaki, Takayuki Onishi, Takashi Sano, Atsushi Sagata, Yasuyuki Nakajima, Minoru Inamori, Takedhi Yoshitome, Hiroaki Matsuda, Ryuichi Tanida, Atsushi Shimizu, Ken Nakamura, and Jiro Naganuma, An H.264/AVC High422 Profile and MPEG-2 422 Profile Encoder LSI for HDTV Broadcasting Infrastructures, 2008 Symposium on VLSI Cicuits Digest of Technical Papers, pp.106-107, June 2008.

26)S. Bewick, An Advanced Media Processor Architecture for Consumer Applications, Microprocessor Forum 2006, Oct. 2006.

27)J.W. van de Waerdt, S. Vassiliadis, et.al., The TM3270 media-processor, In MICRO '05 Proceedings of the 38th International Symposium on Micro-architecture, pp. 331-342, Nov. 2005.

28)J. Leijten, New Core Enables Programmable Camera Sensor Signal Processing, Microprocessor Forum 2006, Oct. 200

29)Nigel Topham, Efficient Multi-Processing for Media Applications, Spring Pprocessor Forum 2006, May 2006.

30)Dennis Moolenaar, A Dual-Core Video Decoder/Encoder, Microprocessor Forum 2007, May 2007.

31)A. Peleg and U. Weiser, MMX Technology extension to the Intel Architecture, IEEE Micro, vol. 16, No. 4, pp.42-50, Aug. 1996.

32)Iwasaki, H., Naganuma, J., Ochiai, K., Endo, M. and Ogura, T., "Real-Time Software MPEG2 Video Encoder on Parallel and Distributed Computer System, International Conference on Computer Communication (ICCC) 1999, pp.361-369, 1999.

33)J. Ciricescu, R. Essick, B. Lucas, P. May, K. Moat, J. Morris, M. Schuette, and A. Saidi, The Reconfigurable Streaming Vector Processor (RSVP), In MICRO'03: Proceedings of the 36th International Symposium on Micro-architecture, pp.141-150, Dec. 2003.

10 - 5 - 3

3-2

[20094 ]

2

3-2-1

2image processingimage recognitionTV3-2-1

816

1632

1-3)32

3-1

3-2-2 LSI

31

31

816

1632

32

DoGHarr

FFT

SVM 2)GLVQ 3)Adaboost 1)

32LSI

3-2 LSI

LSI

LSI

32

816

1632

VLIW

416

SIMD

SIMDSIMD

3-2-3 LSI

LSISIMDLSISIMDMIMDSIMDSIMD+MIMD

4) 5) 6)

PE7), 8)PE9)10)

SIMDPEPEPE11), 12)PEK13), 14)

MIMDPEPE15), 16)SIMDPESIMD17)PE18)

SIMDSIMD19), 20)

SIMDSIMDMIMDSIMD21)SIMDMIMD222)SIMDMIMDMIMD23)

TVTISVPSerial Video Processor11)NECIMAPCARIntegrated Memory Array Processor for CAR14) 3-2

SVPSIMDSVP3-2(a) DIRDOR102440102424PE102412801RF0RF11ALU4M, A, B, C1024PE0PE1023PE4PEPE1282RF0RF1DIRDOR

32

IMAPCAR3-2(b) 2 KBPE128PEPECPDMAEIFSVPSIMDIMAPCARPEPE233(a)PE24)331(b)(d)SIMD

33

IMAPCAR34

34

3-2-4

LSILSILSILSI

1) P.Viola, M.Jones: Rapid Object Detection using a Boosted Cascade of Simple Features, IEEE Computer Vision and Pattern Recognition Conference (CVPR), pp.I-511-518 (2001)

2) C. Cortes, V. Vapnik: "Support-Vector Networks, Machine Learning, Vol.20, No.3, pp.273-297(1995)

3) A. Sato, K. Yamada: "Generalized Learning Vector Quantization", Advances in Neural Information Processing Systems, Vol.8, pp.423-429, MIT Press (1996)

4) , , : ,, ICD2003-28, pp. 13-18 (2003).

5) , , , , , , : IP5000 , , pp.45-50 (1999)

6) Y. Hanai, Y. Hori, J. Nishimura and T. Kuroda: A Versatile Recognition Processor Employing Haar-Like Feature and Cascaded Classifier, IEEE International Solid-State Circuits Conference (ISSCC), pp.148-149 (2009)

7) P.P.Jonker, E.R.Komen: A scalable real-time image processing pipeline, Proc. of IAPR Conf. on Pattern Recognition (ICPR92), Vol.4, pp.142-146 (1992)

8) S.M.Chai, D.S.Wills: Systolic opportunities for multidimensional data streams, IEEE Transactions on Parallel and Distributed Systems, Vol.13, No.4, pp.388-398 (2002)

9) ,, , , , , : ,(D), Vol.J73-D2 , No.10, pp.1751-1760 (1990)

10)T.Temma, T.Iwashita, K.Matsumoto, H.Kurokawa, T.Nukiyama: Data Flow Processor Chip for Image Processing, IEEE Trans. on Electron Devices, Vol. ED-32, No.9, pp.1784-1791(1985)

11) J.Childers, P.Reinecke, H.Miyaguchi, S.Yamamoto, Y.Takahashi, Y.Yaguchi, M. Takeyasu: SVP: Serial Video Processor, IEEE 1990 Custom Integrated Circuits Conference, 17.3, pp.1-4 (1990)

12) R.P.Kleihorst, A.A.Abbo, A. van der Avoird, M.J.R.Op de Beeck, L. Sevat, P. Wielage: Xetal: A Lowpower HighPerformance Smart Camera Processor, IEEE Int. Symp. Circuits System, Vol.5, pp.215218 (2001)

13) D.W.Hammerstrom and D.P.Lulich: Image Processing Using One-Dimensional Processor Arrays, Proceedings of the IEEE, Vol.84, No.7,pp. 10051018 (1996)

14)S.Kyo, S.Okazaki, T.Koga, F.Hidano: A 100 GOPS In-vehicle Vision Processor for Pre-crash Safety Systems Based on a Ring Connected 128 4-Way VLIW Processing Elements, Digest of Technical Papers of Symp. on VLSI Circuits, pp.28-29 (2008).

15) K.Deguchi, K.Tago, I.Morishita: Integrated parallel image processings on a pipelined MIMD multi-processor system PSM, Proc. of 10th International Conference on Pattern Recognition, Vol.2, pp.442-444 (1990)

16) T.Matsuyama, N.Asada, M.Aoyama: Parallel Image Analysis on Recursive Torus Architecture, Proc. of Second IEEE Workshop on Computer Architecture for Machine Perception (CAMP), pp.202-214 (1993)

17) S.K.Raman, V.Pentkovski, J.Keshava: Implementing streaming SIMD extensions on the Pentium III processor, IEEE Micro, Vol.20, No.4, pp.47-57 (2000)

18)Y. Kondo et al.: 4 GOPS 3 way-VLIW image recognition processor based on a configurable media-processor, ISSCC Digest of Technical Papers, pp.148-149 (2001)

19) Chih-Chi Cheng, Chia-Hua Lin, Chung-Te Li, Samuel Chang, Chia-Jung Hsu, Liang-Gee Chen iVisual: An Intelligent Visual Sensor SoC with 2790fps CMOS Image Sensor and 205GOPS/W Vision Processor, ISSCC Digest of Technical Papers, pp.306-307 (2008)

20) Kwanho Kim, Seungjin Lee, Joo-Young Kim, Minsu Kim, Donghyun Kim, Jeong-Ho Woo, Hoi-Jun Yoo A 125GOPS 583mW Network-on-Chip Based Parallel Processor with Bio-inspired Visual-Attention Engine, ISSCC Digest of Technical Papers, pp.308-309 (2008)

21) H.J.Siegel, L.J.Siegel, F.C.Kemmerer, P.T.Mueller Jr., H.E.Smalley Jr., S.D.SmithPASM: A partitionable SIMD/MIMD system for image processing and pattern recognition, IEEE Trans. on Computer, Vol.C-30, pp.934-947, (1981)

22) C.C.Jr.Weems: The Second Generation Image Understanding Architecture, Proc. of Second IEEE Workshop on Computer Architecture for Machine Perception (CAMP), pp.276285 (1993)

23) S. Kyo, T. Koga, H. Lieske, S. Nomoto, S. Okazaki: A Mixed-Mode Parallel Processor Architecture for Embedded Systems, ACM Int. Conf. on Supercomputing(ICS), pp. 253-262 (2007)

24) S. Kyo, S. Okazaki, T. Arai: An Integrated Memory Array Processor for Embedded Image Recognition Systems, IEEE Trans. on Computers, Vol.56, No.5, May 2007.

10 - 5 - 3

3-3

[200912 ]

3-3-1

CELPCode Excited Linear Prediction coding31CELP

31CELP

MIPSLSIDSP

VSELP11.2 kbpsPsi-CELP5.6 kbps2.5G3GAMREVRC2G

32CPUDSPDSP

I/F

I/F

IrDAGPSFelica

CODEC

CODEC

I/F

I/F

IrDAGPSFelica

CODEC

CODEC

32

IPIPG.711PCMG.723ADPCMG.729CS-ACELPIPPCPCIPIPLANLANIP32LAN

3-3-2

2001iPodMP333MP3

PCM

MDCT

IMDCT

PCM

PCM

MDCT

IMDCT

PCM

33MP3

MDCTmmLSI34

I/F

I/F

I/F

I/F

34

MP3AACAdvanced Audio CodingATRACAdaptive Transform Audio CodingWMAWindows Media AudioLSI34LSILSITVDVDBDHD-DVDHD-Audio5.1 ch7.1 ch11.2 chDTSDigital Theater SystemTrueHDMHzDSP

3-3-3

ECNCTVECM

(1)

35

1stPass

2ndPass

1stPass

2ndPass

35

353521219902000PC 1PC 1352,30%2GDistributed Speech RecognitionETSIThe European Telecommunications Standard Institute2000200636

36

37CPU1990PC37CPUPCCPUCPUPCMBMBMB

37

(2)

IC.2ch

1999200438DSPDSPCODECLSICPULSI

FIR

FIR

38

(3)

TVTV16 kHz22 kHzTVTV32 ch12 chDSPLSIDSPDSPLSIFPGA

(4)

2 ch4 chECMElectlet Condensor MicrophoneMEMSMicro Electro Mechanical SystemA/D1mmECM

3-3-4

VADVoice Activity DetectionVAD

CPUDSPLSI

10 - 5 - 3

3-4

[20094 ]

3-4-1

graphicCGComputer Graphics333D3 dimensionCG3renderingLSIGPUGraphics Processing Unit

(1)

CGpointlinesurface2line segment

3trianglepolygontriangle strip

(2)

CGCGframe buffer

:pixel2double buffer

(3)rasterize

2XY2rasterize

(4)APIApplication Programming Interface

APIAPIAPIAPI

3-4-2 2D

22 dimension graphics2

(1)

(a) bitmap

2

(b) vector

scalable

(2)font

bitmap fontoutline fontFreeType

(3)2CGAPI

API

(a) SVGScalable Vector Graphics

2DW3CXML

(b) OpenVG

2DAPIThe Khronos Groupopenvg

(4)

PC2DCPUGPUCPULSIGPU2

3-4-3 3D

33 dimension computer graphics3DCGCGCG

(1)3D

331/301/602CG

332

polygonpixel

(2)

MPEG-23CGLSI

(3)

3D CG1963Sketchpad19701980LSI3CADWS3CGGWSGWSCGCG

19903CG3CGPCCPU19903PCGPU3PCGWSPC3CG3CG

(4)

3D CGCG

(a) vertex processing

T&Ltransform and lightinggeometry processing

(b)

(c) pixel processing

pixelfragment

(d)

(5)

GWS

(a) LSI

GWSLSILSIGPU2LSICPU

LSI1999GPULSIAPIDSPAPI

(b)

2001Programmable ShaderLSIVertex ShaderPixel ShaderFragment ShaderLSILSIGPU

(c)

Unified Shader Model

Unified Shading Architecture(a)

(6)CG

CG

(a)

depth bufferZ buffer

(b) Texture Mapping

GPUCGGPU

(c)

(d) shadeshadow

CPU

(e)

(7)GPU

2009GPU LSIGPUCPUGPU2

NVIDIAGeForce GTX 2851455 nm470 mm2LSI183 W1.5 GHz2401 Tflops

AMDRadeon HD 48709.5655 nm282 mm2TDPThermal Design Power160 WVLIW750 MHz8001 Tflops

GPU

GPU

IntelLarrabeeGPUGPUCPUx86GPUCPUCPUGPUCG

(8)

3CGDDRGDDRHDCG2

(9)CGAPI

CGAPIRenderManShading LanguageCCG

OpenGLDirectX GraphicsCGAPI

(a) OpenGL

OpenGLSGIIrisGLAPIThe Khronos GroupOpenGLOSCADOpenGL 1.xOpenGL 2.0OpenGLGLSLOpenGL Shading LanguageOpenGL 3.0GPUAPI

OpenGL APIMesa3DGallium3DOpenGL ESOpenGL for Embedded SystemsOpenGL ES 2.0OpenGL 2.0

(b) DirectX Graphics

DirectXWindows OSAPIDirectX3APIDirect3DDirectX GraphicsDirectX GraphicsHLSLHigh Level Shading Language

(c)

nVIDIACgC for GraphicsGPUOpenGLDirectX

(10)CG

CGScene Graph LibraryGame Engine3ShadowCGCGCGAPIPCPC

(1) OpenSG

(2) OpenSceneGraph

3-4-4 GPGPUGeneral Purpose computing on GPU

GPUCGGPGPU

(1)

GPU43CGRGB333x3RGBA4444

(2)GPGPUAPI

CGGPU

(a) CUDACompute Unified Device Architecture

NVIDIAGPGPUGPUPhysXGPUCUDA

(b) OpenCLOpen Computing Language

CGPUCPUDSPThe Khronos Group

(c)

AMDATI StreamCUDAMSDirectXDirext3DDirectX Compute Shader

(3)

GPGPU

CPUGPU

(a)

(b)

CPUHDMPEG-4 AVCH.264GPU

(c)

GPUGPU

(d)

3-4-5 3D

CPUGPU

(a) Global Illumination

3Ray TracingRadiosityPhoton MappingGlobal Illumination3

(b) Non Photo Realistic

32

(c) Image Based Rendering

(d) Volume Rendering

bitmap33

3-4-6

3GPU3CGCGCGGPUCPUCG

1)Real-Time Rendering

http://www.realtimerendering.com/

2)Khronos Group

http://www.khronos.org/

3)OpenGL

http://www.opengl.org/

http://www.khronos.org/opengl/

4)OpenGL ES

http://www.khronos.org/opengles/

5)OpenCL

http://www.khronos.org/opencl/

6)Mesa3D

http://www.mesa3d.org/

7)SVG

http://www.w3.org/Graphics/SVG/

8)Wikipedia

GPU

http://ja.wikipedia.org/wiki/Graphics_Processing_Unit

10 - 5 - 3

3-5

3-5-1[20098 ]

(1)

1)LSILSI1ACSadd-compare-select

(a) 31

ACSLIFOLast In First Out

Hx

H

r

H

H

H

=

(

)

@@

@

r

H

H

H

x

1

H

H

-

=

(

)

H

1

H

ZF

H

H

H

W

-

=

@

2

33

34

3

~

~

min

arg

x

x

R

R

y

-

-

(b) 32

kk1k

(2)

19932), 3)RSCRecursive Systematic Convolutional33r0RSCr1r2RSC

2

44

4

~

min

arg

x

R

y

-

=

4

3

2

1

44

34

33

24

23

22

14

13

12

11

4

3

2

1

0

0

0

0

0

0

x

x

x

x

R

R

R

R

R

R

R

R

R

R

y

y

y

y

341r0r1e2e1102e1r0r2e2

1

1

1

1

1

1

1

1

1

1

1

,

'

'

'

'

-

-

-

-

-

-

-

-

-

-

-

+

=

-

=

=

-

-

=

CA

BE

A

A

F

B

CA

D

E

D

C

B

A

E

CA

E

BE

A

F

C

W

N

M

N

N

M

M

h

h

h

h

h

h

h

h

h

,

,

2

,

1

2

,

2

,

2

2

,

1

1

,

1

,

2

1

,

1

L

M

O

M

M

L

L

MAPMaximum A Posteriori4)Log-MAPMax-Log-MAPSOVASoft-Output Viterbi Algorithm5)31kRSCRSCMAXLUEXPMAPLog-MAPMax-Log-MAPSOVAMAPLog-MAPMax-Log-MAPSOVA

=

D

C

B

A

W

(a) Sliding Window

SRAMLSIN35N

Sliding Window6) NLLK156NSliding Window36372

N

M

N

N

M

M

h

h

h

h

h

h

h

h

h

,

,

2

,

1

2

,

2

,

2

2

,

1

1

,

1

,

2

1

,

1

L

M

O

M

M

L

L

=

D

C

B

A

W

1

1

1

1

1

1

1

1

1

1

1

,

'

'

'

'

-

-

-

-

-

-

-

-

-

-

-

+

=

-

=

=

-

-

=

CA

BE

A

A

F

B

CA

D

E

D

C

B

A

E

CA

E

BE

A

F

C

W

(b) Block-Interleaved Pipelining

Block-Interleaved PipeliningBIP7)BIP1N/212/N+1N212238BIP2

=

4

3

2

1

44

34

33

24

23

22

14

13

12

11

4

3

2

1

0

0

0

0

0

0

x

x

x

x

R

R

R

R

R

R

R

R

R

R

y

y

y

y

2

44

4

~

min

arg

x

R

y

-

39122D1f2ff 1/2310

2

33

34

3

~

~

min

arg

x

x

R

R

y

-

-

1

N

1

N

L

ACS

BIPImproved Sliding Window8)BIP

1)Andrew J. Viterbi: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Information Theory, Vol. IT-13, No. 2, pp.260-269 (1967)

2)Claude Berrou, Alain Glavieux and Punya Thitimajshima: Claude Berrou: Near Shannon Limit error-correcting coding and decoding: Turbo-codes, Proc. IEEE Int. Conf. Comm., pp.1064-1071 (1993)

3)Claude Berrou and Alain Glavieux: Near Optimum Error Correcting Coding and Decoding: Turbo-Codes, IEEE Trans. Communications, Vol. 44, No. 10, pp.1261-1271 (1996)

4)L. R. Bahl, J. Cocke, F. Jelinek and J. Raviv: Optimal Decoding of Linear Codes for minimizing symbol error rate, IEEE Trans. Information Theory, Vol. IT-20, No. 2, pp.284-287 (1974)

5)Branka Vucetic and Jinhong Yuan: Turbo Codes: Principle and Applications, Kluwer Academic Publishers

6)S. Benedetto, D. Divsalar, G. Montorsi and F. Pollarab: Soft-Output Decoding Algorithms in Iterative Decoding of Turbo Codes, The Telecommunications and Data Acquisition Progress Report 42-124, NASA Jet Propulsion Laboratory, pp.63-87 (1996)

7)Seok-Jun Lee, Naresh R. Shanbhag and Andrew C. Singer: A 285-MHz Pipelined MAP Decoder in 0.18-um CMOS, IEEE J. Solid-State Circuits, Vol. 40, No. 8, pp.1718-1725 (2005)

8)Hiroshi Gambe, Yoshinori Tanaka, Kazuhisa Ohbuchi, Teruo Ishihara and Jifeng Li: An Improved Sliding Window Algorithm for Max-Log-MAP Turbo Decoder and Its Programmable LSI Implementation, IEICE Trans. Electron., Vol. E88-C, No. 3, pp.403-412 (2005)

3-5-2MIMOLSI[200912 ]

MIMOMulti Input Multi OutputLSIMIMO2

A. STC-MIMO

Space Time CodingSTC1)MIMOSISO

B. SDM-MIMO

Space Division MultiplexingSDMMIMOSDM-MIMO

LANIEEE 802.11nWiMAXIEEE 802.16eMANLTESDM-MIMO100 Mbps1 GbpsSTC-MIMO

STC-MIMOMRCLSILSISDM-MIMOInterference CancellerMIMOMIMO Decoder

3-5-1MIMO3-5-2SDM-MIMO3-5-3MIMOZF/MMSELSI3-5-4MLDMIMOLSI

(1)MIMOMIMO

(a) MIMO

1(a)MNMIMO(b)

1MIMO

1M

n

[

]

T

x

x

x

M

2

1

L

=

x

(1)

I = 1, 2, , MQAMn

2,NM

H = [hnm]

ACS

(2)

hnmi.i.d.

3N

n

[

]

T

r

r

r

N

2

1

L

=

r

(3)

N

2

r

1

r

N

r

De

-

Mapper

P/S

2

M

x

2

x

Stearing

Vector

A

Interference Canceller

Mapper

S/P

1

2

M

1

2

N

N

2

r

1

r

N

r

De

-

Mapper

P/S

2

M

x

2

x

Stearing

Vector

A

Interference Canceller

Mapper

S/P

1

2

M

1

2

N

a

1

1

x

2

x

M

x

1

r

2

r

N

r

1,1

h

NM

h

,

1

2

2

M

N

1

1

x

2

x

M

x

1

r

2

r

N

r

1,1

h

NM

h

,

1

2

2

M

N

b

1MIMO

MIMO

n

Hx

r

+

=

(4)

n2

MIMO1xiMnLayered Space Time CodingLSTSISOMNMSTCLST

1MapperQPSKQAMSPxSPMMASPAASDMLSTASDM2) LSTMMILSTSDM

nxMIMOyyxMIMO

MIMO3) MatlabMIMOMIMOMIMOLAN4)

(2)MIMO

MIMOMIMOZero ForcingZFMinimum Mean SquaredMMSESNV-BLAST

(a) Zero ForcingZF

INN

I

WH

=

(5)

MNW(4)x

(

)

Wn

x

n

Hx

W

Wr

y

+

=

+

=

=

(6)

HMNWHMoore-PenroseWZFNMWZFHHH

EMBED Equation.3

(

)

H

1

H

ZF

H

H

H

W

-

=

@

(7)

WWInterference cancellerMIMOMIMO Decoder(7) WZF Zero ForcingZFZF(6)WHZero ForcingZFMIMO(6)

(b) Minimum Mean SquaredMMSE

MMSExy

)

(

N

i

i

xiyiE

M

i

n

y

n

x

E

i

i

,...,

2

,

1

,

)

(

)

(

2

=

-

=

(8)

MMSE(8)Signal to Noise Interference RatioSINRZFSNR

WNM5)

(

)

H

M

H

H

I

H

H

W

1

2

-

+

=

s

MMSE

(9)

2AWGN

=

=

M

I

H

H

0

r

r

2

,

]

[

(6)(9)

(

)

r

H

H

H

y

H

1

H

-

=

(10)

(

)

H

H

H

H

H

1

-

H

Moore-PenroseMMSEZFW(9)2LAN

2MMSEZFBERMMSE(8)SINRZFSNR4

(c) MMSE

LSIMMSE6)

(1)

4422AD

RSC

1

RSC

2

r

0

r

1

r

2

(11)

1

2

r

0

r

1

r

2

e1

e2

(12)

C = BH, B' = C'HA-1B = (CA-1)H

1

1

-

-

-

BE

A

F

CA

BE

A

A

1

1

1

-

-

-

+

222A, B, C, D1(12)

2MMSE MIMO6)

(d) V-BLAST

Vertical-Bell Laboratory Space-Time CodingV-BLASTSN9)ZFMMSE

(6)

(

)

{

}

(

)

{

}

(

)

2

2

i

ZF

i

ZF

i

ZF

ZF

E

N

W

n

W

n

W

s

=

=

*

(13)

E(WZF) i WZFi

(

)

(

)

2

2

,

1

min

arg

i

ZF

M

i

i

k

W

L

=

=

(14)

SN

k1, k2, , kM

(1)

i

k

w

(13)

i

k

i

k

y

(2)

i

k

i

k

x

(3)

i

i

k

k

x

x

=

ri

(1)(3)k1, k2, , kMri

k1, k2, , kMZFki

[

]

=

=

)

(

1

)

(

0

i

j

i

j

j

i

k

k

H

w

(15)

[

]

j

k

ki

i

k

w

ri(15)

+

-

1

i

k

H

ki

k

H

Hk

(e) MIMOLSI

44 MMSE1

1MIMOLSI6)

(7)

(8)

(6)

(10)

ZF

MMSE

MMSE

MMSE V-BLAST

bps

2.6 G

212 M

ASIC

90 nm

43 k

0.25m

89 k

90 nm

1.86M

0.18m

1 mm2

ns

180

600

187.5

Kns

180K

600K

187.5

*MIMO-OFDM

(3)MIMO

Maximum Likelihood DecoderMLDMIMO

MLD(4)RHXj

X

MAX

LU

EXP

MAPLog-MAP

Max-

Log-MAP

SOVA

622

k

825

k

k

22 626

kk

2 224

224

824

k

k

22 224

922

kk

2 122

(16)

I

Q

3MLD

316QAMRMLDMIMO464QAM224QRM-MLDSphere DecodingK-best

(a) QR

QRHQR

QR

H

=

(17)

QR(17)QH(18)

n

Rx

n

Q

QRx

Q

r

Q

y

+

=

+

=

=

H

H

H

(18)

QRRxxMxMx

44 MIMO(18)

[L]

FIFO

[2L]

LIFO

[L]

[N]

[N]

(19)

1

N

12

x464QAM64

(20)

x464x3642

1

2

DD

1

2

D

/ 2

D

/ 2

(21)

x2, x1x1644

(b) Sphere Decoder

M = 3BPSKS1, S2

(1)

MLD23 = 8Sphere DecoderSCSphere Constraint

44s1,2,1,8SC

SCSCSC4s2,1,1, s2,1,2, s2,2,1, s2,2,2

4Sphere decoding algorithm

(2)

Sphere Decoder10)5

Critical path

5Sphere decoding architecture

B) A) SCSphere ConstraintSCSC

PED Storage4PED Storage4

PED StorageSCPED Storage1s1,2,1PED Storages1,1PED Storage

5Mbps

(c) K-best

(1)

K-best(10)QRKK-bestSphere Decoder6K62

K-BestKModified K-Best

BPSK, 3 transmit antenna

K=2

s

1

s

2

s

1,1

s

2,1

s

1,2

s

2,2

s

1,1,1

s

1,1,2

s

1,2,1

s

1,2,2

s

2,1,1

s

2,1,2

s

2,2,1

s

2,2,2

2 5

32 34

7

4

17

45

Stage 1

52

9 8

Stage 2

Stage 3

6 812 11

BPSK, 3 transmit antenna

K=2

s

1

s

2

s

1,1

s

2,1

s

1,2

s

2,2

s

1,1,1

s

1,1,2

s

1,2,1

s

1,2,2

s

2,1,1

s

2,1,2

s

2,2,1

s

2,2,2

2 5

32 34

7

4

17

45

Stage 1

52

9 8

Stage 2

Stage 3

6 812 11

6K-best algorithm

(2)

7K-best1Computation Unit4KMCUMetric Computation Unit1KBUK-best UnitPEDsPEDsKLKMCUKBUKKBULKKKComputation Unit

Computation UnitKK

7K-best

(3) LSI

MLDMIMOLSI2

2MLD-MIMOLSI

Algorithm

Sphere Decoder 11)

K-Best 13)

Number of antenna

Modulation

44

16QAM

44

16QAM

ThroughputMbps

80

424

Latencycycle

11.2

5

Process Technology

0.25m

0.25m

Number of gates

117 k

68 k

Areamm2

3.55

(4)

MIMOBER

(a) BER

3OFDM-SDM

BER1Viterbi

3

Data modulation

QPSK

Number of sub-carriers

512

Number of data-carriers

480

Gurard Interval length

64

Channel estimation

Ideal

Channel model

Quasi-static multipath fading(18-path)

Number of iteration

10000

(1)

8ZFMMSEBLASTMLDMLD

844 MIMOBER

(b) MLDBER

9MLDMLDASSESS 14) ViterbiK-BestMLDBERZF

9MLDBER

(c)

4MIMOV-BLASTO(M3)O(M4)MLK-best13)

4SDMOFDM-SDM1OFDM-SDML4L

4MIMO

M = N

NM+1

O(M3)

V-BLAST

NM+1+

O(M4)

M

MLD(K-Best)

N

O(M6)

SDM-MIMO489

(5)

MIMOSDMMIMOZFMMSEV-BLASTMLD

MIMO-SDM

Space Time Coding: STC1)

STBC15)

Water Filling 16)

17)

1 GbpsMIMO18)

1)S. M. Alamouti, A simple transmit diversity technique for wireless communications, IEEE Journal on Selected Areas in Communications, vol16, no.8, pp.1451-1458, O-ctober 1998.

2), , .

3)J. P. Kermoal, L. Schumacher, K. I. Pedersen, P. E. Mongensen, and F. Fredriksen, A stochastic MIMO radio channel model with experimental validation, IEEE Jour. Selec. Areas Commun., vol.20, 6, pp.1211-1226, 2002.

4)http://www.radrix.com/?page_id=158

5), , MIMO, , 2009.

6)S. Yoshizawa, Y. Yamauchi and Y. Miyanaga, VLSI Implementation of a Complete Pipeline MMSE Decoder for a 4x4 MIMO-OFDM Receiver, IEICE Trans.Fundamental, vol.EA91-A, n0.7, pp.1757-1762, 2005.

7)J. Eilert, D. Wu, and D. Liu, Efficient complex inversion for MIMO software defined radio, IEEE ISCAS, pp.2610-2613, May 2007.

8)A. Burg, and et.al, Algorithm and VLSI architecture for linear MMSE detection in MIMO-OFDM systems, IEEE ISCAS pp.4102-4105, May 2006.

9)G. D. Golden, C. J. Foschini, R. A. Valenzuela and P. W. Wolniansky, Detection algorithm and initial laboratory results using V-BLAST space-time communication architecture, Electronics Letters, vol.35, no.1, pp.14-15, January ()

10)F. Sobhanmanesh, S. Nooshabadi, K. Kiseon, A 212 Mb/s Chip for 4 4 16-QAM V-BLAST decoder, Proc. of MWSCAS 2007, pp.1437-1440, 5-8 Aug. 2007.

11)A. Burg, et.al. , VLSI implementation of the sphere decoding algorithm, ESSCIRC 2004. Proceeding of the 30th European Solid-State Circuits Conference, 2004. pp. 303-306, Sept. 2004.

12), , , Sphere DecodingMLD, , vol. J87-B, no.12, December 2004.

13)M. Wenk, M. Zellweger, A. Burg, N. Felber, W. Fichtner, K-best MIMO detection VLSI architectures achieving up to 424 Mbps, Proc. of ISCAS 2006, Page(s):4 pp. ? 1154

14)K. Higuchi, H. Kawai, N. Maeda, M. Sawahashi, Adaptive selection of surviving symbol replica candidates based on maximum reliability in QRM-MLD for OFCDM MIMO multiplexing, IEEE Proc. of GLOBECOM 2004, Volume 4,? 29 Nov.-3 Dec. 2004 Page(s):2480 - 2486 Vol.4

15)S. Wahyul, M. Kurosaki and H. Ochi, Design of 600 Mbps MIMO Wireless LAN System using GLST Coding and Its FPGA Implementation, 2009 IEEE Radio and Wireless Symposium, San Diego, Jan. 2009.

16)S. Haykin, M. Moher, Modern Wireless Communications, Prentice-Hall, 2005.

17)Mohinder Jankiraman, Space-time Codes And MIMO Systems, Artech House Universal Personal Communications, 2004.

18)W. A. Syafei, Y. Nagao, M. Kurosaki, B. Sai, and H. Ochi, Design of 1.2 Gbps MIMO WLAN for 4K Digital Cinema Transmission, The 20th IEEE Personal Indoor and Mobile Radio Communications Symposium (PIMRC) 2009, Tokyo, Japan, September 13-16, 2009.

10 - 5 - 3

3-6

[20091 ]

3-6-1

1970

3131

31

31

64128RSA

3-6-2

NN > 1

(1) Fully Loop UnrolledFLU

(2) Loop

(3) Pipeline

(1)Fully Loop UnrolledFLU

32NN

(2)Loop

33N11FLU1/N1FLU1/NNFLU1/NLoop2N/2

32FLU33Loop34Pipeline

(3)Pipeline

34FLUN11FLUFPGAPipeline

111FLUN

inner-Pipelineinner-PipelineLoop

32

FLU

Loop

Pipeline

1

1/N

1

1

1/N

1/N

1

1/N1

N

5

5

ECBCTR

LU1

1ISOECBElectronic CodeBookCBCCipher Block ChainOFBOutput FeedBackCFBCipher FeedBackCTR5CBCOFBCFBFLULoopPipelineECBCTR

ICTPM1LoopS-box

S-boxS-boxAESMISTYS-boxGF(2m)

1), 2), 3) S-boxGF(2m)

3-6-3

(1) GF(p)GF(2m)

(2)

(3) /2

(1)GF(2m) XORGF(p) GF(2m) RSA

(2)

(3)CPU4) CPU3-6-4

(1)

AB mod MA, B, MRSARSAY = XE mod M

1

0