Link Adaptation Technology

[Figure: throughput versus C/I for QPSK R=1/4, 8PSK R=1/4, 16QAM R=1/4, and 16QAM R=1/2. The hull of AMC is the upper envelope of these curves; the adaptive modulation/coding transition 8PSK->16QAM is marked.]
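The hull of AMC can be read as a selection rule: at each C/I, use whichever modulation and coding scheme currently delivers the most throughput. Below is a minimal Python sketch of that rule. The logistic success curve and the C/I thresholds are hypothetical placeholders chosen for illustration; neither comes from the slides.

```python
import math

# (name, bits per symbol, code rate, hypothetical C/I threshold in dB)
# The thresholds are illustrative placeholders, not measured values.
SCHEMES = [
    ("QPSK, R=1/4", 2, 0.25, 0.0),
    ("8PSK, R=1/4", 3, 0.25, 4.0),
    ("16QAM, R=1/4", 4, 0.25, 8.0),
    ("16QAM, R=1/2", 4, 0.50, 12.0),
]

def throughput(bits, rate, threshold_db, ci_db):
    """Information bits per symbol delivered at a given C/I (toy model)."""
    # Assumed logistic block-success curve centered on the threshold.
    success = 1.0 / (1.0 + math.exp(-(ci_db - threshold_db)))
    return bits * rate * success

def best_scheme(ci_db):
    """Return (name, throughput) of the scheme on the AMC hull at ci_db."""
    return max(
        ((name, throughput(b, r, th, ci_db)) for name, b, r, th in SCHEMES),
        key=lambda pair: pair[1],
    )

def amc_hull(ci_values):
    """Upper envelope of all scheme curves, sampled at the given C/I values."""
    return [best_scheme(ci) for ci in ci_values]
```

With these placeholder thresholds, the selected scheme steps up from QPSK through 8PSK to 16QAM as C/I rises, which is the kind of transition the figure marks.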

Homodyne Multi-Mode Radio Receiver

[Figure: homodyne multi-mode radio receiver with an LO, I and Q branches, programmable channel filters selectable for GSM, 802.11, and UMTS, and A/D converters.]

An SDR Baseband Solution

Reconfigurable Video Encoding

• Motion estimation: the most computationally demanding part of video encoding

• Example: CCIR 601 format
  • 720 by 576 pixels
  • 16 by 16 macroblock (n = 16)
  • 32 by 32 search area (p = 8)
  • 25 Hz frame rate (f_frame = 25)
  • 9 giga operations/sec is needed for the Full Search Block Matching Algorithm.
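The 9 giga operations/sec figure can be reproduced from the parameters listed above. The sketch below assumes that every macroblock is compared at all (2p + 1)^2 displacements and that each pixel comparison costs three operations (subtract, absolute value, accumulate). The three-operation cost is an assumption; the slide does not state it.

```python
def fsbma_ops_per_second(width, height, n, p, frame_rate, ops_per_pixel=3):
    """Operation count for full-search block matching.

    ops_per_pixel=3 assumes one subtract, one absolute value, and one
    accumulate per pixel comparison (an assumption, not from the slides).
    """
    blocks_per_frame = (width // n) * (height // n)
    candidates = (2 * p + 1) ** 2          # displacements from -p to +p in x and y
    ops_per_block = candidates * n * n * ops_per_pixel
    return blocks_per_frame * ops_per_block * frame_rate

ccir601 = fsbma_ops_per_second(width=720, height=576, n=16, p=8, frame_rate=25)
print(f"{ccir601 / 1e9:.2f} GOPS")  # prints "8.99 GOPS"
```

Under these assumptions the count is 8,989,056,000 operations per second, which rounds to the 9 giga operations/sec quoted on the slide.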

Why Reconfiguration in Motion Estimation?

• Adjust the search area as the characteristics of the video change

• Eliminate unnecessary computation
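The two points above can be illustrated with a full-search block matcher whose search range p is a parameter: when the motion is small, a smaller p finds the same vector with far fewer candidate comparisons. This is a plain-Python sketch with hypothetical function names and test frames, not the implementation behind the slides.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced candidate."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )

def full_search(cur, ref, bx, by, n, p):
    """Best (dx, dy) within +/-p pixels, plus the number of candidates tested."""
    height, width = len(ref), len(ref[0])
    best, best_cost, tested = (0, 0), None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx <= width - n and 0 <= by + dy <= height - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            tested += 1
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, tested

# Hypothetical 12x12 test frames: the current frame is the reference
# shifted by one pixel horizontally.
ref = [[x * x + 3 * y * y for x in range(12)] for y in range(12)]
cur = [[ref[y][min(x + 1, 11)] for x in range(12)] for y in range(12)]
wide = full_search(cur, ref, bx=4, by=4, n=4, p=2)
narrow = full_search(cur, ref, bx=4, by=4, n=4, p=1)
```

Both calls return the motion vector (1, 0), but the p = 2 search tests 25 candidates while the p = 1 search tests only 9.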

Motion Vector Distributions

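Energy-Flexibility Gap: FPFA (Field Programmable Function Array)

[Figure: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) versus flexibility. Embedded processors (ARM): 0.5 MOPS/mW. Signal-processing processors (ASIPs, DSPs): 3 MOPS/mW. Other labels: signal-processing ASIC, FPFA, 200 MOPS/mW, 10-80 MOPS/mW.]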
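Because 1 MOPS/mW means one million operations per second for each milliwatt, dividing a workload in MOPS by the efficiency gives the power in mW. The sketch below applies that to the 9 giga operations/sec full-search workload from the motion estimation example. Pairing that workload with these efficiency figures is an illustration added here, not a result from the slides.

```python
def power_mw(workload_mops, efficiency_mops_per_mw):
    """Power in mW needed to sustain a workload at a given energy efficiency."""
    return workload_mops / efficiency_mops_per_mw

WORKLOAD_MOPS = 9000  # 9 giga operations/sec from the motion estimation example

# Efficiency values as labeled on the chart, in MOPS/mW.
for label, eff in [("0.5 MOPS/mW", 0.5), ("3 MOPS/mW", 3.0), ("200 MOPS/mW", 200.0)]:
    print(f"{label}: {power_mw(WORKLOAD_MOPS, eff):.0f} mW")
```

The three cases come out to 18,000 mW, 3,000 mW, and 45 mW, which shows why the gap between the processor and ASIC ends of the chart matters for battery-powered video.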

Sensor network design space

Wireless embedded systems

ME w/ MorphoSys

RC Array

• Array of reconfigurable cells: 64 cells in a 2-D matrix

• SIMD model
• Cells in the same row (column) share a configuration
• Each RC operates on different data
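The row-shared configuration can be modeled as one operation per row applied to every cell in that row, with each cell holding its own operands. The sketch below assumes the 64 cells are arranged as 8 by 8 and uses hypothetical names; it models the SIMD idea only, not the MorphoSys instruction set.

```python
import operator

ROWS = COLS = 8  # assumed 8x8 arrangement of the 64 cells

def rc_array_step(row_ops, a, b):
    """Apply one shared operation per row; every cell uses its own data."""
    if len(row_ops) != ROWS:
        raise ValueError("one operation per row is required")
    return [
        [op(x, y) for x, y in zip(row_a, row_b)]
        for op, row_a, row_b in zip(row_ops, a, b)
    ]

a = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]
b = [[1] * COLS for _ in range(ROWS)]
ops = [operator.add, operator.sub] * (ROWS // 2)  # alternate add and subtract rows
result = rc_array_step(ops, a, b)
```

Here the even rows add and the odd rows subtract, so a single configuration word per row drives eight different data items.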

Implementation & Performance

• 0.35 micron technology w/ 4 metal layers
• Operation at 100 MHz
• 170 mm2

Motion Estimation

Block size : 16x16 pixel, Image size : 352x288 pixel

Field Programmable Function Array

[Figure: FPFA processor tile with five ALUs, ten memories (M), a crossbar, and registers.]

• Processor tiles
  – This structure is convenient for the Fast Fourier Transform (6-input, 4-output) and the finite impulse response (FIR) filter

Mapping of DSP Algorithms on FPFA

• Five-tap finite-impulse response filter
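For reference, the two kernels named above can be written out in a few lines of Python. Reading "6-input, 4-output" as a radix-2 butterfly on real-valued ports (two complex operands plus a complex twiddle factor in, two complex results out) is an interpretation, not something the slide states. The function names are hypothetical, and the code says nothing about how the operations are scheduled onto the five ALUs.

```python
def butterfly(a_re, a_im, b_re, b_im, w_re, w_im):
    """Radix-2 FFT butterfly on real-valued ports: 6 inputs, 4 outputs."""
    t_re = b_re * w_re - b_im * w_im   # real part of b * w
    t_im = b_re * w_im + b_im * w_re   # imaginary part of b * w
    return a_re + t_re, a_im + t_im, a_re - t_re, a_im - t_im

def fir5(samples, coeffs):
    """Five-tap FIR: y[n] = sum_k coeffs[k] * x[n - k], with x[n] = 0 for n < 0."""
    if len(coeffs) != 5:
        raise ValueError("expected exactly five coefficients")
    delay = [0.0] * 5                  # tap delay line, newest sample first
    out = []
    for x in samples:
        delay = [x] + delay[:-1]
        out.append(sum(c * d for c, d in zip(coeffs, delay)))
    return out
```

Feeding a unit impulse into `fir5` returns the coefficients in order, which is a quick check that the delay line is wired correctly.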

Low-Power Design Using Parallel Processing

Road Map to MP-SoC

• Mask NRE: over $1M; design NRE: $10M to $75M
  – ASICs replaced by programmable ASSPs, FPGAs

• Number of embedded processors
  – DVD/STB/HDTV, mobile phones: 5 to 8
  – Image processing, networking, base station: 8 to 100+

• Enabled and compelled by Moore’s Law

ITRS: 2009, 90nm process, 100M gates = 2500 ARM7 cores
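Read as a gate budget, the ITRS figure implies the per-core size below. This assumes the whole 100M-gate budget is spent on cores, with nothing reserved for memory or interconnect; the slide gives only the two totals.

```latex
\frac{100 \times 10^{6}\ \text{gates}}{2500\ \text{cores}} = 4 \times 10^{4}\ \text{gates per core}
```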


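Parallel Architecture Exploration

[Figure: exploitable parallelism versus minimum parallel grain size (instructions). Grain-size ticks: 1, 100's, 10 000's of instructions. Regions: Instruction-Level Parallelism, Thread-Level Parallelism, GP O/S Thread-Level Parallelism. Exploitable parallelism label: 1~100.]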

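Problems with Existing Technology

▷ Using several identical (homogeneous) processors gives low resource utilization, so power consumption can grow linearly.

▷ Processors are constrained by wire and memory latency.

▷ On-chip interconnect design: task mapping, IPC selection (pipes, FIFOs, message queues, semaphores, shared memory, etc.)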

EV8 is 80 times bigger but provides only two to three times more single-threaded performance

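Comparative Analysis of Parallel Architectures

HP: Heterogeneous Multiprocessor Cores

A multi-ISA multicore architecture consists of processors with different ISAs and is designed to exploit vector/data-level parallelism and instruction-level parallelism at the same time.

Potential Advantages of Heterogeneous Multiprocessors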

NEC MP211: Homogeneous MP core

• Asymmetric MP with very coarse-grain multitasking
• 3 ARM9’s utilized as predefined function units
• No complex overhead: e.g. no cache coherency, dynamic scheduling/load balancing

Bus and Memory Architecture

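MP211 Mobile Application Processor

Power Distribution Block Diagram

Intel Xeon Processor

Deep-Submicron Low-Power Design

Technology Roadmap with Device Scaling

ADVANTAGES: small size, high clock frequency

DISADVANTAGES: high power consumption, low reliability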

Increased uncertainty with process scaling

• Process, voltage, temperature variations, noise, coupling
• Affects design margin: over-design, power & performance loss
  – Increased power constraints
  – Increasing leakage, power (density, delivery) limitations
• More transistors mean:
  – Larger clock distribution networks
  – Higher capacitance (more load and parasitics)
• With each new technology:
  – Gate delay decreases ~25%
  – Wire delay increases ~100%
  – Cross-chip communication increases
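The per-generation figures above compound quickly. The sketch below treats "~25% decrease" as a factor of 0.75 and "~100% increase" as a factor of 2.0, applied at every generation. Compounding the same factors each time is a simplifying assumption, and the function name is hypothetical.

```python
def relative_delays(generations, gate_scale=0.75, wire_scale=2.0):
    """Gate and wire delay relative to generation 0, compounding per generation.

    gate_scale=0.75 encodes "~25% decrease"; wire_scale=2.0 encodes
    "~100% increase". Constant factors are a simplifying assumption.
    """
    gate = gate_scale ** generations
    wire = wire_scale ** generations
    return gate, wire, wire / gate

for g in range(4):
    gate, wire, ratio = relative_delays(g)
    print(f"gen {g}: gate {gate:.3f}  wire {wire:.1f}  wire/gate {ratio:.2f}")
```

Under this model the wire-to-gate delay ratio grows by a factor of about 2.67 per generation, which is why cross-chip communication comes to dominate.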

Energy Consumed on On-Chip Buses

NoC (Network on Chip) Application Areas

• Turbo decoder, UMTS compliant, 100 Mbit: large flexibility w/ 14 parallel units, area = 16.84 mm2 (14 mm2 PUs, 2.8 mm2 NoC)

• Wehn, Univ. of Kaiserslautern: LDPC decoder: 500 MHz vs. 64 MHz (fixed bus), but 30 W vs. 700 mW, twice the die size.
  – 1024-bit block size, 1.2 Gb/s, R = 0.75
  – NoC: 5x5 2D mesh, dimension-order routing, large flexibility
  – 160 nm CMOS technology, 1.8 V, 500 MHz, 110 mm2, ~30 W

• SonicsMX: power-efficient mobile handset w/ power management

• STNoC, Spidergon: topology w/ degree 2-3

http://www.eit.uni-kl.de/wehn, EE Times, 7/2005
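Dimension-order routing on a 2D mesh, as mentioned for the 5x5 NoC above, sends a packet all the way along one dimension before it turns into the other. A minimal Python sketch follows, assuming X-before-Y ordering and (x, y) node coordinates. Both are illustrative choices; the slide does not specify them.

```python
def xy_route(src, dst, width=5, height=5):
    """Dimension-order (XY) route on a 2D mesh: finish X hops, then Y hops.

    Nodes are (x, y) tuples; returns the list of nodes visited, src first.
    """
    for x, y in (src, dst):
        if not (0 <= x < width and 0 <= y < height):
            raise ValueError(f"node {(x, y)} is outside the {width}x{height} mesh")
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # resolve the X dimension first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # then the Y dimension
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

For example, `xy_route((0, 0), (2, 1))` visits (0, 0), (1, 0), (2, 0), (2, 1). Because every packet turns at most once and always in the same order, the route is deterministic and needs no routing tables.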

MPSoC Clock and Power, Olivier Franza, Intel

Multiple clock domains

• Low skew and jitter ALWAYS a must
• Clock modeling requires more accuracy
• Within-die variations, inductance, crosstalk, electromigration, self-heat, …
• Hierarchical clock partitioning
• Reduce global clock and possibly relax its requirements

DEC/Compaq Alpha

Source: DEC/Compaq. Gronoski et al., JSSC 1998; Xanthopoulos et al., ISSCC 2001; Barroso et al., ISCA 2000

Clock and Power Convergence: Intel® Itanium® Montecito

Reliable design, G. De Micheli

1. Manufacturing imperfections: More likely to happen as lithography scales down

2. Approximations during design: Uncertainty about details of design

3. Aging: Oxide breakdown, electromigration

4. Environment-induced soft errors (data corruption due to external radiation exposure), electromagnetic interference

5. Operating-mode induced: Extremely-low voltage supply

Adaptive low-power transmission scheme

Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne. Self-calibrating Networks-on-Chip. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.

Reduced Energy Consumption

VADA Lab’s Low-Power IPs (~2003)

• Low-Power Equalizer for xDSL: 21% power reduction, SNR = 40 dB
  [Figure: FEQ block diagram (Coefficient Update, Conjugator, Error Control, Learning Constant Control) and a chart comparing Conventional FEQ with Low-Power FEQ.]

• Fast and Low Power Viterbi Search Engine using Inverse Hidden Markov Model: 68% power reduction, 71% speed improvement, 1.9x area increase; Samsung Humantech Outstanding Paper Award, ‘02
  [Figure: buffer, four PEs each with a comparator, Control Generator, and Memory holding the PDF and transition values.]

• Maximizing Memory Data Reuse for Lower Power Motion Estimation: 33% power reduction, 52 MHz, 2.1x area increase (SCI paper)
  [Figure: search data buffer and reference data buffer fed from external memory (search data, current data), address generator, clock generator, control signal generator, shift registers, 16 modified PEs, partial sums c1_sum to c4_sum, and comparators producing the Motion Vector.]

• Low-power and co-design implementation of a Double Dwell Searcher for IS-95-based CDMA: 67% power reduction, 41% area reduction

• OFDM-based high-speed wireless LAN platform: 20.7 MHz, 237,000 gates

• Next-generation low-power security processor chip for smart cards: ECC, Rijndael, DES, SHA
  [Figure: cipher datapath with DIN_Reg, Key_add, Mux_1, Mux_2, Byte_Sub, Shift_Low, Mix_Column, DOUT_Reg, Control, and KeyGeneration blocks (signals clk, rst, enb, start, sel_1, sel_2, Key, subKey); host interface between HOST CPU and CryptoProcessor with an 8-bit address bus, 32-bit data buses, and RESET, CS, RD, WR, CLK, DW lines.]

• High-Flexible Design of OFDM Transceiver for DVB-T

[Figure: OFDM receiver block diagram partitioned between DSP and ASIC. Blocks: RF, IF, ADC, DP AGC, NCO, Demod, SER, Coarse STR, Fine STR, GI/FFT Detector, GI Removal, FFT, Phase Rotator, CR, CPE, CSI, Channel Estimator/Equalizer, Timing Processor, Viterbi FEC.]

Thank you.
