Link Adaptation Technology
[Figure: throughput vs. C/I for QPSK R=1/4, 8PSK R=1/4, 16QAM R=1/4, 16QAM R=1/2, and the hull of AMC; annotation: adaptive modulation/coding transition, 8PSK->16QAM]
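The AMC hull is the upper envelope of the per-mode throughput curves: at each C/I, the link uses the highest-rate mode the channel still supports. A minimal Python sketch of that selection rule follows. The nominal rate of each mode is bits per symbol times code rate; the C/I switching thresholds (`min_ci_db`) are hypothetical placeholders, not values read from the figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mode:
    name: str
    bits_per_symbol: int
    code_rate: float
    min_ci_db: float  # HYPOTHETICAL switching threshold, not taken from the slide

    @property
    def efficiency(self) -> float:
        # Nominal information bits per modulation symbol.
        return self.bits_per_symbol * self.code_rate


# The four modes plotted in the figure.
MODES = [
    Mode("QPSK, R=1/4", 2, 0.25, -2.0),
    Mode("8PSK, R=1/4", 3, 0.25, 2.0),
    Mode("16QAM, R=1/4", 4, 0.25, 5.0),
    Mode("16QAM, R=1/2", 4, 0.50, 10.0),
]


def select_mode(ci_db: float) -> Optional[Mode]:
    """Pick the highest-efficiency mode whose C/I threshold is met (a point on the AMC hull)."""
    feasible = [m for m in MODES if ci_db >= m.min_ci_db]
    return max(feasible, key=lambda m: m.efficiency) if feasible else None


if __name__ == "__main__":
    for ci in (-5.0, 0.0, 3.0, 6.0, 12.0):
        mode = select_mode(ci)
        print(ci, mode.name if mode else "no mode (outage)")
```

With these assumed thresholds, raising C/I from 3 dB to 6 dB moves the link from 8PSK to 16QAM at R=1/4, the kind of transition annotated in the figure.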
Homodyne Multi-Mode Radio Receiver
[Figure: receiver block diagram with LO, I and Q paths, programmable channel filter, and A/D converter, supporting GSM, 802.11, and UMTS modes]
An SDR Baseband Solution
Reconfigurable Video Encoding
• Motion estimation: the most computationally demanding part of video encoding
• Example: CCIR 601 format
– 720 by 576 pixels
– 16 by 16 macroblock (n = 16)
– 32 by 32 search area (p = 8)
– 25 Hz frame rate (f_frame = 25)
– 9 giga-operations/s are needed for the full search block matching algorithm
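The 9 giga-operations/s figure can be reproduced with a short calculation, assuming (2p+1)^2 candidate positions per macroblock for a ±p search range and three operations per pixel comparison (subtract, absolute value, accumulate). The per-pixel cost is an assumption, not something stated on the slide.

```python
def fsbma_ops_per_second(width, height, n, p, frame_rate, ops_per_pixel=3):
    """Operations/s for full-search block matching (FSBMA) with a SAD criterion.

    ops_per_pixel=3 ASSUMES one subtraction, one absolute value, and one
    accumulation per pixel per candidate position.
    """
    blocks_per_frame = (width // n) * (height // n)  # 45 * 36 = 1620 for CCIR 601
    candidates = (2 * p + 1) ** 2                    # displacements in [-p, p] x [-p, p]
    ops_per_block = candidates * n * n * ops_per_pixel
    return blocks_per_frame * frame_rate * ops_per_block


if __name__ == "__main__":
    ops = fsbma_ops_per_second(720, 576, n=16, p=8, frame_rate=25)
    print(f"{ops:,} ops/s = {ops / 1e9:.2f} GOPS")  # 8,989,056,000 ops/s = 8.99 GOPS
```

The result, about 8.99 GOPS, rounds to the 9 GOPS quoted for CCIR 601.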
Why Reconfiguration in Motion Estimation?
• Adjust the search area according to changes in video characteristics
• Eliminate unnecessary computation
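Because full-search cost grows with (2p+1)^2, shrinking the search range when the motion allows it removes most of the work. The sketch below is a plain full-search block matcher with the range `p` as a runtime parameter; the test frames in the usage example are hypothetical, and the policy for choosing `p` from motion-vector distributions is not modeled.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, p):
    """Exhaustive block matching over displacements in [-p, p] x [-p, p].

    Returns (dx, dy, best_sad, evaluated); candidates that fall outside the
    reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    evaluated = 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + n > width or y0 + n > height:
                continue
            evaluated += 1
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best[0], best[1], best[2], evaluated


if __name__ == "__main__":
    # Hypothetical 32x32 frames: cur is ref shifted so the true motion is (3, -2).
    ref = [[(7 * x + 13 * y) % 251 for x in range(32)] for y in range(32)]
    cur = [[(7 * (x + 3) + 13 * (y - 2)) % 251 for x in range(32)] for y in range(32)]
    print(full_search(cur, ref, 8, 8, n=8, p=8))  # (3, -2, 0, 289)
    print(full_search(cur, ref, 8, 8, n=8, p=4))  # (3, -2, 0, 81)
```

Here p=4 still finds the true vector while evaluating 81 instead of 289 positions; with p=2 the true vector would lie outside the window, which is the trade-off a reconfigurable search area has to manage.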
Motion Vector Distributions
Energy-Flexibility Gap, FPFA (Field Programmable Function Array)
[Figure: energy efficiency (MOPS/mW, log scale from 0.1 to 1000) vs. flexibility]
• Embedded processor (ARM): 0.5 MOPS/mW
• Signal-processing processors (ASIPs, DSPs): 3 MOPS/mW
• FPFA: 10-80 MOPS/mW
• Signal-processing ASIC: 200 MOPS/mW
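The efficiency figures convert directly into energy per operation: 1 mW sustaining X MOPS spreads 1e-3 J/s over X·1e6 operations/s, i.e. 1000/X pJ per operation. The sketch applies this to the efficiency numbers and, as an illustrative workload only, to the 9 GOPS full-search motion estimation example; pairing the two is an assumption made for comparison, since the slides do not combine them.

```python
# Energy-efficiency figures from the slide, in MOPS/mW, as (low, high) ranges.
EFFICIENCY_MOPS_PER_MW = {
    "Embedded processor (ARM)": (0.5, 0.5),
    "ASIPs, DSPs": (3.0, 3.0),
    "FPFA": (10.0, 80.0),
    "Signal-processing ASIC": (200.0, 200.0),
}


def pj_per_op(mops_per_mw: float) -> float:
    """1 mW / (X * 1e6 op/s) = 1e-9 / X J = 1000 / X pJ per operation."""
    return 1000.0 / mops_per_mw


def power_mw(workload_mops: float, mops_per_mw: float) -> float:
    """Power needed to sustain a workload at a given efficiency."""
    return workload_mops / mops_per_mw


if __name__ == "__main__":
    workload = 9000.0  # 9 GOPS, the full-search motion estimation example
    for name, (lo, hi) in EFFICIENCY_MOPS_PER_MW.items():
        # Higher efficiency means lower energy per op and lower power.
        print(f"{name}: {pj_per_op(hi):g}-{pj_per_op(lo):g} pJ/op, "
              f"{power_mw(workload, hi):g}-{power_mw(workload, lo):g} mW")
```

At 9 GOPS this works out to about 18 W on the ARM, 3 W on a DSP, 112.5 to 900 mW on the FPFA, and 45 mW on the ASIC, which illustrates the energy-flexibility gap.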
Sensor network design space
Wireless embedded systems
ME w/ MorphoSys
RC Array
• Array of reconfigurable cells (RCs): 64 cells in a 2-D matrix
• SIMD model
– Cells in the same row (column) share a configuration
– Each RC operates on different data
Implementation & Performance
• 0.35-micron technology with 4 metal layers
• Operation at 100 MHz
• 170 mm²
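The SIMD model of the RC array can be mimicked in a few lines: one configuration (context) word is broadcast along each row, and every cell in that row applies it to its own operands. The operation names and the 8x8 layout in this sketch are illustrative assumptions; the real MorphoSys context format and RC instruction set are not modeled.

```python
from typing import Callable, Dict, List

Matrix = List[List[int]]

# HYPOTHETICAL per-cell operations selected by a row's context word.
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "absdiff": lambda a, b: abs(a - b),  # the core of a SAD computation
    "mul": lambda a, b: a * b,
}

SIZE = 8  # 8 x 8 = 64 reconfigurable cells


def rc_array_step(row_contexts: List[str], a: Matrix, b: Matrix) -> Matrix:
    """One SIMD step: row r broadcasts row_contexts[r]; each cell uses its own data."""
    if len(row_contexts) != SIZE or len(a) != SIZE or len(b) != SIZE:
        raise ValueError("expected one context per row and 8 x 8 operand matrices")
    return [
        [OPS[ctx](x, y) for x, y in zip(row_a, row_b)]
        for ctx, row_a, row_b in zip(row_contexts, a, b)
    ]


if __name__ == "__main__":
    a = [[r * SIZE + c for c in range(SIZE)] for r in range(SIZE)]
    b = [[64 - (r * SIZE + c) for c in range(SIZE)] for r in range(SIZE)]
    out = rc_array_step(["absdiff"] * SIZE, a, b)
    print(out[0])  # |a - b| for the first row
```

Column-wise sharing would be the same step applied to the transposed operands.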
Motion Estimation
Block size: 16x16 pixels, image size: 352x288 pixels
Field Programmable Function Array
[Figure: FPFA processor tile with five ALUs, ten memories (M), a crossbar, and registers]
• Processor tiles
– This structure is convenient for the fast Fourier transform (6-input, 4-output) and the finite impulse response (FIR) filter
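Read as a radix-2 FFT butterfly on real-valued signals, the "6-input, 4-output" description accounts for the real and imaginary parts of a, b, and the twiddle factor w going in, and the real and imaginary parts of a + w·b and a - w·b coming out. A minimal decimation-in-time sketch; it does not model how the operations are assigned to the tile's ALUs.

```python
def butterfly(ar, ai, br, bi, wr, wi):
    """Radix-2 DIT butterfly: 6 real inputs -> 4 real outputs.

    Computes t = w * b, then (a + t, a - t), all in real arithmetic.
    """
    tr = wr * br - wi * bi  # Re(w * b)
    ti = wr * bi + wi * br  # Im(w * b)
    return ar + tr, ai + ti, ar - tr, ai - ti


if __name__ == "__main__":
    # a = 1 + 2j, b = 3 - 1j, w = -1j (e^{-j*pi/2})
    print(butterfly(1, 2, 3, -1, 0, -1))  # (0, -1, 2, 5)
```

The complex multiply costs four real multiplications and two additions, followed by four additions/subtractions for the outputs.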
Mapping of DSP Algorithms on FPFA
• Five-tap finite impulse response (FIR) filter
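For reference, the computation being mapped is y[n] = h0·x[n] + h1·x[n-1] + ... + h4·x[n-4]: five multiply-accumulates per output sample over a five-entry delay line. A minimal direct-form sketch; the coefficients in the usage example are arbitrary, and the FPFA mapping itself is not modeled.

```python
from typing import List, Sequence


def fir5(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form five-tap FIR: y[n] = sum_{k=0}^{4} h[k] * x[n-k], zero initial state."""
    if len(h) != 5:
        raise ValueError("expected exactly five coefficients")
    delay = [0] * 5  # delay[k] holds x[n - k]
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]  # shift the delay line by one sample
        y.append(sum(hk * dk for hk, dk in zip(h, delay)))  # five multiply-accumulates
    return y


if __name__ == "__main__":
    print(fir5([1, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5]))  # impulse response: [1, 2, 3, 4, 5, 0]
```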
Low-Power Design Using Parallel Processing
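The usual argument for low-power parallel design rests on the dynamic power relation P = αCV²f: duplicating a datapath and running each copy at half the clock keeps throughput constant, and the relaxed timing allows a lower supply voltage whose quadratic effect outweighs the extra capacitance. The scaling factors in this sketch (2.15x capacitance, 0.58x voltage, 0.5x frequency) are illustrative assumptions, not results from these slides.

```python
def dynamic_power(alpha: float, c: float, v: float, f: float) -> float:
    """CMOS dynamic (switching) power: P = alpha * C * V^2 * f."""
    return alpha * c * v * v * f


def parallel_power_ratio(c_scale: float, v_scale: float, f_scale: float) -> float:
    """P_parallel / P_reference when activity is unchanged and C, V, f are scaled."""
    return c_scale * v_scale ** 2 * f_scale


if __name__ == "__main__":
    # ASSUMED: two units plus muxing overhead -> 2.15x capacitance,
    # each clocked at f/2, supply lowered to 0.58x thanks to the relaxed timing.
    ratio = parallel_power_ratio(c_scale=2.15, v_scale=0.58, f_scale=0.5)
    print(f"parallel design uses {ratio:.2f}x the reference power")  # 0.36x
```

Without the voltage reduction (v_scale=1.0) the same duplication would cost about 1.08x the reference power, so the saving comes entirely from running at a lower supply.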
Road Map to MP-SoC
• Mask NRE: over $1M; design NRE: $10M to $75M
– ASICs replaced by programmable ASSPs and FPGAs
• Number of embedded processors
– DVD/STB/HDTV, mobile phones: 5 to 8
– Image processing, networking, base stations: 8 to 100+
• Enabled and compelled by Moore's Law
ITRS: 2009, 90nm process, 100M gates = 2500 ARM7 cores
Exploring Parallel Architectures
[Figure: exploitable parallelism vs. minimum parallel grain size (instructions) for instruction-level parallelism, thread-level parallelism, and GP O/S thread-level parallelism; values shown: 1, 1~100, 100's, and 10,000's of instructions]
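Problems with Existing Technology
▷ Using multiple identical (homogeneous) processors yields low resource utilization, so power consumption can grow linearly.
▷ Processors are constrained by wire and memory latency.
▷ On-chip interconnect design: task mapping and IPC selection (pipes, FIFOs, message queues, blocking flags, shared memory, etc.)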
EV8 is 80 times bigger but provides only two to three times more single-threaded performance
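Of the IPC options listed above, a bounded FIFO is the simplest to reason about: the producer task blocks when the FIFO is full and the consumer blocks when it is empty, so the FIFO depth trades buffer size against stalls. A minimal Python sketch with two threads standing in for two mapped tasks; the depth of 4 and the doubling "work" are hypothetical.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # end-of-stream marker that cannot collide with real data


def run_pipeline(items: Iterable[int], depth: int = 4) -> List[int]:
    """Connect a producer task and a consumer task through a bounded FIFO."""
    fifo = queue.Queue(maxsize=depth)  # put() blocks while the FIFO is full
    results: List[int] = []

    def producer() -> None:
        for item in items:
            fifo.put(item)
        fifo.put(_END)

    def consumer() -> None:
        while True:
            item = fifo.get()  # get() blocks while the FIFO is empty
            if item is _END:
                break
            results.append(item * 2)  # HYPOTHETICAL stand-in for the second task's work

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(range(8), depth=2))
```

With one producer and one consumer, FIFO order is preserved, so the output order matches the input order regardless of the depth.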
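Comparative Analysis of Parallel Architectures
HP: Heterogeneous Multiprocessor Cores
A multi-ISA multicore architecture is composed of processors with different ISAs and is designed to exploit vector/data-level parallelism and instruction-level parallelism simultaneously.
Potential Advantages of Heterogeneous Multiprocessors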
NEC MP211: Homogeneous MP core
• Asymmetric MP with very coarse-grain multitasking
• 3 ARM9s utilized as predefined function units
• No complex overhead: e.g., no cache coherency, no dynamic scheduling/load balancing
Bus and Memory Architecture
Mobile Application Processor MP211
Power Distribution Block Diagram
Intel Xeon Processor
Deep-Submicron Low-Power Design
Technology Roadmap with Device Scaling
ADVANTAGES: small size, high clock frequency
DISADVANTAGES: high power consumption, low reliability
Increased uncertainty with process scaling
• Process, voltage, and temperature variations, noise, coupling
– Affect design margins, causing design, power, and performance loss
– Increased power constraints
– Increasing leakage and power (density, delivery) limitations
• More transistors mean:
– Larger clock distribution networks
– Higher capacitance (more load and parasitics)
• With each new technology:
– Gate delay decreases ~25%
– Wire delay increases ~100%
– Cross-chip communication increases
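Compounding the per-generation trends quoted above shows why interconnect comes to dominate: if gate delay shrinks by ~25% and wire delay grows by ~100% each generation, their ratio grows by a factor of about 2.67 per generation. The sketch treats both trends as exact constants, which is a simplification.

```python
def relative_delays(generations: int, gate_scale: float = 0.75, wire_scale: float = 2.0):
    """Return (generation, gate_delay, wire_delay, wire/gate) normalized to generation 0."""
    rows = []
    gate, wire = 1.0, 1.0
    for g in range(generations + 1):
        rows.append((g, gate, wire, wire / gate))
        gate *= gate_scale  # ~25% faster gates per generation
        wire *= wire_scale  # ~100% slower wires per generation
    return rows


if __name__ == "__main__":
    for g, gate, wire, ratio in relative_delays(3):
        print(f"gen {g}: gate {gate:.3f}, wire {wire:.1f}, wire/gate {ratio:.1f}")
```

After three generations under these assumptions, wire delay relative to gate delay has grown by roughly 19x.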
Energy Consumption on the On-Chip Bus
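A first-order estimate of on-chip bus energy counts bit transitions: each line that toggles dissipates about 0.5·C·V² on average, where C is the line capacitance. The sketch counts toggles between successive bus words; it ignores coupling capacitance between neighboring lines, and the capacitance and supply values in the usage example are hypothetical.

```python
from typing import Iterable, Tuple


def bus_energy(words: Iterable[int], c_line: float, vdd: float, width: int = 32,
               initial: int = 0) -> Tuple[int, float]:
    """Return (toggle_count, energy_joules) for driving `words` over a `width`-bit bus.

    Each 0->1 or 1->0 transition on a line is charged 0.5 * C * V^2
    (self-capacitance only; line-to-line coupling is ignored).
    """
    mask = (1 << width) - 1
    prev = initial & mask
    toggles = 0
    for w in words:
        w &= mask
        toggles += bin(prev ^ w).count("1")  # Hamming distance to the previous word
        prev = w
    return toggles, 0.5 * c_line * vdd ** 2 * toggles


if __name__ == "__main__":
    # HYPOTHETICAL: 1 pF per line, 1.0 V supply, bus initially all zeros.
    print(bus_energy([0xFFFFFFFF, 0x00000000, 0x0000FFFF], c_line=1e-12, vdd=1.0))
```

Because energy scales with the toggle count, encodings or data orderings that reduce the Hamming distance between consecutive words lower bus energy directly.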
NoC (Network-on-Chip) Application Areas
• Turbo decoder, UMTS compliant, 100 Mbit: large flexibility with 14 parallel units, area = 16.84 mm² (14 mm² PUs, 2.8 mm² NoC)
• Wehn, Univ. of Kaiserslautern: LDPC decoder: 500 MHz vs. 64 MHz (fixed bus), but 30 W vs. 700 mW, twice the die size
– 1024-bit block size, 1.2 Gb/s, R = 0.75
– NoC: 5x5 2D mesh, dimension-order routing, large flexibility
– 160 nm CMOS technology, 1.8 V, 500 MHz, 110 mm², ~30 W
• SonicsMX: power-efficient mobile handset with power management
• STNoC, Spidergon: topology with degree 2-3
http://www.eit.uni-kl.de/wehn, EE Times,7/2005
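Dimension-order (XY) routing, as used in the 5x5 mesh above, first moves a packet along the X dimension until the column matches and then along Y; because packets never turn from Y back to X, the scheme is deadlock-free on a mesh. A minimal sketch returning the hop-by-hop path; the (x, y) coordinate convention is an assumption.

```python
from typing import List, Tuple

Node = Tuple[int, int]


def xy_route(src: Node, dst: Node, size: int = 5) -> List[Node]:
    """Dimension-order (XY) route on a size x size 2-D mesh, inclusive of src and dst."""
    for coord in (*src, *dst):
        if not 0 <= coord < size:
            raise ValueError("node outside the mesh")
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:  # route along X first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:  # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 3))
    print(route, "hops:", len(route) - 1)
```

The hop count always equals the Manhattan distance between source and destination, so the worst case on a 5x5 mesh is 8 hops.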
MPSoC Clock and Power, Olivier Franza, Intel
Multiple clock domains
• Low skew and jitter ALWAYS a must
• Clock modeling requires more accuracy
• Within-die variations, inductance, crosstalk, electromigration, self-heat, …
• Hierarchical clock partitioning
• Reduce global clock and possibly relax its requirements
DEC/Compaq Alpha
Source: DEC/Compaq – Gronoski & al., JSSC 1998 – Xanthopoulos & al., ISSCC 2001 – Barroso & al., ISCA 2000
Clock and Power Convergence: Intel® Itanium® Montecito
Reliable design, G. De Micheli
1. Manufacturing imperfections: more likely to happen as lithography scales down
2. Approximations during design: uncertainty about details of the design
3. Aging: oxide breakdown, electromigration
4. Environment-induced: soft errors (data corruption due to external radiation exposure), electromagnetic interference
5. Operating-mode induced: extremely low supply voltage
Adaptive low-power transmission scheme
Frédéric Worm, Patrick Thiran, Giovanni De Micheli, and Paolo Ienne. Self-calibrating Networks-on-Chip.In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, May 2005.
Reduced Energy Consumption
[Figure: receiver block diagram partitioned between DSP and ASIC (shown twice): RF, IF, ADC, DP AGC, NCO, Demod, SER, GI/FFT Detector, Coarse STR, Fine STR, Timing Processor, GI Removal, FFT, Phase Rotator, CR, CPE, CSI, Channel Estimator/Equalizer, Viterbi FEC]
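[Figure: crypto processor block diagram. Round datapath: Key_add, Mux_1, Mux_2, Byte_Sub, Shift_Low (i.e., ShiftRow), Mix_Column, and Key_add between DIN_Reg and DOUT_Reg, with Control and KeyGeneration blocks (signals clk, rst, enb, start, sel_1, sel_2, clksel_1, Key, subKey). Host interface: HOST CPU connected to the CryptoProcessor via an 8-bit address bus, 32-bit data buses, and RESET, CS, RD, WR, CLK, and DW lines]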
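The Mix_Column block of the Rijndael datapath multiplies each 4-byte column by a fixed matrix over GF(2^8), which needs only shifts and XORs. A minimal software sketch of that single step, checked against a widely used test column (db 13 53 45 -> 8e 4d a1 bc); it says nothing about how the hardware block in the figure is built.

```python
from typing import List


def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. 2) in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B  # reduce: clears bit 8 and XORs in 0x1B
    return b


def mul3(b: int) -> int:
    """Multiply by 3 = x + 1 in GF(2^8)."""
    return xtime(b) ^ b


def mix_column(col: List[int]) -> List[int]:
    """AES MixColumns on one column: matrix rows [2 3 1 1], [1 2 3 1], [1 1 2 3], [3 1 1 2]."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


if __name__ == "__main__":
    print([f"{v:02x}" for v in mix_column([0xDB, 0x13, 0x53, 0x45])])  # ['8e', '4d', 'a1', 'bc']
```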
[Figure: low-power FEQ block diagram (Coefficient Update, Conjugator, Error Control, Learning Constant Control; signals x, x*, y, z, c), with a plot comparing the Conventional FEQ and the Low-Power FEQ (vertical axis from -5 to 40)]
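The labels in the FEQ diagram (coefficient update, conjugator, error control, learning constant) correspond to the standard complex LMS update for a one-tap per-tone equalizer: y = c·x, e = d - y, c <- c + μ·e·x*. A minimal sketch of that textbook update, not of the specific low-power variant; the channel value, training symbols, and step size μ are hypothetical.

```python
import cmath
from typing import Iterable


def lms_feq(x_seq: Iterable[complex], d_seq: Iterable[complex], mu: float = 0.1,
            c: complex = 0j) -> complex:
    """Train a single complex FEQ tap with LMS and return the final coefficient.

    x: received FFT-bin values, d: known (training) symbols, mu: learning constant.
    With constant |x|, the update converges for 0 < mu * |x|^2 < 2.
    """
    for x, d in zip(x_seq, d_seq):
        y = c * x                       # equalized output
        e = d - y                       # error against the reference symbol
        c = c + mu * e * x.conjugate()  # update uses the conjugated input
    return c


if __name__ == "__main__":
    h = 0.8 * cmath.exp(0.6j)  # HYPOTHETICAL channel gain on one tone
    d = [complex(1 if k % 2 else -1, 1 if (k // 2) % 2 else -1) for k in range(200)]  # QPSK training
    x = [h * s for s in d]
    c = lms_feq(x, d)
    print(abs(c - 1 / h))  # residual: c approaches 1/h, the zero-forcing value
```

In this noiseless example the residual error shrinks by a factor of 1 - μ|x|² = 0.872 per symbol, so 200 training symbols are far more than needed.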
[Figure: Viterbi search engine with a buffer, four PEs with comparators, a control generator, and PDF and transition-probability memories (terms a_ij and b_j)]
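A Viterbi search over a hidden Markov model keeps, for every state j, the best path score δ_t(j) = max_i δ_{t-1}(i)·a_ij·b_j(o_t), where a_ij are transition probabilities and b_j(o_t) is the output PDF; these appear in the figure as transition and PDF memories. The sketch below is the textbook recursion in the log domain (products become sums). It does not reproduce the inverse-HMM technique of the engine, and the two-state model used for the example is a standard toy model, not data from the slides.

```python
import math
from typing import Dict, List, Sequence, Tuple

LogTable = Dict[str, Dict[str, float]]


def viterbi(obs: Sequence[str], states: Sequence[str], log_pi: Dict[str, float],
            log_a: LogTable, log_b: LogTable) -> Tuple[List[str], float]:
    """Most likely state path and its log probability for an observation sequence."""
    delta = {s: log_pi[s] + log_b[s][obs[0]] for s in states}
    backpointers: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_delta, bp = {}, {}
        for j in states:
            # Add-compare-select: best predecessor i for state j.
            best_i = max(states, key=lambda i: delta[i] + log_a[i][j])
            new_delta[j] = delta[best_i] + log_a[best_i][j] + log_b[j][o]
            bp[j] = best_i
        delta = new_delta
        backpointers.append(bp)
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for bp in reversed(backpointers):  # trace back from the final state
        path.append(bp[path[-1]])
    return path[::-1], delta[last]


def logs(table: Dict[str, Dict[str, float]]) -> LogTable:
    """Convert a nested probability table to natural logs."""
    return {k: {k2: math.log(v) for k2, v in row.items()} for k, row in table.items()}


# Toy two-state model (illustrative only).
STATES = ["Healthy", "Fever"]
LOG_PI = {"Healthy": math.log(0.6), "Fever": math.log(0.4)}
LOG_A = logs({"Healthy": {"Healthy": 0.7, "Fever": 0.3},
              "Fever": {"Healthy": 0.4, "Fever": 0.6}})
LOG_B = logs({"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
              "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}})

if __name__ == "__main__":
    path, score = viterbi(["normal", "cold", "dizzy"], STATES, LOG_PI, LOG_A, LOG_B)
    print(path, math.exp(score))  # ['Healthy', 'Healthy', 'Fever'] and about 0.01512
```

Working in the log domain turns each step into additions and comparisons, which is the add-compare-select structure that hardware PEs and comparators typically implement.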
[Figure: low-power motion estimation architecture with search and reference data buffers, an address generator, external memories for search and current data, a clock generator, a control signal generator, 16 modified PEs with shift registers, partial sums c1_sum to c4_sum, and comparators producing the motion vector]
Low-Power Equalizer for xDSL: 21% power reduction, SNR = 40 dB
Fast and Low Power Viterbi Search Engine using Inverse Hidden Markov Model: 68% power reduction, 71% speed improvement, 1.9x area increase; Samsung Humantech Paper Award (Excellence), 2002
Maximizing Memory Data Reuse for Lower Power Motion Estimation: 33% power reduction, 52 MHz, 2.1x area increase (SCI journal paper)
Low-power design and HW/SW co-design of a Double Dwell Searcher for IS-95-based CDMA: 67% power reduction, 41% area reduction
OFDM-based high-speed wireless LAN platform: 20.7 MHz, 237,000 gates
Next-generation low-power security processor chip for smart cards: ECC, Rijndael, DES, SHA
High-Flexible Design of OFDM Transceiver for DVB-T
VADA Lab's Low-Power IPs (~2003)
Thank you.