a system solution for high- performance, low power sdr yuan lin 1, hyunseok lee 1, yoav harel 1,...

30
A System Solution for High- Performance, Low Power SDR Yuan Lin 1 , Hyunseok Lee 1 , Yoav Harel 1 , Mark Woh 1 , Scott Mahlke 1 , Trevor Mudge 1 and Krisztian Flautner 2 1 Advanced Computer Architecture Laboratory University of Michigan 2 ARM, Ltd.

Post on 20-Dec-2015

216 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

A System Solution for High- Performance, Low Power SDR

Yuan Lin1, Hyunseok Lee1, Yoav Harel1, Mark Woh1,

Scott Mahlke1, Trevor Mudge1 and Krisztian Flautner2

1Advanced Computer Architecture Laboratory

University of Michigan2ARM, Ltd.

Page 2: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

2

SDR Design Challenges:

Hardware design challenges High computational throughput (~40 Gops) Low power consumption (~200mW) Meet real-time requirements

DSP programming support System-level development

Inter-algorithm communication Algorithm-level development

Efficient DSP representations

Page 3: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

SDR Benchmark Design & Analysis

Page 4: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

4

W-CDMA Protocol: 2Mbps

Frontend

LPF-Tx scrambler spreader InterleaverChannelencoder

LPF-Rx

searcher

descrambler despreader combiner

descrambler despreader

...

modulator

demodulator

deinteleaverChanneldecoder

(turbo/viterbi)

Upper layersTransmitter

Receiver

Page 5: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

5

W-CDMA Characteristics

Plenty of vector parallelism 8 & 16-bit DSP algorithms Multiplication is not dominant No floating-point operation Small instruction/data memory Has periodic real-time tasks

Page 6: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

6

802.11a Protocol: 24Mbps

Frontend

LPF-TxQAM

MappingInterleaver

Channelencoder

LPF-Rx

FFT

Frequency, TimeSynchronization

modulator

demodulator

deinteleaverChanneldecoder(viterbi)

Upper layersTransmitter

Receiver

IFFTPreambleInsertion

ChannelEstimation

FrequencyEqualization

QAMdemapping

Page 7: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

7

802.11a Characteristics

Similar to W-CDMA Plenty of vector parallelism No floating-point operation Small instruction/data memory

Different from W-CDMA Mostly 16-bit DSP algorithms Multiplication is more dominant No periodic real-time tasks

Page 8: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

SDR Processor Architecture Design

Page 9: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

9

System Architecture Design Tradeoffs

Fine grainprocessors

Inter-processorNetwork

PE

Coarse grainprocessors

Uniprocessoralu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

PEalu alu alu alu

PEalu alu alu alu

PEalu alu alu alu

PEalu alu alu alu

Inter-processorNetwork

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

Amortized fetch

Intra-processorCommunication

Inter-processor Communication

Page 10: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

10

System Architecture Design Tradeoffs

128x1

64x2

32x4

16x8

8x16

4x32 2x64 1x1281x256

(50% Utilization)

0

50

100

150

200

250

300

0 50 100 150 200 250 300

SIMD width (arithmetic units)

MO

PS

/mW

Intra-processor shuffle networkcritical delay path

ALU dominated critical delay path

critical delay path

Number of Processing Elements x SIMD width For W-CDMA 2Mbps (51.2GOP/sec) 90nm 1V @400MHz

Fine grain processors

Inter-processorNetwork

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PEalu

PE

Uniprocessor

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

alu

Inter-processorNetwork

Coarse grain processorsPE

alu alu alu alu

PEalu alu alu alu

PEalu alu alu alu

PEalu alu alu alu

Page 11: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

12

System Architecture Design

Interconnect

GlobalMemory

SIMDMemory

SIMD ALU

SIMD RF

ScalarMemory

ScalarALU

ScalarRF

SoCInterface

ScalarPipeline

SIMDPipeline

SoCInterface

GlobalMemory

SoCInterface

MEM MEM

Memory

ALU

SoCInterface

Controller

PE

SIMDMemory

SIMD ALU

SIMD RF

ScalarMemory

ScalarALU

ScalarRF

SoCInterface

ScalarPipeline

SIMDPipeline

PE

SIMDMemory

SIMD ALU

SIMD RF

ScalarMemory

ScalarALU

ScalarRF

SoCInterface

ScalarPipeline

SIMDPipeline

PE

Page 12: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

13

PE Design (Area < 1mm2 Power<50mW)

SIMDMemory

4 KB

ScalarMemory

4KB

16x8bitRegFile

16x8bitRegFile

16x8bitRegFile

16x8bitRegFile

DataShuffle

Network

8bit ALU

8bit ALU

8bit ALU

8bit ALU

16x16bitRegFile

16bit ALU

DMA

SIMD Unit

Scalar Unit

PE

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

32x8bit

16bit 16bit 16bit

16bit

Page 13: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

14

Mapping DSP Algorithms: Filters

Z-1 Z-1 Z-1

b0b1b2b3

In

Out

Out

Inb

z-1

z-1spread Vin, Sin

shift z, z, upmac z, Vin, Sin

Page 14: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

15

SIMDMemory

4 KB

ScalarMemory

4KB

16x8bitRegFile

16x8bitRegFile

16x8bitRegFile

16x8bitRegFile

DataShuffle

Network

16x16bitRegFile

DMA

SIMD Unit

Scalar Unit

PE

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

8bit

32x8bit

16bit 16bit 16bit

16bit

8bit ALU

8bit ALU

8bit ALU

8bit ALU

16bit ALU

Mapping DSP Algorithm: Filter

spread Vin, Sin

shift z, z, upmac z, Vin, Sin

Page 15: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

16

Efficient Design

Wide SIMD width Small register file with minimum ports Small memories Narrow system BUS Data-path optimized for 8bits Vector shuffle reduce memory ports

Page 16: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

18

2 LPF-Rx182 MopsMisc. Control

1 Mops

Searcher200 Mops

Deinterleaver16 Mops

PowerControl15 Kops

PN CodeTX/RX

23 Mops

Turbo Decoder324 Mops

Buffer(1360 Bytes)

Buffer(1280 Bytes) Buffer

(2560 Bytes)FIFO Queue(12.5 KBytes)

Buffer(10 Bytes)

Buffer(20 KBytes)

Buffer(20 KBytes)

Buffer(1024 Bytes)

ARM PE PE PE PEGlobal

Memory

Buffer(1024 Bytes)

WCDMA Receiver WCDMA Transmitter

4 LPF-Rx307 Mops

Scrambler8.6 Mops

Spreader4.7 Mops

Turbo Encoder2 Mops

Interleaver2 Mops

Descrambler22.5 Mops

Despreader11.3 Mops

Combiner3 Mops

Frontend

LPF-Tx scrambler spreader InterleaverChannelencoder

LPF-Rx

searcher

descrambler despreader combiner

descrambler despreader

...

modulator

demodulator

deinteleaverChanneldecoder

(turbo/viterbi)

Upper layersTransmitter

Receiver

Channeldecoder

(turbo/viterbi)

4 LPF-Rx307 Mops

Scrambler8.6 Mops

Spreader4.7 Mops

Turbo Encoder2 Mops

Interleaver2 Mops

Turbo Decoder324 Mops

Searcher200 Mops

2 LPF-Rx182 Mops

Descrambler22.5 Mops

Despreader11.3 Mops

Combiner3 Mops

PowerControl15 Kops

Deinterleaver16 Mops

PN CodeTX/RX

23 Mops

Misc. Control1 Mops

descrambler despreader combiner

descrambler despreader

...LPF-Rx deinteleaver

searcher

Channelencoder

InterleaverspreaderscramblerLPF-Tx

Buffer(1280 Bytes)

Buffer(1360 Bytes)

Buffer(2560 Bytes)

Buffer(1024 Bytes) FIFO Queue

(12.5 KBytes)

Buffer(20 KBytes)

Buffer(20 KBytes)Buffer

(10 Bytes)

Buffer(1024 Bytes)

Page 17: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

19

2 LPF-Rx182 MopsMisc. Control

1 Mops

Searcher200 Mops

Deinterleaver16 Mops

PowerControl15 Kops

PN CodeTX/RX

23 Mops

Turbo Decoder324 Mops

Buffer(1360 Bytes)

Buffer(1280 Bytes) Buffer

(2560 Bytes)FIFO Queue(12.5 KBytes)

Buffer(10 Bytes)

Buffer(20 KBytes)

Buffer(20 KBytes)

Buffer(1024 Bytes)

ARM PE PE PE PEGlobal

Memory

Buffer(1024 Bytes)

WCDMA Receiver WCDMA Transmitter

4 LPF-Rx307 Mops

Scrambler8.6 Mops

Spreader4.7 Mops

Turbo Encoder2 Mops

Interleaver2 Mops

Descrambler22.5 Mops

Despreader11.3 Mops

Combiner3 Mops

Frontend

LPF-Tx scrambler spreader InterleaverChannelencoder

LPF-Rx

searcher

descrambler despreader combiner

descrambler despreader

...

modulator

demodulator

deinteleaverChanneldecoder

(turbo/viterbi)

Upper layersTransmitter

Receiver

Channelencoder

InterleaverspreaderscramblerLPF-Tx

Channeldecoder

(turbo/viterbi)deinteleaver

searcher

descrambler despreader combiner

descrambler despreader

...LPF-Rx

4 LPF-Rx307 Mops

Scrambler8.6 Mops

Spreader4.7 Mops

Turbo Encoder2 Mops

Interleaver2 Mops

Turbo Decoder324 Mops

Searcher200 Mops

2 LPF-Rx182 Mops

Descrambler22.5 Mops

Despreader11.3 Mops

Combiner3 Mops

PowerControl15 Kops

Deinterleaver16 Mops

PN CodeTX/RX

23 Mops

Misc. Control1 Mops

MACFrontend A/D

MAC

Frontend D/A

Buffer(1280 Bytes)

Buffer(1360 Bytes)

Buffer(2560 Bytes)

Buffer(1024 Bytes) FIFO Queue

(12.5 KBytes)

Buffer(20 KBytes)

Buffer(20 KBytes)Buffer

(10 Bytes)

Buffer(1024 Bytes)

Page 18: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

20

802.11a PE Mapping

FIR (Rx)320 Mops

Misc. Control80 Mops

FFT120 Mops

Deinterleaver60 Mops

Buffer(2048 Bytes)

Buffer(2048 Bytes) Buffer

(30 KBytes)

Buffer(30 KBytes)

ARM PE PE PE PEGlobal

Memory

Interleaver60 Mops

Freq. Eq.120 Mops

QAM Demod2 Mops

FIR (Tx)320 Mops

Buffer(2048 Bytes)

Channel Enc.20 Mops

Buffer(22048 Bytes)

IFFT120 Mops

Buffer(1024 Bytes)

QAM Mod2 Mops

Viterbi Dec.398 Mops

Buffer(2048 Bytes)

Preamble Ins.50 Mops

Buffer(30 KBytes)

Buffer(30 KBytes)

Interplator250 Mops

Buffer(2048 Bytes)

Sync.20 Mops

Buffer(1024 Bytes)

802.11a Receiver

802.11a Transmitter

Page 19: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

21

Power Results

0

50

100

150

200

250

300

350

400

PE Mem

ory

PE Reg

ister

File

PE ALU

PE Oth

ersARM

Glob

al M

emor

y

Inte

r-pro

cess

or Bus

Oth

ersTota

l

Po

wer

(m

W)

WCDMA

802.11a

Configuration 4 PEs, 1 ARM (Cortex M3) controller Global scratchpad memory (64Kb) 90nm (1V @ 400 MHZ), Synthesized conservatively

Page 20: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

22

Area Results

0

0.5

1

1.5

2

2.5

3

3.5

mm

^2 (9

0nm

)

Page 21: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

SDR Programming Language Support

Page 22: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

24

Software Development Flow

Matlab-Simulink/CFloating Point

Algorithm Prototyping

TimingRequirements

C/SPEXFixed Point

Algorithm KernelImplementations

IO/ControlCode

C/SPEXSystem

Description

Timing/MachineIndependent

asm code

TimingRequirements

MemoryAccessPatterns

SystemDescriptionasm code

Controllerasmcode

PEasmcode

c

ProgrammerDefined

CompilerGenerated

Algorithm-levelDesign Flow

System-LevelDesign Flow

Page 23: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

25

SPEX (Signal Processing EXtension)

Implemented as a library extension to C System-level development

Support concurrent DSP kernel function definitions Channel variables for inter-kernel communications

Algorithm-level development Native vector & matrix variables Explicit DSP variable attribute definition Native vector & matrix operations

Page 24: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

26

SPEX Overview

SPEX

DSP variableextension

DSP kernelextension

Scalarobject

SIMDobject

DSP variableobjects

DSPvariable

attributes

DSP variable

operations

ModeBit-

widthType

Arith. &logicop.

Predi-cation

Permu-tation

Concurrentvariableobject

Concurrentvariable

operations

Kernelobject

Channelobject

Concurrentkernel

mangement

Channelcommunication

operations

Page 25: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

27

SPEX Example Code: Viterbi ACSConcurrent DSP kernel definitions

void* acs(void*) { /* variable declaration */ saturated char<64> metrics1, metrics2; saturated char<64> states; saturated char<64> t1, t2;

while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive();

/* add */ metrics1 += states; metrics2 += states;

/* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2;

/* sending data to TB */ acs_to_tb.send(states); }}

Page 26: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

28

SPEX Example Code: Viterbi ACS

Native SIMD variable definition with explicit attributes

SPEX variable supports1. saturated/overflow2. various variable bit-width3. vector & matrices

void* acs(void*) { /* variable declaration */ saturated char<64> metrics1, metrics2; saturated char<64> states; saturated char<64> t1, t2;

while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive();

/* add */ metrics1 += states; metrics2 += states;

/* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2;

/* sending data to TB */ acs_to_tb.send(states); }}

Page 27: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

29

SPEX Example Code: Viterbi ACS

Inter-kernel communication through channel operations

Channel types:1. FIFO queue2. Broadcast queue3. Sync/control channel4. Random-read FIFO queue

void* acs(void*) { /* variable declaration */ saturated char<64> metrics1, metrics2; saturated char<64> states; saturated char<64> t1, t2;

while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive();

/* add */ metrics1 += states; metrics2 += states;

/* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2;

/* sending data to TB */ acs_to_tb.send(states); }}

Page 28: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

30

SPEX Example Code: Viterbi ACS

SPEX vector operations Supports(Matlab-like C code)

1. SIMD arithmetic operations

2. SIMD permutation 3. SIMD predication

void* acs(void*) { /* variable declaration */ saturated char<64> metrics1, metrics2; saturated char<64> states; saturated char<64> t1, t2;

while (!viterbi.stop()) { /* receiving data from BMC */ metrics1 = bmc_to_acs.receive(); metrics2 = bmc_to_acs.receive();

/* add */ metrics1 += states; metrics2 += states;

/* compare and select */ t1 = (metrics1(0,2,62),metrics2(0,2,62)); t2 = (metrics1(1,2,63),metrics2(1,2,63)); states(t1<t2) = t1; states(t1>=t2) = t2;

/* sending data to TB */ acs_to_tb.send(states); }}

Page 29: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

31

Summary

- Hardware & software solutions for SDR- Hardware

- 4 dual-issue asymmetric SIMD processing elements- Consumes 200~300mW for 90nm- Meets the performance requirements for WCDMA &

802.11a

- Software- SPEX provides efficient DSP algorithm and system

implementation

Page 30: A System Solution for High- Performance, Low Power SDR Yuan Lin 1, Hyunseok Lee 1, Yoav Harel 1, Mark Woh 1, Scott Mahlke 1, Trevor Mudge 1 and Krisztian

Advanced Computer Architecture LaboratoryUniversity of Michigan

32

Questions?