sora: high performance software radio using general ... · sora: high performance software radio...

Post on 12-Mar-2020

7 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Sora: High Performance Software Radio using General Purpose Multi-

Core Processors

Kun Tan† Jiansong Zhang† Ji Fang‡ He Liu§ Yusheng Ye§

Shen Wang§ Yongguang Zhang† Haitao Wu† Wei Wang†

Geoffrey M. Voelker◊

† Microsoft Research Asia‡ Tsinghua University, Beijing, China

§ Beijing Jiaotong University, Beijing, China ◊ UCSD, La Jolla, USA

NSDI 2009, Boston, USA 1

Fundamental Challenges

• Large volume of high-fidelity digital signals

– Require a high-speed system I/O

NSDI 2009, Boston, USA 3

Antenna

Processor

SoftwareHardwareDigital

Samples

D/A

A/D

RF

Frontend

1.2Gbps for 802.11 (20MHz channel, 16b A/D, 4x)

~up to 5 Gbps for 11n (4x4MIMO) ;

Over 10Gbps for future high-speed wireless

Fundamental Challenges

• Large volume of high-fidelity digital signals

– Require a high-speed system I/O

• Computation-intensive signal processing

NSDI 2009, Boston, USA 4

InterleavingConvolutional

encoderQAM Mod IFFT GI Addition

Symbol Wave

ShapingScramble

To RF

Demod +

InterleavingFFT

Viterbi

decodingRemove GI

From RF

Descramble

Transmitter:

Receiver:

Bits

@48MbpsBits

@48Mbps

Samples

@512Mbps

Samples

@1.28Gbps

Samples

@640Mbps

Decimation

Samples

@384MbpsBits

@24Mbps

Samples

@1.28Gbps

Samples

@640Mbps

Samples

@512Mbps

Samples

@384MbpsBits

@48Mbps

Bits

@24Mbps

Bits

@24Mbps

From MAC

To MAC

Bits

@24Mbps

Fundamental Challenges

• Large volume of high-fidelity digital signals

– Require a high-speed system I/O

• Computation-intensive signal processing

NSDI 2009, Boston, USA 5

InterleavingConvolutional

encoderQAM Mod IFFT GI Addition

Symbol Wave

ShapingScramble

To RF

Demod +

InterleavingFFT

Viterbi

decodingRemove GI

From RF

Descramble

Transmitter:

Receiver:

Bits

@48MbpsBits

@48Mbps

Samples

@512Mbps

Samples

@1.28Gbps

Samples

@640Mbps

Decimation

Samples

@384MbpsBits

@24Mbps

Samples

@1.28Gbps

Samples

@640Mbps

Samples

@512Mbps

Samples

@384MbpsBits

@48Mbps

Bits

@24Mbps

Bits

@24Mbps

From MAC

To MAC

Bits

@24Mbps

Raw computation power required: 802.11b => 10Gops, 802.11a => 40Gops!(now server-class CPU runs at 3GHz clock)

Fundamental Challenges

• Large volume of high-fidelity digital signals

– Require a high-speed system I/O

• Computation-intensive signal processing

• Hard deadline and accurate timing control

– 802.11 MAC requires response within a few s

– Event trigger timing accuracy at s level

NSDI 2009, Boston, USA 6

Approaches

NSDI 2009, Boston, USA 7

Programmability

Low High

Low

Hig

h

Perf

orm

ance

EmbeddedDSP

Example: Rice WARP, TI SFF-SDR

Programmable hardware

(FPGA)

Example: GNU Radio/USRP(v1&2)• Interface USB/GbE: <1Gbps, >1ms• Achievable wireless xput: ~100Kbps

Low-performanceGPP-based SDR

SoraSora

Resolving the SDR platform dilemma• Commodity PC w/ C program• High performance

• sys tput:10Gbps; ~s latency • target wireless xput:10M~1Gbps

Sora Approach

• New PCIe-based Interface card => high system throughput

• New optimizations to implement PHY algorithms and streamline processing on multi-core CPU=> efficient PHY processing

• Core dedication => real-time support

NSDI 2009, Boston, USA 8

MemRF

RFRF

Sora

APP

Multi-core CPU

Sora Soft-Radio Stack

PCIe bus

Digital Samples

@Multiple Gbps

RCBA/D

D/A RFSora

APP

APP

APP

APP

APP

Sora Hardware

Sora Architecture

NSDI 2009, Boston, USA9

General radio front-end: 700M/1.8G/2.4G/5GHz

MemRF

RFRF

Sora

APP

Multi-core CPU

Sora Soft-Radio Stack

PCIe bus

Digital Samples

@Multiple Gbps

RCBA/D

D/A RFSora

APP

APP

APP

APP

APP

Sora Hardware

Radio Control Board

NSDI 2009, Boston, USA10

PCIe-based High-speed Interface card

PCIe is commodity in most modern PCs High throughput: 16Gbps at PCIe-8x Low latency: ~ 1 s Separated with other I/O devices

RCB Details

NSDI 2009, Boston, USA 11

PCIe-8x interface: up to 16Gbps throughput

Versatile RF interface: up to 8 channels (8x8 MIMO)

Versatile RF interface: up to 8 channels (8x8 MIMO)

RCB Details

NSDI 2009, Boston, USA 12

s

Buffered data path: bridging the synchronous ops at RF and asynchronous processing at CPU (12.3Gbps measured )

Low latency control path for software (0.36 s measured)

A/D

D/ARF Circuit

RF Front-endPCIE

Controller SDRAM

Controller

FIFO

FIFODMA

Controller

DDR

SDRAM

FPGA

RCB

PCIe

bus

Antenna

RF

Controller

Registers

MemRF

RFRF

Sora

APP

Multi-core CPU

Sora Soft-Radio Stack

PCIe bus

Digital Samples

@Multiple Gbps

RCBA/D

D/A RFSora

APP

APP

APP

APP

APP

Sora Hardware

Sora Software

13

High-performance SDR processing w/ key software techniques

Efficient PHY implementation using SIMD and LUTs Speed up PHY using multi-core streamline processing Core dedication for real-time support

NSDI 2009, Boston, USA

Efficient PHY Implementation

• Exploit large high-speed cache memory– Extensive use of lookup tables (LUT): trade memory

for calculation; still well fit into L2 cache

– Applicable for more than half of the common algorithms; speedup ranges from 1.5x to 22x

NSDI 2009, Boston, USA 14

Direct impl. 8 ops per bit

LUT impl. 2 Look-up op for 8 bits!(size 32KB)

Ex: Convolutional encoder+

+

Tb Tb Tb Tb Tb Tb

Output Data A

Output Data B

Efficient PHY Implementation

• Exploit data parallelism in PHY

– Utilize wide-vector SIMD extension in CPU

– Applicable to many PHY algorithms with significant speedups (1.6x ~ 50x)

NSDI 2009, Boston, USA 15

Ex. (I)FFT

Core 2Core 1

Speed up PHY using multi-core streamline processing

• Efficiently partition and schedule the PHY processing across cores– Interconnecting sub-pipeline with light-weight,

synchronized FIFOs

– Static scheduling of processing modules in PHY pipeline

NSDI 2009, Boston, USA 16

Demod +

InterleavingFFT

Viterbi

decodingRemove GI DescrambleDecimation

Synchronized FIFO

Core Dedication for Real-time Support

• Exclusively allocate enough cores for SDR processing in multi-core systems

– Guarantee the CPU, cache and memory bandwidth resources for predictable performance

– Achieve s-level timing control

– Simple abstraction, and easier to implement in standard OSes than RT-scheduler

• Implemented in WinXP without modifications to Kernel

NSDI 2009, Boston, USA 17

Implementation

• Sora software platform on Win XP

– 14K lines of C code, including PCIe driver framework, memory management, FIFO management, etc

• SoftWiFi: full implementation of IEEE 802.11a/b/g PHY and DCF MAC

– 9K lines of C code; 4 man-month for dev & test

– DSSS 1, 2, 5.5, 11Mbps for 11b; OFDM 6, 9, 12, 18, 24, 36, 48, 54Mbps for 11a/g

NSDI 2009, Boston, USA 18

0

1

2

3

4

5

6

7

8

9

10

1M 2M 5.5M 11M 6M 24M 54M

Re

qu

ire

d c

om

pu

tati

on

(Gig

a cy

cle

s p

er

seco

nd

)

0

1

2

3

4

5

6

7

8

9

10

1M 2M 5.5M 11M 6M 24M 54M

Re

qu

ire

d c

om

pu

tati

on

(Gig

a cy

cle

s p

er

seco

nd

)

Results: PHY Processing

11.6 11.7 11.7 11.818.3 60.4 132.4

NSDI 2009, Boston, USA 19

>30

x s

peedup

~10

x s

peedup

802.11b 802.11a/g 802.11b 802.11a/g

After Sora Optimization

0

1

2

3

4

5

6

7

8

9

10

1M 2M 5.5M 11M 6M 24M 54M

Re

qu

ire

d c

om

pu

tati

on

(Gig

a cy

cle

s p

er

seco

nd

)

0

1

2

3

4

5

6

7

8

9

10

1M 2M 5.5M 11M 6M 24M 54M

Re

qu

ire

d c

om

pu

tati

on

(Gig

a cy

cle

s p

er

seco

nd

)

Results: PHY Processing

11.6 11.7 11.7 11.818.3 60.4 132.4

NSDI 2009, Boston, USA 20

>30

x s

peedup

~10

x s

peedup

802.11b 802.11a/g 802.11b 802.11a/g

Sora enables software implementation of today’s high-speed wireless system in

standard PC with a few cores

After Sora Optimization

0

5

10

15

20

25

1M 2M 5.5M 11M 6M 24M 54M

Th

rou

gh

pu

t (M

bp

s)

Modulation Mode

Sora-Commercial

Commercial-Commercial

Commercial-Sora

Results: End-to-end Throughput

NSDI 2009, Boston, USA 21

Communicating with commercial 802.11a/b/g card

0

5

10

15

20

25

1M 2M 5.5M 11M 6M 24M 54M

Th

rou

gh

pu

t (M

bp

s)

Modulation Mode

Sora-Commercial

Commercial-Commercial

Commercial-Sora

Results: End-to-end Throughput

NSDI 2009, Boston, USA 22

Communicating with commercial 802.11a/b/g card

Seamlessly interoperate with commercial WiFi• Correctness of all PHY algorithms• Satisfying timing requirements of standards• Commercial equivalent performance

Extensions

NSDI 2009, Boston, USA 23

Jumbo frames in 802.11 TDMA MAC

Extensions: New Applications

NSDI 2009, Boston, USA 24

Conclusion

• Sora is a fully programmable software radio platform on commodity PC architecture– Easy C programming on multi-core CPU

– High performance: high processing speed, low latency, and performance guarantee• Confirmed by SoftWiFi, the first fully interoperable IEEE

802.11 (PHY and MAC) on general purpose processors

• Plan to release Sora SDK to research community– H/W: RCB + 2.4G RF front-end set (~$2K USD)

NSDI 2009, Boston, USA 25

Q&A

NSDI 2009, Boston, USA 27

top related