
GPU-accelerated SDR Implementation of Multi-User Detector for Satellite Return Links

> Sino-German Workshop > Chen Tang > 03.2014 DLR.de • Chart 1

Chen Tang

Institute of Communication and Navigation

German Aerospace Center


Overview

• Introduction and Motivation

• MUD System Design

• GPU CUDA Architecture

• GPU-accelerated Implementation of MUD

• Simulation Result

• Summary


Introduction and Motivation


• Bidirectional satellite communication

• Multi-user access issue
  • MF-TDMA (e.g. DVB-RCS)

• Multiuser Detection (MUD)
  • Increase spectrum efficiency
  • Few practical MUD implementations for satellite systems
    • High complexity
    • Sensitive to synchronization and channel estimation errors


Introduction and Motivation

• NEXT project (Network Coding Satellite Experiment) paved the way to the GEO research communication satellite H2Sat
  • H2Sat: explore and test new broadband (high data rate) satellite communication

• NEXT Exp 3: Multiuser detection (MUD) for satellite return links
  • Two users transmit at the same frequency and time
  • A transparent satellite return link

[Figure: the packet streams of user A (Packet A1, A2, A3) and user B (Packet B1, B2, B3) overlap on the link; multiuser detection separates them again]

• Main objectives:
  • Develop a MUD receiver in SDR
  • Increase decoding throughput to reach real-time processing


MUD System Design

• Multiuser detection (MUD) complexity
  • Optimal MUD proposed by Verdú: complexity exponential in the number of users
  • Suboptimal MUD algorithms, e.g. Parallel Interference Cancellation (PIC) and Successive Interference Cancellation (SIC)

• We use Successive Interference Cancellation (SIC)
  • Complexity linear in the number of users
  • Straightforward extension to support more users
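
The gap between exponential and linear complexity can be made concrete with a small count. The sketch below assumes BPSK (two symbol values per user) and a synchronous system. It compares the number of joint symbol hypotheses an exhaustive joint search would test per symbol interval with the number of single-user detection steps in one SIC pass. The function names are illustrative and do not come from the slides.

```python
def joint_search_hypotheses(num_users: int, alphabet_size: int = 2) -> int:
    """Joint symbol hypotheses an exhaustive search compares per symbol interval."""
    return alphabet_size ** num_users


def sic_detection_steps(num_users: int) -> int:
    """Single-user detection steps in one SIC pass: one per user."""
    return num_users


# Print the growth of both counts for a few user numbers.
for num_users in (2, 4, 8, 16):
    print(num_users,
          joint_search_hypotheses(num_users),
          sic_detection_steps(num_users))
```

For the two-user case treated in this talk the difference is small (4 hypotheses versus 2 steps), but it grows quickly as users are added, which is why the "extension to support more users" argument favours SIC.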


MUD System Design

• Successive Interference Cancellation (SIC)
  • Sequentially decode users and cancel interference
  • Multi-stage SIC improves the packet error rate (PER)
  • Error propagation
  • Sensitive to channel estimation errors
  • Phase noise

• Expectation Maximization Channel Estimation (EM-CE)

LDPC
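
The SIC principle listed above can be sketched for two BPSK users. This is a deliberately simplified illustration and not the receiver from the talk: it assumes perfectly known channel gains (no EM channel estimation), uses hard symbol decisions in place of LDPC decoding, and runs a single stage. The function name `sic_two_users`, the noise level and the random seed are assumptions made for the example.

```python
import numpy as np


def sic_two_users(rx, h1, h2):
    """Detect the stronger user, cancel it, then detect the weaker user.

    rx:     received complex baseband samples, one per symbol
    h1, h2: channel gains of user 1 (stronger) and user 2, assumed known
    """
    # Stage 1: detect user 1 while treating user 2 as noise.
    b1 = np.sign(np.real(np.conj(h1) * rx))
    # Reconstruct the user-1 contribution and subtract it.
    residual = rx - h1 * b1
    # Stage 2: detect user 2 on the interference-reduced signal.
    b2 = np.sign(np.real(np.conj(h2) * residual))
    return b1, b2


rng = np.random.default_rng(0)
num_symbols = 1000
bits1 = rng.choice([-1.0, 1.0], size=num_symbols)
bits2 = rng.choice([-1.0, 1.0], size=num_symbols)

h1 = 10 ** (3 / 20)   # user 1 amplitude: 3 dB above user 2
h2 = 1.0
noise = 0.05 * (rng.standard_normal(num_symbols)
                + 1j * rng.standard_normal(num_symbols))
rx = h1 * bits1 + h2 * bits2 + noise

est1, est2 = sic_two_users(rx, h1, h2)
print("user 1 symbol errors:", int(np.sum(est1 != bits1)))
print("user 2 symbol errors:", int(np.sum(est2 != bits2)))
```

If a stage-1 decision were wrong, the subtraction would add interference instead of removing it, which is the error propagation named in the list. The same applies when `h1` is estimated inaccurately.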


MUD System Design

• Real-time implementation of MUD is challenging

• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation

• Programmable hardware devices
  • DSP, FPGA (hard to develop, low flexibility)
  • Attractive alternative: GPGPU
    • High performance
    • High flexibility


GPGPU

• GPUs are massively multithreaded multi-core chips
  • Image and video rendering
  • General-purpose computations

• Nvidia Tesla C2070: 448 cores; 515 GFLOPS of double-precision peak performance

Ref: Nvidia CUDA C Programming Guide, 2013


GPGPU

• The GPU is specialized for compute-intensive, highly parallel computation (exactly what graphics rendering is about)
  • More transistors are devoted to data processing rather than data caching and flow control

[Figure: CPU and GPU chip layouts compared; ALU: Arithmetic Logic Unit]

• CPU: limited number of concurrent threads
  • A server with four hex-core processors runs 24 concurrent active threads (or 48 if Hyper-Threading is supported)

• GPU: many more concurrent threads
  • Hundreds of processor cores
  • Thousands of concurrent active threads


CUDA Architecture


• In Nov. 2006, the first GPU built with Nvidia's CUDA architecture was introduced

• CUDA: Compute Unified Device Architecture
  • Each ALU can be used for general-purpose computations
  • All execution units can arbitrarily read and write memory
  • Allows the use of high-level programming languages (C/C++, OpenCL, Fortran, Java and Python)


CUDA Architecture

• Serial program with parallel kernels
  • Serial code executes in a host (CPU) thread
  • Parallel kernel code executes in many device (GPU) threads
  • Host (CPU) and device (GPU) maintain separate memory spaces
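
The programming model above can be imitated in plain Python to show its structure. This is an emulation only, not CUDA code and not GPU-accelerated: the "kernel" is an ordinary function that handles one element per thread index, the "launch" is a serial loop standing in for concurrent GPU threads, and the separate memory spaces are modelled by explicit copies. All names (`saxpy_kernel`, `launch`, `dev_x` and so on) are hypothetical.

```python
def saxpy_kernel(thread_idx, a, x, y, out):
    """Work of one emulated device thread: compute a single output element."""
    out[thread_idx] = a * x[thread_idx] + y[thread_idx]


def launch(kernel, num_threads, *args):
    """Stand-in for a kernel launch; a real GPU runs these threads concurrently."""
    for thread_idx in range(num_threads):
        kernel(thread_idx, *args)


# Serial host code: prepare the input data in "host memory".
host_x = [1.0, 2.0, 3.0, 4.0]
host_y = [10.0, 20.0, 30.0, 40.0]

# Separate memory spaces: inputs are copied into "device memory" first.
dev_x = list(host_x)
dev_y = list(host_y)
dev_out = [0.0] * len(dev_x)

# Parallel section: one thread per output element.
launch(saxpy_kernel, len(dev_x), 2.0, dev_x, dev_y, dev_out)

# Copy the result back to the host before the serial code continues.
host_out = list(dev_out)
print(host_out)  # [12.0, 24.0, 36.0, 48.0]
```

The two copy steps are the part with no counterpart in an ordinary CPU program, and they are where host-device transfer cost enters.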


LDPC Decoder on GPU


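• Assign one CUDA thread to work on each edge of each check node
• Speedup: 10x
• Throughput: 1.6 Mbps (code rate: 2/3)

[Figure: Tanner graph with check nodes C1, C2, C3, ..., Cn-k and variable nodes V1, V2, V3, V4, ..., Vn; messages C_j → V_i are passed along the edges]

• U1: n = 4800, k = 3200
• U2: n = 4800, k = 2400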
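
The per-edge work assignment can be illustrated with a check-node update in which every edge computes its outgoing message independently of the others. The slides do not state which decoding rule is used, so the sketch assumes the min-sum approximation. The serial loop stands in for the CUDA threads, and the tiny two-check graph and the name `check_node_update_per_edge` are made up for the example.

```python
import numpy as np


def check_node_update_per_edge(edge_check, v2c):
    """Min-sum check-node update, computed independently for each edge.

    edge_check[e]: index of the check node that edge e is attached to
    v2c[e]:        incoming variable-to-check message (LLR) on edge e
    Returns c2v, where c2v[e] is the check-to-variable message on edge e.
    """
    num_edges = len(v2c)
    c2v = np.zeros(num_edges)
    for e in range(num_edges):  # each iteration is the work of one thread
        # All other edges attached to the same check node.
        others = [i for i in range(num_edges)
                  if edge_check[i] == edge_check[e] and i != e]
        sign = np.prod(np.sign(v2c[others]))
        magnitude = np.min(np.abs(v2c[others]))
        c2v[e] = sign * magnitude
    return c2v


# Two check nodes with three edges each.
edge_check = np.array([0, 0, 0, 1, 1, 1])
v2c = np.array([2.0, -0.5, 1.5, -3.0, -1.0, 4.0])
c2v = check_node_update_per_edge(edge_check, v2c)
print(c2v)
```

Because no iteration writes a value that another iteration reads, the loop body maps directly onto independent threads. A practical kernel would look up the neighbouring edges from a precomputed table instead of scanning all edges.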


MUD receiver on GPU


• Processing bottlenecks:
  • LDPC channel decoding
  • EM channel estimation
  • Resampling and interference cancellation
  • Data transfer between host and device memory (144 GB/s device memory bandwidth of the Nvidia Tesla vs. 8 GB/s over PCIe x16)

• All parts of each single-user receiver and the interference cancellation run on the GPU

• Minimize the latency of intermediate data transfers between host and device memory

[Figure: receiver processing blocks assigned to CPU and GPU]
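
The two bandwidth figures quoted above explain why intermediate results should stay on the GPU. The short calculation below compares the time to move one buffer over each path at the quoted peak rates. The buffer size (5000 complex samples of 8 bytes each) is a hypothetical value chosen for illustration, and fixed per-transfer latency is ignored.

```python
def transfer_time_ms(num_bytes: float, bandwidth_gb_per_s: float) -> float:
    """Time to move num_bytes at the given peak bandwidth, in milliseconds."""
    return num_bytes / (bandwidth_gb_per_s * 1e9) * 1e3


# Hypothetical buffer: 5000 complex samples, 2 x float32 = 8 bytes each.
buffer_bytes = 5000 * 8

pcie_ms = transfer_time_ms(buffer_bytes, 8)      # host <-> device over PCIe x16
device_ms = transfer_time_ms(buffer_bytes, 144)  # within GPU device memory
ratio = pcie_ms / device_ms

print(f"PCIe: {pcie_ms:.4f} ms, device memory: {device_ms:.6f} ms, "
      f"ratio: {ratio:.0f}x")
```

The ratio is independent of the buffer size: at peak rates every host-device round trip costs 18 times as much as the same access in device memory, so each avoided round trip between processing stages counts.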


Simulation Setup

• GPU: Nvidia Tesla C2070 (1.15 GHz)
• Comparison benchmark: Intel Xeon CPU E5620 (2.4 GHz)

• BPSK modulation
• Two user terminals (power imbalance: U1 3 dB higher than U2)
• Channel coding: LDPC
  • Irregular Repeat Accumulate (IRA)
  • Block length: 4800 bits
  • U1 code rate: 2/3, U2 code rate: 1/2

• Baud rate: 62500 symbols/second
  • Real-time threshold: ca. 85 ms (66 kbps)
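
The setup figures can be cross-checked with a few lines of arithmetic. Reading the 66 kbps as the combined information rate of both users over one 85 ms interval is an assumption; it happens to reproduce the quoted number. The slide does not explain the difference between the 76.8 ms airtime of one codeword and the 85 ms threshold (frame overhead such as pilots or a preamble is one possible cause).

```python
baud_rate = 62500              # symbols per second
block_length = 4800            # coded bits per LDPC codeword
info_bits_u1 = 4800 * 2 // 3   # code rate 2/3 -> 3200 information bits
info_bits_u2 = 4800 * 1 // 2   # code rate 1/2 -> 2400 information bits
threshold_s = 0.085            # real-time threshold, "ca. 85 ms"

# BPSK carries one coded bit per symbol.
codeword_airtime_ms = block_length / baud_rate * 1e3

# Assumed reading: both users' information bits delivered per threshold interval.
combined_rate_kbps = (info_bits_u1 + info_bits_u2) / threshold_s / 1e3

# Number of symbol periods that fit into the threshold interval.
symbols_in_threshold = baud_rate * threshold_s

print(f"codeword airtime: {codeword_airtime_ms:.1f} ms")
print(f"combined information rate: {combined_rate_kbps:.1f} kbps")
print(f"symbols per threshold interval: {symbols_in_threshold:.1f}")
```

In other words, the receiver has to finish processing one frame of both users within about 85 ms to keep up with the incoming signal.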


Simulation Result

[Figure: simulation result plot with the real-time threshold marked]


Summary

• SDR implementation of MUD receiver
  • High flexibility and low cost
  • Extension to support more users

• GPU acceleration
  • 1.8x to 3.8x faster than the real-time threshold
  • Still room for improvement
  • Newer GPUs offer better performance

• GPU CUDA is very promising for powerful parallel computing
  • Low learning curve
  • Heterogeneous: mixed serial-parallel programming
  • Scalable

• CUDA-powered Matlab (MATLAB® with Parallel Computing Toolbox; Jacket™ from AccelerEyes)
  • Days/weeks of simulation reduced to hours
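
The speed-up range can be turned into approximate processing times. The calculation assumes that "N x faster than the real-time threshold" means the total processing time equals the threshold divided by N, and it uses the ca. 85 ms threshold from the simulation setup. The resulting times are derived values, not measurements reported in the talk.

```python
THRESHOLD_MS = 85.0  # real-time threshold from the simulation setup (approximate)


def processing_time_ms(speed_factor: float) -> float:
    """Processing time implied by being speed_factor times faster than the threshold."""
    return THRESHOLD_MS / speed_factor


slowest_case_ms = processing_time_ms(1.8)
fastest_case_ms = processing_time_ms(3.8)
print(f"{fastest_case_ms:.1f} ms to {slowest_case_ms:.1f} ms per threshold interval")
```

Under this reading the GPU receiver leaves roughly 38 ms to 63 ms of each 85 ms interval unused.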


GNU Radio

• “GNU Radio is a free & open-source software development toolkit that provides signal processing blocks to implement software radios”

• Software architecture
  • The main processing of the blocks is done in C++ functions executed by the CPU on a PC

[Figure: GNU Radio software stack: Python Script / GNU Radio Companion, Python Module, SWIG, C++ Shared Library]


GNURadio + CUDA

• Irregular Repeat Accumulate (IRA) LDPC: n = 4800, k = 2400

• CPU LDPC decoder
  • Throughput:

• GPU LDPC decoder
  • Throughput:


Thank you!

Q&A?


GPGPU

• Advantages of GPU:
  • High computational processing power
  • High memory bandwidth
  • High flexibility

• Drawbacks of GPU:
  • Not a stand-alone device
  • Bad at serial processing
  • Separate memory space
  • Additional hands-on effort


Comparison of total processing time of MUD between CPU and GPU
