
© 2005 Mercury Computer Systems, Inc.

• Yael Steinsaltz, ysteinsa@mc.com
• Scott Geaghan, sgeaghan@mc.com
• Myra Jean Prelle, mjp@mc.com
• Brian Bouzas, bbouzas@mc.com
• Michael Pepe

Leveraging Multicomputer Frameworks for Use in Multi-Core Processors

High Performance Embedded Computing Workshop

September 21, 2006


Outline

• Introduction

• Channelizer Problem

• Preliminary Results

• Summary


Multi-Core Processors

• Multi-core processors vary in architecture, from 2-4 identical cores (Intel Xeon, Freescale 8641) to a single Manager with several Workers on one die (IBM Cell Broadband Engine™ (BE) processor).

• Focusing on the IBM Cell BE processor, and using the standard presented at www.data-re.org, we implemented an API called the ‘Multi-Core Framework’ (MCF).

• MCF is applicable across architectures as long as one process acts as a Manager; more established APIs would work as well.


Multi-Core Framework

• MCF is based on Mercury's prior implementation of www.data-re.org, a product named “Parallel Acceleration System” or PAS.

• Distributed data flows in a Manager-Worker fashion enabling concurrent I/O and parallel processing.

• Function-offload model, in which the user programs both the Manager and the Workers; MCF simplifies development.

• Local store (LS) memory is used efficiently (< 5% for the MCF kernel).

• Runs tasks on SPE without Linux® overhead (thread create is bypassed).
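
The Manager-Worker function-offload pattern described above can be illustrated with a minimal sketch. This is not the MCF API: the names `offload` and `scale_by_two` are hypothetical, and Python threads stand in for SPE workers purely to show the shape of the pattern (the Manager partitions the data, each Worker runs the same kernel on its block, and the Manager gathers the results in order).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def offload(data: Sequence[float], n_workers: int,
            kernel: Callable[[Sequence[float]], List[float]]) -> List[float]:
    """Manager side: partition data, hand each block to a worker, gather in order."""
    # Ceiling division so every element lands in some block; at least 1 to
    # keep the range() step valid when data is empty.
    block = max(1, -(-len(data) // n_workers))
    blocks = [data[i:i + block] for i in range(0, len(data), block)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in the same order as the input blocks.
        parts = pool.map(kernel, blocks)
        return [y for part in parts for y in part]


def scale_by_two(block: Sequence[float]) -> List[float]:
    """Worker side: a hypothetical kernel applied to one block."""
    return [2 * v for v in block]
```

A call such as `offload(list(range(10)), 4, scale_by_two)` splits ten values into blocks of at most three and returns the doubled values in their original order.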


Data Movement

• Multi-buffered strip mining of N-dimensional data sets between a large main memory (XDR) and small worker memories.

• Provides for overlap and duplication when distributing data as well as different partitioning.

• Data re-organization enables easy transfer of data between local stores.
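
As a rough one-dimensional illustration of the strip mining and overlap described above, the sketch below cuts a large array into fixed-size strips that share some elements and cycles them through a small pool of buffers. The function names and the buffer pool are assumptions made for illustration; they are not part of MCF, and a real implementation would overlap the transfer into one buffer with the computation on another.

```python
from typing import Iterator, List, Tuple


def strip_ranges(total: int, strip: int, overlap: int = 0) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index ranges covering `total` elements.

    Consecutive strips share `overlap` elements, so each new strip
    advances by `strip - overlap`.
    """
    if strip <= overlap:
        raise ValueError("strip must be larger than overlap")
    step = strip - overlap
    start = 0
    while start < total:
        stop = min(start + strip, total)
        yield start, stop
        if stop == total:
            break
        start += step


def process_strip_mined(data: List[float], strip: int, overlap: int,
                        n_buffers: int = 2) -> List[float]:
    """Sum each strip, cycling through a small pool of fixed-size buffers.

    The pool stands in for a worker's local memory: only `n_buffers`
    strips are ever resident, however large `data` is.
    """
    buffers = [[0.0] * strip for _ in range(n_buffers)]
    results = []
    for i, (start, stop) in enumerate(strip_ranges(len(data), strip, overlap)):
        buf = buffers[i % n_buffers]
        n = stop - start
        buf[:n] = data[start:stop]     # stands in for the transfer from main memory
        results.append(sum(buf[:n]))   # stands in for the worker computation
    return results
```

With ten elements, a strip of four and an overlap of one, the strips cover indices 0-3, 3-6 and 6-9.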


Objective and Motivation

• Objective: Develop a Cell BE-based real-time signal acquisition system composed of frequency channelizers and signal detectors in a single ~6U slot.

• Motivation: Benchmark computational density among PPCs, FPGAs, and the Cell BE for a typical streaming application.


The Channelizer Problem

• FM3TR Signal (Hopping, Multi-Waveform, Multiband)

• Channelization using a 16K real FFT with 75% overlap of the input (computation is signal independent).

• Simple threshold for detection of the active channels (Computation is data dependent).
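
The two stages above can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function names are hypothetical, a direct DFT replaces the FFT for clarity, and a small frame length is assumed so that it runs quickly. Framing does the same work for any input, while the number of detected bins depends on the data.

```python
import cmath
import math
from typing import List


def overlapped_frames(samples: List[float], frame_len: int,
                      overlap_factor: int = 4) -> List[List[float]]:
    """Split samples into frames that advance by frame_len // overlap_factor.

    overlap_factor=4 gives the 75% overlap named on the slide.
    """
    hop = frame_len // overlap_factor
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def dft_power(frame: List[float]) -> List[float]:
    """Power in bins 0..N/2 of a real frame (direct DFT, for clarity only)."""
    n = len(frame)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m, v in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def detect(frame: List[float], threshold_db: float) -> List[int]:
    """Return the bins whose power in dB exceeds threshold_db."""
    floor = 1e-30  # avoids log10(0) on empty bins
    return [k for k, p in enumerate(dft_power(frame))
            if 10 * math.log10(max(p, floor)) > threshold_db]
```

For a 16-sample frame holding a single cosine at bin 3, `detect(frame, 10.0)` reports only bin 3.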


Channelizer Problem

• The signal acquisition system separates a wide radio frequency band into a set of narrow frequency bands.

• Implementation specifications:
  • 4:1 overlap buffer: 16K sample buffer -> 8K complex FFT.
  • Blackman window (embedded multipliers).
  • Log-magnitude threshold: adjustable register and comparator to determine detections.
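
The slides do not say how the 16K sample buffer maps onto an 8K complex FFT. One standard way to do it, shown here as an assumption rather than as the authors' method, is to pack the even and odd real samples into the real and imaginary parts of a half-length complex sequence, take one half-size FFT, and separate the two spectra afterwards. The sketch uses a small recursive FFT so that it is self-contained.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def real_fft_via_half_size(x: Sequence[float]) -> List[complex]:
    """Bins 0..N of a length-2N real signal, using one N-point complex FFT."""
    n = len(x) // 2
    # Pack even samples into the real part, odd samples into the imaginary part.
    z = [complex(x[2 * i], x[2 * i + 1]) for i in range(n)]
    zf = fft(z)
    bins = []
    for k in range(n + 1):
        zk = zf[k % n]
        zc = zf[(n - k) % n].conjugate()
        even = (zk + zc) / 2    # spectrum of the even-indexed samples
        odd = (zk - zc) / 2j    # spectrum of the odd-indexed samples
        bins.append(even + cmath.exp(-1j * cmath.pi * k / n) * odd)
    return bins
```

With a 16K real buffer, `n` is 8K, so the only large transform is the 8K complex FFT; the separation step is a single pass over the bins.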


Data Flow and Work Distribution

[Figure: data flow and work distribution. Labels: Manager thread of execution; input data; Channelizer workers; channelizer output; High Speed Alarm (HSA) worker; HSA output; unused processing elements. Caption: teams perform data-parallel math.]


Data Flow – Re-org Channels

[Figure: data flow through the re-org channels. Labels: XDR; Channelizer team Local Store; HSA team LS.]



Development Time and Hardware Use

• PPC – 22 PPCs needed for the channelizer and 7 PPCs for the HSA; about 2 man-months of development.

• FPGA – one half of a Virtex-II Pro P70 FPGA (quarter board); about 8 man-months, because all the math had to be developed, using some Xilinx cores.

• Cell BE – single processor (half board); about 4 man-weeks (using the same math and SAL calls as the PPC code).


Data Rates Tested

• PPC implementation accepted data at 70, 80 and 105 MHz (MS/sec) and is easily scalable.

• FPGA implementation met data rates at 70 and 80 MHz (MS/sec).

• Cell BE implementation met data rates at 70, 80 and 105 MHz (MS/sec).

Windowing was not implemented on the Cell BE because the local store was too small to hold the weights. Adding it would take an extra 2-3 weeks of design modification to the data organization and channels. (Times were measured with a multiply by a constant included, to be true to performance.)

Math only started to affect data rates when fewer than 4 SPEs were used for the FFT; adding more SPEs did not increase speed.
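
To put the tested rates in perspective, the short sketch below works out how often a new FFT must start. It assumes the 16K frame and 4:1 overlap from the channelizer specification and treats the slide's MHz figures as sample rates in MS/sec; the function names are hypothetical.

```python
def ffts_per_second(sample_rate_hz: float, fft_len: int = 16384,
                    overlap_factor: int = 4) -> float:
    """FFTs needed per second when each frame advances by fft_len / overlap_factor."""
    hop = fft_len // overlap_factor  # new samples consumed per FFT
    return sample_rate_hz / hop


def frame_budget_us(sample_rate_hz: float, fft_len: int = 16384,
                    overlap_factor: int = 4) -> float:
    """Time available per FFT, in microseconds, to keep up with the input."""
    return 1e6 / ffts_per_second(sample_rate_hz, fft_len, overlap_factor)


if __name__ == "__main__":
    for rate in (70e6, 80e6, 105e6):
        print(f"{rate / 1e6:.0f} MS/sec: {ffts_per_second(rate):.0f} FFTs/sec, "
              f"{frame_budget_us(rate):.1f} us per FFT")
```

Under these assumptions each FFT consumes 4096 new samples, so at 105 MS/sec the system must complete roughly 25,600 16K FFTs per second, or one about every 39 microseconds across all SPEs assigned to the FFT.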


Summary

• Morphing a library with a similar API to a new architecture makes porting applications efficient.

• Hardware footprint (6U slots) is comparable to FPGA use.

• The small size of the SPE local store is a significant contributor in determining whether an application will port easily or require additional work.

• Mercury is fully cognizant of the architecture and works to reduce code size while benefiting from the large I/O bandwidth and fast processing capability of the Cell BE.