software correlators

1
Cross correlators are a highly computationally intensive part of any radio interferometer. Although the number of operations per data sample is small, correlation occurs before time integration and so the sample rate is very high – it must be at least twice the radio-frequency bandwidth of the telescope. The number of operations is proportional to both the bandwidth of the telescope, and the number of baselines being correlated. For SKA 1 , and operation rate of many petaflops will be required in the correlator alone. Software Correlators At present, FPGA and/or ASIC based correlators are widely used. These offer highly transistor- efficient solutions as logic units are hard-wired to solve the particular correlation problem required. However, this efficiency comes at the expense of flexibility. Software correlators, running on general-purpose CPUs or GPUs offer the ability to easily add antennas, change the number of baselines correlated, or tune the spectral resolution of the telescope as required. The specification of the telescope can be scaled with time as more computing power becomes available. Figure 3: The MWA has already trialled a GPU- based correlator (Wayth et al. 2007). Introduction NVIDIA Tesla cards Graphics Processing Units (GPUs) may offer the computational power needed in a correlator for the SKA. Today, NVIDIA’s Tesla cards already offer 100s GFLOPs per card, and a maximum data rate of 64 Gbit/s through the card’s 16-lane PCI-Express-2 connection. They easily surpass more traditional CPUs in cost per FLOP and power consumption per FLOP. In Cambridge, we are developing a CUDA- based correlator to run on NVIDIA GPUs. We hope to deploy a small-scale trial system on the Arcminute Microkelvin Imager (AMI). Prototype systems Dominic Ford, Jongsoo Kim & Paul Alexander Figure 2: LOFAR uses an IBM Bluegene/P as its correlator (Romein et al. 2010). Figure 4: An NVIDIA C1060 Tesla card, offering a theoretical peak performance of 933 GFLOP/s in single precision. Email your comments to: [email protected] Figure 5: The memory model of an NVIDIA Tesla card. Using GPGPUs as Correlators Using GPGPUs as Correlators Antennas Antennas Antennas Time-domain FFT Cross correlation Time integration Spatial FFT and Imaging Figure 1: A schematic FX correlator. High data rate. Low complexity. Lower sample rate. High complexity. Several implementations already exist. LOFAR uses an IBM Bluegene supercomputer as its correlator. An MPI-based correlator for use on x86 compute clusters has been developed by Deller et al. (2007) at Swinburne and is in use at the LBA. The correlation problem is sufficiently parallel that such highly-parallel architectures can be readily used. Moreover, GPU architectures are optimised for performing the multiply-and-add operations that we require. For telescopes with fewer than around 300 antennas, a GPU correlator would be throttled by the speed of PCI- Express data transport, but for larger systems such as SKA 1 , they are an attractive option. This will allow us to optimise our use of the memory architecture of the Tesla cards and the distribution of data between cards. It will also allow us to assess the performance which could be achieved in a correlator for SKA 1 .

Upload: amina

Post on 15-Jan-2016

31 views

Category:

Documents


2 download

DESCRIPTION

Using GPGPUs as Correlators. Dominic Ford, Jongsoo Kim & Paul Alexander. Introduction. Antennas. Antennas. Antennas. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Software Correlators

Cross correlators are a highly computationally intensive part of any radio interferometer. Although the number of operations per data sample is small, correlation occurs before time integration and so the sample rate is very high – it must be at least twice the radio-frequency bandwidth of the telescope.

The number of operations is proportional to both the bandwidth of the telescope, and the number of baselines being correlated. For SKA1, and operation rate of many petaflops will be required in the correlator alone.

Software CorrelatorsAt present, FPGA and/or ASIC based correlators are widely used. These offer highly transistor-efficient solutions as logic units are hard-wired to solve the particular correlation problem required. However, this efficiency comes at the expense of flexibility.

Software correlators, running on general-purpose CPUs or GPUs offer the ability to easily add antennas, change the number of baselines correlated, or tune the spectral resolution of the telescope as required.

The specification of the telescope can be scaled with time as more computing power becomes available.

Figure 3: The MWA has already trialled a GPU-based correlator (Wayth et al. 2007).

Introduction

NVIDIA Tesla cards

Graphics Processing Units (GPUs) may offer the computational power needed in a correlator for the SKA.

Today, NVIDIA’s Tesla cards already offer 100s GFLOPs per card, and a maximum data rate of 64 Gbit/s through the card’s 16-lane PCI-Express-2 connection. They easily surpass more traditional CPUs in cost per FLOP and power consumption per FLOP.

In Cambridge, we are developing a CUDA-based correlator to run on NVIDIA GPUs. We hope to deploy a small-scale trial system on the Arcminute Microkelvin Imager (AMI).

Prototype systems

Dominic Ford, Jongsoo Kim & Paul Alexander

Figure 2: LOFAR uses an IBM Bluegene/P as its correlator (Romein et al. 2010).

Figure 4: An NVIDIA C1060 Tesla card, offering a theoretical peak performance of 933 GFLOP/s in single precision.

Email your comments to: [email protected]

Figure 5: The memory model of an NVIDIA Tesla card.

Using GPGPUs as CorrelatorsUsing GPGPUs as Correlators

Antennas Antennas Antennas

Time-domain FFT

Cross correlation

Time integration

Spatial FFT and ImagingFigure 1: A schematic FX correlator.

High data rate.Low complexity.

Lower sample rate.High complexity.

Several implementations already exist. LOFAR uses an IBM Bluegene supercomputer as its correlator. An MPI-based correlator for use on x86 compute clusters has been developed by Deller et al. (2007) at Swinburne and is in use at the LBA.

The correlation problem is sufficiently parallel that such highly-parallel architectures can be readily used. Moreover, GPU architectures are optimised for performing the multiply-and-add operations that we require.

For telescopes with fewer than around 300 antennas, a GPU correlator would be throttled by the speed of PCI-Express data transport, but for larger systems such as SKA1, they are an attractive option.

This will allow us to optimise our use of the memory architecture of the Tesla cards and the distribution of data between cards. It will also allow us to assess the performance which could be achieved in a correlator for SKA1.