Submitted by: A. Szomoru (JIVE), 2011‐03‐26
Approved by: W. Turner, Signal Processing Domain Specialist (SPDO), 2011‐03‐26
Additional authors: P. Boven, J. Hargreaves, S. Pirruccio, S. Pogrebenko (JIVE),
Andre Gunst, Gijs Schoonderbeek (ASTRON), for the UniBoard collaboration.
A UNIBOARD‐BASED PHASE 1 SKA CORRELATOR AND
BEAMFORMER CONCEPT DESCRIPTION
Document number: WP2‐040.070.010‐TD‐001
Revision: 1
Author: A Szomoru
Date: 2011‐03‐29
Status: Approved for release
WP2‐040.070.010‐TD‐001
Revision : 1
2011‐03‐29 Page 2 of 21
DOCUMENT HISTORY
Revision  Date Of Issue    Engineering Change Number  Comments
A         ‐                ‐                          First draft release for internal review
B         ‐                ‐                          Amended with respect to Sparse AA specifications
1         29th March 2011                             First Issue
DOCUMENT SOFTWARE
Package Version Filename
Wordprocessor MsWord Word 2003 03d‐wp2‐040.070.010‐td‐001‐1‐Uni‐concept‐description‐2003
Block diagrams
Other
ORGANISATION DETAILS
Name SKA Program Development Office
Physical/Postal
Address
Jodrell Bank Centre for Astrophysics
Alan Turing Building
The University of Manchester
Oxford Road
Manchester, UK
M13 9PL
Fax. +44 (0)161 275 4049
Website www.skatelescope.org
TABLE OF CONTENTS
1 INTRODUCTION
  1.1 Purpose of the document
2 REFERENCES
3 THE UNIBOARD
  3.1 The project
  3.2 Project structure and participants
  3.3 The hardware
  3.4 The Applications
4 UNIBOARD AS A SKA DISH ARRAY CORRELATOR
  4.1 Requirements
  4.2 Functionality
  4.3 Detailed use of resources
  4.4 Data throughput between tiers and within boards
5 UNIBOARD AS A SKA SPARSE ARRAY CORRELATOR AND BEAM‐FORMER
  5.1 Requirements
  5.2 Beam‐former
    5.2.1 Interconnectivity
    5.2.2 Processing
  5.3 Correlator
    5.3.1 Interconnectivity
    5.3.2 Processing
6 GENERAL CONSIDERATIONS, POWER CONSUMPTION, PRICE
7 THE FUTURE
LIST OF FIGURES
Figure 1 high level UniBoard design
Figure 2 UniBoard with XGB mini backplane
Figure 3 prototype UniBoard, delivered May 17, 2010
Figure 4 UniBoard as digital receiver and VLBI correlator, connected via internet
Figure 5 UniBoard as APERTIF correlator (left) and beam former (right), interconnected via a custom‐made backplane, with ADCs connected to the opposite side of the backplane
Figure 6 UniBoard SKA correlator
Figure 7 UniBoard SKA Correlator with Mid‐Plane
Figure 8 UniBoard SKA correlator functionality
Figure 9 UniBoard station processing configuration
Figure 10 UniBoard in sparse array correlator configuration
LIST OF TABLES
Table 1 time and frequency resolution needed to map the entire FWHM of a 15 m dish
Table 2 Resource usage in correlator FPGAs
Table 3 Data throughput in UniBoard SKA correlator design
LIST OF ABBREVIATIONS
AA .................................. Aperture Array
CoDR ............................. Conceptual Design Review
DRM .............................. Design Reference Mission
FLOPS ........................... Floating Point Operations per second
FoV ................................ Field of View
G b/s .............................. Giga bits per second
Ny .................................. Nyquist
Pols ............................... Polarisations
PrepSKA........................ Preparatory Phase for the SKA
RFI ................................. Radio Frequency Interference
rms ................................ root mean square
SKA ............................... Square Kilometre Array
SKADS .......................... SKA Design Studies
SPDO ............................ SKA Program Development Office
SSFoM .......................... Survey Speed Figure of Merit
TBD ............................... To be decided
1 Introduction
In this document we propose UniBoard‐based architectures for a SKA Phase 1 correlator and beam‐
former. We first explain the aims and scope of the UniBoard project and the status of the
hardware and firmware. We then describe how this hardware could be used as a building block for a
mid‐frequency dish array correlator (design by Paul Boven) and a low‐frequency sparse array beam‐
former and correlator (by Andre Gunst). We discuss computing power, interconnectivity, power
consumption and price. Finally, we introduce the proposed UniBoard2 project and discuss how new
technologies may influence the next generation of boards.
1.1 Purpose of the document
The purpose of this document is to provide a concept description as part of a larger document set in
support of the SKA Signal Processing CoDR. It provides a ‘bottom‐up’ perspective of correlation and
sparse aperture array beam‐forming. This document meets the documentation content
requirements detailed in the Signal Processing PrepSKA Work Breakdown and includes:
• First draft cost
• First draft power
• First draft block diagram of the relevant subsystem.
SKA Memos 125, 130 and the Phase 1 DRM have been used as the baseline for best information on
system parameters while the Systems Requirement Specification, SRS, is being created.
2 References
[1] SKA Science Case
[2] The Square Kilometre Array Design Reference Mission: SKA‐mid and SKA‐Lo v 0.4
[3] System Engineering Management Plan (SEMP) WP2‐005.010.030‐MP‐001
[4] SKA System Requirement Specification (SRS)
[5] The Square Kilometre Array Design Reference Mission Phase 1 rev v1.3
[6] SKA Memo 130 SKA Phase 1: Preliminary System Description, P Dewdney et al
[7] SKA Memo 125 A Concept Design for SKA Phase 1 (SKA1), M.A. Garrett et al
3 The UniBoard
3.1 The project
The UniBoard, a Joint Research Activity in the RadioNet FP7¹ programme, aims to create
a generic high‐performance FPGA‐based computing platform for radio astronomy, along with the
implementation of several applications (correlator, digital receiver, pulsar binning machine). It is a
3‐year project that kicked off on January 1, 2009. The project is now past its half‐way point: the
first prototype board has been delivered and is undergoing tests, and design documents and a large
amount of firmware have been produced. The board has generated a fair amount of interest in the
radio‐astronomical community because of its high computing and I/O capacity, its potentially
excellent ratio of computing power to power consumption, and its use of generic interfaces. At this
time concrete plans exist to use it as the basis for the next generation EVN correlator, the APERTIF
correlator and beam‐former system and at least one all‐dipole LOFAR correlator.
3.2 Project structure and participants
Originally the collaboration consisted of 7 participants, and in the course of the project two more
partners joined. The first partners and their original roles in the project were:
JIVE: project lead, VLBI correlator
ASTRON: hardware and test firmware development
University of Manchester: pulsar binning machine
INAF: digital receiver
University of Bordeaux: digital receiver
University of Orléans: RFI mitigation in pulsar binning application
KASI: VLBI correlator
later joined by:
Shanghai Observatory: VLBI correlator
University of Oxford: all‐dipole LOFAR correlator
At this time the VLBI correlator, digital receiver and pulsar binning machine are all under
development, while the RFI mitigation project has expanded to include both pulsar binning and
digital receiver applications. In addition to the original applications, work has started or is expected
to start soon on an APERTIF correlator and beam former (ASTRON), all‐dipole LOFAR correlators
(ASTRON + University of Amsterdam, University of Oxford). Several other applications are being
considered.
¹ EC Contract no. 227290
3.3 The hardware
At the start of the project, an inventory was made of the hardware requirements posed by the
different applications. Considerations of price, availability and pin lay‐out led to the selection of the
Altera Stratix IV EP4SGX230KF40C2 chip (40 nm, 1288 18x18 multipliers, 14.3 Mb internal block
RAM, 24 + 12 transceivers). With 1288 multipliers at 400 MHz each of these chips could yield a
maximum of about 0.5 TMAC/s.
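The peak figure quoted above follows from assuming one multiply‐accumulate per multiplier per clock cycle. A minimal Python sketch of that arithmetic is given here; the function and constant names are illustrative only and are not part of any UniBoard firmware or tooling.

```python
# Figures quoted in the text for one Stratix IV EP4SGX230KF40C2.
MULTIPLIERS_PER_FPGA = 1288   # 18x18 multipliers
MULTIPLIER_CLOCK_HZ = 400e6   # clock rate assumed in the text


def peak_mac_rate(multipliers: int, clock_hz: float) -> float:
    """Upper bound in MAC/s, assuming one MAC per multiplier per cycle."""
    return multipliers * clock_hz


fpga_tmacs = peak_mac_rate(MULTIPLIERS_PER_FPGA, MULTIPLIER_CLOCK_HZ) / 1e12
print(f"peak per FPGA: {fpga_tmacs:.2f} TMAC/s")
```

This is an upper bound: it ignores routing, memory and I/O limits, which the later sections identify as the real constraints.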
Figure 1 high level UniBoard design
A configuration of eight FPGAs per board was found to be optimal in terms of computing power,
power consumption, density and complexity of the board (Figure 1). Each FPGA is connected to two
DDR3 memory banks, mounted on the back side of the board. Four times four 10‐GbE links connect
to the front nodes (FN) via four SFP+ cages. A high‐speed mesh connects each FN to all back nodes
(BN). The BNs in turn connect via four times four 8‐bit LVDS interfaces to a backplane connector. To
make the board completely symmetrical, a 10G break‐out board (the XGB) has been designed in the
form of a mini‐backplane, with a total of 16 CX4 connectors (Figure 2).
Figure 2 UniBoard with XGB mini backplane
For system management, each FPGA also has a 1Gb/s Ethernet connection to an onboard Ethernet
switch, which offers 4 x 1Gb/s connectivity on RJ‐45 connectors. The central power supply of ‐48V is
distributed on the board via DC/DC convertors and regulators; the PCB itself has 14 layers. Control
and configuration are done via the embedded NIOS processor.
Figure 3 prototype UniBoard, delivered May 17, 2010
The actual PCB production and board assembly were outsourced. The prototype was delivered on May
17, 2010 (Figure 3). No major design flaws have been identified, although power consumption at full
load has turned out somewhat higher than the original estimate (~400 W versus 280 W). After a
review of the modified design, a second production run has been initiated and the boards are
expected at the end of January or beginning of February 2011.
3.4 The Applications
Throughout the board development phase, work on the various applications progressed; design
documents were produced and refined, simulations were done and actual VHDL code was written.
All documentation is posted on a project wiki, and all code is shared through a common repository.
The board control is being written in Erlang, a high‐level language that provides robustness and
completeness and enables a very short code development cycle. At the same time, a general
correlator control system is being designed at JIVE.
The most demanding application at the moment is without doubt the correlator, be it for VLBI or
APERTIF. Several configurations for different applications are being considered, as illustrated in
Figures 4 and 5.
Figure 4 UniBoard as digital receiver and VLBI correlator, connected via internet
Figure 5 UniBoard as APERTIF correlator (left) and beam former (right), interconnected via a custom‐made
backplane, with ADCs connected to the opposite side of the backplane
4 UniBoard as a SKA Dish Array Correlator
4.1 Requirements
According to SKA Memo 130, SKA Phase 1: Preliminary System Description, P. E. Dewdney et al., the
initial aim will be a mid‐frequency dish array using approximately 250 15‐meter antennas, with
single pixel feeds covering a frequency range from 0.45 to 3 GHz, and a maximum baseline length
from the core of 100 km. In spite of the specification of this rather short maximum baseline, it seems
likely that (possibly non‐SKA) outstations will be included fairly early on to enable VLBI observations.
Consequently, one should at least keep this possibility in mind while designing the correlator.
Using the formulae of Wrobel (1995), we can calculate the maximum field of view having no more
than a 10% decrease in the response to a point source. Reversing these and reworking them slightly,
we can get the time and frequency resolution needed to map the entire FWHM beam. This leads to:
$t_{\mathrm{int}} < 7.38 \times 10^{-3}\, D/B$
$\Delta\nu < 197\, D/(B\lambda)$
with $t_{\mathrm{int}}$ in s, $D$ the diameter of the dish in m, $B$ the baseline in units of 1000 km, $\lambda$ in m and $\Delta\nu$ in Hz.
In Table 1 the limits are listed for a number of baselines and observing frequencies, assuming a 15 m
dish.
Table 1 time and frequency resolution needed to map the entire FWHM of a 15 m dish
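The two limits can be evaluated directly for any baseline and observing frequency. The sketch below is a plain transcription of the two expressions given above; the helper names and the sample baselines and frequencies (taken from the ranges mentioned in this section) are illustrative assumptions, and the printed values are not claimed to reproduce the entries of Table 1.

```python
C_M_PER_S = 299_792_458.0  # speed of light, used to convert frequency to wavelength


def max_integration_time_s(dish_m: float, baseline_km: float) -> float:
    """Integration-time limit in seconds; B is expressed in units of 1000 km."""
    return 7.38e-3 * dish_m / (baseline_km / 1000.0)


def max_channel_width_hz(dish_m: float, baseline_km: float, freq_hz: float) -> float:
    """Channel-width limit in Hz at observing frequency freq_hz."""
    wavelength_m = C_M_PER_S / freq_hz
    return 197.0 * dish_m / ((baseline_km / 1000.0) * wavelength_m)


DISH_M = 15.0
for baseline_km in (100.0, 1000.0):
    for freq_ghz in (0.45, 3.0):
        t_int = max_integration_time_s(DISH_M, baseline_km)
        dnu = max_channel_width_hz(DISH_M, baseline_km, freq_ghz * 1e9)
        print(f"B = {baseline_km:6.0f} km, f = {freq_ghz:4.2f} GHz: "
              f"t_int < {t_int:.3f} s, channel width < {dnu / 1e3:.1f} kHz")
```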
We will assume the following SKA configuration:
256 telescopes
single pixel feeds
1024 MHz instantaneous bandwidth
2 pols
8‐bit representation
7.5 kHz maximum resolution
0.1 s minimum dump time
Note that we are using an 8‐bit representation instead of the 4 bits specified in Memo 130. In this
design there is no real additional cost to feeding 8‐bit samples into the correlator, and it will
make calibration and RFI mitigation much easier. Having enough dynamic range at the input also
makes it much easier to deal with changes in noise figure over the input bandwidth, strong
sources in the main beam, and other effects that would otherwise require renormalization or even
changing the analogue gain to maintain sufficient resolution.
It is clear that this configuration will easily accommodate baselines out to 1000 km, but that “real”
VLBI will not be possible, at least not over the full beam. We also assume groups of 4 telescopes
instead of the 5 specified in memo 130, as this maps much better onto the UniBoard architecture.
4.2 Functionality
The dish array correlator design is composed of three tiers of UniBoards (Figures 6 and 7). A short
description of the functionality follows, further illustrated in Figure 8.
The tiers are interconnected via 10GE or, as shown in Figure 7, partly through a backplane, or rather,
mid‐plane (which does not exist at this moment).
The first tier accepts digitized telescope data (four telescopes/board) either via 10GE or via LVDS
lines into the BNs. Coarse delay correction and the first stages of filtering are done in the BNs, fine
delay correction and further filtering in the FNs, down to 7.6 kHz bands.
Each board in the second tier receives 64 MHz from one quarter of all telescopes. Through the mesh,
4 MHz from these 64 telescopes are combined on the BNs, where the data streams are corner‐
turned. The function of the second tier is partly that of a high‐speed multi‐port Ethernet switch,
possible thanks to the high‐speed mesh on the UniBoard, but with additional computing power not
available in standard Ethernet switches.
Every board in the final tier receives 8 times 0.5 MHz (0.5 MHz per FPGA) from all 256 telescopes. As
the data from one quarter of the telescopes reaches one FN, some additional hops between FNs and
BNs will be needed to get all the data to the right nodes. In this tier, each FPGA acts as a fully
independent X‐machine.
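The per‐tier board counts are not listed explicitly, but they can be inferred from the description above together with the assumed configuration (256 telescopes, 1024 MHz). The following sketch shows that inference; the constant names are illustrative, and the way tier 2 is counted (one board per 64 MHz slice and per quarter of the telescopes) is an assumption drawn from the wording of the text.

```python
TELESCOPES = 256
BANDWIDTH_MHZ = 1024

TELESCOPES_PER_TIER1_BOARD = 4   # four telescopes per first-tier board
TIER2_SLICE_MHZ = 64             # each second-tier board receives 64 MHz ...
TIER2_TELESCOPE_GROUPS = 4       # ... from one quarter of all telescopes
FPGAS_PER_BOARD = 8
TIER3_MHZ_PER_FPGA = 0.5         # each final-tier FPGA correlates 0.5 MHz

tier1 = TELESCOPES // TELESCOPES_PER_TIER1_BOARD
tier2 = (BANDWIDTH_MHZ // TIER2_SLICE_MHZ) * TIER2_TELESCOPE_GROUPS
tier3 = int(BANDWIDTH_MHZ / (FPGAS_PER_BOARD * TIER3_MHZ_PER_FPGA))
total_boards = tier1 + tier2 + tier3

print(f"tier 1: {tier1}, tier 2: {tier2}, tier 3: {tier3}, total: {total_boards}")
```

Under these assumptions the total agrees with the 384 boards used for the power and cost estimates in Section 6.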
Figure 6 UniBoard SKA correlator
Figure 7 UniBoard SKA Correlator with Mid‐Plane
Figure 8 UniBoard SKA correlator functionality
4.3 Detailed use of resources
The functionality of tiers 1 and 2 (filtering, delay correction, switching and corner turning) is fairly
easily accommodated on the available hardware. In fact, the UniBoards in tier 2 will only need DDR3
memory on the BNs, while the FPGAs themselves could be replaced by a cheaper type with fewer DSP
resources. It is the correlation itself that will be the most challenging part of the design.
An interferometer with 256 stations will have 32640 baselines and 131584 products (full Stokes +
autocorrelations, i.e. 4 x (32640 + 256)). The Altera FPGA we use has 322 18x18 complex multipliers
(1288 real multipliers / 4), generating 36‐bit complex results. It also has 1235 M9K memory units,
which can be configured as 512x18 or 256x36. Four of these memories would constitute a complex
36‐bit accumulator bank with 512 entries. A correlator unit will then consist of 1 complex 18x18
multiplier and 4 M9K memory blocks. As there are only 1235 M9Ks, we can build at most 308
correlator units. Our design uses 290 correlator units, each responsible for 454 products.
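The counts in this paragraph can be reproduced with a few lines of integer arithmetic. The sketch below assumes, as the text does, four real multipliers per complex multiplier and four M9K blocks per correlator unit; the variable names are illustrative.

```python
import math

STATIONS = 256
REAL_MULTIPLIERS = 1288   # 18x18 multipliers per FPGA
M9K_BLOCKS = 1235         # M9K memory blocks per FPGA
M9K_PER_UNIT = 4          # one complex 36-bit, 512-entry accumulator bank
UNITS_USED = 290          # correlator units in the proposed design

baselines = STATIONS * (STATIONS - 1) // 2
products = 4 * (baselines + STATIONS)          # full Stokes plus autocorrelations
complex_multipliers = REAL_MULTIPLIERS // 4
max_units = min(complex_multipliers, M9K_BLOCKS // M9K_PER_UNIT)
products_per_unit = math.ceil(products / UNITS_USED)

print(baselines, products, complex_multipliers, max_units, products_per_unit)
```

The memory blocks, not the multipliers, set the upper limit on the number of correlator units.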
To process 0.5 MHz, the correlator units would need to run at 227 MHz (0.5 MHz x 131584 / 290).
This is a fairly modest rate for these FPGAs, as most of our current designs run at 266 MHz. We are
aiming for the SKA_1 design to run at 256 MHz, which is roughly 12.5% faster than required. As there
is therefore sufficient capacity to run the correlation slightly faster than real‐time, the correlation
units are idle for about 1/8th of the time, during which they can be read out.
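As a rough check of the clock budget, the required rate and the resulting idle fraction can be computed as follows. This is only a sketch of the arithmetic in the paragraph above; the names are illustrative.

```python
PRODUCTS = 131584         # products per frequency channel
UNITS = 290               # correlator units per FPGA
BAND_HZ = 0.5e6           # bandwidth handled by one FPGA
TARGET_CLOCK_HZ = 256e6   # clock rate aimed for in the design

# Each unit must handle its share of the products for every sample of the band.
required_clock_hz = BAND_HZ * PRODUCTS / UNITS
idle_fraction = 1.0 - required_clock_hz / TARGET_CLOCK_HZ

print(f"required clock: {required_clock_hz / 1e6:.1f} MHz")
print(f"idle fraction at target clock: {idle_fraction:.3f}")
```

The idle fraction comes out slightly below one eighth, which is the time available for reading out the accumulators.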
Worst‐case analysis:
After 1560 integrations (corresponding to 0.1 second, the shortest integration time at a 15.6 kHz
resolution), there are 1560 x (512 ‐ 454) = 90480 clock cycles available for reading out the memory.
As each memory contains 454 entries of data, only 2 units need to be read simultaneously, which
means the design could be quite simple. The full correlation results comprise 131584 x 2 x 32 bits =
8.4 Mbit. Sending this data out within these 90480 clock cycles would generate a burst of about 24 Gb/s.
There is 40 Gb/s available for sending the data, per chip. As there are 8 correlator chips on a
UniBoard, the chips should be scheduled in such a way that they do not dump their data at the same
time. There is still enough burst bandwidth available to include headers identifying the integration
number, frequency channel, visibility, etc.
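The worst‐case readout can be sketched numerically. The example below assumes the 256 MHz clock and the 454 products per correlator unit quoted earlier in this section; all names are illustrative.

```python
import math

CLOCK_HZ = 256e6
INTEGRATIONS = 1560        # accumulations in the shortest (0.1 s) dump
ACCUMULATOR_SLOTS = 512    # entries per accumulator bank
PRODUCTS_PER_UNIT = 454
UNITS = 290
PRODUCTS = 131584
BITS_PER_PRODUCT = 2 * 32  # real and imaginary part, 32 bits each

# Cycles in which the correlator units are idle and can be read out.
idle_cycles = INTEGRATIONS * (ACCUMULATOR_SLOTS - PRODUCTS_PER_UNIT)
payload_bits = PRODUCTS * BITS_PER_PRODUCT
burst_gbps = payload_bits / (idle_cycles / CLOCK_HZ) / 1e9

# Units that must be read concurrently at one entry per unit per cycle.
units_in_parallel = math.ceil(UNITS * PRODUCTS_PER_UNIT / idle_cycles)

print(f"idle cycles: {idle_cycles}, payload: {payload_bits / 1e6:.1f} Mbit")
print(f"burst rate: {burst_gbps:.1f} Gb/s, units read in parallel: {units_in_parallel}")
```

The burst rate stays well below the 40 Gb/s available per chip, leaving room for headers.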
Sending the data out at a lower rate is unfortunately not possible because of the lack of memory
inside the FPGA, while using DDR interfaces would consume too much logic. By moving all corner
turning to the second tier (or to the backplane) and storing at least a full integration time in this tier,
and by carefully staggering the delivery of data to the correlator nodes, it should be possible to use
aggregation switches to collect all data bursts.
By having the horizontal boards do a full corner turn for the correlation UniBoards, more of the
workload is placed on tier 2, leaving all of the resources on the actual correlation boards
available for the cross‐multiplication and accumulation. This has the advantage that the
correlation UniBoards will not need DDR RAM, while the tier‐2 FPGAs could be cheaper types with
fewer DSP resources.
In Table 2 the resource usage in the correlator FPGAs is summarised.
Table 2 Resource usage in correlator FPGAs
Notes to table 2:
1 Resource estimates are derived from known modules used in test designs or the EVN correlator, for example
2 Switch function assumes one external 10GbE port and 4 XAUI ports on to the mesh
3 Two 10Gb Ethernet ports per FPGA
4.4 Data throughput between tiers and within boards
Table 3 lists the estimated data throughput in and between the elements of the correlator system.
Table 3 Data throughput in UniBoard SKA correlator design
5 UniBoard as a SKA Sparse Array Correlator and Beam‐former
5.1 Requirements
Specifications from Memo 130, SKA Phase 1: Preliminary System Description, P. E. Dewdney et al.:
Number of stations: 50
Number of polarisations: 2
Number of antennas / station: 11200
Input bandwidth: 500 MHz
Output bandwidth: 380 MHz
Number of beams: 480
Number of input bits from the station ADC: 8
Number of output bits to the correlator: 4 x 2
Sub‐band width: 125 kHz
Channel width: 1 kHz
5.2 Beam‐former
5.2.1 Interconnectivity
With a total of 11200 antennas, and assuming that 16 of them are combined via analogue beam‐forming
and that there is one RF beam, 700 signal paths times two polarisations need to be processed. Both
polarisations can be processed independently of each other. For the purpose of station processing,
UniBoards and receivers can be mounted on opposite sides of a backplane, using the LVDS interfaces
on the UniBoard (Figure 9).
Four of the eight FPGAs on one UniBoard connect to 4 times 8 LVDS pairs. Assuming 8‐bit
digitization and 500 MHz bandwidth, a total of 16 receivers can then be connected to one UniBoard.
Hence for 700 signal paths a total of 44 UniBoards are needed per polarisation (700 / 16, rounded
up). With 8 UniBoards per sub‐rack this comes to a total of 6 sub‐racks. Multiple sub‐racks will have
to be interconnected through 10GE connections.
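The connectivity‐driven board count is a chain of ceiling divisions, sketched below with illustrative names.

```python
import math

ANTENNAS = 11200
ANTENNAS_PER_ANALOGUE_BEAM = 16   # antennas combined before digitisation
RECEIVERS_PER_BOARD = 16          # receivers that fit the LVDS inputs of one board
BOARDS_PER_SUBRACK = 8

signal_paths = ANTENNAS // ANTENNAS_PER_ANALOGUE_BEAM
boards_per_pol = math.ceil(signal_paths / RECEIVERS_PER_BOARD)
subracks_per_pol = math.ceil(boards_per_pol / BOARDS_PER_SUBRACK)

print(signal_paths, boards_per_pol, subracks_per_pol)
```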
The filter bank will be located on the BNs, while beam‐forming is done on the FNs. Each FN will
calculate all the beams for a subset of sub‐bands. This implies that the sub‐bands need to be
distributed over the board, backplane and sub‐racks. On the board itself this is done through the
high‐speed mesh, within one sub‐rack all BNs 0 (through 3) of all boards are interconnected, and
finally all sub‐racks are connected by interconnecting boards 0 (through 7) of each sub‐rack.
Figure 9 UniBoard station processing configuration
5.2.2 Processing
The total amount of processing required for beam‐forming, per station for one polarisation, is
480 (beams) x 700 (signal paths) x 380 (MHz) x 4 (complex) = 510.72 TMAC/s.
The maximum processing capacity per UniBoard for the four beam‐forming FPGAs is
1288 (multipliers) x 4 x 266.5 (MHz) = 1.37 TMAC/s
In the previous section it was shown that 44 UniBoards are required from an interconnectivity point
of view. The total amount of processing available for beam‐forming on these boards is 44 x 1.37 =
60.28 TMAC/s. As this is not enough to process all beams, more UniBoards have to be deployed; on
the additional boards all eight FPGAs can be used for beam‐forming, giving 2.75 TMAC/s per board.
This means that the total number of UniBoards required to process all 480 beams for the full
bandwidth is
(510.72 – 60.28) / 2.75 + 44 = 208 boards.
Clearly, the total number of UniBoards is dominated by processing needs, not connectivity. Reducing
the number of digital beams reduces the demand for hardware significantly. A balance between
interconnectivity and processing is reached if the number of beams equals
(44 (Uniboards) x 1.37 (TMAC/s))/((700 (signal paths) x 380 (MHz) x 4)) = 56.7.
The coming update of memo 130, which deals with the SKA phase 1 high level description, states an
average number of 160 beams for the full 380 MHz bandwidth, needed to ensure a constant 5
degree FOV across the band. The total amount of processing required in this case equals 170.24
TMAC/s, resulting in a total of 84 UniBoards per polarization.
Another way to reduce the number of digital boards is to combine more antennas via analogue beam‐
forming.
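The board counts for the beam‐former can be reproduced for any number of beams with the sketch below. It uses unrounded per‐board capacities, so intermediate values differ slightly from the rounded figures in the text (for example, the balance point comes out near 56.8 rather than 56.7). Function and constant names are illustrative.

```python
import math

SIGNAL_PATHS = 700
BANDWIDTH_HZ = 380e6
MULTIPLIERS = 1288
CLOCK_HZ = 266.5e6
IO_BOARDS = 44                             # boards needed for connectivity alone

HALF_BOARD_MACS = MULTIPLIERS * 4 * CLOCK_HZ   # four beam-forming FPGAs
FULL_BOARD_MACS = 2 * HALF_BOARD_MACS          # all eight FPGAs beam-forming


def required_macs(beams: int) -> float:
    """MAC/s per station and polarisation; factor 4 for complex arithmetic."""
    return beams * SIGNAL_PATHS * BANDWIDTH_HZ * 4


def boards_per_polarisation(beams: int) -> int:
    """I/O boards plus any additional boards used purely for beam-forming."""
    shortfall = required_macs(beams) - IO_BOARDS * HALF_BOARD_MACS
    extra = max(0, math.ceil(shortfall / FULL_BOARD_MACS))
    return IO_BOARDS + extra


balance_beams = IO_BOARDS * HALF_BOARD_MACS / required_macs(1)

for beams in (480, 160):
    print(f"{beams} beams: {required_macs(beams) / 1e12:.2f} TMAC/s, "
          f"{boards_per_polarisation(beams)} boards per polarisation")
print(f"connectivity and processing balance at about {balance_beams:.1f} beams")
```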
5.3 Correlator
5.3.1 Interconnectivity
The data rate to each UniBoard after station processing is
50 (stations) x 480 (beams) x 125 (kHz) x 4 (bit) x 2 (complex) x 2 (pol) = 48 Gbps per sub‐band.
Figure 10 UniBoard in sparse array correlator configuration
In principle this bandwidth can easily be handled by one UniBoard. The complication, however, is to
get one sub‐band from each station to one UniBoard while making the most efficient use of the
connections between stations and correlator. Per station the data rate for one sub‐band is only 960
Mbps. Assuming each station aggregates 10 sub‐bands on one 10G link, 9 out of these 10 sub‐bands
will have to be redistributed to other UniBoards. This requires combining at least 10 UniBoards on a
backplane. It might be preferable to aggregate 8 sub‐bands
onto one 10G link, as 8 UniBoards fit in one sub‐rack. In that way a total of 380 sub‐racks would be
required to process the full 380 MHz bandwidth (Figure 10).
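The data rates and the sub‐rack count follow directly from the station parameters. In the sketch below, which uses illustrative names, the product for all 50 stations evaluates to 48 Gb/s per sub‐band, i.e. 50 times the 960 Mb/s per station.

```python
STATIONS = 50
BEAMS = 480
SUBBAND_HZ = 125e3
BITS_PER_SAMPLE = 4
COMPLEX_PARTS = 2
POLARISATIONS = 2
BANDWIDTH_HZ = 380e6
BOARDS_PER_SUBRACK = 8   # one sub-band per board, eight boards per sub-rack

per_station_bps = BEAMS * SUBBAND_HZ * BITS_PER_SAMPLE * COMPLEX_PARTS * POLARISATIONS
per_subband_bps = STATIONS * per_station_bps
subbands = int(BANDWIDTH_HZ / SUBBAND_HZ)
subracks = subbands // BOARDS_PER_SUBRACK

print(f"per station and sub-band: {per_station_bps / 1e6:.0f} Mb/s")
print(f"all stations, one sub-band: {per_subband_bps / 1e9:.0f} Gb/s")
print(f"sub-bands: {subbands}, sub-racks: {subracks}")
```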
5.3.2 Processing
For the correlator one has to consider the number of visibilities required per beam and per sub‐
band, since all beams and sub‐bands can be correlated independently. The number of visibilities per
beam and per sub‐band equals (50 x 2)^2 / 2 = 5000. Assuming a bandwidth of 380 MHz results in a
total required processing capacity per beam of 380 (MHz) x 5000 x 4 = 7.6 TMAC/s (the factor of
four is necessary since complex numbers are involved while the MACs are implemented using real
multipliers). On each UniBoard 8 FPGAs are available. The design of the UniBoard is such that
four input FPGAs are responsible for the channelisation while the other four output FPGAs are
responsible for the correlation. With four FPGAs per UniBoard handling the correlation, the
total processing capacity per UniBoard becomes
1288 (multipliers) x 4 x 266.5 (MHz) = 1.37 TMAC/s.
Hence from a processing point of view about 5.5 UniBoards per beam are required (7.6 / 1.373).
Using 480 beams results in a total of 480 x 7.6 / 1.373 = 2657 UniBoards.
The required processing capacity per sub‐band is
125 (kHz) x 5000 (visibilities) x 4 = 2.5 GMAC/s.
Processing all 480 beams for one sub‐band requires 1.2 TMAC/s. This matches the processing
capacity available in one half of a UniBoard (with the other half used for filtering and delay tracking).
This means that each UniBoard can process all 480 beams for one sub‐band, and that the number of
boards required is the same as the number of sub‐bands: 380 (MHz) / 125 (kHz) = 3040.
Using once again a total of 160 beams results in a processing power per sub‐band of 2.5 (GMAC/s) x
160 = 0.4 TMAC/s. Hence each board can easily process 3 sub‐bands (in fact 3.4 sub‐bands). The
total number of boards required for the correlator in this case is 1014.
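Both ways of counting correlator boards (per beam across the full band, and per sub‐band across all beams) can be sketched as follows. The function names are illustrative, and the per‐sub‐band count assumes that only whole sub‐bands are assigned to a board.

```python
import math

STATIONS = 50
POLARISATIONS = 2
BANDWIDTH_HZ = 380e6
SUBBAND_HZ = 125e3
HALF_BOARD_MACS = 1288 * 4 * 266.5e6   # four correlating FPGAs per board

visibilities = (STATIONS * POLARISATIONS) ** 2 // 2
macs_per_beam = BANDWIDTH_HZ * visibilities * 4          # full band, one beam
macs_per_subband_beam = SUBBAND_HZ * visibilities * 4    # one sub-band, one beam
subbands = int(BANDWIDTH_HZ / SUBBAND_HZ)


def boards_by_total_load(beams: int) -> int:
    """Boards needed if the load could be spread perfectly over boards."""
    return math.ceil(beams * macs_per_beam / HALF_BOARD_MACS)


def boards_by_subband(beams: int) -> int:
    """Boards needed when each board processes whole sub-bands for all beams."""
    load = beams * macs_per_subband_beam
    subbands_per_board = int(HALF_BOARD_MACS // load)
    return math.ceil(subbands / subbands_per_board)


print(visibilities, macs_per_beam / 1e12, macs_per_subband_beam / 1e9)
print(boards_by_total_load(480), boards_by_subband(480), boards_by_subband(160))
```

The gap between the two counts for 480 beams reflects the capacity left unused when one board handles exactly one sub‐band.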
6 General considerations, power consumption, price
We have presented designs for a SKA Phase 1 beam‐former and correlators which could be built
today, using existing hardware. These designs have a number of attractive features:
Only one type of hardware to produce and maintain (although FPGAs with identical
footprints but different DSP resources might be used)
No need for (very expensive) high‐capacity routing equipment
Ready availability of firmware specifically written for UniBoard
Standard interfaces
Linear scaling with bandwidth
The current UniBoard uses about 400 W at maximum load. As not all boards will be working at such
a load all the time, we assume an average power consumption of 350 W. With a total of 384 boards,
the total power consumption of the dish array correlator will be about 135 kW. The power
consumption associated with cooling will be of the order of 30%, leading to a total of about 175 kW.
The price of the current UniBoard is about 15 k Euro/piece, in a production run of only nine boards.
In bulk this price should go down considerably, to an estimated 10 k Euro/piece. At this price, the
384 boards needed for the dish array correlator will cost 3.8 M Euro. This excludes memory modules,
cabling, switches, racks and power supplies, but constitutes the biggest single expense. Note that much
of the functionality of tier 2 could be taken over by 16 128‐port Ethernet switches. Such switches
however currently cost between 0.5 and 1 M Euro per piece (as opposed to 40 k Euro for 4
UniBoards).
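The dish array estimates can be reproduced with the sketch below. Treating the cooling figure as a simple 30% overhead on top of the electronics is an assumption about how that percentage is applied; the names are illustrative.

```python
BOARDS = 384
AVERAGE_POWER_W = 350          # assumed average per board
COOLING_OVERHEAD = 0.30        # assumed to apply on top of the electronics power
BULK_PRICE_EUR = 10_000        # estimated bulk price per board

electronics_kw = BOARDS * AVERAGE_POWER_W / 1e3
total_kw = electronics_kw * (1.0 + COOLING_OVERHEAD)
boards_cost_meur = BOARDS * BULK_PRICE_EUR / 1e6

print(f"electronics: {electronics_kw:.1f} kW, with cooling: {total_kw:.1f} kW")
print(f"boards only: {boards_cost_meur:.2f} M Euro")
```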
For the sparse array specifications, the stations need a total of 416 UniBoards to process 480 beams
or 168 UniBoards for 160 beams. With 50 stations, this results in respectively 20800 and 8400
boards for beam‐forming. The correlator adds another 3040 (480 beams) or 1014 boards (160
beams), resulting in a total of 23840 (480 beams) or 9414 (160 beams) UniBoards.
At 10 k Euro/board the total budget required for beam‐forming and correlation for 480 and 160
beams equals 238 M Euro and 94 M Euro respectively. The power dissipation too is considerable:
8.3 MW for 480 beams and 3.3 MW for 160 beams, for all digital processing.
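The sparse array totals can be tabulated from the board counts given above. The sketch assumes the same 10 k Euro bulk price and 350 W average power per board as used for the dish array correlator; the data structure and names are illustrative.

```python
STATIONS = 50
PRICE_EUR = 10_000       # assumed bulk price per board
AVERAGE_POWER_W = 350    # assumed average power per board

# Boards per station (both polarisations) and for the correlator, per beam count.
CASES = {
    480: {"station_boards": 416, "correlator_boards": 3040},
    160: {"station_boards": 168, "correlator_boards": 1014},
}


def totals(beams: int) -> tuple[int, float, float]:
    """Return (boards, cost in M Euro, power in MW) for the given beam count."""
    case = CASES[beams]
    boards = STATIONS * case["station_boards"] + case["correlator_boards"]
    return boards, boards * PRICE_EUR / 1e6, boards * AVERAGE_POWER_W / 1e6


for beams in (480, 160):
    boards, cost_meur, power_mw = totals(beams)
    print(f"{beams} beams: {boards} boards, {cost_meur:.1f} M Euro, {power_mw:.2f} MW")
```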
These numbers of course assume an optimal use of resources, which is probably a bit optimistic.
Nevertheless, the dish array correlator seems feasible enough, considering price, size and power
budget, while the sparse array beam former and correlator are clearly more challenging. However,
all this is based on the use of currently available hardware.
7 The future
It is obvious that a Phase 1 SKA correlator will not be built using hardware based on 2009
technology.
The current RadioNet project ends in 2012. If RadioNet3 receives funding, the UniBoard effort will
continue in UniBoard2, which has as its aim to create a completely re‐designed platform with several
innovative features. The new project will specifically address power efficiency, first of all by using
the newest available hardware (at this time this would mean replacing the current 40nm by 28nm
FPGAs), but also by investigating techniques offered by FPGA manufacturers under names such as
HardCopy or EasyPath. This enables one to develop on standard FPGAs and then to freeze the design
into ASICs with the same footprint, allegedly cutting power consumption by as much as 50%. Other
“green” measures will include the use of non‐leaded components, the careful balancing of system
parameters and performance and the optimisation of firmware designs and algorithms.
Regardless of the success of the RadioNet3 proposal, a re‐spin of the current UniBoard is a viable
option. The constraint on the mid‐frequency correlator design implemented in the Stratix IV
EP4SGX230 is not the number and speed of the multipliers but the logic needed to get data into
them, and the on‐chip storage needed for the accumulation products. Some of these constraints
could be alleviated in the future by migrating to a Stratix V device. For example the 5SGXAB has
39Mbits of on‐chip SRAM compared to 14Mbits for the EP4SGX230 – enough to permit
accumulation to 36bit resolution. It also has over four times the general logic resources and eight
times as many registers which would allow a simpler and potentially faster design for the MAC input
multiplexer and accumulator. If the 1500 18x18 bit multipliers on the 5SGXAB were clocked at
350MHz, it would be possible to increase the bandwidth processed per FPGA from 0.5MHz to 1MHz,
reducing by half the number of UniBoards needed for the correlation stage.
If the hardware for SKA Phase 1 is selected only in 2015, FPGA technology will likely have advanced
by two generations. This means that we can expect the number of UniBoards needed for processing
to go down by a factor of about 4, which would reduce the budget for the sparse array
correlator/beam‐former to a more manageable 60 M Euro for 480 beams and about 23.5 M Euro for
160 beams.