What is JIVE?Operate the EVN correlator and support astronomers doing VLBI.
A collaboration of the major radio-astronomical research facilities inEurope, China and South Africa
A 3 year program to create a distributed astronomical instrument of inter-continental dimensions using e-VLBI, connecting up to 16 radio telescopes
Radio vs. Optical astronomyThe imaging accuracy (resolution) of a telescope: θ ≈ 1.2 λ/D (λ = wavelength, D = diameter)
Hubble Space Telescope:λ ≈ 600nm (visible light)D = 2.4mθ = 0.06 arcsecond
Onsola Space Observatory:λ = 6cm (5GHz)D = 25mθ = 600 arcsecondsWanted: 240km dish
Very Long Baseline Interferometry• Create a huge radio telescope by using telescopes in different locations around the world at the same time
• Resolution depends on distance between dishes, milli-arc second level• Sensitivity on dish area, time and bandwidth• Requires atomic clock stability for timing• Processed in a specialpurpose super-computer:Correlator, 16x 1024Mb/s
Very Long Baseline Interferometry
tekst
• Initially (1990) we used large single-reel tapes
tekst
• Then came harddisk-packs
• And now: e-VLBI
Very Long Baseline Interferometry
“Never underestimate the bandwidth of a station wagon laden with computer tapes hurtling
down the highway”(Andy Tanenbaum)
Latency: 2 weeks Latency: 150ms
Why e-VLBI• Quick turn-around• Rapid response• Check data as it comes in, not weeks later (You can’t redo just 1 telescope)• More bandwidth• Logistics (disks delayed/deleted/damaged/destroyed)
Example: Cyg X-3• Star + black hole• Flares irregularly• Timescale: days• Left: 2 weeks late• May: Observed flare with e-VLBI
Networking challenges
e-VLBI is:
• High Bandwidth: > 1 Gb/s
• Long Distance: Worldwide
• Real-time
• Long duration: 12 hours
• Can accept a little
packet loss
TCP Research• Mirror port (span)
• eVLBI: RTT up to 375ms
• Window Size (kernel vers.)
• SACK-bugs
• TCP Tuning defeats fairness
• Conclusion:
• UDP
• ‘private’ connections
(LP, VLAN, dark fiber)
Lightpaths• Dedicated point-to-point circuit• Based on SDH/Sonet timeslots (NOT a lambda)• Stitched together at cross-connects• Guaranteed bandwidth• But also: a string of SPFs
e-VLBI
Lightpaths• Especially the longer lightpaths have many outages• NRENs usually very good about announcing maint.• A -lot- of email.
• e-VLBI is becoming a ‘target of opportunity’ instrument, planned and unplanned observations
Telescope CC Bandwidth RTT (ms)Sheshan CN 622M LP / 512M R 354 / 180
ATNF AU 1G LP 343
Kashima JP 512M R 288Arecibo PR 512M VLAN 154
TIGO CL 95M R 150
Westford US 512M R 92
Yebes ES 512M R 42,1Torun PL 1G LP / 10G R 34,9
Onsala SE 1.5G VLAN 34,2
Metsahovi FI 10G R 32,7
Medicina IT 1G LP 29,7
Jodrell Bank UK 2x 1G LP 18,6
Effelsberg DE 10G VLAN 13,5WSRT NL 2x 1G CWDM 0,57
Network Overview
The 1Gb/s speedbump• VLBI (tape based) comes in fixed speeds, power of 2: 128Mb/s, 256Mb/s, 512Mb/s - and 1024Mb/s
• 1024Mb/s > 1Gb/s! (with headers it’s more like 1030)
• Dropping packets works but is sub-optimal
• Dropping ‘tracks’ to <1Gb/s: Takes a LOT of CPU work
• Lightpaths come in ‘quanta’ of150Mb/s, but Ethernet doesn’t
The Trouble with Trunking• Standard trunking: LACP (802.3ad)• Uses a hash of source/destination MAC, IP and/or Port to choose outgoing port• This is to prevent re-ordering• A single TCP/UDP stream will use only 1 link member!
• Recent Linux kernels come with bonding, ‘ifenslave’• Round Robin traffic distribution• Keep both halves in separate VLANS/Lightpaths as switches in between only speak LACP
“Do NOT cross the streams!”
All the colours of the rainbow...
... and then some.
SX: 850nm
1610nm
LX: 1310nm
ZX: 1550nm
1470nm
1550nm
1510nm
1570nm
Timing is everything• Originating network interface often has more
bandwidth than the end-to-end network path
• Example: 1Gb/s interface, 622 Mb/s lightpath
• Trying to send 520 Mb/s for e-VLBI - should fit...
• Ethernet either sends at 1Gb/s, or 0 Gb/s
• Linux timer interrupt: 250Hz (2.6) or100Hz
(2.4)
• Data will be sent as bursts: 4ms at 1Gb/s is
512kB
• Buffer space at choke-point is limited, so
packets get
dropped when queue fills up
Timing is everything
• Linux timer interrupt, even at 1000Hz, is too
slow
• Use a CPU to run a calibrated delay at full
speed
• And mount a big cooling fan
• Good packet spacing - problem solved?
• A real-time problem requires a real-time
kernel
• High-resolution nanosleep( ) in 2.6.17 and
beyond
• Available by default in e.g. Debian Lenny
(2.6.26)
• Sleeping instead of spinning
• CPU-core load from 99% to 5% - much
greener!
Multicast - it’s not just for video• MERLIN - Multi Element Radio Linked
Interferometer
• 6 UK telescopes, connected at 128Mb/s
• Up to 4 recorded on one server
• Disk: copy the disks upon arrival
• e-VLBI: network to copy data
• Using a span/mirror port (ugly hack)
• Multicast 512Mb/s UDP stream
• Needs hardware multicast support in
switch/router
• Now in regular production use: ‘Merlincast’
The Elliptical Robin• Jodrell Bank ‘home’ telescope:
full 1024 Mb/s
• 2 lightpaths to JIVE, 1Gb/s each
• Use ethernet bonding (‘Round Robin’) -
distribute
the packets over 2 links
• No room left over for 512 Mb/s Merlincast?
• 8 line changes to
.../drivers/net/bonding/bond_main.c
• # modprobe bonding skew=4
• 5 packets on first interface, then 1 on the
other
e-VLBI today• ToO observation of Cygnus X-3 outburst
• 3 observation epochs (May 23rd , June 10th,
July 4th)
• Observations at K-band (1cm, 22GHz)
• Telescopes from Australia, Japan and Europe
e-VLBI today• ToO observation of Cygnus X-3 outburst
• 3 observation epochs (May 23rd , June 10th,
July 4th)
• Observations at K-band (1cm, 22GHz)
• Telescopes from Australia, Japan and Europe
Future e-VLBI• More bandwidth increases sensitivity (by √B)
• Current correlator limited to 1024Mb/s (per
station)
• Researching software correlators, GPUs
• Researching FPGA based correlators (64 Gb/s
/st)
• New telescope backends
• More telescopes increases image accuracy
• Current correlator limited to 16 stations
• N * (N - 1) / 2 baselines
• FPGA based design targetting 32 stations