TRANSCRIPT
Xmas Meeting, Manchester, Dec 2006, R. Hughes-Jones, The University of Manchester
ATLAS TDAQ
Networking, Remote Compute Farms
& Evaluating SFOs
Richard Hughes-Jones The University of Manchester
www.hep.man.ac.uk/~rich/ then “Talks”
Network Purchase Review

Considered the 4 networks in the Online system: DataCollection, BackEnd, Control, Management.
The review process was to:
- Derive the quantities and connectivity from each of the document sources
- Cross-check where possible that the information was consistent
- Examine what was ordered
Documents used:
- Network Visio diagrams
- ROS Control and Data Networks Architecture Proposal, 28 April 2006, version 0.4
- The order to IT for the network components
- Rack layouts
Components & devices ordered; installation well in hand.
Remote Computing Concepts
[Diagram: ATLAS Detectors feed the Level 1 Trigger and the ROBs. L2PUs (Level 2 Trigger) and SFIs (Event Builders) connect over the Data Collection Network; the Back End Network links the SFOs and local Event Processing Farms (PFs) in the Experimental Area to mass storage at CERN B513, and via a switch onto GÉANT lightpaths to Remote Event Processing Farms (PFs) at Copenhagen, Edmonton, Krakow, Manchester and Amsterdam.]
Tests & Traffic on PIONIER, Summer 2006

- Set up Advanced Network Test (ANT) GPS systems at CERN, Krakow and Manchester
- Examined network performance PC-PC and ANT-ANT
CERN ↔ Krakow: Throughput

[Plots, atb79-zeus15_28Sep06 (CERN → Krakow): received wire rate (Mbit/s), % packet loss and packets re-ordered vs. spacing between frames (0-40 us), for frame sizes 50, 100, 200, 400, 600, 800, 1000, 1200, 1400 and 1472 bytes.]
~800 Mbit/s; 20% loss in the network.

[Plots, zeus15-atb79_29Sep06 (Krakow → CERN): received wire rate (Mbit/s), % packet loss and packets re-ordered vs. spacing between frames (0-40 us), same frame sizes.]
600-800 Mbit/s; 40% loss in the network; ~3× more re-ordering.
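The plateau-then-rolloff shape of the wire-rate curves follows from geometry: below a critical spacing the offered load exceeds the 1 Gbit/s line rate. A sketch of the expected curve, assuming "frame size" means UDP payload and standard per-frame overheads (UDP 8 + IP 20 + Ethernet header/FCS 18 + preamble/IFG 20 = 66 bytes):

```python
LINK_MBIT = 1000.0   # Gigabit Ethernet line rate
OVERHEAD = 66        # UDP(8) + IP(20) + Eth header/FCS(18) + preamble/IFG(20) bytes

def expected_wire_rate(payload_bytes, spacing_us):
    """Theoretical received wire rate (Mbit/s) for UDP frames sent at a fixed spacing."""
    offered = (payload_bytes + OVERHEAD) * 8 / spacing_us  # bits per us == Mbit/s
    return min(offered, LINK_MBIT)
```

For 1472-byte frames this predicts line rate down to ~12.3 us spacing and ~615 Mbit/s offered at 20 us; the measured curves sit below this by the loss observed in the network.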
Remote Farms: TDAQ Initiative

Aim of the Initiative:
- Develop the methodology
- Provide or identify the required “tool kits”
- Demonstrate proof of concept for using remote Grid or compute farms in real time for ATLAS TDAQ applications
- Not guaranteed real-time services for ATLAS – yet
Use Cases:
- Calibration: specific data from a small section of the experiment
- Monitoring: data from several sub-detectors, allowing more detailed examination of a sub-detector's performance
- Remote event storage
- Remote Event Filter processing: extend the computing capabilities of the Event Filter to remote Grid farms
Security – protect Point 1:
- Application gateways
- Links to Grid operations
Document in draft.
SFO Evaluation

The SFO:
- Buffers events from the Event Filter
- Transfers them to Mass Storage (CASTOR)
Data movements:
- Receive events over the network from the Event Filter
- Store them on one RAID5 disk sub-system in multiple streams
- When this is full, swap writing to the second RAID system
- Read data back from the first RAID5 disk sub-system
- Send events over a second network to mass storage at CERN
As part of the RFI (Tender): test performance of RAID controllers.
RAID Controller Performance: Single Flow

Test system (Elonex):
- Intel S5000PSL motherboard with two dual-core Intel Xeon 5130 2.00 GHz CPUs
- 3Ware 9550SX-12MI 12-port SATA-RAID controller on the PCIe bus
- Seven 465.76 GB Hitachi Deskstar E7K500 disks, configured as RAID5 with a 64 kbyte stripe size
RAID5 ext2 file system, one flow, 2 GByte file: read 2750 Mbit/s, write 1215 Mbit/s.
[Plots, RAID5 7 disks 3w9550 ext2, 29 Oct 06: memory → disk write (left) and disk → memory read (right) throughput (Mbit/s) vs. file size (0-8000 Mbytes), for settings 256, 2048 and 16384. Write scale reaches ~4000 Mbit/s; read scale ~16000 Mbit/s.]
RAID Controller Performance: Multiple Flows

RAID5 ext2 file system, 8 GByte files:
- 2 flows: 180 Mbit/s per flow
- 5 flows: 10-25 Mbit/s per flow
Too slow!
[Plots, RAID5 7 disks cntl0 ext2, 20 Nov 06, write, RA 2048: memory → disk write throughput (Mbit/s) vs. test number for 2 concurrent flows (left, ~0-200 Mbit/s scale) and 5 concurrent flows (right, ~0-30 Mbit/s scale).]
Any Questions?
Backup Slides