![Page 1: Distributed IaaS Clouds and 100G Networking for HEP ... heprcdocs.phys.uvic.ca/presentations/hpcs-gable-2013.pdf](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/1.jpg)
Distributed IaaS Clouds and 100G Networking for HEP applications
HPCS 2013, June 3
Ian Gable
A. Agarwal, A. Charbonneau, C. Leavett-Brown, K. Lewall, R. Impey, M. Paterson, W. Podiama, R.J. Sobie, R. Taylor
A. Barczyk, R. Hockett, D. Kcira, I. Legrand, S. McKee, A. Mughal, H. Newman, S. Rosza, S. Timoteo, R. Voicu
M. Hay, D. McWilliam, Y. Savard, T. Tam
University of Victoria, NRC
BCNet and CANARIE
Caltech and University of Michigan
![Page 2](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/2.jpg)
IaaS Clouds
VM Node
Data Network
Traditionally, computing tasks have been moved to the data. With Infrastructure-as-a-Service (IaaS) clouds, we want to move the data to the computing on demand, as resources become available. Networks are the key.
![Page 3](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/3.jpg)
Outline
Running ATLAS jobs on distributed IaaS cloud
– Motivation
– Software required
– Results from 14 months of production operation
100G network demonstrations at Supercomputing 2012
– HEP networks today
– Physical setup
– Results
![Page 4](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/4.jpg)
ATLAS Experiment in One Slide
• A particle detector at CERN which gathers a vast amount of data (100+ PB in the last 2 years)
• Analysis and simulation computational tasks (i.e. jobs) are embarrassingly parallel
• Most of the computation is completed today on the WLCG Grid
See Reda Tafirout’s talk: “ATLAS Experiment: Big Data + Big Compute = Big Discovery”, immediately following this talk
![Page 5](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/5.jpg)
Many facilities are evolving into clouds.
Can we use these sites in a federated way?
And integrate them into our existing systems?
What are some of the challenges?
Distributed cloud computing system for ATLAS
Production system for the ATLAS experiment
Integrated into the WLCG
![Page 6](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/6.jpg)
Randall Sobie IPP/UVictoria
Cloud Scheduler
IaaS Clouds
HTCondor Job Queue
User Job
User VM
Discover User-Job
![Page 7](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/7.jpg)
Boot User-VM on a cloud
![Page 8](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/8.jpg)
VM registers with HTCondor
![Page 9](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/9.jpg)
Dispatches User Job to VM
Remote VM image repository (Nimbus clouds)
Upload VMs to OpenStack clouds
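The four steps shown on the preceding slides (discover the user job, boot a user VM on a cloud, VM registers with HTCondor, job is dispatched to the VM) can be sketched as a toy control loop. This is only an illustration under stated assumptions: `Cloud`, `VM`, `JobQueue` and `schedule_once` are hypothetical names invented here, the queue is a plain in-memory stand-in rather than the HTCondor API, and nothing below is taken from the Cloud Scheduler source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cloud:
    name: str
    free_slots: int  # how many more VMs this (hypothetical) cloud can boot


@dataclass
class VM:
    cloud: str
    vm_type: str
    job: Optional[str] = None  # name of the job running on this VM, if any


@dataclass
class JobQueue:
    """Toy in-memory stand-in for the job queue (not the HTCondor API)."""
    idle_jobs: list = field(default_factory=list)      # (job_name, vm_type) pairs
    registered_vms: list = field(default_factory=list)

    def register(self, vm):
        self.registered_vms.append(vm)

    def dispatch(self):
        """Hand each idle job to a free registered VM of the required type."""
        still_idle = []
        for job_name, vm_type in self.idle_jobs:
            vm = next((v for v in self.registered_vms
                       if v.job is None and v.vm_type == vm_type), None)
            if vm is None:
                still_idle.append((job_name, vm_type))
            else:
                vm.job = job_name
        self.idle_jobs = still_idle


def schedule_once(queue, clouds):
    """One scheduling pass; returns the VMs booted during this pass."""
    queue.dispatch()                                   # reuse VMs that are already up
    booted = []
    for job_name, vm_type in list(queue.idle_jobs):    # 1. discover idle user jobs
        cloud = next((c for c in clouds if c.free_slots > 0), None)
        if cloud is None:
            break                                      # no capacity on any cloud
        cloud.free_slots -= 1                          # 2. boot a user VM on that cloud
        vm = VM(cloud=cloud.name, vm_type=vm_type)
        queue.register(vm)                             # 3. VM registers with the queue
        booted.append(vm)
    queue.dispatch()                                   # 4. queue dispatches jobs to VMs
    return booted


if __name__ == "__main__":
    queue = JobQueue(idle_jobs=[("job-a", "atlas"), ("job-b", "atlas"), ("job-c", "atlas")])
    clouds = [Cloud("cloud-1", 1), Cloud("cloud-2", 1)]
    for vm in schedule_once(queue, clouds):
        print(vm.cloud, "runs", vm.job)
    print("still idle:", queue.idle_jobs)
```

In this sketch a job that finds no free capacity simply stays idle until a later pass, which is one simple way to model clouds whose availability changes over time.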
![Page 10](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/10.jpg)
Current clouds
Clouds
Nimbus
– Victoria (3)
– Ottawa
– FutureGrid Chicago
– FutureGrid San Diego
– FutureGrid Florida
OpenStack
– Melbourne-NECTAR
– CERN-Ibex
– CANARIE-West
– CANARIE-East
– Imperial College-GridPP
![Page 11](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/11.jpg)
Cloud evaporation and condensation
FutureGrid Monthly Maintenance
WestGrid outage
Number of 8-core VMs in May 2013
![Page 12](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/12.jpg)
VMs in 2013
CERN IBex
CANARIE
Imperial College
Nimbus Clouds (and NECTAR)
OpenStack Clouds
8-core VMs at each site
![Page 13](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/13.jpg)
Fully integrated as an ATLAS Grid site (grid operations, monitoring, …)
Integrated number of jobs (380,000)
Weekly jobs (12,500 in 2013)
Peak over 1000 simultaneous jobs
Plot time axis: April 2012 to April 2013
![Page 14](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/14.jpg)
Moving on to Networking
![Page 15](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/15.jpg)
HEP networks today
![Page 16](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/16.jpg)
Next Generation 100G Networks
At Supercomputing 2012 (SC12), held in November 2012 in Salt Lake City, we worked together to set new records for wide area network transfers.
CANARIE, BCNet, Internet2
3 x 100G circuits into a single conference booth
![Page 17](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/17.jpg)
SC12 WAN
Diagram: “Caltech, Victoria, Michigan: Efficient LHC Data Distribution across 100GE Networks”
– Sites: Caltech Booth (SCinet), Univ of Victoria, Univ of Michigan
– Network points: Internet2 SLC; Internet2, Caltech 818W LA; 710 STARLIGHT; Seattle to BCNet
– Links: 100G circuits, SDN; 9 x, 4 x and 3 x 40G QSFP SR
– Equipment: Alcatel SR-12 100/40G Switch-Router, Juniper MX480 100/40G Switch-Router, Cisco ONS 15454 Optical Network System, Cisco M6 100GE DWDM, 40GE Gen3 Servers, DDN SFA-12K
![Page 18](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/18.jpg)
UVic SC12 WAN Diagram: Victoria to Salt Lake City
– Sites: Univ of Victoria, Caltech Booth at SC12 (SCinet)
– Network points: Seattle to BCNet, Internet2 SLC
– Links: 100G circuits, SDN; 4 x and 6 x 40G QSFP SR
– Equipment: 100/40G Switch-router, 40GE Gen3 Servers
![Page 19](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/19.jpg)
IBM x3650 M4
![Page 20](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/20.jpg)
Juniper MX480
![Page 21](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/21.jpg)
Ciena OME 6500
![Page 22](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/22.jpg)
![Page 23](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/23.jpg)
![Page 24](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/24.jpg)
SC Show Floor 40G and 100G equipment
– Data Direct Networks Storage, 120 TB
– Alcatel-Lucent, 3x100GE, 15x40GE
– Juniper MX480, 1x100GE, 4x40GE
– Force10/Dell Z9000 (40GE x 32)
– 2x Dell R720 Lustre clients
– 4x SuperMicro Servers, SSD Storage
– 2x SuperMicro Servers, FusionIO Storage
– 2x SuperMicro Lustre clients
– 2x Dell R520 Lustre clients
– 33 Mellanox ConnectX-3 VPI cards
![Page 25](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/25.jpg)
Alcatel 3 x LR4 100G CFP
![Page 26](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/26.jpg)
Force10 Z9000 32x40G Ethernet
![Page 27](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/27.jpg)
Day One: Memory to Memory
Memory-to-memory transfer, UVic to SC
Three machines at UVic to three machines at SC
At SC11 this was hard to achieve
Plot throughput axis: 0G to 100G
![Page 28](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/28.jpg)
Disk to Disk
4 IBM x3650 with OCZ SSDs at UVic to SSD servers and Lustre clients mounting DDN SFA12K
1500 km
Peak 96 Gbps, sustained 85 Gbps for 10 hours
![Page 29](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/29.jpg)
Remote Direct Memory Access over Ethernet
5% CPU utilization
One server in the booth and two servers at Caltech
![Page 30](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/30.jpg)
337 Gbps peak, memory to memory
Rate equivalent to 3 PB per day
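As a quick check of the "PB per day" figure, the conversion from a line rate to a data volume can be sketched as below. The assumptions are mine, not the slides': decimal units (1 Gbps = 10^9 bits per second, 1 PB = 10^15 bytes) and a rate held constant for the whole interval, which a peak rate would not be in practice. The helper name `bytes_transferred` is hypothetical. With these assumptions 337 Gbps works out to roughly 3.6 PB per day, consistent with the rounded figure on the slide, and the 85 Gbps sustained for 10 hours in the disk-to-disk test corresponds to a little over 380 TB.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_transferred(rate_gbps: float, seconds: float) -> float:
    """Bytes moved at a constant rate, assuming 1 Gbps = 1e9 bits per second."""
    return rate_gbps * 1e9 / 8 * seconds


# Peak memory-to-memory rate, extrapolated to a full day (decimal petabytes).
peak_pb_per_day = bytes_transferred(337, SECONDS_PER_DAY) / 1e15

# Sustained disk-to-disk rate over the 10-hour run (decimal terabytes).
sustained_tb = bytes_transferred(85, 10 * 3600) / 1e12

print(f"{peak_pb_per_day:.2f} PB per day at 337 Gbps")
print(f"{sustained_tb:.1f} TB in 10 hours at 85 Gbps")
```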
![Page 31](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/31.jpg)
Successes
IaaS Production system
– ATLAS IaaS distributed cloud in production for 14 months
– CANFAR (astronomy) using the same technology
– Federated system: 10+ clouds over 3 continents, HEP and non-HEP sites
– 1000 simultaneous jobs
– Dynamic system that handles changing availability
100G networking
– 96 Gbps disk to disk using a ‘modest’ set of hardware over 1500 km
– 337 Gbps memory to memory using 3 sites
– Promising demonstration of RDMA over Ethernet
– Possible to efficiently use 100G networks at the end site
![Page 32](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/32.jpg)
Summary
Clouds are increasingly available to the research community.
Distributed cloud computing is a viable solution for scientific computing.
No observed limits to the system scalability in low I/O cases.
We expect 100G networks to remove barriers to high I/O applications on clouds.
![Page 33](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/33.jpg)
Industrial Partners
![Page 34](https://reader036.vdocuments.site/reader036/viewer/2022081618/60a3f401199a6e2ce025f799/html5/thumbnails/34.jpg)
http://heprc.phys.uvic.ca
[email protected]
@igable
Cloud Scheduler: https://github.com/hep-gc/cloud-scheduler