Virtualisation, Clouds and IaaS at CERN Helge Meinhard CERN, IT Department VTDC Delft (NL), 19 June 2012


Page 1

Virtualisation, Clouds and IaaS at CERN

Helge Meinhard
CERN, IT Department

VTDC Delft (NL), 19 June 2012

Page 2

CERN IT Department

CH-1211 Genève 23

Switzerland
www.cern.ch/it

Outline

• Introduction
  – CERN
  – Physics at the LHC
  – LHC machine and detectors
  – Data processing challenges
  – WLCG
  – CERN computer centre

• Past and present (phase I): CERN virtualisation infrastructure, service consolidation, lxcloud

• Present and future (phase II): remote data centre, new tool suite, IaaS

Virtualisation, clouds, IaaS - June 2011, Helge Meinhard

Page 3

Page 4

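Figure: length scales in cm, from atom and proton via the radius of the Earth, Earth to Sun and the radius of galaxies to the universe and the Big Bang; instruments shown: Hubble, ALMA, VLT, AMS and the LHC as a “super-microscope”. Study the physics laws of the first moments after the Big Bang, increasing the symbiosis between particle physics, astrophysics and cosmology.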


Page 5

Enter a New Era in Fundamental Science

The Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever, is the most exciting turning point in particle physics.

Exploration of a new energy frontier

LHC ring: 27 km circumference

Experiments: CMS, ALICE, LHCb, ATLAS

Page 6

Page 7

The LHC Computing Challenge

• Signal/Noise: 10^-13 (10^-9 offline)
• Data volume
  – High rate × large number of channels × 4 experiments
  – 22 petabytes of new data each year
• Compute power
  – Event complexity × number of events × thousands of users
  – 200k CPUs, 45 PB of disk storage
• Worldwide analysis & funding
  – Computing funding locally in major regions & countries
  – Efficient analysis everywhere
  – GRID technology
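As a rough sense of scale, the 22 PB of new data per year can be converted into the mean sustained rate it implies. The Python sketch below assumes decimal units (1 PB = 10^15 bytes) and data spread evenly over a 365-day year; neither assumption comes from the slide, and real data taking is bursty, so peak rates are higher than this average.

```python
# Mean data rate implied by "22 PetaBytes of new data each year".
# Assumptions (not from the slides): decimal units, 1 PB = 10**15 bytes,
# and data spread evenly over a 365-day year. Real data taking is bursty.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s


def mean_rate_bytes_per_s(bytes_per_year: float) -> float:
    """Average sustained rate needed to absorb a yearly data volume."""
    return bytes_per_year / SECONDS_PER_YEAR


NEW_DATA_PER_YEAR = 22 * 10**15  # 22 PB

rate = mean_rate_bytes_per_s(NEW_DATA_PER_YEAR)
print(f"{rate / 10**6:.0f} MB/s")        # roughly 698 MB/s
print(f"{rate * 8 / 10**9:.1f} Gbit/s")  # roughly 5.6 Gbit/s
```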

Page 8

Worldwide LHC Computing Grid

• Tier 0: CERN
  – Data acquisition and initial processing
  – Data distribution
  – Long-term curation
• Tier 1: 11 major centres
  – Managed mass storage
  – Data-heavy analysis
  – Dedicated 10 Gbps lines to CERN
• Tier 2: more than 200 centres in more than 30 countries
  – Simulation
  – End-user analysis
• Tier 3: from physicists’ desktops to small workgroup clusters
  – Not covered by MoU

Figure: the WLCG tier hierarchy, with CERN Tier 0 at the centre; Tier 1 centres in Germany, USA, UK, France, Italy, Taiwan, the Nordic countries, the Netherlands and Spain; Tier 2 labs and universities; Tier 3 physics departments and desktops; overlaid grids for a physics study group and for a regional group.

Page 9

The CERN Data Centre in Numbers

• Data Centre Operations (Tier 0)
  – 24x7 operator support and System Administration services to support 24x7 operation of all IT services
  – Hardware installation & retirement
    • ~7,000 hardware movements/year; ~1,800 disk failures/year
  – Management and Automation framework for large scale Linux clusters

Figure: disk drives by vendor: Western Digital 59%, Hitachi 23%, Seagate 15%, Fujitsu 3%, HP 0%, Maxtor 0%, other 0%.

High-speed routers (640 Mbps → 2.4 Tbps): 24
Ethernet switches: 350
10 Gbps ports: 2,000
Switching capacity: 4.8 Tbps
1 Gbps ports: 16,939
10 Gbps ports: 558
Racks: 828
Boxes: 11,728
Processors: 15,694
Cores: 64,238
HEPSpec06: 482,507
Disks: 64,109
Raw disk capacity (TiB): 63,289
Memory modules: 56,014
Memory capacity (TiB): 158
RAID controllers: 3,749
Tape drives: 160
Tape cartridges: 45,000
Tape slots: 56,000
Tape capacity (TiB): 34,000
IT power consumption: 2,456 kW
Total power consumption: 3,890 kW


Figure: processors by model: AMD Opteron 6164 HE 6%, Intel Xeon 5150 0%, Intel Xeon 5160 2%, Intel Xeon E5335 2%, Intel Xeon E5345 9%, Intel Xeon E5405 2%, Intel Xeon E5410 11%, Intel Xeon L5420 8%, Intel Xeon L5520 45%, Intel Xeon L5630 0%, Intel Xeon L5640 13%, Intel Xeon X5650 1%, Intel Xeon X5680 0%.
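A few ratios follow directly from the inventory above. The sketch below assumes the disk-failure count (~1,800/year) refers to the installed base of 64,109 disks and treats the two power figures as comparable snapshots. Under those assumptions roughly 2.8% of disks fail per year (about five per day), and total power is about 1.58 times the IT power, a PUE-like ratio.

```python
# Ratios derived from the "CERN Data Centre in Numbers" inventory.
# Only the input figures come from the slide; the ratios are simple derived
# quantities and ignore, e.g., changes in the disk count during the year.
inventory = {
    "boxes": 11_728,
    "cores": 64_238,
    "hepspec06": 482_507,
    "disks": 64_109,
    "disk_failures_per_year": 1_800,  # "~1800", an approximate figure
    "it_power_kw": 2_456,
    "total_power_kw": 3_890,
}


def derived_ratios(inv: dict) -> dict:
    return {
        # fraction of installed disks failing per year (annualised failure rate)
        "disk_afr": inv["disk_failures_per_year"] / inv["disks"],
        "disk_failures_per_day": inv["disk_failures_per_year"] / 365,
        "cores_per_box": inv["cores"] / inv["boxes"],
        "hs06_per_core": inv["hepspec06"] / inv["cores"],
        # PUE-like ratio: total power drawn per unit of IT power
        "total_to_it_power": inv["total_power_kw"] / inv["it_power_kw"],
    }


for name, value in derived_ratios(inventory).items():
    print(f"{name}: {value:.3f}")
```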

Page 10

Functionality drill-down

• “Clusters” – sets of machines with an identical configuration that differs from that of other clusters

Page 11

Problem Statement (Phase I)

• Small clusters
  – Far too many clusters and too many managers
  – Small size of some clusters makes disruptive upgrades very difficult
    • OS/software upgrades
    • HW life cycle management
  – Many servers poorly used
• Large clusters
  – Effective, efficient management is a must
• Virtualisation addresses part of these problems

Page 12

Phase I: CERN Virtualisation Infrastructure (1)

• Addressing the “small cluster” problem
• Custom virtual machines in the CERN computer centre
  – VMs are long-lived: months to years
• User kiosk for requesting a VM in less than 30 mins
• Based on Microsoft’s System Center Virtual Machine Manager on top of Hyper-V
  – Enterprise class centralized management
  – Rich feature set:
    • Allows grouping of hypervisors, with delegation of administrative privileges
    • VM migration, high availability
    • Checkpoints
    • PowerShell Snap-In for administration / scripting
• Hardware implementation using ‘cells’ of blade servers and redundant iSCSI arrays

Page 13

Phase I: CERN Virtualisation Infrastructure (2)

• Why SCVMM/Hyper-V?
  – At the time, the only cost-effective solution for CERN offering the required advanced management functionality
• Current status
  – Checkpointing implemented
  – Hypervisors upgraded to Windows Server 2008 R2 SP1
  – Dynamic memory allocation, allowing memory to be overcommitted

Figure: number of CVI VMs from March 2010 to February 2012 (axis 0 to 2,500). Nov 2010: 680 VMs on 170 hypervisors; Feb 2012: 2,450 VMs on 350 hypervisors. VM mix: 58% Linux, 42% Windows.
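The two snapshots quoted with the chart show consolidation density rising along with the VM count. The sketch below computes VMs per hypervisor for each snapshot and, assuming smooth compound growth over the 15 months between them (an idealisation; the real curve need not be smooth), the implied mean monthly growth in VMs. It gives 4.0 and 7.0 VMs per hypervisor and roughly 9% growth per month.

```python
# Consolidation ratios from the two CVI snapshots quoted with the chart.
# The 15-month span (Nov 2010 to Feb 2012) is read off the dates; the growth
# rate assumes smooth compound growth, which the real curve need not follow.
snapshots = {
    "Nov 2010": (680, 170),   # (VMs, hypervisors)
    "Feb 2012": (2450, 350),
}

for label, (vms, hypervisors) in snapshots.items():
    print(f"{label}: {vms / hypervisors:.1f} VMs per hypervisor")

months = 15
growth = (snapshots["Feb 2012"][0] / snapshots["Nov 2010"][0]) ** (1 / months) - 1
print(f"mean compound VM growth: {growth:.1%} per month")
```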

Page 14

Phase I: Service Consolidation

• More than 600 Linux machines in CVI run as fully managed machines for physics services
  – Installation, configuration, monitoring
• We offer managed CERN Linux VMs with (some) combinations of:
  – 1-4 CPUs
  – 1-8 GB memory
  – 100-2000 GB disk
  – 1 Gbps paravirtualized network
    • 100 Mbps during installation
• CPU, disk and network are happily overcommitted
  – Typical physical CPU usage on hypervisors < 30%
  – Typical physical network usage < 2%
  – Real disk usage vs. committed capacity < 20%
• Memory is not overcommitted
• Current statistics:
  – 627 VMs
  – 1245 virtual CPUs
  – 3235 GB memory
  – 59 TB disk used (out of 265 TB allocated)
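The current statistics translate into per-VM averages. The sketch below only divides the slide's totals, so it hides the spread across the offered combinations (1-4 CPUs, 1-8 GB). Since memory is not overcommitted, the memory average reflects RAM actually reserved. The disk figures show that about 22% of the allocated capacity is in use.

```python
# Per-VM averages from the "Current statistics" of the consolidation service.
# Inputs are the slide's totals; averages hide the spread across VM sizes.
stats = {"vms": 627, "vcpus": 1245, "memory_gb": 3235,
         "disk_used_tb": 59, "disk_allocated_tb": 265}

avg_vcpus = stats["vcpus"] / stats["vms"]
avg_memory_gb = stats["memory_gb"] / stats["vms"]
disk_fill = stats["disk_used_tb"] / stats["disk_allocated_tb"]

print(f"average vCPUs per VM:  {avg_vcpus:.2f}")
print(f"average memory per VM: {avg_memory_gb:.2f} GB")
print(f"disk used / allocated: {disk_fill:.1%}")
```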


Page 15

Phase I: lxcloud (1)

• Addressing the “large cluster” problem
• Aims:
  – Dynamic provisioning of resources to users of our large batch computing service
    • SLC 5 vs. SLC 6
    • User group-specific customisations of environment
  – Test provisioning of a generic cloud interface (EC2) to selected users
• Hardware: O(60) physical batch worker nodes (out of 4000) with local storage
• Fully managed SLC 6 machines with KVM and KSM
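KSM (Kernel Samepage Merging) lets KVM hosts share identical memory pages between VMs. This pays off when many VMs run the same golden image. The toy Python model below only counts how many 4 KiB pages would remain after merging identical ones; the VM memory contents are made up for illustration. The real kernel feature differs in detail: it scans only regions marked as mergeable, compares full page contents and shares merged pages copy-on-write.

```python
# Toy model of same-page merging: count how many pages remain if identical
# pages across VMs are shared. This only measures the potential saving; it
# does not model how KSM scans memory or handles copy-on-write.
PAGE_SIZE = 4096  # bytes, the usual x86-64 page size


def split_pages(memory: bytes) -> list[bytes]:
    return [memory[i:i + PAGE_SIZE] for i in range(0, len(memory), PAGE_SIZE)]


def pages_before_and_after_merge(vm_memories: list[bytes]) -> tuple[int, int]:
    all_pages = [page for mem in vm_memories for page in split_pages(mem)]
    # set() keeps one copy per distinct page content (full-content comparison)
    return len(all_pages), len(set(all_pages))


# Three hypothetical VMs booted from the same image: 100 identical OS pages
# each, plus one page of VM-private data.
os_pages = b"".join(bytes([i]) * PAGE_SIZE for i in range(1, 101))
vms = [os_pages + bytes([0xAA, vm_id]) * (PAGE_SIZE // 2) for vm_id in range(3)]

before, after = pages_before_and_after_merge(vms)
print(before, after)  # 303 103
```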


Page 16

Phase I: lxcloud (2)

• Image repository, image distribution mechanism
  – Images for virtual batch servers derived from fully managed “golden nodes”
  – User-supplied images for EC2 interface
  – Internal distribution to hypervisors via torrent-like mechanism
  – Sharing images across (WLCG) sites discussed in context of HEPiX
• VM provisioning system: OpenNebula 3.2
  – Looking at OpenStack (see later)
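An idealised model shows why a torrent-like mechanism helps. The sketch below rests on several simplifying assumptions: equal bandwidth everywhere, whole-image transfers taking one round each, and every node that holds the image uploading to one other node per round. These do not describe lxcloud's actual tool, which, like BitTorrent, exchanges pieces of the image in parallel. Under these assumptions, peer-assisted distribution needs about log2(N) rounds instead of N.

```python
# Idealised comparison of image distribution strategies. Assumptions (ours,
# not lxcloud's design): equal bandwidth everywhere, a "round" is the time to
# send one full image copy, and each node uploads to at most one peer per round.


def rounds_from_central_server(n_hypervisors: int) -> int:
    """The repository alone serves every hypervisor, one copy per round."""
    return n_hypervisors


def rounds_peer_assisted(n_hypervisors: int) -> int:
    """Every node already holding the image serves one new node per round."""
    holders = 1  # the image repository
    rounds = 0
    while holders < n_hypervisors + 1:
        holders = min(2 * holders, n_hypervisors + 1)
        rounds += 1
    return rounds


for n in (60, 4000):  # lxcloud's O(60) nodes, and the 4000-node batch farm
    print(n, rounds_from_central_server(n), rounds_peer_assisted(n))
```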

Page 17

Phase II: New challenges

• CERN data centre is reaching its limits
• IT staff numbers remain fixed but more computing capacity is needed
• Tools are high maintenance and becoming increasingly brittle
• Inefficiencies exist but their root causes cannot easily be identified

Page 18

CERN Data Centre

• More capacity needed for the processing of LHC data
• For various reasons, not possible to provide additional capacity at CERN
• 2010: call for expressions of interest among CERN member states; 2011: call for tender; 2012: contract awarded to the Wigner Institute in Budapest, Hungary
• Timescales: prototyping in 2012, testing in 2013, production in 2014
• This will be a “hands-off” facility for CERN
  – Only “smart hands” there, everything else done remotely
• Disaster recovery for key services in the primary CERN data centre becomes a realistic scenario

Page 19

Usage Model(s)

• Various possible models of usage, sure to evolve
  – The less specific the hardware installed, the easier it is to change its function
• Vision: run both massively scaled services (“cattle”) and carefully set-up special services (“pets”) as virtual machines on top of “cattle”-style hypervisors

Page 20

Infrastructure Tools Evolution (1)

• We had to develop our own toolset in 2002
  – Installation, configuration, monitoring
• Nowadays:
  – CERN compute capacity is no longer leading edge
  – Many options available for open source fabric management
  – We need to scale to meet the upcoming capacity increase
• If there is a requirement which is not available through an open source tool, we should question the need
  – If we are the first to need it, contribute it back to the open source tool
• Large community out there taking the “tool chain” approach whose scaling needs match ours: O(100k) servers and many applications
  – Many small tools for specific purposes linked together
    • Easy to exchange one tool with an alternative one
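The “tool chain” idea can be sketched as a pipeline of stages that share a minimal interface, so that exchanging one tool means replacing one stage. All class names below are hypothetical placeholders for real tools. A real chain would also pass far more state between stages than a host name.

```python
# Sketch of the "tool chain" idea: each stage hides its tool behind a small
# interface, so one tool can be swapped without touching the others.
# All class names are hypothetical stand-ins, not real CERN components.
from typing import Protocol


class Stage(Protocol):
    def run(self, host: str) -> str: ...


class PxeInstaller:
    def run(self, host: str) -> str:
        return f"installed {host} via PXE"


class ConfigAgent:
    def run(self, host: str) -> str:
        return f"configured {host}"


class LegacyMonitor:
    def run(self, host: str) -> str:
        return f"{host} registered with legacy monitoring"


class NewMonitor:
    def run(self, host: str) -> str:
        return f"{host} publishing metrics to message broker"


def provision(host: str, chain: list[Stage]) -> list[str]:
    """Run every stage in order; the chain knows nothing about the tools behind it."""
    return [stage.run(host) for stage in chain]


old_chain = [PxeInstaller(), ConfigAgent(), LegacyMonitor()]
new_chain = [PxeInstaller(), ConfigAgent(), NewMonitor()]  # only the monitor changed
print(provision("node001", old_chain))
print(provision("node001", new_chain))
```

The design choice is structural typing: any object with a matching `run` method fits the chain, so replacing a tool does not require changes elsewhere.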


Page 21

Infrastructure Tools Evolution (2)


Page 22

Infrastructure Tools Evolution (3)

• Configuration management:
  – Using off-the-shelf components
    • Puppet – configuration definition
    • Foreman – GUI and data store
    • Git – version control
    • MCollective – remote execution
  – Integrated with
    • CERN Single Sign On
    • CERN Certificate Authority
    • Installation Server
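Puppet's model is declarative: one describes the desired state, and the agent works out and applies only the changes needed, so repeated runs are harmless. The Python sketch below illustrates that idempotent desired-state idea with a made-up data model (`Host`, `plan` and `converge` are hypothetical). It is not Puppet's API, which uses its own language, compiles catalogs and orders dependent resources.

```python
# Minimal sketch of declarative, idempotent configuration in the spirit of
# Puppet: describe the desired state, compute only the changes needed, and
# converge. Data model and function names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Host:
    packages: set[str] = field(default_factory=set)
    running: set[str] = field(default_factory=set)


def plan(desired: dict[str, set[str]], host: Host) -> list[str]:
    """List the actions that would bring `host` to the desired state."""
    actions = [f"install {p}" for p in sorted(desired["packages"] - host.packages)]
    actions += [f"start {s}" for s in sorted(desired["services"] - host.running)]
    return actions


def converge(desired: dict[str, set[str]], host: Host) -> list[str]:
    """Apply the plan; running it again on a converged host does nothing."""
    actions = plan(desired, host)
    host.packages |= desired["packages"]
    host.running |= desired["services"]
    return actions


desired = {"packages": {"httpd", "ntp"}, "services": {"httpd"}}
node = Host(packages={"ntp"})
print(converge(desired, node))  # ['install httpd', 'start httpd']
print(converge(desired, node))  # [] (already converged)
```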


Page 23

Infrastructure as a Service

• Goals
  – Improve repair processes with virtualisation
  – More efficient use of our hardware
  – Better tracking of usage
  – Enable remote management for new data centre
  – Support potential new use cases (PaaS, Cloud)
  – Sustainable support model
• At scale for 2015
  – 15,000 servers
  – 90% of hardware virtualized
  – 300,000 VMs needed
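The 2015 targets imply a VM density that can be checked with simple arithmetic. The sketch assumes that “90% of hardware virtualized” means 90% of the 15,000 servers act as hypervisors. That gives 13,500 hypervisors and about 22 VMs each, roughly three times the 7 VMs per hypervisor of CVI in February 2012 (2,450 VMs on 350 hypervisors).

```python
# Back-of-the-envelope check of the 2015 targets. Assumption: "90% of
# hardware virtualized" means 90% of the 15,000 servers run as hypervisors.
servers = 15_000
virtualised_fraction = 0.9
vms_needed = 300_000

hypervisors = round(servers * virtualised_fraction)
vms_per_hypervisor = vms_needed / hypervisors

print(hypervisors)                  # 13500
print(f"{vms_per_hypervisor:.1f}")  # 22.2
```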


Page 24

OpenStack

• Open source cloud software
• Supported by 173 companies including IBM, Red Hat, Rackspace, HP, Cisco, AT&T, …
• Vibrant development community and ecosystem
• Infrastructure as a Service to our scale
• Started in 2010 but maturing rapidly


Page 25

OpenStack at CERN (1)

Figure: OpenStack components: Keystone, Horizon, Nova (compute, scheduler, network, volume) and Glance (registry, image).

Page 26

OpenStack at CERN (2)

• Multiple uses of IaaS
  – Server consolidation
  – Classic batch (single or multi-core)
  – Cloud VMs such as CERNVM
• Scheduling options
  – Availability zones for disaster recovery
  – Quality of service options to improve efficiency, for example for build machines and public login services
  – Batch system scalability is likely to be an issue
• Accounting
  – Use underlying services of IaaS and hypervisors for reporting and quotas
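Two of the scheduling and accounting ideas above fit in a few lines. The first is spreading a service's VMs across availability zones so that losing one zone leaves part of the service running. The second is checking a per-project quota before accepting a request. This is a sketch under assumed names (zones, projects and the `QuotaTracker` class are hypothetical), and it is not how the OpenStack Nova scheduler or its quota system is implemented.

```python
# Sketch of zone-aware spreading and per-project quota checks.
# Zone and project names are hypothetical; this is not Nova's implementation.
from collections import Counter


def spread_over_zones(n_vms: int, zones: list[str]) -> list[str]:
    """Assign VMs to zones round-robin."""
    return [zones[i % len(zones)] for i in range(n_vms)]


def survivors_after_outage(placement: list[str], failed_zone: str) -> int:
    """Number of VMs still running if one zone is lost."""
    return sum(1 for zone in placement if zone != failed_zone)


class QuotaTracker:
    def __init__(self, vcpu_limits: dict[str, int]) -> None:
        self.limits = vcpu_limits
        self.used: Counter[str] = Counter()

    def request(self, project: str, vcpus: int) -> bool:
        """Grant the request only if it keeps the project within its vCPU quota."""
        if self.used[project] + vcpus > self.limits.get(project, 0):
            return False
        self.used[project] += vcpus
        return True


placement = spread_over_zones(6, ["zone-a", "zone-b"])
print(Counter(placement))                           # 3 VMs in each zone
print(survivors_after_outage(placement, "zone-a"))  # 3

quotas = QuotaTracker({"build-service": 8})
print(quotas.request("build-service", 4), quotas.request("build-service", 4),
      quotas.request("build-service", 1))           # True True False
```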


Page 27

Monitoring

• Action needed
  – >30 monitoring applications
    • Number of producers: ~40k
    • Input data volume: ~280 GB per day
  – Covering a wide range of different resources
    • Hardware, OS, applications, files, jobs, etc.
  – Application-specific monitoring solutions
    • Using different technologies (including commercial tools)
    • Sharing similar needs: aggregate metrics, get alarms, etc.
  – Limited sharing of monitoring data
    • Hard to implement complex monitoring queries
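Averaged out, the quoted monitoring volume is modest. The sketch below assumes decimal gigabytes and load spread evenly over producers and over the day. On that basis it gives about 3.24 MB/s in aggregate, or roughly 7 MB per producer per day (about 81 bytes/s each). Peaks and per-application loads may be considerably higher.

```python
# What ~280 GB/day from ~40k producers means on average. Assumptions:
# decimal gigabytes and load spread evenly over producers and over the day.
producers = 40_000
bytes_per_day = 280 * 10**9
SECONDS_PER_DAY = 24 * 3600

aggregate_rate = bytes_per_day / SECONDS_PER_DAY       # bytes/s into the system
per_producer_per_day = bytes_per_day / producers       # bytes per producer per day
per_producer_rate = per_producer_per_day / SECONDS_PER_DAY

print(f"aggregate:    {aggregate_rate / 10**6:.2f} MB/s")
print(f"per producer: {per_producer_per_day / 10**6:.0f} MB/day, "
      f"{per_producer_rate:.0f} B/s")
```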


Page 28

Monitoring: New Architecture

Figure: new monitoring architecture. Producer sensors publish to a messaging broker; consumers include a storage consumer, a storage and analysis engine, operations consumers and operations tools, plus dashboards and APIs. Technologies shown: Apollo, Lemon, Hadoop, Splunk.
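The producer → broker → consumer pattern in this architecture decouples sensors from the tools that use their data. Producers only know the broker and a topic, so new consumers can be attached without touching them. The in-process Python sketch below shows the idea. A real deployment uses a networked message broker; the `Broker` class, topic name and message fields here are hypothetical.

```python
# In-process sketch of the producer -> broker -> consumer pattern. A real
# deployment would use a networked message broker; the Broker class, topic
# name and message fields below are hypothetical.
from collections import defaultdict
from typing import Callable

Message = dict[str, object]
Handler = Callable[[Message], None]


class Broker:
    """Routes each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        for handler in self._subscribers[topic]:
            handler(message)


broker = Broker()
storage: list[Message] = []  # stands in for the storage and analysis engine
alarms: list[Message] = []   # stands in for an operations consumer


def raise_alarm_if_full(msg: Message) -> None:
    if msg["metric"] == "disk_used_pct" and msg["value"] > 90:
        alarms.append(msg)


broker.subscribe("metrics", storage.append)
broker.subscribe("metrics", raise_alarm_if_full)

# Producers (sensors) only know the broker and a topic, not the consumers.
for host, value in [("node001", 45), ("node002", 97), ("node003", 60)]:
    broker.publish("metrics", {"host": host, "metric": "disk_used_pct", "value": value})

print(len(storage), [m["host"] for m in alarms])  # 3 ['node002']
```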

Page 29

Current tool snapshot (subject to change!)

Figure: tools in the new chain: Jenkins; Koji, Mock; Puppet, Foreman; AIMS/PXE, Foreman; Yum repo, Pulp; Puppet stored config DB; MCollective, yum; JIRA; Lemon; git, SVN; OpenStack Nova; hardware database.

Page 30

Timelines

• 2012
  – Prepare formal project plan
  – Establish IaaS in CERN Data Centre
  – Monitoring implementation as per WG
  – Migrate lxcloud users
  – Early adopters to use new tools
• 2013: LS 1 (first LHC long shutdown); new data centre
  – Extend IaaS to remote data centre
  – Business continuity
  – Migrate CVI users
  – General migration to new tools with SLC6 and Windows 8
• 2014: LS 1 (to November)
  – Phase out legacy tools such as Quattor

Page 31

Conclusions

• Remote T0 and other challenges require us to re-think the way we run the computer centre and our services
• Virtualisation has proved to be the right way forward (CVI/service consolidation and lxcloud)
• Now unifying on a single tool (OpenStack) and going much further
  – Coverage of machines and services
  – Tool chain for installation, configuration, monitoring, IaaS
  – Proof of concept done rapidly, very successful
• People highly motivated

Page 32

More information

• HEPiX Agile Infrastructure talks: http://cern.ch/go/99Ck
• Tier-0 upgrade: http://cern.ch/go/NN98
• Other info or contacts: Helge.Meinhard (at) cern.ch

Page 33

Acknowledgements

• Numerous colleagues and collaborators at CERN, including
  – Ian Bird
  – Tim Bell
  – Gavin McCance
  – Ulrich Schwickerath
  – Alexandre Lossent
  – Jose Castro Leon
  – Jan van Eldik
  – Belmiro Moreira

Page 34

Thank you

Helge Meinhard