1 boinc-vm and volunteer cloud computing ben segal / cern and: predrag buncic, jakob blomer, pere...

25
1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan / CERN Jie Wu, Wenjing Wu / CAS-IHEP, Beijing Rohit Yadav / Banares Hindu University David Garcia Quintas, Yushu Yao / Lawrence Berkeley Laboratory Jarno Rantala / Tampere University of Technology David Weir / Imperial College, London 6th BOINC Workshop, London August 31, 2010

Upload: mitchell-small

Post on 13-Dec-2015

220 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

1

BOINC-VM and Volunteer Cloud Computing

Ben Segal / CERN

and:

Predrag Buncic, Jakob Blomer, Pere Mato / CERN

Carlos Aguado Sanchez, Artem Harutyunyan / CERN

Jie Wu, Wenjing Wu / CAS-IHEP, Beijing

Rohit Yadav / Banares Hindu University

David Garcia Quintas, Yushu Yao / Lawrence Berkeley Laboratory

Jarno Rantala / Tampere University of Technology

David Weir / Imperial College, London

6th BOINC Workshop, London August 31, 2010

Page 2: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

2

LHC@home

• Calculates stability of proton orbits in CERN’s new LHC accelerator

• System is nonlinear and unstable so numerically very sensitive. Hard to get identical results on all platforms

• About 40 000 users, 70 000 PC’s… over 1500 CPU years of processing

• Objectives: extra CPU power and raising public awareness of CERN and the LHC - both successfully achieved.

• Started as an outreach project for CERN 50th Anniversary 2004; used for Year of Physics (Einstein Year) 2005

Page 3: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

3

SixTrack programSixTrack is a Fortran program by F. Schmidt, based on DESY programSixTrack simulates 60 particles for 100k-1M LHC orbitsCan include measured magnet parameters, beam-beam interactionsLHC@home revealed reproducibility issues, solved by E. McIntosh

Phase space images of a particle for a stable orbit (left) and unstable chaotic orbit (right).

Page 4: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

4

BOINC & LHC physics code

Problems with “normal” BOINC used for LHC physics: 1) A project’s application(s) must be ported to each volunteer platform: most

clients run Windows, but CERN runs Scientific Linux and porting to Windows is impractical. Each experiment has its own environment.

1) The physics code changes frequently so apps must be updated often.

2) The projects’ work must be fed into the BOINC server for distribution, and results recovered. Job submission scripts must be developed for this, but CERN physics experiments won’t change their current setups.

1) Job management is primitive in BOINC, whereas physicists want to know where their jobs are and be able to manage them.

Page 5: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

5

BOINC & Virtualization

Summary of the basic approach • Solve client application porting problems using VM’s• Use “VMwrapper” to link VM’s to BOINC core client & server• Provide a host <-> guest-VM communication/control layer

.. and in addition ..

• Solve the image size problem and physics job production interfaces using the CernVM project together with the Co-Pilot adapter system.

Page 6: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

6

CernVM Motivation

• Software @ LHC Experiment(s)– Millions of lines of code– Complicated software installation/update/configuration procedure,

different from experiment to experiment– Only a tiny portion of it is really used at runtime in most cases– Often incompatible or lagging behind OS versions on desktop/laptop

• Multi core CPUs with hardware support for virtualization– Making laptop/desktop ever more powerful and underutilised

• Using virtualization and extra cores to get extra comfort– Zero effort to install, maintain and keep up to date the experiment

software– Reduce the cost of software development by reducing the number of

compiler-platform combinations– Decouple application lifecycle from evolution of system infrastructure

Page 7: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

7

How does CernVM work?

• It is a “thin” Virtual Software Appliance that is auto-configured and auto-updated while in use by each LHC experiment

• This appliance – provides a complete, portable and easy to configure user environment for

developing and running LHC data analysis locally and on Grids– is independent of physical software and hardware platforms (Linux, Windows,

MacOSX and variants all supported)

• This minimizes the number of platforms (compiler-OS combinations) on which experiment software needs to be supported and tested, thus reducing the overall cost of LHC software maintenance

• All this is being done – in collaboration with the LHC experiments and OpenLab– reusing existing solutions where possible

Page 8: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

8

“Thin” Software Appliance

JeOS (based on rPath Linux)

rAA

KERNELfuse

module

FILESYSTEM

rAAplugin

Extra Libs & Apps

Cache

HTTPD

SoftwareRepository

10 GB1 GB0.1 GB

LAN/WAN(HTTP)

Page 9: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

9

Key Building Blocks

• rBuilder from rPath (www.rpath.org)– A tool to build VM images for various

virtualization platforms • rPath Linux 1

– Slim Linux OS binary compatible with Red Hat / SLC4

• rAA - rPath Linux Appliance Agent– Web user interface– XMLRPC API

• Can be fully customized and extended by means of plugins (401)

• CVMFS - CernVM file system– Read-only file system optimized for

software distribution• Aggressive caching

– Operational in offline mode• For as long as you stay within the cache

Build types– Installable CD/DVD– Stub Image– Raw File System Image– Netboot Image– Compressed Tar File– Demo CD/DVD (Live CD/DVD)– Raw Hard Disk Image– VMware ® Virtual Appliance– VMware ® ESX Server Virtual

Appliance– Microsoft ® VHD Virtual Appliance– Xen Enterprise Virtual Appliance– Virtual Iron Virtual Appliance– Parallels Virtual Appliance– Amazon Machine Image– Update CD/DVD– Appliance Installable ISO– Sun Virtual Box Image

Page 10: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

10

CernVM File System

App

On same host:

/opt/lcg-> /chirp/localhost/opt/lcg

open(“/opt/lcg”)

On File Server

/opt/lcg -> /grow/host/opt/lcg

Cache

Kernel

NFS LFS FUSE

CernVMFuse

!Cache

Page 11: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

11

CernVM File System

Page 12: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

12

Co-Pilot: CernVM’s Cloud Interface• CernVM cloud worker nodes are sets of VM’s created on clusters or Grids or

Amazon EC2, etc. • They may be “untrusted”.• They each run a CernVM image with a Co-Pilot Agent inside.

( In our case nodes are simply … BOINC Volunteer PC’s )

• The Co-Pilot Agents get and return jobs from an experiment’s own job management system (e.g. ATLAS/PanDA, ALICE/AlieN, etc.)

• BOINC work-unit scheduling is not used at all.

• The BOINC volunteer just sees a (very long-running) task which returns credit regularly, in proportion to CPU resources used.

Page 13: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

13

CernVM Co-Pilot architectureLHC experiments’ Grid Job Production Management System

Co-Pilot Agents Co-Pilot Adapters

PanDA Grid

AliEnStorage Adapter

PanDAStorage Adapter

AliEn Job Adapter

PanDAJob Adapter

Jabber/XMPPMessaging Network

Key Manager

The use of Jabber/XMPP allows to scale the system in case of high load by just adding new Adapter instances

Page 14: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

14

• Job Adapter– Receives job execution requests from an Agent– Contacts Grid services (e.g. AliEn/PanDA/Dirac) and gets a job for

execution– Gets the necessary input files from the Grid file catalogue– Makes input files available for the Agent via Chirp– Instructs Agent to download the input – Instructs Agent to execute job command (e.g. ‘root myMacro.C’)

• Storage Adapter– Receives job completion request – Provides Agent with the Chirp directory to upload job output– Puts the job output to the Grid file catalogue– Contacts Grid services and sets the final status of the job

• NOTE: Grid credentials are handled by Adapters, not sent to Agents

The work of CernVM Co-Pilot Adapters

Page 15: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

15 15

Job Adapter

CernVM Co-Pilot job execution (ALICE)

0. Send host JDL (e.g. memory and disk info)

1. Append framework- specific information and request a job

5. Register output files, mark job as DONE

4. When the job is done send back the output files (and, optionally, validation of result)

2. Send user job JDL from Task Queue

AliEn for ALICE

Storage Adapter

3. Send input files and commands for execution

Key Manager service may be used to secure the communication

For the detailed description of the communication protocol please see:https://cernvm.cern.ch/project/trac/cernvm/wiki/CoPilotProtocol

Page 16: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

16 16

CernVM Co-Pilot job execution (ATLAS)

For the detailed description of the communication protocol please see:https://cernvm.cern.ch/project/trac/cernvm/wiki/CoPilotProtocol

Page 17: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

17

BOINC & Virtualization

So CernVM and Co-Pilot allow us to solve our three problems (porting, job preparation, and job management) but to run guest VM’s within a BOINC host we need a cross-platform solution for (at least):

• control of (multiple) VM’s on a host, including: Start|Stop|pause|resume|reset|poweroff|savestate as well as (for the general case):

• command execution on guest VM’s• file transfers from guests to host (and reverse)

Page 18: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

18

BOINC & Virtualization

Details of the “VM controller” package: (developed by David Garcia Quintas / LBL, ex-CERN)

• Cross-platform support - based on Python (Windows, MacOSX, Linux… ).• Uses Python packages: Netifaces, Stomper, Twisted, Zope, simplejson, Chirp…

• Does asynchronous message passing between host and guest entities via a broker (e.g. ActiveMQ). Messages are XML/RPC based.

• Supports:• control of multiple VM’s on a host, including: Start|Stop|pause|resume|reset|poweroff|savestate• command execution on guest VM’s• file transfers from guests to host (and reverse) using Chirp

Page 19: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

19

Host to VM Guest communication

Page 20: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

20

BOINC & Virtualization

Details of the new BOINC “VMwrapper”: (developed by Jarno Rantala / CERN openlab student, 2009)

• Written in Python, therefore multi-platform• Uses “VM controller” infrastructure described above• Fully back-compatible with original BOINC Wrapper

• Supports standard BOINC job.xml files• For VM case, supports extra tags in the job.xml file

• Able to measure the VM guest resources and issue credit requests• Uses BOINC “trickle message” facility for very long-running work units

Page 21: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

21

BOINC VMwrapper architecture

Page 22: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

22

BOINC & Virtualization

The above BOINC-VM “General Solution” is being finished and packaged as part of a Bachelor Thesis project by:

Rohit Yadav / CERN summer student, 2010with help from: David Garcia Quintas / LBL and ex-CERN

• Has general host <-> guest-VM functionality (not restricted to BOINC)• Will support VMware as well as VirtualBox hypervisors• Supports multiple VM’s per host• Supports Linux, Windows and Mac OSX platforms• Etc, etc.

Page 23: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

23

BOINC & Virtualization

Details of the new BOINC “CernVMwrapper”: (developed by Jie Wu / CERN openlab student, 2010)

• Written in C++, easy to port to Linux, Windows and Mac OSX

• Based on original BOINC Wrapper, with enough BOINC support to run our LHC Cloud application using CernVM and Co-Pilot

• Minimal VM functionality, enough to launch and control a VirtualBox VM• Uses the VirtualBox COM programming interface

• Able to measure the VM guest resources and issue credit requests• Uses BOINC “trickle message” facility for very long-running work units

Page 24: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

24

BOINC Virtual Cloud

Summary of the method: • New BOINC wrapper (VMwrapper) used to start a guest VM in a BOINC

client PC, and execute a CernVM image.• The CernVM image has all LHC software and CoPilot code.• Host-to-VM communication/control provided for BOINC client.• The new VMwrapper gives BOINC client and server all the functions they

need - they are unaware of VM’s.• The CoPilot connection allows LHC job production to proceed without

changes, using experiments’ own job control systems.

Page 25: 1 BOINC-VM and Volunteer Cloud Computing Ben Segal / CERN and: Predrag Buncic, Jakob Blomer, Pere Mato / CERN Carlos Aguado Sanchez, Artem Harutyunyan

25

Building a Volunteer Cloud

• Final Summary:

• Solved porting problem to all client platforms• Solved image size problem• Solved job production interface problem

• All done without changing existing BOINC infrastructure (client or server side)

• All done without changing physicists’ code or procedures

• We have built a “Volunteer Cloud” … demo follows