The Data Bridge, Laurence Field, IT/SDC, 6 March 2015


Page 1: The Data Bridge Laurence Field IT/SDC 6 March 2015

IT-SDC : Support for Distributed Computing

The Data Bridge

Laurence Field, IT/SDC

6 March 2015

Page 2: BOINC and Virtualization

Page 3: Test4Theory Model

Avoid restarting the VM for every job
  Reduces CVMFS-related network traffic

CoPilot support challenges
  Dependencies are not available in the standard repositories

Reduce the operational cost
  New standardized components are available for some functions

Separation of VM management and job management
  In line with the cloud model
  Can reuse cloud-related tooling

[Diagram labels: VM, BOINC Server, Volunteer, Agent, Job Wrapper, CoPilot, Job Agent, Storage Agent, Job Description, Data I/O]

Page 4: The Challenge

[Diagram labels: Workload Manager, VO (X509/VOMS), Grid, Infrastructure, Volunteer, BOINC Auth, VM, Job Wrapper]

Page 5: Authentication

How to authenticate BOINC users?
  In the VM, the credential is provided via /dev/fd0
    BOINC_ID
    BOINC_AUTHENTICATOR

The BOINC project DB is the user Identity Provider (IdP)
  MySQL user table

mod_auth_mysql
  Maps username/password to a DB table
    AuthMysqlUserTable user
    AuthMySQLNameField id
    AuthMySQLPasswordField authenticator

Enables reuse of Apache-based HTTP technology
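The credential mapping above can be sketched from the client side. This is a minimal illustration, not the project's code: it assumes the credential file holds KEY=VALUE lines (the slides do not state the format), and the function names parse_credentials and basic_auth_header are hypothetical. It sends BOINC_ID as the HTTP Basic username and BOINC_AUTHENTICATOR as the password, mirroring the id and authenticator columns named in the mod_auth_mysql settings.

```python
import base64


def parse_credentials(text):
    """Parse KEY=VALUE lines (an assumed file format) into a dict."""
    creds = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        creds[key.strip()] = value.strip()
    return creds


def basic_auth_header(creds):
    """Build an HTTP Basic Authorization header value from the BOINC credential."""
    user_pass = "{}:{}".format(creds["BOINC_ID"], creds["BOINC_AUTHENTICATOR"])
    token = base64.b64encode(user_pass.encode("utf-8")).decode("ascii")
    return "Basic " + token


if __name__ == "__main__":
    # Sample values are made up for illustration.
    sample = "BOINC_ID=42\nBOINC_AUTHENTICATOR=abcdef0123456789\n"
    print(basic_auth_header(parse_credentials(sample)))
```

In a real VM the text would be read from the device named on the slide rather than from a sample string.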

Page 6: The Architecture

[Diagram labels: Workload Manager, Messaging Service, Data Bridge, VO (X509/VOMS), Infrastructure, Volunteer, BOINC Auth, VM, Agent, Job Wrapper, Job Description, FTS, Grid, Pull, Push, Plugin, Data I/O]

Page 7: The Data Bridge

Spans authentication domains
  BOINC user’s credential
  Grid X509 credentials

Scalable data I/O
  With sandboxing capabilities
  Data isolation

Simple Apache-based prototype
  Supports HTTP PUT/GET
  mod_auth_mysql to validate the BOINC user’s credential
  mod_auth_ssl to validate the WMS X509 credential

HTTP federation
  Possibility to reuse standard DM tools
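The PUT/GET interface of the prototype can be illustrated with the Python standard library. This is a sketch under stated assumptions: BRIDGE_URL and the object names are hypothetical, and the Authorization value stands for an HTTP Basic header built from the BOINC credential. The functions only build requests, so nothing is sent over the network.

```python
import urllib.request

# Hypothetical endpoint; the slides do not give the real data bridge URL.
BRIDGE_URL = "https://databridge.example.org/data"


def build_put(name, payload, auth_header):
    """Build an authenticated HTTP PUT request that uploads payload as name."""
    return urllib.request.Request(
        "{}/{}".format(BRIDGE_URL, name),
        data=payload,
        method="PUT",
        headers={
            "Authorization": auth_header,
            "Content-Type": "application/octet-stream",
        },
    )


def build_get(name, auth_header):
    """Build an authenticated HTTP GET request that downloads name."""
    return urllib.request.Request(
        "{}/{}".format(BRIDGE_URL, name),
        method="GET",
        headers={"Authorization": auth_header},
    )


if __name__ == "__main__":
    request = build_put("output/job1.out", b"result bytes", "Basic dXNlcjpwYXNz")
    print(request.get_method(), request.full_url)
```

Passing either request to urllib.request.urlopen would perform the transfer against a live server.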

Page 8: Dynamic HTTP Federations

Dynafed implements federated storage over HTTP
  In testing in LHCb and Canada (ATLAS)
  Federates WebDAV- or S3-enabled storage systems
  Apache front end

Can be used as a data bridge
  S3 storage backend(s)
  Acts as a security gateway between X509 or BOINC Auth clients and the storage

Clients are then redirected directly to the storage
  Great scalability potential

Global system, smart replica selection (availability, proximity)

http://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds
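The idea of a gateway that authenticates the client and then redirects it to the storage with a signed URL can be sketched generically. This is not Dynafed's implementation and not the S3 signing scheme: it is a simplified HMAC-based illustration, and SECRET, signed_redirect, and is_valid are all hypothetical names. The gateway signs the method, path, and expiry time; the storage side recomputes the signature before serving the request.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical key shared between the gateway and the storage backend.
SECRET = b"shared-secret"


def _signature(method, path, expires):
    """HMAC-SHA256 over the method, path, and expiry timestamp."""
    message = "{}\n{}\n{}".format(method, path, expires).encode("utf-8")
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def signed_redirect(storage_base, method, path, now, ttl=300):
    """Return the storage URL an authenticated client is redirected to."""
    expires = int(now) + ttl
    query = urlencode(
        {"expires": expires, "signature": _signature(method, path, expires)}
    )
    return "{}{}?{}".format(storage_base, path, query)


def is_valid(method, path, expires, signature, now):
    """Storage-side check: the URL is unexpired and the signature matches."""
    if int(now) > int(expires):
        return False
    expected = _signature(method, path, int(expires))
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    print(signed_redirect("https://s3.example.org", "GET", "/input/job1", now=1000))
```

Because the signature covers the method and the expiry, a URL issued for a GET cannot be replayed as a PUT or used after it expires, which is what lets clients talk to the storage directly.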

Page 9: The Data Bridge

[Diagram labels: Apache, ssl, mysql, DynaFed, FTS, S3, Grid, WMS, BOINC User, PUT/GET, HTTP redirect & sign]

Page 10: Message Queue

The messaging service does not support BOINC authentication
  Not clear if it is possible or worthwhile to provide this functionality

Standard Apache web server approach
  mod_auth_mysql to validate the BOINC user’s credential
  mod_auth_ssl to validate the WMS X509 credential

Two simple CGI scripts
  put-job.cgi
  get-job.cgi

Simple file-based queue
  python-dirq

Job descriptions from the WM
  Supports arbitrary file types
  Garbage in, garbage out

Extensible

[Diagram labels: Web Server, dirq, put, get]
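The file-based queue behind the two CGI scripts can be illustrated with a small directory queue. This sketch uses only the standard library and is not the python-dirq API; the class DirQueue and its directory layout are assumptions made for illustration. Each message is an opaque file, which matches the "arbitrary file types" point, and an atomic rename is used both to publish a message and to claim it.

```python
import os
import tempfile
import time


class DirQueue:
    """Minimal directory-backed FIFO queue (illustrative, not python-dirq)."""

    def __init__(self, root):
        self.tmp = os.path.join(root, "tmp")
        self.new = os.path.join(root, "new")
        self.claimed = os.path.join(root, "claimed")
        for path in (self.tmp, self.new, self.claimed):
            os.makedirs(path, exist_ok=True)
        self._counter = 0

    def put(self, data):
        """Store data as one message; sortable names keep FIFO order."""
        self._counter += 1
        name = "{:020d}-{:06d}-{}".format(
            time.time_ns(), self._counter, os.getpid()
        )
        tmp_path = os.path.join(self.tmp, name)
        with open(tmp_path, "wb") as handle:
            handle.write(data)
        # Rename publishes the complete file in one step.
        os.rename(tmp_path, os.path.join(self.new, name))
        return name

    def get(self):
        """Claim and return the oldest message, or None if the queue is empty."""
        for name in sorted(os.listdir(self.new)):
            claimed_path = os.path.join(self.claimed, name)
            try:
                os.rename(os.path.join(self.new, name), claimed_path)
            except FileNotFoundError:
                continue  # another consumer claimed this message first
            with open(claimed_path, "rb") as handle:
                data = handle.read()
            os.remove(claimed_path)
            return data
        return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        queue = DirQueue(root)
        queue.put(b"job description 1")
        print(queue.get())
```

In this picture the put script would call put with the request body and the get script would return the result of get, with no interpretation of the payload in between.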

Page 11: Implementation

[Diagram labels: Workload Manager, Message Queue, Data Bridge, VO (X509/VOMS), Infrastructure, Volunteer, BOINC Auth, VM, Agent, Job Wrapper, GET, PUT, Plugin, FTS, S3, Grid]

Page 12: Adoption

Building upon vLHC@home and the data bridge as a platform

Requires:
  WM to POST the job description to the data bridge message queue
  Stage input data to the data bridge’s input bucket (if needed)
  CernVM3 image
    Contextualized, including CVMFS configuration
    Credentials read from /dev/fd0
      BOINC_ID and BOINC_AUTHENTICATOR
  Job Agent
    GET the job description
    GET the input from the data bridge
    Run the job
    PUT the output on the data bridge
  Read data from the data bridge’s output bucket
    Similar to HLT
    Filter bad data, etc.
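The Job Agent steps listed above can be sketched as a single cycle. This is an illustration only: run_job_cycle and the job dictionary keys "input" and "output" are hypothetical, and the transport is passed in as plain callables so the logic stays independent of how the GET and PUT calls are actually made.

```python
def run_job_cycle(get_job, get_input, run_job, put_output):
    """Run one GET / GET / run / PUT cycle.

    Returns True if a job was processed and False if no job was available.
    """
    job = get_job()  # GET the job description
    if job is None:
        return False  # nothing queued
    # GET the input only when the job names one ("if needed").
    input_data = get_input(job["input"]) if job.get("input") else None
    output_data = run_job(job, input_data)  # run the job
    put_output(job["output"], output_data)  # PUT the output
    return True


if __name__ == "__main__":
    # In-memory stand-ins for the message queue and the data bridge buckets.
    queue = [{"input": "in/1", "output": "out/1"}]
    store = {"in/1": b"abc"}
    processed = run_job_cycle(
        get_job=lambda: queue.pop(0) if queue else None,
        get_input=store.get,
        run_job=lambda job, data: data.upper(),
        put_output=store.__setitem__,
    )
    print(processed, store["out/1"])
```

A real agent would repeat this cycle inside the VM, which is how restarting the VM for every job is avoided.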

Page 13: Rollout Plans

CMS@Home
  Pioneered the adoption of the data bridge
  Will hopefully enter the beta testing phase soon

Test4Theory
  Plan to migrate to the data bridge within the next 2 months
  To address CoPilot support issues

Beauty@Home
  Currently integrating the data bridge
  Will solve their X509 credential distribution issue
  Open the project up to the public

Page 14: Summary

The Data Bridge spans auth domains
  Grid and volunteer computing

Reuses the HTTP federation component for S3
  Added BOINC authentication

A simple message delivery function
  For the job description

Just provide an image along with a job agent
  And interact with the data bridge

Towards a platform for volunteer computing