LHC requirements for GRID middleware 

F. Carminati, P. Cerello, C. Grandi, O. Smirnova, J. Templon, E. Van Herwijnen
CHEP 2003, La Jolla, March 24-28, 2003

CHEP 2003, La Jolla, March 27, 2003

Why an HEP Common Application Layer (HEPCAL)?

• EDG/WP8 started gathering LHC requirements in early 2002
  • These were judged “vastly divergent” by EDG MW developers
  • And indeed they looked very different
• The LCG commissioned an RTAG on HEP Common Use Cases
  • Review plans of GRID integration in the experiments
  • Describe high level common GRID use cases for LHC experiments
  • Describe experiment specific use cases
  • Derive a set of common requirements for GRID MW
• RTAG delivered after four person-months of work
  • Four 2.5-day meetings


What we want from a GRID

[Figure: layered diagram, dated March 9, 2001. Labels: OS & Net services; Bag of Services (GLOBUS); DataGRID middleware (PPDG, GriPhyN, DataGRID); HEP VO common application layer, alongside Earth Obs. and Biology (WP9, WP10); Specific application layer (ALICE, ATLAS, CMS, LHCb).]


What we have

[Figure: layered diagram. Labels: OS & Net services; Bag of Services (GLOBUS); Middleware (WP1, WP2, WP3, WP4, WP5); Semantic gap; Specific application layer (ALICE, ATLAS, CMS, LHCb).]


How to proceed

[Figure: the four experiments (CMS, ATLAS, ALICE, LHCb), drawn twice, with the label “Core common use case”.]


A proposal

[Figure: layered diagram with annotations. Labels: OS & Net services; Bag of Services (GLOBUS); Middleware (WP1, WP2, WP3, WP4, WP5); DataGRID middleware (WP1, WP2, WP3, WP4, WP5); Common use cases; VO common application layer; Specific application layer (ALICE, ATLAS, CMS, LHCb). Annotations: “If we manage to define” and “It will be easier for them to arrive at”.]


Why is this important?

• Experiments want to work on common LCG projects
  • We need a common set of requirements / use cases to define common deliverables
• Several bodies (e.g. HICB, GLUE, LCG, MW projects…) expect clear requirements
  • Much more effective to provide a common set of use cases instead of four competing ones
• The different GRID MW activities risk diverging
  • Common use cases could help them to develop coherent solutions
  • Or ideally complementary elements


Rules of the game

• As much as you may like Harry Potter, he is not a good excuse!
• If you cannot explain it to your mother-in-law, you did not understand it yourself
• If your only argument is “why not” or “we need it”, go back and think again
• Say what you want, not how you think it must be done -- STOP short of architecture


Files, DataSets and Catalogues

• Two entities
  • Catalogue: an updateable and transactional collection of data
  • Dataset: a WORM (write once, read many) collection of data
    • Atomic entities implemented as one or more files
    • Live forever on the Grid unless explicitly deleted
• Datasets have a forever VO-unique logical dataset name (LDN)
  • Can associate a default access protocol to a dataset
  • A DMS manages the association between LDN and PDN
  • DS can reference other DS (recursivity, longrefs or VDS)
• Files of a DS are opened via POSIX calls or remote access protocols
• The GRID acts at the DS level, applications map objects to DS
  • GRID and application persistency collaborate in the navigation
• Virtual DS are an extension of the DS
  • The GRID knows how to produce it: algorithm, needed software and DS
  • Need a method to calculate creation cost of physical copies
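The association between LDN and PDN that the DMS manages can be pictured with a toy catalogue. What follows is only a sketch under stated assumptions: PDN is taken to mean a physical dataset name, the catalogue is a plain text file, and the function names (dms_register, dms_add_replica, dms_resolve), the DMS_CATALOGUE variable and the example.org URLs are all hypothetical and not part of HEPCAL or of any middleware.

```shell
# Toy DMS catalogue (hypothetical): one "LDN PDN" pair per line in a text file.
DMS_CATALOGUE=${DMS_CATALOGUE:-$(mktemp)}

# Succeeds if the LDN is already present in the catalogue.
dms_has() {
    awk -v ldn="$1" '$1 == ldn { found = 1 } END { exit !found }' "$DMS_CATALOGUE"
}

# Register the first physical copy of a dataset. The LDN is unique and the
# dataset is write-once, so registering an existing LDN is refused.
dms_register() {
    if dms_has "$1"; then
        echo "error: LDN '$1' already registered" >&2
        return 1
    fi
    printf '%s %s\n' "$1" "$2" >> "$DMS_CATALOGUE"
}

# Record a further physical copy of an existing dataset.
dms_add_replica() {
    dms_has "$1" || { echo "error: unknown LDN '$1'" >&2; return 1; }
    printf '%s %s\n' "$1" "$2" >> "$DMS_CATALOGUE"
}

# Print every PDN recorded for an LDN, one per line.
dms_resolve() {
    awk -v ldn="$1" '$1 == ldn { print $2 }' "$DMS_CATALOGUE"
}

# Usage
dms_register    lhcb/run1/raw gsiftp://se1.example.org/data/raw
dms_add_replica lhcb/run1/raw gsiftp://se2.example.org/data/raw
dms_resolve     lhcb/run1/raw    # prints both PDNs, one per line
```

The logical name stays fixed while physical copies come and go, which is the property the slide asks of the DMS.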


Catalogues

• Collection of files that can be updated
  • Must be fully transactional
• Contain information about objects, but not the objects themselves
  • The Replica Catalogue is an example
• The GRID implements the catalogues, no assumption on technology
  • Replication, consistency…
• Grid-managed catalogues
  • User inserts/deletes information mostly indirectly and cannot create/delete the catalogues
  • DS metadata (can have a user defined part), Jobs, Software, Grid users
• User-defined catalogues
  • Managed by the user via GRID facilities
  • Identified by a location-independent “logical name”
  • More discussion needed (replication…)
  • Only very basic use cases for user-defined catalogues
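One small part of the transactional requirement, namely that an update either happens completely or not at all, can be illustrated on a single file. The sketch below assumes a catalogue stored as a text file and a hypothetical helper named catalogue_update. It writes the new content to a temporary file and renames it over the original only on success. It does not attempt locking or concurrent access, so it falls well short of a fully transactional catalogue.

```shell
# Hypothetical helper: apply a filter command to a catalogue file so that the
# update is all-or-nothing. The new content goes to a temporary file next to
# the catalogue and replaces it only if the filter command succeeds.
catalogue_update() {     # usage: catalogue_update FILE COMMAND [ARGS...]
    cat_file=$1
    shift
    tmp=$(mktemp "$cat_file.XXXXXX") || return 1
    if "$@" < "$cat_file" > "$tmp"; then
        mv "$tmp" "$cat_file"      # rename is atomic within one filesystem
    else
        rm -f "$tmp"               # failed update: original left untouched
        return 1
    fi
}

# Usage: drop one entry from a small catalogue of "name state" lines.
demo_cat=$(mktemp)
printf 'ds1 ready\nds2 obsolete\n' > "$demo_cat"
catalogue_update "$demo_cat" sed '/ obsolete$/d'
cat "$demo_cat"                    # prints: ds1 ready
```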


Jobs

• Single invocation of the Grid submission use case
  • At least input data, executable(s) to run and output data
• Organized jobs -- optimisation feasible
• Chaotic jobs -- optimisation hard
  • May or may not be possible to specify the datasets upfront
• Interactivity not treated
• Jobs are combined into “chains”, “workflows”, or “pipelines”
• Embarrassing parallelism, but job splitting is an open problem
  • Without user assistance (DAG?)
  • With user assistance (plug-in)
  • Process spawning under WMS control, results communicated back and joined
• Three classes of GRID job identifiers
  • Basic, composite and production
• The GRID provides a job catalogue indexed by job ID
  • Can be queried and users may add information to it
  • The job ID is part of the metadata of the DS created by the job
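Job splitting is listed above as an open problem, so the following is only the most naive form of splitting with user assistance: the user supplies the list of input datasets and the number of sub-jobs. The helper name split_job and the round-robin policy are assumptions made for this illustration, not something HEPCAL prescribes.

```shell
# Hypothetical helper: read one LDN per line on stdin and assign each to one
# of N sub-jobs in round-robin order. Output lines are "<sub-job index> <LDN>".
split_job() {            # usage: split_job N
    [ "$1" -gt 0 ] 2>/dev/null || { echo "error: N must be a positive integer" >&2; return 1; }
    awk -v n="$1" '{ idx = (NR - 1) % n; print idx, $0 }'
}

# Usage: spread three input datasets over two sub-jobs.
printf 'run1/raw\nrun2/raw\nrun3/raw\n' | split_job 2
# prints:
# 0 run1/raw
# 1 run2/raw
# 0 run3/raw
```

Because each sub-job only reads its own datasets, the results can be produced independently and joined afterwards, which is the embarrassing parallelism the slide refers to.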


Data navigation & access

• An event is composed of objects contained in one or more DS
  • Unique Event Identifier (EvtId) present in all derived products
• DS are located by queries to the DMS catalogue returning LDNs
  • “give me all DS with events between 22/11/2007 and 18/07/2008 with XYZ trigger”
  • Read/write, indexed by the LDN (some keys are reserved for the GRID)
• Users access/modify DS meta-information in the catalogue
  • Predefined attributes have meaning that is potentially different for each VO
  • The schema of the catalogue is defined at the VO creation
  • Users can add and remove attributes
• Condition data options
  • Simple DS (snapshots of DBs), GRID catalogues or read/write files on the GRID (outside HEPCAL)
• Weak confidentiality requirements
  • Control unauthorised modification or deletion
  • Read-only access subject to experiment policy, users may want private GRID DS
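The example query quoted above (all DS with events in a date range and a given trigger) can be sketched against a toy metadata catalogue. The catalogue layout, the function name query_datasets and the dataset names are hypothetical. Dates are written in ISO form here, rather than the day/month/year form of the slide, so that plain string comparison orders them chronologically.

```shell
# Hypothetical metadata catalogue: "<LDN> <first date> <last date> <trigger>"
# per line, dates as YYYY-MM-DD so that string order equals date order.
# Prints the LDNs of datasets with the given trigger whose date range
# overlaps the requested interval.
query_datasets() {       # usage: query_datasets CATALOGUE FROM TO TRIGGER
    awk -v from="$2" -v to="$3" -v trig="$4" \
        '$4 == trig && $3 >= from && $2 <= to { print $1 }' "$1"
}

# Usage
meta_cat=$(mktemp)
cat > "$meta_cat" <<'EOF'
cms/2007/a 2007-10-01 2007-11-30 XYZ
cms/2008/b 2008-03-01 2008-05-15 XYZ
cms/2008/c 2008-03-01 2008-05-15 ABC
cms/2008/d 2008-08-01 2008-09-01 XYZ
EOF
query_datasets "$meta_cat" 2007-11-22 2008-07-18 XYZ
# prints:
# cms/2007/a
# cms/2008/b
```

The query returns logical names only; turning an LDN into a physical copy remains the job of the DMS.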


Use cases

• Presented in rigorous (?) tabular description
  • Easy to translate to a formal language such as UML
• To be implemented by a “single call”
  • From the command shell, C++ API or Web portal

USE CASE: OBTAIN GRID AUTHORISATION

Identifier: UC#gridauth
Goals in Context: Obtain authorisation to access the Grid
Actors: User
Triggers: Need to access the Grid
Included Use Cases:
Specialised Use Cases:
Pre-conditions: The user has either a valid account on a computer connected to the Grid, or has access via the Web to a server that can execute Grid commands on her behalf;
Post-conditions: User can perform a Grid login as a member of a VO;
Basic Flow:
  1 User submits a request for authorisation to use the Grid (either via a web interface or a command line)
  2 The access authority manager confirms his authorisation as a member of a VO;
  3 User receives the instructions and any necessary physical token;
  4 Following the instructions the user properly configures his personal workspace;
Devious Flow(s): Access authority manager refuses the request. Necessary configuration cannot be done according to instructions;
Importance and Frequency: Done when a Grid user wants to become member of a VO to have access to the Grid resources of that VO. In principle once per user and VO, but very high importance.
Additional Requirements:


Use cases

DS management use cases
• DS metadata update
• DS metadata access
• DS registration to the Grid
• VDS declaration
• VDS materialization
• DS upload
• User-defined catalogue creation
• Data set access
• Dataset transfer to non-Grid storage
• Dataset replica upload to the Grid
• Data set access cost evaluation
• Data set replication
• Physical data set instance deletion
• Data set deletion (complete)
• User defined catalogue deletion (complete)
• Data retrieval from remote Datasets
• Data set verification
• Data set browsing
• Browse condition database

General use cases
• Obtain Grid authorisation
• Ask for revocation of Grid authorisation
• Grid login
• Browse Grid resources

Job management use cases
• Job catalogue update
• Job catalogue query
• Job submission
• Job Output Access or Retrieval
• Error Recovery for Aborted or Failing Production Jobs
• Job Control
• Steer job submission
• Job resource estimation
• Job environment modification
• Job splitting
• Production job
• Analysis 1
• Data set transformation
• Job monitoring
• Simulation Job
• Experiment software development for the Grid


Use cases

• DMS grants access to a physical replica of a DS file
  • Direct access, local or SE replication, materialisation
  • The user gives an LDN and gets a file ID to pass to an open call
• A physical DS copy appears on an SE in four different ways
  • Uploading it to the Grid (first DS upload)
  • Copying it from another SE (DS replication)
  • Requesting a virtual dataset (VDS declaration and materialization)
  • Importing directly from local storage (DS import)
• The DMS tracks DS access for monitoring and optimisation
• Jobs are submitted to the Grid WMS
  • Program to be run, (optional) input and output DS, environment requirements (operating system, installed software) and needed resources
  • The user must be able to override any choice of the WMS
  • Dynamic job directory reclaimed when the files are safely handled
  • The user stores information in the job catalogue at submission, running time and after run
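How the DMS might pick the physical replica it hands back for an LDN can be sketched as follows. The selection policy (prefer a copy on a given SE, otherwise fall back to the first candidate), the helper name pick_replica and the example.org URLs are assumptions made for illustration; the slides do not specify a policy.

```shell
# Hypothetical helper: read candidate physical replicas (one URL per line) on
# stdin and print the one hosted on the given SE, or the first candidate if
# no replica is hosted there. Prints nothing for an empty candidate list.
pick_replica() {         # usage: pick_replica SE_HOST
    awk -v se="$1" '
        NR == 1 { first = $0 }
        index($0, "://" se "/") > 0 { print; found = 1; exit }
        END { if (!found && NR > 0) print first }'
}

# Usage
printf 'gsiftp://se1.example.org/raw\ngsiftp://se2.example.org/raw\n' |
    pick_replica se2.example.org
# prints: gsiftp://se2.example.org/raw
```

The returned string plays the role of the file ID that the user then passes to an open call.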


VO management use cases

• Not clear how privileges are shared for VO management
  • Grid operation centre, local system managers etc.
• Actions, which may evolve into use cases
  • Configuring the VO
    • DS metadata catalogue (either initially or reconfiguring)
    • Job catalogue (either initially or reconfiguring)
    • User profile (if this is possible at all on a VO basis)
    • Adding or removing VO elements, e.g. computing elements, storage elements, DMS and WMS and the like
    • VO elements, including quotas, privileges etc
  • Managing Users
    • Add and remove users to/from the VO
    • Modify the user or group information, including privileges, quotas, priorities and authorisations for the VO
  • VO wide resource reservation
    • Release unused reserved resources
    • Associate reserved resources with a particular job
  • VO wide resource allocation to users
  • Condition publishing
  • Software publishing


Answers to HEPCAL

• Very detailed answer from EDG
  • Several use cases declared addressed by the project
  • All virtual-data use cases, Error recovery for jobs use case and Experiment software publishing not on the map
• Less detailed answer from PPDG/US
  • PPDG more advanced with virtual data functionality
  • Some of HEPCAL left to experiment layers
    • Nice to have experiments agree on one implementation
    • May be just a matter of how people are counted: US projects give people to experiments, obviously things are done in experiments
  • “Other” US Grid projects mentioned, but less detail
  • Response hard to evaluate, since it hasn’t undergone review by people using middleware


Comments to the answers

• Mostly “paper” analysis
• Some implementations achieved the functionality, but
  • Taken many more steps than in HEPCAL
  • Experiment layers must provide the glue, maintain additional information to assist the MW or track interface or behavioural changes for all components
• Often didn’t implement use case, implemented several “more elemental” use cases
• More detail asked
  • Our big effort was NOT to give too many details
• Very difficult to establish a dialectic procedure


Example: Upload Grid Dataset

HEPCAL request

    edg-dsupload -s source_file -l LDN -d targetSE

Implementation

    # Grid information service (queried with ldapsearch below)
    IS_HOST=lxshare0382.cern.ch
    IS_PORT=2170

    # Parse options: source file, LDN, target SE and VO name
    while getopts ":s:l:d:v:" opt; do
        case $opt in
            s ) SOURCEFILE=$OPTARG ;;
            l ) LDN=$OPTARG ;;
            d ) TARGSE=$OPTARG ;;
            v ) VONAME=$OPTARG ;;
        esac
    done

    GDMP_CONFIG_FILE=/opt/edg/etc/$VONAME/gdmp.conf ; export GDMP_CONFIG_FILE

    # Look up the VO's storage path on the target SE
    destpath=$(ldapsearch -h "$IS_HOST" -p "$IS_PORT" -x -b "seId=$TARGSE,o=grid" | \
        gawk -F : '/^SEvo.*'$VONAME'/ { print $3 }')

    # Turn a bare file name into an absolute path
    if [ "$(dirname "$SOURCEFILE")" = "." ] ; then
        SOURCEFILE=$(pwd)/$SOURCEFILE
    fi

    # Copy the file to the SE, register it locally, then publish the catalogue
    globus-url-copy "file://$SOURCEFILE" "gsiftp://$TARGSE/$destpath/$LDN"
    gdmp_register_local_file -S "$TARGSE" -R -p "$destpath/$LDN"
    sleep 10
    gdmp_publish_catalogue -S "$TARGSE" -C

The implementation mixes four kinds of content:

• Grid information provided by user
• Glue code supplied by user
• “Other” middleware called by user
• Middleware specific to this use case


GAG

• A Grid Application Group has been set up by LCG to follow up on HEPCAL
  • Semi permanent and reporting to LCG
  • Both US and EU representatives from experiments and GRID projects
• HEPCAL II already scheduled before Summer
• Discussion on the production of test jobs / code fragments / examples to validate against use cases


Conclusion

• Very interesting and productive work
  • 320 google hits (with moderate filtering)!
• It prompted a constructive dialogue with MW projects
  • And between US and EU projects
• It provides a solid base to develop a GRID architecture
  • Largely used by EDG ATF
• It proves that common meaningful requirements can be produced by different experiments
• The dialogue with the MW projects has to continue, but it is very labor intensive