LHC requirements for GRID middleware
F. Carminati, P. Cerello, C. Grandi, O. Smirnova, J. Templon, E. Van Herwijnen
CHEP 2003, La Jolla, March 24-28, 2003 (slides dated March 27, 2003)
Why an HEP Common Application Layer (HEPCAL)?
EDG/WP8 started gathering LHC requirements in early 2002
These were judged “vastly divergent” by EDG MW developers
And indeed they looked very different
The LCG commissioned an RTAG on HEP Common Use Cases
  • Review plans of GRID integration in the experiments
  • Describe high-level common GRID use cases for LHC experiments
  • Describe experiment-specific use cases
  • Derive a set of common requirements for GRID MW
RTAG delivered after four person-months of work
  • Four 2.5-day meetings
What we want from a GRID
OS & Net services
Bag of Services (GLOBUS)
DataGRID middleware (PPDG, GriPhyN, DataGRID)
HEP VO common application layer
Earth Obs. Biology
ALICE ATLAS CMS LHCb
Specific application layer
WP9 WP10
March 9, 2001
What we have
OS & Net services
Bag of Services (GLOBUS)
Middleware
ALICE ATLAS CMS LHCb
Specific application layer
WP1 WP2 WP3 WP4 WP5
Semantic gap
How to proceed
CMS ATLAS ALICE LHCb
Core common use case
A proposal
OS & Net services
Bag of Services (GLOBUS)
Specific application layer
DataGRID middleware
WP1 WP2 WP3 WP4 WP5
Common use cases
VO common application layer
If we manage to define
Middleware
WP1 WP2 WP3 WP4 WP5
It will be easier for them to arrive at
ALICE ATLAS CMS LHCb
Why is this important?
Experiments want to work on common LCG projects
We need a common set of requirements / use cases to define common deliverables
Several bodies (e.g. HICB, GLUE, LCG, MW projects…) expect clear requirements
Much more effective to provide a common set of use cases instead of four competing ones
The different GRID MW activities risk diverging
  • Common use cases could help them develop coherent solutions
  • Or, ideally, complementary elements
Rules of the game
As much as you may like Harry Potter, he is not a good excuse!
If you cannot explain it to your mother-in-law, you did not understand it yourself
If your only argument is “why not” or “we need it”, go back and think again
Say what you want, not how you think it must be done -- STOP short of architecture
Files, DataSets and Catalogues
Two entities
  • Catalogue: an updateable and transactional collection of data
  • Dataset (DS): a WORM (write once, read many) collection of data
Datasets are atomic entities implemented as one or more files
  • They live forever on the Grid unless explicitly deleted
Datasets have a permanent, VO-unique logical dataset name (LDN)
  • A default access protocol can be associated with a dataset
  • A DMS manages the association between the LDN and the PDN
  • A DS can reference other DS (recursion, long references or VDS)
Files of a DS are opened via POSIX calls or remote access protocols
The GRID acts at the DS level; applications map objects to DS
  • GRID and application persistency collaborate in the navigation
Virtual DS (VDS) are an extension of the DS
  • The GRID knows how to produce them: the algorithm, the needed software and the input DS
  • A method is needed to calculate the cost of creating physical copies
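As an illustration of the LDN/PDN association, here is a minimal shell sketch, assuming a hypothetical flat-file DMS. The file layout and the functions `ds_declare`, `ds_add_replica` and `ds_lookup` are invented for this example and are not EDG or HEPCAL interfaces. It only models naming: an LDN is declared once and can then gain any number of physical copies; WORM content, references between DS and VDS are not modelled.

```shell
#!/bin/sh
# Hypothetical flat-file DMS: maps logical dataset names (LDN) to the
# physical copies (PDN) of each dataset. File layout and function names
# are illustrative assumptions, not EDG or HEPCAL interfaces.
# LDNs and PDNs must not contain whitespace.

DMS_DIR=${DMS_DIR:-$(mktemp -d)}
LDN_FILE="$DMS_DIR/ldns"          # one declared LDN per line
REPLICA_FILE="$DMS_DIR/replicas"  # one "LDN PDN" pair per line
touch "$LDN_FILE" "$REPLICA_FILE"

# Declare a new LDN. LDNs are VO-unique forever, so a name that
# already exists is refused rather than overwritten.
ds_declare() {
    if grep -qxF "$1" "$LDN_FILE"; then
        echo "ds_declare: LDN already exists: $1" >&2
        return 1
    fi
    printf '%s\n' "$1" >> "$LDN_FILE"
}

# Record one more physical copy of an already declared LDN.
ds_add_replica() {
    if ! grep -qxF "$1" "$LDN_FILE"; then
        echo "ds_add_replica: unknown LDN: $1" >&2
        return 1
    fi
    printf '%s %s\n' "$1" "$2" >> "$REPLICA_FILE"
}

# Print every PDN registered for an LDN, in registration order.
ds_lookup() {
    awk -v ldn="$1" '$1 == ldn { print $2 }' "$REPLICA_FILE"
}
```

For example, `ds_declare ldn:run42` followed by `ds_add_replica ldn:run42 gsiftp://se1.example.org/data/run42` makes `ds_lookup ldn:run42` print that PDN, while a second `ds_declare ldn:run42` is refused. Keeping declared names separate from replicas mirrors the split between the permanent LDN and its replaceable physical copies.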
Catalogues
Collections of files that can be updated
  • Must be fully transactional
  • Contain information about objects, but not the objects themselves
  • The Replica Catalogue is an example
The GRID implements the catalogues; no assumption is made on the technology
  • Replication, consistency…
Grid-managed catalogues
  • The user inserts/deletes information mostly indirectly and cannot create/delete them
  • Examples: DS metadata (can have a user-defined part), jobs, software, Grid users
User-defined catalogues
  • Managed by the user via GRID facilities
  • Identified by a location-independent “logical name”
  • More discussion needed (replication…)
  • Only very basic use cases for user-defined catalogues
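The requirement that catalogues be fully transactional can be made concrete with a small sketch. It assumes a hypothetical catalogue stored as a `KEY VALUE` text file and a made-up function `catalogue_set`. It shows only two ingredients of a transaction, serialising writers with an atomic `mkdir` lock and publishing the new version with an atomic rename, and says nothing about how the GRID would actually implement catalogues (no assumption on technology is made).

```shell
#!/bin/sh
# Hypothetical sketch: all-or-nothing update of a "KEY VALUE" catalogue
# file. Only atomic replacement and writer serialisation are shown;
# keys must not contain whitespace. Not an EDG/LCG interface.

# Set KEY to VALUE in catalogue FILE, replacing any previous entry.
catalogue_set() {
    file=$1 key=$2 value=$3
    lock="$file.lock"
    tmp="$file.tmp.$$"
    # mkdir is atomic, so at most one writer holds the lock at a time.
    until mkdir "$lock" 2>/dev/null; do sleep 1; done
    # Write the new version beside the old one (same filesystem) ...
    { if [ -f "$file" ]; then awk -v k="$key" '$1 != k' "$file"; fi
      printf '%s %s\n' "$key" "$value"; } > "$tmp" &&
        mv "$tmp" "$file"   # ... then rename: readers see old or new, never a mix
    status=$?
    rm -f "$tmp"
    rmdir "$lock"
    return "$status"
}
```

A writer killed while holding the lock leaves a stale `FILE.lock` directory that has to be removed by hand, and the loop waits forever if the lock directory cannot be created; a real catalogue service would need recovery, replication and multi-record transactions on top of this.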
Jobs
Single invocation of the Grid submission use case
  • At least input data, executable(s) to run and output data
  • Organized jobs: optimisation feasible
  • Chaotic jobs: optimisation hard
  • It may or may not be possible to specify the datasets upfront
Interactivity not treated
Jobs are combined into “chains”, “workflows”, or “pipelines”
Embarrassing parallelism, but job splitting is an open problem
  • Without user assistance (DAG?)
  • With user assistance (plug-in)
  • Process spawning under WMS control, results communicated back and joined
Three classes of GRID job identifiers
  • Basic, composite and production
The GRID provides a job catalogue indexed by job ID
  • It can be queried, and users may add information to it
  • The job ID is part of the metadata of the DS created by the job
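A job catalogue indexed by job ID, to which users may add information, could look like the following sketch. The `JOBID KEY VALUE` record format and the functions `job_cat_add` and `job_cat_query` are assumptions made for illustration only; HEPCAL does not prescribe a format.

```shell
#!/bin/sh
# Hypothetical job catalogue: a flat file of "JOBID KEY VALUE" records.
# JOBID and KEY must not contain spaces; VALUE may.
# Names and format are illustrative, not an EDG/LCG interface.

JOB_CAT=${JOB_CAT:-$(mktemp)}

# Users (or the WMS) may attach information to a job at any time:
# at submission, while it runs, or after it has finished.
job_cat_add() {
    printf '%s %s %s\n' "$1" "$2" "$3" >> "$JOB_CAT"
}

# Print every value recorded under KEY for JOBID, oldest first.
job_cat_query() {
    awk -v j="$1" -v k="$2" '$1 == j && $2 == k {
        sub("^[^ ]+ [^ ]+ ", "")   # strip JOBID and KEY, keep the value
        print
    }' "$JOB_CAT"
}
```

Appending records instead of rewriting them keeps the history of each job (for instance every status change) queryable; the same job ID can then be stored as an attribute in the metadata of each DS the job creates.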
Data navigation & access
An event is composed of objects contained in one or more DS
  • A unique event identifier (EvtId) is present in all derived products
DS are located by queries to the DMS catalogue that return LDNs
  • e.g. “give me all DS with events between 22/11/2007 and 18/07/2008 with XYZ trigger”
  • Read/write, indexed by the LDN (some keys are reserved for the GRID)
Users access/modify DS meta-information in the catalogue
  • Predefined attributes have a meaning that is potentially different for each VO
  • The schema of the catalogue is defined at VO creation
  • Users can add and remove attributes
Condition data options
  • Simple DS (snapshots of DBs), GRID catalogues or read/write files on the GRID (outside HEPCAL)
Weak confidentiality requirements
  • Control unauthorised modification or deletion
  • Read-only access subject to experiment policy; users may want private GRID DS
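The example query (“all DS with events between 22/11/2007 and 18/07/2008 with XYZ trigger”) can be sketched against a hypothetical metadata file with one line per DS: `LDN FIRST_EVENT_DATE LAST_EVENT_DATE TRIGGER`. The column layout, the ISO date storage and the functions `to_iso` and `ds_query` are assumptions; a DS is selected when its event-date range overlaps the requested interval, which is one possible reading of “between”.

```shell
#!/bin/sh
# Hypothetical DS metadata query. The catalogue is assumed to be a
# whitespace-separated file with one DS per line:
#   LDN  FIRST_EVENT_DATE  LAST_EVENT_DATE  TRIGGER
# with dates stored as YYYY-MM-DD so that string comparison orders them.

# Convert DD/MM/YYYY (the form used in the example query) to YYYY-MM-DD.
to_iso() {
    echo "$1" | awk -F/ '{ printf "%s-%s-%s\n", $3, $2, $1 }'
}

# Print the LDN of every DS whose event-date range overlaps [FROM, TO]
# and whose trigger equals TRIGGER.
# Usage: ds_query CATALOGUE FROM TO TRIGGER
ds_query() {
    catalogue=$1 from=$(to_iso "$2") to=$(to_iso "$3") trigger=$4
    awk -v f="$from" -v t="$to" -v trg="$trigger" \
        '$2 <= t && $3 >= f && $4 == trg { print $1 }' "$catalogue"
}
```

Only LDNs come back, as required: resolving them to physical copies is a separate DMS step. A real DMS catalogue would index these attributes rather than scan a file.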
Use cases
Presented in a rigorous (?) tabular description
  • Easy to translate into a formal language such as UML
To be implemented by a “single call”
  • From the command shell, C++ API or Web portal
USE CASE: OBTAIN GRID AUTHORISATION
Identifier: UC#gridauth
Goals in Context: Obtain authorisation to access the Grid
Actors: User
Triggers: Need to access the Grid
Included Use Cases:
Specialised Use Cases:
Pre-conditions: The user has either a valid account on a computer connected to the Grid, or has access via the Web to a server that can execute Grid commands on her behalf
Post-conditions: The user can perform a Grid login as a member of a VO
Basic Flow:
  1. The user submits a request for authorisation to use the Grid (either via a web interface or a command line)
  2. The access authority manager confirms the user's authorisation as a member of a VO
  3. The user receives the instructions and any necessary physical token
  4. Following the instructions, the user properly configures his personal workspace
Devious Flow(s): The access authority manager refuses the request; the necessary configuration cannot be done according to the instructions
Importance and Frequency: Done when a Grid user wants to become a member of a VO to have access to the Grid resources of that VO. In principle once per user and VO, but very high importance.
Additional Requirements:
Use cases
DS management use cases: DS metadata update; DS metadata access; DS registration to the Grid; VDS declaration; VDS materialization; DS upload; User-defined catalogue creation; Data set access; Dataset transfer to non-Grid storage; Dataset replica upload to the Grid; Data set access cost evaluation; Data set replication; Physical data set instance deletion; Data set deletion (complete); User-defined catalogue deletion (complete); Data retrieval from remote Datasets; Data set verification; Data set browsing; Browse condition database
General use cases: Obtain Grid authorisation; Ask for revocation of Grid authorisation; Grid login; Browse Grid resources
Job management use cases: Job catalogue update; Job catalogue query; Job submission; Job Output Access or Retrieval; Error Recovery for Aborted or Failing Production Jobs; Job Control; Steer job submission; Job resource estimation; Job environment modification; Job splitting; Production job; Analysis 1; Data set transformation; Job monitoring; Simulation Job; Experiment software development for the Grid
Use cases
DMS grants access to a physical replica of a DS file
  • Direct access, local or SE replication, materialisation
  • The user gives an LDN and gets a file ID to pass to an open call
A physical DS copy appears on an SE in four different ways
  • Uploading it to the Grid (first DS upload)
  • Copying it from another SE (DS replication)
  • Requesting a virtual dataset (VDS declaration and materialization)
  • Importing it directly from local storage (DS import)
The DMS tracks DS access for monitoring and optimisation
Jobs are submitted to the Grid WMS
  • Program to be run, (optional) input and output DS, environment requirements (operating system, installed software) and needed resources
  • The user must be able to override any choice of the WMS
  • The dynamic job directory is reclaimed once the files have been safely handled
  • The user stores information in the job catalogue at submission, at run time and after the run
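The step where the user gives an LDN and gets a file ID to pass to an open call can be sketched as a lookup that prefers a replica on the storage element close to the job. The replica table (`LDN SE_HOST PHYSICAL_FILE_ID`), the function `ds_resolve` and the fallback to the first registered replica are assumptions for illustration; cost-based selection, replication and VDS materialisation are not shown.

```shell
#!/bin/sh
# Hypothetical replica selection. Replica table format (assumed):
#   LDN  SE_HOST  PHYSICAL_FILE_ID
# Usage: ds_resolve TABLE LDN CLOSE_SE
# Prints the file ID on CLOSE_SE if there is one, otherwise the first
# registered replica; exits with status 1 if the LDN has no replica.
ds_resolve() {
    awk -v ldn="$2" -v se="$3" '
        $1 == ldn && first == "" { first = $3 }         # remember a fallback
        $1 == ldn && $2 == se    { print $3; found = 1; exit }
        END {
            if (found) exit 0
            if (first != "") { print first; exit 0 }
            exit 1                                      # no replica at all
        }' "$1"
}
```

`ds_resolve replicas.txt ldn:run42 se1.example.org` prints one identifier or fails, so a caller can decide whether to trigger replication or materialisation instead of opening a remote copy.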
VO management use cases
Not clear how privileges are shared for VO management
  • Grid operation centre, local system managers etc.
Actions, which may evolve into use cases
  • Configuring the VO
    • DS metadata catalogue (either initially or reconfiguring)
    • Job catalogue (either initially or reconfiguring)
    • User profile (if this is possible at all on a VO basis)
    • Adding or removing VO elements, e.g. computing elements, storage elements, DMS and WMS and the like
    • VO elements, including quotas, privileges etc.
  • Managing users
    • Add and remove users to/from the VO
    • Modify the user or group information, including privileges, quotas, priorities and authorisations for the VO
  • VO-wide resource reservation
    • Release unused reserved resources
    • Associate reserved resources with a particular job
  • VO-wide resource allocation to users
  • Condition publishing
  • Software publishing
Answers to HEPCAL
Very detailed answer from EDG
  • Several use cases declared as addressed by the project
  • All virtual-data use cases, the Error recovery for jobs use case and Experiment software publishing are not on the map
Less detailed answer from PPDG/US
  • PPDG is more advanced with virtual data functionality
  • Some of HEPCAL is left to experiment layers
  • It would be nice to have the experiments agree on one implementation
  • May be just a matter of how people are counted: US projects give people to experiments, so obviously things are done in the experiments
“Other” US Grid projects mentioned, but in less detail
  • Response hard to evaluate, since it has not undergone review by people using the middleware
Comments on the answers
Mostly “paper” analysis
Some implementations achieved the functionality, but
  • Taking many more steps than in HEPCAL
  • Experiment layers must provide the glue, maintain additional information to assist the MW, or track interface or behavioural changes for all components
Often the use case itself was not implemented; several “more elemental” use cases were implemented instead
More detail asked for
Our big effort was NOT to give too many details
Very difficult to establish a dialectic procedure
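Example: Upload Grid Dataset
HEPCAL request
edg-dsupload -s source_file -l LDN -d targetSE
Implementation
#!/bin/sh
# Information system host and port
IS_HOST=lxshare0382.cern.ch
IS_PORT=2170
usage() { echo "usage: $0 -s source_file -l LDN -d targetSE -v VO" >&2; exit 1; }
# Parse options: -s source file, -l LDN, -d target SE, -v VO name
while getopts ":s:l:d:v:" opt; do
  case $opt in
    s ) SOURCEFILE=$OPTARG ;;
    l ) LDN=$OPTARG ;;
    d ) TARGSE=$OPTARG ;;
    v ) VONAME=$OPTARG ;;
    * ) usage ;;
  esac
done
# All four options are required
[ -n "$SOURCEFILE" ] && [ -n "$LDN" ] && [ -n "$TARGSE" ] && [ -n "$VONAME" ] || usage
GDMP_CONFIG_FILE=/opt/edg/etc/$VONAME/gdmp.conf ; export GDMP_CONFIG_FILE
# Look up the destination path for this VO on the target SE
destpath=$(ldapsearch -h "$IS_HOST" -p "$IS_PORT" -x -b "seId=$TARGSE,o=grid" | \
  gawk -F : '/^SEvo.*'"$VONAME"'/ { print $3 }')
# file:// URLs need an absolute path
case $SOURCEFILE in
  /*) ;;
  *) SOURCEFILE=$(pwd)/$SOURCEFILE ;;
esac
# Copy the file to the SE; register and publish only if the copy succeeded
globus-url-copy "file://$SOURCEFILE" "gsiftp://$TARGSE/$destpath/$LDN" || exit 1
gdmp_register_local_file -S "$TARGSE" -R -p "$destpath/$LDN"
sleep 10
gdmp_publish_catalogue -S "$TARGSE" -C
•Grid information provided by user
•Glue code supplied by user
•“Other” middleware called by user
•Middleware specific to this use case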
GAG
A Grid Application Group (GAG) has been set up by LCG to follow up on HEPCAL
  • Semi-permanent and reporting to LCG
  • Both US and EU representatives from experiments and GRID projects
HEPCAL II already scheduled before summer
Discussion on the production of test jobs / code fragments / examples to validate against the use cases
Conclusion
Very interesting and productive work
  • 320 Google hits (with moderate filtering)!
It prompted a constructive dialogue with MW projects
And between US and EU projects
It provides a solid base to develop a GRID architecture
Largely used by EDG ATF
It proves that common meaningful requirements can be produced by different experiments
The dialogue with the MW projects has to continue, but it is very labor intensive