
Enabling Grids for E-sciencE
www.eu-egee.org

The gLite Workload Management System
Alessandro Maraschini
alessandro.maraschini@datamat.it

OGF20, Manchester, UK, 2007, May 10


Contents

• WMS
  – System overview, partners, tasks, components
• JDL
  – Language overview
  – Job types: single / compound
• News
  – New functionalities
  – Latest activities
• Future Plans
  – Future implementations & activities
  – WMS and WfMS
• Tests
  – Middleware testing activities & results


Introduction: gLite WMS

• Workload Management System (WMS)
  – Developed by the Italian and Czech clusters
  – Part of Joint Research Activity 1 (JRA1)
• Partners involved
  – INFN
  – Datamat
  – CESNET
• Provides distribution and management of tasks across the resources available on a Grid
  – Accepts a job execution request from a client
  – Finds appropriate resources to satisfy the job
  – Follows the job until completion


WMS Architecture: core components

• WMProxy
  – Accepts requests from the user
  – Performs authentication and authorization checks
  – Sets up the local file system
  – Forwards the request to the WM
• Workload Manager (WM)
  – Looks for an appropriate Computing Element (CE): the matchmaking operation
  – Forwards the request to CondorC
• Logging & Bookkeeping (LB)
  – Tracks jobs in terms of events gathered from the various gLite components
  – Processes the incoming events to give a higher-level view of the job states
  – Recently introduced LBProxy: a lightweight local LB service "dedicated" to WMS components, which asynchronously logs info to the actual (usually remote) LB


WMS Architecture overview

[Architecture diagram: the User Interface talks to WMProxy, which feeds the Workload Manager; the Workload Manager hands jobs to the Job Controller / CondorC / Log Monitor chain (CondorG) for submission to gLite and LCG CEs, and logs events through the LB Proxy to the LB Server.]


JDL: overview

• Job Description Language (JDL)
  – The gLite approach to request description
  – Allows the user to provide the information needed for job execution:
      Characteristics of the application
      Requirements/preferences about resources
      Customized hints for the gLite WMS on how to handle the application
• Supported Job Types
  – Single Jobs
  – Compound Jobs: Workflows (DAGs), Collections, Parametric Jobs
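As a sketch of the JDL syntax, a minimal description of a simple batch job might look like the following (the executable and sandbox file names are illustrative, not taken from this talk):

```
[
  Type = "Job";
  JobType = "Normal";
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};
  // Requirements/Rank are ClassAd expressions evaluated against CE attributes
  Requirements = other.GlueCEStateStatus == "Production";
  Rank = -other.GlueCEStateEstimatedResponseTime;
]
```

The Requirements and Rank expressions are the "preferences about resources" mentioned above: Requirements filters candidate CEs, while Rank orders the survivors.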


JDL: Single Types

• Single Jobs
  – Normal: a single, simple batch job with no peculiar requirements
  – MPICH: a parallel application to be run on the nodes of a cluster using the MPICH implementation of the Message Passing Interface (support for new MPI flavours is planned)
  – Interactive: a job whose standard streams are forwarded to the submitting client, which can interact with and steer the job execution by providing real-time input
• Previously Supported Job Types
  – No longer supported: Checkpointable Jobs, Partitionable Jobs
  – Deprecated due to lack of feedback from users: it seems they were not used at all
  – Focus moved to improving support for the job types actually in use
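For illustration, an MPICH job is declared via the JobType and a node count; a minimal sketch, assuming a hypothetical application binary shipped in the input sandbox:

```
[
  JobType = "MPICH";
  NodeNumber = 8;               // number of cluster nodes requested
  Executable = "my_mpi_app";    // hypothetical MPI binary
  InputSandbox = {"my_mpi_app"};
  StdOutput = "mpi.out";
  OutputSandbox = {"mpi.out"};
]
```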


JDL: Compound Jobs

• Definition
  – Aggregation of single/normal jobs
• Benefits
  – One-shot submission of (up to thousands of) jobs:
      Single call to the WMProxy server
      Single AuthN and AuthZ process
      Reduced submission time
  – Single identifier to manage all jobs (the "father" job): not an actual job, used to monitor the whole bunch
  – Sharing of files between jobs


JDL: Compound Types

• Compound Jobs: Workflows
  – Implemented as Directed Acyclic Graphs (DAGs)
  – A set of jobs where the input, output or execution of one or more jobs may depend on one or more other jobs
  – Dependencies represent time constraints: a child cannot start before all its parents have successfully completed

[Diagram: a "Father" node grouping nodes A through I, with dependency edges between parent and child nodes.]
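A minimal sketch of a two-node DAG in JDL (node executables are illustrative); the Dependencies attribute encodes the parent/child time constraints described above:

```
[
  Type = "dag";
  Nodes = [
    nodeA = [ description = [ Executable = "/bin/date"; ] ];
    nodeB = [ description = [ Executable = "/bin/hostname"; ] ];
  ];
  // nodeB cannot start before nodeA has successfully completed
  Dependencies = { { nodeA, nodeB } };
]
```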


JDL: Compound Types

• Compound Jobs: Parametric Jobs
  – A parameterized description of a job (parameter-sweep usage)
  – Automatically expanded on the WMS side
  – Generates a (possibly) huge number of (similar) jobs
  – No dependencies between nodes

[Diagram: a "Father" node with independent child nodes A through E.]
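A sketch of a parametric job: the _PARAM_ placeholder is substituted with each value of the sweep, so this description expands into 100 similar jobs on the WMS side (executable and file names are illustrative):

```
[
  JobType = "Parametric";
  Executable = "/bin/echo";
  Arguments = "_PARAM_";
  StdOutput = "out_PARAM_.txt";
  OutputSandbox = {"out_PARAM_.txt"};
  Parameters = 100;      // sweep upper bound
  ParameterStart = 1;    // first value
  ParameterStep = 1;     // increment
]
```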


JDL: Compound Types

• Compound Jobs: Collections
  – A set of possibly heterogeneous jobs that can be specified within a single JDL description
  – No dependencies among the specified jobs

[Diagram: a "Father" node with independent child nodes A through E.]
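A collection simply lists the (possibly heterogeneous) node descriptions inside one JDL; a minimal sketch with illustrative executables:

```
[
  Type = "collection";
  Nodes = {
    [ Executable = "/bin/date";     StdOutput = "a.out"; OutputSandbox = {"a.out"}; ],
    [ Executable = "/bin/hostname"; StdOutput = "b.out"; OutputSandbox = {"b.out"}; ]
  };
]
```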


New Functionalities: WMProxy

• WMProxy server
  – Replaces the old C++ socket-based connection service
  – Implements an interoperable interface: Web Service based, WS-I compliant
• WMProxy client
  – Provides the C++ WMS command-line User Interface (UI), which executes all the needed operations automatically
  – Provides APIs in multiple languages (C++, Java and Python)


New Functionalities: ICE-CREAM

• CREAM: Computing Resource Execution And Management service
  – A CE with a Web service interface
  – WMS requests are forwarded directly to CREAM-based CEs through ICE
• ICE: Interface to CREAM Environment
  – Essentially reproduces the Job Controller / CondorC / Log Monitor layer needed by gLite/LCG CEs


New Functionalities: ICE-CREAM

[Architecture diagram: as before, with ICE added alongside the Job Controller / CondorC / Log Monitor chain; the Workload Manager forwards requests through ICE to a CREAM CE, while gLite and LCG CEs are still reached through CondorC/CondorG.]


New Functionalities: Sandbox Files

• Sandbox Archiving and Sharing
  – Job sandbox files can be automatically compressed
  – Different jobs can share the same sandbox: dramatically reduces network traffic and allows the user to save time and bandwidth
• Sandbox Remote Specification
  – The user can store files directly on a remote machine
  – No intermediate copies: the JobWrapper downloads them directly from the Worker Node
  – Reduces server load
• Supported File Transfer
  – Full support (input & output files) for the gridftp and https protocols
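As an illustrative sketch of remote sandbox specification (host and paths are hypothetical), input files can be referenced by URI and output files can be delivered straight to a remote destination, skipping intermediate copies on the WMS node:

```
[
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  // pulled by the JobWrapper directly from the Worker Node
  InputSandbox = {"gsiftp://se.example.org/data/input.txt"};
  OutputSandbox = {"std.out"};
  // outputs uploaded directly to this base URI
  OutputSandboxBaseDestURI = "gsiftp://se.example.org/results/";
]
```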


New Functionalities: Bulk-MM

• Bulk MatchMaking
  – The natural completion of bulk submission
  – Allows a single matchmaking pass over similar jobs in one shot:
      Job equivalence is based on specified "significant attributes"
      Jobs whose significant attributes are literally equal are equivalent
  – Target jobs: bunches of independent jobs, mainly Collections and Parametric Jobs, originally managed with Condor DAGMan
  – Allows submission to CREAM-based CEs, providing an additional boost in WMS performance
  – Saves time & resources
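A hedged sketch of how a collection could declare the attributes used for equivalence; the attribute name and values here are illustrative, and the exact set of recognized significant attributes depends on the WMS configuration:

```
[
  Type = "collection";
  // nodes whose Requirements and Rank are literally equal
  // share a single matchmaking pass
  SignificantAttributes = { "Requirements", "Rank" };
  Nodes = {
    [ Executable = "/bin/date"; ],
    [ Executable = "/bin/hostname"; ]
  };
]
```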


Other New Functionalities

• Service Discovery
  – Provides service information by querying external databases of different kinds (R-GMA, BDII)
  – Client side: queries for available WMProxy endpoints on the net, so no manual reconfiguration of the user commands is needed
  – Server side: queries for available LB servers where job information can be logged
• Job File Perusal
  – Monitors the actual output files produced by a job during its lifecycle
  – Adds important information that is not available through simple status monitoring and was previously available only at job completion
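As a sketch (executable and file names illustrative), file perusal is enabled per job in the JDL, with an interval controlling how often snapshots of the watched files are taken:

```
[
  Executable = "long_running.sh";   // hypothetical long-running job
  PerusalFileEnable = true;         // allow inspecting output files mid-run
  PerusalTimeInterval = 120;        // seconds between file snapshots
  StdOutput = "job.log";
  OutputSandbox = {"job.log"};
]
```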


gLite New Activities

• New platforms widely deployed on the infrastructure
  – In particular Scientific Linux 4 and 64-bit architectures
• Migration to the ETICS build system
  – High flexibility
  – Addresses multi-platform support (almost impossible with the old gLite build system)
  – Build of all WMS components achieved
  – Ongoing WMS activity: integration/deployment
      Software not yet fully deployed
      Client-side manual installation fully working
      Server-side installation not yet available (almost achieved)


gLite WMS Ongoing Restructuring

• gLite Restructuring
  – All new feature development stopped for 6 months
  – Improving usability & portability: multi-platform support (structural changes needed)
  – Cleaning up sections that cause build and porting difficulties
  – Removing/reducing dependencies on external software to ease installation and deployment
• Goals
  – Easier service maintenance and usage, which will increase stability and throughput
  – Toward a "gLighter" User Interface: identify and remove all unnecessary dependencies


gLite WMS Future

• Improving logging and error reporting
  – Common syslog-like logging format
• Windows working prototype
  – Porting gLite to Microsoft Windows platforms
• Improving interoperability
  – Supercomputing 06: working prototype for a demo of the Basic Execution Service (BES) and the Job Submission Description Language (JSDL)


WfMS and gLite WMS

• Possible "external integration" with existing workflow frameworks
  – Still to be discussed and planned
• A proposal for a Workflow Management System integrated within the WMS is under discussion
  – Running on top of the gLite middleware
  – Abstract and generic representation of workflows: internal use of a Petri net model, with external translation mechanisms from different language front ends


Tests & Results

• Intense testing and constant bug-fixing activities have been performed over the last months
  – Improved job submission rate
  – Improved service stability
  – New functionalities tested and adopted by the experiments
• Production-quality test results:
  – 16K jobs/day over one week of submissions
  – No manual intervention on the server; stable memory usage
  – 0.3% of jobs in a non-final state; aborted jobs mostly due to expired user credentials (was about 5% before Bulk-MM support)


Some Links

• WMS – http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/

• WMProxy– http://trinity.datamat.it/projects/EGEE/wiki/wiki.php

• LB– http://egee.cesnet.cz/en/JRA1/index.html

• CREAM– http://grid.pd.infn.it/cream

• JDL– http://edms.cern.ch/document/590869/1