Low Level Grid Services (Job Management, Data Management, Monitoring Services)

December 8 & 9, 2005, Austin, TX SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide Low Level Grid Services (Job Management, Data Management, Monitoring Services) Ravi K Madduri Argonne National Laboratory University of Chicago


Page 1: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

December 8 & 9, 2005, Austin, TX
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide

Low Level Grid Services (Job Management, Data Management, Monitoring Services)

Ravi K Madduri
Argonne National Laboratory
University of Chicago

Page 2: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Services Overview

• Installation
• Data Management
  – GridFTP, RFT, RLS, DAIS
• Resource Management
  – Schedulers, logs, sudo
• Information Services
  – Index service hierarchies, Ganglia/Hawkeye

Page 3: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Full Toolkit Installation

• Binaries available for many platforms
  – Apple
  – Linux
    • Debian, Fedora, SuSE, RHEL, Red Hat
  – FreeBSD
  – HP/UX, Tru64
  – AIX
  – Solaris
  – Windows (Java code only)
• Source code also available
• See http://www.globus.org/toolkit/docs/4.0 for the installation guide, quickstart, and prerequisite documentation

Page 4: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Overview of GT4 Data Services

• GridFTP
  – High-performance data transfer protocol
• The Reliable File Transfer Service (RFT)
  – Data movement services for GT4
• The Replica Location Service (RLS)
  – Distributed registry that records locations of data copies
• The Data Access and Integration Service (DAIS)
  – Service to access relational and XML databases

Page 5: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

What is GridFTP?

• A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• A protocol
  – Multiple independent implementations can interoperate
    • This works. Both the Condor Project at the University of Wisconsin and Fermilab have home-grown servers that work with ours.
    • Lots of people have developed clients independent of the Globus Project.
• The Globus Toolkit supplies a reference implementation:
  – Server
  – Client tools (globus-url-copy)
  – Development libraries

Page 6: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

GT4 GridFTP Implementation

• Based on XIO
• Extremely modular to allow integration with a variety of data sources (files, mass stores, etc.)
• Striping support is provided in 4.0
• Has IPv6 support included (EPRT, EPSV)

Page 7: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Configuring GridFTP

• The right configuration results in better performance
• Add entries to /etc/services and (x)inetd
• Configuration options:
  – Binding to a specific interface/address
  – Striped backend
  – TCP tuning parameters

Page 8: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

What is RFT?

• WS-RF compliant, fault-tolerant, high-performance data transfer service
  – Soft state
  – Notifications/query
• Reliability on top of the high performance provided by GridFTP
  – Fire and forget
  – Integrated automatic failure recovery
    • Network-level failures
    • System-level failures, etc.
  – Essentially a data transfer scheduler with FIFO as the queue policy
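The "fire and forget" and FIFO queue ideas on this slide can be illustrated with a small toy model. This is only a sketch, not RFT's implementation: the class names, the retry limit, the stand-in copy function, and the gsiftp:// host names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transfer:
    source: str
    destination: str
    attempts: int = 0


class FifoTransferQueue:
    """Toy FIFO transfer scheduler that retries failed transfers."""

    def __init__(self, copy_func, max_attempts=3):
        self._copy = copy_func            # hypothetical callable that moves one file
        self._max_attempts = max_attempts  # assumed retry limit, not an RFT default
        self._pending = deque()
        self.completed = []
        self.failed = []

    def submit(self, source, destination):
        # "Fire and forget": the request is queued and the caller returns at once.
        self._pending.append(Transfer(source, destination))

    def run(self):
        while self._pending:
            transfer = self._pending.popleft()  # oldest request first (FIFO)
            transfer.attempts += 1
            try:
                self._copy(transfer.source, transfer.destination)
            except OSError:
                if transfer.attempts < self._max_attempts:
                    self._pending.append(transfer)  # re-queue for a later retry
                else:
                    self.failed.append(transfer)
            else:
                self.completed.append(transfer)


def flaky_copy_factory(fail_first_n):
    """Return a stand-in copy function whose first N calls fail."""
    state = {"calls": 0}

    def copy(source, destination):
        state["calls"] += 1
        if state["calls"] <= fail_first_n:
            raise OSError("simulated network failure")

    return copy


queue = FifoTransferQueue(flaky_copy_factory(fail_first_n=1))
queue.submit("gsiftp://hostA/data/a", "gsiftp://hostB/data/a")
queue.submit("gsiftp://hostA/data/b", "gsiftp://hostB/data/b")
queue.run()
print([t.source for t in queue.completed])
```

In this model the first transfer fails once, is re-queued behind the second, and succeeds on its second attempt, so the client never has to resubmit it.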

Page 9: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

What is RFT (Continued..)?

[Figure: RFT architecture. Labels: RFT Client, SOAP Messages, Notifications (Optional), RFT Service, and two GridFTP Servers, each with Protocol Interpreter, Master DSI, IPC Link, IPC Receiver, Slave DSI, and Data Channel.]

Page 10: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Data Transfer Comparison

[Figure: control and data channels for a globus-url-copy transfer compared with an RFT transfer. Labels: Control, Data, globus-url-copy, RFT Service, RFT Client, SOAP Messages, Notifications (Optional).]

Page 11: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Replica Management in Grids

• Data-intensive applications produce terabytes or petabytes of data
  – Hundreds of millions of data objects
• Replicate data at multiple locations for reasons of:
  – Fault tolerance
    • Avoid single points of failure
  – Performance
    • Avoid wide-area data transfer latencies
    • Achieve load balancing

Page 12: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

A Replica Location Service

• A Replica Location Service (RLS) is a distributed registry that records the locations of data copies and allows replica discovery
  – RLS maintains mappings between logical identifiers and target names
  – Must perform and scale well: support hundreds of millions of objects, hundreds of clients
• E.g., LIGO (Laser Interferometer Gravitational Wave Observatory) Project
  – RLS servers at 8 sites
  – Maintain associations between 3 million logical file names & 30 million physical file locations
• RLS is one component of a Replica Management system
  – Other components include consistency services, replica selection services, reliable data transfer, etc.
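The mapping between logical identifiers and target names can be pictured as a one-to-many dictionary. The sketch below is an in-memory toy, not the RLS API: the class, its method names, and the example logical and target names are hypothetical.

```python
from collections import defaultdict


class ReplicaCatalog:
    """Toy registry mapping one logical name to many target names."""

    def __init__(self):
        self._targets = defaultdict(set)

    def add_mapping(self, logical_name, target_name):
        self._targets[logical_name].add(target_name)

    def remove_mapping(self, logical_name, target_name):
        self._targets[logical_name].discard(target_name)
        if not self._targets[logical_name]:
            del self._targets[logical_name]  # drop logical names with no replicas

    def lookup(self, logical_name):
        # Replica discovery: return every known copy, or an empty list.
        return sorted(self._targets.get(logical_name, ()))


catalog = ReplicaCatalog()
catalog.add_mapping("lfn://example/run42.dat", "gsiftp://site-a.example.org/data/run42.dat")
catalog.add_mapping("lfn://example/run42.dat", "gsiftp://site-b.example.org/data/run42.dat")
print(catalog.lookup("lfn://example/run42.dat"))
```

A real deployment distributes this registry across sites; the toy only shows the lookup that a replica selection service would build on.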

Page 13: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Goals for OGSA-DAI

• Aim to deliver application mechanisms that:
  – Meet the data requirements of Grid applications
    • Functionality, performance and reliability
    • Reduce development cost of data centric Grid applications
    • Provide consistent interfaces to data resources
  – Acceptable and supportable by database providers
    • Trustable, imposed demand is acceptable, etc.
    • Provide a standard framework that satisfies standard requirements
• A base for developing higher-level services
  – Data federation
  – Distributed query processing
  – Data mining
  – Data visualisation

Page 14: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Data Management Q & A

Page 15: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Grid Monitoring Services

• Overview
• Index Service
  – Aggregate the data
• Trigger Service
  – Notify when data changes
• Information Providers
  – Provide the data
• WebMDS
  – Client to visualize data

Page 16: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

What Is Grid Monitoring?

• A way to discover what services and resources are available to use (Discovery)
• A way to understand the status/attributes of those services (Monitoring)
• A system to warn you when things fail
• Sharing of community data between sites using a standard interface for querying and notification

Page 17: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Why Is Grid Monitoring Hard?

• Lack of central control
  – Different local systems according to local policy
  – Different interfaces and monitoring requirements
• Shared resources
  – Contention, variability
• Communication
  – Different sites imply different sys admins, users, institutional policies

Page 18: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

MDS4: Monitoring and Discovery System

• Grid-level monitoring system used most often for resource selection
  – Aid user/agent to identify host(s) on which to run an application
• Uses standard interfaces to provide publishing of data, discovery, and data access, including subscription/notification
  – WS-ResourceProperties, WS-BaseNotification, WS-ServiceGroup
• Functions as an hourglass to provide a common interface to lower-level monitoring tools

Page 19: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

GLUE Schema Attributes (cluster info, queue info, FS info)

Information Users: Schedulers, Portals, etc.

Cluster monitors (Ganglia, Hawkeye, Clumon, and Nagios soon)

Services (GRAM, RFT, RLS)

Queueing systems (PBS, LSF, Torque)

WS standard interfaces for subscription, registration, notification

Page 20: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

MDS4 Components

• Higher-level services
  – Index Service: a way to aggregate data
  – Trigger Service: a way to be notified of changes
  – Both built on a common aggregator framework
• Information providers
  – Monitoring is a part of every WSRF service
  – Non-WS services can also be used
• Clients
  – WebMDS
• All of the tools are schema-agnostic, but interoperability needs a well-understood common language
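One way to picture how an index and a trigger can share a common aggregator is the toy model below. It is a sketch under stated assumptions, not MDS4 code: the class and method names, the polling approach, and the "host1.load" provider are hypothetical.

```python
class Aggregator:
    """Shared base: collects values from registered information providers."""

    def __init__(self):
        self._providers = {}

    def register(self, name, provider):
        self._providers[name] = provider  # provider is a zero-argument callable

    def collect(self):
        return {name: provider() for name, provider in self._providers.items()}


class IndexService(Aggregator):
    """Aggregates the data so clients can query it in one place."""

    def query(self):
        return self.collect()


class TriggerService(Aggregator):
    """Calls a user-supplied function whenever a collected value changes."""

    def __init__(self, on_change):
        super().__init__()
        self._on_change = on_change
        self._last = {}

    def poll(self):
        current = self.collect()
        for name, value in current.items():
            if name in self._last and self._last[name] != value:
                self._on_change(name, self._last[name], value)
        self._last = current


load = {"value": 0.5}  # stand-in for a real information provider's data
events = []

index = IndexService()
index.register("host1.load", lambda: load["value"])

trigger = TriggerService(lambda name, old, new: events.append((name, old, new)))
trigger.register("host1.load", lambda: load["value"])

trigger.poll()       # first poll only records a baseline
load["value"] = 2.0
trigger.poll()       # the change is detected and reported

print(index.query())
print(events)
```

Neither class inspects the values it carries, which mirrors the schema-agnostic point on the slide: consumers still need to agree on what "host1.load" means.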

Page 21: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Sample Deployment

Page 22: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

WebMDS User Interface

• Web-based interface to WSRF resource property information
• User-friendly front-end to the Index Service
• Uses standard resource property requests to query resource property data
• XSLT transforms to format and display them
• Customized pages are simply done by using HTML form options and creating your own XSLT transforms
• Sample page:
  – http://mds.globus.org:8080/webmds/webmds?info=indexinfo&xsl=servicegroupxsl
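The sample page address is a base URL plus two query parameters, so it can be assembled programmatically. The sketch below uses only the host, path, and parameter values shown on this slide; the helper name is hypothetical, and whether other values for info and xsl are accepted depends on the WebMDS deployment.

```python
from urllib.parse import urlencode

WEBMDS_BASE = "http://mds.globus.org:8080/webmds/webmds"  # from the sample page


def webmds_url(info, xsl, base=WEBMDS_BASE):
    """Build a WebMDS page URL from an information source and an XSLT name."""
    return base + "?" + urlencode({"info": info, "xsl": xsl})


url = webmds_url("indexinfo", "servicegroupxsl")
print(url)
```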

Page 23: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

WebMDS Service

Page 24: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Page 25: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Page 26: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Information Services Q & A

Page 27: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

GRAM

• Overview
• Submitting a test job
• Resource Specification Language (RSL)
• Data Staging
• Multi-jobs

Page 28: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

GRAM Overview

• Intended for jobs where arbitrary programs, stateful monitoring, credential management, and file staging are important

• If the application is lightweight, with modest input/output, it may be a better candidate for hosting directly as a WSRF service

Page 29: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

GRAM Prerequisites

• A secure container
• For staging jobs, access to an RFT service and a GridFTP server
  – Note that even stderr/stdout are considered staging, so RFT and GridFTP are used in all but the most basic jobs
• sudo for running as other accounts
• Can be integrated with PBS, LSF, Condor

Page 30: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Submitting A Test Job

• globusrun-ws -submit -c /bin/true
• echo $?
• Will run locally. Specify a remote host with -F
• globusrun-ws -submit -F host2 -c /bin/true
• The return code will be the job’s exit code if supported by the scheduler
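For scripted submissions, the two command lines on this slide can be generated from one helper. This is a minimal sketch that only builds the argument list from the flags shown here (-submit, -F, -c); the function name is hypothetical, and nothing is executed.

```python
import shlex


def submit_command(executable, host=None):
    """Return the globusrun-ws argument list for a simple test job."""
    argv = ["globusrun-ws", "-submit"]
    if host is not None:
        argv += ["-F", host]  # remote host, as on the slide
    argv += ["-c", executable]
    return argv


print(shlex.join(submit_command("/bin/true")))
print(shlex.join(submit_command("/bin/true", host="host2")))
```

On a machine with the toolkit installed, the list could be passed to subprocess.run, whose returncode attribute plays the role of `echo $?`.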

Page 31: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Data Staging

• GRAM allows jobs to stage-in and stage-out data
• To perform this task it uses RFT
• RFT in turn uses GridFTP servers
• Simplest stage-in/stage-out example is stdout/stderr

Page 32: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Streaming Results

• globusrun-ws -S -s -c /bin/date
• -S is short for -submit
• -s is short for -streaming
  – The output will be sent back to the terminal; control will not return until the job is done

Page 33: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Resource Specification Language

• For more complicated jobs, we’ll use RSL to specify the job

<job>
  <executable>/bin/echo</executable>
  <argument>this is an example_string</argument>
  <argument>Globus was here</argument>
  <stdout>${GLOBUS_USER_HOME}/stdout</stdout>
  <stderr>${GLOBUS_USER_HOME}/stderr</stderr>
</job>
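A job description like the one above can also be generated instead of written by hand. The sketch below uses Python's standard XML library and only the element names that appear on this slide; the helper name is hypothetical, and the result has not been checked against the GRAM schema.

```python
import xml.etree.ElementTree as ET


def build_job(executable, arguments, stdout, stderr):
    """Build a <job> element with the children shown on the slide."""
    job = ET.Element("job")
    ET.SubElement(job, "executable").text = executable
    for value in arguments:
        ET.SubElement(job, "argument").text = value  # one element per argument
    ET.SubElement(job, "stdout").text = stdout
    ET.SubElement(job, "stderr").text = stderr
    return job


job = build_job(
    "/bin/echo",
    ["this is an example_string", "Globus was here"],
    "${GLOBUS_USER_HOME}/stdout",
    "${GLOBUS_USER_HOME}/stderr",
)
rsl_text = ET.tostring(job, encoding="unicode")
print(rsl_text)
```

Generating the XML this way guarantees matching tags and correct escaping of special characters in arguments.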

Page 34: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Submitting Using XML

• Create the file containing the RSL
• You may validate the RSL ahead of time
  – globusrun-ws -validate -f rslfile.xml
• If the file validates, submit using -submit

Page 35: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

At Most Once Submission

• You may specify a UUID with your job submission
• If you’re not sure the submission worked, you may submit the job again with the same UUID
• If the job has already been submitted, the new submission will have no effect
• If you do not specify a UUID, one will be generated for you
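The at-most-once rule amounts to keying submissions by UUID and ignoring repeats. The toy below models that behavior in memory; it is not GRAM's implementation, and the class, method names, and return values are hypothetical.

```python
import uuid


class JobFactory:
    """Toy job factory that accepts each submission ID at most once."""

    def __init__(self):
        self._jobs = {}

    def submit(self, description, submission_id=None):
        if submission_id is None:
            submission_id = str(uuid.uuid4())  # generated when the client gives none
        if submission_id in self._jobs:
            return submission_id, False        # duplicate: no new job is created
        self._jobs[submission_id] = description
        return submission_id, True

    def job_count(self):
        return len(self._jobs)


factory = JobFactory()
my_id = str(uuid.uuid4())

first = factory.submit("/bin/true", my_id)
retry = factory.submit("/bin/true", my_id)         # safe to repeat after a timeout
auto_id, created = factory.submit("/bin/date")     # no UUID supplied

print(first[1], retry[1], factory.job_count())
```

Because the retry is harmless, a client that loses its connection mid-submission can simply resend with the same UUID.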

Page 36: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Staging Data

• GRAM’s RSL allows many fileStageIn/fileStageOut directives
• The transfers will be executed by RFT
  – May specify additional RFT options using the RFTOptions tag
• There is no GASS cache staging option anymore

Page 37: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Batch Submission

• Your client does not have to stay attached to the execution of the job
• -batch will disconnect from the job and output an EPR
  – You may redirect the EPR to a file with -o
• Use the EPR file with -monitor or -status
• You may also kill the job using -kill
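The batch workflow on this slide can be sketched as two small command builders. The flags -batch, -o, -monitor, -status, and -kill come from the slide; the flag used to pass the saved EPR file back in (-j below) is an assumption not stated here and should be checked against the globusrun-ws documentation. The function names and the job.epr file name are hypothetical, and nothing is executed.

```python
def batch_submit_command(executable, epr_file):
    """Submit without staying attached; the EPR is written to epr_file."""
    return ["globusrun-ws", "-submit", "-batch", "-o", epr_file, "-c", executable]


def job_control_command(action, epr_file, epr_flag="-j"):
    """Build a follow-up command; epr_flag is an ASSUMED flag name."""
    if action not in ("-monitor", "-status", "-kill"):
        raise ValueError("unsupported action: " + action)
    return ["globusrun-ws", action, epr_flag, epr_file]


print(batch_submit_command("/bin/date", "job.epr"))
print(job_control_command("-status", "job.epr"))
print(job_control_command("-kill", "job.epr"))
```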

Page 38: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Specifying Scheduler Options

• RSL lets you specify various scheduler options
  – what queue to submit to
  – which project to select for accounting
  – max CPU and wallclock time to spend
  – min/max memory required

• All defined online under the schema document for GRAM

Page 39: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Multijobs

• You may specify more than one <job> element in a <multijob>

• At that point, you want to specify the <factoryEndpoint> in the RSL rather than on the command line

• Will be used by MPICH-G to support MPI jobs

Page 40: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

Resource Management Q & A

Page 41: Low Level Grid Services    (Job Management, Data Management, Monitoring Services)

For more information

• The Globus Toolkit™
  – http://www-unix.globus.org/toolkit/