Grid Computing: Hakan Ünlü, CMPE 511 Presentation, Fall 2004


Grid Computing

Hakan ÜNLÜ

CMPE 511 Presentation

Fall 2004

Overview

General Introduction to Grid Computing
- Introduction: Why Grids?
- Applications for Grids
- Basic Grid Architecture
- Grid Platforms & Standards

Issues in Grid Computing
- Hardware: Blade Computers
- System Management: Globus Toolkit
- Software: Scheduling

What is Grid Computing?

A computational and networking infrastructure that is designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide-area environments.

Grids Are By Definition Heterogeneous

- It’s about legacy resources, infrastructure, applications, policies, and procedures
- The grid and its administrators must integrate in stealth mode with:
  - Firewalls
  - Filesystems
  - Queuing systems
  - Grumpy systems administrators
  - Tried and true applications

A Grid Example

Challenges in Grid Computing

- Reliable performance
- Trust relationships between multiple security domains
- Deployment and maintenance of grid middleware across hundreds or thousands of nodes
- Access to data across WANs
- Access to state information of remote processes
- Workflow / dependency management
- Distributed software and license management
- Accounting and billing

Applications for a Grid

- Generally, apps that work well on clusters can work well on grids
- Non-interactive / batch jobs
- Parallel computations with minimal interprocess communication and workflow dependencies
- Reasonable data transfer requirements
- Sensible economics
  - Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
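The economics test on this slide is a single inequality, so it can be written down directly. The sketch below is only an illustration: the function name and the figures are hypothetical and do not come from the presentation.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """Return True when gains exceed the build cost plus opportunity costs."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical yearly figures, all in the same currency unit.
print(grid_is_worthwhile(500_000, 300_000, 150_000))  # True:  500k > 450k
print(grid_is_worthwhile(500_000, 300_000, 250_000))  # False: 500k < 550k
```

The opportunity cost term is what makes a borderline grid project fail the test: resources committed to the grid are no longer available for other uses.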

Non-Interactive / Batch Jobs

- Difficult to get a real-time UI for jobs running on the grid
  - A possible interactive application: spreadsheet computation
- Want to take advantage of off-peak free cycles
  - Jobs run for several days, weeks or months
  - The user might prefer to be sleeping while the job runs!
- Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  - Idle thread / “screensaver” computing
  - SETI@home

Parallel Computations

- Application needs to be able to run as multiple, mostly independent pieces
  - Can’t depend on the network’s Quality of Service
  - Can’t rely upon the order of execution and completion
- Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
- Grid can still be useful as a meta-scheduler and data source for such apps
  - e.g. the user submits the job to the grid queue and asks for the best available SMP resource
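A grid-friendly computation splits into pieces that share nothing and whose results can be combined in any order. The sketch below shows that shape on a single machine, with a local thread pool standing in for grid nodes (an assumption made purely for illustration; the function names are hypothetical). Results are collected in completion order, so nothing depends on which piece finishes first.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_primes(lo, hi):
    """Count primes in [lo, hi); one independent work unit."""
    def is_prime(n):
        if n < 2:
            return False
        i = 2
        while i * i <= n:
            if n % i == 0:
                return False
            i += 1
        return True

    return sum(1 for n in range(lo, hi) if is_prime(n))


def parallel_prime_count(limit, pieces=4):
    """Split [0, limit) into independent ranges and sum the partial counts."""
    if limit <= 0:
        return 0
    step = -(-limit // pieces)  # ceiling division
    ranges = [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]
    total = 0
    with ThreadPoolExecutor(max_workers=pieces) as pool:
        futures = [pool.submit(count_primes, lo, hi) for lo, hi in ranges]
        # as_completed yields futures in whatever order they finish;
        # addition is commutative, so the order does not matter.
        for future in as_completed(futures):
            total += future.result()
    return total


print(parallel_prime_count(100))  # 25
```

A computation whose pieces had to exchange data mid-run, or finish in a fixed sequence, would not fit this pattern and is the kind the slide assigns to tightly coupled platforms.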

Some Costs and Benefits

Costs:
- Grid Middleware
- Architects and Developers
- User Training
- Infrastructure
- Hardware
- Opportunity Costs
  - Would a big SMP box return better results for your problem?

Benefits:
- Better Utilization of Existing Capital Resources
- More Efficient Users
  - Ability to complete more work in the same amount of time
- Performance near or sometimes as good as the big SMP box

Basic Grid Architecture

- Clusters and how grids are different than clusters
- Departmental Grid Model
- Enterprise Grid Model
- Global Grid Model

What Makes a Cluster a Cluster?

- Uses a Distributed Resource Manager (DRM) to manage job scheduling
- Tightly coupled - high-speed, low-latency interconnect network
- Fairly homogeneous - configuration management is important!
- Single administrative domain

The Cluster Model

[Figure: four cluster nodes (each with compute, storage, an operating system and the cluster DRM) and a master node (user interface/API and cluster DRM) on a high-speed interconnect, together with shared storage and configuration management.]

How is an Enterprise Grid Different from a Cluster?

- Heterogeneous - clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
- Lightly coupled - connected via 100 or 1000 Mbps Ethernet
- Introduces a resource registry and grid security service
  - But usually only a single registry and security service for the grid
- Not necessarily a single administrative domain

The Enterprise Grid Model

[Figure: clusters (nodes with compute, storage, an operating system and a cluster interface, fronted by a cluster DRM with a grid interface), SMP systems and a user interface/API, each with a grid interface, connected over an enterprise LAN or WAN to a security infrastructure and a resource registry.]

How is a Global Grid Different from an Enterprise Grid?

- "Grid of Grids" - collection of enterprise grids
- Loosely coupled between sites - not much control over Quality of Service
- Mutually distrustful administrative domains
- Multiple grid resource registries and grid security services

The Global Grid Model

[Figure: Sites A, B and C linked over a WAN. Each site is a LAN of clusters, SMP systems and UI/API clients behind grid interfaces, with its own RR and SI (presumably the resource registry and security infrastructure of the enterprise grid model).]

Grid Platforms & Standards

- The Global Grid Forum
  - http://www.gridforum.org/
- Globus Toolkit
- DCML (Data Center Markup Language)

Globus Toolkit V2 “Pillars”

- Information Services (MDS)
- Data Management (GASS)
- Resource Management (GRAM)
- Grid Security Infrastructure (GSI)

Globus Toolkit V2 Stack

[Figure: protocol stack, top to bottom: MDS, GRAM, GASS/GridFTP; GSI; HTTP, LDAP, FTP; TLS/SSL; TCP/IP.]

Globus Toolkit V2 Key Components: GRAM, MDS and GASS

- Grid Resource Allocation Manager (GRAM)
  - Server-side: “gatekeeper” process that controls execution of job managers
  - Client-side: “globusrun” UI to launch jobs
- Monitoring and Discovery Service (MDS)
  - GRIS: Grid Resource Information Service collects local info
  - GIIS: Grid Index Information Service collects GRIS info
- Global Access to Secondary Storage (GASS)
  - GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  - Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
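To make the URI form concrete, the sketch below splits the example gsiftp address into host and path with Python's standard `urllib.parse`, then builds (without running) a `globus-url-copy` argument list of the form source URL followed by destination URL. The helper names and the local destination path are hypothetical, and no command-line options are shown because the slides do not mention any.

```python
from urllib.parse import urlsplit


def parse_grid_uri(uri):
    """Split a gsiftp URI into the server host, optional port and remote path."""
    parts = urlsplit(uri)
    if parts.scheme != "gsiftp":
        raise ValueError(f"expected a gsiftp URI, got scheme {parts.scheme!r}")
    return {"host": parts.hostname, "port": parts.port, "path": parts.path}


def copy_command(source_uri, local_path):
    """Build a globus-url-copy argument list; local_path must be absolute."""
    parse_grid_uri(source_uri)  # validate the source before building the command
    return ["globus-url-copy", source_uri, "file://" + local_path]


uri = "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"
print(parse_grid_uri(uri))
print(copy_command(uri, "/tmp/ecoli.nt"))  # /tmp/ecoli.nt is a made-up destination
```

Actually running the command would additionally need a Globus client installation and a valid GSI proxy credential, which are outside this sketch.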

Globus Toolkit V2 Additional Components

- Grid Packaging Tools (GPT)
  - Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
- MPICH-G2
  - A Globus V2 enabled version of MPI (Message Passing Interface)
  - Based on MPICH
  - Utilizes GSI, MDS and GRAM

Globus Toolkit V2 Network Services

[Figure: a certificate authority, a GIIS server, a client node running the GRAM client, and four grid nodes each running GRIS, gatekeeper and in.ftpd, all attached to the network.]

GRAM, MDS and GASS Interactions

[Figure: the client (user / proxy) talks to three services. GRAM (gatekeeper, job manager, processes): job allocation and job management over RSL/DUROC/HTTP 1.1. MDS (GIIS, GRIS, resources): resource discovery over LDAP. GASS (GridFTP, in.ftpd): data transfer and data control over gsiftp.]

Globus Toolkit V2 Strengths and Weaknesses

Strengths:
- Mindshare and collaboration in both industry & academia
- Open source
- Standards-based underpinnings (e.g. SSL, LDAP)
- Flexibility and CoG APIs
- Driving OGSA with heavy resource commitment from IBM

Weaknesses:
- Significant effort required to get applications working on a grid
- Not production quality at this time
- No “metascheduler”: users have to explicitly tell their jobs where to run

Issues in Grid Computing

Hardware: Blades

Hardware Trends

- HW trends that enable grids and distributed processing
  - There is a lot of idle computing power
  - Computers are now better connected
  - There are many different brands and configurations in any environment
- And distributed computing in turn gives rise to new HW architectures
  - Blade computers

What is a blade?

An inclusive, chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board.

[Figures: Blade; Blade Chassis & Blades; Blade Farm]

Anatomy of a blade

How far can it go?

Advantages & Disadvantages

Advantages:
- Low cost (power, heat, data center space)
- Physical server consolidation (save space, eliminate cables)
- High availability
- Integrated systems management

Disadvantages:
- Not suitable in small numbers
- Need for standardization (for network connection and management)

Blades & Grid

- Each blade is a server that can run jobs.
- Blades can be used to form clusters or grids.
- With efficient management, different configurations of blades can be used in a single grid computer.
  - Easy to expand
  - Protects investment

Issues in Grid Computing

System Management: Globus Toolkit


Issues in Grid Computing

Software: Scheduling

Superscheduling

- Superscheduling means scheduling resources in multiple administrative domains.
- Various models
  - Submitting a job to a specific single machine
  - Submitting a job to single machines at multiple sites (with cancellation option)
  - Scheduling a single job to use multiple resources
- Most common superscheduler: USERS
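The second model, submitting the same job to several sites and cancelling the copies that lose, can be illustrated with a toy simulation. Everything here is an assumption made for the example: the site names, the queue-wait estimates and the function itself are hypothetical, and a real superscheduler would react to actual job-start events rather than to estimates.

```python
def submit_everywhere(queue_wait_minutes):
    """Pick the site whose copy starts first and list the copies to cancel.

    queue_wait_minutes maps a site name to the time its copy waits in the queue.
    """
    if not queue_wait_minutes:
        raise ValueError("no sites to submit to")
    winner = min(queue_wait_minutes, key=queue_wait_minutes.get)
    cancelled = sorted(site for site in queue_wait_minutes if site != winner)
    return winner, cancelled


waits = {"site-a": 45, "site-b": 10, "site-c": 120}  # hypothetical values
winner, cancelled = submit_everywhere(waits)
print(winner, cancelled)  # site-b ['site-a', 'site-c']
```

The cancellation step is what keeps this model from wasting resources: without it, every site would eventually run its own copy of the job.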

Phases of Superscheduling

- Resource Discovery
  - Authorisation Filtering
  - Application Requirement Definition
  - Minimal Requirement Filtering
- System Selection
  - Gathering Information (Query)
  - Select Systems to Run On
- Run the Job
  - Make an Advance Reservation (Optional)
  - Submit Job to Resources
  - Preparation Tasks
  - Monitor Progress
  - Job Completion
  - Completion Tasks

Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
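The three phases can be read as a small pipeline: filter, choose, run. The sketch below is a simplified illustration under stated assumptions, not the procedure from the cited document: the `Resource` fields, the least-loaded selection rule and the log messages are all hypothetical, and the optional advance reservation is omitted.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    authorised_users: set
    cpus: int
    memory_gb: int
    load: float  # 0.0 = idle, 1.0 = saturated


def discover(resources, user, min_cpus, min_memory_gb):
    """Phase 1: authorisation filtering, then minimal requirement filtering."""
    allowed = [r for r in resources if user in r.authorised_users]
    return [r for r in allowed if r.cpus >= min_cpus and r.memory_gb >= min_memory_gb]


def select(candidates):
    """Phase 2: compare the gathered state and pick one system (here: least loaded)."""
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.load)


def run(resource, job_name):
    """Phase 3: stand-in for submit, monitor and completion; returns a log."""
    return [
        f"submit {job_name} to {resource.name}",
        f"monitor {job_name}",
        f"complete {job_name}",
    ]


resources = [
    Resource("cluster-a", {"alice", "bob"}, 64, 128, 0.7),
    Resource("smp-b", {"alice"}, 16, 256, 0.2),
    Resource("cluster-c", {"bob"}, 128, 512, 0.1),  # alice is not authorised here
    Resource("desktop-d", {"alice"}, 2, 8, 0.0),    # too small for the job
]
candidates = discover(resources, "alice", min_cpus=8, min_memory_gb=64)
chosen = select(candidates)
print([r.name for r in candidates])  # ['cluster-a', 'smp-b']
print(run(chosen, "blast-run"))
```

Filtering by authorisation first keeps the later, more expensive state queries limited to systems the user could actually run on.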

Scheduling Framework (Ranganathan & Foster 2003)

- External Scheduler
- Local Scheduler
- Dataset Scheduler

Scheduling and Replication Algorithms

- External Scheduler
  - JobRandom
  - JobLeastLoaded
  - JobDataPresent
  - JobLocal
- Dataset Scheduler
  - DataDoNothing: No active replication. Everything is on demand.
  - DataRandom: Popular datasets are replicated to random sites.
  - DataLeastLoaded: Popular datasets are sent to the least loaded sites.
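The slide names the four external-scheduler policies without defining them. The sketch below encodes one plausible reading of each name: random site, site with the fewest queued jobs, least loaded site that already holds the dataset, and the submitting site itself. That reading, the `Site` fields and the fallback when no site holds the data are assumptions made for illustration, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    queued_jobs: int
    datasets: set = field(default_factory=set)


def job_random(sites, dataset, origin, rng=random):
    """JobRandom: send the job to a randomly chosen site."""
    return rng.choice(sites)


def job_least_loaded(sites, dataset, origin, rng=random):
    """JobLeastLoaded: send the job to the site with the fewest queued jobs."""
    return min(sites, key=lambda s: s.queued_jobs)


def job_data_present(sites, dataset, origin, rng=random):
    """JobDataPresent: prefer sites that already hold the dataset.

    Ties between holders go to the least loaded one; if no site holds the
    data, fall back to the least loaded site overall (an assumption).
    """
    holders = [s for s in sites if dataset in s.datasets]
    return min(holders or sites, key=lambda s: s.queued_jobs)


def job_local(sites, dataset, origin, rng=random):
    """JobLocal: always run the job where it was submitted."""
    return origin


sites = [
    Site("site-a", queued_jobs=5, datasets={"ecoli"}),
    Site("site-b", queued_jobs=1),
    Site("site-c", queued_jobs=3, datasets={"ecoli"}),
]
origin = sites[0]
for policy in (job_least_loaded, job_data_present, job_local):
    print(policy.__name__, "->", policy(sites, "ecoli", origin).name)
```

With these hypothetical sites, JobLeastLoaded picks site-b and has to move the data there, while JobDataPresent accepts a longer queue at site-c to avoid the transfer, which is the trade-off the two policies are meant to expose.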

Simulation Results

[Figures: Average Response Times; Average Data Transferred]

Grid Computing

Thank You and Questions?
