Page 1: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Grid Computing

Hakan ÜNLÜ

CMPE 511 Presentation

Fall 2004

Page 2: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Overview
- General Introduction to Grid Computing
  - Introduction: Why Grids?
  - Applications for Grids
  - Basic Grid Architecture
  - Grid Platforms & Standards
- Issues in Grid Computing
  - Hardware: Blade Computers
  - System Management: Globus Toolkit
  - Software: Scheduling

Page 3: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

What is Grid Computing?
- A computational and networking infrastructure designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide-area environments

Page 4: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Grids Are By Definition Heterogeneous
- It’s about legacy resources, infrastructure, applications, policies, and procedures
- The grid and its administrators must integrate in stealth mode… with
  - Firewalls
  - Filesystems
  - Queuing systems
  - Grumpy systems administrators
  - Tried and true applications

Page 5: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

A Grid Example

Page 6: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Challenges in Grid Computing
- Reliable performance
- Trust relationships between multiple security domains
- Deployment and maintenance of grid middleware across hundreds or thousands of nodes
- Access to data across WANs
- Access to state information of remote processes
- Workflow / dependency management
- Distributed software and license management
- Accounting and billing

Page 7: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Applications for a Grid
- Generally, apps that work well on clusters can work well on grids
- Non-interactive / batch jobs
- Parallel computations with minimal interprocess communication and workflow dependencies
- Reasonable data transfer requirements
- Sensible economics
  - Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
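The "sensible economics" rule can be read as a plain inequality check. The following is a minimal sketch, assuming all three quantities are expressed in the same unit over the same period; the function name and the figures are hypothetical and not taken from the presentation.

```python
def grid_is_worthwhile(productivity_gains: float,
                       build_cost: float,
                       opportunity_cost: float) -> bool:
    """Return True when gains exceed the cost of building the grid
    plus the opportunity costs of the resources it ties up."""
    return productivity_gains > build_cost + opportunity_cost


if __name__ == "__main__":
    # Hypothetical figures, same currency and same time period.
    print(grid_is_worthwhile(500_000, 300_000, 150_000))
    print(grid_is_worthwhile(400_000, 300_000, 150_000))
```

The comparison is strict, so a grid that merely breaks even does not count as worthwhile under this reading.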

Page 8: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Non-Interactive / Batch Jobs
- Difficult to get a real-time UI for jobs running on the grid
  - A possible interactive application: spreadsheet computation
- Want to take advantage of off-peak free cycles
  - Jobs run for several days, weeks or months
  - The user might prefer to be sleeping while the job runs!
- Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  - Idle thread / “screensaver” computing

Page 9: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Seti@Home

Page 10: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Parallel Computations
- Application needs to be able to run as multiple, mostly independent pieces
  - Can’t depend on the network’s Quality of Service
  - Can’t rely upon the order of execution and completion
- Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
  - Grid can still be useful as a meta-scheduler and data source for such apps
  - e.g. the user submits the job to the grid queue and asks for the best available SMP resource
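The requirement that pieces be mostly independent, with no reliance on completion order, can be illustrated with a small sketch. It assumes a local thread pool standing in for grid nodes; `process_piece` and `run_independent_pieces` are hypothetical names, and the squared-sum workload is only a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_piece(piece_id: int, data: list[int]) -> tuple[int, int]:
    """Work on one independent piece; it needs nothing from any other piece."""
    return piece_id, sum(x * x for x in data)


def run_independent_pieces(pieces: dict[int, list[int]],
                           workers: int = 4) -> dict[int, int]:
    """Run every piece and collect results in whatever order they finish."""
    results: dict[int, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_piece, pid, data)
                   for pid, data in pieces.items()]
        for future in as_completed(futures):  # completion order is not guaranteed
            pid, value = future.result()
            results[pid] = value  # keyed by id, so arrival order does not matter
    return results
```

Because each result is stored under its piece id, the final dictionary is the same regardless of which piece finishes first, which is the property a grid application needs.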

Page 11: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Some Costs and Benefits
Costs:
- Grid Middleware Architects and Developers
- User Training
- Infrastructure Hardware
- Opportunity Costs
  - Would a big SMP box return better results for your problem?
Benefits:
- Better Utilization of Existing Capital Resources
- More Efficient Users
- Ability to complete more work in the same amount of time
- Performance near or sometimes as good as the big SMP box

Page 12: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Basic Grid Architecture
- Clusters and how grids differ from clusters
- Departmental Grid Model
- Enterprise Grid Model
- Global Grid Model

Page 13: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

What Makes a Cluster a Cluster?
- Uses a Distributed Resource Manager (DRM) to manage job scheduling
- Tightly coupled - High speed, low latency interconnect network
- Fairly homogeneous - Configuration management is important!
- Single administrative domain

Page 14: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

The Cluster Model

[Figure: four cluster nodes, each with compute, storage, an operating system and a cluster DRM, joined by a high-speed interconnect to a master node (user interface/API, cluster DRM), shared storage and configuration management.]

Page 15: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

How is an Enterprise Grid Different from a Cluster?
- Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
- Lightly coupled - Connected via 100 or 1000 Mbps Ethernet
- Introduces a resource registry and grid security service
  - But usually only a single registry and security service for the grid
- Not necessarily a single administrative domain

Page 16: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

The Enterprise Grid Model

[Figure: clusters (nodes with compute, storage, an operating system and a cluster interface, fronted by a cluster DRM with a grid interface), SMP systems with grid interfaces, and a user interface/API with a grid interface, all connected over the enterprise LAN or WAN to a security infrastructure and a resource registry.]

Page 17: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

How is a Global Grid Different from an Enterprise Grid?
- "Grid of Grids" - Collection of enterprise grids
- Loosely coupled between sites - Not much control over Quality of Service
- Mutually distrustful administrative domains
- Multiple grid resource registries and grid security services

Page 18: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

The Global Grid Model

[Figure: three sites (Site A, Site B, Site C) connected over a WAN. Each site has its own LAN, resource registry (RR), security infrastructure (SI) and UI/API, plus cluster and SMP resources, each behind a grid interface.]

Page 19: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Grid Platforms & Standards
- The Global Grid Forum
  - http://www.gridforum.org/
- Globus Toolkit
- DCML (Data Center Markup Language)

Page 20: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 “Pillars”

- Information Services (MDS)
- Data Management (GASS)
- Resource Management (GRAM)
- Grid Security Infrastructure (GSI)

Page 21: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 Stack

[Figure: layered stack, listed top to bottom]
- MDS, GASS/GridFTP, GRAM
- GSI
- HTTP, LDAP, FTP
- TLS/SSL
- TCP/IP

Page 22: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 Key Components: GRAM, MDS and GASS
- Grid Resource Allocation Manager (GRAM)
  - Server-side: “gatekeeper” process that controls execution of job managers
  - Client-side: “globusrun” UI to launch jobs
- Monitoring and Discovery Service (MDS)
  - GRIS: Grid Resource Information Service collects local info
  - GIIS: Grid Index Information Service collects GRIS info
- Global Access to Secondary Storage (GASS)
  - GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  - Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
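To make the URI form concrete, a `gsiftp://` address can be split into the grid node and the remote file path with Python's standard library. This is a sketch only: `split_gridftp_uri` is a hypothetical helper, not part of the Globus Toolkit, and it performs no transfer or authentication.

```python
from urllib.parse import urlsplit


def split_gridftp_uri(uri: str) -> tuple[str, str]:
    """Split a gsiftp:// URI into (host, remote path); reject other schemes."""
    parts = urlsplit(uri)
    if parts.scheme != "gsiftp":
        raise ValueError(f"not a gsiftp URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in URI: {uri!r}")
    return parts.hostname, parts.path


if __name__ == "__main__":
    # The example URI from the slide.
    host, path = split_gridftp_uri(
        "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt")
    print(host, path)
```

The host part identifies the grid node running the GridFTP daemon, and the path identifies the file on that node.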

Page 23: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 Additional Components
- Grid Packaging Tools (GPT)
  - Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
- MPICH-G2
  - A Globus V2 enabled version of MPI (Message Passing Interface)
  - Based on MPICH
  - Utilizes GSI, MDS and GRAM

Page 24: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 Network Services

[Figure: a network connecting a certificate authority, a GIIS server, a client node running the GRAM client, and four grid nodes, each running GRIS, gatekeeper and in.ftpd.]

Page 25: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

GRAM, MDS and GASS Interactions

[Figure: a client (user / proxy) talks to GRAM (gatekeeper, job manager, processes) for job allocation and job management over RSL/DUROC/HTTP 1.1, to MDS (GRIS, GIIS) for resource discovery over LDAP, and to GASS (GridFTP, in.ftpd) for data transfer and data control over gsiftp.]

Page 26: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Globus Toolkit V2 Strengths and Weaknesses
Strengths:
- Mindshare and collaboration in both industry & academia
- Open source
- Standards-based underpinnings (e.g. SSL, LDAP)
- Flexibility and CoG APIs
- Driving OGSA with heavy resource commitment from IBM
Weaknesses:
- Significant effort required to get applications working on a grid
- Not production quality at this time
- No “metascheduler” -- user has to explicitly tell their jobs where to run

Page 27: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Issues in Grid Computing

Hardware: Blades

Page 28: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Hardware Trends
- HW trends that enable grids and distributed processing
  - There is a lot of idle computing power
  - Computers are now better connected
  - There are many different brands and configurations in any environment
- And distributed computing in turn gives rise to new HW architectures
  - Blade Computers

Page 29: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

What is a blade?
- Inclusive chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board.

[Figures: Blade; Blade Chassis & Blades; Blade Farm]

Page 30: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Anatomy of a blade

Page 31: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

How far can it go?

Page 32: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Advantages & Disadvantages
Advantages:
- Low Cost (power, heat, data center space)
- Physical Server Consolidation (save space, eliminate cables)
- High Availability
- Integrated Systems Management
Disadvantages:
- Not suitable in small numbers
- Need for standardization (for network connection and management)

Page 33: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Blades & Grid
- Each blade is a server that can run jobs.
- Blades can be used to form clusters or grids.
- With efficient management, different configurations of blades can be used in a single grid computer.
  - Easy to expand
  - Protects investment

Page 34: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Issues in Grid Computing

System Management: Globus Toolkit

Page 42: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Issues in Grid Computing

Software: Scheduling

Page 43: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Superscheduling
- Superscheduling means scheduling resources in multiple administrative domains.
- Various models
  - Submitting a job to a specific single machine
  - Submitting a job to single machines at multiple sites (with cancellation option)
  - Scheduling a single job to use multiple resources
- Most common superscheduler: USERS
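The second model, submitting to single machines at multiple sites with a cancellation option, can be sketched as a small simulation. It assumes each site reports a queue wait for the job; the `Submission` type, the state names and the wait times are hypothetical, and no real grid middleware is involved.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    site: str
    queue_wait: float  # hypothetical time until the job would start at this site
    state: str = "queued"


def submit_with_cancellation(queue_waits: dict[str, float]) -> list[Submission]:
    """Queue the same job at every site, keep the copy that starts first,
    and cancel the copies still waiting elsewhere."""
    if not queue_waits:
        raise ValueError("at least one site is required")
    submissions = [Submission(site, wait) for site, wait in queue_waits.items()]
    first = min(submissions, key=lambda s: s.queue_wait)
    for sub in submissions:
        sub.state = "running" if sub is first else "cancelled"
    return submissions
```

In practice the cancellation step is what keeps this model polite: without it, the redundant copies would consume queue slots at every other site.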

Page 44: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Phases Of Superscheduling
- Resource Discovery
  - Authorisation Filtering
  - Application Requirement Definition
  - Minimal Requirement Filtering
- System Selection
  - Gathering Information (Query)
  - Select Systems to run on
- Run the Job
  - Make an Advance Reservation (Optional)
  - Submit Job to Resources
  - Preparation Tasks
  - Monitor Progress
  - Job Completion
  - Completion Tasks

Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
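The first two phases, resource discovery and system selection, can be sketched as successive filters followed by a choice. This is an illustration under stated assumptions: the `Resource` and `JobRequirements` fields are hypothetical, and choosing the lowest reported load is only one possible selection policy, not one prescribed by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    name: str
    authorized_users: frozenset[str]
    cpus: int
    memory_gb: int
    load: float  # dynamic information, gathered during system selection


@dataclass(frozen=True)
class JobRequirements:
    min_cpus: int
    min_memory_gb: int


def discover(resources: list[Resource], user: str,
             req: JobRequirements) -> list[Resource]:
    """Phase 1: authorization filtering, then minimal requirement filtering."""
    authorized = [r for r in resources if user in r.authorized_users]
    return [r for r in authorized
            if r.cpus >= req.min_cpus and r.memory_gb >= req.min_memory_gb]


def select_system(candidates: list[Resource]) -> Optional[Resource]:
    """Phase 2: pick the candidate with the lowest reported load, if any."""
    return min(candidates, key=lambda r: r.load) if candidates else None
```

The third phase (reservation, submission, monitoring and cleanup) is left out because it depends on the middleware in use.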

Page 45: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Scheduling Framework (Ranganathan & Foster 2003)
- External Scheduler
- Local Scheduler
- Dataset Scheduler

Page 46: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Scheduling And Replication Algorithms
- External Scheduler
  - JobRandom
  - JobLeastLoaded
  - JobDataPresent
  - JobLocal
- Dataset Scheduler
  - DataDoNothing: No active replication. Everything is on demand
  - DataRandom: Popular datasets are replicated to random sites
  - DataLeastLoaded: Popular datasets are sent to the least loaded sites
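The four external scheduler policies can be sketched as site-selection functions. The slide gives only their names, so the behaviour below is an interpretation of those names: load is assumed to be the number of queued jobs, and the fallback in `job_data_present` when no site holds the dataset is an assumption. The `Site` type is hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    queued_jobs: int  # assumed measure of load
    datasets: set[str] = field(default_factory=set)


def job_random(sites: list[Site], rng: random.Random) -> Site:
    """JobRandom: send the job to a randomly chosen site."""
    return rng.choice(sites)


def job_least_loaded(sites: list[Site]) -> Site:
    """JobLeastLoaded: send the job to the site with the fewest queued jobs."""
    return min(sites, key=lambda s: s.queued_jobs)


def job_data_present(sites: list[Site], dataset: str) -> Site:
    """JobDataPresent: prefer sites that already hold the dataset.
    Falling back to all sites when none holds it is an assumption."""
    holders = [s for s in sites if dataset in s.datasets]
    return job_least_loaded(holders or sites)


def job_local(sites: list[Site], origin: str) -> Site:
    """JobLocal: run the job at the site where it was submitted."""
    return next(s for s in sites if s.name == origin)
```

Passing the random generator in explicitly keeps `job_random` reproducible when a seeded `random.Random` is supplied.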

Page 47: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Simulation Results

[Figures: Average Response Times; Average Data Transferred]

Page 48: Grid Computing Hakan ÜNLÜ CMPE 511 Presentation Fall 2004

Grid Computing

Thank You and Questions?