
Page 1: Introduction to  Grid Computing

Introduction to Grid Computing

Ann Chervenak and Ewa Deelman USC Information Sciences Institute


Outline
- Motivation
- Definition and characteristics of Grids
- Example Grid applications
- Grid Architecture
- How a Grid Is Assembled
- Overview of the Globus Toolkit
  - Security Tools
  - Monitoring and Discovery System
  - Computing/Execution Tools
  - Data Tools
- A more detailed example: The Earth System Grid


Motivation: Supporting Scientific Applications
- Computation intensive
  - Large-scale simulation and analysis (climate modeling, galaxy formation, gravity waves, event simulation)
  - Engineering (parameter studies, linked models)
- Data intensive
  - Experimental data analysis (high energy physics)
  - Image & sensor analysis (astronomy, climate)
- Distributed collaboration
  - Online instrumentation (microscopes, x-ray)
  - Remote visualization (climate studies, biology)
  - Engineering (large-scale structural testing)
- Large, complex scientific problems
  - Require people in several organizations to collaborate
  - Share computing resources, data, instruments


The Grid Problem
- Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”)
- Enable communities (“Virtual Organizations”) to share geographically distributed resources as they pursue common goals
- Assuming the absence of:
  - central location
  - central control
  - omniscience
  - existing trust relationships


An Old Idea …

“The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.” Fernando Corbato and Robert Fano, 1966

“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock, 1967


A Few Grid Application Examples


Earth System Grid Objectives
To support the infrastructural needs of the national and international climate community, ESG is providing crucial technology to securely access, monitor, catalog, transport, and distribute data in today’s Grid computing environment.

Figure: HPC hardware running climate models; ESG sites; ESG Portal

Slide Courtesy of Dave Bernholdt, ORNL


ESG Facts and Figures

ESG Portal at NCAR
- 130 TB of data at four locations
- 840,331 files
- Includes the past 6 years of joint DOE/NSF climate modeling experiments
- 3,200 registered users
- Downloads to date: 25 TB, 91,000 files

IPCC AR4 ESG Portal
- 28 TB of data at one location
- 68,400 files
- Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
- Model data from 11 countries
- 818 registered analysis projects
- Downloads to date: 123 TB, 543,500 files, 300 GB/day (average)
- 300 scientific papers published to date based on analysis of IPCC AR4 data

Figures: Worldwide ESG user base (map); IPCC Downloads (10/12/06), daily and 7-day average in GB/day, Nov 2004 – Oct 2006

Slide Courtesy of Dave Bernholdt, ORNL


NSF’s TeraGrid*
A National Science Foundation Investment in Cyberinfrastructure
- $100M 3-year construction (2001-2004)
- $150M 5-year operation & enhancement (2005-2009)

Map of TeraGrid sites: UCSD, UT, UC/ANL, NCSA, PSC, ORNL, PU, IU

TeraGrid DEEP: Integrating NSF’s most powerful computers (60+ TF)
- 2+ PB Online Data Storage
- National data visualization facilities
- World’s most powerful network (national footprint)

TeraGrid WIDE Science Gateways: Engaging Scientific Communities
- 90+ Community Data Collections
- Growing set of community partnerships spanning the science community
- Leveraging NSF ITR, NIH, DOE and other science community projects
- Engaging peer Grid projects such as Open Science Grid in the U.S., as well as peer Grids in Europe and Asia-Pacific

Base TeraGrid Cyberinfrastructure: Persistent, Reliable, National
- Coordinated distributed computing and information environment
- Coherent User Outreach, Training, and Support
- Common, open infrastructure services

* Slide courtesy of Ray Bair, Argonne National Laboratory


Data Grids for High Energy Physics
Image courtesy Harvey Newman, Caltech

Figure: tiered data grid
- Online System, fed at ~PBytes/sec
- Tier 0: CERN Computer Centre, with an Offline Processor Farm (~20 TIPS)
- Tier 1: regional centres (FermiLab ~4 TIPS; France, Italy, and Germany Regional Centres)
- Tier 2: Tier2 Centres (~1 TIPS each), e.g. Caltech (~1 TIPS)
- Institutes (~0.25 TIPS each), each with a physics data cache
- Tier 4: physicist workstations
- Link rates shown range from ~100 MBytes/sec near the online system, through ~622 Mbits/sec between tiers (one long-haul link marked “~622 Mbits/sec or Air Freight (deprecated)”), down to ~1 MBytes/sec at physicist workstations

There is a “bunch crossing” every 25 nsecs. There are 100 “triggers” per second. Each triggered event is ~1 MByte in size.

Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.

1 TIPS is approximately 25,000 SpecInt95 equivalents.
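The figures on this slide fit together with simple arithmetic. This short Python sketch uses only numbers quoted on the slide. A crossing every 25 ns is a rate of 40 million crossings per second. Keeping 100 triggers per second therefore means roughly one crossing in 400,000 is kept. And 100 events per second at ~1 MByte each gives about 100 MBytes/sec. The constant and function names are our own.

```python
# Back-of-the-envelope checks using only figures quoted on the slide.
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing every 25 ns
TRIGGER_RATE_HZ = 100             # ~100 triggered events kept per second
EVENT_SIZE_MB = 1.0               # ~1 MByte per triggered event
SPECINT95_PER_TIPS = 25_000       # 1 TIPS ~ 25,000 SpecInt95 equivalents

# 1e9 ns per second / 25 ns per crossing = 40 million crossings per second.
crossing_rate_hz = 1e9 / BUNCH_CROSSING_INTERVAL_NS
# How many crossings occur for every event the trigger keeps.
rejection_factor = crossing_rate_hz / TRIGGER_RATE_HZ
# Data rate leaving the trigger, in MBytes/sec.
recorded_mb_per_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB


def tips_to_specint95(tips: float) -> float:
    """Convert a capacity quoted in TIPS into SpecInt95 equivalents."""
    return tips * SPECINT95_PER_TIPS


if __name__ == "__main__":
    print(f"crossing rate:    {crossing_rate_hz:,.0f} per second")
    print(f"rejection factor: {rejection_factor:,.0f}")
    print(f"recorded rate:    {recorded_mb_per_s:.0f} MBytes/sec")
    print(f"~20 TIPS farm:    {tips_to_specint95(20):,.0f} SpecInt95")
```

Running the script prints a crossing rate of 40,000,000 per second, a rejection factor of 400,000, and a recorded rate of 100 MBytes/sec, in line with the ~100 MBytes/sec figure on the slide. The Tier 0 farm's ~20 TIPS works out to about 500,000 SpecInt95.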


Elements of a Grid
- Resource sharing
  - Computers, storage systems, sensors, networks, …
  - This sharing is always conditional: issues of trust, policy, negotiation, payment, etc.
- Coordinated problem solving
  - Distributed data analysis, computation, simulation, collaboration, …
- Dynamic, multi-institutional virtual organizations
  - Community overlays on classic organizational structures
  - May be large or small, static or dynamic


Two Rules or Principles of the Grid
- Can’t rely on homogeneity of resources
  - In practice, resources in a large, distributed environment will be heterogeneous
  - STRATEGY: Plan for diverse systems and use mechanisms to manage heterogeneity
- Can’t rely on trust among participants
  - Sites will not be willing to share their resources if they cannot trust clients from other sites
  - STRATEGY: Provide a security model that can express complicated social networks
  - STRATEGY: Use full disclosure when making requests (who is requesting, authorizing, and authenticating the request) and give service owners tools to enforce local policies
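The trust strategy combines full disclosure with local enforcement. It can be sketched as a request record that names who is requesting, who authenticated the requester, and who authorized the request, checked against a policy the site owner controls. This is a minimal Python sketch. `GridRequest`, `LocalPolicy`, the CA and VO names, and the action strings are all hypothetical; none of them comes from a Grid toolkit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GridRequest:
    """A request that discloses everyone behind it (hypothetical structure)."""
    requester: str         # who is requesting, e.g. a certificate subject name
    authenticated_by: str  # which authority vouches for the requester's identity
    authorized_by: str     # which virtual organization authorized the request
    action: str            # what is being asked for, e.g. "read_file"


@dataclass
class LocalPolicy:
    """Rules a site owner enforces locally, regardless of what the VO allows."""
    trusted_authorities: set[str] = field(default_factory=set)
    trusted_vos: set[str] = field(default_factory=set)
    allowed_actions: set[str] = field(default_factory=set)
    banned_requesters: set[str] = field(default_factory=set)

    def permits(self, req: GridRequest) -> bool:
        """Grant only if every disclosed party is acceptable to this site."""
        return (
            req.authenticated_by in self.trusted_authorities
            and req.authorized_by in self.trusted_vos
            and req.action in self.allowed_actions
            and req.requester not in self.banned_requesters
        )


site = LocalPolicy(
    trusted_authorities={"Example Grid CA"},
    trusted_vos={"climate-vo"},
    allowed_actions={"read_file"},
)
req = GridRequest("/O=Grid/CN=Jane Doe", "Example Grid CA", "climate-vo", "read_file")
print(site.permits(req))  # True
```

The VO can vouch for a user, but it does not decide on the site's behalf: the owner can still refuse an action, an untrusted authority, or a specific requester.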


Grid Infrastructure
- Provides distributed management
  - Of physical resources
  - Of software services
  - Of communities and their policies
- Unified treatment
  - Build on Web Services framework
  - Use Web Services Resource Framework (WS-RF), Web Services Notification (WS-Notification), etc. to represent and access state associated with a service
- Common management abstractions & interfaces
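To make “represent and access state associated with a service” concrete, here is a minimal Python sketch of a stateful resource. Its state is exposed as named properties, and subscribers are notified when a property changes. The sketch only mirrors the ideas behind WS-RF resource properties and WS-Notification subscriptions. The class and method names are hypothetical, and no SOAP, XML, or real Web Services machinery is involved.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable

# Listener signature: (property name, old value, new value) -> None
Listener = Callable[[str, Any, Any], None]


class StatefulResource:
    """Service state exposed as named properties, with change notification.

    Loosely modeled on WS-RF resource properties and WS-Notification
    subscriptions; this is not either specification's API.
    """

    def __init__(self, **properties: Any) -> None:
        self._props: dict[str, Any] = dict(properties)
        self._subscribers: dict[str, list[Listener]] = defaultdict(list)

    def get_property(self, name: str) -> Any:
        return self._props[name]

    def set_property(self, name: str, value: Any) -> None:
        old = self._props.get(name)
        self._props[name] = value
        if old != value:  # notify subscribers only on a real change
            for listener in self._subscribers[name]:
                listener(name, old, value)

    def subscribe(self, name: str, listener: Listener) -> None:
        self._subscribers[name].append(listener)


job = StatefulResource(status="Pending", host="cluster.example.org")
events = []
job.subscribe("status", lambda name, old, new: events.append((old, new)))
job.set_property("status", "Active")
job.set_property("status", "Active")  # unchanged, so no notification
job.set_property("status", "Done")
print(events)  # [('Pending', 'Active'), ('Active', 'Done')]
```

Clients can poll with `get_property` or subscribe and be told about changes. That is the same pull/push split the two specifications address.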


Elements of the End-to-End Problem Include …
- Massively parallel petascale simulation
- High-performance parallel I/O
- Remote visualization
- High-speed reliable data movement
- Terascale local analysis
- Data access and analysis by external users
- Troubleshooting problems in end-to-end system
- Security
- Orchestration of these various activities

Slide Courtesy of Ian Foster


Layered Grid Architecture


Layered Grid Architecture (By Analogy to Internet Architecture)
- Application
- Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
- Resource: “Sharing single resources”: negotiating access, controlling use
- Connectivity: “Talking to things”: communication (Internet protocols) & security
- Fabric: “Controlling things locally”: access to, & control of, resources

Internet Protocol Architecture (shown alongside): Application, Transport, Internet, Link
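The slide draws the Grid stack next to the Internet stack. This Python sketch encodes the analogy as an enum. It assumes the alignment in the figure from “The Anatomy of the Grid,” as we read it: the Internet Application layer sits alongside the Grid Application, Collective, and Resource layers; Transport and Internet sit alongside Connectivity; and Link sits alongside Fabric. The role strings come from the slide. The mapping details and all names are our illustration.

```python
from enum import Enum


class GridLayer(Enum):
    """Grid layers, top to bottom: (role, Internet layers drawn alongside).

    The Internet alignment is an assumed reading of the Anatomy figure.
    """
    APPLICATION = ("Applications", ("Application",))
    COLLECTIVE = ("Coordinating multiple resources", ("Application",))
    RESOURCE = ("Sharing single resources", ("Application",))
    CONNECTIVITY = ("Talking to things", ("Transport", "Internet"))
    FABRIC = ("Controlling things locally", ("Link",))

    def __init__(self, role: str, internet_layers: tuple) -> None:
        # Enum unpacks each tuple value into these arguments.
        self.role = role
        self.internet_layers = internet_layers


def grid_layers_for(internet_layer: str) -> list:
    """Grid layers (top to bottom) drawn alongside a given Internet layer."""
    return [layer for layer in GridLayer if internet_layer in layer.internet_layers]


for layer in GridLayer:
    print(f"{layer.name:<12} {layer.role:<34} ~ {', '.join(layer.internet_layers)}")
```

As the “Important Points” slide stresses, this layering is conceptual. The enum records the analogy only and does not restrict which layer may call which.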


Protocols, Services, and APIs Occur at Each Level
Stack, from top to bottom:
- Applications
- Languages/Frameworks
- Collective Service APIs and SDKs
- Collective Service Protocols; Collective Services
- Resource APIs and SDKs
- Resource Service Protocols; Resource Services
- Connectivity APIs
- Connectivity Protocols
- Local Access APIs and Protocols
- Fabric Layer


Important Points
- Built on Internet protocols & services
  - Communication, routing, name resolution, etc.
- “Layering” here is conceptual and does not imply constraints on who can call what
  - Protocols/services/APIs/SDKs will, ideally, be largely self-contained
  - Some things are fundamental: e.g., communication and security
  - But it is advantageous for higher-level functions to use common lower-level functions


The Hourglass Model
- Focus on architecture issues
  - Propose a set of core services as basic infrastructure
  - Use them to construct high-level, domain-specific solutions
- Design principles
  - Keep participation cost low
  - Enable local control
  - Support for adaptation
  - “IP hourglass” model

Figure: hourglass with Applications and diverse global services at the top, core services at the narrow neck, and the local OS at the base



Connectivity Layer Protocols & Services
- Communication protocols
  - Internet protocols: IP, DNS, routing, etc.
- Security protocols and infrastructure
  - Uniform authentication, authorization, and message protection mechanisms in a multi-institutional setting
  - Single sign-on, delegation, identity mapping
  - E.g., public key technology, SSL, X.509, GSS-API
  - Supporting infrastructure: Certificate Authorities, certificate & key management, …

GSI: www.gridforum.org/security
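Identity mapping is a concrete piece of connectivity-layer security. A Grid-wide identity, the subject name of an X.509 certificate, must be mapped to an account that each site understands. In the Globus Toolkit this is done with a grid-mapfile (conventionally /etc/grid-security/grid-mapfile), where each line pairs a quoted subject DN with one or more local account names. This Python sketch parses that style of file. It is a minimal illustration, not Globus code, and the DNs and account names are made up.

```python
from __future__ import annotations

import shlex


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Parse grid-mapfile style text into {subject DN: [local accounts]}.

    Each non-comment line holds a quoted certificate subject name followed
    by one or more comma-separated local account names.
    """
    mapping: dict[str, list[str]] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        tokens = shlex.split(line)  # honours the double quotes around the DN
        if len(tokens) < 2:
            raise ValueError(f"line {lineno}: expected a quoted DN and an account")
        dn = tokens[0]
        accounts = [a.strip() for a in " ".join(tokens[1:]).split(",") if a.strip()]
        mapping[dn] = accounts
    return mapping


def map_identity(mapping: dict[str, list[str]], dn: str) -> str | None:
    """Return the first (default) local account for a DN, or None if unmapped."""
    accounts = mapping.get(dn)
    return accounts[0] if accounts else None


SAMPLE = '''
# Hypothetical entries for illustration only.
"/O=Grid/OU=Example/CN=Jane Doe" jdoe
"/O=Grid/OU=Example/CN=Raj Patel" rpatel,climate
'''
users = parse_grid_mapfile(SAMPLE)
print(map_identity(users, "/O=Grid/OU=Example/CN=Raj Patel"))  # rpatel
print(map_identity(users, "/O=Grid/OU=Example/CN=Nobody"))     # None
```

Each site keeps its own mapping. This is one way the “local control” principle shows up in practice: a user with a valid Grid identity still gets no local account unless the site lists them.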


Resource Layer Protocols & Services
- Job submission and management tools
  - Remote allocation, advance reservation, control of compute resources
- Data transport tools
  - High-performance data access & transport
- Information provider
  - Collects information about the current state of a resource and makes it available to higher-level services
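Job submission tools at the resource layer must hide the fact that sites run different local schedulers. This is the heterogeneity principle from earlier in the talk, and in the Globus Toolkit, GRAM fills this role. This Python sketch shows only the adapter idea: one scheduler-neutral job description translated into each site's native submit command. `JobSpec`, the adapter classes, and the site names are hypothetical. The code builds argument lists rather than running anything.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class JobSpec:
    """Scheduler-neutral job description (hypothetical fields)."""
    script: str            # batch script already staged on the target resource
    queue: str = "batch"   # queue or partition name on that resource


class LocalScheduler(ABC):
    """Adapter that turns a JobSpec into one site's native submit command."""

    @abstractmethod
    def submit_argv(self, job: JobSpec) -> list[str]:
        ...


class PBSScheduler(LocalScheduler):
    def submit_argv(self, job: JobSpec) -> list[str]:
        return ["qsub", "-q", job.queue, job.script]


class SlurmScheduler(LocalScheduler):
    def submit_argv(self, job: JobSpec) -> list[str]:
        return ["sbatch", "--partition", job.queue, job.script]


# Which local scheduler each (hypothetical) resource runs.
SITES: dict[str, LocalScheduler] = {
    "cluster-a.example.org": PBSScheduler(),
    "cluster-b.example.org": SlurmScheduler(),
}


def native_submit_command(site: str, job: JobSpec) -> list[str]:
    """One call for every site; the adapter hides the local differences."""
    return SITES[site].submit_argv(job)


job = JobSpec("run_model.sh", queue="long")
for site in SITES:
    print(site, native_submit_command(site, job))
```

Adding another kind of site means writing one more adapter. Callers of `native_submit_command` do not change.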


Collective Layer Protocols & Services
- Information services
  - Aggregate and publish information about resource characteristics
  - Monitor current status of resources
- Resource brokers
  - Resource discovery and allocation
- Metadata and replica catalogs
- Data management services (e.g., replication)
- Co-reservation and co-allocation services
- Workflow management services
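A resource broker combines two collective-layer jobs named on this slide. The first is discovery: filtering the resource descriptions an information service publishes. The second is allocation: choosing among the matches. This Python sketch shows one simple policy under stated assumptions. The `ResourceInfo` fields and the “shortest queue, then most free CPUs” ranking are hypothetical choices, not how any particular broker works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceInfo:
    """One resource as an information service might publish it (hypothetical fields)."""
    name: str
    free_cpus: int
    memory_gb: int
    queue_length: int


def broker(catalog: list[ResourceInfo], cpus: int, memory_gb: int,
           count: int = 1) -> list[ResourceInfo]:
    """Discovery: keep resources that meet the requirements.
    Allocation: prefer the shortest queue, then the most free CPUs."""
    candidates = [r for r in catalog
                  if r.free_cpus >= cpus and r.memory_gb >= memory_gb]
    candidates.sort(key=lambda r: (r.queue_length, -r.free_cpus))
    return candidates[:count]


catalog = [
    ResourceInfo("site-a", free_cpus=64, memory_gb=128, queue_length=5),
    ResourceInfo("site-b", free_cpus=16, memory_gb=64, queue_length=0),
    ResourceInfo("site-c", free_cpus=128, memory_gb=256, queue_length=2),
]
print([r.name for r in broker(catalog, cpus=32, memory_gb=64, count=2)])  # ['site-c', 'site-a']
```

The ranking key is the piece a real broker would make pluggable. Cost, reservations, or co-allocation constraints would replace the simple tuple used here.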


Example: High-Throughput Computing System
- App: High Throughput Computing System
- Collective (App): Dynamic checkpoint, job management, failover, staging
- Collective (Generic): Brokering, certificate authorities
- Resource: Access to data, access to computers, access to network performance data
- Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, schedulers
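Of the application-specific collective services listed for high-throughput computing, failover is the easiest to sketch. A job is tried on one site; if submission fails after a few retries, it moves on to the next site. This minimal Python sketch assumes a caller-supplied `submit` function that raises `SubmissionError` on failure. All names are hypothetical, and checkpointing and staging are not shown.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class SubmissionError(Exception):
    """Raised by a submit function when a site rejects or loses a job."""


class AllSitesFailed(RuntimeError):
    """Raised when every candidate site has been tried without success."""


def run_with_failover(
    job_id: str,
    sites: Sequence[str],
    submit: Callable[[str, str], T],
    attempts_per_site: int = 2,
) -> tuple[str, T]:
    """Try each site in order, retrying a few times before failing over."""
    errors: list[str] = []
    for site in sites:
        for attempt in range(1, attempts_per_site + 1):
            try:
                return site, submit(site, job_id)
            except SubmissionError as exc:
                errors.append(f"{site} (attempt {attempt}): {exc}")
    raise AllSitesFailed("; ".join(errors))


def fake_submit(site: str, job_id: str) -> str:
    """Stand-in for a real submission call: site-a always fails."""
    if site == "site-a":
        raise SubmissionError("queue full")
    return f"{job_id}@{site}"


print(run_with_failover("job-42", ["site-a", "site-b"], fake_submit))  # ('site-b', 'job-42@site-b')
```

Only `SubmissionError` triggers a retry. Other exceptions, such as programming errors, propagate immediately instead of being masked by failover.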


Example: Grid Services for Data-Intensive Applications
- App: Discipline-Specific Data Grid Application
- Collective (App): Coherency control, replica selection, task management, data placement services, …
- Collective (Generic): Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs, …
- Resource: Access to data, access to computers, access to network performance data, …
- Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
- Fabric: Storage systems, clusters, networks, network caches, …
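Replica catalogs and replica selection work together. The catalog maps a logical file name to the physical copies that exist, and selection picks one copy using information such as network performance data from the resource layer. This Python sketch assumes a simple in-memory catalog and a dictionary of measured bandwidths per host. Its API is hypothetical; in the Globus Toolkit, the Replica Location Service plays the catalog role. The `gsiftp://` URLs are example GridFTP addresses.

```python
from __future__ import annotations

from urllib.parse import urlparse


class ReplicaCatalog:
    """Maps logical file names (LFNs) to physical replica URLs (hypothetical API)."""

    def __init__(self) -> None:
        self._replicas: dict[str, set[str]] = {}

    def register(self, lfn: str, pfn: str) -> None:
        self._replicas.setdefault(lfn, set()).add(pfn)

    def unregister(self, lfn: str, pfn: str) -> None:
        self._replicas.get(lfn, set()).discard(pfn)

    def lookup(self, lfn: str) -> list[str]:
        return sorted(self._replicas.get(lfn, set()))


def select_replica(catalog: ReplicaCatalog, lfn: str,
                   bandwidth_mbps: dict[str, float]) -> str | None:
    """Pick the replica whose host has the best measured bandwidth to us.
    Hosts with no measurement count as 0; returns None if there is no replica."""
    replicas = catalog.lookup(lfn)
    if not replicas:
        return None
    return max(replicas,
               key=lambda pfn: bandwidth_mbps.get(urlparse(pfn).hostname or "", 0.0))


rc = ReplicaCatalog()
rc.register("climate/run1.nc", "gsiftp://storage-a.example.org/data/run1.nc")
rc.register("climate/run1.nc", "gsiftp://storage-b.example.org/archive/run1.nc")
bw = {"storage-a.example.org": 100.0, "storage-b.example.org": 400.0}
print(select_replica(rc, "climate/run1.nc", bw))  # gsiftp://storage-b.example.org/archive/run1.nc
```

The catalog (generic collective service) and the selection policy (application-specific collective service) are kept separate, mirroring the split on the slide.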