Grid Computing
Hakan ÜNLÜ
CMPE 511 Presentation
Fall 2004
Overview
General Introduction to Grid Computing
  - Introduction: Why Grids?
  - Applications for Grids
  - Basic Grid Architecture
  - Grid Platforms & Standards
Issues in Grid Computing
  - Hardware: Blade Computers
  - System Management: Globus Toolkit
  - Software: Scheduling
What is Grid Computing?
- Computational and networking infrastructure designed to provide pervasive, uniform and reliable access to data, computational and human resources distributed over wide area environments
Grids Are By Definition Heterogeneous
- It’s about legacy resources, infrastructure, applications, policies, and procedures
- The grid and its administrators must integrate in stealth mode with:
  - Firewalls
  - Filesystems
  - Queuing systems
  - Grumpy systems administrators
  - Tried and true applications
A Grid Example
Challenges in Grid Computing
- Reliable performance
- Trust relationships between multiple security domains
- Deployment and maintenance of grid middleware across hundreds or thousands of nodes
- Access to data across WANs
- Access to state information of remote processes
- Workflow / dependency management
- Distributed software and license management
- Accounting and billing
Applications for a Grid
- Generally, apps that work well on clusters can work well on grids
- Non-interactive / batch jobs
- Parallel computations with minimal interprocess communication and workflow dependencies
- Reasonable data transfer requirements
- Sensible economics
  - Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
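The "sensible economics" test above is a single inequality. A minimal sketch of it follows; the function name and all figures are hypothetical, chosen only to show the comparison.

```python
def grid_is_worthwhile(productivity_gains, build_cost, opportunity_cost):
    """True when gains exceed the cost of building the grid plus the
    opportunity cost of the resources it ties up."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical figures in arbitrary currency units, for illustration only.
print(grid_is_worthwhile(500_000, 300_000, 120_000))  # gains 500000 vs. costs 420000
print(grid_is_worthwhile(500_000, 300_000, 250_000))  # gains 500000 vs. costs 550000
```

The hard part in practice is estimating the three inputs, not evaluating the comparison.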
Non-Interactive / Batch Jobs
- Difficult to get a real-time UI for jobs running on the grid
  - A possible interactive application: spreadsheet computation
- Want to take advantage of off-peak free cycles
- Jobs run for several days, weeks or months
  - The user might prefer to be sleeping while the job runs!
- Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  - Idle thread / “screensaver” computing
Seti@Home
Parallel Computations
- Application needs to be able to run as multiple, mostly independent pieces
  - Can’t depend on the network’s Quality of Service
  - Can’t rely upon the order of execution and completion
- Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
  - Grid can still be useful as a meta-scheduler and data source for such apps
  - e.g. the user submits the job to the grid queue and asks for the best available SMP resource
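A minimal sketch of a grid-friendly job shape: the work splits into independent pieces, and the result does not depend on the order in which pieces finish. Local threads stand in for grid nodes here, and `work_unit` and `run_independent` are hypothetical names, not part of any grid toolkit.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def work_unit(chunk):
    """One independent piece: needs no data from any other piece."""
    return sum(x * x for x in chunk)


def run_independent(chunks, workers=4):
    """Run every chunk separately and combine results as they finish."""
    total = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work_unit, c) for c in chunks]
        for done in as_completed(futures):  # completion order is not guaranteed
            total += done.result()          # addition does not care about order
    return total


chunks = [range(0, 250), range(250, 500), range(500, 750), range(750, 1000)]
print(run_independent(chunks))
```

If the combining step did depend on arrival order, or pieces had to exchange data mid-run, the job would fall into the tightly coupled category described above.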
Some Costs and Benefits
Costs:
- Grid Middleware Architects and Developers
- User Training
- Infrastructure Hardware
- Opportunity Costs
  - Would a big SMP box return better results for your problem?
Benefits:
- Better Utilization of Existing Capital Resources
- More Efficient Users
  - Ability to complete more work in the same amount of time
- Performance near or sometimes as good as the big SMP box
Basic Grid Architecture
- Clusters and how grids are different from clusters
- Departmental Grid Model
- Enterprise Grid Model
- Global Grid Model
What Makes a Cluster a Cluster?
- Uses a Distributed Resource Manager (DRM) to manage job scheduling
- Tightly coupled: high speed, low latency interconnect network
- Fairly homogeneous: configuration management is important!
- Single administrative domain
The Cluster Model
[Figure: a master node (User Interface/API, Cluster DRM) and four cluster nodes, each with Cluster DRM, Operating System, Storage and Compute, joined by a High Speed Interconnect, with Shared Storage and Configuration Management.]
How is an Enterprise Grid Different from a Cluster?
- Heterogeneous: clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
- Lightly coupled: connected via 100 or 1000 Mbps Ethernet
- Introduces a resource registry and grid security service
  - But usually only a single registry and security service for the grid
- Not necessarily a single administrative domain
The Enterprise Grid Model
[Figure: clusters (nodes with Cluster Interface, Operating System, Storage and Compute, fronted by a Cluster DRM and a Grid Interface), SMP systems with their own Grid Interface, and a User Interface/API, all connected over an Enterprise LAN or WAN that also hosts the Security Infrastructure and the Resource Registry.]
How is a Global Grid Different from an Enterprise Grid?
- "Grid of Grids": collection of enterprise grids
- Loosely coupled between sites: not much control over Quality of Service
- Mutually distrustful administrative domains
- Multiple grid resource registries and grid security services
The Global Grid Model
[Figure: Sites A, B and C connected over a WAN. Each site has its own LAN, clusters and SMP systems behind Grid interfaces, a UI/API, and its own resource registry (RR) and security infrastructure (SI).]
Grid Platforms & Standards
- The Global Grid Forum
  - http://www.gridforum.org/
- Globus Toolkit
- DCML (Data Center Markup Language)
Globus Toolkit V2 “Pillars”
- Information Services (MDS)
- Data Management (GASS)
- Resource Management (GRAM)
All three rest on the Grid Security Infrastructure (GSI)
Globus Toolkit V2 Stack
[Figure: protocol stack, top to bottom]
- MDS, GRAM, GASS/GridFTP
- GSI
- HTTP, LDAP, FTP
- TLS/SSL
- TCP/IP
Globus Toolkit V2 Key Components: GRAM, MDS and GASS
- Grid Resource Allocation Manager (GRAM)
  - Server-side: “gatekeeper” process that controls execution of job managers
  - Client-side: “globusrun” UI to launch jobs
- Monitoring and Directory Service (MDS)
  - GRIS: Grid Resource Information Service collects local info
  - GIIS: Grid Index Information Service collects GRIS info
- Global Access to Secondary Storage (GASS)
  - GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  - Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
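As a small illustration of the URI form above, the sketch below splits the slide's example `gsiftp://` address into its parts with Python's standard `urllib.parse`. It then builds, without running, a `globus-url-copy` command line in the usual source-URL, destination-URL form. The local destination path is a hypothetical placeholder.

```python
from urllib.parse import urlparse

uri = "gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt"  # example from the slide


def describe(uri):
    """Split a GridFTP URI into protocol, host and remote path."""
    parts = urlparse(uri)
    return {"protocol": parts.scheme, "host": parts.hostname, "path": parts.path}


info = describe(uri)
print(info)

# Command line only; nothing is executed here.
# The destination below is a hypothetical local path.
command = ["globus-url-copy", uri, "file:///tmp/ecoli.nt"]
print(" ".join(command))
```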
Globus Toolkit V2 Additional Components
- Grid Packaging Tools (GPT)
  - Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
- MPICH-G2
  - A Globus V2 enabled version of MPI (Message Passing Interface)
  - Based on MPICH
  - Utilizes GSI, MDS and GRAM
Globus Toolkit V2 Network Services
[Figure: a Certificate Authority, a GIIS Server, a Client Node (GRAM Client) and four Grid Nodes, each running GRIS, gatekeeper and in.ftpd, all attached to the Network.]
GRAM, MDS and GASS Interactions
[Figure: the client (user / proxy) talks to three services.]
- GRAM (gatekeeper, job manager, processes, resources): job allocation and job management over RSL/DUROC/HTTP 1.1
- MDS (GRIS, GIIS, resources): resource discovery over LDAP
- GASS (GridFTP, in.ftpd): data transfer and data control over gsiftp
Globus Toolkit V2 Strengths and Weaknesses
Strengths:
- Mindshare and collaboration in both industry & academia
- Open source
- Standards-based underpinnings (e.g. SSL, LDAP)
- Flexibility and CoG APIs
- Driving OGSA with heavy resource commitment from IBM
Weaknesses:
- Significant effort required to get applications working on a grid
- Not production quality at this time
- No “metascheduler”: user has to explicitly tell their jobs where to run
Issues in Grid Computing
Hardware: Blades
Hardware Trends
- HW trends that enable grids and distributed processing
  - There is a lot of idle computing power
  - Computers are now better connected
  - There are many different brands and configurations in any environment
- Distributed computing in turn gives rise to new HW architectures
  - Blade Computers
What is a blade?
- Inclusive chassis-based modular computing system that includes processors, memory, network interface cards and local storage on a single board.
[Figures: Blade; Blade Chassis & Blades; Blade Farm]
Anatomy of a blade
How far can it go?
Advantages & Disadvantages
Advantages:
- Low Cost (power, heat, data center space)
- Physical Server Consolidation (save space, eliminate cables)
- High Availability
- Integrated Systems Management
Disadvantages:
- Not suitable in small numbers
- Need for standardization (for network connection and management)
Blades & Grid
- Each blade is a server that can run jobs.
- Blades can be used to form clusters or grids.
- With efficient management, different configurations of blades can be used in a single grid computer.
  - Easy to expand
  - Protects investment
Issues in Grid Computing
System Management: Globus Toolkit
Issues in Grid Computing
Software: Scheduling
Superscheduling
- Superscheduling means scheduling resources in multiple administrative domains.
- Various models
  - Submitting a job to a specific single machine
  - Submitting a job to single machines at multiple sites (with cancellation option)
  - Scheduling a single job to use multiple resources
- Most common superscheduler: USERS
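The second model above (submit to several sites, keep whichever copy starts first, cancel the rest) can be sketched as a small simulation. The site names and queue waits are hypothetical, and a real superscheduler would observe start events rather than know waits in advance.

```python
def submit_everywhere(job, queue_wait):
    """Simulate submitting one job to several sites and cancelling the losers.

    queue_wait maps a site name to its (hypothetical) queue wait in minutes.
    The copy that would start first is kept; all other copies are cancelled.
    """
    if not queue_wait:
        raise ValueError("no sites to submit to")
    winner = min(queue_wait, key=queue_wait.get)
    cancelled = sorted(site for site in queue_wait if site != winner)
    return {"job": job, "runs_at": winner, "cancelled": cancelled}


result = submit_everywhere("blast-run", {"siteA": 45, "siteB": 5, "siteC": 20})
print(result)
```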
Phases Of Superscheduling
- Resource Discovery
  - Authorisation Filtering
  - Application Requirement Definition
  - Minimal Requirement Filtering
- System Selection
  - Gathering Information (Query)
  - Select Systems to run on
- Run the Job
  - Make an Advance Reservation (Optional)
  - Submit Job to Resources
  - Preparation Tasks
  - Monitor Progress
  - Job Completion
  - Completion Tasks
Source: Global Grid Forum, Scheduling Working Group, 10 Actions When Scheduling, Schopf, 2001
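The three phases above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the Schopf document's specification: the `Resource` fields, the CPU-count requirement and the lowest-load selection rule are hypothetical, and the optional advance reservation is omitted.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    authorized: bool  # may this user run here?
    cpus: int
    load: float       # 0.0 (idle) to 1.0 (saturated), as returned by a query


def resource_discovery(resources, min_cpus):
    """Phase 1: authorisation filtering, then minimal requirement filtering."""
    allowed = [r for r in resources if r.authorized]
    return [r for r in allowed if r.cpus >= min_cpus]


def system_selection(candidates):
    """Phase 2: pick a system from the gathered info (assumed rule: lowest load)."""
    return min(candidates, key=lambda r: r.load) if candidates else None


def run_job(job, resource):
    """Phase 3: list the ordered actions; a real scheduler would perform them."""
    return [f"submit {job} to {resource.name}", "preparation tasks",
            "monitor progress", "job completion", "completion tasks"]


def superschedule(job, resources, min_cpus):
    chosen = system_selection(resource_discovery(resources, min_cpus))
    return run_job(job, chosen) if chosen else []


pool = [Resource("cluster-a", True, 64, 0.7), Resource("smp-b", True, 16, 0.2),
        Resource("cluster-c", False, 128, 0.1), Resource("cluster-d", True, 32, 0.4)]
for step in superschedule("job-42", pool, min_cpus=32):
    print(step)
```

In this example cluster-c is dropped by authorisation filtering and smp-b by the minimal requirement, so selection only compares cluster-a and cluster-d.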
Scheduling Framework (Ranganathan & Foster 2003)
- External Scheduler
- Local Scheduler
- Dataset Scheduler
Scheduling And Replication Algorithms
- External Scheduler
  - JobRandom
  - JobLeastLoaded
  - JobDataPresent
  - JobLocal
- Dataset Scheduler
  - DataDoNothing: No active replication. Everything is on demand.
  - DataRandom: Popular datasets are replicated to random sites.
  - DataLeastLoaded: Popular datasets are sent to the least loaded sites.
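A minimal sketch of the policies listed above. The job policies are interpreted from their names, since the slide does not define them: random site, site with the fewest queued jobs, a site that already holds the dataset, and the submitting site. The `sites` dictionary layout and the fallback used when no site holds the data are assumptions made for this example.

```python
import random


def job_random(sites, rng):
    """JobRandom: any site, chosen at random."""
    return rng.choice(sorted(sites))


def job_least_loaded(sites):
    """JobLeastLoaded: the site with the fewest queued jobs."""
    return min(sorted(sites), key=lambda s: sites[s]["queued"])


def job_data_present(sites, dataset):
    """JobDataPresent: least loaded site among those holding the dataset.
    Falls back to the least loaded site overall (assumed behaviour)."""
    holders = {s: v for s, v in sites.items() if dataset in v["datasets"]}
    return job_least_loaded(holders if holders else sites)


def job_local(origin):
    """JobLocal: run where the job was submitted."""
    return origin


def data_do_nothing(sites):
    """DataDoNothing: no active replication target."""
    return None


def data_random(sites, rng):
    """DataRandom: replicate a popular dataset to a random site."""
    return rng.choice(sorted(sites))


def data_least_loaded(sites):
    """DataLeastLoaded: replicate a popular dataset to the least loaded site."""
    return job_least_loaded(sites)


sites = {
    "A": {"queued": 5, "datasets": {"d1"}},
    "B": {"queued": 1, "datasets": set()},
    "C": {"queued": 3, "datasets": {"d1"}},
}
rng = random.Random(0)  # seeded only to make the example repeatable
print(job_least_loaded(sites), job_data_present(sites, "d1"), job_local("A"))
print(job_random(sites, rng), data_random(sites, rng), data_least_loaded(sites))
```

The job and data policies are separate functions so that any job policy can be paired with any data policy.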
Simulation Results
[Figure panels: Average Response Times; Average Data Transferred]
Grid Computing
Thank You
and
Questions?