grid infrastructure
![Page 2: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/2.jpg)
What is it ?
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3
SERVERS
Clients
![Page 3: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/3.jpg)
IT all about IT
![Page 4: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/4.jpg)
Hardware utilization
![Page 5: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/5.jpg)
SOA & Web services
• Decompose processing into services
• Each service works independently
• Main components:
– Universal Description, Discovery and Integration (UDDI)
– Simple Object Access Protocol (SOAP)
– Web Services Description Language (WSDL)
• W3C standards (SOAP, WSDL); UDDI is an OASIS standard
![Page 6: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/6.jpg)
![Page 7: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/7.jpg)
![Page 8: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/8.jpg)
THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)
• Google grid
• Microsoft's live.com
• Yahoo!
• Amazon.com
• eBay
• Salesforce.com
Well, that's O(5) ;)
Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)
![Page 9: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/9.jpg)
Scaling
• Scale-up
– Add more resources within the system
– Does not require changes in the applications
– Limited extension
– Single point of failure
• Scale-out
– Add more systems
– Architecture dependent (needs code changes)
– Economical
• How to?
– Split the operation into groups
– Perform each group on a different machine
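The scale-out recipe above (split the operation into groups, perform each group elsewhere) can be sketched in Python. This is only an illustration under stated assumptions: threads stand in for separate machines, and the names `split_into_groups`, `process_group` and `scale_out` are hypothetical, not taken from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_groups(items, n_groups):
    """Split items into n_groups contiguous groups of near-equal size."""
    size, extra = divmod(len(items), n_groups)
    groups, start = [], 0
    for i in range(n_groups):
        # The first `extra` groups take one additional item each.
        end = start + size + (1 if i < extra else 0)
        groups.append(items[start:end])
        start = end
    return groups


def process_group(group):
    """Work done by one 'machine'; here simply a sum of squares."""
    return sum(x * x for x in group)


def scale_out(items, n_workers):
    """Process each group on its own worker and combine the partial results."""
    groups = split_into_groups(items, n_workers)
    # Assumption: threads stand in for separate machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_group, groups))
    return sum(partial_results)


if __name__ == "__main__":
    print(scale_out(list(range(1000)), n_workers=4))
```

The code change needed to split and recombine the work is the "architecture dependent" cost mentioned in the list above.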
![Page 10: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/10.jpg)
How fast can parallelization be?
• Let:
– α be the proportion of the process that cannot be parallelized
– P be the number of processors
– S be the system speedup
Amdahl's law:
S = 1 / (α + (1 − α) / P)
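As a worked illustration of the formula, here is a minimal Python sketch. The function name `amdahl_speedup` and the sample value α = 0.10 are illustrative assumptions, not part of the original slides.

```python
def amdahl_speedup(alpha: float, processors: int) -> float:
    """Speedup S = 1 / (alpha + (1 - alpha) / P) from Amdahl's law."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return 1.0 / (alpha + (1.0 - alpha) / processors)


if __name__ == "__main__":
    # Assumed example: 10% of the work cannot be parallelized.
    for p in (1, 2, 8, 64, 1024):
        print(p, round(amdahl_speedup(0.10, p), 2))
```

As P grows, S approaches 1 / α, so with 10% serial work the speedup stays below 10 no matter how many processors are added.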
![Page 11: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/11.jpg)
Cluster types
• High availability
– Active-Active
– Active-Passive
– Heartbeat
• Load balancing cluster
– Round robin (weighted/non-weighted)
– System status aware (session, CPU load, etc.)
• Compute cluster
– Queuing system (Condor, Hadoop, OpenPBS, LSF, etc.)
– Single system image (ScaleMP, SSI, Mosix, nomad, etc.)
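The round-robin policy listed under load balancing can be sketched as follows. This is a simplified illustration, assuming integer weights and hypothetical server names; real load balancers typically interleave the weighted slots more smoothly.

```python
import itertools


def weighted_round_robin(weights):
    """Return an endless iterator over server names, in proportion to weight."""
    # Expand each server into `weight` slots, then cycle over the slots.
    slots = [name for name, weight in weights.items() for _ in range(weight)]
    if not slots:
        raise ValueError("at least one server needs a positive weight")
    return itertools.cycle(slots)


if __name__ == "__main__":
    # Hypothetical servers: node-a receives three requests for each one sent to node-b.
    balancer = weighted_round_robin({"node-a": 3, "node-b": 1})
    print([next(balancer) for _ in range(8)])
```

Setting every weight to 1 gives the non-weighted variant.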
![Page 12: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/12.jpg)
Condor script
#################
# Sample script #
#################
Executable = /bin/hostname
when_to_transfer_output = ON_EXIT_OR_EVICT
Log = {file name}.log
Error = err.$(Process)
Output = out.$(Process)
Requirements = substr(Machine,0,4)=="dopp" && ARCH=="X86_64"
Arguments = +-u
notification = Complete
Universe = VANILLA
Queue 10
![Page 13: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/13.jpg)
From a single PC to a Grid
Farm of PCs
Enterprise grid: mutualization of resources in a company
Volunteer computing: CPU cycles made available by PC owners
Examples: Seti@home, Africa@home
Grid infrastructure: Internet + disk and storage resources + services for information management (data collection, transfer and analysis)
Example: EGEE
![Page 14: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/14.jpg)
Batch to On-Line scale
gLite
&
Globus
Dedicated resources
PBS Torque
Utility computing
(Condor)hadoop
![Page 15: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/15.jpg)
Key Cloud Services Attributes
• Off-site, third-party provider
• Access via Internet
• Minimal/no IT skills required to "implement"
• Provisioning: self-service requesting; near real-time deployment; dynamic & fine-grained scaling
• Fine-grained usage-based pricing model
• UI: browser and successors
• Web services APIs as system interface
• Shared resources/common versions
Source: IDC, Sep 2008
![Page 16: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/16.jpg)
What is “Grid”
![Page 17: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/17.jpg)
What is Grid Computing ?
There is no widely agreed definition
Foster & Kesselman:
• Computing resources are not administered centrally.
• Open standards are used.
• Non-trivial quality of service is achieved.
![Page 18: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/18.jpg)
Other definitions
• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)
• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements" (Buyya)
• "a service for sharing computer power and data storage capacity over the Internet." (CERN)
![Page 19: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/19.jpg)
Virtual Organization
• What's a VO?
– People in different organisations seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?
– Share data
– Pool computers
– Collaborate
• The initial vision: "The Grid"
• The present reality: Many "grids"
• Each grid is an infrastructure enabling one or more "virtual organisations" to share computing resources
Figure: Institutes A, B, C, D, E and F, with two virtual organisations (VO1 and VO2) spanning them.
![Page 20: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/20.jpg)
The Grid Metaphor
Figure labels: Grid middleware; visualising; workstation; mobile access; supercomputer, PC cluster; data storage, sensors, experiments; Internet, networks.
![Page 21: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/21.jpg)
Stand alone computer
Hardware
Operating system
Application
![Page 22: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/22.jpg)
Stand alone computer
Hardware
Operating system
Network stack
Application
![Page 23: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/23.jpg)
Stand alone computer
Hardware
Operating system
Network stack
Grid Middleware
Application
![Page 24: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/24.jpg)
Middleware components – The batch approach
Figure components: User interface, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Authorization & Authentication, Storage Element, Computing Element.
Figure flows: Job Submit Event, Job Query, Job Status, Input "sandbox", Input "sandbox" + Broker Info, Output "sandbox", Publish, SE & CE info, DataSets info.
![Page 25: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/25.jpg)
Figure labels: UI, Network Server, Job Controller, Workload Manager, Replica Location Server, Information Service, Computing Element, Storage Element, RB node, characteristics & status.
![Page 26: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/26.jpg)
Figure labels: UI, Network Server, Job Controller (CondorG), Workload Manager, Replica Location Server, Information Service, Computing Element, Storage Element, RB node, CE and SE characteristics & status.
Job status: submitted
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)
![Page 27: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/27.jpg)
Figure: WMS component diagram (as on the previous slides).
Job Description Language (JDL) to specify job characteristics and requirements
Job status: submitted
edg-job-submit myjob.jdl
myjob.jdl:
JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 7.3" && other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
![Page 28: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/28.jpg)
Figure: WMS component diagram (as on the previous slides), with RB storage holding the Input Sandbox files.
Job status: submitted → waiting
NS: network daemon responsible for accepting incoming requests
![Page 29: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/29.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); labels: Job, Workload Manager.
Job status: submitted → waiting
WM: acts to satisfy the request
![Page 30: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/30.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides), with the Match-Maker/Broker.
Job status: submitted → waiting
Match-Maker/Broker: Where must this job be executed?
![Page 31: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/31.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides), with the Match-Maker/Broker.
Job status: submitted → waiting
Matchmaker: responsible for finding the "best" CE for a job
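How a matchmaker might pick the "best" CE can be sketched in Python, mirroring the Requirements and Rank expressions of the JDL example earlier in these slides. This is not the actual WMS implementation: the function names and the CE records below are hypothetical, and only the Glue attribute names come from the JDL.

```python
def requirements(ce):
    """Mirror of the Requirements expression in the JDL example."""
    return (
        ce["GlueHostOperatingSystemName"] == "linux"
        and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
        and ce["GlueCEPolicyMaxCPUTime"] > 10000
    )


def rank(ce):
    """Mirror of Rank = other.GlueCEStateFreeCPUs (higher is better)."""
    return ce["GlueCEStateFreeCPUs"]


def match(computing_elements):
    """Return the best-ranked CE that satisfies the requirements, or None."""
    candidates = [ce for ce in computing_elements if requirements(ce)]
    return max(candidates, key=rank, default=None)


if __name__ == "__main__":
    # Hypothetical CE records; the names and numbers are made up.
    ces = [
        {"name": "ce-1", "GlueHostOperatingSystemName": "linux",
         "GlueHostOperatingSystemRelease": "Red Hat 7.3",
         "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
        {"name": "ce-2", "GlueHostOperatingSystemName": "linux",
         "GlueHostOperatingSystemRelease": "Red Hat 7.3",
         "GlueCEPolicyMaxCPUTime": 50000, "GlueCEStateFreeCPUs": 12},
        {"name": "ce-3", "GlueHostOperatingSystemName": "linux",
         "GlueHostOperatingSystemRelease": "Red Hat 7.3",
         "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 64},
    ]
    best = match(ces)
    print(best["name"] if best else "no matching CE")
```

Filtering first and ranking second means a CE with many free CPUs is still rejected if it fails a hard requirement, as ce-3 does here on maximum CPU time.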
![Page 32: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/32.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides), with the Match-Maker/Broker.
Job status: submitted → waiting
Match-Maker/Broker: Where (on which SEs) is the needed data? What is the status of the Grid?
![Page 33: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/33.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides), with the Match-Maker/Broker.
Job status: submitted → waiting
Match-Maker/Broker: CE choice
![Page 34: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/34.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides), with the Job Adapter.
Job status: submitted → waiting
Job Adapter: responsible for the final "touches" to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
![Page 35: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/35.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); label: Job.
Job status: submitted → waiting → ready
Job Controller: responsible for the actual job management operations (done via CondorG)
![Page 36: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/36.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); label: Job.
Job status: submitted → waiting → ready → scheduled
![Page 37: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/37.jpg)
“Compute element” – reminder!
Homogeneous set of worker nodes
Grid gate node
Local resource management system: Condor / PBS / LSF master
Globus gatekeeper
Job request
Info system
Logging
gridmapfile
I.S.
Logging
![Page 38: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/38.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); labels: Job, Input Sandbox files, "Grid enabled" data transfers/accesses.
Job status: submitted → waiting → ready → scheduled → running
![Page 39: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/39.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); label: Output Sandbox files.
Job status: submitted → waiting → ready → scheduled → running → done
![Page 40: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/40.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides).
Job status: submitted → waiting → ready → scheduled → running → done
edg-job-get-output <dg-job-id>
![Page 41: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/41.jpg)
Job submission
Figure: WMS component diagram (as on the previous slides); label: Output Sandbox files.
Job status: submitted → waiting → ready → scheduled → running → done → cleared
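The job status sequence shown across the job submission slides can be summarized as a simple linear state machine. The Python sketch below is only an illustration: the `Job` class is hypothetical, and it covers just the states that appear on these slides (no failure or cancellation states).

```python
# States in the order they appear on the job submission slides.
JOB_STATES = ["submitted", "waiting", "ready", "scheduled",
              "running", "done", "cleared"]


class Job:
    """Tracks a job's status along the success path only."""

    def __init__(self):
        self.history = [JOB_STATES[0]]

    @property
    def status(self):
        return self.history[-1]

    def advance(self):
        """Move to the next state; a cleared job cannot advance further."""
        index = JOB_STATES.index(self.status)
        if index == len(JOB_STATES) - 1:
            raise RuntimeError("job is already cleared")
        self.history.append(JOB_STATES[index + 1])
        return self.status


if __name__ == "__main__":
    job = Job()
    while job.status != "cleared":
        job.advance()
    print(" -> ".join(job.history))
```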
![Page 42: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/42.jpg)
Job monitoring
Figure labels: UI, Log Monitor, Logging & Bookkeeping, Network Server, Job Controller (CondorG), Workload Manager, Computing Element, RB node, log of job events, job status.
LM: parses the CondorG log file (where CondorG logs info about jobs) and notifies LB
LB: receives and stores job events; processes the corresponding job status
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
![Page 43: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/43.jpg)
Grid Operation and Security by Eddie Aronovich, Mar 2008 44
Approaches to Security: 1
The Poor Security House
![Page 44: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/44.jpg)
Approaches to Security: 2
The Paranoid Security House
![Page 45: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/45.jpg)
Approaches to Security: 3
The Realistic Security House
![Page 46: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/46.jpg)
Mapping certificate to local user
• Each site uses its local accounting system
• Pool of users dedicated for the Grid
• Each user is mapped using gridmap file or VOMS
• Mapping can implement local policy on external users
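The mapping described above can be sketched in Python. This is a simplified illustration, assuming a grid-mapfile whose lines hold a quoted certificate subject followed by a local account name; the function `parse_gridmap`, the class `AccountPool`, the subject names and the account names are all hypothetical, and VOMS-based mapping is not covered.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping


class AccountPool:
    """Hands out accounts from a fixed pool, one per certificate subject."""

    def __init__(self, accounts):
        self.free = list(accounts)
        self.leases = {}

    def account_for(self, subject):
        """Return the account leased to this subject, leasing one if needed."""
        if subject not in self.leases:
            if not self.free:
                raise RuntimeError("account pool exhausted")
            self.leases[subject] = self.free.pop(0)
        return self.leases[subject]


if __name__ == "__main__":
    # Hypothetical subject and account names.
    gridmap = parse_gridmap('"/C=IL/O=Example/CN=Alice Example" grid001\n')
    print(gridmap)
    pool = AccountPool(["grid001", "grid002"])
    print(pool.account_for("/C=IL/O=Example/CN=Bob Example"))
```

Because the site controls which local account each external user lands in, local policy (quotas, permissions) can be applied to that account.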
![Page 47: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/47.jpg)
Certificate Request
• User generates a public/private key pair; the private key is stored encrypted on the local disk.
• User sends the public key to the CA along with proof of identity (certificate request: public key + ID).
• CA confirms the identity, signs the certificate and sends it back to the user.
Slide based on a presentation given by Carl Kesselman at GGF Summer School 2004
![Page 48: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/48.jpg)
Inside the Certificate
• Standard (X.509) defined format.
• User identification (e.g. full name).
• User’s public key.
• A “signature” from a CA, created by encoding a unique string (a hash) generated from the user’s identification, the user’s public key and the name of the CA. The signature is encoded using the CA’s private key. This has the effect of:
– Proving that the certificate came from the CA.
– Vouching for the user’s identification.
– Vouching for the binding of the user’s public key to their identification.
Certificate fields: Name; Issuer: CA; Public Key; Signature
![Page 49: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/49.jpg)
Mutual Authentication
A sends their certificate;
B verifies signature in A’s certificate;
B sends to A a challenge string;
A encrypts the challenge string with their private key;
A sends the encrypted challenge to B;
B uses A’s public key to decrypt the challenge;
B compares the decrypted string with the original challenge;
If they match, B has verified A’s identity and A cannot repudiate it.
Figure (exchange between A and B): A’s certificate; verify CA signature; random phrase; encrypt with A’s private key; encrypted phrase; decrypt with A’s public key; compare with original phrase.
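The challenge-response steps above can be sketched with textbook RSA in Python. This is a toy under loud assumptions: the key uses tiny, insecure numbers chosen for illustration, the certificate and CA signature check are omitted, and the function names are hypothetical. For mutual authentication the same exchange would be repeated with the roles of A and B swapped.

```python
import secrets

# Toy RSA key for A (p = 61, q = 53): insecure, for illustration only.
N, E, D = 3233, 17, 2753  # modulus, public exponent, private exponent


def make_challenge():
    """B picks a random challenge smaller than the modulus."""
    return secrets.randbelow(N)


def respond(challenge, private_exponent=D):
    """A transforms the challenge with the private key."""
    return pow(challenge, private_exponent, N)


def verify(challenge, response, public_exponent=E):
    """B applies A's public key and compares with the original challenge."""
    return pow(response, public_exponent, N) == challenge


if __name__ == "__main__":
    challenge = make_challenge()   # B -> A: random phrase
    response = respond(challenge)  # A -> B: encrypted phrase
    print("A authenticated:", verify(challenge, response))
```

Only the holder of the private exponent can produce a response that maps back to the challenge, which is why a successful comparison both verifies A and prevents repudiation.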
![Page 50: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/50.jpg)
Proxy certificate
• Avoid re-entering the passphrase by creating a proxy
• Proxy consists of a new certificate and a private key
• Proxy certificate contains the owner's identity (modified)
• Remote party receives the proxy's certificate (signed by the owner) and the owner's certificate
• Proxy certificate has a limited lifetime
• Chain of trust from the CA to the proxy through the owner
![Page 51: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/51.jpg)
Grids in Europe
www.eu-egi.eu
Prof. Dieter KRANZLMUELLER, EGEE08, Istanbul, Turkey
![Page 52: Grid Infrastructure](https://reader037.vdocuments.site/reader037/viewer/2022110102/568134da550346895d9c0cf6/html5/thumbnails/52.jpg)
To be continued