Introduction to the Grid
Roy Williams, Caltech
Enzo Case Study
Simulated dark matter density in early universe
• N-body gravitational dynamics (particle-mesh method)
• Hydrodynamics with PPM and ZEUS finite-difference
• Up to 9 species of H and He
• Radiative cooling
• Uniform UV background (Haardt & Madau)
• Star formation and feedback
• Metallicity fields
Adaptive Mesh Refinement (AMR)
• multilevel grid hierarchy
• automatic, adaptive, recursive
• no limits on depth or complexity of grids
• C++/F77
• Bryan & Norman (1998)
Source: J. Shalf
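To make the refinement idea concrete, here is a toy sketch in Python (not Enzo's actual scheme; the threshold, refinement factor, and 1-D setup are all invented for illustration): cells whose density jump exceeds a threshold are recursively subdivided into finer child grids, building the multilevel hierarchy automatically.

```python
import numpy as np

def refine(grid, x0, dx, level, max_level=3, threshold=0.5):
    """Toy AMR: recursively subdivide cells whose density jump exceeds
    a threshold; returns a list of (level, origin, values) patches."""
    if level >= max_level:
        return []
    patches = []
    jumps = np.abs(np.diff(grid))
    for i in np.where(jumps > threshold)[0]:
        # refine the flagged cell with a 4x finer child grid
        child = np.linspace(grid[i], grid[i + 1], 5)
        child_x0 = x0 + i * dx
        patches.append((level + 1, child_x0, child))
        patches += refine(child, child_x0, dx / 4, level + 1,
                          max_level, threshold)
    return patches

density = np.array([1.0, 1.1, 5.0, 1.2, 1.0])   # a sharp feature
for lvl, x0, patch in refine(density, 0.0, 1.0, 0):
    print(f"level {lvl}, origin {x0:.2f}, {len(patch)} points")
```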
Distributed Computing Zoo
• Grid Computing
  – Also called High-Performance Computing
  – Big clusters, big data, big pipes, big centers
  – Globus backbone, which now includes Services and Gateways
  – Decentralized control
• Cluster Computing
  – Local interconnect between identical CPUs
• Peer-to-Peer (Napster, Kazaa)
  – Systems for sharing data without a central server
• Internet Computing
  – Screensaver cycle scavenging
  – e.g. SETI@home, Einstein@home, ClimatePrediction.net, etc.
• Access Grid
  – A videoconferencing system
• Globus
  – A popular software package to federate resources into a grid
• TeraGrid
  – A $150M award from NSF to the supercomputer centers (NCSA, SDSC, PSC, etc.)
• The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations.
• In contrast, the Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe.
What is the Grid?
• “Grid” was coined by Ian Foster and Carl Kesselman in “The Grid: Blueprint for a New Computing Infrastructure”.
• Analogy with the electric power grid: plug in to computing power without worrying where it comes from, like a toaster.
• The idea has been around under other names for a while (distributed computing, metacomputing, …).
• Technology is in place to realise the dream on a global scale.
What is the Grid?
• The Grid relies on advanced software, called middleware, which ensures seamless communication between different computers and different parts of the world.
• The Grid search engine will not only find the data the scientist needs, but also the data-processing techniques and the computing power to carry them out.
• It will distribute the computing task to wherever in the world there is spare capacity, and send the result to the scientist.
How will it work?
The Grid middleware:
• finds convenient places for the scientist's "job" (computing task) to be run
• optimises use of the widely dispersed resources
• organises efficient access to scientific data
• deals with authentication to the different sites
• interfaces to local site authorisation / resource allocation
• runs the jobs
• monitors progress
• recovers from problems
… and tells you when the work is complete and transfers the result back! (A hypothetical sketch of this lifecycle follows.)
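A hypothetical sketch of that lifecycle, with every name invented for illustration (this is not a real middleware API): a broker-like loop picks the least-loaded site, submits the job, monitors it, and resubmits elsewhere on failure.

```python
import random

class Site:
    """Toy stand-in for a grid site with a load and a flaky queue."""
    def __init__(self, name, load):
        self.name, self.load = name, load
    def submit(self, job):
        print(f"running {job} at {self.name}")
        return random.random() > 0.2              # jobs occasionally fail

def run_on_grid(job, sites):
    while True:
        site = min(sites, key=lambda s: s.load)   # find a convenient place
        if site.submit(job):                      # run and monitor the job
            print("work complete; transferring result back")
            return
        print(f"{site.name} failed; recovering")  # recover from problems
        site.load += 1                            # prefer another site next time

run_on_grid("enzo-512", [Site("NCSA", 3), Site("SDSC", 1), Site("PSC", 2)])
```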
Benefits for Science
• More effective and seamless collaboration of dispersed communities, both scientific and commercial
• Ability to run large-scale applications comprising thousands of computers, for a wide range of problems
• Transparent access to distributed resources from your desktop, or even your mobile phone
• The term “e-Science” has been coined to express these benefits
Five Big Ideas of Grid
• Federated sharing
  – independent management
• Trust and Security
  – access policy; authentication; authorization
• Load balancing and efficiency
  – Condor, queues, prediction, brokering
• Distance doesn't matter
  – 20 Mbyte/sec, global certificates
• Open standards
  – NVO, FITS, MPI, Globus, SOAP
Grid as a Federation
• independent centers: flexibility
• unified interface: power and strength
• a large-state / small-state compromise
Grid projects in the world

US projects:
• NASA Information Power Grid
• DOE Science Grid
• NSF National Virtual Observatory
• NSF GriPhyN
• DOE Particle Physics Data Grid
• NSF TeraGrid
• DOE ASCI Grid
• DOE Earth Systems Grid
• DARPA CoABS Grid
• NEESGrid
• DOH BIRN
• NSF iVDGL

European national projects:
• UK e-Science Grid
• Netherlands: VLAM, PolderGrid
• Germany: UNICORE, Grid proposal
• France: Grid funding approved
• Italy: INFN Grid
• Eire: Grid proposals
• Switzerland: Network/Grid proposal
• Hungary: DemoGrid, Grid proposal
• Norway, Sweden: NorduGrid

EU projects:
• DataGrid (CERN, ...)
• EuroGrid (Unicore)
• DataTag (CERN, ...)
• Astrophysical Virtual Observatory
• GRIP (Globus/Unicore)
• GRIA (Industrial applications)
• GridLab (Cactus Toolkit)
• CrossGrid (Infrastructure Components)
• EGSO (Solar Physics)
TeraGrid Wide Area Network
TeraGrid Resources

ANL/UC: Itanium2 (0.5 TF), IA-32 (0.5 TF); 20 TB online storage; 30 Gb/s to CHI hub
IU: Itanium2 (0.2 TF), IA-32 (2.0 TF); 32 TB online storage; 1.2 PB mass storage; 10 Gb/s to CHI
NCSA: Itanium2 (10 TF), SGI SMP (6.5 TF); 600 TB online storage; 3 PB mass storage; 30 Gb/s to CHI
ORNL: IA-32 (0.3 TF); 1 TB online storage; 10 Gb/s to ATL
PSC: XT3 (10 TF), TCS (6 TF), Marvel (0.3 TF); 150 TB online storage; 2.4 PB mass storage; 30 Gb/s to CHI
Purdue: heterogeneous cluster (1.7 TF); 10 Gb/s to CHI
SDSC: Itanium2 (4.4 TF), Power4 (1.1 TF); 540 TB online storage; 6 PB mass storage; 30 Gb/s to LA
TACC: IA-32 (6.3 TF), Sun visualization system; 50 TB online storage; 2 PB mass storage; 10 Gb/s to CHI

Data collections and visualization are hosted at five of the sites; instruments at two.
The TeraGrid Vision
Distributing the resources is better than putting them at one site
• Recently awarded $150M by NSF
• Build new, extensible, grid-based infrastructure to support grid-enabled scientific applications
  – new hardware, new networks, new software, new practices, new policies
• Expand centers to support cyberinfrastructure
  – distributed, coordinated operations center
  – exploit unique partner expertise and resources to make the whole greater than the sum of its parts
• Leverage homogeneity to make distributed computing easier and to simplify initial development and standardization
  – run a single job across the entire TeraGrid
  – move executables between sites
TeraGrid Allocations Policies
• Any US researcher can request an allocation
  – Policies/procedures posted at: http://www.paci.org/Allocations.html
  – Online proposal submission: https://pops-submit.paci.org/
• NVO has an account on TeraGrid (just ask RW)
Wide Variety of Usage Scenarios
• Tightly coupled simulation jobs storing vast amounts of data, performing visualization remotely as well as making data available through online collections (ENZO)
• Thousands of independent jobs using data from a distributed data collection (NVO)
• Science Gateways: "not a Unix prompt"!
  – from a web browser, with security
  – SOAP client for scripting (see the sketch below)
  – from an application, e.g. IRAF or IDL
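To illustrate the "SOAP client for scripting" idea, a minimal sketch using the Python suds library; the gateway URL and the submitJob/getStatus operations are hypothetical, not a real service.

```python
from suds.client import Client

# hypothetical gateway WSDL; a real gateway would publish its own
client = Client("https://gateway.example.org/nvo?wsdl")
job_id = client.service.submitJob(dataset="ngc1365.fits",
                                  task="source-extraction")
print(client.service.getStatus(job_id))   # poll asynchronously from a script
```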
Cluster Supercomputer
• 100s of compute nodes
• a login node, where the user submits jobs to a queueing system (Condor, PBS, ...)
• a parallel file system with a metadata node and parallel I/O
• /home (backed up) and a purged /scratch
MPI parallel programming
• Each node runs the same program
  – it first finds its own number ("rank")
  – and the number of coordinating nodes ("size")
• Laplace solver example
  – Algorithm: each value becomes the average of its neighbor values
  – Serial: for each point, compute the average; remember the boundary conditions
  – Parallel: each node (e.g. node 0, node 1) runs the algorithm with ghost points, and uses messages to exchange the ghost points (a sketch follows)
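A minimal sketch of the ghost-point pattern, assuming mpi4py and NumPy are available; each rank owns a 1-D slab of the domain with one ghost point per side, and the slab size and iteration count are arbitrary.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()               # this node's number
size = comm.Get_size()               # number of coordinating nodes

u = np.zeros(12)                     # 10 interior points + 2 ghost points
if rank == 0:
    u[0] = 1.0                       # fixed boundary value on the far left
left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(1000):
    # exchange ghost points with neighboring nodes
    comm.Sendrecv(u[1:2],   dest=left,  recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # each value becomes the average of its neighbor values
    u[1:-1] = 0.5 * (u[:-2] + u[2:])

print(f"rank {rank}: {u[1:-1].round(3)}")
```

Run with, e.g., `mpirun -n 4 python laplace.py`. The paired Sendrecv calls avoid deadlock, and MPI.PROC_NULL turns the exchanges at the ends of the domain into harmless no-ops, so the fixed boundary values are preserved.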
Storage Resource Broker (SRB)
• Single logical namespace while accessing distributed archival storage resources
• Effectively infinite storage
• Data replication
• Parallel transfers
• Interfaces: command-line, API, SOAP, web/portal
Storage Resource Broker (SRB): Virtual Resources, Replication
• Clients (browser, SOAP client, command-line, ...) connect with a certificate
• Logical resources such as "myDisk" map to physical storage: casjobs at JHU, tape at SDSC, ...
• A file may be replicated, and comes with metadata, which may be customized
• Similar to the NVO VOStore concept
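The S-commands below (Sinit, Sput, Sls, Sget) are SRB's standard command-line interface, but the logical collection path is hypothetical; a sketch of driving them from Python to show the single-namespace idea:

```python
import subprocess

def srb(*args):
    subprocess.run(args, check=True)

srb("Sinit")                                # authenticate to the SRB
srb("Sput", "image.fits", "/nvo/myDisk/")   # store under a logical path
srb("Sls", "/nvo/myDisk")                   # list; the physical location
                                            # (tape, disk, site) is hidden
srb("Sget", "/nvo/myDisk/image.fits", ".")  # retrieve, wherever the replica lives
```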
Globus
• Security
  – single sign-on, certificate handling, CAS, MyProxy
• Execution Management
  – remote jobs: GRAM and Condor-G
• Data Management
  – GridFTP, Reliable File Transfer, third-party transfers (a sketch follows)
• Information Services
  – aggregating information from federated grid resources
• Common Runtime Components
  – new web services
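A hedged sketch tying these pieces together with the classic pre-web-services Globus command-line tools (grid-proxy-init, globus-job-run, globus-url-copy); the host names and paths are hypothetical.

```python
import subprocess

def run(*cmd):
    subprocess.run(cmd, check=True)

run("grid-proxy-init")                  # single sign-on: create a short-lived proxy
run("globus-job-run",                   # GRAM: run a job on a remote resource
    "tg-login.example.org", "/bin/date")
run("globus-url-copy",                  # GridFTP: third-party transfer between sites
    "gsiftp://siteA.example.org/scratch/run1/out.dat",
    "gsiftp://siteB.example.org/archive/run1/out.dat")
```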
Public Grids for Astronomy
• Data Pipelines
  – split into independent pieces, send to a scheduler (see the sketch after this list)
    • Condor, PBS, Condor-G, DAGMan, Pegasus
  – big data storage
    • infinite tape, purged disk, scratch disk
    • no permanent TByte disk
• Services
  – VOStore, SIAP
  – Science gateways
    • asynchronous, secure, web, scripted
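The "split into independent pieces" pattern might look like this sketch, which writes a Condor submit file with one job per sky tile; the process_tile executable and the tile count are invented for illustration.

```python
# one independent Condor job per sky tile
tiles = [f"tile_{i:04d}" for i in range(256)]
with open("pipeline.sub", "w") as f:
    f.write("executable = process_tile\n")
    f.write("universe   = vanilla\n")
    for tile in tiles:
        f.write(f"arguments = {tile}\n")
        f.write(f"output    = {tile}.out\n")
        f.write("queue\n")
# then submit with: condor_submit pipeline.sub
# (or wrap the jobs in a DAG for DAGMan)
```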
Public Grids for Astronomy
• Databases
  – not really supported (note: ask audience if this is true)
  – VO effort for this (CasJobs, VOStore)
• Simulation
  – Forward: 100s of synchronized nodes, MPI
  – Inverse: independent trials, 1000s of jobs (a sketch follows)
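To contrast the two simulation modes: the forward mode needs the MPI pattern sketched earlier, while the inverse mode is an embarrassingly parallel sweep. A sketch of the latter, with invented parameter names and values:

```python
import itertools

omega_m = [0.20, 0.25, 0.30, 0.35]
sigma_8 = [0.7, 0.8, 0.9]
seeds   = range(100)

# each combination is one independent job for the scheduler
jobs = [f"simulate --omega_m={om} --sigma_8={s8} --seed={sd}"
        for om, s8, sd in itertools.product(omega_m, sigma_8, seeds)]
print(f"{len(jobs)} independent jobs; no communication between them")
```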