Jay [email protected] 1
TeraGrid: A Terascale Distributed Discovery Environment
Jay BoisseauTeraGrid Executive Steering Committee (ESC) Member
and
Director, Texas Advanced Computing Center atThe University of Texas at Austin
Outline
• What is TeraGrid?
• User Requirements
• TeraGrid Software
• TeraGrid Resources & Support
• Science Gateways
• Summary
What is TeraGrid?
The TeraGrid Vision
• Integrating the Nation’s Most Powerful Resources
– Provide a unified, general-purpose, reliable set of services and resources.
– Strategy: an extensible virtual organization of people and resources across TeraGrid partner sites.
• Enabling the Nation’s Terascale Science
– Make science more productive through a unified set of very-high-capability resources.
– Strategy: leverage TeraGrid’s unique resources to create new capabilities, driven and prioritized by science partners.
• Empowering Communities to Leverage TeraGrid Capabilities
– Bring TG capabilities to the broad science community (no longer just “big” science).
– Strategy: Science Gateways connecting communities; an integrated roadmap with peer Grids and software efforts.
The TeraGrid Strategy
• Building a distributed system of unprecedented scale
– 40+ teraflops compute
– 1+ petabyte storage
– 10–40 Gb/s networking
• Creating a unified user environment across heterogeneous resources
– User software environment, user support resources.
– Created an initial community of over 500 users, 80 PIs.
• Integrating new partners to introduce new capabilities
– Additional computing and visualization capabilities
– New types of resources: data collections, instruments
• Make it extensible!
The TeraGrid Team
• The TeraGrid team has two major components:
– 9 Resource Providers (RPs), who provide resources and expertise
• Seven universities
• Two government laboratories
• Expected to grow
– The Grid Integration Group (GIG), which provides leadership in grid integration among the RPs
• Led by a Director, assisted by the Executive Steering Committee, Area Directors, and a Project Manager
• Includes participation by staff at each RP
• Funding now provided for people, not just networks and hardware!
Integration: Converging NSF Initiatives
• High-End Capabilities: U.S. Core Centers, TeraGrid…
– Integrating high-end, production-quality supercomputer centers
– Building tightly coupled, unique large-scale resources
– STRENGTH: Time-critical and/or unique high-end capabilities
• Communities: GriPhyN, iVDGL, LEAD, GEON, NEESGrid…
– ITR and MRI projects integrate science communities
– Building community-specific capabilities and tools
– STRENGTH: Community integration and tailored capabilities; high-capacity, loosely coupled capabilities
• Common Software Base: NSF/NMI, DOE, NASA programs
– Projects integrating, packaging, and distributing software and tools from the Grid community
– Building common middleware components and integrated distributions
– STRENGTH: Large-scale deployment, common software base, assured-quality software components and component sets
User Requirements
Coherence: Unified User Environment
• Do I have to learn how to use 9 systems?
– Coordinated TeraGrid Software and Services (CTSS)
– Transition toward services and a service-oriented architecture: from “software stack” to “software and services”
• Do I have to submit proposals for 9+ allocations?
– Unified NRAC for Core and TeraGrid resources; roaming allocations
• Can I use TeraGrid the way I use other Grids?
– Partnership with the Globus Alliance, NMI GRIDS Center, and other Grids
– History of collaboration and successful interoperation with other Grids
TeraGrid User Survey
• TeraGrid capabilities must be user-driven
• Undertook a needs analysis in Summer 2004 with 16 science partner teams
– These teams may not be widely representative, so the analysis will be repeated every year with an increasing number of groups
• 62 items considered; the top 10 needs are reflected in the TeraGrid roadmap
TeraGrid User Input
[Chart: survey items ranked by overall score and by the number of partners in need, grouped into data, grid computing, and Science Gateway needs. Top-ranked capabilities include remote file read/write, high-performance file transfer, coupled applications and co-scheduling, advanced reservations, grid portal toolkits, grid workflow tools, batch metascheduling, a global file system, client-side computing tools, and batch-scheduled parameter sweep tools.]
Some Common Grid Computing Use Cases
• Submitting large numbers of individual jobs
– Requires grid scheduling to multiple systems
– Requires automated data movement or a common file system
• Running on-demand jobs for time-critical applications (e.g., weather forecasts, medical treatments)
– Requires preemptive scheduling
– Requires fault tolerance (checkpoint/recovery)
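The first use case above can be sketched as a client-side loop that stages each job's input and fans a parameter sweep out across several sites. This is a minimal illustration only: the `Site` class and its methods are hypothetical stand-ins for real middleware such as Globus GRAM or Condor-G.

```python
# Sketch of the "many independent jobs" use case: round-robin a
# parameter sweep across sites, staging each job's input data first.
# The Site/stage_in/submit API is hypothetical, standing in for real
# grid middleware (e.g. Globus GRAM or Condor-G).
from dataclasses import dataclass
from itertools import cycle

@dataclass
class Site:
    name: str

    def stage_in(self, filename):
        # A real implementation would do a GridFTP transfer here.
        print(f"[{self.name}] staging {filename}")

    def submit(self, executable, param):
        # A real implementation would hand the job to the local scheduler.
        print(f"[{self.name}] submitted {executable} {param}")
        return f"{self.name}-{param}"          # a job identifier

def sweep(sites, executable, params):
    """Distribute one job per parameter value, cycling over the sites."""
    jobs = []
    for site, p in zip(cycle(sites), params):
        site.stage_in(f"input-{p}.dat")
        jobs.append(site.submit(executable, p))
    return jobs

jobs = sweep([Site("ncsa"), Site("sdsc")], "simulate", [0.1, 0.2, 0.3])
```

The same loop structure underlies metascheduling: only the site-selection policy (round-robin here) changes.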
Highest Priority Items
• Common to many projects that are quite different in their specific usage scenarios:
– Efficient cross-site data management
– Efficient cross-site computing
– Capabilities to customize Science Gateways to the needs of specific user communities
– Simplified management of accounts, allocations, and security credentials across sites
Bringing TeraGrid Capabilities to Communities
Science Gateway Prototype – Discipline – Science Partner(s) – TeraGrid Liaison
• Linked Environments for Atmospheric Discovery (LEAD) – Atmospheric science – Droegemeier (OU) – Gannon (IU), Pennington (NCSA)
• National Virtual Observatory (NVO) – Astronomy – Szalay (Johns Hopkins) – Williams (Caltech)
• Network for Computational Nanotechnology (NCN) and “nanoHUB” – Nanotechnology – Lundstrum (PU) – Goasguen (PU)
• National Microbial Pathogen Data Resource Center (NMPDR) – Biomedicine and biology – Schneewind (UC), Osterman (Burnham/UCSD), DeLong (MIT), Dusko (INRA) – Stevens (UC/Argonne)
• NSF National Evolutionary Biology Center (NESC), NIH Carolina Center for Exploratory Genetic Analysis, and the State of North Carolina Bioinformatics Portal project – Biomedicine and biology – Cunningham (Duke), Magnuson (UNC) – Reed (UNC), Blatecky (UNC)
• Neutron Science Instrument Gateway – Physics – Dunning (ORNL) – Cobb (ORNL)
• Grid Analysis Environment – High-energy physics – Newman (Caltech) – Bunn (Caltech)
• Transportation System Decision Support – Homeland security – Stephen Eubanks (LANL) – Beckman (Argonne)
• Groundwater/Flood Modeling – Environmental science – Wells (UT-Austin), Engel (ORNL) – Boisseau (TACC)
Bringing TeraGrid Capabilities to Communities
[Chart: projected user community size, from 0 to 6,000 users, for each Science Gateway project (LEAD, NVO, NCN, OLSG, NESC/CCEGA, SNS, HEP, Flood, OSG) over the years 2005–2009.]
A new generation of “users” that access TeraGrid via Science Gateways, scaling well beyond the traditional “user” with a shell login account.
Projected user community size by each science gateway project.
Impact on society from gateways enabling decision support is much larger!
Exploiting TeraGrid’s Unique Capabilities
ENZO (astrophysics): Enzo is an adaptive-mesh-refinement, grid-based hybrid code designed for simulations of cosmological structure formation. (Mike Norman, UCSD)

GAFEM (groundwater modeling): GAFEM is a parallel code, developed at North Carolina State University, for the solution of large-scale groundwater inverse problems.

Reservoir modeling: Given an (unproduced) oil field, permeability and other material properties (based on geostatistical models), and the locations of a few producer/injector wells, where is the best place for a third injector? The goal is fully automatic methods of injector-well placement optimization. (J. Saltz, OSU)

Aquaporin mechanism: Water moves through aquaporin channels in single file. Oxygen leads the way in. At the most constricted point of the channel, the water molecule flips; protons can’t do this. This animation was pointed to by the 2003 Nobel chemistry prize announcement. (Klaus Schulten, UIUC)
Exploiting TeraGrid’s Unique Capabilities: Flood Modeling
Merry Maisel (TACC), Gordon Wells (UT)
Exploiting TeraGrid’s Unique Capabilities: Flood Modeling
• Flood modeling needs more than traditional batch-scheduled HPC systems!
– Precipitation data, groundwater data, terrain data
– Rapid large-scale data visualization
– On-demand scheduling
– Ensemble scheduling
– Real-time visualization of simulations
– Computational steering of possible remedies
– Simplified access to results via web portals for field agents, decision makers, etc.
• TeraGrid adds the necessary data and visualization systems, portals, and grid services
Harnessing TeraGrid for Education. Example: nanoHUB is used to complete coursework by undergraduate and graduate students in dozens of courses at 10 universities.
User Inputs Determine TeraGrid Roadmap
• Top priorities reflected in the Grid Capabilities and Software Integration roadmap. First targets:
– User-defined reservations
– Resource matching and wait-time estimation
– Grid interfaces for on-demand and reserved access
– Parallel/striped data movers
– Co-scheduling service defined for high-performance data transfers
– Dedicated GridFTP transfer nodes available to production users
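The "parallel/striped data movers" and "dedicated GridFTP transfer nodes" items above amount to moving large files with many concurrent TCP streams. A minimal sketch of driving such a transfer from a script follows; the endpoint URLs are placeholders, while `-p` (parallel streams) and `-tcp-bs` (TCP buffer size) are standard `globus-url-copy` options.

```python
# Sketch: build (and optionally run) a parallel-stream GridFTP transfer
# command. dry_run=True just returns the command line, which is all this
# sketch exercises; a production mover would execute it on a dedicated
# transfer node. The gsiftp:// endpoints below are illustrative.
import subprocess

def gridftp_copy(src_url, dst_url, streams=4, tcp_buffer=2097152,
                 dry_run=True):
    cmd = ["globus-url-copy",
           "-p", str(streams),        # number of parallel TCP streams
           "-tcp-bs", str(tcp_buffer),  # TCP buffer size in bytes
           src_url, dst_url]
    if dry_run:
        return " ".join(cmd)
    subprocess.run(cmd, check=True)   # requires a Globus client install
    return " ".join(cmd)

command = gridftp_copy("gsiftp://siteA.example.org/data/run1.dat",
                       "gsiftp://siteB.example.org/scratch/run1.dat")
```

Tuning the stream count and buffer size against the bandwidth-delay product of the wide-area link is what pushes transfers toward the roadmap's 75%-of-peak target.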
TeraGrid Roadmap Defined 5 Years Out
Working Groups, Requirements Analysis Teams
• Working Groups
– Applications
– Data
– External Relations
– Grid
– Interoperability
– Networks
– Operations
– Performance Evaluation
– Portals
– Security
– Software
– Test Harness and Information Services (THIS)
– User Services
– Visualization
• Requirements Analysis Teams (RATs)
– Science Gateways
– Security
– Advanced Application Support
– User Portal
– CTSS Evolution
– Data Transport Tools
– Job Scheduling Tools
– TeraGrid Network
TeraGrid Software
Software Strategy
• Identify existing solutions; develop solutions only as needed
– Some solutions are frameworks: we need to tailor the software to our goals (e.g., information services/site interfaces)
– Some solutions do not exist:
• Software function verification – the Inca project… a scripted implementation of the docs
• Global account/accounting management – AMIE
• Data movers
• Etc.
• Deploy, integrate, harden, and support!
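An Inca-style verification test is essentially "a scripted implementation of the docs": run the command the user documentation promises and check its output. A minimal sketch, where the checked command and the expected string are illustrative rather than actual Inca test definitions:

```python
# Sketch of an Inca-style software verification test: execute one
# documented command and report whether its output matches what the
# documentation promises. The python3/--version check below is just an
# illustrative stand-in for a real CTSS component test.
import subprocess

def verify(command, args, expected_substring):
    """Return (passed, detail) for one documented capability."""
    try:
        result = subprocess.run([command, *args], capture_output=True,
                                text=True, timeout=30)
    except FileNotFoundError:
        return False, f"{command} is not installed"
    output = result.stdout + result.stderr
    return expected_substring in output, output.strip()

passed, detail = verify("python3", ["--version"], "Python 3")
print("PASS" if passed else "FAIL", "-", detail)
```

A harness then runs a battery of such checks on every site, so users can see which documented capabilities are actually working where.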
TeraGrid Software Stack Offerings
• Core software
– Grid service servers and clients
– Data management and access tools
– Authentication services
– Environment commonality and management
– Applications: a springboard for workflow and service-oriented work
• Platform-specific software
– Compilers
– Binary compatibility opportunities
– Performance tools
– Visualization software
• Services
– Databases
– Data archives
– Instruments
TeraGrid Software Development
• Consortium of leading project members
– Define primary goals and targets
– Mine helpdesk data
– Review pending software request candidates
• Transition test environments to production
– Eliminate software workarounds
– Implement solutions derived from user surveys
• Deployment testbeds
– Separate environments as well as alternate access points
– Independent testbeds in place
– Internal staff testing from applications teams
– Initial beta users
Software Integration – Critical Path Pipeline

10/04: GIG kickoff.
10/05: Co-scheduling service defined for high-performance data transfers. Dedicated GridFTP transfer nodes. Globus Toolkit 3 functionality available. Joint TG/NMI plan for packaging tools. TG metascheduler.
10/06: Co-scheduling tools and environment in production. CTSS Grid software modules coordinated with NMI releases. Data transfer at 75% of peak available bandwidth. General workflow services deployed. Prototype integration of co-scheduling and workflow tools. CTSS installation tools for 4–8 hour site deployment. On-demand service prototype. Grid components of CTSS built using NMI infrastructure.
10/07: Integration of TG on-demand with OSG. Scheduled data movement/staging. Production deployment of community storage service. Evaluate commercial core Grid middleware. On-demand compute services for emergency applications. Support for network resource tracking.
10/08: Commercial Grid middleware for some CTSS components. Limited global-queue metascheduling. Grid checkpoint/restart prototypes.
10/09: Grid checkpoint/restart for job migration in support of on-demand. Full resource and accounting monitoring, scheduling, and tracking.
Software Roadmap
• Near-term work (in progress)
– Co-scheduled file transfers
– Production-level GridFTP resources
– Metascheduling (grid scheduling)
– Simple workflow tools
• Future directions
– On-demand integration with Open Science Grid
– Grid checkpoint/restart
Grid Roadmap
• Near term
– User-defined reservations
– Web services testbeds
– Resource wait-time estimation (to be used by workflow tools)
– Striped data movers
– WAN file system prototypes
• Longer term
– Integrated tools for workflow scheduling
– Commercial grid middleware opportunities
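Behind user-defined reservations and co-scheduling sits one core problem: finding a time window that is simultaneously free on every resource a job needs. A toy sketch of that search follows; real co-scheduling negotiates advance reservations with each site's local scheduler, and the busy lists here are purely illustrative.

```python
# Toy sketch of the co-scheduling problem: find the earliest window of
# a given duration that is free on ALL resources at once. Times are
# plain integers (e.g. hours); busy_by_site maps each site to its list
# of (start, end) busy intervals. Illustrative only - a real service
# would query each site's scheduler for this information.
def earliest_common_slot(busy_by_site, duration, horizon):
    """Return (start, end) of the first common free window, or None."""
    for start in range(horizon):
        window_end = start + duration
        # The window conflicts with a busy interval (s, e) iff they overlap.
        conflict = any(s < window_end and start < e
                       for slots in busy_by_site.values()
                       for s, e in slots)
        if not conflict:
            return (start, window_end)
    return None

busy = {"ncsa": [(0, 3)], "sdsc": [(2, 5)], "psc": []}
slot = earliest_common_slot(busy, duration=2, horizon=24)
```

With `busy` as above, the first two-unit window free everywhere starts at time 5. The same overlap test, run against scheduler-reported availability, is the heart of wait-time estimation too.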
TeraGrid Resources & Support
TeraGrid Resource Partners
TeraGrid Resources (ANL/UC, Caltech, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC)

• Compute resources
– ANL/UC: Itanium2 (0.5 TF), IA-32 (0.5 TF)
– Caltech: Itanium2 (0.8 TF)
– IU: Itanium2 (0.2 TF), IA-32 (2.0 TF)
– NCSA: Itanium2 (10 TF), SGI SMP (6.5 TF)
– ORNL: IA-32 (0.3 TF)
– PSC: XT3 (10 TF), TCS (6 TF), Marvel (0.3 TF)
– Purdue: heterogeneous (1.7 TF)
– SDSC: Itanium2 (4.4 TF), Power4+ (1.1 TF)
– TACC: IA-32 (6.3 TF), Sun (visualization)
• Online storage: 20 TB, 155 TB, 32 TB, 600 TB, 1 TB, 150 TB, 540 TB, and 50 TB across eight of the sites
• Mass storage: 1.2 PB, 3 PB, 2.4 PB, 6 PB, and 2 PB at five of the sites
• Data collections at five sites; visualization at five sites; instruments at three sites
• Network (Gb/s, hub): ANL/UC – 30, CHI; Caltech – 30, LA; IU – 10, CHI; NCSA – 30, CHI; ORNL – 10, ATL; PSC – 30, CHI; Purdue – 10, CHI; SDSC – 30, LA; TACC – 10, CHI

Partners will add resources and TeraGrid will add partners!
TeraGrid Usage by NSF Division
[Chart: usage share by NSF division – MCB, CTS, PHY, CHE, AST, DMR, ASC, ECS, CDA, CCR, IBN, BCS, and DMS. Includes DTF/ETF clusters only.]
TeraGrid User Support Strategy
• Proactive and rapid response for general user needs
• Sustained assistance for groundbreaking applications
• GIG coordination, with staffing from all RP sites
– Area Director (AD): Sergiu Sanielevici (PSC)
– Peering with Core Centers user support teams
User Support Team (UST): Trouble Tickets
• Filter TeraGrid Operations Center (TOC) trouble tickets: system issue or possible user issue
• For each ticket, designate a Point of Contact (POC) to contact the user within 24 hours
– Communicate status if known
– Begin a dialog to consult on a solution or workaround
• Designate a Problem Response Squad (PRS) to assist the POC
– Experts who respond to the POC’s postings to the UST list and/or are requested by the AD
– All UST members monitor progress reports and contribute their expertise
– PRS membership may evolve with our understanding of the problem, including support from hardware and software teams
• Ensure all of GIG/RP/Core helps and learns
– Weekly review of user issues selected by the AD; decide on escalation
– Inform TG development plans
User Support Team (UST): Advanced Support
• For applications/groups judged by TG management to be groundbreaking in exploiting DEEP/WIDE TG infrastructure
• “Embedded” Point of Contact (labor intensive)
– Becomes a de facto member of the application group
– A prior working relationship with the application group is a plus
– Can write and test code, redesign algorithms, optimize, etc. (but no throwing over the fence)
– Represents the needs of the application group to systems people, if required
– Alerts the AD to success stories
Science Gateways
The Gateway Concept
• The goal and approach
– Engage advanced scientific communities that are not traditional users of the supercomputing centers
– Build Science Gateways providing community-tailored access to TeraGrid services and capabilities
• Science Gateways take two forms:
1. Web-based portals that front-end Grid services, providing TeraGrid-deployed applications used by a community
2. Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
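The first form, a portal front-ending Grid services, hinges on one idea: the community user logs in to the portal, and the gateway submits work on their behalf under a community credential, so the user never touches Grid security machinery. A minimal sketch, in which every name (the credential string, the request handler, the application name) is hypothetical; a real gateway would use mechanisms like MyProxy and GRAM behind the scenes.

```python
# Sketch of a Science Gateway portal endpoint: authenticate the web
# user, then build a job request under a shared community credential so
# the user needs no Grid credentials of their own. All identifiers here
# are illustrative, not a real TeraGrid API.
COMMUNITY_CREDENTIAL = "/O=TeraGrid/CN=nano-community"   # hypothetical DN

def portal_request(web_user, application, params):
    """Translate one portal-level request into a gateway job submission."""
    if not web_user:                 # portal-level authentication check
        raise PermissionError("login required")
    return {
        "credential": COMMUNITY_CREDENTIAL,   # gateway acts for the user
        "application": application,           # a TeraGrid-hosted app
        "params": params,
        "tag": f"gateway-job-for-{web_user}", # for per-user accounting
    }

job = portal_request("alice", "nanowire-sim", {"length_nm": 40})
```

The `tag` field hints at why gateways still need per-user accounting: the community credential hides individual users from the Grid, so the gateway must track them itself.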
Grid Portal Gateways

• The portal, accessed through a browser or desktop tools
– Provides Grid authentication and access to services
– Provides direct access to TeraGrid-hosted applications as services
• The required support services
– Searchable metadata catalogs
– Information space management
– Workflow managers
– Resource brokers
– Application deployment services
– Authorization services
• Builds on NSF and DOE software
– NMI Portal Framework, GridPort
– NMI Grid tools: Condor, Globus, etc.
– OSG, HEP tools: Clarens, MonALISA
Technical Approach (Biomedicine and Biology: Building Biomedical Communities)

[Diagram: OGCE Science Portal architecture – OGCE portlets in an Apache Jetspeed container with internal and local portal services; a service API speaking Grid protocols through Grid service stubs and the Java CoG Kit; remote content services reaching remote content servers over HTTP; all layered over Grid services and Grid resources. Built with open source tools.]

• Build standard portals to meet the domain requirements of the biology communities
• Develop federated databases to be replicated and shared across TeraGrid
• Workflow composer
Gateways that Bridge to Community Grids

• Many community Grids already exist or are being built
– NEESGrid, LIGO, Earth Systems Grid, NVO, Open Science Grid, etc.
• TeraGrid will provide a service framework to enable access in ways that are transparent to their users
– The community maintains and controls the Gateway
• Different communities have different requirements
– NEES and LEAD will use TeraGrid to provision compute services
– LIGO and NVO have substantial data distribution problems
– All of them require remote execution of complex workflows
Technical Approach

• Develop web services interfaces (wrappers) for existing and emerging bioinformatics tools
• Integrate collections of tools into Life Science service bundles that can be deployed as persistent services on TeraGrid resources
• Integrate TG-hosted Life Science services with existing end-user tools to provide scalable analysis capabilities

[Diagram: existing user tools (e.g., GenDB) call a Life Science Gateway service; a dispatcher routes requests through web services interfaces for back-end computing to Life Science service bundles running on TeraGrid resource partners.]

[Diagram (LEAD): streaming observations feed data mining and a forecast model via on-demand grid computing as storms are forming.]
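The wrapper idea above, turning a command-line bioinformatics tool into something a dispatcher can invoke, can be sketched in a few lines. The tool name (`blastall`) and its arguments are illustrative; a deployed bundle would execute the command on a TeraGrid resource behind a web-services interface rather than return it.

```python
# Sketch of a web-service wrapper for a command-line tool: fix the tool
# and its standing options once, then let callers supply per-request
# parameters. "blastall -p blastn" is an illustrative example; a real
# bundle would run the command remotely instead of returning argv.
import shlex

def make_service(tool, fixed_args):
    """Wrap a command-line tool as a function a dispatcher can call."""
    def service(**params):
        argv = [tool, *shlex.split(fixed_args)]
        for key, value in params.items():
            argv += [f"-{key}", str(value)]   # map kwargs to CLI flags
        return argv    # a deployed wrapper would execute this remotely
    return service

blast = make_service("blastall", "-p blastn")
argv = blast(d="nt", i="query.fa")
```

Bundling is then just a dictionary of such services, which is what makes the dispatcher layer in the diagram thin.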
Science Communities and Outreach

• Communities
– CERN’s Large Hadron Collider experiments
– Physicists working in HEP and similarly data-intensive scientific disciplines
– National collaborators and those across the digital divide in disadvantaged countries
• Scope
– Interoperation between the LHC Data Grid Hierarchy and ETF
– Create and deploy scientific data and services Grid portals
– Bring the power of ETF to bear on LHC physics analysis: help discover the Higgs boson!
• Partners
– Caltech
– University of Florida
– Open Science Grid and Grid3
– Fermilab
– DOE PPDG
– CERN
– NSF GriPhyN and iVDGL
– EU LCG and EGEE
– Brazil (UERJ, …)
– Pakistan (NUST, …)
– Korea (KAIST, …)

LHC Data Distribution Model
The Architecture of Gateway Services
[Diagram: the user’s desktop connects through a Grid portal server to the TeraGrid Gateway services layer – security, data management service, accounting service, notification service, policy, administration and monitoring, grid orchestration, resource allocation, and reservations and scheduling – built on the Web Services Resource Framework and Web Services Notification. Beneath sit the core Grid services – proxy certificate server/vault, application events, resource broker, user metadata catalog, replica management, application workflow, application resource catalogs, and application deployment – on top of the physical resource layer.]
Flood Modeling Gateway
Large-scale flooding along Brays Bayou in central Houston, triggered by heavy rainfall during Tropical Storm Allison (June 9, 2001), caused more than $2 billion in damage.

• University of Texas at Austin
– TACC
– Center for Research in Water Resources
– Center for Space Research
• Oak Ridge National Lab
• Purdue University

Gordon Wells (UT), David Maidment (UT), Budhu Bhaduri (ORNL), Gilbert Rochon (Purdue)
Biomedical and Biology
– Building Biomedical Communities – Dan Reed (UNC)
• National Evolutionary Synthesis Center
• Carolina Center for Exploratory Genetic Analysis
– Portals and federated databases for the biomedical research community

[Figure: genetics and disease susceptibility – identified genes map to phenotypes and predictive disease susceptibility, drawing on physiology, metabolism, endocrine, proteome, immune, transcriptome, biomarker signatures, morphometrics, and pharmacokinetics, together with factors such as ethnicity, environment, age, and gender.]
Source: Terry Magnuson, UNC
Science Communities and Outreach (Biomedicine and Biology: Building Biomedical Communities)

• Communities
– Students and educators
– Phylogeneticists
– Evolutionary biologists
– Biomedical researchers
– Biostatisticians
– Computer scientists
– Medical clinicians
• Partners
– University of North Carolina
– Duke University
– North Carolina State University
– NSF National Evolutionary Synthesis Center (NESC)
– NIH Carolina Center for Exploratory Genetic Analysis (CCEGA)
Neutron Science Gateway
• Matching instrument science with TeraGrid
– Focusing on application use cases that can be uniquely served by TeraGrid; for example, a proposed scenario from the March 2003 SETENS proposal

Neutron Science TeraGrid Gateway (NSTG) – John Cobb, ORNL
Summary
SURA Opportunities with TeraGrid
• Identify applications in SURA universities
• Leverage TeraGrid technologies in SURA grid activities
• Provide tech transfer back to TeraGrid
• Deploy grids in the SURA region that interoperate with TeraGrid, allowing users to ‘scale up’ to TeraGrid
Summary
• TeraGrid is a national cyberinfrastructure partnership for world-class computational research, with many types of resources for knowledge discovery
• TeraGrid aims to integrate with other grids and with researchers around the world
• All Hands Meeting in April will yield new details on roadmaps, software, capabilities, and opportunities.
For More Information
• TeraGrid: http://www.teragrid.org
• TACC: http://www.tacc.utexas.edu
• Feel free to contact me directly: Jay Boisseau, [email protected]

Note: TACC is about to announce the new International Partnerships for Advanced Computing (IPAC) program, with initial members from Latin America and Spain, which can serve as a ‘gateway’ into TeraGrid.