1
Earth System Grid Center For Enabling Technologies
(ESG-CET)
Introduction and Overview
Dean N. Williams, Don E. Middleton, Ian T. Foster, and David E. Bernholdt
On behalf of the ESG-CET Team
Project Web Site: http://esg-pcmdi.llnl.gov
Mid-Term Project Review
Rockville, MD
May 11, 2009
2
Agenda
1. Introduction and Overview
2. Overall Architecture Design
3. Gateway
4. Data Node
Break
5. Accomplishments
6. Collaborations and Partnerships
7. Recap of Morning Presentations
Lunch
8. Research and Development
Break
9. Demonstration
10. Future Work
11. Summary

Review folder: http://esg-pcmdi.llnl.gov/review-folder
Review presentations: http://esg-pcmdi.llnl.gov/review-folder/presentations
3
A Brief History: ESG-I, 2000-2001
The emerging challenge of climate data
Proposal to DOE’s Next Generation Internet (NGI) program in March 1999
ANL, LANL, LBNL, LLNL, NCAR, USC/ISI
Data movement and replication
Prototype climate “data browser”
“Hottest Infrastructure” award at SC2000
NGI cut short, follow-on funding from OBER & MICS
Ideas on the table, partnerships, experience
Minimal end-user deployment or use
Began development of SciDAC proposal
4
A Brief History: ESG-II, 2001-2006
SciDAC Program announced, began proposal in 2000
ANL, LANL, LBNL, LLNL, NCAR, ORNL, USC/ISI
“Turning Climate Datasets into Community Resources”
New focus on web-based portals, metadata, seamless access to archival storage, security, operational service
Uncertain about size of audience, hoping for 100-200 users
Very positive mid-term assessment in 2003
PCMDI accepted WGCM/CMIP role in 2004
Operational CCSM portal in 2004
Operational IPCC/CMIP portal later in 2004
In 2006: 200 TB of data, 4,000 users, 130 TB served
5
Purpose and Scope
Purpose
Provide climate researchers worldwide with access to the data, information, models, analysis tools, and computational resources required to make sense of enormous climate simulation datasets

Scope
Petabyte-scale data volumes
Gateway to climate change data products, model outputs, and informational sites (i.e., globally federated sites)
Comprehensive registry of climate change Earth Science research results and components
Support for climate change scientists and their partners: analysts, data managers, educators, and decision makers
Resource for national and international science and societal benefit initiatives
Resource for climate change data products through interoperable web services and climate analysis tools
6
Objectives
Meet specific distributed database, data access, and data movement needs of national and international climate projects
Provide a universal and secure web-based data access portal for broad multi-model data collections
Provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies
Develop Grid technology that enhances data accessibility and usability
Make newly developed tools and technologies available for use in other domains
7
Project Participants and Focus Areas
8
Project Team
ANL: Rachana Ananthakrishnan, Ian Foster, Neill Miller, Frank Siebenlist
LBNL: Junmin Gu, Vijaya Natarajan, Arie Shoshani, Alex Sim
LLNL: Robert Drach, Dean N. Williams
LANL: Phil Jones
NCAR: David Brown, Julien Chastang, Luca Cinquini, Peter Fox, Danielle Harper, Nathan Hook, Don Middleton, Eric Nienhouse, Gary Strand, Patrick West, Hannah Wilcox, Nathaniel Wilhelmi, Stephan Zednik
PMEL: Steve Hankin, Roland Schweitzer
ORNL: David Bernholdt, Meili Chen, Jens Schwidder, Sudharshan Vazhkudai
USC/ISI: S. Bharathi, Ann Chervenak, Robert Schuler, Mei-Hui Su
Key (indicated by formatting on the original slide): Institutional PI, Project Co-PI, Project Lead PI, Executive Committee
9
Project Organization
10
Concept Overview
(Diagram: access via workstation applications / thick clients and via standard browsers / web services)
11
Capabilities, Usage, and Impact
Capabilities
• “Virtual datasets” created through subsetting and aggregation (a small subsetting sketch follows this slide)
• Metadata-based search and discovery
• Bulk data access
• Web-based access

Usage and archive facts
• NCAR Gateway: data holdings 198 TB, registered users 13,000+, data downloaded 100 TB, http://www.earthsystemgrid.org
• PCMDI/LLNL CMIP3 Gateway: data holdings 35 TB, registered users 3,000+, data downloaded 600+ TB, http://www-pcmdi.llnl.gov

Impact
• Over 500 sites worldwide
• Over 500 scientific papers published based on CMIP3 data
• Average downloads: 400 to 600 GB/day
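As a rough illustration of the subsetting behind “virtual datasets”, here is a minimal sketch that opens a remote aggregation over OPeNDAP (as served, for example, by a THREDDS Data Server) and transfers only a small slice. The URL and variable name are hypothetical placeholders, not actual ESG identifiers, and it assumes the netCDF4-python library is built with OPeNDAP support.

```python
# Minimal sketch: subset a remote "virtual dataset" via OPeNDAP instead of
# downloading whole files. Assumes netCDF4-python built with OPeNDAP support.
from netCDF4 import Dataset

# Hypothetical OPeNDAP endpoint for an aggregated dataset (placeholder URL).
URL = "http://esg-gateway.example.org/thredds/dodsC/cmip3/tas_monthly_aggregation"

ds = Dataset(URL)                 # opens the dataset remotely; no bulk download
tas = ds.variables["tas"]         # e.g., surface air temperature
print(tas.dimensions, tas.shape)

# Only the requested slice (first time step, small lat/lon window) crosses
# the network.
subset = tas[0, 10:20, 30:40]
print(subset.mean())

ds.close()
```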
12
Data Integration Challenges Facing Climate Science
Modeling groups will generate more data in the near future than exist today
A large part of research consists of writing programs to analyze data
How best to collect, distribute, and find data on a much larger scale?
• At each stage, tools could be developed to improve efficiency
• Substantially more ambitious community modeling projects, at petabyte (PB, 10^15 bytes) and exabyte (EB, 10^18 bytes) scale, will require a distributed database
Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.) (But wait, there’s more: economy, public health, energy, etc.)
How to make information understandable to end-users so that they can interpret the data correctly
More users than just Working Group 1 (WG1) science: WG2 (impacts) and WG3 (mitigation), plus policy makers, economists, health officials, etc.
Integration of multiple analysis tools, formats, and data from unknown sources
Trust and security on a global scale (not just an agency or country, but worldwide)
13
Complexity of Data Distribution
Future coupled runs will produce much larger data sets
Storage and retrieval need new thinking
Additional quality assurance data and software
Tools to facilitate publication and cataloging of output (a minimal illustrative record follows this list)
• Publication: the act of putting data in the database and making it visible to others
• Cataloging: describing where a data set, file, or database entity is located
Automated updating of output availability/status pages
Automated notification to users, with updates tailored to their interests (new, withdrawn, replaced data)
Sophisticated discovery capabilities
Common data transfer tasks can be automated
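To make the publication/cataloging distinction above concrete, the snippet below shows a minimal, purely illustrative catalog record of the kind a publishing tool might register; the field names and identifier are invented for this sketch and do not reflect the actual ESG metadata schema.

```python
# Illustrative only: a minimal record separating descriptive metadata
# (what the dataset is) from location metadata (where it can be found).
catalog_entry = {
    "dataset_id": "cmip.example_model.historical.mon.tas.v1",  # hypothetical ID
    "description": {
        "experiment": "historical",
        "frequency": "monthly",
        "variable": "tas",
    },
    "locations": [  # cataloging: where the data lives
        {"type": "HTTP", "url": "http://datanode.example.org/files/tas_v1.nc"},
        {"type": "OPeNDAP", "url": "http://datanode.example.org/dodsC/tas_v1"},
    ],
    "version": 1,
    "status": "published",  # could later become "withdrawn" or "replaced"
}
```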
14
It’s All About the Data
Data publication
Data access
Data viewing
Data sharing
Data versioning
Data replication
Data products
Data delivery
Standards and interoperability
15
Strategic Challenges for ESG-CET
Sustain and build upon the existing ESG archives
Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data:
• Coupled Model Intercomparison Project, Phase 5 (CMIP5) for scientists contributing to the IPCC Fifth Assessment Report (AR5), beginning in 2010
• SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
• The Climate Science Computational End Station (CCES)
• The North American Regional Climate Change Assessment Program (NARCCAP)
• Other wide-ranging climate model evaluation activities
How to make information understandable to end-users so that they can interpret the data correctly
Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …) (a small concatenation sketch follows this list):
• Integrating analysis into a distributed environment
• Providing climate diagnostics
• Delivering climate component software to the community
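As one small, hedged example of the client-side operations listed above (concatenation of per-period model output), the sketch below joins several local NetCDF files along their shared time dimension using netCDF4's MFDataset; the file names are hypothetical, and MFDataset assumes classic-format files with an unlimited time dimension.

```python
# Sketch: concatenate downloaded per-period files along the time axis.
from netCDF4 import MFDataset

# Hypothetical file names for three decades of monthly surface temperature.
files = ["tas_1980s.nc", "tas_1990s.nc", "tas_2000s.nc"]

ds = MFDataset(files)              # presents the files as one time series
tas = ds.variables["tas"]
print("total time steps:", tas.shape[0])

# Simple time mean as a quick sanity check on the concatenated series.
climatology = tas[:].mean(axis=0)
print(climatology.shape)

ds.close()
```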
16
CMIP5 (IPCC AR5) is a Major Driver for ESG Development
CMIP5 multi-model archive expected to include:
• 3 suites of experiments (“near-term” decadal prediction, “long-term” century and longer, and “atmosphere-only”)
• 40+ models
• 600+ TB of “core” data, 6+ PB of total data
• Contributions from 25+ modeling centers in 17+ countries
Driver for the scale of data and its global distribution
Timeline fixed by the IPCC
Already working with key international partners to establish a testbed:
• Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.)
• National Center for Atmospheric Research - NCAR (U.S.)
• Oak Ridge National Laboratory - ORNL (U.S.)
• Geophysical Fluid Dynamics Laboratory - GFDL (U.S.)
• British Atmospheric Data Centre - BADC (U.K.)
• Max Planck Institute for Meteorology - MPIM (Germany)
• JAMSTEC and the University of Tokyo Center for Climate System Research (Japan)
17
ESG-CET AR5 Timeline
2008: Design and implement core functionality:
• Browse and search
• Registration
• Single sign-on / security
• Publication
• Distributed metadata
• Server-side processing
Early 2009: Testbed
• Plan to include at least seven centers in the US, Europe, and Japan
2009: Address system integration issues, develop the production system
2010: Modeling centers publish data
2011-2012: Research and journal article submissions
2013: IPCC AR5 Assessment Report
18
ESG-CET Collaborates Extensively
Leverage best-in-class tools and capabilities developed elsewhere
Increase outreach, ability to serve the scientific community, and impact
Joint development of new ideas and technologies of common interest

Title (Type) | Lead Institution - PI | Overlapping Institutions (Individuals) | Areas of Collaboration
Climate-Science Computational End Station (CCES) | ORNL - Drake | LLNL (Williams), ORNL (Bernholdt) | Extend the capabilities of CCSM
Center for Application-Network Total-Integration for SciDAC (CANTIS) | ORNL - Rao | ORNL (Bernholdt) | Data movement and delivery
Center for Enabling Distributed Petascale Science (CEDPS) | ANL - Foster | LLNL (Williams), NCAR (Middleton), ANL (Foster) | Federation and data movement
Community Access to Global Cloud Resolving Model Data | PNNL - Schuchardt | LLNL (Williams), NCAR (Middleton) | Data movement and analysis
Community Climate System Model (CCSM) (CCSM Steering Committee) | NCAR - Gent | NCAR (Middleton) | CCSM (NSF & DOE coupled global climate model)
Data Gateways Institute | Indiana - Gannon | NCAR (Middleton), ORNL (Bernholdt, Pouchard) | Metadata, provenance
National Center for Computational Sciences (NCCS) | ORNL - Nichols | ORNL (Bernholdt), LLNL (Williams) | Computational resources
Program for Climate Model Diagnosis and Intercomparison (PCMDI) | LLNL/PCMDI - Bader | LLNL (Williams) | IPCC AR5, CCES, CCSM, MIPs
Storage Resource Management Center for Enabling Technologies | LBNL - Sim | LBNL (Shoshani) | Data movement and storage
Visualization and Analytics Center for Enabling Technologies | LBNL - Bethel | LLNL (Williams) | Visualization and analysis

Key (color coding in the original slide): projects relying on ESG to reach their goals are highlighted in italic blue; projects relying on ESG to develop tools and technologies are highlighted in italic red; projects relying on ESG to deliver their products to the climate science community are highlighted in italic green.
19
Accomplishments: Development
Gateway web application (new)
Data Node components integration (new publishing client integrated with existing TDS and LAS servers, and with the Gateway)
Security architecture for federation across Gateways and partner Data Centers:
• OpenID for web single sign-on (SSO)
• MyProxy integration for rich client access
• Web services for retrieval of user attributes
Architecture for metadata exchange among Gateways and partner Data Centers, based on OAI-PMH (a harvesting sketch follows below)
BeStMan middleware for retrieval of files from deep storage (new)
Handling and access of detailed model metadata (in collaboration with Earth System Curator)

Two major accomplishments are the Gateway and the Data Node, which form the main components of the ESG-CET architecture.
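For a sense of what OAI-PMH-based metadata exchange looks like at the protocol level, here is a hedged sketch that harvests Dublin Core records from a gateway's OAI-PMH provider; the endpoint URL is a placeholder, and the metadata formats actually exposed by ESG-CET gateways may differ.

```python
# Sketch: harvest metadata records over OAI-PMH (ListRecords verb).
import requests
import xml.etree.ElementTree as ET

# Hypothetical gateway OAI-PMH endpoint (placeholder).
ENDPOINT = "http://esg-gateway.example.org/oai/provider"

params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": "2009-01-01"}
resp = requests.get(ENDPOINT, params=params, timeout=30)
resp.raise_for_status()

root = ET.fromstring(resp.content)
ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Print each record's OAI identifier and Dublin Core title.
for record in root.findall(".//oai:record", ns):
    identifier = record.findtext(".//oai:identifier", namespaces=ns)
    title = record.findtext(".//dc:title", namespaces=ns)
    print(identifier, title)
```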
20
Accomplishments: Operational
Sustained data delivery from 2004 to the present from three ESG data portals
Registered over 16,000 users worldwide
Over 700 TB downloaded (approaching the 1 PB milestone)
Reached the milestone of 500 scientific research papers published based on CMIP3
Added C-LAMP, NARCCAP, and CFMIP to the distributed archive
21
Future Plans
Short-term:
• Packaging and documentation of the Gateway software
• Packaging and documentation of the Data Node software
• Integration with Data Mover Lite (DML)
• Federation with partner data centers

Longer-term:
• Gateway customization
• Expanded visualization services
• Gateway and Data Node invoking more of the LAS functionality
• GIS services
• Google Earth services
• Remote query services for rich client access
• User and group workspaces
• Server-side processing and analysis services