Earth System Grid Center For Enabling Technologies (ESG-CET): Introduction and Overview

Dean N. Williams, Don E. Middleton, Ian T. Foster, and David E. Bernholdt
On behalf of the ESG-CET Team
Project Web Site: http://esg-pcmdi.llnl.gov
Mid-Term Project Review, Rockville, MD, May 11, 2009


Agenda

1. Introduction and Overview
2. Overall Architecture Design
3. Gateway
4. Data Node
   Break
5. Accomplishments
6. Collaborations and Partnerships
7. Recap of Morning Presentations
   Lunch
8. Research and Development
   Break
9. Demonstration
10. Future Work
11. Summary

Review folder: http://esg-pcmdi.llnl.gov/review-folder

Review presentations: http://esg-pcmdi.llnl.gov/review-folder/presentations


A Brief History: ESG-I, 2000-2001

The emerging challenge of climate data
Proposal to DOE’s Next Generation Internet (NGI) program in March 1999
ANL, LANL, LBNL, LLNL, NCAR, USC/ISI
Data movement and replication
Prototype climate “data browser”
“Hottest Infrastructure” award at SC2000
NGI cut short, follow-on funding from OBER & MICS
Ideas on the table, partnerships, experience
Minimal end-user deployment or use
Began development of SciDAC proposal


A Brief History: ESG-II, 2001-2006

SciDAC Program announced, began proposal in 2000
ANL, LANL, LBNL, LLNL, NCAR, ORNL, USC/ISI
“Turning Climate Datasets into Community Resources”
New focus on web-based portals, metadata, seamless access to archival storage, security, operational service
Uncertain about size of audience, hoping for 100-200
Very positive mid-term assessment in 2003
PCMDI accepted WGCM/CMIP role in 2004
Operational CCSM portal in 2004
Operational IPCC/CMIP portal later in 2004
In 2006, 200 TB of data, 4000 users, 130 TB served


Purpose and Scope

Purpose
• Provide climate researchers worldwide with access to data, information, models, analysis tools, and computational resources required to make sense of enormous climate simulation datasets

Scope
• Petabyte-scale data volumes
• Gateway to climate change data products, model outputs and informational sites (i.e., globally federated sites)
• Comprehensive registry of climate change Earth Science research results and components
• Support climate change and its partner scientists, analysts, data managers, educators and decision makers
• Resource to national and international science and societal benefit initiatives
• Resource to climate change data products through interoperable web service and climate analysis tools


Objectives

Meet specific distributed database, data access, and data movement needs of national and international climate projects

Provide a universal and secure web-based data access portal for broad multi-model data collections

Provide a wide range of Grid-enabled climate data analysis tools and diagnostic methods to international climate centers and U.S. government agencies

Develop Grid technology that enhances data accessibility and usability

Make newly developed tools and technologies available for use in other domains


Project Participants and Focus Areas


Project Team

ANL
• Rachana Ananthakrishnan
• Ian Foster
• Neill Miller
• Frank Siebenlist

LBNL
• Junmin Gu
• Vijaya Natarajan
• Arie Shoshani
• Alex Sim

LLNL
• Robert Drach
• Dean N. Williams

LANL
• Phil Jones

NCAR
• David Brown
• Julien Chastang
• Luca Cinquini
• Peter Fox
• Danielle Harper
• Nathan Hook
• Don Middleton
• Eric Nienhouse
• Gary Strand
• Patrick West
• Hannah Wilcox
• Nathaniel Wilhelmi
• Stephan Zednik

PMEL
• Steve Hankin
• Roland Schweitzer

ORNL
• David Bernholdt
• Meili Chen
• Jens Schwidder
• Sudharshan Vazhkudai

USC/ISI
• S. Bharathi
• Ann Chervenak
• Robert Schuler
• Mei-Hui Su

Key (color coding on the original slide): Institutional PI; Project Co-PI; Project Lead PI; Executive Committee


Project Organization


Concept Overview

[Figure labels: Workstation Applications, Thick Clients; Standard Browser, Web Services]


Capabilities, Usage, and Impact

Capabilities
• “Virtual Datasets” created through subsetting and aggregation
• Metadata-based search and discovery
• Bulk data access
• Web-based access

Usage: Archive Facts
• NCAR Gateway (http://www.earthsystemgrid.org)
  Data holdings: 198 TB
  Registered users: 13,000+
  Data downloaded: 100 TB
• PCMDI/LLNL CMIP3 Gateway (http://www-pcmdi.llnl.gov)
  Data holdings: 35 TB
  Registered users: 3,000+
  Data downloaded: 600+ TB

Over 500 sites worldwide

Over 500 scientific papers published based on CMIP3 data

Average downloads: 400 to 600 GB/day
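
The "virtual dataset" idea in the capabilities list above can be illustrated with a small sketch. It is not ESG code: the `FileSegment` type, the file names, and the flat time-series layout are assumptions made for illustration, and spatial dimensions are left out. The point is only that a user sees one continuous series even though the data are aggregated from several files and then subset on request.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FileSegment:
    """Hypothetical stand-in for one output file holding part of a time series."""
    path: str
    variable: str
    times: List[int]     # e.g. months since the start of the run
    values: List[float]  # one value per time step (spatial dimensions omitted)


def aggregate(segments: List[FileSegment], variable: str) -> List[Tuple[int, float]]:
    """Join every segment of one variable into a single time-ordered series."""
    rows: List[Tuple[int, float]] = []
    for seg in segments:
        if seg.variable == variable:
            rows.extend(zip(seg.times, seg.values))
    rows.sort(key=lambda row: row[0])
    return rows


def subset(series: List[Tuple[int, float]], t_start: int, t_end: int) -> List[Tuple[int, float]]:
    """Keep only the time steps that fall inside [t_start, t_end]."""
    return [(t, v) for t, v in series if t_start <= t <= t_end]


def virtual_dataset(segments: List[FileSegment], variable: str,
                    t_start: int, t_end: int) -> List[Tuple[int, float]]:
    """Aggregate across files, then subset; no new file is written."""
    return subset(aggregate(segments, variable), t_start, t_end)


if __name__ == "__main__":
    files = [
        FileSegment("tas_part2.nc", "tas", [3, 4, 5], [1.3, 1.4, 1.5]),
        FileSegment("tas_part1.nc", "tas", [0, 1, 2], [1.0, 1.1, 1.2]),
        FileSegment("pr_part1.nc", "pr", [0, 1], [9.0, 9.1]),
    ]
    print(virtual_dataset(files, "tas", 2, 4))
```

In a real deployment the aggregation and subsetting would run on the server side so that only the requested slice travels over the network.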


Data Integration Challenges Facing Climate Science

Modeling groups will generate more data in the near future than exist today
Large part of research consists of writing programs to analyze data
How best to collect, distribute, and find data on a much larger scale?
• At each stage tools could be developed to improve efficiency
• Substantially more ambitious community modeling projects (Petabyte (PB, 10^15 bytes) and Exabyte (EB, 10^18 bytes)) will require a distributed database
Metadata describing extended modeling simulations (e.g., atmospheric aerosols and chemistry, carbon cycle, dynamic vegetation, etc.) (But wait, there’s more: economy, public health, energy, etc.)
How to make information understandable to end-users so that they can interpret the data correctly
More users than just Working Group (WG) 1-science. (WG2-impacts and WG3-mitigation) (Policy makers, economists, health officials, etc.)
Integration of multiple analysis tools, formats, data from unknown sources
Trust and security on a global scale (not just an agency or country, but worldwide)


Complexity of Data Distribution

Future coupled runs will produce much larger data sets
Storage and retrieval needs new thinking
Additional quality assurance data and software
Tools to facilitate publication and cataloging of output
• Publication - the act of putting data in the database and making it visible to others
• Cataloging - recording where a data set, file, or database entity is located
Automated updating of output availability/status pages
Automated notification to users with updates tailored to their interests (new, withdrawn, replaced data)
Sophisticated discovery capabilities
Common data transfer tasks can be automated
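
A minimal sketch of how publication, cataloging, and tailored notification fit together is given below. It is an illustration only, not the ESG implementation: the `Catalog` class, the dotted dataset identifiers, and the rule that the first identifier component names the project are all assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple


class Catalog:
    """Hypothetical in-memory catalog: dataset id -> location, plus subscriptions."""

    def __init__(self) -> None:
        self.locations: Dict[str, str] = {}
        self.subscribers: Dict[str, Set[str]] = {}   # project -> user names
        self.outbox: List[Tuple[str, str]] = []      # (user, message) pairs

    def subscribe(self, user: str, project: str) -> None:
        """Register a user's interest in one project."""
        self.subscribers.setdefault(project, set()).add(user)

    def _notify(self, dataset_id: str, status: str) -> None:
        # Assumption: the project is the first dot-separated component of the id.
        project = dataset_id.split(".", 1)[0]
        for user in sorted(self.subscribers.get(project, ())):
            self.outbox.append((user, f"{dataset_id}: {status}"))

    def publish(self, dataset_id: str, location: str) -> None:
        """Publication: make the data visible; cataloging: record where it lives."""
        status = "replaced" if dataset_id in self.locations else "new"
        self.locations[dataset_id] = location
        self._notify(dataset_id, status)

    def withdraw(self, dataset_id: str) -> None:
        """Remove a dataset from the catalog and tell interested users."""
        if self.locations.pop(dataset_id, None) is not None:
            self._notify(dataset_id, "withdrawn")

    def locate(self, dataset_id: str) -> Optional[str]:
        """Return the recorded location, or None if the dataset is not cataloged."""
        return self.locations.get(dataset_id)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.subscribe("alice", "cmip3")
    catalog.publish("cmip3.tas.run1", "host-a:/data/tas_run1.nc")
    catalog.publish("cmip3.tas.run1", "host-b:/data/tas_run1_v2.nc")
    catalog.withdraw("cmip3.tas.run1")
    for user, message in catalog.outbox:
        print(user, message)
```

Users subscribed to other projects receive nothing, which is the "tailored to their interests" part of the requirement.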


It’s All About the Data

Data publication
Data access
Data viewing
Data sharing
Data versioning
Data replication
Data products
Data delivery
Standards and interoperability


Strategic Challenges for ESG-CET

Sustain and build upon the existing ESG archives
Address future scientific needs for data management and analysis by extending support for sharing and diagnosing climate simulation data
• Coupled Model Intercomparison Project, Phase 5 (CMIP5) for scientists contributing to the IPCC Fifth Assessment Report (AR5) in 2010
• SciDAC II: A Scalable and Extensible Earth System Model for Climate Change Science
• The Climate Science Computational End Station (CCES)
• The North American Regional Climate Change Assessment Program (NARCCAP)
• Other wide-ranging climate model evaluation activities
How to make information understandable to end-users so that they can interpret the data correctly
Local and remote analysis and visualization tools in a distributed environment (i.e., subsetting, concatenating, regridding, filtering, …)
• Integrating analysis into a distributed environment
• Providing climate diagnostics
• Delivering climate component software to the community


CMIP5 (IPCC AR5) is a Major Driver for ESG Development

CMIP5 multi-model archive expected to include
• 3 suites of experiments (“Near-Term” (decadal prediction), “Long-Term” (century & longer), and “Atmosphere-Only”)
• 40+ models
• 600+ TB “core” data, 6+ PB total data
• Contributed by 25+ modeling centers in 17+ countries
Driver for scale of data, global distribution
Timeline fixed by IPCC
Already working with key international partners to establish testbed
• Program for Climate Model Diagnosis and Intercomparison - PCMDI (U.S.)
• National Center for Atmospheric Research - NCAR (U.S.)
• Oak Ridge National Laboratory - ORNL (U.S.)
• Geophysical Fluid Dynamics Laboratory - GFDL (U.S.)
• British Atmosphere Data Centre - BADC (U.K.)
• Max Planck Institute for Meteorology - MPIM (Germany)
• JAMSTEC and University of Tokyo Center for Climate System Research (Japan)
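
To give a feel for why these volumes drive the design, the back-of-the-envelope sketch below estimates how long a single full copy would take over one sustained link. The 1 Gb/s link speed is purely an assumption for illustration (it does not come from the slides), decimal units are used (1 TB = 10^12 bytes, 1 PB = 10^15 bytes), and protocol overhead is ignored.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12  # bytes, decimal convention
PB = 10**15  # bytes, decimal convention


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link running flat out, no overhead."""
    return size_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    assumed_link = 10**9  # 1 Gb/s, an illustrative assumption
    print(f"core  (600 TB): {transfer_days(600 * TB, assumed_link):.1f} days")
    print(f"total (6 PB):   {transfer_days(6 * PB, assumed_link):.1f} days")
```

Under that assumption one copy of the "core" data alone occupies the link for almost two months, which is why replication to several federated sites and server-side subsetting matter more than raw download capacity.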


ESG-CET AR5 Timeline

2008: Design and implement core functionality:
• Browse and search
• Registration
• Single sign-on / security
• Publication
• Distributed metadata
• Server-side processing

Early 2009: Testbed
• Plan to include at least seven centers in the US, Europe, and Japan

2009: Deal with system integration issues, develop production system

2010: Modeling centers publish data

2011-2012: Research and journal article submissions

2013: IPCC AR5 Assessment Report


ESG-CET Collaborates Extensively

Leverage best-in-class tools and capabilities developed elsewhere

Increase outreach, ability to serve scientific community, impact

Joint development of new ideas, technologies of common interest

Title (Type) | Lead Institution - PI | Overlapping Institutions (Individuals) | Areas of Collaboration
Climate-Science Computational End Station (CCES) | ORNL - Drake | LLNL (Williams), ORNL (Bernholdt) | Extend the capabilities of CCSM
Center for Application-Network Total-Integration for SciDAC (CANTIS) | ORNL - Rao | ORNL (Bernholdt) | Data movement and delivery
Center for Enabling Distributed Petascale Science (CEDPS) | ANL - Foster | LLNL (Williams), NCAR (Middleton), ANL (Foster) | Federation and data movement
Community Access to Global Cloud Resolving Model Data | PNNL - Schuchardt | LLNL (Williams), NCAR (Middleton) | Data movement and analysis
Community Climate System Model (CCSM) - (CCSM Steering Committee) | NCAR - Gent | NCAR (Middleton) | CCSM (NSF & DOE coupled global climate model)
Data Gateways Institute | Indiana - Gannon | NCAR (Middleton), ORNL (Bernholdt, Pouchard) | Metadata, provenance
National Center for Computational Sciences (NCCS) | ORNL - Nichols | ORNL (Bernholdt), LLNL (Williams) | Computational resources
Program for Climate Model Diagnosis and Intercomparison (PCMDI) | LLNL/PCMDI - Bader | LLNL (Williams) | IPCC AR5, CCES, CCSM, MIPs
Storage Resource Management Center for Enabling Technologies | LBNL - Sim | LBNL (Shoshani) | Data movement and storage
Visualization and Analytics Center for Enabling Technologies | LBNL - Bethel | LLNL (Williams) | Visualization and analysis

Key:
• Relying on ESG to reach their goals: highlighted in “italic blue”
• Relying on ESG to develop tools and technologies: highlighted in “italic red”
• Relying on ESG to deliver their products to the climate science community: highlighted in “italic green”


Accomplishments: Development

Gateway web application (new)
Data Node components integration (new publishing client integrated with existing TDS and LAS servers, and with Gateway)
Security architecture for federation across Gateways and partner Data Centers
• OpenID for web SSO
• MyProxy integration for rich client access
• Web Services for user attributes retrieval
Architecture for metadata exchange among Gateways and partner Data Centers (based on OAI-PMH)
BeStMan middleware for deep storage file retrieval (new)
Handling and access of detailed model metadata (in collaboration with Earth System Curator)

Two major accomplishments are the Gateway and the Data Node, which form the main components of the ESG-CET architecture.
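
Since the metadata exchange is described as based on OAI-PMH, the sketch below shows what a harvesting request looks like at the protocol level. The verb and argument names (`ListRecords`, `metadataPrefix`, `from`, `set`, `resumptionToken`) and the `oai_dc` format come from the OAI-PMH 2.0 specification; the endpoint URL is a placeholder, and nothing here reflects how the ESG Gateways actually expose or configure their harvesting interface.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None,
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for an initial (possibly incremental) harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date   # only records changed on or after this date
    if set_spec is not None:
        params["set"] = set_spec     # restrict the harvest to one set
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, token: str) -> str:
    """Build the follow-up request for the next page of a partial response.

    In OAI-PMH, resumptionToken is an exclusive argument: it is sent with
    the verb only, without metadataPrefix, from, or set.
    """
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


if __name__ == "__main__":
    endpoint = "http://gateway.example.org/oai"  # placeholder, not a real ESG endpoint
    print(list_records_url(endpoint, from_date="2009-01-01"))
    print(resume_url(endpoint, "abc123"))
```

A harvesting Gateway would issue the first request, parse the XML response, and keep following resumption tokens until the repository returns an empty token.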


Accomplishments: Operational

Sustained data delivery from 2004 to present from three ESG data portals
Registered over 16,000 users worldwide
Over 700 TB downloaded (coming up on 1 PB milestone)
Reached milestone of 500 scientific research papers published based on CMIP3
Added C-LAMP, NARCCAP, and CFMIP to the distributed archive


Future Plans

Short-term:
• Packaging and documentation of Gateway software
• Packaging and documentation of the Data Node software
• Integration with Data Mover Lite (DML)
• Federation with partner data centers

Longer-term:
• Gateway customization
• Expanded visualization services
• Gateway and Data Node invoking more of the LAS functionality
• GIS services
• Google Earth services
• Remote query services for rich client access
• User and Group workspaces
• Server-side processing and analysis services