17 May 2006
Rapid Prototyping Capability
for Earth-Sun System Sciences
Preliminary Design
Robert J. Moorhead
Mississippi State University
Approach
Formulate architectures and develop baseline capacities that integrate applied sciences systems tools into configurations supporting the efficient evaluation of prospects for integrating research results from NASA’s Earth observation systems (with emphasis on spacecraft instruments on missions recently launched or planned for near-term launch) and associated Earth system models. These applied sciences systems tools include:
• systems engineering tools
• enterprise architecture tools
• information visualization and analysis tools
• uncertainty characterization tools
• performance assessment tools
“NASA Earth Science and Space Systems benefiting Society: Evolving Systems Engineering Capacity,” presentation by Ron Birk, August 24, 2005, SSC
System Scope
• Reduce the time typically required to assess how new or future data streams affect model outcomes.
• Systematically evaluate research capabilities in a simulated operational environment to identify components and/or configurations that could be considered for verification, validation, and benchmarking prior to transition from research to operations and/or into an integrated system solution (ISS).
• Figure 1 illustrates the interface between the RPC and external systems that include the SN and ISS components of NASA’s Earth Science Application Plan.
RPC Interface
System Context
• The RPC will provide the capability to integrate and provide access to the tools needed to evaluate the use of a wide variety of current and future NASA sensors and research results, model outputs, and knowledge, collectively referred to as “resources”.
• Because the resources are assumed to be geographically distributed, the RPC will provide location transparency for them.
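Location transparency means that users refer to a resource by a logical name while the RPC resolves where it actually lives. A minimal sketch, assuming a hypothetical in-memory registry; `ResourceRegistry`, the resource names, and the example.org hosts are illustrative placeholders, not part of the RPC design:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceLocation:
    """Physical location of a resource (hypothetical fields)."""
    host: str
    path: str
    protocol: str = "https"

    def url(self) -> str:
        return f"{self.protocol}://{self.host}/{self.path.lstrip('/')}"


class ResourceRegistry:
    """Maps logical resource names to physical locations (illustrative only)."""

    def __init__(self) -> None:
        self._locations: dict[str, ResourceLocation] = {}

    def register(self, name: str, location: ResourceLocation) -> None:
        # Re-registering a name relocates the resource without changing what users see.
        self._locations[name] = location

    def resolve(self, name: str) -> str:
        if name not in self._locations:
            raise LookupError(f"unknown resource: {name}")
        return self._locations[name].url()


registry = ResourceRegistry()
registry.register("model/example-model", ResourceLocation("hpc.example.org", "/opt/models/example"))
registry.register("data/example-product", ResourceLocation("data.example.org", "products/example"))
print(registry.resolve("data/example-product"))
```

Moving a data set to a mirror only requires re-registering its logical name; experiment definitions that refer to `data/example-product` are unaffected.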
RPC node
Figure: diagram of the RPC node with the following components: local and remote computing and storage facilities; remote data providers; model configuration; input data set configuration; experiment design and execution; analysis; system administration and maintenance.
System modes and states
• Before an experiment (a particular model using a particular data source) can be performed, two conditions must be satisfied:
– First, the model must be installed at a computing facility accessible to RPC users and configured to run.
– Second, the data must be configured so that the model can use them. Data configuration may involve developing tools for data conversions (format translation, subsetting, deriving variables not included in the original data products, geo-processing, etc.).
• From the point of view of performing a particular experiment and analysis, the RPC can be in one of two distinct states:
– ready for the experiment and analysis by end users
– requiring specialists to install and configure the model and its data
• During its life cycle, new resources and tools will be integrated with the RPC node, increasing the repertoire of experiments and analyses that can be performed.
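The two preconditions and two states above reduce to a simple readiness check. A minimal sketch; the function names and the split of the first condition into separate "installed" and "configured" flags are assumptions for illustration:

```python
from enum import Enum


class RPCState(Enum):
    READY = "ready for experiment and analysis by end users"
    NEEDS_SPECIALIST = "requires specialists to install and configure the model and its data"


def pending_actions(model_installed: bool, model_configured: bool, data_configured: bool) -> list[str]:
    """List the specialist actions still needed before the experiment can run."""
    actions = []
    if not model_installed:
        actions.append("install the model at an accessible computing facility")
    if not model_configured:
        actions.append("configure the model to run")
    if not data_configured:
        actions.append("configure the data for use by the model")
    return actions


def experiment_state(model_installed: bool, model_configured: bool, data_configured: bool) -> RPCState:
    """Both preconditions (model ready, data ready) must hold for the READY state."""
    if pending_actions(model_installed, model_configured, data_configured):
        return RPCState.NEEDS_SPECIALIST
    return RPCState.READY
```

As new resources are integrated over the RPC life cycle, more (model, data) combinations move from `NEEDS_SPECIALIST` to `READY`.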
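Major Categories of Experiments
Figure (two panels): “Different sources”: a single numerical model run with different input sources produces several sets of model results that feed one analysis. “Different models”: numerical model 1 and numerical model 2 each produce model results that feed one analysis.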
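Read as experiment designs, the two categories expand into different sets of runs whose results feed a common analysis. A sketch under that reading; `Run`, the helper names, and the model and product labels are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One model execution with one input data source (hypothetical)."""
    model: str
    data_source: str


def different_sources(model: str, sources: list[str]) -> list[Run]:
    """Same model, several input data sources: isolates the effect of the data."""
    return [Run(model, source) for source in sources]


def different_models(models: list[str], source: str) -> list[Run]:
    """Several models, same input data source: isolates the effect of the model."""
    return [Run(model, source) for model in models]


runs = different_sources("model-A", ["current-product", "new-product"]) + different_models(
    ["model-1", "model-2"], "new-product"
)
for run in runs:
    print(run)
```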
Capabilities Required
1. Discovery, semantic understanding, secure access, and transport mechanisms for data products available from known data providers (Science Data Manager)
2. Data assimilation and geo-processing tools for all data transformations needed to match a given data product (or products) to the model input requirements, and support for organizing the data processing into workflows built from reusable and interoperable modules, including both the workflow specification mechanisms and the workflow enactment engine (Interoperable Geo-processing Environment)
Capabilities Required (cont.)
3. Model management:
a. Catalog of available models, model metadata catalog (including input and output model requirements), and mechanisms for integrating new models with RPC
b. Mechanisms for creating runtime environments; data staging (in and out); job scheduling, remote execution, and monitoring
c. Mechanisms for storing model outputs together with metadata and provenance information (all information needed to recreate the output data set), including the metadata necessary to enable search and discovery of model outputs
4. Tools for model output analysis (including visualizations), tools for quantitatively comparing model outputs, and tools for model benchmarking (Performance Metrics Workbench)
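As one example of quantitatively comparing model outputs, the sketch below computes mean bias, mean absolute error, and root-mean-square error between two outputs sampled at the same points. The choice of these three metrics is ours for illustration, not a specification of the Performance Metrics Workbench:

```python
import math


def compare_outputs(a: list[float], b: list[float]) -> dict[str, float]:
    """Compare two model outputs sampled at the same points (co-location is assumed)."""
    if len(a) != len(b) or not a:
        raise ValueError("outputs must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return {
        "bias": sum(diffs) / n,                            # mean of (a - b)
        "mae": sum(abs(d) for d in diffs) / n,             # mean absolute error
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),  # root-mean-square error
    }


print(compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Here `a` could be the output driven by a current data product and `b` the output driven by a candidate new product; a negative bias means `a` is lower on average.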
Major System Constraints
• Only models and data made available to RPC users and integrated with the RPC node can be used to perform experiments.
• Installation and/or integration of models, as well as integration and geo-processing of data, must be performed by the appropriate specialist, and the time needed to accomplish that task will depend on the complexity of the particular model and data set(s).
• Running a model may take a long time, depending on the complexity and configuration of the model. The experiments will not necessarily be performed in real time.
User Categories
1. System administrators – responsible for deploying, configuring, and maintaining the system and for managing its users (for access control purposes)
2. Application specialists – responsible for installing and configuring the models on computational systems accessible to RPC users and for integrating these models with the RPC (which includes defining the input and output data requirements)
3. Data processing specialists – responsible for developing and deploying the tools for data transformations
4. Domain specialists – responsible for defining, configuring (creating data processing workflows, setting model parameters, etc.), and executing experiments
5. Analysts – domain specialists performing the data analysis
Assumptions and Dependencies
• The RPC will depend on data and models provided by third parties.
• Access to remote computational and storage facilities will be controlled according to policies established by the facility owners (stakeholders).
• It is assumed that these policies will allow RPC users to submit and monitor jobs on these systems, which may require traversing firewalls.
• It is possible that the access privileges will be different for different users, depending on organizational membership, nationality, or other factors beyond the control of the RPC system developers.
Operational Scenario Summary
• Design of the experiment – identification of the models and data sets to be used
• Assessment of whether the models and data are currently integrated with the RPC node
• Filing requests with model and data specialists, as needed; the specialists issue a notification when the models and data are available
• Configuration of the experiment (setting the model parameters and configuring the data, e.g., ROI, timeframe, etc.)
• Asynchronous run and monitoring of the model
• Analysis
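The asynchronous run-and-monitor step can be sketched with Python's asyncio. This is a minimal local stand-in, assuming a hypothetical `fake_model_run` coroutine in place of a real job submission to a remote scheduler; the step count, polling interval, and timeout values are arbitrary:

```python
import asyncio


async def fake_model_run(steps: int, delay: float) -> str:
    """Stand-in for a long model run; a real RPC would submit a job to a remote scheduler."""
    for _ in range(steps):
        await asyncio.sleep(delay)  # one simulated model time step
    return "COMPLETED"


async def run_and_monitor(steps: int = 5, delay: float = 0.01,
                          poll: float = 0.005, timeout: float = 1.0) -> str:
    """Start the run without blocking, poll its status, and abort it after `timeout` seconds."""
    task = asyncio.create_task(fake_model_run(steps, delay))
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while not task.done():
        if loop.time() > deadline:
            task.cancel()  # corresponds to the user aborting the run
            try:
                await task
            except asyncio.CancelledError:
                pass
            return "ABORTED"
        await asyncio.sleep(poll)  # the caller stays free between status checks
    return task.result()


print(asyncio.run(run_and_monitor()))
```

Polling keeps the sketch short; for long model runs, a notification from the remote facility (a job has completed) would avoid repeated status checks.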
Physical Issues
• The RPC node will be installed on a dedicated, stand-alone system consisting of standard commercially available computing nodes, data storage, and hosting middleware servers.
• Core RPC modular capabilities (SDM, IGE, MM, PMW) will be executed on separate computing nodes.
• The RPC node will be complemented with remote resources: high-performance computing and storage facilities, as needed by the models used in the experiments.
• The RPC node can be moved from one geographical location to another.
• Access to the remote resources will require standard internet connections.
System Performance Characteristics
• The primary goal of the RPC node is to provide the capability to rapidly prototype the assimilation of new or future NASA data products and/or model-derived data streams into model applications that have generated demonstrable scientific results of merit and stakeholder interest.
• However, there is no established benchmark that quantitatively specifies what “rapid” means. The reference point is current practice (manual configuration of data and models), and the expectation is that the RPC approach will considerably speed up the process, in particular for repeated experiments once the baseline data and models are set up.
• The initial phase, setting up the baseline data and models, may nevertheless prove time-consuming, as it will involve model integration, data acquisition and simulation, and the development of new components for geoprocessing the data.
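One way to make the trade-off between setup cost and repeated experiments explicit, offered only as an illustrative model and not as an RPC benchmark: let t_m be the time to configure and run one experiment manually, t_s the one-time cost of setting up the baseline data and models in the RPC, and t_r the per-experiment time once that baseline exists (these symbols are ours). For n experiments:

```latex
T_{\mathrm{manual}}(n) = n\,t_m, \qquad
T_{\mathrm{RPC}}(n) = t_s + n\,t_r,
\qquad\text{so}\qquad
T_{\mathrm{RPC}}(n) < T_{\mathrm{manual}}(n)
\iff n > n^{*} = \frac{t_s}{t_m - t_r} \quad (t_m > t_r).
```

Under these assumptions, the RPC is faster only after more than n* repeated experiments, which is consistent with the expectation that the gains come from repetition once the baseline is in place.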
System Performance Characteristics (cont.)
• “Rapid Prototyping” performance benefits will be best realized through the reusability of configured geoprocessing tasks to provide model-ready input data to a model that has been fully integrated into the RPC.
• It is this “reuse” capability that will enable the rapid evaluation of new data types.
• By associating existing geoprocessing workflows with new data types, the rapid assimilation of next-generation data into configured models should be readily achievable.
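The reuse idea can be sketched as composing geo-processing steps into a workflow and then associating a new data type with an existing workflow through a small adapter. Everything here is hypothetical (products as dictionaries of scalar variables, the step names, and the data type labels); a real workflow would operate on gridded files through a workflow engine:

```python
from typing import Callable

Product = dict[str, float]            # hypothetical: a data product as named scalar values
Step = Callable[[Product], Product]   # one reusable geo-processing module


def compose(*steps: Step) -> Step:
    """Chain modules into a workflow; the loop is a minimal stand-in for a workflow engine."""
    def run(product: Product) -> Product:
        for step in steps:
            product = step(product)
        return product
    return run


def rename(old: str, new: str) -> Step:
    """Adapter step: map a new product's variable name onto the name a workflow expects."""
    return lambda p: {(new if k == old else k): v for k, v in p.items()}


def kelvin_to_celsius(var: str) -> Step:
    return lambda p: {**p, var: p[var] - 273.15}


class WorkflowRegistry:
    """Associates data types with configured workflows that yield model-ready input."""

    def __init__(self) -> None:
        self._by_type: dict[str, Step] = {}

    def register(self, data_type: str, workflow: Step) -> None:
        self._by_type[data_type] = workflow

    def associate(self, new_type: str, existing_type: str, adapter: Step) -> None:
        """Reuse an existing workflow for a new data type; only `adapter` is new work."""
        self._by_type[new_type] = compose(adapter, self._by_type[existing_type])

    def prepare(self, data_type: str, product: Product) -> Product:
        return self._by_type[data_type](product)


registry = WorkflowRegistry()
registry.register("legacy-sst", compose(kelvin_to_celsius("sst")))
registry.associate("next-gen-sst", "legacy-sst", rename("sea_surface_temperature", "sst"))
print(registry.prepare("next-gen-sst", {"sea_surface_temperature": 300.0}))
```

The new data type reaches the configured model through the already-tested workflow; only the one-line adapter had to be written.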
Policy and Regulation
• As the RPC develops into a viable simulation system, activities requiring RPC resources are expected to be requested and coordinated among those selecting an RPC for evaluation, the RPC team conducting a specific evaluation, and the RPC developers, who will be required to maintain and evolve the RPC to support requirements for integrating new model applications, data products, and geoprocessing tasks.
• As the RPC evolves to meet new or changing requirements, configuration management, version control, and development practices will be followed to ensure that capabilities under development remain isolated from operational RPC capabilities.
Policy and Regulation (cont.)
• Simply stated, development, testing, and integration of new functionality into the RPC should be “contained” through the use of segregated physical or virtual systems that can be isolated from the operational instance of the RPC.
• As new capabilities mature through development processes, configuration “check-in” procedures will be followed to ensure the orderly integration of the new “proven” capabilities.
• It is likely that such activities will involve proactive participation of an RPC technical working group.
System Interfaces
• The RPC node has 5 categories of users, each requiring a dedicated interface.
• In addition, the RPC interacts with two classes of external systems: data providers and remote computing and storage facilities.
• Each interface is described on the following slides.
System Administrator Interface
The administrator interface must support the administrator tasks:
• registering and de-registering users and assigning roles
• maintaining the user credentials needed to access remote resources
• monitoring the system status and usage
• backing up and restoring data and software; recovery from faults
• deployment of new software components and services
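The user-registration and role-assignment tasks can be sketched as a small directory keyed by user, with roles mirroring the five user categories. `Role`, `UserDirectory`, and the rule that each interface requires the matching role are assumptions for illustration; storage of the credentials for remote resources is not shown:

```python
from enum import Enum, auto


class Role(Enum):
    ADMINISTRATOR = auto()
    APPLICATION_SPECIALIST = auto()
    DATA_SPECIALIST = auto()
    DOMAIN_SPECIALIST = auto()
    ANALYST = auto()


class UserDirectory:
    """Registers users and their roles (illustrative only)."""

    def __init__(self) -> None:
        self._roles: dict[str, set[Role]] = {}

    def register(self, user: str, *roles: Role) -> None:
        self._roles.setdefault(user, set()).update(roles)

    def deregister(self, user: str) -> None:
        self._roles.pop(user, None)

    def can_use(self, user: str, interface: Role) -> bool:
        """Assumed rule: each interface is available only to users holding the matching role."""
        return interface in self._roles.get(user, set())


directory = UserDirectory()
directory.register("alice", Role.DOMAIN_SPECIALIST, Role.ANALYST)
print(directory.can_use("alice", Role.ANALYST), directory.can_use("alice", Role.ADMINISTRATOR))
```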
Model Specialist Interface
• The model specialist is responsible for deploying the models and integrating them into the RPC environment.
• The models can be installed locally on RPC node hardware, at a remote computing facility, or both.
• To integrate a model with the RPC, the specialist must “register” it, that is, generate a metadata record that describes the model in terms of its functionality, runtime requirements (location of the executable, environment variables, structure of the working directory, etc.), model parameters, and definitions of the input and output datasets.
• The model specialist interface must thus support the registration of new models and editing of the metadata of the existing models.
• In addition, the model specialist interface must provide support for the testing of the correctness of the model deployment.
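A registration record and a basic deployment check might look like the following sketch. The field names are hypothetical renderings of the items listed above, and the checks (executable present, required environment variables set, working directories present, inputs and outputs declared) are only a plausible minimum, not the RPC's actual test procedure:

```python
import os
import shutil
import sys
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    """Hypothetical registration record for one model."""
    name: str
    functionality: str
    executable: str                                        # path to the model executable
    required_env: list[str] = field(default_factory=list)  # environment variables the model reads
    working_dirs: list[str] = field(default_factory=list)  # expected working-directory layout
    parameters: dict[str, float] = field(default_factory=dict)
    inputs: list[str] = field(default_factory=list)        # input dataset definitions
    outputs: list[str] = field(default_factory=list)       # output dataset definitions


def check_deployment(record: ModelRecord) -> list[str]:
    """Return the problems found; an empty list means these basic checks passed."""
    problems = []
    if shutil.which(record.executable) is None:
        problems.append(f"executable not found or not executable: {record.executable}")
    problems += [f"environment variable not set: {v}" for v in record.required_env if v not in os.environ]
    problems += [f"missing working directory: {d}" for d in record.working_dirs if not os.path.isdir(d)]
    if not record.inputs or not record.outputs:
        problems.append("input and output datasets must both be defined")
    return problems


# Usage with the running Python interpreter standing in for a model executable.
example = ModelRecord(
    name="demo-model",
    functionality="placeholder model used to exercise the checks",
    executable=sys.executable,
    working_dirs=[os.getcwd()],
    parameters={"time_step_s": 60.0},
    inputs=["forcing-data"],
    outputs=["model-results"],
)
print(check_deployment(example))
```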
Data Specialist Interface
• The data specialist identifies the data providers and designs the geo-processing procedure for transforming the original data product to match the model input data requirements.
• The design of the geo-processing may require the development and deployment of software components to perform specified tasks.
• The data specialist interface must provide support for:
– searching data products from known data providers
– assessing the structure and syntax of available data products
– assessing the model input data requirements
– discovering and evaluating the geo-processing modules already integrated with the RPC node
– integrating new geo-processing modules within the RPC node
– composing the geo-processing procedure from available components
– testing the correctness of the geo-processing procedure
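Comparing a product's structure with the model input requirements amounts to working out which transformations are needed. A minimal sketch, assuming hypothetical `DatasetSpec` descriptors with only format, coordinate reference system, variables, and bounding box; real products carry far more structure (grids, time axes, units):

```python
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical descriptor for a data product or a model input requirement."""
    fmt: str
    crs: str
    variables: frozenset[str]
    bbox: BBox  # assumed to be expressed in the same lon/lat coordinates for both specs


def covers(outer: BBox, inner: BBox) -> bool:
    return outer[0] <= inner[0] and outer[1] <= inner[1] and outer[2] >= inner[2] and outer[3] >= inner[3]


def plan_geoprocessing(product: DatasetSpec, required: DatasetSpec) -> list[str]:
    """List the transformation steps needed to turn `product` into model-ready input."""
    if not covers(product.bbox, required.bbox):
        raise ValueError("product does not cover the model domain")
    steps = []
    if product.fmt != required.fmt:
        steps.append(f"translate format {product.fmt} -> {required.fmt}")
    if product.crs != required.crs:
        steps.append(f"reproject {product.crs} -> {required.crs}")
    if product.bbox != required.bbox:
        steps.append("subset to model domain")
    missing = sorted(required.variables - product.variables)
    if missing:
        steps.append("derive variables: " + ", ".join(missing))
    return steps


product = DatasetSpec("HDF-EOS", "EPSG:4326", frozenset({"lst"}), (-100.0, 20.0, -80.0, 40.0))
required = DatasetSpec("netCDF", "EPSG:4326", frozenset({"lst", "ndvi"}), (-95.0, 25.0, -85.0, 35.0))
print(plan_geoprocessing(product, required))
```

Each listed step would then be matched against the geo-processing modules already integrated with the RPC node, and only the unmatched steps would require new components.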
Domain Specialist Interface
• To support the design and execution of experiments, the domain specialist interface must support:
– Discovery of available models and data through the RPC facilities
– Filing requests for new models and data, and receiving notifications when they become available
– Configuring experiments by:
• Connecting a particular model with particular data
• Setting the model parameters
• Configuring datasets (region of interest, timeframe, etc.)
– Submitting models for execution
– Monitoring the model progress
– Controlling the model execution (e.g., aborting it, if needed)
– Verifying that the model completed successfully (e.g., by examining a log file generated by the model, running a test application, etc.)
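The log-file check mentioned above could be as simple as scanning for success and error markers. The marker patterns here are hypothetical; each model writes its own log format, so the application specialist would supply the real ones:

```python
import re

# Hypothetical markers; real models use their own log formats and termination messages.
SUCCESS_PATTERN = re.compile(r"^\s*(run completed|normal termination)\b", re.IGNORECASE | re.MULTILINE)
ERROR_PATTERN = re.compile(r"^\s*(error|fatal)\b.*$", re.IGNORECASE | re.MULTILINE)


def verify_run(log_text: str) -> tuple[bool, list[str]]:
    """Return (succeeded, error_lines) based on the markers found in the model log."""
    errors = [m.group(0).strip() for m in ERROR_PATTERN.finditer(log_text)]
    succeeded = bool(SUCCESS_PATTERN.search(log_text)) and not errors
    return succeeded, errors


print(verify_run("step 1\nstep 2\nNormal termination of model\n"))
```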
Analyst Interface
• The analyst analyzes the experiment outcome. The analyst interface must:
– Allow queries of the output databases to find the model outputs of interest
– Provide access to model outputs
– Provide access to model provenance (when and under what circumstances the model was run, e.g., which input data sets were used, the values of the model parameters, etc.)
– Provide access to tools (visualizations or otherwise) for analyzing the results of the experiments
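Querying model outputs by their provenance can be sketched as filtering catalog entries on recorded fields. `OutputEntry`, its fields, and `query` are hypothetical; an operational catalog would live in a database rather than an in-memory list:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutputEntry:
    """A catalogued model output with hypothetical provenance fields."""
    uri: str
    model: str
    input_datasets: tuple[str, ...]
    parameters: tuple[tuple[str, float], ...]
    run_date: str  # ISO 8601 date, so string comparison orders chronologically


def query(catalog: list[OutputEntry], model: Optional[str] = None,
          uses_input: Optional[str] = None, since: Optional[str] = None) -> list[OutputEntry]:
    """Return entries matching every criterion given; criteria left as None are ignored."""
    return [
        entry for entry in catalog
        if (model is None or entry.model == model)
        and (uses_input is None or uses_input in entry.input_datasets)
        and (since is None or entry.run_date >= since)
    ]


catalog = [
    OutputEntry("file:///outputs/run-001", "model-A", ("current-product",), (("dt", 60.0),), "2006-05-01"),
    OutputEntry("file:///outputs/run-002", "model-A", ("new-product",), (("dt", 60.0),), "2006-05-10"),
    OutputEntry("file:///outputs/run-003", "model-B", ("new-product",), (("dt", 30.0),), "2006-05-12"),
]
for entry in query(catalog, uses_input="new-product", since="2006-05-05"):
    print(entry.uri, dict(entry.parameters))
```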
Data Provider Interface
• The RPC must define interfaces that allow acceptance of data streams coming from data providers.
Remote Resources Interface
• The RPC must define interfaces for invoking Grid services such as allocating and monitoring remote resources, accepting notifications about status changes (e.g., a job has completed), and transferring data between the RPC node and remote resources, as well as between remote resources.
• Defined interfaces must support delegation of user credentials to satisfy the access control requirements and policies of the remote resources.
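Credential delegation lets the RPC act on a user's behalf at remote sites. Grid middleware typically does this with short-lived proxy credentials; the toy sketch below models only two access-control constraints, that a delegated credential carries no more rights and no longer lifetime than its issuer. `Credential`, the scope names, and the chain check are hypothetical, and no signing or cryptography is shown:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Credential:
    """Toy credential record; real Grid middleware uses signed certificates instead."""
    subject: str
    scopes: frozenset[str]
    expires: datetime
    issuer: Optional["Credential"] = None

    def delegate(self, to: str, scopes: set[str], lifetime: timedelta, now: datetime) -> "Credential":
        """Issue a credential for `to` with no more rights and no longer life than this one."""
        if not scopes <= self.scopes:
            raise PermissionError("cannot delegate scopes the issuer does not hold")
        expires = min(self.expires, now + lifetime)
        return Credential(f"{self.subject}/{to}", frozenset(scopes), expires, issuer=self)

    def valid_for(self, scope: str, now: datetime) -> bool:
        """Valid only if the scope is held and every credential in the chain is unexpired."""
        cred: Optional[Credential] = self
        while cred is not None:
            if now >= cred.expires:
                return False
            cred = cred.issuer
        return scope in self.scopes


now = datetime(2006, 5, 17, tzinfo=timezone.utc)
user = Credential("alice", frozenset({"submit", "monitor", "transfer"}), now + timedelta(hours=12))
job_proxy = user.delegate("rpc-node", {"submit", "monitor"}, timedelta(hours=24), now)
print(job_proxy.expires == user.expires, job_proxy.valid_for("transfer", now))
```

Requesting a 24-hour lifetime still yields a credential that expires with the user's 12-hour credential, and the `transfer` right is not passed on.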
The End
Backup slides follow
The baseline system. This four-tier architecture follows OGSA recommendations
Figure: Functional computational capabilities of the RPC system. Labels in the diagram include “Evaluations leading to new understanding & ideas for ISS”, MyRPC, LIS, IGE, and WorldWinds, with three capability groups: (1) authorization, authentication, notification, monitoring, workflow, security; (2) ESMF, GCMD, THREDDS, ESML, ontology, query; (3) MyRPC, host environment, GPIR, execution description, application description, Grid-enabled OGC services.
Figure: Service-oriented architecture for the Computational RPC Node [based on NSF LEAD (Droegemeier et al., 2006)]. Labeled components: RPC Portal, MyRPC, GCMD, WRF, HSPF, LIS, RAMS, DAACs, CLASS, Evaluation, ESMF, GEOLEM, OGC Services.
Figure: Systems framework for CRPN, consisting of interacting subsystems in the secure and stable RPC computational grid [based on NSF LEAD (Droegemeier et al., 2006)]. Labeled components: CRPN, WRF, ESMF, IGE, GCMD, MyRPC workspace, LIS, WorldWinds.