Cyberinfrastructure for Coastal Forecasting and Change Analysis
Cyberinfrastructure for Coastal Forecasting and Change Analysis. Gagan Agrawal, Hakan Ferhatosmanoglu, Xutong Niu, Ron Li, Keith Bedford, The Ohio State University. Context: new award from the Office of Cyberinfrastructure (OCI) under the Cyberinfrastructure for Environmental Observatories program.
Ohio State University Department of Computer Science and Engineering
Cyberinfrastructure for Coastal Forecasting and Change Analysis
Gagan Agrawal, Hakan Ferhatosmanoglu, Xutong Niu, Ron Li, Keith Bedford
The Ohio State University
Context
• New award from the Office of Cyberinfrastructure (OCI)
– Under the Cyberinfrastructure for Environmental Observatories program
– September 2006 to August 2009, total amount $1,400,000
• Involves 2 computer scientists and 2 environmental scientists
– G. Agrawal (PI): Grid middleware
– H. Ferhatosmanoglu: Databases
– K. Bedford: Great Lakes nowcasting/forecasting
– R. Li: Coastal erosion analysis
Coastal Forecasting and Change Detection (Lake Erie)
Project Premise
• Limitations of current environmental observation systems
– Tightly coupled systems
» No reuse of algorithms
» Very hard to experiment with new algorithms
– Closely tied to existing resources
• Our claim: emerging trends toward web services and grid services can help
Challenges
• Existing grid middleware systems have not considered
– Processing of streaming data
– Data integration issues
• The applications involved need techniques for multi-modal data fusion, query planning, and data mining
– These techniques need to be implemented as grid or web services
Proposed Infrastructure and Collaboration
Application Details: Great Lakes Nowcasting/Forecasting
• GLOS: Great Lakes Observing System
– Co-designer/project manager: K. Bedford, a co-PI on this project
– Collaboration with NOAA
• Limitation: hard-wired
– Cannot incorporate new streams or algorithms
• Plan: create an implementation using our middleware for streaming data
Application Details: Coastal Erosion Prediction and Analysis
• Focus: erosion along the Lake Erie shore
– A serious problem
– Substantial economic losses
• Prediction requires data from
– A variety of satellites
– In-situ sensors
– Historical records
• Challenges
– Analyzing distributed data
– Data integration/fusion
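One concrete integration step behind "data integration/fusion" can be sketched as aligning in-situ sensor readings with satellite observations by nearest timestamp, discarding pairs that are too far apart in time. This is a minimal illustration only. The `Reading` record, the `fuse_nearest` function, the time units, and the tolerance rule are all hypothetical, and the slides do not say how the project actually performs fusion.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Reading:
    t: float      # observation time (hypothetical units, e.g. hours)
    value: float  # measured quantity, e.g. water level


def fuse_nearest(in_situ, satellite, tolerance):
    """Pair each satellite observation with the closest in-situ reading in time.

    Both inputs must be sorted by time. Pairs whose time gap exceeds
    `tolerance` are dropped rather than interpolated (a hypothetical rule).
    """
    times = [r.t for r in in_situ]
    fused = []
    for obs in satellite:
        i = bisect_left(times, obs.t)
        # Candidates: the readings just before and just after obs.t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(in_situ)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(times[j] - obs.t))
        if abs(times[best] - obs.t) <= tolerance:
            fused.append((obs.t, obs.value, in_situ[best].value))
    return fused


if __name__ == "__main__":
    gauge = [Reading(0.0, 1.2), Reading(1.0, 1.4), Reading(2.0, 1.1)]
    sat = [Reading(0.9, 10.0), Reading(5.0, 12.0)]
    print(fuse_nearest(gauge, sat, tolerance=0.5))
```

Binary search keeps each lookup logarithmic in the number of in-situ readings. A real fusion service would also need spatial matching and unit or format conversion.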
Middleware Developed at Ohio State
• Automatic Data Virtualization Framework
– Enables processing and integration of data stored in low-level formats
• GATES (Grid-based AdapTive Execution on Streams)
– Processing of distributed data streams
• FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid)
– Supports scalable analysis of remote data
Automatic Data Virtualization: Motivation
• Access mechanisms for remote repositories
– Complex low-level formats make accessing and processing data difficult
– Main desired functionality
» Ability to select, download, and process a subset of the data
• Sensor data
– Again, low-level data
– Need to convert formats
– Need a flexible architecture
Data Virtualization
[Figure: a dataset, the Data Service that accesses it, and the Data Virtualization, an abstract view of the data]
As defined by the Global Grid Forum's DAIS working group:
• A Data Virtualization describes an abstract view of data.
• A Data Service implements the mechanism to access and process data through the Data Virtualization.
Our Approach: Automatic Data Virtualization
• Automatically create data services
– A new application of compiler technology
• A metadata descriptor describes the layout of data in a repository
• An abstract view is exposed to the users
• Two implementations:
– Relational/SQL-based
– XML/XQuery-based
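The role of a metadata descriptor can be illustrated with a small sketch. Here the descriptor names the fields and binary layout of fixed-size records, and a hand-written "data service" uses it to expose the raw bytes as named rows that support SQL-like subsetting. The descriptor format, field names, and the functions `record_format`, `scan`, and `select` are hypothetical. They are far simpler than a compiler-generated service and only show how a layout description can sit between low-level files and an abstract relational view.

```python
import struct
from typing import Callable, Dict, Iterator, List

# Hypothetical metadata descriptor: field names and struct codes for one
# fixed-size, little-endian record in a binary sensor file.
DESCRIPTOR = {
    "byte_order": "<",
    "fields": [("station", "i"), ("time", "d"), ("temp_c", "f")],
}


def record_format(desc: Dict) -> str:
    """Build a struct format string from the descriptor."""
    return desc["byte_order"] + "".join(code for _, code in desc["fields"])


def scan(raw: bytes, desc: Dict) -> Iterator[Dict[str, float]]:
    """Expose the raw bytes as a sequence of named rows (the abstract view)."""
    fmt = record_format(desc)
    names = [name for name, _ in desc["fields"]]
    for values in struct.iter_unpack(fmt, raw):
        yield dict(zip(names, values))


def select(raw: bytes, desc: Dict, columns: List[str],
           where: Callable[[Dict], bool]) -> List[tuple]:
    """Rough analogue of SELECT columns FROM data WHERE predicate."""
    return [tuple(row[c] for c in columns)
            for row in scan(raw, desc) if where(row)]


if __name__ == "__main__":
    fmt = record_format(DESCRIPTOR)
    raw = b"".join(struct.pack(fmt, *rec) for rec in
                   [(1, 0.0, 4.5), (2, 0.0, 6.0), (1, 1.0, 5.25)])
    print(select(raw, DESCRIPTOR, ["time", "temp_c"],
                 lambda r: r["station"] == 1))
```

Because the user-facing code touches only field names, the same query would keep working if the descriptor changed to describe a different file layout. That separation is the motivation for generating data services automatically.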
Streaming Data ModelStreaming Data Model
• Continuous data arrival and processing • Emerging model for data processing
– Sources that produce data continuously: sensors, long running simulations
– Critical In Environmental Observatories • Active topic in many computer science communities
– Databases– Data Mining – Networking ….
Need for a Grid-Based Stream Processing Middleware
• Application developers interested in data stream processing
– Would like to have abstracted away
» Grid standards and interfaces
» Adaptation functions
– Would like to focus on algorithms only
• GATES is a middleware for grid-based, self-adapting data stream processing
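The division of labor described here can be sketched as a chain of processing stages connected by lazy generators. The developer writes only the per-stage algorithm, and a separate helper does the wiring. This is a single-process illustration under that assumption. The `Stage` type, `run_pipeline`, and the example stages are hypothetical and do not reflect GATES's actual grid-service interfaces.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical stage type: consumes an input stream, yields an output stream.
Stage = Callable[[Iterable[float]], Iterator[float]]


def run_pipeline(source: Iterable[float], stages: List[Stage]) -> Iterator[float]:
    """Chain stages so each one lazily consumes the previous stage's output.

    In a grid setting each stage could run on a different node; here they
    simply compose in one process.
    """
    stream: Iterable[float] = source
    for stage in stages:
        stream = stage(stream)
    return iter(stream)


def drop_invalid(stream: Iterable[float]) -> Iterator[float]:
    """Filter out sensor error codes (hypothetically, negative values)."""
    return (x for x in stream if x >= 0)


def moving_average(window: int) -> Stage:
    """Build a stage that emits the mean of each full sliding window."""
    def stage(stream: Iterable[float]) -> Iterator[float]:
        buf: List[float] = []
        for x in stream:
            buf.append(x)
            if len(buf) > window:
                buf.pop(0)
            if len(buf) == window:
                yield sum(buf) / window
    return stage


if __name__ == "__main__":
    readings = [1.0, -999.0, 3.0, 5.0, 7.0]
    print(list(run_pipeline(readings, [drop_invalid, moving_average(2)])))
```

Each stage depends only on its input stream, so a middleware layer could place stages on different nodes or swap in new algorithms without changing the others.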
Adaptation for Real-time Processing
• Analysis on streaming data is approximate
• The trade-off between accuracy and execution rate can be captured by certain parameters (adaptation parameters)
– Sampling rate
– Size of summary structure
• Application developers can expose these parameters and a range of values for each
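One way a middleware could act on an exposed adaptation parameter is a simple feedback loop. If the backlog of unprocessed data grows past a target, it lowers the sampling rate. If the backlog is small, it raises the rate again, always staying within the developer-declared range. The multiplicative step, the thresholds, and the names `AdaptationParameter` and `adapt_sampling_rate` are hypothetical, and the slides do not specify GATES's actual adaptation algorithm.

```python
from dataclasses import dataclass


@dataclass
class AdaptationParameter:
    """A developer-exposed knob with its allowed range."""
    value: float
    low: float
    high: float

    def clamp(self) -> None:
        self.value = max(self.low, min(self.high, self.value))


def adapt_sampling_rate(param: AdaptationParameter, queue_length: int,
                        target: int, step: float = 0.8) -> float:
    """Adjust the sampling rate from the observed backlog (hypothetical rule).

    A queue longer than `target` means processing lags arrival, so sample
    less (trading accuracy for speed). A queue shorter than half the target
    means there is spare capacity, so sample more.
    """
    if queue_length > target:
        param.value *= step
    elif queue_length < target // 2:
        param.value /= step
    param.clamp()
    return param.value


if __name__ == "__main__":
    rate = AdaptationParameter(value=1.0, low=0.1, high=1.0)
    for q in [50, 50, 50, 2, 2]:
        print(round(adapt_sampling_rate(rate, q, target=20), 3))
```

The dead band between half the target and the target keeps the rate from oscillating when the backlog hovers near the threshold.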
FREERIDE-G: Supporting Distributed Data-Intensive Science
[Figure: User, Compute Cluster, and Data Repository Cluster]
Challenges for Application Development
• Analysis of large amounts of disk-resident data
• Incorporating parallel processing into analysis
• Processing needs to be independent of other elements and easy to specify
• Coordination of storage, network, and computing resources is required
• Transparency of data retrieval, staging, and caching is desired
FREERIDE-G Goals
• Support high-end processing
– Enable efficient processing of large-scale data mining computations
• Ease use of parallel configurations
– Support shared and distributed memory parallelization starting from a common high-level interface
• Hide details of data movement and caching
– Data staging and caching (when feasible/appropriate) need to be transparent to the application developer
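A "common high-level interface" for both shared and distributed memory can be sketched as a generalized reduction. The developer supplies two functions: one that folds a chunk of data into a reduction object, and one that merges two reduction objects. The runtime then decides where each chunk is processed. This framing, the names `local_reduce`, `merge`, and `kmeans_step`, and the use of a `ThreadPoolExecutor` as a stand-in runtime are assumptions for illustration, not FREERIDE-G's actual API. The example computes one k-means assignment-and-update step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Reduction object (hypothetical layout): per-centroid [sum_x, sum_y, count].
Reduction = List[List[float]]


def local_reduce(chunk: Sequence[Point], centroids: Sequence[Point]) -> Reduction:
    """Fold one data chunk into a fresh reduction object."""
    acc = [[0.0, 0.0, 0.0] for _ in centroids]
    for x, y in chunk:
        # Assign the point to its nearest centroid (squared distance).
        k = min(range(len(centroids)),
                key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
        acc[k][0] += x
        acc[k][1] += y
        acc[k][2] += 1
    return acc


def merge(a: Reduction, b: Reduction) -> Reduction:
    """Combine two reduction objects; must be associative and commutative."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def kmeans_step(chunks: List[List[Point]], centroids: List[Point]) -> List[Point]:
    """One k-means iteration: reduce chunks in parallel, then merge globally."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(local_reduce, chunks, [centroids] * len(chunks)))
    total: Reduction = [[0.0, 0.0, 0.0] for _ in centroids]
    for part in partials:
        total = merge(total, part)
    # Keep the old centroid if no point was assigned to it.
    return [(sx / n, sy / n) if n else c
            for (sx, sy, n), c in zip(total, centroids)]


if __name__ == "__main__":
    data = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
    print(kmeans_step(data, [(1.0, 1.0), (9.0, 9.0)]))
```

Because `merge` is associative and commutative, the same two functions work whether chunks are reduced by threads on one machine or by nodes near a remote repository. Only the runtime that schedules `local_reduce` and combines the partial results would change.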
Data Analysis Services
• Multi-model, multi-sensor data integration
– Built on our Data Virtualization Framework
• Query planning service
– Feature extraction: integration with grid metadata catalogs
• Remote mining of spatio-temporal data
– Built using FREERIDE-G
• Mining algorithms for data streams
– Built using GATES
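As an example of a one-pass stream mining algorithm whose summary-structure size could serve as an adaptation parameter, here is the Misra-Gries frequent-items summary. The choice of this algorithm and the function name are ours; the slides do not say which stream mining algorithms the project built on GATES.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary holding at most k counters.

    Every item occurring more than n / (k + 1) times in a stream of length n
    is guaranteed to survive, and each count underestimates the true count by
    at most n / (k + 1). A larger k is more accurate but uses more memory.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    events = ["A", "B", "A", "C", "A", "D", "A", "B"]
    print(misra_gries(events, k=2))
```

Here `k` plays the role of the "size of summary structure" parameter: shrinking it lets the analysis keep up with a faster stream at the cost of looser count guarantees.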