A Grid Portal for Earth Observation Community
G. Aloisio, M. Cafaro, G. Cartenì, I. Epicoco, G. Quarta
8° Workshop sul Calcolo ad Alte Prestazioni in Italia, CAPI'04
Milan, 24-25 November 2004
Center for Advanced Computational Technologies, University of Lecce - Italy
OUTLINE
• The GRID.IT project
• Remote sensing background
• Remote sensing ISSUES
• Grid computing paradigm for remote sensing data processing and management
• The Italian Grid For Earth Observation (I-GEO) project: a prototype of grid infrastructure for the remote sensing community
• SPACI Project: the testbed infrastructure
THE GRID.IT Project
This project, coordinated by the National Research Council (CNR), is defined within the scientific and technological context of new ICT platforms and large-scale distributed systems. The goal is to study and experiment with systems and software tools that are innovative at all levels, and to demonstrate their capabilities through specific applications.
National Coordinator
Prof. Marco Vanneschi, University of Pisa and ISTI-CNR, Pisa
Research Units
• CNR, ISTI, Pisa (D. Laforenza)
• CNR, ISTM, Perugia (M. Rosi)
• CNR, ICAR, Napoli (A. Murli)
• INFN, Padova (M. Mazzucato)
• CNIT, Pisa (G. Prati)
• ASI, Matera (G. Milillo)
THE GRID.IT Project
Work Packages
WP1. Grid Oriented Optical Switching Paradigms (Piero Castoldi, CNIT, Pisa)
WP2. High Performance Photonic Testbeds (Stefano Giordano, CNIT, Pisa)
WP3. Grid Deployment (Cristina Vistoli, INFN, Padova)
WP4. Security (Maurizio Talamo, Univ. of Roma "Tor Vergata")
WP5. Data Intensive Core Services (Cristina Vistoli, INFN, Padova)
WP6. Knowledge Services for Intensive Data Analysis (Franco Turini, Univ. of Pisa)
WP7. Grid Portals (Giovanni Aloisio, Univ. of Lecce, ISUFI)
WP8. High-performance Component-based Programming Environment (Marco Danelutto, Univ. of Pisa)
WP9. Grid-enabled Scientific Libraries (Almerico Murli, Univ. Napoli & ICAR-CNR, Napoli)
WP10. Grid Applications for Astrophysics (Leopoldo Benacchio, Inaf, Padova)
WP11. Grid Applications for Earth Observation Systems Applications (Giovanni Milillo, ASI, Matera)
WP12. Grid Applications for Biology (Alberto Apostolico, Univ. of Padova)
WP13. Grid Applications for Molecular Virtual Reality (Antonio Laganà, Univ. Perugia & ISTM-CNR)
WP14. Grid Applications for Geophysics (Antonio Navarra, INGV, Bologna)
WP15. Project Management (Marco Vanneschi, Univ. of Pisa)
What is Remote Sensing?
[Figure: the remote sensing chain. Labels: Energy Sources, Variables to be detected, Observation Systems & Platforms, Acquisition & Archiving Systems, Pre-Processing & Post-processing Systems, Final Users, Satellite dish]
What is Remote Sensing?
Applications:
• Study of planet Earth (mapping ocean features, land use, environmental change, emergency search and rescue, natural disaster monitoring and response, etc.)
• Urban development planning (unauthorized building, traffic planning and monitoring, etc.)
• Military intelligence (target detection and recognition, spying, etc.)
• …
Izmit earthquake - 1999 (SAR interferogram)
Maastricht flooding - 1995 (SAR imagery)
Straits of Messina - 1994 (SAR imagery)
Ikonos meter-resolution satellite imagery
Sicily - 2003 (ENVISAT MERIS imagery)
Remote Sensing ISSUES
• Terabytes of data per month produced by several Earth observation missions: high-performance, high-capacity storage systems are needed,
• Data mining over geographically spread archiving centers: efficient and secure mechanisms are needed for searching, accessing, manipulating and integrating data,
• Complex, computationally intensive processing: several applications must run over high performance computing resources to achieve a single result,
• The high performance computing resources must be used efficiently.
Remote Sensing ISSUES
Terabytes of data per month produced by several Earth observation missions
ENVISAT, ERS-1/2, JERS, SRTM, LANDSAT, …
Remote Sensing ISSUES
Data mining over geographically spread archiving centers / Multi-source data integration
[Figure: user (laptop)]
Remote Sensing ISSUES
Complex computational intensive processing
GRID computing point of view
Today, grid computing is an emerging technology for solving problems in dynamic, multi-institutional Virtual Organizations (VOs) by coordinating the sharing of resources such as high-performance computers, observation devices, data and databases over high-speed networks.
[Figure: Virtual Organizations VO1, VO2, VO3]
The Italian GRID for EARTH Observation project, I-GEO
[Figure: I-GEO architecture. Labels: Primary Nodes, Portal Node, Configuration Repository, http/https, GSI, user, administrator, contributor]
[Figure: I-GEO components. Labels: users, primary nodes, data, Configuration repository, Information system, Distributed data management system, Services Discovery Module, Workflow Interface, Scheduler and Monitoring Module]
I-GEO Information System
An information source that contains an ontology of the applications usually employed in the remote sensing field. The information related to an application includes: (a) input and output data formats; (b) application capabilities (we distinguish applications with pre-processing, post-processing or data format conversion capability); (c) the information needed to launch the application, i.e. hostname, pathname, the shared libraries on which the application depends, environment variables, application arguments and so on.
[Figure: characterization of software tools, hardware resources and data formats]
I-GEO Information System
• Centralized approach,
• Web-based management,
• XML-based,
• Several levels of access (user, contributor, administrator)
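As an illustration of what an XML-based application description could look like, here is a minimal Python sketch that builds one entry with the three kinds of information listed for the Information System (data formats, capability, launch details). The element names, the application name and the host/path values are all hypothetical; the slides do not show the actual I-GEO schema.

```python
import xml.etree.ElementTree as ET


def build_application_entry(name, capability, input_formats, output_formats,
                            hostname, pathname, libraries, env, arguments):
    """Build an XML element describing one application (hypothetical schema)."""
    app = ET.Element("application", {"name": name, "capability": capability})

    # (a) input and output data formats
    formats = ET.SubElement(app, "formats")
    for fmt in input_formats:
        ET.SubElement(formats, "input").text = fmt
    for fmt in output_formats:
        ET.SubElement(formats, "output").text = fmt

    # (c) information needed to launch the application
    launch = ET.SubElement(app, "launch")
    ET.SubElement(launch, "hostname").text = hostname
    ET.SubElement(launch, "pathname").text = pathname
    for lib in libraries:
        ET.SubElement(launch, "library").text = lib
    for key, value in env.items():
        ET.SubElement(launch, "env", {"name": key}).text = value
    for arg in arguments:
        ET.SubElement(launch, "argument").text = arg
    return app


# All values below are made up for illustration.
entry = build_application_entry(
    name="sar-focus",
    capability="pre-processing",          # (b) application capability
    input_formats=["RAW"],
    output_formats=["SLC"],
    hostname="node1.example.org",
    pathname="/opt/eo/bin/sar-focus",
    libraries=["libfftw3.so"],
    env={"EO_HOME": "/opt/eo"},
    arguments=["--input", "--output"],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

Keeping the launch details next to the format and capability information means one lookup is enough both to decide whether an application fits a workflow step and to start it.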
I-GEO Distributed Data Management System
Based on a common metadata schema for describing EO and geospatial products, it uses some modules belonging to the Grid Relational Catalog (GRelC) project to provide transparent, efficient and secure integration of heterogeneous data:
• Based on GRelC project libraries,
• Peer-to-peer approach,
• Management of heterogeneous data sources,
• Secure access to the distributed catalogues,
• Efficiency achieved through the GridFTP data transfer protocol and data compression algorithms (Lempel-Ziv 77 and Huffman coding),
• Distributed and geographic queries are allowed,
• Common schema for describing remote sensing data
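To give a feel for the compression step, the sketch below packs a result set with Python's zlib module before it would be transferred. zlib implements DEFLATE, which combines LZ77 with Huffman coding; the slides do not say which implementation GRelC uses, so this is only a stand-in. The JSON serialization and the record fields are assumptions made for the example.

```python
import json
import zlib


def pack_resultset(rows):
    """Serialize rows to JSON and compress them with DEFLATE (LZ77 + Huffman)."""
    payload = json.dumps(rows).encode("utf-8")
    return zlib.compress(payload, 9)


def unpack_resultset(blob):
    """Inverse of pack_resultset: decompress and parse the rows."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical catalogue rows; real EO metadata records would be richer.
rows = [{"product": f"scene-{i:04d}", "sensor": "SAR", "format": "SLC"}
        for i in range(500)]

blob = pack_resultset(rows)
raw_size = len(json.dumps(rows).encode("utf-8"))
print(f"raw: {raw_size} bytes, compressed: {len(blob)} bytes")
```

Catalogue result sets tend to repeat field names and values, which is the kind of redundancy a dictionary-based scheme such as LZ77 exploits well.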
I-GEO Distributed Data Management System
• The GRelC Service (GS) modules provide a robust and uniform access interface to each remote sensing data repository (basic primitives to interact transparently with heterogeneous data sources),
• The Enhanced GRelC Gather Service (EGGS) modules gather partial result sets coming from several GSs and other EGGSs, submit queries to the connected GSs, and forward queries to the other EGGSs connected to them
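The gather-and-forward behaviour described for GS and EGGS modules can be sketched in a few lines of Python. The classes below are in-memory stand-ins, not the GRelC API: `CatalogueService` plays the role of a GS, `GatherService` the role of an EGGS, and the bounding-box filter stands in for a geographic query. Node names and records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueService:
    """Stand-in for a GS: uniform query access to one data repository."""
    name: str
    records: list

    def query(self, bbox):
        west, south, east, north = bbox
        return [r for r in self.records
                if west <= r["lon"] <= east and south <= r["lat"] <= north]


@dataclass
class GatherService:
    """Stand-in for an EGGS: queries its GSs and forwards to peer EGGSs."""
    name: str
    services: list = field(default_factory=list)
    peers: list = field(default_factory=list)

    def query(self, bbox, visited=None):
        # Track visited nodes so that cycles between peers terminate.
        visited = set() if visited is None else visited
        if self.name in visited:
            return []
        visited.add(self.name)

        results = []
        for gs in self.services:          # submit the query to connected GSs
            results.extend(gs.query(bbox))
        for peer in self.peers:           # forward the query to other EGGSs
            results.extend(peer.query(bbox, visited))
        return results


site_a = CatalogueService("gs-a", [{"id": "A", "lon": 18.2, "lat": 40.3},
                                   {"id": "B", "lon": 2.3, "lat": 48.9}])
site_b = CatalogueService("gs-b", [{"id": "C", "lon": 14.3, "lat": 40.8}])

eggs1 = GatherService("eggs-1", services=[site_a])
eggs2 = GatherService("eggs-2", services=[site_b], peers=[eggs1])
eggs1.peers.append(eggs2)                 # peers may reference each other

bbox = (12.0, 37.0, 19.0, 42.0)           # west, south, east, north
hits = eggs2.query(bbox)
print(sorted(r["id"] for r in hits))      # prints ['A', 'C']
```

The `visited` set is a design choice of this sketch: in a peer-to-peer topology a forwarded query can come back to its origin, so some form of loop detection is needed.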
I-GEO Workflow Management System
An integrated workflow management system for EO applications that includes a web-based user interface and a resource manager optimized for EO applications; it interacts with underlying services such as the I-GEO Information System and the monitoring systems provided by the grid middleware:
• a grid-workflow composition interface is provided,
• the submitted workflows are verified and mapped onto grid resources as real jobs and data transfers,
• the workflow engine automatically starts data format conversion, if needed, in order to grant maximum compatibility between data formats and applications
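The automatic format conversion can be pictured as a pass over the workflow that inserts a conversion step wherever the output format of one application does not match the input format of the next. The sketch below is a simplified illustration, not the I-GEO engine: it handles a linear chain only, looks up direct conversions only, and all step, format and converter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    input_format: str
    output_format: str


def insert_conversions(steps, converters):
    """Return the step list with converters inserted where formats mismatch.

    `converters` maps (from_format, to_format) to a converter name.
    Raises ValueError when no direct converter is known, which also
    serves as a basic verification of the submitted workflow.
    """
    if not steps:
        return []
    planned = [steps[0]]
    for nxt in steps[1:]:
        produced = planned[-1].output_format
        if produced != nxt.input_format:
            key = (produced, nxt.input_format)
            if key not in converters:
                raise ValueError(
                    f"no converter from {produced} to {nxt.input_format}")
            planned.append(Step(converters[key], produced, nxt.input_format))
        planned.append(nxt)
    return planned


workflow = [Step("focus", "RAW", "SLC"),
            Step("geocode", "GEOTIFF", "GEOTIFF")]
converters = {("SLC", "GEOTIFF"): "slc2geotiff"}

plan = insert_conversions(workflow, converters)
print([s.name for s in plan])   # prints ['focus', 'slc2geotiff', 'geocode']
```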
I-GEO Scheduler and Monitoring Module
This component is responsible for job scheduling and file transfers. It takes into account the available computational resources, the locations where the datasets are stored and where the services are installed, and several performance parameters provided both by the Network Weather Service and by a historical archive.
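One way to combine these inputs is to estimate, for every host where the required service is installed, the time to move the dataset there plus the time to run the job, and pick the minimum. The sketch below does exactly that; it is an assumption about how such a decision could be made, not the I-GEO scheduling policy. The host table, the cost model and the bandwidth figures (which stand in for Network Weather Service forecasts) are invented.

```python
def estimate_completion(host, dataset_site, dataset_mb, work_gflop, bandwidth_mbps):
    """Estimated seconds to finish on `host`: transfer time plus compute time."""
    if host["site"] == dataset_site:
        transfer_s = 0.0                      # data is already local
    else:
        mbps = bandwidth_mbps[(dataset_site, host["site"])]
        transfer_s = dataset_mb * 8 / mbps    # megabytes to megabits
    compute_s = work_gflop / host["free_gflops"]
    return transfer_s + compute_s


def choose_host(hosts, service, dataset_site, dataset_mb, work_gflop, bandwidth_mbps):
    """Pick the host with the required service and the lowest estimate."""
    candidates = [h for h in hosts if service in h["services"]]
    if not candidates:
        raise LookupError(f"service {service!r} is not installed on any host")
    return min(candidates,
               key=lambda h: estimate_completion(
                   h, dataset_site, dataset_mb, work_gflop, bandwidth_mbps))


# Hypothetical resources and forecasts.
hosts = [
    {"name": "h1", "site": "site-a", "free_gflops": 50.0, "services": {"focus"}},
    {"name": "h2", "site": "site-b", "free_gflops": 200.0, "services": {"focus"}},
    {"name": "h3", "site": "site-c", "free_gflops": 400.0, "services": {"geocode"}},
]
bandwidth_mbps = {("site-a", "site-b"): 400.0}

best = choose_host(hosts, "focus", dataset_site="site-a",
                   dataset_mb=4000, work_gflop=6000,
                   bandwidth_mbps=bandwidth_mbps)
print(best["name"])   # prints h2
```

In this example the host that holds the data (h1) needs 120 s of compute, while h2 needs 80 s of transfer plus 30 s of compute, so moving the data wins; h3 is faster still but does not offer the service.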
The software scheduling modules that operate in these two layers are:
• Workflow Controller
• Job Scheduler
They are implemented as web services, so they communicate via SOAP messages (Apache Axis is the web services container).
The execution of a workflow produces the so-called Concrete Workflow: the physical translation of the user-defined workflow, with the details of the executions and data transfers of the set of jobs.
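The translation into a Concrete Workflow can be sketched as follows: given the ordered steps of the user-defined workflow and a placement of each step on a host, emit the jobs together with the data transfers needed whenever consecutive steps run on different hosts. The function, the action records and the host names are hypothetical; the actual Concrete Workflow format is not shown in the slides.

```python
def make_concrete(abstract_steps, placement, input_site):
    """Translate ordered abstract steps into jobs and data transfers.

    abstract_steps: step names, executed in sequence
    placement:      step name -> host chosen by the scheduler
    input_site:     host where the initial dataset is stored
    """
    actions = []
    data_at = input_site
    for step in abstract_steps:
        host = placement[step]
        if host != data_at:
            # The data must follow the computation to the next host.
            actions.append({"type": "transfer", "from": data_at, "to": host})
            data_at = host
        actions.append({"type": "job", "step": step, "host": host})
    return actions


steps = ["focus", "slc2geotiff", "geocode"]
placement = {"focus": "host-a", "slc2geotiff": "host-a", "geocode": "host-b"}

concrete = make_concrete(steps, placement, input_site="host-c")
for action in concrete:
    print(action)
```

For this input the result contains two transfers (host-c to host-a, then host-a to host-b) interleaved with three jobs.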
The Italian GRID for EARTH Observation project, I-GEO: Video …
Some considerations about the employed technologies:
• As grid middleware we have used the Globus Toolkit and the GRB/GRB-GSIFTP libraries developed at the CACT/ISUFI of the University of Lecce: beyond simplifying access to remote computational resources, they guarantee security, data integrity and confidential communications.
• For the development of the distributed catalogue access mechanism, we have used the Grid Relational Catalogue project (GRelC) libraries developed at the CACT/ISUFI of the University of Lecce: beyond simplifying access to the distributed catalogues and the management of heterogeneous data sources, they guarantee the needed efficiency.
Some considerations about the employed technologies:
The web application has been developed using:
• CGI programs written in C (for the implementation of the distributed catalogue access and the authentication functionalities),
• Java Server Pages (for the implementation of the configuration repository management functionalities and the user interface),
• a Java Applet (for the implementation of the workflow composition interface).
The web application is hosted on a Linux server with Apache 2 as web server and Jakarta-Tomcat as application server.
The Southern Partnership for Advanced Computational Infrastructures: the testbed scenario
A grid infrastructure based on three geographically spread High Performance Computing Centers located in Southern Italy:
• MIUR/HPCC: Center of Excellence for High Performance Computing, University of Calabria. Director: Prof. Lucio Grandinetti
• CPS/CNR: Center for Research on Parallel Computing and Supercomputing (now Section of Naples of ICAR/CNR). Director: Prof. Almerico Murli
• ISUFI/CACT: Center for Advanced Computing Technologies, University of Lecce. Director: Prof. Giovanni Aloisio
Conclusions
• In order to build an efficient environment, several aspects have been considered: the description of the resources, their efficient usage, the efficient composition of the user's processing applications, and so on.
• Each aspect has been investigated, and the architecture of the modules that have been developed has been described.
• The environment is currently being tested on a national grid testbed that uses the geographically spread resources of SPACI (Southern Partnership for Advanced Computational Infrastructures), which altogether provide a computational power of about 1800 Gflops.
• The people involved in the testbed are computer scientists, physicists, experts in the Earth Observation domain and so on. This heterogeneity guarantees that the required feedback is collected and that all of the involved aspects will be properly considered and improved if needed.
• As future work, we plan to migrate towards a Grid Services architecture (OGSA compliant, based on the emerging WSRF standard).
More information can be found at the URL:
http://leonardo.unile.it/igeo
Director: Prof. Giovanni ALOISIO
Project P. I.: Gianvito QUARTA