Registries and metadata-driven searches in a solar-terrestrial grid context
Rob Bentley (UCL/MSSL) and the EGSO Team. 27-29 October 2004, Greenbelt MD, VOs in Space and Solar Physics Workshop.
European Grid of Solar Observations
Registries and metadata-driven searches in a solar-terrestrial grid context
Rob Bentley (UCL/MSSL) and the EGSO Team
27-29 October 2004, Greenbelt MD
VOs in Space and Solar Physics Workshop
Outline
Overview of EGSO
Outline of the query requirements
How metadata is used within the system
The EGSO catalogues and Registries
Example query
EGSO – European Grid of Solar Observations
EGSO is a Grid test-bed related to a particular application
  Designed to improve access to solar data for the solar physics and other communities
  Addresses the generic problem of a distributed heterogeneous data set and a scattered user community
Funded under the Information Society Technologies (IST) thematic priority of the EC’s Fifth Framework Program (FP5)
  Started March 2002; duration of 36 months (or so)
Involves 12+ groups in Europe and the US, led by UCL-MSSL
  4 in UK, 3 in France, 2+ in Italy, 1 in Switzerland, 2 in US
  Several associate partners, mainly in the US
Objectives include:
  Building enhanced search capability for solar data
  Supporting a user community scattered around the world
  Providing access to data centres & observatories around the world
  Where possible, providing the ability to process data at source
The Solar Virtual Observatory family
Partners and collaborators provide expertise in solar physics and IT
UK: UCL-MSSL & UCL-CS, RAL, University of Bradford
France: IAS (Orsay), Obs. de Paris-Meudon, International Space Univ. (Strasbourg)
Italy: Istituto Nazionale di Astrofisica (INAF), Politecnico di Torino
  INAF includes Obs. of Turin, Trieste, Florence and Naples
Switzerland: University of Applied Sciences (Aargau)
US: SDAC at NASA-GSFC, National Solar Observatory (NSO)
  SDAC and NSO are also part of the US VSO
Belgium: Royal Observatory of Belgium
Netherlands: ESA-ESTEC (Solar Group)
US VSO: Stanford University, Montana State University
CoSEC: Lockheed-Martin
VSPO: LEP at NASA-GSFC (Lab. for Extraterrestrial Physics)
Accessible Providers
Observatories                  Archive and Location              Access type
Space-based:
SOHO                           SDAC/GSFC, Greenbelt MD, USA      HTTP
                               MEDOC/IAS, Orsay, France          SQL
Yohkoh                         SDAC/GSFC, Greenbelt MD, USA      FTP
RHESSI                         HEDC/ETHZ, Zurich, Switzerland    FTP
Ground-based:
BBSO, KANZ, OACT, YNAO, HSOS   BBSO, Big Bear CA, USA            FTP
KPNO, KSAC                     NSO, Tucson AZ, USA               WS (VSO)
Extracted TRACE data available to EGSO through CoSEC; need to improve links to VSO “sourced” datasets (e.g. for SHA, HAO/MLSO, etc.)
Want to add access to space plasma data through VHO, VSPO, etc.
Planning addition of BASS2000 (France), SolarNet (Italy), plus other optical & radio ground-based observatories in Europe and US
Use of solar observations
The appearance of the Sun changes dramatically with wavelength
Emissions originate from different layers in the atmosphere and different physical phenomena
For a complete picture we need to use as wide a range of observations as possible
Mixture of multi-wavelength observations from space- and ground-based platforms
Identifying observations that match some User search criteria and then retrieving them are major problems
[Figure: panel labels Heliosphere, Corona, Chromosphere-TR, Surface Magnetic Field, Photosphere; temperatures 2 x 10^6 K, 8 x 10^4 K, 6 x 10^3 K; dates 31 Jan 2003, 16 Jan 2003]
Linking into the wider context
Increasing desire to use solar data in studying problems that span communities
  Space weather
    o heliosphere, magnetosphere, ionosphere…
  Climate physics
  Planetary physics
  Astrophysics
Need to find ways of tying these data together
  A single data model covering the whole solar system is not practical
  Intersecting data models in a general pool should be possible
EGSO trying to achieve interoperability (at some level) with the space plasma community
Main purpose of bringing the VOs together…
Generic Query
Identify suitable observations (many serendipitous)
  Want to access as many different types of data as are available
    o Identification should be possible without accessing the data
    o Existing cataloguing differs in quality, contents, and dependencies
    o Data volumes are increasing rapidly - SDO will produce 2 TB/day
  User only wants to know if data addressing a problem exist
Locate the data
  Data scattered, with differing means of access (some proprietary)
    o Large and small data providers, with varying resources
Process the data
  Involves extraction and calibration of a subset of raw data
    o Often only need a subset of each data set
    o Uses code defined by instrument teams (SolarSoft, C…)
Return results to the User
  Compare results from different instruments
  SolarSoft (IDL) provides a standard platform for analysis
Note: the order of the 3rd and 4th bullets is exchanged in the Grid expression of the problem
The EGSO Search Engine
Enhanced cataloguing describes the data more fully
  Standardized versions of observing catalogues (UOC) tie together the heterogeneous data sets
  Search Registry is an abstraction of entries in the UOC and allows narrowing of the search in initial stages
New types of catalogue allow searches on events, features and phenomena, not just date & time, pointing, etc…
  Solar Event Catalogue (SEC) - derived from published lists
  Solar Feature Catalogue (SFC) - generated by feature recognition
Ancillary data used to provide additional search criteria
  QLK Server provides Phone book access to images, time-series, derived products, etc.; can also do limited processing
  DSO Server gives Yellow Page information on instruments, etc.
Similar hierarchical cataloguing required in other data Grid projects
EGSO is improving the quality and availability of metadata
Catalogue relationships
[Diagram: catalogues and the fields they hold]
DSO: Database of solar observations
  - instrument, observatory, EGSO available?, observing location, observing interval, description, …
SR: Time coverages
  - instrument, observing date start, observing date end, observing parameter name, observing parameter value, data source
UOC: Solar Observations
  - date start, date end, wavelengths, coordinates, … many more relevant characteristics needed for searches
SEC/SFC: Event/Feature Catalogs
  - catalog name, event name, observing date, description, …
QLK
Diagram link labels: "Built from", "Built from Data Archives", "Manually Built"
Objective of the improved metadata, etc. is to be able to pose questions like:
  Identify events when a filament eruption occurred within 30° of the north-west limb and there were good observations in H-alpha, EUV and soft X-rays
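As an illustration only, the example question could be expressed as a filter over event records. This is a minimal sketch, not EGSO's catalogue schema: the `Event` fields, the reading of "within 30° of the north-west limb" as a position angle within 30° of 315° (measured counter-clockwise from solar north), and the sample records are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str               # event type, e.g. "filament eruption"
    position_angle: float   # degrees, counter-clockwise from solar north (assumed convention)
    wavebands: frozenset    # wavebands with good observations (hypothetical field)


def angular_separation(a: float, b: float) -> float:
    """Smallest separation between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


NORTH_WEST_PA = 315.0  # assumption: the north-west limb sits at position angle 315 degrees
REQUIRED = frozenset({"H-alpha", "EUV", "soft X-ray"})


def matches(event: Event) -> bool:
    """True if the event satisfies all three parts of the example question."""
    return (
        event.kind == "filament eruption"
        and angular_separation(event.position_angle, NORTH_WEST_PA) <= 30.0
        and REQUIRED <= event.wavebands  # every required waveband is covered
    )


# Synthetic records, invented purely to exercise the filter.
events = [
    Event("filament eruption", 300.0, REQUIRED),                         # matches
    Event("filament eruption", 350.0, REQUIRED),                         # 35 degrees away
    Event("filament eruption", 320.0, frozenset({"H-alpha", "EUV"})),    # no soft X-rays
    Event("flare", 315.0, REQUIRED),                                     # wrong event type
]
selected = [e for e in events if matches(e)]
```

The point of the sketch is that the question combines three kinds of metadata (event type, position, and waveband coverage), which is why several catalogues have to be consulted together.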
Simplified Architecture
After R. Linsolas, IAS
Architecture defined in terms of three roles:
Consumer, Broker and Provider
[Diagram labels: Consumer (GUI, API), Broker, Provider, Cat., Archives, EGSO Grid, Results; Special Providers: SEC, SFC, UOC, DSO, QLK, CoSEC]
Archive access can be by FTP, HTTP, Web Services, cgi-bin… through adaptor modules
Brokers manage the metadata, and decide on and allocate resources
Consumer supports GUI and API access
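The adaptor-module idea can be sketched as a common interface with one implementation per access protocol. This is a hypothetical illustration, not EGSO code: the class names, the `list_files` method, the archive names and the `example.org` addresses are all invented, and the adaptors are stubs that perform no network access.

```python
from abc import ABC, abstractmethod


class ArchiveAdaptor(ABC):
    """Common interface the Provider sees, whatever protocol the archive uses."""

    @abstractmethod
    def list_files(self, instrument: str, start: str, end: str) -> list[str]:
        """Return locations of files for an instrument between start and end."""


class FtpAdaptor(ArchiveAdaptor):
    def __init__(self, host: str):
        self.host = host

    def list_files(self, instrument, start, end):
        # A real adaptor would open an FTP session; this stub only shows
        # the shape of the result.
        return [f"ftp://{self.host}/{instrument}/{start}_{end}.fits"]


class HttpAdaptor(ArchiveAdaptor):
    def __init__(self, base_url: str):
        self.base_url = base_url

    def list_files(self, instrument, start, end):
        # Stub: builds the request URL instead of issuing it.
        return [f"{self.base_url}/{instrument}?start={start}&end={end}"]


class Provider:
    """Routes each request to the adaptor registered for the archive."""

    def __init__(self):
        self._adaptors = {}

    def register(self, archive_name: str, adaptor: ArchiveAdaptor) -> None:
        self._adaptors[archive_name] = adaptor

    def request(self, archive_name, instrument, start, end):
        return self._adaptors[archive_name].list_files(instrument, start, end)


provider = Provider()
provider.register("archive-a", FtpAdaptor("ftp.example.org"))
provider.register("archive-b", HttpAdaptor("http://example.org/data"))
files = provider.request("archive-a", "instrument-x", "2003-01-16", "2003-01-31")
```

Adding a new archive then means writing one adaptor class and registering it; nothing else in the system needs to know which protocol is in use.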
The UOC and Search Registry
Unified Observing Catalogues (UOC)
  Unified (metadata) form of observing catalogues used to tie together the heterogeneous data, leaving the data unchanged
    o Increase interoperability by expressing coordinates in “standard” formats
  Self describing, quantized by time and instrument, with no dependencies on ancillary data or proprietary software (and with any errors corrected)
  Standards defined for future data sets (e.g. STEREO, ILWS, Solar-B)
Search Registry is an abstraction of entries in the UOC
  Registry allows the Broker to identify instruments that:
    o have data properties matching the search - Static SR
    o probably have observations during search time interval - Dynamic SR
  Reduces need to interact with Data Providers that are unlikely to have data matching the search
  Static Search Registry (sSR) is able to support access to different types of data from solar and heliospheric observations
    o Tries to describe instrument capabilities & observing objectives in a common way
    o Location of observation platform becomes more important as space plasma data are included, and for some future solar missions
    o First step in search - later steps can be dealt with by other VOs
    o EGSO sSR includes instruments on Ulysses, ACE, Cassini, SDO, STEREO…
The reasoning behind the UOC is universal; it allows observations to be described in a more interoperable way
Significant similarities in the way we use data mean that the static Search Registry can be used to tie solar and heliospheric data together
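The two-stage narrowing described above (Static SR first, then Dynamic SR) might look like the following sketch. The record layouts, field names and instrument names are assumptions made for the example and are not taken from the EGSO data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class StaticEntry:
    instrument: str
    domain: str       # what is observed, e.g. "corona", "heliosphere"
    observable: str   # e.g. "photons", "particles", "fields"


@dataclass(frozen=True)
class Coverage:
    instrument: str
    start: date       # first day of an observing interval
    end: date         # last day of an observing interval


def static_match(entries, domain, observable):
    """Static SR step: instruments whose data properties match the search."""
    return {e.instrument for e in entries
            if e.domain == domain and e.observable == observable}


def dynamic_match(coverages, candidates, start, end):
    """Dynamic SR step: candidates that were probably observing in [start, end]."""
    # Two closed intervals overlap when each one starts before the other ends.
    return {c.instrument for c in coverages
            if c.instrument in candidates and c.start <= end and start <= c.end}


# Synthetic registry contents, invented for the example.
static_sr = [
    StaticEntry("imager-a", "corona", "photons"),
    StaticEntry("imager-b", "corona", "photons"),
    StaticEntry("detector-c", "heliosphere", "particles"),
]
dynamic_sr = [
    Coverage("imager-a", date(2003, 1, 10), date(2003, 1, 20)),
    Coverage("imager-b", date(2003, 2, 1), date(2003, 2, 5)),
    Coverage("detector-c", date(2003, 1, 1), date(2003, 1, 31)),
]

candidates = static_match(static_sr, "corona", "photons")
observing = dynamic_match(dynamic_sr, candidates,
                          date(2003, 1, 16), date(2003, 1, 31))
```

Only the instruments left in `observing` would need to be followed up with data providers, which is the reduction in provider traffic that the registry is meant to deliver.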
Part of the EGSO Data Model
Contents of the Static Search Registry
Instrument and Observatory
  Includes space and ground based, solar and heliospheric
Observing Domain
  What the instrument is trying to observe
    o Solar disk, interior, corona, heliosphere, magnetosphere
Observable entity
  Photons, particles, fields with sub-divisions
Common terms
  Imager, spectrometer, polarimeter, coronagraph
  Oscillations, waves, H-alpha, composition, irradiance…
Information related to location of observatories and operating interval of the observatories & instruments is held in a separate table
Finding the data
Data could be located anywhere in the world
  User only needs to know observations exist, not where they are located
  System should isolate the user from the intricacies of access
System should be able to optimize use of sources
  Handling of replicated data and aggregated sources
  Choice of source - most capable, closest, least used, etc.
  Must respect any data use policies (proprietary data) and ensure integrity of data providers
  Burden on data providers minimized to encourage participation
In EGSO, data sources are interfaced by the Provider Role
Information about the data sources is held in the Data Registry - this is managed by the Broker role
  o Which instrument data sets are hosted by each data archive
  o Which data archives are interfaced to each Provider Role
Provider Role uses adaptor modules to handle different access protocols
  o Standardizes the way a data source interface appears to the system and simplifies addition of new data sources
  o Also allows access to data “hosted” by other VOs
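A Data Registry lookup with a simple choice of source could be sketched as below. The two mappings mirror the two kinds of information listed above; the names, the load figures and the "least used" rule as the only selection criterion are assumptions for illustration.

```python
# Which data archives host each instrument's data sets (hypothetical names).
HOSTED_BY = {
    "instrument-x": ["archive-a", "archive-b"],   # replicated data
    "instrument-y": ["archive-b"],
}

# Which Provider interfaces each data archive (hypothetical names).
INTERFACED_BY = {
    "archive-a": "provider-1",
    "archive-b": "provider-2",
}

# Requests currently pending at each archive (invented load figures).
PENDING = {"archive-a": 7, "archive-b": 2}


def choose_source(instrument):
    """Return (archive, provider) for the least-used archive, or None."""
    archives = HOSTED_BY.get(instrument, [])
    if not archives:
        return None
    # "Least used" is one of several possible criteria; a fuller version
    # might also weigh capability, proximity and data use policies.
    archive = min(archives, key=lambda name: PENDING.get(name, 0))
    return archive, INTERFACED_BY[archive]


source = choose_source("instrument-x")
```

Because the user never sees the mappings, archives can be added, replicated or retired by updating the registry alone.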
EGSO GUI
GUI supports event-driven and date-driven queries
Others being added
Series of portlets allow user to tailor their search depending on the thrust of their query
Wavelengths, Instruments…
Advanced static Search Registry currently being deployed
API will access similar capabilities
EGSO GUI
GUI supports event-driven and date-driven queries
Others being added
Event-driven search returns a list of events the user can examine and then select
A list of instruments that made observations is returned
After instruments are selected, a list of files is returned
Query Work Flow
User specifies query through the GUI or API
Static Search Registry narrows the search, based on the criteria specified in the query
  Identifies instruments that make the desired type of observations
  Search can include solar, heliospheric, etc… instruments
Dynamic Search Registry determines (at some granularity) which were actually observing
  Includes pointing, observatory location…
List of identified instruments is returned to the User to refine the selection
Data Registry used to locate the archive holding the instrument data and make the data request
List of files returned that can be retrieved, used to generate data products, etc.
  Convert to processed products if required
Summary
Although this talk mainly presented the EGSO approach, its structure allows cross-coupling with other VOs
  Layered metadata provides several entry points
Much in common with other VOs, which differ in detail and need
Information and resources are already exchanged between the solar VO projects
Approaches of the solar VOs are very complementary
  Need to better link into the space plasma, etc. VOs
Main objective of this meeting is to find common ground in the way we describe and handle data to facilitate interoperability
Useful URLs
General information about EGSO can be found at: http://www.egso.org
The different components of the EGSO system, including the main entry portal, can be accessed from: http://www.egso.org/demo
So, where are we?
Release 4a of EGSO has recently become available
  People welcome to try this…
Flexible GUI using selectable portlets has been deployed
  User able to conduct date-driven and event-driven searches
    o Event-driven search accesses EGSO Solar Event Catalogue
  System searches for datasets that match search criteria
  User able to make selections at each stage of the search with aid of supporting data
Search Registry (SR) is still being developed
  Simplified version currently installed in GUI
    o More complex version in preparation (Release 4b)
  Fully functional Search Registry will allow comprehensive selection of the types of instrument, data, region observed, etc.
    o Interoperable with the STP & heliospheric observations
Support for refining selection of data for time and spatial coordinates using cursors will be added shortly
So, where are we? (cont…)
Special Servers are becoming operational
  SEC Server is fully integrated into EGSO
    o Web Service accessible relational database - SQL in, VOTable out
    o Several lists are already included: Flares, particle events, CMEs…
    o Hope to add many more in the future
  DSO and SFC Servers functional, but integration not yet complete
    o Both available as Web interfaces; Web Service interfaces planned
    o Over 3 years of data processed for Solar Feature Catalogue
  QLK Server is still under development
Processing using CoSEC partially integrated
  GOES X-ray & energetic particle lightcurves generated for GUI
  In process of adding ability to:
    o generate quicklook images & movies
    o extract and process certain datasets
    o create composite plots for use in searches and publications
Several popular data sources have already been integrated
  Space-based: Yohkoh, SOHO & RHESSI
  Ground-based: NSO & Global H-alpha Network (GHAN)
  Planning to add TRACE & numerous GBO sources in near future
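The "SQL in, VOTable out" pattern mentioned for the SEC Server can be illustrated with a small sketch. This is not the SEC Server's implementation or schema: the `events` table, its columns and the two rows are invented, every column is declared as a character field for simplicity, and the output is only a bare VOTable-shaped document (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) with no namespace or metadata. The version attribute "1.1" is likewise an assumption.

```python
import sqlite3
import xml.etree.ElementTree as ET


def query_to_votable(conn, sql):
    """Run an SQL query and serialize the rows as a minimal VOTable-style XML string."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]

    votable = ET.Element("VOTABLE", version="1.1")  # version is an assumption
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name in columns:
        # Simplification: every column is declared as a variable-length string.
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")

    data = ET.SubElement(table, "DATA")
    tabledata = ET.SubElement(data, "TABLEDATA")
    for row in cursor:
        tr = ET.SubElement(tabledata, "TR")
        for value in row:
            ET.SubElement(tr, "TD").text = str(value)
    return ET.tostring(votable, encoding="unicode")


# In-memory database with a hypothetical schema and synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("flare", "2003-01-16T10:00:00"), ("cme", "2003-01-31T04:30:00")],
)

xml_text = query_to_votable(
    conn, "SELECT kind, start_time FROM events WHERE kind = 'flare'"
)
```

Returning results in a shared tabular format is what lets other tools consume the catalogue output without knowing anything about the database behind it.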