Life On The Edge - Global Geoscience Data Delivery - DGAL 24 Oct 2007
Life On The Edge:
Global Geoscience Data Delivery
Ollie Raymond
with Nick Ardlie, Dale Percival, Lesley Wyborn, and Aaron Sedgmen
Outline
• Living on the scientific fringe
• A short history of digital geological map data standards at BMR-AGSO-GA
• Customers, the web, and why we need digital data standards
• GeoSciML and O&M - what are they?
• Developing web services using data standards
– Testbeds and living on the technological edge
Data modellers and standards developers have always been regarded by geologists as a quaint lunatic fringe who are a bit of an annoyance to their important scientific research…
…until those same geologists want to exchange their data with other geologists…
…and they spend the next week reformatting data from different sources.
Life on the fringe is exacerbated by data modelling technobabble…
“Ontology”
“The specification of one's conceptualisation of a knowledge domain”
Que?
• a set of controlled vocabularies (ie, lists of agreed terms) which describe concepts in a field of interest
eg, mineral names and lithology names describing rocks in Geology
• the relationships between concepts and between the agreed terms used to describe those concepts
eg, “geological units” are composed of “rocks”
“granite” is a type of “felsic intrusive rock”
• a set of rules about how to specify the terms and relationships
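As a minimal sketch of the three ingredients above, the toy example below stores a controlled vocabulary, two kinds of relationship (“is a type of” and “is composed of”) and one rule that relationships may only use agreed terms. The terms, relations and function names are hypothetical illustrations, not an agreed GA or CGI vocabulary.

```python
# Toy ontology for illustration only: terms and relations are hypothetical,
# not an agreed GA or CGI vocabulary.

# 1. Controlled vocabularies: the only terms allowed in each list
LITHOLOGY_TERMS = {"rock", "igneous rock", "felsic intrusive rock", "granite", "sandstone"}
FEATURE_TERMS = {"geological unit"}

# 2. Relationships between concepts
IS_A = {  # narrower term -> broader term
    "granite": "felsic intrusive rock",
    "felsic intrusive rock": "igneous rock",
    "igneous rock": "rock",
    "sandstone": "rock",
}
COMPOSED_OF = {"geological unit": "rock"}  # geological units are composed of rocks


# 3. A rule: relationships may only use terms from the vocabularies
def check_rules() -> None:
    for narrower, broader in IS_A.items():
        if narrower not in LITHOLOGY_TERMS or broader not in LITHOLOGY_TERMS:
            raise ValueError(f"unknown term in is-a pair: {narrower} -> {broader}")
    for whole, part in COMPOSED_OF.items():
        if whole not in FEATURE_TERMS or part not in LITHOLOGY_TERMS:
            raise ValueError(f"unknown term in composed-of pair: {whole} -> {part}")


def broader_terms(term: str) -> list[str]:
    """Walk up the is-a chain from a term to the most general concept."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(term: str, concept: str) -> bool:
    """True if term is the concept itself or a narrower term of it."""
    return term == concept or concept in broader_terms(term)


check_rules()
print(is_a("granite", "felsic intrusive rock"))  # True
print(is_a("sandstone", "igneous rock"))         # False
```

The point of the `is_a` walk is that a search for “felsic intrusive rock” can also find data recorded as “granite”, which only works if everyone uses the same terms and relationships.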
A short history of GA’s digital geoscience map data standards…
Pre-1990s
• Geological data and the standards that govern it have come a long way since the days of the old BMR cartographic symbols book
• Last printed in 1989, the symbols book described the appearance, but rarely the meaning, of every line and symbol on a printed geological map
The 1990s: the decade of GIS
• BMR’s first proposed GIS data dictionary for geological map data was written in 1992 by a very young Robyn Gallagher, when BMR realised that the new digital GIS products had no quality control
• It covered some basic map data themes
– geological unit polygons and boundaries
– structures including faults, veins and dykes, and folds
• It also described some point-located datasets, including
– outcrop locations, structural measurements, geochemical analyses and mineral occurrences
• It included some cartographic frames and graticules
• It was less than 8 pages long
• The data dictionary was extended over the next 5 years until AGSO merged with AUSLIG and we geologists were exposed to the much more rigorous standards definitions used by AUSLIG
• The result was the GA Geoscience Data Dictionary for Spatial Data
– 86 different spatial data themes
– minerals, petroleum, regolith and marine geology and geophysics
– mines, wells/drillholes, topography, urban, cultural and infrastructure themes
– cartographic layers
The customers…
• They want the best and most current geoscience data
• They want it free
• And they want it NOW, 24-7
• And they want to take data from GA and other federal government agencies, the States’ data, CSIRO’s data and international data, and combine it with their own data
• And they want to use all of this data in any number of 2D and 3D modelling and display software applications
• Our software-specific, agency-specific data standards don’t cut the mustard when customers are trying to integrate data across jurisdictions
Delivering Government Digital Geoscience Data
The problem
• Access to government geoscience information is fragmented and inefficient
Minerals Exploration Action Agenda …
• existing information is distributed across eight state and federal agencies
• each with its own information management systems and data formats
• up to 80% of the time spent acquiring pre-competitive data is taken up by reformatting disparate data from government sources
• this is a disincentive to exploration
Government Geoscience Online
• 8 online geoscience delivery systems
• 8 data structures
• 2 proprietary (software-specific) data formats
• cannot access more than one agency’s data at a time
Rationalising data sources
[Figure: WA and NT geological map data compared for the attributes Description, Label and Age]
[Figure: WA and NT map data held in different software formats (ESRI and MapInfo)]
The problem
• How do you get 8 Australian jurisdictions to provide digital geoscience map data in the same format?
• You will never get them to agree to change their agency database structures to a single structure
• You will never get them to agree to use the same software for data maintenance and delivery
The solution
• You CAN get them to agree on a software-independent DATA TRANSFER STANDARD
What is a Digital Data Standard good for?
• A common data structure in which to deliver your data,
• which is software-independent,
• but most of all, a digital data standard enables…
Interoperability
“My stuff works with your stuff” (Lesley Wyborn, 2005)
GeoSciML
GeoScience Markup Language
O & M
Observations and Measurements
Commission for the Management and Application of Geoscience Information (CGI)
Interoperability Working Group
Australia, USA, Canada, France, UK, Sweden, Italy, Japan
CGI Interoperability Working Group
geologists, geophysicists, information modellers, web programmers
What is GeoSciML? (Part 1)
Geological Data Model
• A logical data structure
• a complex model (hierarchical, relational)
• tells users what geological information goes where
• and what terminology is to be used (vocabularies)
• scientifically robust, developed by the scientific community
• internationally agreed
• data providers need only “map” their own local data structures to the data transfer structure
• data providers don’t need to change their local database structures to use the transfer standard
The GeoSciML Data Model
• Constructed using UML (Unified Modelling Language) tools
• Presented as a series of class diagrams which show the attributes of and relationships between geological features and other data types
e.g. Geologic(al) units
• composition (earth materials)
• metamorphism
• weathering character
• physical properties
• related structures
• unit types (eg, lithostratigraphic, chronostratigraphic)
• age and geological history (events)
• unit parts (child/parent relations)
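To make the class-diagram idea concrete, the sketch below uses Python dataclasses in place of UML classes for a much-simplified geologic unit carrying a few of the properties listed above, including the child/parent relation between unit parts. The class names, attributes and unit names are hypothetical; the real GeoSciML model is far richer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeologicEvent:
    """Hypothetical stand-in for one event in a unit's geological history."""
    process: str  # e.g. "deposition", "intrusion"
    age: str      # e.g. "Ordovician"


@dataclass
class GeologicUnit:
    """Hypothetical, simplified geologic unit; not the GeoSciML class definition."""
    name: str
    unit_type: str  # e.g. "lithostratigraphic"
    composition: list[str] = field(default_factory=list)  # earth materials
    history: list[GeologicEvent] = field(default_factory=list)
    parts: list[GeologicUnit] = field(default_factory=list)  # child units

    def all_units(self) -> list[GeologicUnit]:
        """Return this unit and every descendant part, depth first."""
        found = [self]
        for part in self.parts:
            found.extend(part.all_units())
        return found


group = GeologicUnit(
    name="Example Group",  # hypothetical unit names throughout
    unit_type="lithostratigraphic",
    parts=[
        GeologicUnit("Example Formation A", "lithostratigraphic",
                     ["sandstone", "siltstone"],
                     [GeologicEvent("deposition", "Ordovician")]),
        GeologicUnit("Example Formation B", "lithostratigraphic", ["shale"]),
    ],
)
print([u.name for u in group.all_units()])
```

Because parts are themselves units, the same structure holds a group, its formations and their members without any change to the model, which is the hierarchical property the slide refers to.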
“Mapping” your database to GeoSciML
[Figure: local database fields mapped to GeoSciML, including GEODX.STRATNAMES.TOPMINAGENAME, GEODX.STRATNAMES.BASEMAXAGENAME, GEODX.STRATLITHS.LITHOLOGY, GEODX.RANKSYNONYMS.RANKNAME and SDE.CDI_VICSTRATS.FORMTYPE]
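The mapping step can be sketched as a per-provider lookup table from local column names to properties in the transfer schema, applied to each record as it is delivered. The local column names are those on the slide; the target property names, the provider keys "GEODX" and "VIC", and the assumption that FORMTYPE carries the same kind of information as RANKNAME are all hypothetical, not actual GeoSciML element paths.

```python
# Per-provider mapping from local column names to transfer-schema properties.
# Column names are from the slide; target names and provider keys are
# hypothetical labels, not real GeoSciML element paths.
PROVIDER_MAPS = {
    "GEODX": {
        "GEODX.STRATNAMES.TOPMINAGENAME": "age.younger",
        "GEODX.STRATNAMES.BASEMAXAGENAME": "age.older",
        "GEODX.STRATLITHS.LITHOLOGY": "composition.lithology",
        "GEODX.RANKSYNONYMS.RANKNAME": "rank",
    },
    "VIC": {
        # Assumed to hold the same kind of information as RANKNAME above.
        "SDE.CDI_VICSTRATS.FORMTYPE": "rank",
    },
}


def to_transfer_record(provider: str, row: dict[str, str]) -> dict[str, str]:
    """Rename one local record's mapped columns; unmapped columns are dropped.

    The local database itself is untouched: only the delivered copy changes.
    """
    mapping = PROVIDER_MAPS[provider]
    return {mapping[col]: value for col, value in row.items() if col in mapping}


local_row = {  # hypothetical record
    "GEODX.STRATNAMES.TOPMINAGENAME": "Silurian",
    "GEODX.STRATNAMES.BASEMAXAGENAME": "Ordovician",
    "GEODX.RANKSYNONYMS.RANKNAME": "Formation",
    "EDIT_DATE": "2007-10-01",  # internal column, not mapped
}
print(to_transfer_record("GEODX", local_row))
# {'age.younger': 'Silurian', 'age.older': 'Ordovician', 'rank': 'Formation'}
```

Two providers with different column names end up delivering the same `rank` property, which is what lets a client treat their data alike.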
What is GeoSciML? (Part 2)
XML encoding
• the markup language used to deliver the model to the internet
• builds on established internet standards such as GML (Geography Markup Language)
• open source
• software independent
• machine readable
<Rock>
  <gml:description>Medium to fine-grained lithic sandstone to siltstone</gml:description>
  <gml:name codeSpace="http://www.ga.gov.au">Site 95846001 Rock #1</gml:name>
  <color>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">grey-green</value>
    </CGI_TermValue>
  </color>
  <compositionCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">siliciclastic</value>
    </CGI_TermValue>
  </compositionCategory>
  <geneticCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">clastic sedimentary</value>
    </CGI_TermValue>
  </geneticCategory>
  <particleGeometry>
    <ParticleGeometryDescription>
      <size>
        <CGI_Value xsi:type="CGI_TermValueType">
          <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">medium (1-5mm)</value>
        </CGI_Value>
      </size>
    </ParticleGeometryDescription>
  </particleGeometry>
  <fabric>
    <FabricDescription>
      <fabricType xlink:type="simple">
        <ControlledConcept gml:id="unique_ID_for_NormalGrading">
          <preferredName>normal grading</preferredName>
          <vocabulary xlink:type="simple" xlink:href="www.ga.gov.au/GA_fabric_vocabulary"/>
        </ControlledConcept>
      </fabricType>
    </FabricDescription>
  ...
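Because the encoding is machine readable, a few lines of standard-library Python can pull values out of it. This sketch uses a shortened, well-formed copy of the fragment above. The `xmlns:gml` declaration (the GML 3.1 namespace URI) is an assumption added so the fragment parses on its own; the slide omits namespace declarations, and a real GeoSciML document would also place `Rock` and its children in the GeoSciML namespace.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the slide's fragment. The gml namespace declaration is
# added (assumed GML 3.1 URI); GeoSciML elements are left un-namespaced here
# for brevity, unlike a real GeoSciML document.
FRAGMENT = """
<Rock xmlns:gml="http://www.opengis.net/gml">
  <gml:description>Medium to fine-grained lithic sandstone to siltstone</gml:description>
  <gml:name codeSpace="http://www.ga.gov.au">Site 95846001 Rock #1</gml:name>
  <color>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">grey-green</value>
    </CGI_TermValue>
  </color>
  <compositionCategory>
    <CGI_TermValue>
      <value codeSpace="http://www.ga.gov.au/GeologicalVocabs">siliciclastic</value>
    </CGI_TermValue>
  </compositionCategory>
</Rock>
"""

NS = {"gml": "http://www.opengis.net/gml"}

rock = ET.fromstring(FRAGMENT.strip())
description = rock.findtext("gml:description", namespaces=NS)
colour = rock.find("color/CGI_TermValue/value")
composition = rock.findtext("compositionCategory/CGI_TermValue/value")

print(description)
# The codeSpace attribute says which vocabulary the term comes from.
print(colour.text, "from vocabulary", colour.get("codeSpace"))
print(composition)
```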
What is O&M?
Like GeoSciML, it is a data model and GML schema
• A data model for any type of scientific observation, measurement and sampling frame
• It is a more generic model, not just for geoscience; it is less prescriptive than GeoSciML
• It provides a platform on which individual science communities can build more domain-specific types of observation, measurement and sampling
• For example, the GeoSciML working group have adopted the “sampling point” and “sampling curve” models of the O&M standard for geological use in delivering outcrop sample locations and boreholes
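To illustrate why a borehole fits the “sampling curve” pattern, here is a simplified sketch: an outcrop sample is a single located point, while a borehole is a line along which samples are located by their distance down the hole. The class names echo the O&M concepts, but the fields, the `position_at` helper and the example hole are hypothetical simplifications, not the O&M schema; positions are linearly interpolated between surveyed stations.

```python
from bisect import bisect_right
from dataclasses import dataclass

Point = tuple[float, float, float]  # easting, northing, elevation (m)


@dataclass
class SamplingPoint:
    """Simplified sampling point, e.g. an outcrop sample location."""
    name: str
    position: Point


@dataclass
class SamplingCurve:
    """Simplified sampling curve, e.g. a borehole trace.

    `stations` holds (distance along hole in m, position) pairs sorted by distance.
    """
    name: str
    stations: list[tuple[float, Point]]

    def position_at(self, distance: float) -> Point:
        """Linearly interpolate the position at a distance along the curve."""
        distances = [d for d, _ in self.stations]
        if not distances[0] <= distance <= distances[-1]:
            raise ValueError("distance lies outside the surveyed trace")
        i = bisect_right(distances, distance) - 1
        if i == len(distances) - 1:  # exactly at the last station
            return self.stations[-1][1]
        (d0, p0), (d1, p1) = self.stations[i], self.stations[i + 1]
        t = (distance - d0) / (d1 - d0)
        return tuple(a + t * (b - a) for a, b in zip(p0, p1))


hole = SamplingCurve("Hypothetical hole DH-1", [
    (0.0, (500000.0, 6000000.0, 300.0)),
    (100.0, (500000.0, 6000000.0, 200.0)),
    (200.0, (500050.0, 6000000.0, 113.4)),
])
print(hole.position_at(50.0))  # (500000.0, 6000000.0, 250.0)
```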
What is O&M?
• The O&M standard is more mature than GeoSciML
• It is nearing full ratification by the Open Geospatial Consortium
• Two Australian members are on the review panel: Simon Cox (CSIRO), senior editor, and Nick Ardlie (GA)
• The aim is to submit it to ISO this year
• It is already used within GA in the development of the Located Sample Data SPOT
International Testbeds
Testbed 1. 2005 - A borehole demonstrator between UK and France
Testbed 2. 2006 – A six-nation demonstrator delivering geological map data from globally distributed sources using GeoSciML v1.1
• successfully demonstrated WMS/WFS (Web Map Service / Web Feature Service) delivery, display and download of distributed data sources, and simple query functions
• but lacked true interoperability between data sources
• it was leading-edge technology
– it suffered a little from immature and shifting standards in WMS/WFS, GML and GeoSciML
– web services software was still being developed
– and no-one had done this before with such a complex data model
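For readers unfamiliar with WFS, a client request is an ordinary HTTP call. This sketch builds a WFS 1.1.0 key-value-pair GetFeature URL with the standard library. The endpoint is a placeholder, and the feature type name `gsml:MappedFeature` is an assumption about what a GeoSciML service would advertise; a real client should read the server's GetCapabilities response first.

```python
from urllib.parse import urlencode


def get_feature_url(endpoint: str, type_name: str, max_features: int = 10) -> str:
    """Build a WFS 1.1.0 GetFeature request using key-value-pair encoding."""
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
        "maxFeatures": max_features,  # cap the response size
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and assumed feature type; not a real service.
url = get_feature_url("https://example.org/geoserver/wfs", "gsml:MappedFeature")
print(url)
```

Building the URL needs no network access; sending it (for example with `urllib.request.urlopen`) would return the features as GML. A WMS GetMap request is built the same way with its own parameters.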
GeoSciML Testbed2
Accessing GeoSciML data using a web client in Canada
[Figure: map of GeoSciML data services at Vancouver, CA; Uppsala, SV; Canberra, AU; Ottawa, CA; Reston, VA; Keyworth, UK; Portland, OR; Orleans, FR]
GeoSciML Testbed2 (Canadian client)
• Display data from distributed sources in a map and query a feature
International Testbeds
Testbed 3. 2007/8 (in progress)
• An eight-nation demonstrator using GeoSciML v2.0
Aims:
• to test true interoperability of WMS and WFS services using both the GeoSciML and O&M data standards
• to test WFS query functionality within a complex data model
• to test the ability of various software applications to consume data in GeoSciML format
• to test registry services to discover and deliver geoscience information from distributed sources
GeoSciML Testbed3 Registry services
[Figure: GeoSciML data services at Vancouver, CA; Uppsala, SV; Canberra, AU; Ottawa, CA; Reston, VA; Keyworth, UK; Portland, OR; Orleans, FR; Japan; and Italy, connected to a central REGISTRY]
The registry provides:
• Multilingual vocabularies
• Map legends (StyledLayerDescriptors)
• Lists of available WMS and WFS services from distributed sources
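One job of a multilingual vocabulary registry is to let services that label the same concept in different languages resolve to a shared concept identifier. A minimal sketch, assuming a hypothetical in-memory registry keyed by placeholder concept URIs; the French, Swedish and Italian labels are ordinary translations chosen for illustration, not entries from an agreed CGI vocabulary.

```python
from __future__ import annotations

# Hypothetical registry: concept URI -> labels by language code.
# URIs are placeholders, not real vocabulary identifiers.
VOCABULARY = {
    "http://example.org/vocab/lithology/granite": {
        "en": "granite", "fr": "granite", "sv": "granit", "it": "granito",
    },
    "http://example.org/vocab/lithology/sandstone": {
        "en": "sandstone", "fr": "grès", "sv": "sandsten", "it": "arenaria",
    },
}

# Reverse index (language, case-folded label) -> concept URI, built once.
LABEL_INDEX = {
    (lang, label.casefold()): uri
    for uri, labels in VOCABULARY.items()
    for lang, label in labels.items()
}


def resolve(label: str, lang: str) -> str | None:
    """Find the shared concept behind a label in a given language."""
    return LABEL_INDEX.get((lang, label.casefold()))


def translate(label: str, from_lang: str, to_lang: str) -> str | None:
    """Translate a label by going through its shared concept, if known."""
    uri = resolve(label, from_lang)
    return VOCABULARY[uri].get(to_lang) if uri else None


print(translate("Grès", "fr", "sv"))  # sandsten
```

Querying on the concept URI rather than on a label is what lets a French-labelled and a Swedish-labelled service answer the same request.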
Achieving Interoperability with Map Data
Levels of Map Data Interoperability (after Brodaric, 2007)
[Figure: Map Data Service 1 and Map Data Service 2 combined into an Integrated Map]
• systems: Data Services (WMS, WFS)
• semantic: Data Content (Vocabularies)
• schematic: Data Structure (GeoSciML, O&M)
• syntax: Data Language (GML)
• pragmatic: Data Context (Geologist)
Lessons Learnt from Testbed2
The 3 most important things to consider in constructing an interoperable web service testbed:
1. compliance
2. compliance
3. compliance
- to OGC web standards (WMS, WFS)
- to the data model schema (GeoSciML)
- to agreed vocabularies
Semantic Interoperability
• The GeoSciML data model contains much interpretive and text-based data, and relatively little simple numerical data
• This means that semantic compliance (ie, compliance with many controlled vocabularies) is not a trivial exercise
• But compliance with vocabularies (eg, for Age) is crucial to be able to construct standardised WFS / WMS requests on distributed data
• This became evident very quickly in Testbed2 when trying to execute the agreed use cases
- eg, select geologic features where Age = “xxx”
Semantic Interoperability
Cainozoic? Palaeozoic? Archaean?
Bolindian? Eastonian? Gisbornian?
Late? Early?
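The question marks above show the semantic problem in miniature: the same concept under different spellings (Cainozoic/Cenozoic), and local stage names that a query on an international term will not match. A minimal sketch, assuming a hypothetical agreed vocabulary: local terms are first mapped to preferred terms, then a query matches any feature whose age is the requested term or a narrower one. The synonym and broader-term tables are illustrative fragments (placing the Victorian stages Gisbornian, Eastonian and Bolindian in the Late Ordovician) and in practice would come from the agreed vocabulary.

```python
# Local spelling -> preferred term in a hypothetical agreed vocabulary.
SYNONYMS = {
    "Cainozoic": "Cenozoic",
    "Palaeozoic": "Paleozoic",
    "Archaean": "Archean",
}

# Preferred term -> broader term (illustrative fragment of a time scale).
BROADER = {
    "Gisbornian": "Late Ordovician",
    "Eastonian": "Late Ordovician",
    "Bolindian": "Late Ordovician",
    "Late Ordovician": "Ordovician",
    "Ordovician": "Paleozoic",
}


def preferred(term: str) -> str:
    """Replace a local spelling with the preferred term, if one is known."""
    return SYNONYMS.get(term, term)


def within(term: str, query: str) -> bool:
    """True if term (after normalisation) equals query or is narrower than it."""
    term, query = preferred(term), preferred(query)
    while term != query and term in BROADER:
        term = BROADER[term]
    return term == query


features = [  # hypothetical features from two providers
    {"id": "A", "age": "Eastonian"},
    {"id": "B", "age": "Ordovician"},
    {"id": "C", "age": "Cainozoic"},
]

naive = [f["id"] for f in features if f["age"] == "Ordovician"]
vocab = [f["id"] for f in features if within(f["age"], "Ordovician")]
print(naive, vocab)  # ['B'] ['A', 'B']
```

The plain string comparison silently misses feature A; the vocabulary-aware query finds it, which is why agreed vocabularies matter for distributed queries.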
Schematic Interoperability
Flexibility in data representation
• A feature of the GeoSciML model allows some data to be represented in different ways according to a user’s need
eg, geologic age
- single numeric value (eg: 455 Ma)
- single defined text value (eg: Ordovician)
- lower and upper value range (eg: 420 to 460 Ma; Silurian to Ordovician)
Schematic Interoperability
This pattern is flexible and entirely representative of how geologists use Age information,
BUT…
• It is an issue for interoperability
- how do you process a query on Age if the data in different datasets is in different, but still schema-compliant, formats?
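One way a client could cope is to normalise every schema-compliant form of age into a numeric interval before querying. A sketch, assuming a tiny hypothetical lookup of period boundaries in Ma; the boundary values are rounded approximations for illustration only, and a real client would take them from an agreed time-scale vocabulary.

```python
# Rounded period boundaries in Ma as (younger, older); illustrative only.
PERIODS_MA = {
    "Silurian": (419.0, 444.0),
    "Ordovician": (444.0, 485.0),
}


def to_interval(age) -> tuple[float, float]:
    """Normalise any of the three age forms to a (younger, older) interval in Ma.

    Accepts a number (Ma), a period name, or a (lower, upper) pair whose
    members may themselves be numbers or period names.
    """
    if isinstance(age, (int, float)):
        return (float(age), float(age))
    if isinstance(age, str):
        return PERIODS_MA[age]
    first, second = (to_interval(part) for part in age)
    return (min(first[0], second[0]), max(first[1], second[1]))


def overlaps(age, period: str) -> bool:
    """Closed-interval overlap test; sharing a boundary counts as overlap."""
    young, old = to_interval(age)
    p_young, p_old = PERIODS_MA[period]
    return young <= p_old and old >= p_young


features = {  # hypothetical features, one per schema-compliant age form
    "A": 455,                         # single numeric value
    "B": "Ordovician",                # single defined text value
    "C": (420, 460),                  # numeric range
    "D": ("Silurian", "Ordovician"),  # named range
    "E": 430,                         # Silurian, by the boundaries above
}
matches = [fid for fid, age in features.items() if overlaps(age, "Ordovician")]
print(matches)  # ['A', 'B', 'C', 'D']
```

Every client would have to agree on this normalisation for distributed queries to give consistent answers, which is the interoperability cost the slide describes.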
Schematic Interoperability
• Testbed2 example of a WFS query on “age”
• The client chose to query on “upper age” only
Schematic Interoperability + pragmatism
• GeoSciML v2.0 now contains a preferredAge attribute
- a single-value attribute designed purely to allow simpler and more straightforward queries on Age
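With a single-valued attribute, a WFS query can be expressed as one simple comparison in OGC Filter Encoding. This sketch builds a Filter Encoding 1.1 `PropertyIsEqualTo` filter with the standard library. The property path `AGE_PATH` is a hypothetical illustration, not a verified GeoSciML v2.0 XPath, and the `gsml` namespace URI is likewise an assumption; check both against the published schema.

```python
import xml.etree.ElementTree as ET

OGC = "http://www.opengis.net/ogc"       # Filter Encoding 1.1 namespace
GSML = "urn:cgi:xmlns:CGI:GeoSciML:2.0"  # assumed GeoSciML 2.0 namespace
ET.register_namespace("ogc", OGC)

# Hypothetical property path to the preferred age term; check the schema.
AGE_PATH = "gsml:preferredAge/gsml:GeologicEvent/gsml:eventAge/gsml:CGI_TermValue/gsml:value"


def age_equals_filter(age_term: str) -> str:
    """Build an ogc:Filter that tests the preferred age against one term."""
    flt = ET.Element(f"{{{OGC}}}Filter")
    # The gsml prefix appears only inside PropertyName text, so ElementTree
    # cannot infer it; declare it explicitly as an attribute.
    flt.set("xmlns:gsml", GSML)
    comparison = ET.SubElement(flt, f"{{{OGC}}}PropertyIsEqualTo")
    ET.SubElement(comparison, f"{{{OGC}}}PropertyName").text = AGE_PATH
    ET.SubElement(comparison, f"{{{OGC}}}Literal").text = age_term
    return ET.tostring(flt, encoding="unicode")


xml_filter = age_equals_filter("Ordovician")
print(xml_filter)
```

Against the multi-form age pattern, the same query would need an `ogc:Or` over several property paths and representations; the single preferred value keeps the filter to one comparison.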
Software capabilities
• Existing proprietary vendor software and open source software aim to support the detail of OGC web service specifications (e.g. GML and complex features)
…but they are still being developed
• Much collaborative work was done with software developers during Testbed2 to be able to serve the complex feature model needed for geological information
Lessons learnt from “Life on the Edge” (a.k.a. GeoSciML Testbed2)
• Highlighted both the capabilities and the limitations of Web Feature Service and OGC standards in a real-world, complex feature environment
• Highlighted technical challenges for software developers and vendors in delivering and consuming OGC-compliant, complex feature WFS services
• Highlighted the need to establish well-defined limits on use cases for any web data services. Unlimited interoperability of complex geoscience data is not realistic
• Highlighted the importance of rigorous documentation of the data model to guide participants in a distributed network
• Risk analysis at the pointy end of R&D testbed projects is crucial
• Success in Testbed3 is vital to achieve wide up-take of web services in the production environment
Where to from here?
OneGeology
~1:1 million scale digital geology of the world from over 50 nations on all continents
Other Geoscience “MLs” under development involving GA
• Landslides
• Mineral Occurrences
• Geochronology
• Geochemistry
• many more that GA could be involved in
Within Australia…
An Australian Geoscience Portal?
• All government geoscience map data
• Data served from distributed state and federal sites to a single portal using the GeoSciML and O&M data transfer standards
Benefits of Web Services for Government Geoscience
• Open data standards
– not dependent on proprietary software; internationally agreed
• Efficiencies for industry
– data from government providers is up-to-date, easily discoverable, and standard; no reformatting required
• Efficiencies for government
– no need to change local data structures; just map each database to GeoSciML
– new data is immediately available to the internet as a web service
– no need to maintain data in several different software formats
– standard format for industry mandatory reporting
• Benefits for the wider geoscience community
– the same methodologies used to develop the GeoSciML standard can be used by other scientific communities
Questions