geoscience knowledge representation using the sweet ontologies
DESCRIPTION
Geoscience Knowledge Representation Using the SWEET Ontologies. Rob Raskin Jet Propulsion Laboratory. Transforming Data into Knowledge. Data InformationKnowledge. Basic Elements Bytes NumbersModelsFacts - PowerPoint PPT PresentationTRANSCRIPT
Geoscience Knowledge Representation Using the SWEET Ontologies
Rob RaskinJet Propulsion Laboratory
Transforming Data into Knowledge
Data Information Knowledge
Basic Elements Bytes Numbers Models FactsServices Ingest Archive Visualize Infer Understand PredictStorage File Database HDF-EOS GIS MIS Ontology MindInteroperability Syntactic OPeNDAP WMS/WCS SemanticVolume/Density High/Low Low/HighStatistics Checksum Moments Descriptive InferentialAnalysis Fourier Wavelet EOF SSAMethodology Exploratory-analysis Model-based-mining
Syntax Semantics
What is Knowledge? Facts, relations, meanings, contexts
Organized information Core ingredient in “common sense” Common understanding In a form to apply reasoning/inference Dynamic
Expandable
Let’s eat, Grandma.Let’s eat Grandma.
Time flies like an arrow.Fruit flies like a pie.
Semantic Understanding is Semantic Understanding is Difficult!Difficult!
Variable t: temperatureVariable t: time
Sea surface temperature: measured 3 m above surface Sea surface temperature: measured at surface
LA Times headline
Data quality= 5
“Mission accomplished. Major combat operations in Iraq have ended”
Database vs Knowledge Base Database
Entities and Relations Closed world
All facts included
Knowledge base Classes and Properties Collection of facts Captures corporate memory Open world
Facts not stated may be either true or untrue
PO.DAAC Knowledge Bases
PeopleDocuments
Data Products
MetadataTools/
Services
Roles/Tasks
ScienceConcepts
Missions
Applications
Instruments
Web Pages
DataProcessing
Organiza-tions
Announce-ments
Inquiries Computers
(Docushare)
Public access
Relations People have roles Instruments measure science
parameters Inquiries relate to data products etc.
Example of Knowledge-Assisted Service Yellow Page Lookup:
cars vs automobiles Hotels vs motels vs resorts
Semantic-based Service Example: Google Type into Google: “gymnasiums in Seattle”
Generates map of Seattle with dots locating gyms Google understands that
Seattle is a place Gymnasiums is a place-based service
Google understands semantics so that the search results also could include
locations near Seattle Similar services (e.g., health club)
Assertion of Facts as TriplesSubject-Verb-Object representation
Flood subClassOf WeatherPhenomena HDF subClassOf FileFormat Pressure subClassOf PhysicalProperty
Ocean hasSubstance Water AIRS measures Temperature
Applications Software tools can find “meaning” in resources for
Discovery Fusion Lineage …
Requirements Data products associated with objects in “science concept space”
Richer descriptions than DIFs Data services associated with objects in “service concept space”
Richer descriptions than SERFs Search/fusion tools that exploit ontologies
Semantic Web Vision Web page creators place XML tags around
technical terms on web pages XML tags point to knowledge base where
term is “defined” Search tools use this information to provide
value-added services Common search engines (Google) use these
capabilities only minimally, at present
Ontologies
Current preferred method to store “facts” General definition: “all that is known” Computer science definition: Machine-readable definition
of terms and how they relate to one another As with a dictionary, terms are defined in terms of other terms
Provide shared understanding of concepts Support knowledge reuse Support machine-to-machine communications with
deeper semantics than controlled vocabulary
XML-based Ontology Languages
XML satisfies desired properties for language syntax Readable by both humans and machines
However, there are too many possible ways that XML tags can be named and used
No standardization of XML tag meanings as in HTML (<b> </b> pair => renders in bold)
Additional standardized semantics needed to exploit shared understanding of concepts
RDF and OWL
W3C has adopted languages that specialize XML Resource Description Formulation (RDF)
Ontology Web Language (OWL) Languages predefine specific tags
RDF: Class, subclass, property, subproperty, … RDF and OWL form a nested collection of languages, each roughly
a specialization of the preceding language with further shared understanding
XML RDF RDFS OWL Lite OWL DL OWL Full
Semantic Web for Earth and Environmental Terminology (SWEET)
SWEET is a concept space Enables scalable classification of Earth system science
concepts Currently being expanded to Space science
Anybody can import, expand, and specialize the work of others
No need to regenerate a physics, chemistry, or math ontology Concept space is translatable into other
languages/cultures using “sameAs” notions
Non-LivingSubstances
LivingSubstances
PhysicalProcesses
Earth Realm
PhysicalProperties
Time
NaturalPhenomena
Human Activities
Integrative Ontologies
Space
Data
SWEET Ontologies and Their Interrelationships
Faceted Ontologies
Units
Numerics
SWEET as an Upper Level Earth Science Ontology
Math Physics Chemistry
Space
TimeProperty
EarthRealmProcess, Phenomena
SubstanceData
StratosphericChemistry
Biogeochemistry Specialized domains
import
import
SWEET
Why an Upper-Level Ontology for Earth System Science?
Many common concepts used across Earth Science disciplines (such as properties of the Earth)
Provides common definitions for terms used in multiple disciplines or communities
Provides common language in support of community and multidisciplinary activities
Provides common “properties” (relations) for tool developers Reduced burden (and barrier to entry) on creators of
specialized domain ontologies Only need to create ontologies for incremental knowledge
How SWEET was Initially Populated
Initial sources GCMD
Over 10,000 datasets Over 1000 keywords Data providers submit far more than the 1000 terms for “free-text” search
CF Over 500 keywords Very long term names
surface_downwelling_photon_spherical_irradiance_in_sea_water
Decomposed into facets
Spatial Ontology
Concepts of 0-D, 1-D, 2-D, and 3-D objects Default coordinate system: lat/lon/up Polygons used to store spatial extents
Spatial attributes added (population, area, etc.) Scientific applications include: geology to represent
3-D structure
Numerical Ontologies
Numerics Extents: interval, point, 0, positiveIntegers, … Relations: lessThan, greaterThan, …
SpatialEntities Extents: country, Antarctica, equator, inlet, … Relations: above, northOf, …
TemporalEntities Extents: duration, century, season, … Relations: after, before, …
Numerical Ontologies (cont.) Numeric concepts defined in OWL only through
standard XML XSD spec Intervals defined as restrictions on real line
Numerical relations defined in SWEET lessThan, max, …
Cartesian product (multidimensional spaces) added in SWEET
Numeric ontologies used to define spatial and temporal concepts
Conceptual Ontologies
Phenomena ElNino, Volcano, Thunderstorm, Deforestation) Each has associated, spatial/temporal extent, EarthRealms,
PhysicalProperties etc. Specific instances included
e.g., 1997-98 ElNino
Human Activities Fisheries, IndustrialProcessing, Economics, Public Good
State History or state of planet or component
SWEET Users
ESML- Earth Science Markup Language ESIP - Earth Science Information Partner Federation GEON- Geosciences Network GENESIS- Global Environmental & Earth Science
Information System IRI- International Research Institute (Columbia) LEAD- Linked Environments for Atmospheric
Discovery MMI- Marine Metadata Initiative NOESIS PEaCE- Pacific Ecoinformatics and Computational Ecology SESDI- Semantically Enabled Science Data Integration VSTO- Virtual Solar-Terrestrial Observatory
Collaboration Web Site
Discussion tools Blog, wiki, moderated discussion board
Version Control/ Configuration Management Trace dependencies on external ontologies Tools to search for existing concepts in registered
ontologies Ontology Validation Procedure
W3C note is formal submission method Registry/discovery of ontologies Support workflows/services for ontology development
Community Issues
Content Maintain alignment given expansion of classes and
properties Standards and Conventions
Agreement on standards for use of OWL Fuzzy representation conventions
Review Board Who will oversee and maintain for perpetuity (or at least
through the next funding cycle) ESIP Federation? ESSI?
Global Support Provide tools to visualize and appreciate the big picture
Update/Matching Issues
No removal of terms except for spelling or factual errors
Subscription service to notify affected ontologies when changes made
Must avoid contradictions Additions can create redundancy if sameAs not used Humans must oversee “matching”
CF has established moderator to carry out analogous additions
OWL “import” imports entire file Associate community with ontology terms
Community tagging
Best Practices Keep ontologies small, modular
Be careful that “Owl:Import” imports everything
Use higher level ontologies where possible Identify hierarchy of concept spaces
Model schemas Try to keep dependencies unidirectional