201109021 mcguinness ska_meeting
DESCRIPTION
Invited talk for the Square Kilometer Array meeting in Wellington, New Zealand, in Sept 2011 on Semantic eScience and semantically enabled Virtual Observatories, along with directions.
TRANSCRIPT
The Evolving Semantic Web and Semantic eScience Landscape
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer and Cognitive Science
Rensselaer Polytechnic Institute
Troy, NY, USA
Joint work with the Tetherless World Constellation eScience, Provenance, and Linked Open Data Teams, particularly Peter Fox, Jim Hendler, Patrick West, Stephan Zednik, Cynthia Chang, … tw.rpi.edu/people
Introduction
– Science data is exploding – sensors creating more than we can handle, Linked open data initiatives, etc.
– Virtual Observatories expanding – in breadth, depth, and semantic usage
– Introduction to a leading-edge interdisciplinary virtual observatory – Virtual Solar Terrestrial Observatory
– Directions (that may be even more important for BIG science)
  • Provenance
  • Semantic eScience Framework
– Discussion
Rensselaer Tetherless World Constellation (TWC)
http://tw.rpi.edu
Chaired Professors: McGuinness, Fox, Hendler
Research Prof: Luciano; Research Staff: Bao, Chang, Erickson, Shi, West, Zednik
Themes:
• Semantic Foundations
  – Knowledge Provenance / Explanation
  – Ontology Environments
  – Inference
  – Trust
  – Linked Data
• Xinformatics
  – Semantic eScience
  – Data Science
  – eHealth
  – eEnvironment
• Future Web
  – Web Science
  – Policy
  – Social
McGuinness NSF/NCAR May 6, 2008
Semantic e-Science Motivations
AI Goal: AI in service of supporting the next generation of science – interdisciplinary, distributed e-Science
Science Goal: Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available
But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.
We look to semantic technologies to help.
Virtual Solar Terrestrial Observatory (vsto.org)
• Interdisciplinary Virtual Observatory for searching, integrating, & analyzing observational, experimental, & model databases.
• Subject matter: solar, solar-terrestrial and space physics
• Provides virtual access to specific data, model, tool and material archives containing items from a variety of space- and ground-based instruments and experiments, as well as individual and community modeling and software efforts bridging research and educational use
• 3-year NSF project; initial deployment in year 1, multiple deployments by year 2; year 3 outreach and broadening
• While aimed at one interdisciplinary area, it serves as a replicable prototype for interdisciplinary virtual observatories
• Numerous follow-ons (Semantic Provenance Capture in Data Ingest Systems, SESDI, SESF, SSIII, …)
With NCAR, UTEP
9/15/2009 McGuinness - Cog Sci - RPI
Some Learnings
• Successful demonstration of semantic technologies
  – Serves as operational prototype and has been replicated in volcanology and climate response, semantic sea ice, …
• Semantic Web methodology for development
• Modularization of ontologies is critical for re-use (along with designing the ontologies for re-use)
• Provenance is critical for acceptance
• Tools, toolkits, and smart frameworks are one next step that we are taking (and we love partners in this endeavor…)
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development; Leverage controlled vocabularies, etc.
[Figure: methodology diagram. Labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation]
James L. Benedict, Deborah L. McGuinness, and Peter Fox. A Semantic Web-based Methodology for Building Conceptual Models of Scientific Information. In American Geophysical Union, Fall Meeting (AGU2006), San Francisco, CA, December 2007. Eos Trans. AGU 88(52), Fall Meet. Suppl., Abstract IN53A-0950.
Semantic Provenance Capture for Data Ingest Systems (SPCDIS)
Fact: Scientific data services are increasing in usage and scope, and with these increases comes a growing need for access to provenance information.
Provenance Project Goal: to design a reusable, interoperable provenance infrastructure.
Science Project Goal: design and implement an extensible provenance solution that is deployed at science data ingest/product generation time.
Outcome: implemented provenance solution in one science setting AND operational specification for other scientific data applications.
Extends vsto.org
Advanced Coronal Observing System (ACOS) Provenance Use Cases
• What were the cloud cover and seeing conditions during the observation period of this image?
• What calibrations have been applied to this image?
• Why does this image look bad?
ACOS Data Ingest
• Typical science data processing pipelines
• Distributed
• Some metadata in silos
• Much metadata lost
• Many human-in-loop decisions, events
• No metadata infrastructure for any user
• Community is broadening
Chromospheric Helium Imaging Photometer (CHIP) Data Ingest
ACOS – Advanced Coronal Observing System
PML Usage in SPCDIS
• Justification – Explanation, Causality graph
• Provenance – Conclusion, Source, Engine, Rule
• Trust – Trust/Belief metrics
[Figure: PML structure diagram. Labels: NodeSet, Justification, and Conclusion (three of each); Engine; Rule (two); SourceUsage; Source; DateTime. Relations: hasAntecedentList, hasSourceUsage, hasInferenceRule, hasInferenceEngine]
PML in Action
• This is the PML provenance encoding for a “quick look” GIF file that is generated from two image datasets
Node set for the quick-look GIF file
hasConclusion: a reference to the GIF file itself
InferenceStep: how the GIF file was derived
hasAntecedents, hasInferenceRule, hasInferenceEngine
The “antecedents” of the quick-look GIF file are other node sets
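The node-set structure on these slides can be sketched as a small linked data structure. This is a minimal, hypothetical Python model, not the PML vocabulary itself or the SPCDIS code: the class names only loosely mirror the slide's terms (NodeSet, InferenceStep, SourceUsage), and the file names, engine, rule, source, and timestamp strings are invented for illustration. Following the antecedents of the quick-look GIF's node set back to the steps that record a source usage recovers where the GIF ultimately came from.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class SourceUsage:
    """Which source was used, and when (simplified; not the real PML class)."""
    source: str
    date_time: str


@dataclass
class InferenceStep:
    """How a conclusion was derived: engine, rule, antecedents, optional source usage."""
    inference_engine: str
    inference_rule: str
    antecedents: List["NodeSet"] = field(default_factory=list)
    source_usage: Optional[SourceUsage] = None


@dataclass
class NodeSet:
    """A conclusion (here, a reference to a file) plus the step that justifies it."""
    conclusion: str
    step: InferenceStep


def original_sources(node: NodeSet) -> Set[str]:
    """Recursively follow antecedents and collect every recorded source."""
    found: Set[str] = set()
    if node.step.source_usage is not None:
        found.add(node.step.source_usage.source)
    for antecedent in node.step.antecedents:
        found |= original_sources(antecedent)
    return found


# Two image datasets entering from the instrument (hypothetical names and times).
intensity = NodeSet("intensity_image.fits", InferenceStep(
    "CHIP instrument", "raw capture",
    source_usage=SourceUsage("CHIP at MLSO", "2009-05-01T18:00:00Z")))
velocity = NodeSet("velocity_image.fits", InferenceStep(
    "CHIP instrument", "raw capture",
    source_usage=SourceUsage("CHIP at MLSO", "2009-05-01T18:00:00Z")))

# Node set for the quick-look GIF: the conclusion references the GIF itself,
# and its inference step lists the two image node sets as antecedents.
quicklook = NodeSet("quicklook.gif", InferenceStep(
    "quick-look generator", "combine images into GIF",
    antecedents=[intensity, velocity]))

print([a.conclusion for a in quicklook.step.antecedents])
print(original_sources(quicklook))
```

Because each node set points to the step that justifies it, the same traversal answers "where did this come from?" for any derived product, not only the quick-look image.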
Integrated View
• Observer log information added to the quick-look image’s provenance
Knowledge Provenance in Action
[Figure: Inference Web applications – Mobile Wine Agent; GILA; Combining Proofs in TPTP; Cognitive Assistant; Virtual Observatories; Intelligence Analyst Tools]
McGuinness – Inference Web
Discussion
• Semantic technologies can help in many ways – we have demonstrated their use in integration, discovery, access, validation, …
• Many subject area ontologies exist… and some are modular enough and vetted enough and maintained enough to depend on
• Moving from semantically-enabled systems to semantically-enabled frameworks is part of our present and future and we think it will be for others
• Provenance is critical and should be part of the design from day 1 (not an afterthought), and languages and tools are emerging
• Linked data can play a role – e.g., SemantAqua
• Things you might consider:
  – Use our framework / tools / tutorials such as linked data, Inference Web, Ontologies, SESF
  – Contribute your ontologies, tools, use cases to SESF
  – Collaborate with us
  – Questions: dlm @ cs. rpi. edu
Tropopause
http://aerosols.larc.nasa.gov/volcano2.swf
Atmosphere Use Case
Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause
From paleoclimate researcher – Caspar Ammann – Climate and Global Dynamics Division of NCAR - CGD/NCAR
Layperson perspective: look for indicators of acid rain in the part of the atmosphere we experience… (look at measurements of sulfur dioxide in relation to sulfuric acid after volcanic eruptions at the boundary of the troposphere and the stratosphere)
NASA-funded effort with Fox (NCAR→RPI), Sinha (Va. Tech), Raskin (JPL)
Use Case: A Volcano Erupts
Preferentially it is a tropical mountain (within ±30 degrees of the equator) with ‘acidic’ magma (more SiO2), and it erupts with great intensity so that material and large amounts of gas are injected into the stratosphere.
The SO2 gas converts to sulfuric acid (H2SO4), which combines with water (about 75% H2SO4 + 25% H2O). The half-life of SO2 is about 30–40 days.
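The 30–40 day half-life translates directly into how much SO2 is left after a given time. A minimal sketch, assuming simple first-order (exponential) decay with the half-life quoted above; real stratospheric chemistry is more involved, and the function names are hypothetical.

```python
import math


def so2_fraction_remaining(days: float, half_life_days: float) -> float:
    """Fraction of the initial SO2 left after `days`, assuming exponential decay."""
    return 0.5 ** (days / half_life_days)


def days_until_fraction(fraction: float, half_life_days: float) -> float:
    """Days until only `fraction` of the initial SO2 remains (inverse of the above)."""
    return half_life_days * math.log(fraction) / math.log(0.5)


for half_life in (30.0, 40.0):  # bounds of the range quoted in the use case
    left_after_90 = so2_fraction_remaining(90.0, half_life)
    days_to_10pct = days_until_fraction(0.10, half_life)
    print(f"half-life {half_life:.0f} d: {left_after_90:.1%} left after 90 d; "
          f"10% left after about {days_to_10pct:.0f} d")
```

With a 30-day half-life, three half-lives (90 days) leave one eighth (12.5%) of the SO2, which is consistent with the aerosol, rather than the gas, being what lingers for a year or two.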
The sulfuric acid condenses into small supercooled liquid droplets. These are the volcanic aerosols that will linger for a year or two.
The Brewer-Dobson circulation of the stratosphere will transport the aerosol to higher latitudes. The particles generate great sunsets, most commonly first seen in the fall of the respective hemisphere. Part of the sunlight is reflected, and part is scattered in the forward direction.
The result is that the direct solar beam is reduced, yet diffuse skylight increases. The scattering is responsible for the colorful sunsets, as more and more of the blue wavelengths are scattered away. In mid-latitudes the volcanic aerosol starts to settle, but the most efficient removal from the stratosphere is through tropopause folds in the vicinity of the storm tracks.
If particles get over the pole, which happens in spring of the respective hemisphere, they settle and fall onto the polar ice caps. It is from these ice caps that we recover annual records of sulfate flux or deposition.
We get ice cores that show continuous deposition information. Nowadays we measure sulfate (SO4^2-). Earlier measurements were indirect: an electric current was passed through the ice and the delay measured. With acids present, the current flows faster.
What we are looking for are pulse-like events with a build-up over a few months (mostly in summer, when the polar vortex is gone), followed by a decay to about 1/e of the peak in 12 months.
The distribution of these pulses was found to follow an extreme value distribution (Fréchet) with a heavy tail.
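The pulse shape and the heavy-tailed size distribution described above can be sketched numerically. This is an illustrative toy model, not the researchers' analysis: the build-up is assumed to be linear over a hypothetical 4-month rise, the decay is exponential with the 12-month e-folding time from the text, and the Fréchet shape (alpha) and scale parameters are placeholders, not values fitted to ice-core data.

```python
import math


def sulfate_pulse(month: float, peak: float = 1.0, rise_months: float = 4.0,
                  efold_months: float = 12.0) -> float:
    """Toy deposition pulse: linear build-up to `peak`, then e-folding decay.

    `rise_months` is a hypothetical choice; `efold_months` follows the text.
    """
    if month < 0:
        return 0.0
    if month <= rise_months:
        return peak * month / rise_months
    return peak * math.exp(-(month - rise_months) / efold_months)


def frechet_cdf(x: float, alpha: float, scale: float) -> float:
    """Fréchet CDF with location 0: P(X <= x) = exp(-(x/scale)**-alpha) for x > 0."""
    if x <= 0:
        return 0.0
    return math.exp(-((x / scale) ** -alpha))


# Twelve months after the peak, the toy pulse has fallen to 1/e of its peak.
print(sulfate_pulse(4.0 + 12.0))  # ≈ 0.3679

# Heavy tail: exceedance probabilities shrink only like a power of x.
alpha, scale = 1.5, 1.0  # hypothetical parameters
for x in (2.0, 5.0, 10.0):
    print(x, 1.0 - frechet_cdf(x, alpha, scale))
```

The Fréchet tail decays roughly as (x/scale)^(-alpha), so very large eruption signals are far more likely than a light-tailed (e.g., Gaussian) model would suggest.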
Inference Web: Making Data Transparent and Actionable Using Semantic Technologies
• How and when does it make sense to use smart system results & how do we interact with them?
[Figure: Inference Web application areas – Knowledge Provenance in Virtual Observatories; Hypothesis Investigation / Policy Advisors; (Mobile) Intelligent Agents; Intelligence Analyst Tools; NSF Interops: SONET, SSIII – Sea Ice]
Core and framework semantics
Ontology Spectrum
[Figure: ontology spectrum. Labels: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; General Logical constraints; Disjointness, Inverse, part-of…]
From 1999 AAAI panel, 2000 Dagstuhl talk
November 9, 2006
Virtual Observatory (VSTO)
• General: Find data subject to certain constraints and plot appropriately
• Specific: Plot the observed/measured Neutral Temperature as recorded by the Millstone Hill Fabry-Perot interferometer while looking in the vertical direction at any time of high geomagnetic activity in a way that makes sense for the data.
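The specific query above depends on the ontology to know which instruments can answer it. A minimal sketch, assuming a tiny hand-written triple set: the class names, instrument names, and the shorthand predicates `type`, `subClassOf`, and `canMeasure` are hypothetical stand-ins, not the actual VSTO ontology, and the reasoning is only a transitive closure over subClassOf rather than a full OWL reasoner. It illustrates how a query phrased against a general class can surface an instrument (here a photometer) that the user never named.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, simplified triples; not the real VSTO ontology.
TRIPLES: Set[Triple] = {
    ("FabryPerotInterferometer", "subClassOf", "OpticalInstrument"),
    ("Photometer", "subClassOf", "OpticalInstrument"),
    ("OpticalInstrument", "subClassOf", "Instrument"),
    ("IncoherentScatterRadar", "subClassOf", "Instrument"),
    ("MillstoneHillFPI", "type", "FabryPerotInterferometer"),
    ("ExamplePhotometer", "type", "Photometer"),
    ("MillstoneHillISR", "type", "IncoherentScatterRadar"),
    ("OpticalInstrument", "canMeasure", "NeutralTemperature"),
}


def superclasses(cls: str) -> Set[str]:
    """All classes reachable from `cls` via subClassOf (transitive closure)."""
    result: Set[str] = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def instruments_measuring(parameter: str) -> Set[str]:
    """Instances whose class, or any superclass, is declared to measure `parameter`."""
    measuring = {s for s, p, o in TRIPLES if p == "canMeasure" and o == parameter}
    hits: Set[str] = set()
    for instance, p, cls in TRIPLES:
        if p == "type" and ({cls} | superclasses(cls)) & measuring:
            hits.add(instance)
    return hits


print(sorted(instruments_measuring("NeutralTemperature")))
```

The radar is excluded only because this toy ontology says nothing about it measuring neutral temperature; the point is that the answer comes from inference over the class hierarchy, not from the user knowing instrument names.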
VSTO Results
Many Benefits:
– Reduced query formation from 8 to 3 steps and reduced choices at each stage
– Allowed scientists to get data from instruments they never knew of before (e.g., photometers in example)
– Supported augmentation and validation of data
– Useful and related data provided without having to be an expert to ask for it
– Integration and use (e.g., plotting) based on inference
– Ask and answer questions not possible before
But needed: provenance (SPCDIS, PML), reusability & modularity (SESF)
– Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. In the Proceedings of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada, July 22-26, 2007.
– Peter Fox, Deborah L. McGuinness, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. Ontology-supported Scientific Data Frameworks: The Virtual Solar-Terrestrial Observatory Experience. In Computers and Geosciences - Elsevier. Volume 35, Issue 4 (2009).
VSTO Instrument
VSTO Infrastructure
Partial exposure of Instrument class hierarchy - users seem to LIKE THIS
Users Require Provenance!
Users demand it! If users (humans and agents) are to use, reuse, and integrate system answers, they must trust them.
Intelligence analysts (from DTO/IARPA’s NIMD): Andrew Cowell, Deborah McGuinness, Carrie Varley, and David A. Thurman. Knowledge-Worker Requirements for Next Generation Query Answering and Explanation Systems. Proc. of Intelligent User Interfaces for Intelligence Analysis Workshop, Intl Conf. on Intelligent User Interfaces (IUI 2006), Sydney, Australia.
Intelligent Assistant Users (from DARPA’s PAL/CALO): Alyssa Glass, Deborah L. McGuinness, Paulo Pinheiro da Silva, and Michael Wolverton. Trustable Task Processing Systems. In Roth-Berghofer, T., and Richter, M.M., editors, KI Journal, Special Issue on Explanation, Künstliche Intelligenz, 2008.
Virtual Observatory Users (from NSF’s VSTO): Deborah McGuinness, Peter Fox, Luca Cinquini, Patrick West, Jose Garcia, James L. Benedict, and Don Middleton. The Virtual Solar-Terrestrial Observatory: A Deployed Semantic Web Application Case Study for Scientific Research. Proc. of the Nineteenth Conference on Innovative Applications of Artificial Intelligence (IAAI-07). Vancouver, British Columbia, Canada.
And… as systems become more diverse, distributed, and embedded, and depend on more varied data and communities, more provenance, and more types of provenance, are needed.
Advanced Coronal Observing System (ACOS) Provenance Use Cases
• What were the cloud cover and seeing conditions during the observation period of this image?
• What calibrations have been applied to this image?
• Why does this image look bad?
ACOS Data Ingest
• Typical science data processing pipelines
• Distributed
• Some metadata in silos
• Much metadata lost
• Many human-in-loop decisions, events
• No metadata infrastructure for any user
• Community is broadening
Chromospheric Helium Imaging Photometer (CHIP) Data Ingest
ACOS – Advanced Coronal Observing System
PML Usage in SPCDIS
• Justification – Explanation, Causality graph
• Provenance – Conclusion, Source, Engine, Rule
• Trust – Trust/Belief metrics
[Figure: PML structure diagram. Labels: NodeSet, Justification, and Conclusion (three of each); Engine; Rule (two); SourceUsage; Source; DateTime. Relations: hasAntecedentList, hasSourceUsage, hasInferenceRule, hasInferenceEngine]
A PML-Enhanced Image
[Figure: CHIP Quick-Look image vs. CHIP PML-Enhanced Quick-Look image, with provenance]
Integrated View
• Observer log information added to the quick-look image’s provenance
Provenance-aware faceted search
Tetherless World Constellation
Technologies
• Semantic Web methodology
• Medium weight ontologies (although adapted from existing ontologies)
• Access to data
• Mapping info / services
• Reasoning (previous application was linking and exploration)
• Note – this project was operational in 8 months and is still in use years later
Semantically-Enabled Systems -> Semantically-Enabled Frameworks
• We could continue to build somewhat extensible and reusable systems…, but
• We wanted a broader base of builders and users
• Frameworks provide many entry and exit points and re-usable, (hopefully) seamless components
• Open source ontologies and software!
• We love partners in this endeavor…
Background
• Began discussions of a knowledge environment for GeoSciences – early 2000s
• Chose a particular interdisciplinary virtual observatory (VSTO) powered by semantic technologies
• Use case driven – in solar and solar-terrestrial physics with an emphasis on instrument-based measurements and real data pipelines
• First step – proof-of-concept semantically-enabled pilot – VSTO quite successful
• We pushed semantics into applications that were already built on advanced cyberinfrastructure
Background II
• Provenance demands led to Semantic Provenance Capture for Data Ingest Systems
• Test in new domains – Semantically-Enabled Scientific Data Integration – predict climate impacts following volcanic eruption
• Reuse worked: semantic integration, semantic provenance (with modularization and tool requests)
• Goal now – configurable, re-usable framework with embedded toolkit
Framework overview
Semantic Web Methodology and Technology Development Process
James L. Benedict, Deborah L. McGuinness, and Peter Fox. A Semantic Web-based Methodology for Building Conceptual Models of Scientific Information. In American Geophysical Union, Fall Meeting (AGU2006), San Francisco, CA, December 2007. Eos Trans. AGU 88(52), Fall Meet. Suppl., Abstract IN53A-0950.
Application integration with smart, scalable search
• Rozell et al.
Core and framework semantics
Status & Discussion
• Ontology and tool re-use in progress or beginning with many projects
  – VSTO re-implementation
  – BCO-DMO (biological and chemical oceanography)
  – Semantic Sea Ice (NSF Interop project)
  – Scientific Observations Network (SONET – NSF Interop)
  – National Ecological Observatory Network (NEON)
  – CSIRO Water Monitoring
  – Your Project Here!
• Modularization in progress
• Tools like S2S in place and being tested
Commonalities
• Applications – simple linking; integration of many existing vocabularies, simple inference
• Encoding of meaning – often lightweight – ontologies
• Semantic Web methodology
• Often lightweight data encodings – triple stores
• Usually simple reasoners
• Provenance encodings
• These are all options that can be used incrementally and at varying degrees of sophistication.
• While initial applications are often on larger platforms, many can be adapted to mobile platforms
Comments
• Broader groups of people are now building linked data applications – e.g., hackathons for linked govt data, TWC/Elsevier Hackathon, Health 2.0, etc.
• Broader groups of people are now building Virtual Observatories AND wanting to integrate more data, disciplines, etc.
• More interest in encodings of meaning to create smarter and more context-aware applications
• Growing demand for provenance for attribution, trust, transparency
• More applications are moving to mobile and becoming ubiquitous
• More data from sensors and from open data initiative is fueling some applications
• Things you might consider:
  – Use our framework / tools / tutorials such as linked data, Inference Web, Ontologies, SESF
  – Contribute your modules to SESF
  – Collaborate with us
  – Questions: dlm @ cs. rpi. edu
Extras
Ontology
Regulation Ontology
– Model federal and state water quality regulations for drinking water sources
– Can use to define, for example, that in California 0.01 mg/L is the limit for Arsenic
– Combined with the core ontology, we can infer that “any water source that contains more than 0.01 mg/L of Arsenic is a polluted water source.”
Portion of Cal. Regulation Ontology.
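The regulation-plus-core-ontology inference on this slide amounts to a threshold rule. A minimal sketch in Python, assuming a hypothetical in-memory table of limits keyed by (state, contaminant) and hypothetical measurement records; the only value taken from the slide is the 0.01 mg/L California limit for Arsenic. The slides describe encoding this knowledge in ontologies and applying a reasoner, not hand-written code like this.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Regulation knowledge reduced to a lookup table: (state, contaminant) -> limit in mg/L.
# Only the California arsenic limit comes from the slide; the structure is hypothetical.
LIMITS_MG_PER_L: Dict[Tuple[str, str], float] = {
    ("CA", "Arsenic"): 0.01,
}


@dataclass(frozen=True)
class Measurement:
    water_source: str
    state: str
    contaminant: str
    value_mg_per_l: float


def violates(m: Measurement) -> bool:
    """A measurement violates a regulation if it exceeds the applicable limit."""
    limit = LIMITS_MG_PER_L.get((m.state, m.contaminant))
    return limit is not None and m.value_mg_per_l > limit


def polluted_sources(measurements: List[Measurement]) -> Set[str]:
    """Core rule: a water source with any violating measurement is polluted."""
    return {m.water_source for m in measurements if violates(m)}


readings = [  # hypothetical data for illustration only
    Measurement("well-A", "CA", "Arsenic", 0.012),
    Measurement("well-B", "CA", "Arsenic", 0.004),
    Measurement("well-C", "NY", "Arsenic", 0.050),  # no NY limit in this toy table
]
print(polluted_sources(readings))
```

Because the toy table holds no New York limit, well-C is not flagged; modeling both federal and state regulations, as the slide describes, is what lets the same rule apply in every jurisdiction.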
Visualization
• Map Visualization:
1. Presents analyzed results with Google Maps
2. Presents an explanation of why a water source is marked as polluted
3. Uses a “Facet”-type filter to select the type of data
http://was.tw.rpi.edu/swqp/map.html
Selected Follow-up options
Limit
Violation
PopSciGrid in Action
http://logd.tw.rpi.edu/demo/tax-cost-policy-prevalence
Directions
• Use sensed personal data to provide context and integrate with aggregated data to provide actionable health advisors – diet & nutrition, exercise, etc.
• Use PopSciGrid model for other data, e.g., CLASS data about exercise and nutrition in schools
• Relate to health impacts
• Expose provenance more effectively
Tetherless Faceted Browsing
PopSciGrid Revisited
[Figure: PopSciGrid workflow. Components: CSV2RDF4LOD Direct, CSV2RDF4LOD Enhance, SemDiff, Archive. Steps: archive, derive, integrate, visualize, publish. Example view: Ban coverage. Legend: data sets and simple ontology; provenance tools; visualization tools]
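The workflow starts from CSV files that are converted to RDF, first directly and then with enhancements. A minimal standard-library sketch of what a direct, row-by-row conversion to N-Triples could look like, assuming a hypothetical example.org base URI and made-up rows; it is not the CSV2RDF4LOD tool or its output vocabulary, only an illustration of turning each row into a subject and each non-empty cell into a triple.

```python
import csv
import io
import re
from typing import List

BASE = "http://example.org/popscigrid/"  # hypothetical namespace, not a real LOGD URI


def slug(text: str) -> str:
    """Turn a column header into a URI-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text.strip()).strip("_").lower()


def escape_literal(value: str) -> str:
    """Escape a string for use as an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str) -> List[str]:
    """Direct conversion: one subject per row, one triple per non-empty cell."""
    triples: List[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}row/{row_number}>"
        for column, value in row.items():
            if value:
                predicate = f"<{BASE}vocab/{slug(column)}>"
                triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up rows for illustration; the column name echoes the figure's "Ban coverage".
sample = "State,Year,Ban coverage\nState A,2008,100\nState B,2008,35\n"
for line in csv_to_ntriples(sample):
    print(line)
```

Everything stays a plain string literal here; an enhancement pass would typically add datatypes, link values to shared URIs, and attach provenance, which is where the tools in the figure's legend come in.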
VSTO DataProduct
Semantic Web Methodology
McGuinness, Fox, West, Garcia, Cinquini, Benedict, Middleton http://www.vsto.org
Semantic Provenance Capture for Data Ingest Systems (SPCDIS)
Fact: Scientific data services are increasing in usage and scope, and with these increases comes a growing need for access to provenance information.
Provenance Project Goal: to design a reusable, interoperable provenance infrastructure.
Science Project Goal: design and implement an extensible provenance solution that is deployed at science data ingest/product generation time.
Outcome: implemented provenance solution in one science setting AND operational specification for other scientific data applications.
Extends vsto.org
PML in Action
• This is the PML provenance encoding for a “quick look” GIF file, which is generated from two image datasets
Node set for the quick-look GIF file
hasConclusion: a reference to the GIF file itself
InferenceStep: how the GIF file was derived
hasAntecedents, hasInferenceRule, hasInferenceEngine
The “antecedents” of the quick-look GIF file are other node sets
CHIP Pipeline (Chromospheric Helium Image Photometer)
[Figure: CHIP data pipeline. Sites: Mauna Loa Solar Observatory (MLSO), Hawaii; National Center for Atmospheric Research (NCAR) Data Center, Boulder, CO. Stages: raw data capture by CHIP (Chromospheric Helium-I Image Photometer); raw image data; follow-up processing on raw data (e.g., flat-field calibration); quality checking (images graded GOOD, BAD, UGLY); publishes intensity images (GIF) and velocity images (GIF)]
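This pipeline is where the ACOS provenance questions (what calibrations were applied, why an image looks bad) get their answers, provided each step records what it did when it happens. A minimal sketch, assuming hypothetical step names, a hypothetical observer-log note, and a plain Python record in place of PML; the GOOD/BAD/UGLY grades and the flat-field calibration step are the only details taken from the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessingStep:
    name: str        # e.g., "flat-field calibration" (hypothetical step label)
    agent: str       # software or person that performed the step
    note: str = ""   # free-text detail, e.g., from the observer log


@dataclass
class ImageProduct:
    image_id: str
    kind: str                      # "intensity" or "velocity"
    grade: Optional[str] = None    # GOOD, BAD, or UGLY after quality checking
    history: List[ProcessingStep] = field(default_factory=list)

    def record(self, step: ProcessingStep) -> None:
        """Capture a step at ingest time instead of reconstructing it later."""
        self.history.append(step)

    def calibrations_applied(self) -> List[str]:
        """What calibrations have been applied to this image?"""
        return [s.name for s in self.history if "calibration" in s.name]

    def why_bad(self) -> List[str]:
        """Why does this image look bad? Returns the notes recorded along the way."""
        if self.grade not in ("BAD", "UGLY"):
            return []
        return [f"{s.name}: {s.note}" for s in self.history if s.note]


# Hypothetical image and observer note, for illustration only.
img = ImageProduct("chip-intensity-0001", "intensity")
img.record(ProcessingStep("raw data capture", "CHIP at MLSO"))
img.record(ProcessingStep("flat-field calibration", "pipeline"))
img.record(ProcessingStep("quality check", "observer", note="thin cirrus during observation"))
img.grade = "UGLY"

print(img.calibrations_applied())
print(img.why_bad())
```

The design choice mirrors the slides' point that provenance should be captured at ingest time: if the observer note were never recorded, no later query could explain the UGLY grade.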
Core and Framework Semantics - Multi-tiered interoperability
[Figure: TWC contributions by Semantic Web layer, with “used by” arrows. Labels: SPARQL to XQuery translator; RDFS materialization (Billion Triple winner); Govt metadata search; Linked Open Govt Data; SPARQL WG, earlier QLs (OWL-QL, Classic QL, …); OWL 1 & 2 WG: edited main OWL docs, quick reference, OWL profiles (OWL RL); earlier languages: DAML, DAML+OIL, Classic; RIF WG; AIR accountability tool; DL, KIF, CL, N3Logic; Inference Web, Proof Markup Language, W3C Provenance Working Group formal model; Inference Web IW Trust, AIR + Trust; Visualization APIs, S2S; Govt Data; ontology repositories (Ontolingua), ontology evolution environment: Chimaera; Semantic eScience Ontologies, MANY other ontologies; Transparent Accountable Datamining Initiative (TAMI)]
TWC and the Semantic Web Layer Cake
SemantAqua (part of SemantEco)
• Enable/empower citizens & scientists to explore water pollution sites, facilities, and regulations along with provenance.
• Demonstrates semantic web technologies in environmental informatics systems.
• Map presentation of analysis
• Explanations and Provenance available
• Use “Facet” type filter to select type of data
http://was.tw.rpi.edu/swqp/map.html
System Architecture
[Figure: system architecture. Labels: access, Virtuoso]
Ontology
• Core TWC Water ontology
– Extends existing best practice ontologies, e.g., SWEET, OWL-Time.
– Includes terms for relevant pollution concepts
– Can use to conclude that “any water source that has a measurement outside of its allowable range is a polluted water source.”
Portion of the TWC Water Ontology.
Provenance
• Preserves provenance in the Proof Markup Language (PML).
• Data Source Level Provenance:
– The captured provenance data are used to support provenance-based queries.
• Reasoning Level Provenance:
– When a water source has been marked as polluted, the user can access supporting provenance data for the explanation, including the URLs of the source data, intermediate data, and converted data.
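Reasoning-level provenance means an explanation can carry the URLs of the data behind it. A minimal sketch, assuming hypothetical example.org URLs, hypothetical process names, and a plain dictionary in place of PML; it only shows the shape of a provenance-backed explanation for a "polluted" verdict, walking from intermediate data through converted data back to the source data, and is not SemantAqua's implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical provenance records: artifact URL -> (process, input URLs).
PROVENANCE: Dict[str, Tuple[str, List[str]]] = {
    "http://example.org/converted/site42.ttl": (
        "convert CSV to RDF", ["http://example.org/source/site42.csv"]),
    "http://example.org/intermediate/site42-arsenic.ttl": (
        "select arsenic measurements", ["http://example.org/converted/site42.ttl"]),
}


def lineage(url: str) -> List[str]:
    """Follow derivations back to raw sources, listing each step (depth-first)."""
    if url not in PROVENANCE:
        return [f"source data: {url}"]
    process, inputs = PROVENANCE[url]
    steps = [f"{process} -> {url}"]
    for parent in inputs:
        steps.extend(lineage(parent))
    return steps


def explain_polluted(site: str, evidence_url: str, value: float, limit: float) -> str:
    """Build a human-readable explanation that links back to the supporting data."""
    lines = [f"{site} is marked polluted: measured {value} mg/L exceeds limit {limit} mg/L"]
    lines += ["  " + step for step in lineage(evidence_url)]
    return "\n".join(lines)


print(explain_polluted("site42", "http://example.org/intermediate/site42-arsenic.ttl",
                       0.012, 0.01))
```

Keeping the lineage as data, rather than baking it into the explanation text, is what also makes the data-source-level, provenance-based queries possible.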
Some Foundations
• Growing body of Open Linked Data
• Growth and acceptance of ontologies and ontology-enabled services
• RPI Tetherless World backend tools and services
• LOGD
• Inference Web and Proof Markup Language
• eScience ontologies and infrastructure
The Tetherless World Constellation Linked Open Government Data Portal
[Figure: TWC LOGD workflow. Steps: Create, Convert, Enhance, Query/Access; LOGD SPARQL Endpoint; output formats: RDF, RSS, JSON, XML, HTML, CSV, …; Community Portal; Data.gov deployment]
A PML-Enhanced Image
[Figure: CHIP Quick-Look image vs. CHIP PML-Enhanced Quick-Look image, with provenance]