1 Shifting the Burden from the User to the Data Provider Peter Fox High Altitude Observatory, NCAR (***) With thanks to eGY and various NSF, DoE and NASA projects


Shifting the Burden from the User to the Data Provider

Peter Fox
High Altitude Observatory, NCAR (***)

With thanks to eGY and various NSF, DoE and NASA projects


Outline
• Background, definitions
• Informatics -> e-Science
• Data has lots of uses
– Virtual Observatories: use cases
– Data Framework: Examples
– Data ingest, integration, mining and …
• Discussion

Fox HDF: Semantic Data Burden Shift Oct 15, 2008


Background
Scientists should be able to access a global, distributed knowledge base of scientific data that:
• appears to be integrated
• appears to be locally available

But… data is obtained by multiple instruments, using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…


But data has Lots of Audiences

Figure: audiences for information products, ranked from More Strategic to Less Strategic. From “Why EPO?”, a NASA internal report on science education, 2005.

Annotation: SCIENTISTS TOO


The Information Era: Interoperability

Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal. Open access to data and services enables us to meet the new challenges of understanding the Earth and its space environment as a complex system:

• managing and accessing large data sets
• higher space/time resolution capabilities
• rapid response requirements
• data assimilation into models
• crossing disciplinary boundaries


Shifting the Burden from the User to the Provider


Modern capabilities


Mind the Gap!

• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:
• There is/was still a gap between science and the underlying infrastructure and technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.

Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)


Progression after progression

Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics, aka Xinformatics; Science, SBAs. Cyber, Core and Science Informatics are grouped under Informatics.


Virtual Observatories
• Conceptual examples:
• In-situ: Virtual measurements
– Related measurements
• Remote sensing: Virtual, integrative measurements
– Data integration
• Managing virtual data products/sets


Virtual Observatories
Make data and tools quickly and easily accessible to a wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer in their preferred language, i.e. appear to be local and integrated.

Likely to provide controlled vocabularies that may be used for interoperation in appropriate domains, along with database interfaces for access and storage and “smart” tools for evolution and maintenance.


Early days of discipline specific VOs

Diagram labels: VO1, VO2, VO3, …; DB1, DB2, DB3, …, DBn; ?


The Astronomy approach; data-types as a service

Diagram labels: VO App1, VO App2, VO App3, …; VO layer; DB1, DB2, DB3, …, DBn

VO layer:
• VOTable
• Simple Image Access Protocol
• Simple Spectrum Access Protocol
• Simple Time Access Protocol

• Limited interoperability
• Lightweight semantics
• Limited meaning, hard coded
• Limited extensibility
• Under review

Open Geospatial Consortium: Web {Feature, Coverage, Mapping} Service
Sensor Web Enablement: Sensor {Observation, Planning, Analysis} Service
use the same approach
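
As one illustration of the data-type-as-a-service style, the sketch below builds a request URL for an image service. It assumes the POS, SIZE and FORMAT query parameters of Simple Image Access version 1; the endpoint URL and the helper name `sia_query_url` are hypothetical and do not come from the slides.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def sia_query_url(base_url, ra_deg, dec_deg, size_deg, image_format="image/fits"):
    """Build a query URL from a sky position, a region size and an image format.

    The parameter names (POS, SIZE, FORMAT) are assumed from Simple Image
    Access version 1; base_url is a placeholder for a real service endpoint.
    """
    params = {
        "POS": f"{ra_deg},{dec_deg}",   # right ascension, declination in degrees
        "SIZE": str(size_deg),          # width of the search region in degrees
        "FORMAT": image_format,         # requested image format
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, used only to show the shape of the request.
url = sia_query_url("https://example.org/sia", 180.0, -30.0, 0.5)
decoded = parse_qs(urlparse(url).query)
```

Every client that knows these few parameters can talk to every service that honours them, which is the strength of the approach; the meaning of the returned columns is still left to the client, which is the "lightweight semantics" limitation noted above.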

Diagram labels: VO Portal; Web Serv.; VO API; Semantic mediation layer - mid-upper-level; Semantic mediation layer - VSTO - low level; DB1, DB2, DB3, …, DBn
Connected to: Education, clearinghouses, other services, disciplines, etc.
Added value at each level: Metadata, schema, data; Query, access and use of data; Semantic query, hypothesis and inference; Semantic interoperability

Mediation Layer
• Ontology - capturing concepts of Parameters, Instruments, Date/Time, Data Product (and associated classes, properties) and Service Classes
• Maps queries to underlying data
• Generates access requests for metadata, data
• Allows queries, reasoning, analysis, new hypothesis generation, testing, explanation, etc.
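
The mediation role described above can be illustrated with a minimal Python sketch. It assumes a toy ontology held in plain dictionaries; the instrument classes, parameter names, dataset identifiers and the `mediate` function are hypothetical and are not taken from the VSTO ontology.

```python
from dataclasses import dataclass

# Toy "ontology": each instrument class lists the parameters it measures.
# All names are illustrative placeholders, not VSTO terms.
MEASURES = {
    "FabryPerotInterferometer": {"NeutralTemperature", "NeutralWind"},
    "IncoherentScatterRadar": {"ElectronDensity", "IonTemperature"},
}

# Hypothetical mapping from instrument class to the dataset holding its data.
DATASETS = {
    "FabryPerotInterferometer": "db1/fpi",
    "IncoherentScatterRadar": "db2/isr",
}


@dataclass(frozen=True)
class AccessRequest:
    dataset: str
    parameter: str
    start: str
    end: str


def mediate(parameter, start, end):
    """Map a parameter-level query onto access requests for the underlying data."""
    return [
        AccessRequest(DATASETS[instrument], parameter, start, end)
        for instrument, params in sorted(MEASURES.items())
        if parameter in params
    ]


requests = mediate("NeutralTemperature", "2005-01-26", "2005-01-27")
```

The user names a parameter and a time range; the layer decides which holdings can answer and produces the requests. A real mediation layer would draw these mappings from the ontology and a reasoner rather than from hand-written dictionaries.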


Content: Coupling Energetics and Dynamics of Atmospheric Regions WEB

Community data archive for observations and models of Earth's upper atmosphere and geophysical indices and parameters needed to interpret them. Includes browsing capabilities by periods, > 310 instruments, models, > 820 parameters…


Content: Mauna Loa Solar Observatory

Near real-time data products from Hawaii from a variety of solar instruments.

Source for space weather, solar variability, and basic solar physics.

Other content used too - Center for Integrated Space Weather Modeling


Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

Diagram labels (iterative development cycle): Use Case; Small Team, mixed skills; Analysis; Develop model/ontology; Use Tools; Science/Expert Review & Iteration; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy


Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the Arctic Circle (above 45N) at any time of high geomagnetic activity.
– Extract information from the use case - encode knowledge
– Translate this into a complete query for data - inference and integration of data from instruments, indices and models

Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter in any order and with constraints included in any combination.
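
The "any order, any combination" requirement can be sketched as a query function whose constraints are all optional. This is a minimal illustration, not the VSTO service interface; the catalog entries, field names and the `query` function are hypothetical.

```python
from datetime import date

# Hypothetical catalog entries; instrument and parameter names are placeholders.
CATALOG = [
    {"instrument": "FPI", "parameter": "NeutralTemperature", "date": date(2005, 1, 26)},
    {"instrument": "FPI", "parameter": "NeutralWind", "date": date(2005, 2, 3)},
    {"instrument": "ISR", "parameter": "ElectronDensity", "date": date(2005, 1, 26)},
]


def query(catalog, instrument=None, parameter=None, start=None, end=None):
    """Return the entries matching every constraint that was supplied.

    Constraints left as None are ignored, so callers may combine them freely
    and pass them in any order as keyword arguments.
    """
    results = []
    for entry in catalog:
        if instrument is not None and entry["instrument"] != instrument:
            continue
        if parameter is not None and entry["parameter"] != parameter:
            continue
        if start is not None and entry["date"] < start:
            continue
        if end is not None and entry["date"] > end:
            continue
        results.append(entry)
    return results


by_instrument = query(CATALOG, instrument="FPI")
by_date = query(CATALOG, start=date(2005, 1, 1), end=date(2005, 1, 31))
combined = query(CATALOG, end=date(2005, 1, 31), instrument="FPI")
```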


Fox RPI: Semantic Data Frameworks May 14, 2008


VSTO - semantics and ontologies in an operational environment: vsto.hao.ucar.edu, www.vsto.org

Web Service


Partial exposure of Instrument class hierarchy - users seem to LIKE THIS

Semantic filtering by domain or instrument hierarchy
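
Filtering by a node in the instrument hierarchy amounts to a subclass test. The sketch below assumes a small hand-written hierarchy; the class names, instrument identifiers and helper functions are hypothetical and only illustrate the idea.

```python
# Hypothetical instrument class hierarchy: child class -> parent class.
PARENT = {
    "OpticalInstrument": "Instrument",
    "Radar": "Instrument",
    "Interferometer": "OpticalInstrument",
    "Photometer": "OpticalInstrument",
    "FabryPerotInterferometer": "Interferometer",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or inherits from it through PARENT links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def filter_by_class(instruments, ancestor):
    """Keep the instruments whose class falls under the chosen hierarchy node."""
    return [name for name, cls in instruments if is_a(cls, ancestor)]


# Placeholder instruments, each tagged with its most specific class.
INSTRUMENTS = [
    ("fpi_a", "FabryPerotInterferometer"),
    ("phot_b", "Photometer"),
    ("radar_c", "Radar"),
]

optical = filter_by_class(INSTRUMENTS, "OpticalInstrument")
```

Selecting a high node such as "OpticalInstrument" returns every instrument beneath it, so a user does not need to know the name of each individual instrument.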


Inferred plot type and return formats for data products


Inferred plot type and return required axes data


Semantic Web Benefits
• Unified/abstracted query workflow: Parameters, Instruments, Date-Time
• Decreased input requirements for query: in one case reducing the number of selections from eight to three
• Generates only syntactically correct queries: which could not always be ensured in previous implementations without semantics
• Semantic query support: by using background ontologies and a reasoner, our application has the opportunity to expose only coherent queries (portal and services)
• Semantic integration: in the past users had to remember (and maintain codes) to account for numerous different ways to combine and plot the data, whereas now semantic mediation provides the level of sensible data integration required, now exposed as smart web services
– understanding of coordinate systems, relationships, data synthesis, transformations, etc.
– returns independent variables and related parameters
• A broader range of potential users (PhD scientists, students, professional research associates and those from outside the fields)
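
Exposing only coherent queries can be pictured as narrowing the remaining choices after each selection. The sketch below assumes a small set of instrument-parameter facts; the names and the `remaining_choices` function are hypothetical, and a real portal would obtain these facts from the ontology and reasoner.

```python
# Hypothetical facts linking instruments to the parameters they measure.
FACTS = {
    ("FPI", "NeutralTemperature"),
    ("FPI", "NeutralWind"),
    ("ISR", "ElectronDensity"),
}


def remaining_choices(instrument=None, parameter=None):
    """Return the instruments and parameters still consistent with the selection."""
    consistent = [
        (i, p)
        for i, p in FACTS
        if (instrument is None or i == instrument)
        and (parameter is None or p == parameter)
    ]
    return {
        "instruments": sorted({i for i, _ in consistent}),
        "parameters": sorted({p for _, p in consistent}),
    }


after_instrument = remaining_choices(instrument="FPI")
after_parameter = remaining_choices(parameter="ElectronDensity")
```

Because the interface offers only the values returned here, a user cannot assemble a request for a parameter that the chosen instrument never measures, and fewer selections are needed to reach a valid query.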


What is a Non-Specialist Use Case?

A teacher accesses the internet, goes to an educational virtual observatory and enters a search for “Aurora”.

Someone should be able to query a virtual observatory without having specialist knowledge.


What should the User Receive?

Teacher receives four groupings of search results:

1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/

2) Research, data and tools: via VSTO, VSPO and VITMO, knows to search for brightness, or green/red line emission

3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere) also known as Northern Lights

4) Did you mean?: Aurora Borealis or Aurora Australis, etc.


Semantic Information Integration: Concept map for educational use of science data in a lesson plan


Issues for Virtual Observatories

• Scaling to large numbers of data providers and redefining the role(s)/relations with them
• Crossing discipline boundaries
• Security, access to resources, policies
• Branding and attribution (where did this data come from and who gets the credit, is it the correct version, is this an authoritative source?)
• Provenance/derivation (propagating key information as it passes through a variety of services, copies of processing algorithms, …)
• Data quality, preservation, stewardship

These are currently burden areas for users


Problem definition
• Data is coming in faster, in greater volumes and outstripping our ability to perform adequate quality control
• Data is being used in new ways and we frequently do not have sufficient information on what happened to the data along the processing stages to determine if it is suitable for a use we did not envision
• We often fail to capture, represent and propagate manually generated information that needs to go with the data flows
• Each time we develop a new instrument, we develop a new data ingest procedure and collect different metadata and organize it differently. It is then hard to use with previous projects
• The task of event determination and feature classification is onerous and we don't do it until after we get the data


Use cases

• Determine which flat field calibration was applied to the image taken on January 26, 2005 around 2100UT by the ACOS Mark IV polarimeter.
• Which flat-field algorithm was applied to the set of images taken during the period November 1, 2004 to February 28, 2005?
• How many different data product types can be generated from the ACOS CHIP instrument?
• What images comprised the flat field calibration image used on January 26, 2007 for all ACOS CHIP images?
• What processing steps were completed to obtain the ACOS PICS limb image of the day for January 26, 2005?
• Who (person or program) added the comments to the science data file for the best vignetted, rectangular polarization brightness image from January 26, 2005 1849:09UT taken by the ACOS Mark IV polarimeter?
• What were the cloud cover and atmospheric seeing conditions during the local morning of January 26, 2005 at MLSO?
• Find all good images on March 21, 2008.
• Why are the quick look images from March 21, 2008, 1900UT missing?
• Why does this image look bad?


Provenance

• Origin or source from which something comes, intention for use, who/what generated for, manner of manufacture, history of subsequent owners, sense of place and time of manufacture, production or discovery, documented in detail sufficient to allow reproducibility
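
Questions such as "which flat field calibration was applied to this image?" or "who added the comments?" become simple lookups once each data product carries its processing history. The sketch below is a minimal illustration under that assumption; the record layout, step names, agents and file identifiers are hypothetical and do not describe the actual ingest pipeline.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str                                   # processing step, e.g. "flat_field"
    agent: str                                  # person or program that performed it
    inputs: list = field(default_factory=list)  # identifiers of the files used


@dataclass
class Product:
    identifier: str
    history: list = field(default_factory=list)  # Step records in the order applied


def steps_named(product, name):
    """Return every recorded step of the given kind for this product."""
    return [step for step in product.history if step.name == name]


# Placeholder history for one image; all identifiers are invented.
image = Product("image_0001")
image.history.append(Step("dark_subtraction", "pipeline_v1", ["dark_0007"]))
image.history.append(Step("flat_field", "pipeline_v1", ["flat_0042"]))
image.history.append(Step("comment", "observer", []))

flat_steps = steps_named(image, "flat_field")
comment_agents = [step.agent for step in steps_named(image, "comment")]
```

The point of capturing this at ingest time is that the record travels with the product; reconstructing it afterwards from logs and memory is the burden that currently falls on the user.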


Visual browse


Discussion (1)

• Taken together, an emerging set of collected experience manifests an emerging informatics core capability that is starting to take data intensive science into a new realm of realizability and, potentially, sustainability
– Use cases (i.e. real users)
– X-informatics
– Core Informatics
– Cyber Informatics

• There are implications for data models


Progression after progression

Diagram labels: IT; Cyber Infrastructure; Cyber Informatics; Core Informatics; Science Informatics; Science, SBAs. Cyber, Core and Science Informatics are grouped under Informatics.

Example:
• CI = OPeNDAP server running over HTTP/HTTPS
• Cyberinformatics = Data (product) and service ontologies, triple store
• Core informatics = Reasoning engine (Pellet), OWL
• Science (X) informatics = Use cases, science domain terms, concepts in an ontology
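
To make the division of labour concrete, the sketch below pairs a toy triple store (the cyberinformatics layer) with a single inference rule (standing in for the reasoning engine). It is plain Python, not Pellet or OWL; the terms, the `match` function and the `infer_types` rule are hypothetical simplifications.

```python
# A toy in-memory triple store. The terms are illustrative placeholders.
TRIPLES = {
    ("FabryPerotInterferometer", "subClassOf", "Interferometer"),
    ("Interferometer", "subClassOf", "OpticalInstrument"),
    ("fpi_a", "type", "FabryPerotInterferometer"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def infer_types(triples):
    """Add (x type C2) whenever (x type C1) and (C1 subClassOf C2) both hold.

    The rule is applied repeatedly until no new triple appears.
    """
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        for x, _, c1 in match(closed, p="type"):
            for _, _, c2 in match(closed, s=c1, p="subClassOf"):
                if (x, "type", c2) not in closed:
                    closed.add((x, "type", c2))
                    changed = True
    return closed


closed = infer_types(TRIPLES)
```

The store only holds what was asserted; the fact that `fpi_a` is also an optical instrument exists only after the rule has run, which is the kind of work a reasoner takes on for the science layer above it.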


Discussion (2)
• Data and information science is becoming the ‘fourth’ column (along with theory, experiment and computation)
• Semantics (of the data) are a key ingredient -> may imply richer data models


Summary
• Informatics is playing a key role in filling the gap between science (and the spectrum of non-expert) use and generation and the underlying cyberinfrastructure, i.e. in shifting the burden
– This is evident due to the emergence of Xinformatics (world-wide)
• Our experience is implementing informatics as semantics in Virtual Observatories (as a working paradigm) and Grid environments
– VSTO is only one example of success
– Data mining, data integration, smart search, provenance are close behind
• Informatics is a profession and a community activity and requires efforts in all 3 sub-areas (science, core, cyber) and must be synergistic


More Information
• Virtual Solar Terrestrial Observatory (VSTO): http://vsto.hao.ucar.edu, http://www.vsto.org
• Semantically-Enabled Science Data Integration (SESDI): http://sesdi.hao.ucar.edu
• Semantic Provenance Capture in Data Ingest Systems (SPCDIS): http://spcdis.hao.ucar.edu
• Semantic Knowledge Integration Framework (SKIF/SAM): http://skif.hao.ucar.edu
• Semantic Web for Earth and Environmental Terminology (SWEET): http://sweet.jpl.nasa.gov
• Conferences: AGU 2008, EGU 2009, ISWC 2008, CIKM 2008, …
• Peter Fox [email protected]