1 Informatics: Filling the gap between science and ICT in a sustainable way Peter Fox Tetherless World Constellation Rensselaer Polytechnic Institute Formerly: High Altitude Observatory, NCAR


Post on 26-Dec-2015


1

Informatics: Filling the gap between science and ICT in a sustainable way

Peter Fox

Tetherless World Constellation

Rensselaer Polytechnic Institute

Formerly: High Altitude Observatory, NCAR

2

Background

Scientists should be able to access a global, distributed knowledge base of scientific data that:

• appears to be integrated

• appears to be locally available

But… data is obtained by multiple means (instruments and models), using various protocols, in differing vocabularies, using (sometimes unstated) assumptions, with inconsistent (or non-existent) meta-data. It may be inconsistent, incomplete, evolving, and distributed.

And… there exist(ed) significant levels of semantic heterogeneity, large-scale data, complex data types, legacy systems, inflexible and unsustainable implementation technology…

3

Virtual Observatories

Make data and tools quickly and easily accessible to a wide audience.

Operationally, virtual observatories need to find the right balance of data/model holdings, portals and client software that researchers can use without effort or interference, as if all the materials were available on their local computer and in their preferred language: i.e. appear to be local and integrated.

Fox Informatics and Semantics, © 2008

4

Science and technical use cases

Find data which represents the state of the neutral atmosphere anywhere above 100 km and toward the arctic circle (above 45N) at any time of high geomagnetic activity.

– Extract information from the use-case - encode knowledge

– Translate this into a complete query for data - inference and integration of data from instruments, indices and models

Provide semantically-enabled, smart data query services via a SOAP web service for the Virtual Ionosphere-Thermosphere-Mesosphere Observatory that retrieve data, filtered by constraints on Instrument, Date-Time, and Parameter, supplied in any order and in any combination.
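A minimal sketch of the query pattern described above, assuming a small in-memory catalog. The Record fields, the sample entries and the find_data function are hypothetical illustrations; none of them comes from the observatory's actual SOAP interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Record:
    instrument: str
    parameter: str
    start: datetime
    end: datetime


# Hypothetical catalog entries, for illustration only.
CATALOG = [
    Record("instrument-a", "neutral_temperature",
           datetime(2005, 1, 1), datetime(2005, 1, 31)),
    Record("instrument-a", "neutral_wind",
           datetime(2005, 2, 1), datetime(2005, 2, 28)),
    Record("instrument-b", "neutral_temperature",
           datetime(2005, 1, 15), datetime(2005, 3, 1)),
]


def find_data(catalog, *, instrument: Optional[str] = None,
              parameter: Optional[str] = None,
              start: Optional[datetime] = None,
              end: Optional[datetime] = None):
    """Return records matching every constraint that was supplied.

    Constraints are keyword-only, so they can be given in any order,
    and any constraint left as None is simply not applied.
    """
    results = []
    for rec in catalog:
        if instrument is not None and rec.instrument != instrument:
            continue
        if parameter is not None and rec.parameter != parameter:
            continue
        # Date-time constraint: keep records whose coverage overlaps the window.
        if start is not None and rec.end < start:
            continue
        if end is not None and rec.start > end:
            continue
        results.append(rec)
    return results
```

For example, find_data(CATALOG, parameter="neutral_temperature") applies only the Parameter constraint, while find_data(CATALOG, start=datetime(2005, 2, 10), instrument="instrument-a") combines Date-Time and Instrument.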

5

But data has Lots of Audiences

[Figure: "Information products have lots of audiences", with labels "More Strategic" and "Less Strategic" and the annotation "SCIENTISTS TOO". From “Why EPO?”, a NASA internal report on science education, 2005]

6

What is a Non-Specialist Use Case?

A teacher accesses the internet, goes to an Educational Virtual Observatory and enters a search for “Aurora”.

Someone should be able to query a virtual observatory without having specialist knowledge

7

What should the User Receive?

The teacher receives four groupings of search results:

1) Educational materials: http://www.meted.ucar.edu/topics_spacewx.php and http://www.meted.ucar.edu/hao/aurora/

2) Research, data and tools: via a range of science VOs; the system knows to search for brightness, or green/red line emission

3) Did you know?: Aurora is a phenomenon of the upper terrestrial atmosphere (ionosphere), also known as Northern Lights

4) Did you mean?: Aurora Borealis or Aurora Australis, etc.
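The four groupings on this slide can be sketched as a lookup against a small vocabulary. The content of the entry is taken from the slide; the dictionary layout and the respond function are assumptions made for this illustration, not a description of any deployed observatory.

```python
# Hypothetical vocabulary entry: content from the slide,
# structure assumed for this sketch.
VOCABULARY = {
    "aurora": {
        "educational": [
            "http://www.meted.ucar.edu/topics_spacewx.php",
            "http://www.meted.ucar.edu/hao/aurora/",
        ],
        "research_terms": ["brightness", "green/red line emission"],
        "did_you_know": (
            "Aurora is a phenomenon of the upper terrestrial atmosphere "
            "(ionosphere), also known as Northern Lights"
        ),
        "did_you_mean": ["Aurora Borealis", "Aurora Australis"],
    }
}


def respond(search_text):
    """Group results for a non-specialist search into four categories."""
    entry = VOCABULARY.get(search_text.strip().lower())
    if entry is None:
        return None  # term is not in the vocabulary
    return {
        "Educational materials": entry["educational"],
        "Research, data and tools": entry["research_terms"],
        "Did you know?": entry["did_you_know"],
        "Did you mean?": entry["did_you_mean"],
    }
```

The user types only “Aurora”; the specialist search terms (brightness, line emission) are held by the provider, not supplied by the user.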


8

Shifting the Burden from the User to the Provider

9

Response (so far)

• As a result of finding out who is doing what, sharing experience/expertise, and substantial coordination:

• There is/was still a gap between science and the underlying infrastructure and technology that is available

• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.

Informatics - information science includes the science of (data and) information, the practice of information processing, and the engineering of information systems. Informatics studies the structure, behavior, and interactions of natural and artificial systems that store, process and communicate (data and) information. It also develops its own conceptual and theoretical foundations. Since computers, individuals and organizations all process information, informatics has computational, cognitive and social aspects, including study of the social impact of information technologies. (Wikipedia)

10

Progression after progression

[Figure: progression of layers labeled IT, Cyber Infrastructure (CI), Cyber Informatics, Core Informatics, Science Informatics, and Science, SBAs, with the label Informatics]

• CI = Discipline neutral, e.g. OPeNDAP server running over HTTP/HTTPS

• Cyberinformatics = Data (product) and service ontologies, triple store, map to schema

• Core informatics = Reasoning engine (Pellet), OWL (computer science)

• Science (X) informatics = Use cases, science domain terms, concepts in an ontology or controlled vocabulary
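To make the layering concrete, here is a toy sketch in which a set of triples stands in for the triple store and a transitive subClassOf walk stands in for a reasoning engine. It is not OWL and does not use Pellet; every term, dataset name and field name below is a hypothetical example.

```python
# Toy triple store: (subject, predicate, object) tuples.
# All terms below are hypothetical, not drawn from a real ontology.
TRIPLES = {
    ("NeutralTemperature", "subClassOf", "AtmosphericParameter"),
    ("AtmosphericParameter", "subClassOf", "Parameter"),
    ("dataset-1", "measures", "NeutralTemperature"),
    ("dataset-1", "schemaField", "tn"),
}


def superclasses(term, triples):
    """Return term plus every class reachable through subClassOf."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def datasets_measuring(concept, triples):
    """Find datasets whose measured parameter is the concept or a subclass."""
    hits = set()
    for s, p, o in triples:
        if p == "measures" and concept in superclasses(o, triples):
            hits.add(s)
    return hits


def schema_fields(dataset, triples):
    """Map a dataset to the field names exposed at the data-service level."""
    return {o for s, p, o in triples if s == dataset and p == "schemaField"}
```

A science-level question ("which datasets measure a Parameter?") is answered through the inferred class hierarchy, and schema_fields then maps the answer down to the field a discipline-neutral data server would expose.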

A moment of history

• In the late 1950s (around 1957-1958) the modern term "informatics" was coined

• It existed as one field for a while, then split into library science and computer science, which developed as separate fields and became disconnected

• Now coming back to be relevant to science

• Informatics IS NOT just having a scientist work with an “IT/ICT” person (NOT, NOT, NOT)

11

Cyberinformatics

• The first match between the domain and the underlying domain-neutral e-infrastructure/cyberinfrastructure

• When the underlying infrastructure changes (once it becomes real infrastructure and not just software), this is one part that needs to change

• Less brittle since upper layers remain intact
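One way to picture why the upper layers remain intact is an adapter arrangement, sketched below. The class names, URL and paths are hypothetical, and the adapters only build locations and do not contact any real service.

```python
from typing import Protocol


class DataService(Protocol):
    """The interface the upper (science-facing) layers rely on."""

    def fetch(self, parameter: str) -> list: ...


class HttpBackendAdapter:
    """Hypothetical adapter for an HTTP-based data server."""

    def __init__(self, base_url):
        self.base_url = base_url

    def fetch(self, parameter):
        # A real adapter would issue a request; the sketch only builds the URL.
        return [f"{self.base_url}/{parameter}"]


class LocalFileAdapter:
    """Hypothetical adapter for a file-based archive."""

    def __init__(self, root):
        self.root = root

    def fetch(self, parameter):
        return [f"{self.root}/{parameter}.dat"]


def science_query(service: DataService, parameter: str) -> list:
    """Upper-layer code: unchanged when the backend adapter is swapped."""
    return service.fetch(parameter)
```

Replacing HttpBackendAdapter with LocalFileAdapter changes only the adapter passed in; science_query and everything above it stay as they are.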

12

Core informatics

• The realm of computer science (for the most part, also librarians)

• Strongly influenced by science (and medical applications) above and below this layer

• If we can leverage this, we do not need to do the specialist work, however …

• We must work with these scientists, sustainably

13

Science Informatics

• Where science meets the underlying technical capabilities and methods

• Must be expressible in science terms; increasingly use cases

• The people in this area are multi-lingual and both interdisciplinary and multi-disciplinary; few are trained or literate here

• Team, or really a community of practice (CoP)

14

Assume

• Mark and Charlie and others have addressed the professional and credit aspects of data management

• Dave, Hans and others have ‘data’ journals and ability to cite data

• Projects and communities adopt these

• Probably others but this is enough for now

15

Sustaining

• Visibility: capitalize on and maintain this

– ICSU/SCID report

– U.S. NRC Decadal survey

– IUGG/UCDI, IUGS/CGI, geounions

– EGU/ESSI

• Need a CoP close to their science and able to share experience, expertise

• Balance research and production ***

• Crosses disciplines by definition **

16

More sustaining

• Institutional structure that is sustained is the academic one

– Peers, journals, curricula, incentives, rewards

– Can then feed into institutions, agencies, projects

– Need instructors with experience

• MUST not become isolated as its own field; to some extent this is happening now within AGU

• MUST re-engage library and computer science

• MUST stay close to science (X-informatics)

• MUST maintain interfaces across layers of informatics

17

Harmonizing the Hierarchies

• Working level (L, self-G), e.g. many

• National/regional societies (L, what is role for G?), e.g. AGU, EGU, more needed

• ‘Mission’/‘Production’ agencies (G, what is role for L?), e.g. BGS, USGS, ESA, NASA, NOAA, JAXA, BGR …

• Programmes - regional and global (some L, G?), e.g. GEOSS, GMES, GCOS, OneGeology

• International association/union (some L and some G but not uniform), e.g. IAGA, IAU, IUGS, IUG

• International alliances, e.g. IVOA, CEOS, SPASE

• Global, inter-union (G, need L), e.g. ICSU, GEO, CODATA, WGISS

Leadership - L : Governance - G

19

Discussion

• Taken together, a growing body of collected experience manifests an emerging informatics core capability that is starting to take data-intensive science into a new realm of realizability and, potentially, sustainability

– X-informatics, Core Informatics, Cyber Informatics

• Gaps – must bridge these very soon (I*Ys work)

– Asia: WPGM, AOGS, Japan, China

– Russia, Australia, Africa, South America

– In hierarchies: from group to world

• Pursue the academic model