Science Environment for Ecological Knowledge Jessie Kennedy School of Computing, Napier University, Edinburgh

Upload: kathryn-fowler

Post on 03-Jan-2016


Page 1

Science Environment for Ecological Knowledge

Jessie Kennedy

School of Computing, Napier University, Edinburgh

Page 2

The SEEK Prototype: Ecological Niche Modeling

[Diagram: occurrence points on the native distribution (Geographic Space) feed ecological niche modeling (Ecological Space), giving a model of the niche in ecological dimensions (axes: temperature, precipitation). Projection back onto geography gives the native range prediction and the invaded range prediction.]

Inputs:
  Biodiversity information, e.g. data from museum specimens, ecological surveys
  Geospatial and remotely sensed data

Results taken to integrate with other data realms (e.g., human populations, public health, etc.)
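The niche modeling steps on this slide (fit a model in ecological space from occurrence points, then project it back onto geography) can be sketched in Python. This is only an illustration: it assumes a simple rectilinear envelope (minimum and maximum of each variable at the occurrence points), which is a stand-in chosen for brevity and not the algorithm used by the SEEK prototype. All names and values below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Environmental values at one location (hypothetical variables and units).
Env = Dict[str, float]
Cell = Tuple[int, int]


@dataclass
class EnvelopeModel:
    """Rectilinear niche: (min, max) of each variable at the occurrence points."""
    bounds: Dict[str, Tuple[float, float]]

    def suitable(self, env: Env) -> bool:
        # A location is predicted suitable if every variable lies inside the envelope.
        return all(lo <= env[var] <= hi for var, (lo, hi) in self.bounds.items())


def fit_envelope(occurrences: List[Env]) -> EnvelopeModel:
    """Build the niche model in ecological space from occurrence points."""
    variables = list(occurrences[0].keys())
    bounds = {
        v: (min(o[v] for o in occurrences), max(o[v] for o in occurrences))
        for v in variables
    }
    return EnvelopeModel(bounds)


def project(model: EnvelopeModel, grid: Dict[Cell, Env]) -> Dict[Cell, bool]:
    """Project the model back onto geography: one prediction per grid cell."""
    return {cell: model.suitable(env) for cell, env in grid.items()}


# Made-up occurrence data from the native range.
native_occurrences = [
    {"temperature": 10.0, "precipitation": 600.0},
    {"temperature": 14.0, "precipitation": 900.0},
    {"temperature": 12.0, "precipitation": 750.0},
]
model = fit_envelope(native_occurrences)

# Made-up grid for a region that might be invaded.
candidate_grid = {
    (0, 0): {"temperature": 11.0, "precipitation": 700.0},
    (0, 1): {"temperature": 25.0, "precipitation": 300.0},
}
prediction = project(model, candidate_grid)
```

Projecting the same model onto a grid covering the native region or a different region corresponds to the native range and invaded range predictions respectively.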

Page 3

Species prediction map

Predicted Distribution: Amur snakehead (Channa argus)

Image from http://www.lifemapper.org

Page 4

SEEK Overview

Analysis and Modelling System (Kepler): Modelling scientific workflows

EcoGrid: Making diverse environmental data systems interoperate

Semantic Mediation System: “Smart” data discovery and integration

Taxon WG: Taxonomic name/concept resolution server

Page 5

Scientific workflows

EML provides semi-automated data binding

Scientific workflows represent knowledge about the process; AMS captures this knowledge

Page 6

Kepler: Ecological Niche Model

Page 7

Metadata driven data ingestion

Key information needed to read and machine process a data file is in the metadata
  Physical descriptors (CSV, Excel, RDBMS, etc.)
  Logical Entity (table, image, etc.) and Attribute (column) descriptions
    Name
    Type (integer, float, string, etc.)
    Codes (missing values, nulls, etc.)
    Integrity constraints
  Semantic descriptions (ontology-based type systems)
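To illustrate how metadata can drive ingestion, here is a minimal Python sketch. The metadata record is a hypothetical, heavily simplified dictionary (real EML is XML and much richer); it carries the physical descriptor, the attribute names and types, and the missing-value codes listed above, and the reader uses nothing else to interpret the file.

```python
import csv
import io
from typing import Any, Dict, List, Optional

# Hypothetical, simplified metadata; field names are assumptions for illustration.
METADATA = {
    "physical": {"format": "CSV", "delimiter": ",", "header_lines": 1},
    "attributes": [
        {"name": "site", "type": "string", "missing_codes": []},
        {"name": "count", "type": "integer", "missing_codes": ["-999"]},
        {"name": "area_m2", "type": "float", "missing_codes": ["NA"]},
    ],
}

# Map declared attribute types to Python conversions.
CASTS = {"string": str, "integer": int, "float": float}


def ingest(text: str, metadata: Dict[str, Any]) -> List[Dict[str, Optional[Any]]]:
    """Read delimited text using only what the metadata declares."""
    physical = metadata["physical"]
    reader = csv.reader(io.StringIO(text), delimiter=physical["delimiter"])
    rows = list(reader)[physical["header_lines"]:]  # skip declared header lines
    records = []
    for row in rows:
        record: Dict[str, Optional[Any]] = {}
        for attr, raw in zip(metadata["attributes"], row):
            if raw in attr["missing_codes"]:
                record[attr["name"]] = None  # declared missing-value code
            else:
                record[attr["name"]] = CASTS[attr["type"]](raw)
        records.append(record)
    return records


SAMPLE = "site,count,area_m2\nA,12,4.0\nB,-999,NA\n"
records = ingest(SAMPLE, METADATA)
```

Integrity constraints and semantic descriptions are left out of the sketch; they would be further checks applied to each record after type conversion.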

Page 8

Ecological ontologies
  What was measured (e.g., biomass)
  Type of quantity measured (e.g., Energy)
  Context of measurement (e.g., Psychotria limonensis)
  How it was measured (e.g., dry weight)

Page 9

Semantic Mediation

Label data with semantic types
Label inputs and outputs of analytical components with semantic types
Use reasoning engines to generate transformation steps
Use reasoning engine to discover relevant components

[Diagram labels: Data, Ontology, Workflow Components]
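As a rough illustration of component discovery by semantic type, the Python sketch below labels data and component inputs with types from a toy ontology and finds the components whose input type subsumes the data's type. The ontology, component names, and the plain ancestor walk are all assumptions; a real reasoning engine would do considerably more.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: each semantic type maps to its parent type.
ONTOLOGY: Dict[str, Optional[str]] = {
    "Measurement": None,
    "Biomass": "Measurement",
    "DryWeightBiomass": "Biomass",
    "ArealDensity": "Measurement",
}

# Hypothetical analytical components, labelled with the semantic type they accept.
COMPONENTS: Dict[str, str] = {
    "BiomassSummary": "Biomass",
    "DensityMap": "ArealDensity",
    "GenericPlot": "Measurement",
}


def is_a(semantic_type: str, ancestor: str) -> bool:
    """True if semantic_type equals ancestor or is one of its descendants."""
    current: Optional[str] = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY.get(current)
    return False


def discover_components(data_type: str) -> List[str]:
    """Return the components whose declared input type covers data_type."""
    return sorted(
        name for name, accepted in COMPONENTS.items() if is_a(data_type, accepted)
    )


matches = discover_components("DryWeightBiomass")
```

Here a dataset labelled DryWeightBiomass is accepted by components declared for Biomass or for any Measurement, but not by one that expects ArealDensity.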

Page 10

Data integration

Homogeneous data integration
  Integration of homogeneous data via EML metadata is relatively straightforward

Heterogeneous data integration
  Requires advanced metadata and processing
    Attributes must be semantically typed
    Collection protocols must be known
    Units and measurement scale must be known
    Measurement relationships must be known, e.g., that ArealDensity = Count/Area
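The measurement relationship ArealDensity = Count/Area is what lets a dataset of raw counts be combined with one that already reports densities. A minimal Python sketch, with hypothetical record types and made-up values:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PlotCount:
    """Dataset A (hypothetical): individuals counted in a plot of known area."""
    count: int
    area_m2: float


@dataclass
class PlotDensity:
    """Dataset B (hypothetical): areal density already computed."""
    density_per_m2: float


def to_areal_density(obs: PlotCount) -> PlotDensity:
    """Apply the known relationship ArealDensity = Count / Area."""
    if obs.area_m2 <= 0:
        raise ValueError("plot area must be positive")
    return PlotDensity(obs.count / obs.area_m2)


dataset_a: List[PlotCount] = [PlotCount(12, 4.0), PlotCount(30, 10.0)]
dataset_b: List[PlotDensity] = [PlotDensity(2.5)]

# Once both datasets are expressed as the same quantity, they can be merged.
integrated = [to_areal_density(o) for o in dataset_a] + dataset_b
```

The conversion is only valid if the units and collection protocols of the two datasets are also known to be compatible, which is why the slide lists them as requirements.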

Page 11

Life Sciences Data

Much of the data gathered in ecological studies and used in ecological data analysis is bio-referenced data
  typically organisms are referenced by a Latin name

Many analyses require integrating data originating in many locations and at various points in time
  for most bio-referenced data, integration involves matching on organism name

Page 12

Biological (scientific) Names

Used for communicating information about known organisms and groups of organisms – taxa
  Framework for all biologists to communicate with…
Taxonomists apply scientific names to species and higher taxa in their classifications
Formalized and validated according to strict codes of nomenclature (different depending on kingdom)
Latin name is a polynomial for species and below; monomial for genus and above
Quoted as: LatinName NameAuthors Year
  Example: Carya floridana Sarg. 1913
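The quoting convention "LatinName NameAuthors Year" can be split into its parts with a small parser. The Python sketch below is an assumption-laden illustration: the regular expression only handles simple cases such as the example on this slide (capitalised genus, lower-case epithets, optional author string, optional four-digit year), and real names governed by the codes of nomenclature are far more varied.

```python
import re
from typing import NamedTuple, Optional


class ScientificName(NamedTuple):
    latin_name: str
    authors: Optional[str]
    year: Optional[int]


# Simplified pattern (an assumption, not a nomenclatural standard).
_PATTERN = re.compile(
    r"^(?P<latin>[A-Z][a-z]+(?: [a-z]+)*)"  # genus plus optional lower-case epithets
    r"(?: (?P<authors>.+?))?"               # optional author string
    r"(?: \(?(?P<year>\d{4})\)?)?$"         # optional year, possibly in parentheses
)


def parse_name(text: str) -> ScientificName:
    """Split a quoted scientific name into Latin name, authors, and year."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognised name format: {text!r}")
    year = match.group("year")
    return ScientificName(
        latin_name=match.group("latin"),
        authors=match.group("authors"),
        year=int(year) if year is not None else None,
    )


full = parse_name("Carya floridana Sarg. 1913")
bare = parse_name("Carya floridana")
```

A name recorded without author or year still parses, but the missing parts come back as None, which makes the loss of information explicit.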

Page 13

Classification, Concepts & Names

[Diagram: a pile of specimens is classified into a Taxonomic Hierarchy of Taxon_concepts: Taxon_concept_a (Genus) above Taxon_concept_b, Taxon_concept_c and Taxon_concept_d (Species).]

Page 14

Classification, Concepts & Names

[Diagram: classify; pile of specimens]

Page 15

Taxonomic history of Aus L. 1758

Publications of Taxonomic Revisions

In Linnaeus 1758
  (i) Aus L.1758
    Aus aus L.1758

In Archer 1965
  (ii) Aus L.1758
    Aus aus L.1758
    Aus bea Archer 1965
  Archer splits Aus aus L. 1758 into two species, retains the name for one and creates a new one

In Fry 1989
  (iii) Aus L.1758
    Aus aus L.1758
    Aus bea Archer 1965
    Aus cea BFry 1989
  Fry splits Aus bea Archer 1965 into two species, retains the name for one and creates a new one

In Tucker 1991
  (iv) Aus L.1758
    Aus aus L.1758
    Aus cea BFry 1989
  Tucker finds new specimens and combines Aus aus L. 1758 and Aus bea Archer 1965 into one species, retains the name.
  Tucker publishes his revision without noting Pyle’s corrigendum of the name of Aus cea

In Pargiter 2003
  (v) Aus L.1758
    Aus aus L. 1758
    Aus ceus BFry 1989
  Xus Pargiter 2003
    Xus beus (Archer) Pargiter 2003.
  Pargiter decides to resplit Aus aus but believes bea (beus) is in a new genus Xus.
  Pargiter publishes his revision using Pyle’s corrigendum of the epithet bea to beus and Aus cea to Aus ceus.

Publications of Purely Nomenclatural Observation

In Pyle 1990
  bea and cea noted as invalid names and replaced with beus and ceus.
  A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989

[Diagram legend: type specimen, specimen, genus name, species name, Genus concept, Species concept, publication]

Page 16

Problems with Scientific Names

Often recorded inappropriately in datasets
  No author and/or year (e.g. Carya floridana)
  Abbreviated (e.g. C. floridana)
  Internal code (e.g. PicRub for Picea rubens)
  Vernacular used (e.g. Scrub Hickory)
  Misspelled
Are not unique
  “Re-use” of names with changed definition
  Name is ambiguous without definition
Subject to name alterations and 'corrections' over time (e.g. Code changes its rules)

Page 17

Concepts ……

Full Scientific name + “according to” (Author + Publication + Date) + Definition
  Carya floridana Sarg. (1913) “according to” Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913) [+Definition]
Original concept
  1st use of name as described by the taxonomist
  same author + date in scientific name and the “according to”
  same publication for original concepts and name
Revised concept
  Re-classification of a group
  different author + date in “according to”
  Carya floridana Sarg. (1913) “according to” Stone FNA 3:424 (1997) [+Definition]
Should be used for communicating about groups of organisms
  Full Scientific name + “according to” (Author + Publication + Date)
  definition clear – can get the definition
  comparing or integrating data based on concepts is more accurate
Can GUIDs help?
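The difference between a name and a concept can be made concrete with a small data structure. The Python sketch below is a hypothetical model (it is not the Taxonomic Concept Schema): a concept is the full scientific name plus its “according to” reference, so the original and revised concepts of Carya floridana share a name but are distinct concepts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccordingTo:
    author: str
    publication: str
    year: int


@dataclass(frozen=True)
class TaxonConcept:
    scientific_name: str                  # full scientific name, including author and year
    according_to: Optional[AccordingTo]   # None when only a name was recorded
    definition: Optional[str] = None      # placeholder for the [+Definition] part

    def label(self) -> str:
        """Human-readable key: name plus the 'according to' reference, if any."""
        if self.according_to is None:
            return self.scientific_name
        ref = self.according_to
        return (
            f'{self.scientific_name} "according to" '
            f"{ref.author}, {ref.publication} ({ref.year})"
        )


original = TaxonConcept(
    "Carya floridana Sarg. (1913)",
    AccordingTo("Charles Sprague Sargent", "Trees & Shrubs 2:193 plate 177", 1913),
)
revised = TaxonConcept(
    "Carya floridana Sarg. (1913)",
    AccordingTo("Stone", "FNA 3:424", 1997),
)

same_name = original.scientific_name == revised.scientific_name  # names match
same_concept = original == revised                               # concepts do not
```

Matching datasets on `scientific_name` alone would merge these two records; matching on the whole concept keeps them apart.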

Page 18

Concepts

Concepts are described in many ways
  Created by someone - an Author
  Described in a Publication
  Given a Name
    May or may not be valid in terms of the nomenclatural codes
  Depending on the taxonomist's working practice, defined by the set of
    Specimens examined (type specimens and others)
    Common set of Characters
      data recorded by taxonomists to describe specimens and taxa
      context dependent; differentiate taxa rather than fully describe them; use natural language with all its ambiguities
    Relationships to other Taxon Concepts
      Taxon circumscription: the lower level taxa
      Congruence, overlap etc. to taxa in other classifications

Page 19

Legacy Data …

In legacy data names often appear in place of concepts
Names are imprecise
  are inappropriate for referring to information regarding taxon, e.g. observational/collection data
BUT… sometimes that’s all we have
How do we interpret names? … potentially multiple definitions
  the sum of all definitions that exist for the name
    would that make any sense – conflicts?
  one of the existing definitions
    how can we choose?
  the “attributes” in common to all the definitions
    would that leave any?
  represented by the type specimen
    but what does that mean? – very subjective…

Page 20

Legacy Names as Concepts…

Nominal concepts
  Sub-set of TaxonConcepts
  Name but no AccordingTo
    non-unique (concept) identifier elements
    can have a unique concept GUID
  No definition
  Explicitly saying it’s something with this name but not really sure what is/was meant
Encourage people to understand and address the issue of names
  Allowing mark-up of collections with names allows people to believe names are really good enough
Important problem - needs to be tackled sooner rather than later
  will improve long term usefulness of scientific data
  ease integration

Page 21

SEEK Taxon

Build a Name/Concept resolution server
  TOS (Kansas)
Taxonomic Concept Schema TCS (Napier)
  Exchange of taxonomic info
  TDWG/GBIF standard
  Basis for TOS
GUIDs
  GBIF/SEEK etc.
Tools to relate and compare concepts
  Taxonomy Comparison Visualisation Tool (Napier)
  Concept Mapper Tool (UNC)

Page 22

Concept Comparison Visualisation

Page 23

Taxon Concept Schema

TCS developed to allow exchange of taxonomic names/concept data
Based on consultation with range of users
  understand users’ notions of taxonomic concept
  what information they consider part of a concept
Presentations at meetings including 2 TDWG
  Agreement that concepts are important and necessary
  Taxon Names are independent from Taxon concepts
  Agreement that observations/identifications etc. should record concepts not names

Page 24

TCS

XML based exchange schema
Not designed as the “correct way” to model a Taxon Concept
  No “rules” as to what a taxon must have
  certain things needed to be useful
Designed to accommodate different ways concepts are described
  Lots of optionality or flexibility in elements to address different work practices in the community
Includes Taxon Names
  are more constrained as they are governed by the codes of nomenclature

Page 25

TCS

Considerable debate on what should be top level elements
  Related closely to the question: What gets a GUID?
Taxon Concepts
Taxon Names
Specimens
Publications
Taxon Relationship Assertions
Concepts refer to Names
  Names must not change
  Can’t record original taxon concept

Page 26

Exchange of Data

Exchange of definitional data
  name definition
    information on history of name and type specimen and publication details
  taxon concept definition
    Name, publication details for the defining source, characters, specimens, related taxa etc.
Exchange of usage data
  for observations/lists (should only use taxon concepts)
    need only exchange references to existing taxon concepts
      user readable keys, e.g. Full Scientific name “according to” Author + Publication
      GUIDs
  for name checking purposes
    need only exchange name without history or typification
      user readable keys, e.g. Full Scientific name
      GUIDs

Page 27

Issues of GUIDs for integration

What gets a GUID?
  TCS top level elements??
  The “physical thing” or “electronic record of the thing”
  What is data and what is metadata associated with the GUID?
    Depends on your perspective on life…
  Stability of data associated with a GUID
Who issues GUIDs?
  Centralised authority of some sort – peer review??
    + One GUID per concept or name (no duplicates)
    + ensure business rules are applied to new names/concepts created
    - bottleneck?
    - too restrictive in what the business rules might be
  Distributed free for all
    + Anyone can publish their own name/concept and get a GUID
    - Mess of GUIDs to sort out
Which technology?
  LSIDs, DOI etc.

Page 28

TCS and SEEK and…

Taxon Object Server
  Core of concept/name resolution service
  Kansas team has been implementing the TOS
  Schema based on the TCS model
  Tool to import data from TCS documents
EML
  Proposed modifications to EML to accommodate SEEK's taxonomic resolution services in the future
User interface tools
  Uses cut down TCS as input format
Inform other biology meta-data standards on taxonomic issues
  Cataloguing the complete genome standard

Page 29

Taxonomic Object Server

TOS allows
  registration, retrieval, integration of datasets
  Matches concepts given names, other concepts and taxonomies
Allow taxonomists to
  Author new ideas
  Make new relationships between concepts
Allow researchers to
  Easily see previous taxonomic opinions
  Use a stable identification system to reference concepts (LSIDs)
  Find concepts…
Integration with Kepler

Page 30

TOS operations

Via TCS document
  addConcept
  addRelationship
Public APIs
  getConcept – on GUID
  getBestConcept – on name string
  getHigherTaxon – on GUID and authority – up tree
  getAuthoritativeList – down tree
  findConcepts – on any property(s)
  findRelatedConcepts – on GUID and relationships
  getSynonymousNames – returns name strings
  getHigherTaxon getAuthoritativeList
Dictionary for name-concept matching
  N-gram matching algorithm
  getBestConcept
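The slide mentions a dictionary with an N-gram matching algorithm behind name lookup, without giving details. The Python sketch below shows one common way such matching can work, assuming character trigrams compared with Jaccard similarity. The function names, the dictionary contents, the placeholder GUIDs, and the threshold are all hypothetical and are not the TOS implementation.

```python
from typing import Dict, Optional, Set


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of a lower-cased string, padded so word ends count."""
    padded = f"{' ' * (n - 1)}{text.lower()}{' ' * (n - 1)}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets (1.0 means identical sets)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical dictionary from known name strings to placeholder concept GUIDs.
DICTIONARY: Dict[str, str] = {
    "Carya floridana": "guid:example-1",
    "Picea rubens": "guid:example-2",
}


def best_concept(name: str, threshold: float = 0.5) -> Optional[str]:
    """Return the GUID of the closest known name, or None if nothing is close."""
    scored = [(similarity(name, known), guid) for known, guid in DICTIONARY.items()]
    score, guid = max(scored)
    return guid if score >= threshold else None


misspelled = best_concept("Carya floridanna")  # one extra letter
unknown = best_concept("Quercus alba")         # not in the dictionary
```

Because most trigrams survive a single-letter error, a misspelled name still scores close to its dictionary entry, while an unrelated name falls below the threshold.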