Science Environment for Ecological Knowledge
Jessie Kennedy
School of Computing, Napier University, Edinburgh
Geographic Space Ecological Space
occurrence points on native distribution
ecological niche modeling
Projection back onto geography
Native range prediction
Invaded range prediction
The SEEK Prototype: Ecological Niche Modeling
temperature
Model of niche in ecological dimensions
precipitation
Biodiversity information e.g. data from museum specimens, ecological surveys
Geospatial and remotely sensed data
Results taken to integrate with other data realms (e.g., human populations, public health, etc.)
Species prediction map
Predicted Distribution: Amur snakehead (Channa argus)
Image from http://www.lifemapper.org
SEEK Overview
Analysis and Modelling System (Kepler): Modelling scientific workflows
EcoGrid: Making diverse environmental data systems interoperate
Semantic Mediation System: “Smart” data discovery and integration
Taxon WG: Taxonomic name/concept resolution server
Scientific workflows
EML provides semi-automated data binding
Scientific workflows represent knowledge about the process; AMS captures this knowledge
Kepler: Ecological Niche Model
Metadata driven data ingestion
Key information needed to read and machine process a data file is in the metadata
  Physical descriptors (CSV, Excel, RDBMS, etc.)
  Logical Entity (table, image, etc) and Attribute (column) descriptions
    Name
    Type (integer, float, string, etc.)
    Codes (missing values, nulls, etc.)
    Integrity constraints
  Semantic descriptions (ontology-based type systems)
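A minimal sketch of the idea, assuming a much simplified metadata record. The `METADATA` layout, the attribute names and the sample file are hypothetical and are not EML; the point is only that delimiter, types and missing-value codes come from the metadata rather than from the reading code.

```python
import csv
import io

# Hypothetical metadata record, loosely modelled on the descriptors listed above.
METADATA = {
    "physical": {"format": "CSV", "delimiter": ",", "header_lines": 1},
    "attributes": [
        {"name": "site", "type": str, "missing": {""}},
        {"name": "count", "type": int, "missing": {"-999", "NA"}},
        {"name": "area_m2", "type": float, "missing": {"-999", "NA"}},
    ],
}


def ingest(text, metadata):
    """Parse delimited text into typed rows using only what the metadata says."""
    physical = metadata["physical"]
    reader = csv.reader(io.StringIO(text), delimiter=physical["delimiter"])
    rows = list(reader)[physical["header_lines"]:]
    records = []
    for row in rows:
        record = {}
        for attribute, raw in zip(metadata["attributes"], row):
            value = raw.strip()
            # Missing-value codes become None rather than bogus numbers.
            if value in attribute["missing"]:
                record[attribute["name"]] = None
            else:
                record[attribute["name"]] = attribute["type"](value)
        records.append(record)
    return records


# Invented sample data: the second row uses the missing-value code -999.
SAMPLE = "site,count,area_m2\nA,12,4.0\nB,-999,2.5\n"
RECORDS = ingest(SAMPLE, METADATA)
```

Changing the file format then means changing the metadata record, not the `ingest` function.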
Ecological ontologies
  What was measured (e.g., biomass)
  Type of quantity measured (e.g., Energy)
  Context of measurement (e.g., Psychotria limonensis)
  How it was measured (e.g., dry weight)
Label data with semantic types
Label inputs and outputs of analytical components with semantic types
Use reasoning engines to generate transformation steps
Use reasoning engine to discover relevant components
Semantic Mediation
Data Ontology Workflow Components
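Component discovery by semantic type can be sketched as a subsumption check over a small type hierarchy. The ontology, the component names and the functions below are all hypothetical; a simple parent lookup stands in for a real reasoning engine.

```python
# Hypothetical toy ontology: each semantic type maps to its parent type.
ONTOLOGY = {
    "Biomass": "Mass",
    "DryWeight": "Biomass",
    "Mass": "Quantity",
    "ArealDensity": "Quantity",
    "Quantity": None,
}

# Hypothetical workflow components labelled with input/output semantic types.
COMPONENTS = {
    "SumBiomass": {"input": "Biomass", "output": "Biomass"},
    "DensityMap": {"input": "ArealDensity", "output": "ArealDensity"},
    "Summarise": {"input": "Quantity", "output": "Quantity"},
}


def is_a(semantic_type, ancestor):
    """True if semantic_type equals ancestor or is a descendant of it."""
    current = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = ONTOLOGY.get(current)
    return False


def discover_components(data_type):
    """Return names of components whose declared input type subsumes data_type."""
    return sorted(
        name for name, spec in COMPONENTS.items() if is_a(data_type, spec["input"])
    )
```

Data labelled `DryWeight` is accepted by any component that declares `Biomass` or a more general type as its input, which is the kind of match plain column names cannot give.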
Data integration
Homogeneous data integration
  Integration of homogeneous data via EML metadata is relatively straightforward
Heterogeneous Data integration
  Requires advanced metadata and processing
    Attributes must be semantically typed
    Collection protocols must be known
    Units and measurement scale must be known
    Measurement relationships must be known
      e.g., that ArealDensity=Count/Area
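A small sketch of why the measurement relationship matters, assuming two hypothetical surveys: one records a count and a plot area, the other records density directly. The field names and values are invented for illustration.

```python
def to_areal_density(record):
    """Return density in individuals per square metre for either record shape."""
    if "areal_density_per_m2" in record:
        return record["areal_density_per_m2"]
    # Measurement relationship from the slide: ArealDensity = Count / Area.
    return record["count"] / record["area_m2"]


# Hypothetical records from two surveys with different collection protocols.
SURVEY_A = [{"site": "A1", "count": 12, "area_m2": 4.0}]
SURVEY_B = [{"site": "B1", "areal_density_per_m2": 2.5}]

INTEGRATED = [
    {"site": r["site"], "areal_density_per_m2": to_areal_density(r)}
    for r in SURVEY_A + SURVEY_B
]
```

Without known units and the Count/Area relationship, the two surveys could not be placed on a common scale.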
Life Sciences Data
Much of the data gathered in ecological studies and used in ecological data analysis is bio-referenced data
  typically organisms are referenced by a Latin name
Many analyses require integrating data originating in many locations and at various points in time
  for most bio-referenced data, integration involves matching on organism name
Biological (scientific) Names
Used for communicating information about known organisms and groups of organisms – taxa
  Framework for all biologists to communicate with…
Taxonomists apply scientific names to species and higher taxa in their classifications
Formalized and validated according to strict codes of nomenclature (different depending on kingdom)
Latin name is a polynomial for species and below; monomial for genus and above
Quoted as: LatinName NameAuthors Year
  Example: Carya floridana Sarg. 1913
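The "LatinName NameAuthors Year" pattern can be illustrated with a deliberately simplified parser. The regular expression and the function name are assumptions made for this sketch; real name strings (hybrids, infraspecific ranks, lower-case author particles) need far more care.

```python
import re

# Simplified pattern: capitalised genus, optional lower-case epithets,
# then an author string, then a four-digit year (optionally in parentheses).
NAME_PATTERN = re.compile(
    r"^(?P<latin>[A-Z][a-z]+(?: [a-z]+)*) (?P<authors>.+?) ?\(?(?P<year>\d{4})\)?$"
)


def parse_scientific_name(text):
    """Split a full scientific name into parts, or return None if incomplete."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        return None
    return {
        "latin_name": match.group("latin"),
        "authors": match.group("authors"),
        "year": int(match.group("year")),
    }
```

A string with no author or year, such as "Carya floridana" on its own, is rejected by this sketch rather than guessed at.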
[Figure: Taxonomic Hierarchy. A pile of specimens is classified into taxon concepts (_a, _b, _c, _d) at Genus and Species level.]
Classification, Concepts & Names
Classification, Concepts & Names
In Linnaeus 1758: (i) Aus L.1758, containing Aus aus L.1758
In Archer 1965: (ii) Aus L.1758, containing Aus aus L.1758 and Aus bea Archer 1965
In Fry 1989: (iii) Aus L.1758, containing Aus aus L.1758, Aus bea Archer 1965 and Aus cea BFry 1989
In Pyle 1990: bea and cea noted as invalid names and replaced with beus and ceus.
In Tucker 1991: (iv) Aus L.1758, containing Aus aus L.1758 and Aus cea BFry 1989
In Pargiter 2003: (v) Aus L.1758, containing Aus aus L. 1758 and Aus ceus BFry 1989; Xus Pargiter 2003, containing Xus beus (Archer) Pargiter 2003.
Publications of Taxonomic Revisions
Publications of Purely Nomenclatural Observation
A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989
Tucker publishes his revision without noting Pyle’s corrigendum of the name of Aus cea
Pargiter publishes his revision using Pyle’s corrigendum of the epithet bea to beus and Aus cea to Aus ceus.
[Figure legend: type specimen, genus name, Genus concept, Species concept, species name, publication, specimen]
Archer splits Aus aus L. 1758 into two species, retains the name for one and creates a new one
Fry splits Aus bea Archer 1965 into two species, retains the name for one and creates a new one
Tucker finds new specimens and combines Aus aus L. 1758 and Aus bea Archer 1965 into one species, retains the name.
Pargiter decides to resplit Aus aus but believes bea (beus) is in a new genus Xus.
Taxonomic history of Aus L. 1758
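The history above can be sketched as data: each concept is a name plus the publication it is "according to" plus the specimens it circumscribes. The specimen identifiers (s1 to s4) are invented for illustration; only the pattern of splits and merges follows the slides. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str             # full scientific name
    according_to: str     # publication that defines this usage
    specimens: frozenset  # hypothetical specimen identifiers


def concept(name, according_to, specimens):
    return TaxonConcept(name, according_to, frozenset(specimens))


# Specimen sets are invented; only the splits and merges follow the history.
HISTORY = [
    concept("Aus aus L. 1758", "Linnaeus 1758", {"s1", "s2", "s3", "s4"}),
    concept("Aus aus L. 1758", "Archer 1965", {"s1", "s2"}),
    concept("Aus bea Archer 1965", "Archer 1965", {"s3", "s4"}),
    concept("Aus bea Archer 1965", "Fry 1989", {"s3"}),
    concept("Aus cea BFry 1989", "Fry 1989", {"s4"}),
    concept("Aus aus L. 1758", "Tucker 1991", {"s1", "s2", "s3"}),
]


def relationship(a, b):
    """Compare two concepts by the specimens they circumscribe."""
    if a.specimens == b.specimens:
        return "congruent"
    if a.specimens > b.specimens:
        return "includes"
    if a.specimens < b.specimens:
        return "is included in"
    if a.specimens & b.specimens:
        return "overlaps"
    return "disjoint"


def usages(name):
    """All concepts in HISTORY that share the given name string."""
    return [c for c in HISTORY if c.name == name]
```

Calling `usages("Aus aus L. 1758")` returns several concepts with different specimen sets, which is why a record labelled only with the name is ambiguous.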
Problems with Scientific Names
Often recorded inappropriately in datasets
  No author and/or year (e.g. Carya floridana)
  Abbreviated (e.g. C. floridana)
  Internal code (e.g. PicRub for Picea rubens)
  Vernacular used (e.g. Scrub Hickory)
  Misspelled
Are not unique
  “Re-use” of names with changed definition
  Name is ambiguous without definition
Subject to name alterations and 'corrections' over time (e.g. Code changes its rules)
Concepts ……
Full Scientific name + “according to” (Author + Publication + Date) + Definition
  Carya floridana Sarg. (1913) “according to” Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913) [+Definition]
Original concept
  1st use of name as described by the taxonomist
  same author + date in scientific name and the “according to”
  same publication for original concepts and name
Revised concept
  Re-classification of a group
  different author + date in “according to”
  Carya floridana Sarg. (1913) “according to” Stone FNA 3:424 (1997) [+Definition]
Should be used for communicating about groups of organisms
  Full Scientific name + “according to” (Author + Publication + Date)
  definition clear – can get the definition
  comparing or integrating data based on concepts is more accurate
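A sketch of the "name + according to" key and of the original/revised distinction, using the Carya floridana example. The `ConceptRef` type and its field names are hypothetical, and author strings are assumed to have been normalised already (for instance "Sarg." to "Sargent").

```python
from typing import NamedTuple


class ConceptRef(NamedTuple):
    latin_name: str
    name_author: str      # author of the name (assumed normalised upstream)
    name_year: int
    sec_author: str       # author of the "according to" publication
    sec_publication: str
    sec_year: int


def readable_key(ref):
    """User-readable key: full scientific name plus 'according to' source."""
    return (f"{ref.latin_name} {ref.name_author} ({ref.name_year}) "
            f"according to {ref.sec_author}, {ref.sec_publication} ({ref.sec_year})")


def is_original(ref):
    """Original concept: same author and date in the name and the 'according to'."""
    return (ref.name_author, ref.name_year) == (ref.sec_author, ref.sec_year)


ORIGINAL = ConceptRef("Carya floridana", "Sargent", 1913,
                      "Sargent", "Trees & Shrubs 2:193 plate 177", 1913)
REVISED = ConceptRef("Carya floridana", "Sargent", 1913,
                     "Stone", "FNA 3:424", 1997)
```

The two references share a name but compare as different keys, so data labelled with one is not silently merged with data labelled with the other.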
Can GUIDs help?
Concepts
Concepts are described in many ways
  Created by someone - an Author
  Described in a Publication
  Given a Name
    May or may not be valid in terms of the nomenclatural codes
  Depending on the taxonomist's working practice, defined by the set of Specimens examined (type specimens and others)
  Common set of Characters
    data recorded by taxonomists to describe specimens and taxa
    context dependent; differentiate taxa rather than fully describe them; use natural language with all its ambiguities
  Relationships to other Taxon Concepts
    Taxon circumscription: the lower level taxa
    Congruence, overlap etc. to taxa in other classifications
Legacy Data …
In legacy data names often appear in place of concepts
Names are imprecise
  are inappropriate for referring to information regarding taxon e.g. observational/collection data
BUT…sometimes that’s all we have
How do we interpret names?….. potentially multiple definitions
  the sum of all definitions that exist for the name
    would that make any sense – conflicts?
  one of the existing definitions
    how can we choose?
  the “attributes” in common to all the definitions
    would that leave any?
  represented by the type specimen
    but what does that mean? – very subjective…..
Legacy Names as Concepts…
Nominal concepts
  Sub-set of TaxonConcepts
  Name but no AccordingTo
    non-unique (concept) identifier
    elements can have a unique concept GUID
  No definition
  Explicitly saying it’s something with this name but not really sure what is/was meant
Encourage people to understand and address the issue of names
  Allowing mark-up of collections with names allows people to believe names are really good enough
  Important problem - needs to be tackled sooner rather than later
    will improve long term usefulness of scientific data
    ease integration
SEEK Taxon
Build a Name/Concept resolution server
  TOS (Kansas)
Taxonomic Concept Schema TCS (Napier)
  Exchange of taxonomic Info
  TDWG/GBIF standard
  Basis for TOS
GUIDs
  GBIF/SEEK etc.
Tools to relate and compare concepts
  Taxonomy Comparison Visualisation Tool (Napier)
  Concept Mapper Tool (UNC)
Concept Comparison Visualisation
Taxon Concept Schema
TCS developed to allow exchange of taxonomic names/concept data
Based on consultation with range of users
  understand users’ notions of taxonomic concept
  what information they consider part of a concept
Presentations at meetings including 2 TDWG
  Agreement that concepts are important and necessary
  Taxon Names are independent from Taxon concepts
  Agreement that observations/identifications etc. should record concepts not names
TCS
XML based exchange schema
Not designed as the “correct way” to model a Taxon Concept
  No “rules” as to what a taxon must have
    certain things needed to be useful
Designed to accommodate different ways concepts are described
  Lots of optionality or flexibility in elements
    to address different work practices in the community
Includes Taxon Names
  are more constrained as they are governed by the codes of nomenclature
Considerable debate on what should be top level elements
  Related closely to the question: What gets a GUID?
    Taxon concepts
    Taxon Names
    Specimens
    Publications
    Taxon Relationship Assertions
Concepts refer to Names
Names must not change
Can’t record original taxon concept
TCS
Exchange of Data
Exchange of definitional data
  name definition
    information on history of name and type specimen and publication details
  taxon concept definition
    Name, publication details for the defining source, characters, specimens, related taxa etc
Exchange of usage data
  for observations/lists (should only use taxon concepts)
    need only exchange references to existing taxon concepts
    user readable keys, e.g. Full Scientific name “according to” Author + Publication
    GUIDs
  for name checking purposes
    need only exchange name without history or typification
    user readable keys, e.g. Full Scientific name
    GUIDs
Issues of GUIDs for integration
What gets a GUID?
  TCS top level elements??
  The “physical thing” or “electronic record of the thing”
  What is data and what is metadata associated with the GUID?
    Depends on your perspective on life…..
  Stability of data associated with a GUID
Who issues GUIDs?
  Centralised authority of some sort – peer review??
    + One GUID per concept or name (no duplicates)
    + ensure business rules are applied to new names/concepts created
    - bottleneck?
    - too restrictive in what the business rules might be
  Distributed free for all
    + Anyone can publish their own name/concept and get a GUID
    - Mess of GUIDs to sort out
Which technology?
  LSIDs, DOI etc.
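For the LSID option, the identifier is a URN of the general form urn:lsid:authority:namespace:object with an optional trailing revision. The parser below is a minimal sketch; the `Lsid` type, the function name and the example identifier are hypothetical, and the checks are far looser than the full specification.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID string into its parts; raise ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier; example.org is not a real LSID authority.
EXAMPLE = parse_lsid("urn:lsid:example.org:concepts:12345")
```

The authority component is where the "who issues GUIDs?" question shows up in practice: each issuing body mints identifiers under its own authority string.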
TCS and SEEK and…
Taxon Object Server
  Core of concept/name resolution service
  Kansas team has been implementing the TOS
  Schema based on the TCS model
  Tool to import data from TCS documents
EML
  Proposed modifications to EML to accommodate SEEK's taxonomic resolution services in the future
User interface tools
  Uses cut down TCS as input format
Inform other biology meta-data standards on taxonomic issues
  Cataloguing the complete genome standard
Taxonomic Object Server
TOS Allows
  registration, retrieval, integration of datasets
  Matches concepts given names, other concepts and taxonomies
Allow taxonomists to
  Author new ideas
  Make new relationships between concepts
Allow researchers to
  Easily see previous taxonomic opinions
  Use a stable identification system to reference concepts (LSIDs)
  Find concepts…
Integration with Kepler
TOS operations
Via TCS document
  addConcept
  addRelationship
Public APIs
  getConcept – on GUID
  getBestConcept – on name string
  getHigherTaxon – on GUID and authority – up tree
  getAuthoritativeList – down tree
  findConcepts – on any property(s)
  findRelatedConcepts – on GUID and relationships
  getSynonymousNames – returns name strings
getHigherTaxon getAuthoritativeList
Dictionary for name-concept matching
  N-gram matching algorithm
  getBestConcept
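A sketch of n-gram name matching against a dictionary, in the spirit of getBestConcept. This is not the TOS implementation: the character-trigram Dice score, the threshold, the dictionary contents and the identifiers are all assumptions made for illustration.

```python
def ngrams(text, n=3):
    """Set of character n-grams of the lower-cased text, padded with spaces."""
    padded = f"{' ' * (n - 1)}{text.lower()}{' ' * (n - 1)}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over the two character n-gram sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


# Hypothetical dictionary mapping known name strings to concept identifiers.
DICTIONARY = {
    "Carya floridana": "concept-1",
    "Picea rubens": "concept-2",
    "Channa argus": "concept-3",
}


def get_best_concept(name, threshold=0.5):
    """Return the identifier of the closest known name, or None if too distant."""
    best = max(DICTIONARY, key=lambda known: similarity(name, known))
    if similarity(name, best) < threshold:
        return None
    return DICTIONARY[best]
```

A misspelling such as "Carya floridanna" still resolves to the Carya floridana entry, while a vernacular name such as "Scrub Hickory" shares too few trigrams with any entry and returns None, so string matching alone does not solve the vernacular-name problem.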