example projects using metadata and thesauri: the biodiversity world project richard white cardiff...

Download Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK

If you can't read please download the document

Upload: camron-pearson

Post on 18-Jan-2018

216 views

Category:

Documents


0 download

DESCRIPTION

3 Some difficult biodiversity questions How should conservation efforts be concentrated? (example of Biodiversity Richness & Conservation Evaluation) Where might a species be expected to occur, under present or predicted climatic conditions? (example of Bioclimatic & Ecological Niche Modelling) How can geographical information assist in inferring possible evolutionary pathways? (example of Phylogenetic Analysis & Palaeoclimate Modelling)

TRANSCRIPT

Example projects using metadata and thesauri: the Biodiversity World Project Richard White Cardiff University, UK 2 The Biodiversity World project 3 year e-Science project funded by the UK BBSRC research council, Universities of Cardiff, Reading and Southampton The Natural History Museum (London) 3 Some difficult biodiversity questions How should conservation efforts be concentrated? (example of Biodiversity Richness & Conservation Evaluation) Where might a species be expected to occur, under present or predicted climatic conditions? (example of Bioclimatic & Ecological Niche Modelling) How can geographical information assist in inferring possible evolutionary pathways? (example of Phylogenetic Analysis & Palaeoclimate Modelling) 4 Point data from various herbaria 5 GARP prediction of climatic suitability 6 Distribution data from ILDIS database 7 Types of resource used in these biodiversity studies Data sources: Catalogue of Life (names of species: Species 2000, GBIF) Biodiversity data Descriptive data Distribution of specimens and observations Geographical data Boundaries of geographical & political units Climate surfaces Genetic sequences Analytic tools: Biodiversity richness assessment various metrics Bioclimatic modelling bioclimatic envelope generation Phylogenetic analysis (generation of phylogenetic trees) 8 Some challenges Finding the resources Knowing how to use these heterogeneous resources Originally constructed for various reasons Often little thought was given to standards or interoperability 9 The Biodiversity World vision (1) Problem Solving Environment for Biodiversity studies Heterogeneous diverse resources Facilitating integration of both legacy and newly-developed resources Flexible workflows Main challenges centre around interoperability, resource discovery, metadata, etc; High-performance computing secondary (though relevant) Our architecture 11 Biodiversity World as a flexible PSE Species 2000 & ITIS Catalogue of Life Analytic tool Thematic data source BDW Grid Ontology: Metadata Resource & analytic tool descriptions Maintenance tools Wrapper Abiotic data source User Local tools Problem Solving Environment user interface (Triana) Problem Solving Environment: Resource discovery Support for workflows Wrapper Analytic tool GSD User interaction with BDWorld 13 Example work-flow (Climate-space Modelling) Projection Prediction Species 2000 Localities Climate Space Model Base Maps Climate Submit scientific name; retrieve accepted name & synonyms for species Retrieve distribution data for species of interest Present or recent climate surfaces Model of climatic conditions where species is currently found Possibly different climate surfaces (e.g. predicted climate) World or regional maps Prediction of suitable regions for species of interest Projection of predicted distribution on to base map 14 BDWorld / Triana in operation: Workflow creation (design, editing) 15 Triana screen-shots 16 Triana screen-shots 17 Triana screen-shots 18 Triana screen-shots 19 Triana screen-shots 20 Triana screen-shots 21 BDWorld / Triana in operation: Workflow execution (enactment, run-time) 22 Triana screen-shots 23 Triana screen-shots 24 Triana screen-shots 25 Triana screen-shots 26 Triana screen-shots 27 A dream A desktop environment in which scientists can drag & drop data sources, analysis and modelling tools and visualisation interfaces into a desired sequence of operations which can be run automatically BDWorld just about at this stage With additional features, the environment could be made richer, more productive, and support research groups. Essentially a component-based visual programming environment Not just for biodiversity! 28 Role of metadata Metadata is needed to enable discovery of resources and to indicate how they are to be used Properties to help locate appropriate resources Check interoperability, suggest transformations Provenance of data sets Log of work-flows executed 29 Resources have to be matched To the users requirements To the capabilities of the users workstation environment To each other, so that data sets generated by one task can be used by another 30 Finding a resource that matches the users requirements Metadata is stored when a resource is registered This metadata is used to find a resource which meets the users needs (possibly interpreted with the help of an ontology) can run in the users environment (users have to register their metadata too) 31 Metadata about resources Description, functionality Input and output data sets User interaction, if any Platform, requirements, restrictions Quality & reputation 32 Users needs What the resource does (or data source delivers) Algorithm used Whether it uses the right data type Quality, reliability, reputation 33 Matching resources to users Users have varying capabilities and privileges which may affect their ability to use resources which: run on specific platforms only have IPR or cost limitations imposed on their use interact with their user locally in real time have other unexpected requirements 34 The users environment Platform: OS, supporting software Privileges, licences held, etc. Connection (bandwidth etc.) Workstation hardware (display, memory, speed, etc.) 35 Matching resource inputs and outputs The output of an earlier task may be the input of a later one Thus inputs and outputs of resources have to be tested for matching The only real criterion for this is the later resource it has been programmed to read a data set, and will complain if it isnt suitable However 36 Matching input and output data sets Can be done with various levels of rigour: Is the same word used to describe their type? Do they have the same schema? Do they have schemas which contain the same elements? Do they have schemas which can be proved to be equivalent? (this is very hard) Are there additional parameters which have to match? (e.g. matrix dimensions) 37 Transforming data sets If the data sets dont match, the metadata may allow the workflow designer to supply parameters to the wrapper to adjust its generation of output data or its interpretation of input data choose a transformation tool which can be inserted into the workflow called as a local tool on the users workstation control a more flexible data transformation tool 38 Summary Need metadata about Resources Operations Data set types (and schemas) Conversion tools Users (and their workstations)