global biodiversity information facility vishwas chavan and nicholas king february 12, 2008...
TRANSCRIPT
GLOBAL BIODIVERSITY INFORMATION FACILITY
Vishwas Chavan and Nicholas King
February 12, 2008
WWW.GBIF.ORG
GBIF efforts in digitizing and mobilising primary biodiversity data
GBIF’s Mission
…to make the world’s biodiversity data freely and universally available via the Internet
What is biodiversity?
GBIF follows the broadly outlined CBD recognition of levels of biological diversity:
• Molecules / genes
• Species
• Ecosystems / ecology
Scientists, experts, consultants
Government officials at all levels
Farmers, foresters, indigenous communities
Education at all levels
NGOs and the general public
These needs are highly varied, but can be met by open access to the same datasets
The same data can be analysed differently for different uses
Who needs primary biodiversity data!
But this needs easy access to (digitised) data
Screen shot: 26 Oct 2007
As of end 07 GBIF facilitates access to 142 million primary data records
GBIF Data Portal: Dispelling Myths!
Searches:
• Taxonomic
• Geographic (by country or bounding box)
• By dataset
Taxonomic browse navigation using choice of classification
Integration of data: DiGIR-Darwin Core & BioCASe-ABCD (new versions), TAPIR, tab-delimited, TCS, SDD
Search and download by one to many species, geography, dataset (or combination)
Web services
Distributed, Decentralised, Data Discovery and Access through a network of heterogeneous and multicultural partners is possible!
Countries are organised alphabetically on the left-hand side,
with the number of national records shown on the
right-hand side.
Here we can see that there are more than 3.2
million records available for South Africa
(2.8 million with coordinates), referring to nearly 41,500 species
Example of a country summary page. This
map provides an overview of the
density of records currently available.
Sample of records available for South Africa at
September 2007. The GBIF portal offers a range of
options for further use of the data…
It is also possible to get the full list of organisations providing data collected in a specific country or
region
In this case 68 collections from all over the world are making available data for South Africa through GBIF – a good exemplar of data
repatriation activities promoted and facilitated
by GBIF
South African institutions
are also providing data
relevant to other countries
and regions in the world, as
demonstrated in this
example from the Shark
Collection at the Iziko South
African Museum
The GBIF data portal also allows
for more detailed views of
regions, datasets, taxonomic
groups, etc.
Here it is possible to see nearly
100 000 records from the
Linefish dataset collected in
1989 by the Marine and
Coastal Management (MCM)
at the Department of
Environmental Affairs and
Tourism in South Africa
Exporting data from
the GBIF data portal
to other applications
such as Google
Earth is a matter of
a click!
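As a rough illustration of what such an export involves, the sketch below turns a few occurrence records into a KML document that Google Earth can open. The record fields, the function name and the sample values are hypothetical; this is not the portal's actual export code or format.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def records_to_kml(records):
    """Build a KML string with one Placemark per occurrence record."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = rec["scientific_name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = rec["provider"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        coords = f"{rec['longitude']},{rec['latitude']}"
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = coords
    return ET.tostring(kml, encoding="unicode")


# Hypothetical sample record, for illustration only.
sample = [
    {"scientific_name": "Example species", "provider": "Example Museum",
     "latitude": -33.9, "longitude": 18.4},
]
kml_text = records_to_kml(sample)
```

Keeping the provider name in each placemark mirrors the idea of getting back from a point on the map to the original data provider.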
Coverage for Africa
>5m records currently for Africa
> 1m from EU country institutions
Estimated >100m not yet digitised
Within Google Earth overlays it is
also possible to go down to the level
of individual primary records, getting
back to the original data provider
With the filter functionality it is possible to
perform complex queries on the data.
In this example we are looking for all records
on Lepidoptera (butterflies and moths) collected or
observed in South Africa from 1950 to 2000.
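The same kind of combined filter can be sketched in a few lines of Python. This is only an in-memory illustration with hypothetical records and field names; it does not call the GBIF portal or its web services.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    scientific_name: str
    order: str
    country: str
    year: int


def filter_occurrences(records, order, country, year_from, year_to):
    """Return records matching the order, country and inclusive year range."""
    return [
        r for r in records
        if r.order == order
        and r.country == country
        and year_from <= r.year <= year_to
    ]


# Hypothetical records, for illustration only.
records = [
    Occurrence("Papilio demodocus", "Lepidoptera", "South Africa", 1975),
    Occurrence("Papilio demodocus", "Lepidoptera", "South Africa", 2005),
    Occurrence("Danaus chrysippus", "Lepidoptera", "Kenya", 1980),
    Occurrence("Apis mellifera", "Hymenoptera", "South Africa", 1990),
]
matches = filter_occurrences(records, "Lepidoptera", "South Africa", 1950, 2000)
```

Only the first record satisfies all three criteria: the others fail on year, country or taxonomic order.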
             Distances moved (km)   Average altitude (m)   Average latitude (°S)
Present      0                      88.57                  33.21
20%          25.3                   113.83                 33.43
40%          20.0                   137.93                 33.59
60%          17.2                   194.85                 33.72
80%          46.4                   269.91                 33.98
100%         17.4                   296.06                 34.09
Leucospermum tomentosum: range centres in 10 year time slices
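To make the overall trend in these figures easier to see, the short sketch below adds up the per-step distances and compares the first and last rows. It assumes that each "distance moved" value is the shift from the previous row, which the slide does not state explicitly.

```python
# Rows from the table: (row label, distance moved in km,
# average altitude in m, average latitude in degrees south).
rows = [
    ("Present", 0.0, 88.57, 33.21),
    ("20%", 25.3, 113.83, 33.43),
    ("40%", 20.0, 137.93, 33.59),
    ("60%", 17.2, 194.85, 33.72),
    ("80%", 46.4, 269.91, 33.98),
    ("100%", 17.4, 296.06, 34.09),
]

# Assumption: distances are per-step shifts, so their sum is the path length.
total_distance_km = round(sum(r[1] for r in rows), 1)
altitude_gain_m = round(rows[-1][2] - rows[0][2], 2)
latitude_shift_deg = round(rows[-1][3] - rows[0][3], 2)

print(total_distance_km, altitude_gain_m, latitude_shift_deg)
```

Under that assumption the range centre moves about 126.3 km in total, climbing roughly 207 m and shifting about 0.88° further south.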
But, this is just a beginning.......
We need to cover much beyond imagination, and much much faster than we think?
Biological Data Domain - challenges

Sub-domain: Ecological & Ecosystem Data
  Digital Status: 80% ? digital
  Greatest Informatics Problems: Migration of legacy data, metadata generation, taxonomy (species)
  Data Status: Persistent digital and physical data stores, moderately accessible

Sub-domain: Species- & Specimen Data
  Digital Status: <5% digital
  Greatest Informatics Problems: Digitisation, migration of legacy data, indexing
  Data Status: Persistent physical data stores, accessible with difficulty

Sub-domain: Molecular Sequence & Gene/Genome Data
  Digital Status: 95% digital
  Greatest Informatics Problems: Data migration, cleansing, vouchering, taxonomy (gene & species)
  Data Status: Persistent digital, universally accessible data stores
Primary Biodiversity Data
• Both biodiversity and biodiversity data are unevenly distributed around the world:
[Diagram labels: Developing World, Biodiversity, Biodiversity Data, Developed World]
Digital Divide
Content Divide
Lingual Divide
Knowledge Divide
Emerging catastrophe…
Primary Biodiversity Data
Observations / Monitoring
Multimedia Resources
Biological Collections
NAMES
Growth rate of GBIF data sharing
[Figure: Growth in Data Sharing, Oct 2003 - Oct 2007. Axes: Data Providers; Data Records (in millions). Series: Providers, Records.]
1 Billion Records by 2008 – We need to expedite!
Many specimens remain to have their data digitised
Many records are already digital...
… but are not yet being shared
[Figure: Goal for Growth in Occurrence Data* by End 2008. Time axis: Oct-03 to Dec-08. Axes: Data Providers; Data Records (in millions). Series: Providers, Records.]
* data useful in analyses that contribute to sustainable management of biodiversity
GBIF is all about our shared vision and partnership
28 Voting Country Participants
15 Associate Country Participants
35 International Organisations and Economies
GBIF Working Principles
• Collaboration and sharing, not compilation
• Ownership of data (specimens or names) remains entirely with providers
• Standardised schemata for data sharing; software free to providers
• Worldwide network of collaborating institutions that share data (data providers)
• GBIF’s Participants’ Nodes promote and coordinate activities of data providers
GBIF Working Principles
• Procedures for interoperability and data integration
• Web services (mostly for machines, but for people too)
• Global registry for advertisement of shared data
• Vision and coordination
• GBIF has a unique global mandate in both Informatics and Content
• GBIF is a multi-purpose, open-ended cyber-infrastructure that facilitates biologists serving biodiversity and society in new ways
GBIF Strategic Areas 2007 – 2011
Informatics
• Data portal powerful and friendly
• Consolidated infrastructure and standards
• Tools and support for Nodes and providers
Content
• Data quantity and richness in priority areas
• Data integration and discovery
• Documented data quality
Participation
• Nodes' expertise shared across the network
• Guidance on setting up and maintaining Nodes
• In a database, the data have no actual quality or value; they only have potential value. That value is realized only when someone uses the data to do something useful (English 1999).
• The quality of data cannot be assessed independently of the uses of that data (Strong et al. 1997).
• Data are of high quality if they are fit for their intended use in operations, decision-making, and planning (Juran 1964).
Data: Fitness for Use
Data standards / protocols used by GBIF
Darwin Core (TDWG data standard)
• Simple XML data model to represent taxon occurrence records (only core attributes)
• Extensions to handle e.g. curation details, geospatial data, microbial specimens
ABCD - Access to Biological Collection Data (TDWG data standard)
• More complex XML data model to represent collection or observation data
• Detailed document structure including features for different communities
Taxon Concept Schema (TDWG data standard)
• XML data model for exchange of nomenclatural/taxonomic data
• Will be supported in new GBIF data portal
Tab-delimited links to species information
• Lists of scientific names, URLs and key words
• Will be supported in order to establish links to external resources from the new GBIF data portal
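A tab-delimited list of this kind is simple to read programmatically. The sketch below is one possible reader; the column order (scientific name, URL, key words), the comma-separated key words and the sample URLs are assumptions made for illustration, not a published GBIF file layout.

```python
import csv
import io


def parse_species_links(text):
    """Parse tab-delimited lines of scientific name, URL and key words."""
    links = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        name, url = row[0].strip(), row[1].strip()
        # Assumption: key words, if present, are comma-separated in column 3.
        raw_keywords = row[2] if len(row) > 2 else ""
        keywords = [k.strip() for k in raw_keywords.split(",") if k.strip()]
        links.append({"scientific_name": name, "url": url, "keywords": keywords})
    return links


# Hypothetical input, for illustration only.
sample = (
    "Puma concolor\thttp://example.org/species/puma-concolor\tmammal, carnivore\n"
    "Leucospermum tomentosum\thttp://example.org/species/leucospermum-tomentosum\n"
)
links = parse_species_links(sample)
```

Each parsed entry pairs a scientific name with an external URL, which is the information needed to link out from a species page.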
DiGIR / BioCASe / TAPIR (TDWG access protocols)
• XML protocols for searching remote data resources
• Suitable for use with a wide range of different data models
• TAPIR (latest version) supports flexible views and simple URLs
SPICE protocol (Species 2000 access protocol)
• Web service interfaces for exploring taxonomic data (hierarchies, synonymy, common names)
• Will be supported for connecting data resources to new GBIF data portal
LSIDs – Life Science Identifiers (TDWG-adopted GUID mechanism)
• Globally unique identifiers to simplify tracking data records
• Include protocol for resolving data for any LSID
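An LSID is a URN of the form urn:lsid:authority:namespace:object, with an optional revision at the end. The sketch below only splits such a string into its parts; it does not implement the resolution protocol, and the example identifier is hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text):
    """Split an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = text.strip().split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not a valid LSID: {text!r}")
    if any(not p for p in parts[2:]):
        raise ValueError(f"empty LSID component in: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
lsid = parse_lsid("urn:lsid:example.org:names:12345")
```

The authority part is what a resolver would use to locate the service that can return data for the identifier.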
Examples of resources provided by GBIF
free
GBIF Training Manual 1: Digitisation of Natural History Collections
CONTENTS
• Introduction
• The Uses of Primary Species Occurrence Data
• Initiating a Natural History Collection Digitisation Project
• Principles of Data Quality
• Principles and Methods of Data Cleaning
• BioGeomancer Guide to Best Practices for Georeferencing
• Guide to Best Practices for Generalizing
• Glossary and Acronym Expansion
To be released by end February 2007.
Observational Data Task Force
Quantum of observational data is unprecedented
Over 60% of GBIF mediated data is observational
Observational Data Task Group
• Advise GBIF on mobilisation of observational data
• Criteria for Observational Data Sharing Infrastructure
• Metadata Schema for Observational Data
• Protocols / Standards for observational data exchange / sharing
• Best Practices Guide for observational data management
• Encourage participation of potential data providers
Report by September 2008
Broader range of supported import formats and protocols
Occurrence data
• Darwin Core (original v1.2, MaNIS, OBIS, new v2.0 with extensions)
• ABCD (v1.20, v2.06)
Taxonomic data
• Catalogue of Life CD-ROM (moving to dynamic checklist)
• Nomenclators via tab-delimited lists of LSIDs (work under way)
• Data from ECAT projects (models and tools under way)
Other resources
• Discussions under way with other resources (GenBank, BOLD, ARKive)
• General support for handling XML and tab-delimited formats
Enhanced support for data providers
Validation and annotation of data during indexing
• Presence of required fields
• Consistency between country name and coordinates
• Reports for data providers
Clear separation between “raw” and “processed” index data
• Scientific name string versus interpreted taxon
• Country name string versus interpreted country
“Home page” for each data resource
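The two validation checks named above, presence of required fields and consistency between country name and coordinates, can be sketched as follows. The field names, the list of required fields and the very rough bounding box are assumptions for illustration; the portal's real indexing rules are not described in these slides.

```python
REQUIRED_FIELDS = ("scientific_name", "country", "latitude", "longitude")

# Assumption: very rough bounding boxes (min_lat, max_lat, min_lon, max_lon).
COUNTRY_BOXES = {
    "South Africa": (-35.0, -22.0, 16.0, 33.0),
}


def validate_record(raw):
    """Return a list of human-readable issues for one raw record."""
    issues = [f"missing required field: {f}" for f in REQUIRED_FIELDS
              if raw.get(f) in (None, "")]
    if issues:
        return issues
    box = COUNTRY_BOXES.get(raw["country"])
    if box is None:
        return [f"country not interpreted: {raw['country']}"]
    min_lat, max_lat, min_lon, max_lon = box
    inside = (min_lat <= raw["latitude"] <= max_lat
              and min_lon <= raw["longitude"] <= max_lon)
    if not inside:
        issues.append("coordinates fall outside the stated country")
    return issues


# Hypothetical raw records, for illustration only.
good = {"scientific_name": "Example species", "country": "South Africa",
        "latitude": -33.9, "longitude": 18.4}
flipped = dict(good, latitude=33.9)   # sign of latitude lost
missing = dict(good, scientific_name="")

# A per-record report of the kind that could be sent back to a provider.
report = {i: validate_record(r) for i, r in enumerate([good, flipped, missing])}
```

The raw record is never modified here; issues are reported alongside it, in keeping with the separation between "raw" and "processed" data.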
Training, Capacity Building, Mentoring
• Training programs on how to share data
• Training on Ecological Niche Modeling
• Mentoring to developing countries
• Help Desk services
Call for Action!
With GBIF’s decentralised approach of NBIFs, RBIFs, and ThBIFs, Africa has lots to contribute...
Individual, institutional, national, regional and global level!
How to contact GBIF:
Web site: www.gbif.org
Data portal: www.gbif.net
GBIF Secretariat
Universitetsparken 15
2100 Copenhagen
Denmark
E-mail: [email protected]
Tel: +45 3532 1470
Fax: +45 3532 1480
GBIF Secretariat building, supported by a grant from the Aage V. Jensens Fonde