speciesLink: A System for Integrating Distributed Primary Biodiversity Data (Vanderlei Perez Canhos)

Post on 18-Jan-2016

speciesLink

A System for integrating distributed primary biodiversity data

Vanderlei Perez Canhos

Centro de Referência em Informação Ambiental, CrIA

Overview

• CRIA

• SinBiota and The Species Analyst

• speciesLink

• Type of collections involved

• Number of records

• Technical features

• Future plans

CRIA: Reference Center on Environmental Information

http://www.cria.org.br

Focus on Biodiversity Informatics

• Open source software

• Standards and protocols

• Systems interoperability

• Partnerships

The Species Analyst

http://speciesanalyst.net/

Location of participant collections: mainly United States

Taxonomic groups: several taxa

Protocol: Z39.50 (migration to DiGIR in progress)

Number of records: ~50,000,000

Importance of data sharing

Paris

KU – Natural History Museum

British Museum

Field Museum

The main goal of speciesLink was to build a distributed system integrating several biological collections and making their primary data available on the Internet.

speciesLink: Distributed Information System for Biological Collections

http://splink.cria.org.br

fish: 3

herbaria: 4

microorganisms: 3

mites: 2

inventories: SinBiota

Geographic distribution of the participant collections – phase I

São Paulo State Collections

Number of Records

Group                    Available    Existing
Herbaria                    72,000     740,000
Microorganisms               1,000       2,700
Mites                       18,000      22,000
Fish                        70,000     123,000
Inventories (species)       38,000      38,000
Total                     ~200,000  ~1,000,000
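As a rough check on the figures above, the share of each group that is already available online can be recomputed from the slide's own numbers. The snippet below is only an illustrative sketch; the dictionary layout and the function name `share_available` are assumptions made for this example, not part of speciesLink.

```python
# Figures copied from the "Number of Records" slide: (available, existing).
records = {
    "Herbaria": (72_000, 740_000),
    "Microorganisms": (1_000, 2_700),
    "Mites": (18_000, 22_000),
    "Fish": (70_000, 123_000),
    "Inventories (species)": (38_000, 38_000),
}


def share_available(available, existing):
    """Percentage of existing records that are already available online."""
    return 100.0 * available / existing


total_available = sum(a for a, _ in records.values())
total_existing = sum(e for _, e in records.values())

for group, (available, existing) in records.items():
    print(f"{group}: {share_available(available, existing):.1f}% available")
print(f"Sum of rows: {total_available:,} of {total_existing:,}")
```

The row sums come to 199,000 and 925,700, which is consistent with the rounded totals of ~200,000 and ~1,000,000 shown on the slide.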

Collection                Available    Existing

Microbial Collections
CBMAI                           110         700
IBSBF                           929       2,000

Observational Data
SinBiota                     38,109      38,109

Botanical Collections
ESA                             730      80,000
SP                           11,280     350,000
IAC                          25,245      45,000
SPF                          21,828     133,500
UEC                          12,860     130,000

Zoological Collections
ACARISJRP                     5,382       7,000
ACARIESALQ                   12,392      15,000
DSZSJRP (fish)                5,714      23,000
LIRP (fish)                   4,314      30,000
MZUSP (fish)                 60,000     110,000

Collection Management Software

Support to collections

• Providing basic equipment and network infrastructure

• Helping to choose a management system, when needed

• Helping to train and to import data, when needed

Protocol and Content Schema

• DiGIR protocol (Distributed Generic Information Retrieval)

– Potential to be globally accepted

• DiGIR software (Java Portal & PHP Provider)

– Collaborative development

• DarwinCore v.2

– Covers the basic content elements (taxonomic identification, location and date of collecting event)
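To make the three basic content elements concrete, here is a minimal sketch of a specimen record. The field names are modelled on Darwin Core 2 element names (ScientificName, Country, StateProvince, Locality, YearCollected and so on) but are rewritten in Python style; the class `SpecimenRecord`, the method `has_basic_content` and the sample values are assumptions made for illustration, not part of the speciesLink or DiGIR software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenRecord:
    # Record identity (hypothetical Python-style names).
    institution_code: str
    collection_code: str
    catalog_number: str
    # Taxonomic identification.
    scientific_name: str
    # Location of the collecting event.
    country: Optional[str] = None
    state_province: Optional[str] = None
    locality: Optional[str] = None
    # Date of the collecting event.
    year_collected: Optional[int] = None
    month_collected: Optional[int] = None
    day_collected: Optional[int] = None

    def has_basic_content(self) -> bool:
        """True when a name, some location and at least a year are present."""
        has_location = any([self.country, self.state_province, self.locality])
        return (
            bool(self.scientific_name)
            and has_location
            and self.year_collected is not None
        )


# Invented sample values, for illustration only.
record = SpecimenRecord(
    institution_code="UEC",
    collection_code="UEC",
    catalog_number="000001",
    scientific_name="Caryocar brasiliense",
    country="Brazil",
    state_province="São Paulo",
    year_collected=1998,
)
print(record.has_basic_content())
```

A record without a collecting year, or without any location field, would fail this check, which is one simple way a provider could flag incomplete data before publishing it.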

Simple Search Interface

System’s Architecture

[Diagram; components by group:]

• Portal side: speciesLink site (presentation layer), DiGIR Portal (Java), Perl

• Collection A (fast and stable connectivity): Collection Management System, SQL, Data, PHP Provider

• Collections B and C (slow or unstable connectivity): Collection Management System, SQL, Data, Data Repository, SOAP client

• Regional Server: SOAP Server, SQL, Postgres, PHP Provider
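One reading of the architecture diagram is that a collection with fast and stable connectivity answers portal queries through its own provider, while collections with slow or unstable connectivity push copies of their data to a regional server that answers on their behalf. The sketch below illustrates that reading only. The classes `Provider` and `Portal`, the function `push_to_regional` and the sample records are hypothetical stand-ins; they do not use the DiGIR or SOAP protocols and are not the speciesLink implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """Stand-in for an endpoint that answers queries for one or more collections."""
    name: str
    records: list = field(default_factory=list)

    def search(self, genus):
        return [r for r in self.records if r["genus"] == genus]


@dataclass
class Portal:
    """Stand-in for the portal: sends the same query to every provider."""
    providers: list

    def search(self, genus):
        results = []
        for provider in self.providers:
            results.extend(provider.search(genus))
        return results


def push_to_regional(regional, collection_code, records):
    """Mirror a collection's records on the regional server (assumed behaviour)."""
    regional.records.extend(dict(r, collection=collection_code) for r in records)


# Collection A has good connectivity and serves its own data.
collection_a = Provider("Collection A", [{"genus": "Astyanax", "collection": "A"}])

# Collections B and C push their data to the regional server instead.
regional = Provider("Regional Server")
push_to_regional(regional, "B", [{"genus": "Astyanax"}])
push_to_regional(regional, "C", [{"genus": "Hoplias"}])

portal = Portal([collection_a, regional])
hits = portal.search("Astyanax")
print(sorted(r["collection"] for r in hits))
```

In this arrangement the portal never contacts collections B and C directly, so their network quality does not affect query time.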

Network Design

[Diagram: four Regional Servers]

Data Migration Client

• Platform independent (Java)

• Connects to any database accessible via JDBC (simple text files are also supported)

• Complete control over data

• Low traffic

• Possibility to filter sensitive data using a regular expression
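The last point can be sketched as follows. The slides do not say whether the client drops whole records or masks individual fields, so this example assumes that a record is withheld when a chosen field matches the expression. The function name `filter_sensitive`, the field names and the sample records are all hypothetical, and the real client is written in Java; Python is used here only for brevity.

```python
import re


def filter_sensitive(records, field_name, pattern):
    """Return only the records whose `field_name` value does NOT match `pattern`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [r for r in records if not rx.search(str(r.get(field_name, "")))]


# Invented sample data, for illustration only.
records = [
    {"catalog_number": "1", "locality": "Campinas, roadside"},
    {"catalog_number": "2", "locality": "RESTRICTED: private reserve"},
    {"catalog_number": "3", "locality": "Piracicaba, riverbank"},
]

public = filter_sensitive(records, "locality", r"^restricted")
print([r["catalog_number"] for r in public])
```

Because the filter runs on the collection's side before anything is sent, the withheld records never leave the institution, which fits the "complete control over data" point above.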

Regional server

Features

• Perl / PostgreSQL combination

• Can hold data from several collections

• Interpretation rules can be applied to specific data
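The slides do not describe what an interpretation rule looks like, so the following is only one possible sketch: a per-collection function that normalises incoming values before they are stored alongside data from other collections. SQLite stands in for PostgreSQL so the example is self-contained; the table layout, the rule `expand_state_abbreviation` and the sample rows are assumptions made for illustration.

```python
import sqlite3


def expand_state_abbreviation(record):
    """Hypothetical rule: expand a state abbreviation used by one collection."""
    names = {"SP": "São Paulo"}
    state = record["state_province"]
    return {**record, "state_province": names.get(state, state)}


# Rules are registered per collection; collections without a rule are stored as-is.
RULES = {"IAC": expand_state_abbreviation}


def load(conn, collection_code, records):
    """Apply the collection's rule (if any) and store the records."""
    rule = RULES.get(collection_code, lambda rec: rec)
    rows = [
        (collection_code, r["catalog_number"], r["state_province"])
        for r in map(rule, records)
    ]
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen "
    "(collection_code TEXT, catalog_number TEXT, state_province TEXT)"
)
load(conn, "IAC", [{"catalog_number": "1", "state_province": "SP"}])
load(conn, "UEC", [{"catalog_number": "7", "state_province": "São Paulo"}])

rows = conn.execute(
    "SELECT collection_code, state_province FROM specimen ORDER BY collection_code"
).fetchall()
print(rows)
```

After loading, both collections report the state in the same form, so a single query on the regional server returns consistent values.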

[Diagram: Postgres, PHP Provider, SOAP Server (Perl), SQL]

Query Result (brief)

speciesLink – phase II

>35 collections available

Future plans

• Mapping tools

• Data cleaning tools

• Modelling framework

Infrastructure for Species Distribution Modelling

[Diagram; components by group:]

• Specimens: DiGIR Portal, BioCASE Portal

• Environmental layers: Precipitation, Vegetation, Temperature

• Modelling algorithms: ACME, Bioclim, Neural Net, GARP
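To show how specimen records and environmental layers combine in such a framework, here is a deliberately simplified climatic-envelope sketch in the spirit of Bioclim: the envelope is the minimum-to-maximum range of each environmental variable at the known occurrence points, and a site counts as suitable when all of its values fall inside those ranges. The function names `fit_envelope` and `is_suitable` and all numeric values are invented for illustration; this is not the modelling framework referred to in the slides.

```python
def fit_envelope(occurrences):
    """Per-variable (min, max) range over the environmental values at occurrence points."""
    variables = occurrences[0].keys()
    return {
        v: (min(o[v] for o in occurrences), max(o[v] for o in occurrences))
        for v in variables
    }


def is_suitable(envelope, site):
    """A site is suitable when every variable lies within the fitted range."""
    return all(low <= site[v] <= high for v, (low, high) in envelope.items())


# Invented environmental values sampled at three specimen localities.
occurrences = [
    {"temperature": 18.0, "precipitation": 1200.0},
    {"temperature": 24.0, "precipitation": 1600.0},
    {"temperature": 21.0, "precipitation": 1400.0},
]

envelope = fit_envelope(occurrences)
print(envelope)
print(is_suitable(envelope, {"temperature": 20.0, "precipitation": 1300.0}))
print(is_suitable(envelope, {"temperature": 30.0, "precipitation": 1300.0}))
```

Applying `is_suitable` to every cell of the environmental layers would yield a simple presence map; production implementations usually trim the ranges with percentiles so that outlying records do not inflate the envelope.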

Acknowledgements (phase I)

• Instituto de Botânica

• Universidade Estadual de Campinas

• Universidade de São Paulo

• Instituto Agronômico de Campinas

• Instituto Biológico

• Universidade Estadual Paulista

• Escola Superior de Agricultura “Luiz de Queiroz”

Fellowships

• Visiting researchers

– Andrew Townsend Peterson (3 months)

– Arthur Chapman (1 year)

• Post-doctoral researcher

– Ingrid Koch

• Technical training (6 TT fellowships)

Summing up

• Achieved proof of concept

• Data is already available

• Low cost for connecting new collections

• Triggered a movement within the collections to improve data quality and increase the amount of available information

• Adoption of standards and protocols

• International partnerships: DiGIR, modelling framework

• Interoperability with similar initiatives

Thank you!

http://splink.cria.org.br

vcanhos@cria.org.br
