The emerging biodiversity data ecosystem

Cynthia Parr, Katja Schulz, Jennifer Hammock (Smithsonian Institution)

Nathan Wilson, Patrick Leary (Marine Biological Laboratory)

Richard Allen (Environmental Protection Agency)


Post on 21-Aug-2015



Today’s story

What is EOL

Core questions

Network analysis

Hotlist development

Page richness algorithm

Conclusion: improving the health and richness of our knowledge network advances understanding

What is EOL

http://www.eol.org

• Global access to knowledge about life on earth

• All species

• Freely accessible & reusable: open access, open source

• Available from a single portal in a common format

• Quality

• Always growing

EOL Topics

Associations Behaviour ConservationStatus Cyclicity Cytology DiagnosticDescription Diseases Dispersal Distribution Evolution GeneralDescription Genetics Growth Habitat Legislation LifeCycle LifeExpectancy LookAlikes Management Migration MolecularBiology Morphology Physiology PopulationBiology Procedures Reproduction RiskStatement Size Threats Trends TrophicStrategy Uses Description Conservation Key Biology Ecology Introduction Education Barcode CitizenScience EducationResources Genome NucleotideSequences FunctionalAdaptations FossilHistory SystematicsOrPhylogenetics Development IdentificationResources

Content providers

Databases

Journals

LifeDesks

Public contributions

Curating

Commenting

Tagging

http://www.eol.org

EOL is a content curation community

Aggregation

Core questions

Where is our knowledge about biodiversity?

Where are the gaps?

What are the most effective ways to fill gaps given our limited resources?

Network analysis

EOL

GBIF

NCBI

with Anne Bowser, University of Maryland

EOL connects hubs

The GBIF hub has subnetworks

Key individuals seek out hubs

TOLWeb

Implications and next steps

Need more data

Identify isolated projects & mechanisms for connecting them to the network

Improve resilience & redundancy

Distribute annotation & quality control

Model data flow quantity and impact
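The network questions above (which projects act as hubs, and which are isolated) can be sketched with a small graph routine. This is only an illustration: the links and the names "Project A" and "Project B" are made up for the example and are not findings from the analysis, and the degree cutoff for calling something a hub is an arbitrary assumption.

```python
from collections import defaultdict, deque


def build_graph(edges, nodes=()):
    """Build an undirected graph as {node: set of neighbours}."""
    graph = defaultdict(set)
    for node in nodes:
        graph[node]  # register nodes that may have no links at all
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def components(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(group)
    return result


def hubs(graph, min_degree):
    """Nodes with at least min_degree links (cutoff is an assumption)."""
    return sorted(n for n, nbrs in graph.items() if len(nbrs) >= min_degree)


# Hypothetical links, for illustration only.
example_edges = [("EOL", "GBIF"), ("EOL", "NCBI"), ("GBIF", "Project A")]
example_graph = build_graph(example_edges, nodes=["Project B"])

isolated = [c for c in components(example_graph) if len(c) == 1]
print("Hubs:", hubs(example_graph, min_degree=2))
print("Isolated projects:", isolated)
```

Components of size one are candidates for "isolated projects" that need a mechanism connecting them to the network.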

Viewer of Life on EOL – Kris Urie

Low % of descendants with text in arthropods

Within arthropods, coverage varies ... perhaps as expected

http://synthesis.eol.org/media/treemap/

Developing the EOL hot list

Consultation with taxonomic experts

Development of criteria

Assembly of critical lists

Establishing targets for rich taxon pages, lesser known pages

EOL’s hot lists

Hot List

70,000 taxa

Conservation concern

Invasives

Model organisms

Ecologically important

Pests

Charismatics

Data availability

Red Hot List

2,800 taxa

Most searched

Top 100 invasives

Crops (food)

Zoos & aquaria

High traffic

Higher taxa

Taxon page richness algorithm

Richness = a (Breadth) + b (Depth) + c (Diversity)

Breadth: images, topics of text objects, references, maps, videos, sounds, conservation status

Depth: # words per text object, # words total

Diversity: sources (partners)

Weights: a = 60%, b = 30%, c = 10%

Score range: 0 – 1; threshold for a rich page: 0.4
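A minimal sketch of the weighted score, reading the slide's 60% / 30% / 10% as the weights on breadth, depth, and diversity, and 0.4 as the cutoff for a "rich" page. How each component is scaled to the 0 – 1 range is not given here, so the sketch simply assumes the three inputs are already normalised; treating the threshold as inclusive is also an assumption.

```python
from dataclasses import dataclass

# Weights and threshold as read from the slide. The mapping of the three
# percentages to breadth, depth, and diversity follows their order there.
WEIGHTS = {"breadth": 0.6, "depth": 0.3, "diversity": 0.1}
RICH_THRESHOLD = 0.4


@dataclass
class PageScores:
    """Component scores for one taxon page, each assumed to lie in [0, 1]."""
    breadth: float
    depth: float
    diversity: float


def richness(scores: PageScores) -> float:
    """Weighted sum of the three components; the result stays in [0, 1]."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = getattr(scores, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
        total += weight * value
    return total


def is_rich(scores: PageScores) -> bool:
    """True when the score meets the threshold (inclusive by assumption)."""
    return richness(scores) >= RICH_THRESHOLD


# Hypothetical page: many media types, little text, a single source.
page = PageScores(breadth=0.7, depth=0.1, diversity=0.1)
print(round(richness(page), 2), is_rich(page))
```

Because breadth carries most of the weight, a page with many kinds of content can pass the threshold even when its text is short.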

Summary of EOL page richness

Overall

640,000 have content

2 % are rich

25 % have only links to literature

Hot List

28 % of 75K are rich

Average richness = 0.30

Red Hot List

56 % of 3K are rich

Average richness = 0.43

Strategies for improving richness

Crowd-sourcing

Collections

Communities

Mobile apps

Leveraging

Enabling platforms

Enabling journals

Data mining BHL etc.

Version 2 coming in Fall 2011!

The page richness index

Helps fill gaps with existing knowledge

Helps prioritize funding and training so that it has maximum impact on closing true gaps

Will be available via API

Computing and storing richness index on EOL is a step towards storing and serving computable data

Summarize data within a partner, then across partners.

For example: compute an average value for one taxon (x specimens), compare to range of values across all taxa (621,393 samples)
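The two-step summary described above can be sketched as follows. The record layout, the function names, and the sample values are all hypothetical, and the sketch compares a taxon's mean against the range of per-taxon means, which is only one possible reading of "range of values across all taxa".

```python
from collections import defaultdict


def summarize_partner(records):
    """Reduce one partner's (taxon, value) records to {taxon: (sum, count)}."""
    totals = defaultdict(lambda: [0.0, 0])
    for taxon, value in records:
        totals[taxon][0] += value
        totals[taxon][1] += 1
    return {taxon: (s, n) for taxon, (s, n) in totals.items()}


def combine_partners(summaries):
    """Merge per-partner summaries into one {taxon: mean} across partners."""
    merged = defaultdict(lambda: [0.0, 0])
    for summary in summaries:
        for taxon, (s, n) in summary.items():
            merged[taxon][0] += s
            merged[taxon][1] += n
    return {taxon: s / n for taxon, (s, n) in merged.items()}


def relative_position(taxon, means):
    """Place one taxon's mean within the range of all taxon means (0 to 1)."""
    lo, hi = min(means.values()), max(means.values())
    if hi == lo:
        return 0.0
    return (means[taxon] - lo) / (hi - lo)


# Made-up measurements from two hypothetical partners.
partner_a = [("Gadus morhua", 100.0), ("Gadus morhua", 200.0), ("Taxon X", 10.0)]
partner_b = [("Gadus morhua", 150.0), ("Taxon Y", 510.0)]

means = combine_partners([summarize_partner(partner_a),
                          summarize_partner(partner_b)])
print(means["Gadus morhua"], relative_position("Gadus morhua", means))
```

Keeping sums and counts per partner, rather than per-partner averages, lets the cross-partner mean weight every specimen equally.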

Dynamic data summaries = new knowledge

Jen Hammock (EOL)

Edward van den Berge (OBIS)

Atlantic Cod (Gadus morhua)

Conclusions

There is a lot of data out there in a lot of knowledge bases

Understanding how it is connected can help us improve the ecosystem

• Quality control

• Resilience

• Richness assessment

Large-scale data summaries can foster gap-filling and standing, dynamic knowledge analyses

Thank you

http://www.eol.org

160+ content partners

2000 Flickr contributors

1000s Wikipedia contributors

43,000 EOL members

Funding: John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Cornerstone Institutions, Private Donors

Leadership: Erick Mata, Bob Corrigan, Mark Westneat, Marie Studer, Tom Garnett, Jim Edwards, David Patterson

Developers: Peter Mangiafico, Jeremy Rice, Dimitri Mozzherin, David Shorthouse, Lisa Whalley and others

Biologists: Tanya Dewey, Audrey Aronowsky, Leo Shapiro

See Demo and Version 2 sneak peek in Software Bazaar