Martin Graham & Jessie Kennedy, Edinburgh Napier University. VESPER: Visual Exploration of Species-Referenced Repositories


Post on 18-Dec-2015



• VESPER – an exploration into data quality issues for Darwin Core Archives (DWCA)

• DWCAs are files for storing detailed species-based data sets

• How does a user know which data sets are useful and complete?

Introduction

• GBIF has tools to test DWCA validity

• This work is about visualising data that we assume is “valid” but whose “usefulness” we are unsure of
  – Taxonomy is broken
  – Dates are wrong
  – Lions in the sea

• In many cases the usefulness of such data is only seen when visualised in context

Valid vs. Useful

• Web-based visualisation of DWCAs
  – Uses HTML5
    • SVG, CSS3, FileWriters, ArrayBuffers
  – D3 toolkit
  – Client side only

• Visualise basic dimensions of data
  – Taxonomy
  – Geography
  – Time
  – & Miscellaneous Stats

Approach

Darwin Core Archives

[Diagram: the meta files (XML), Meta.xml and Eml.xml, describe the data files (CSV). The data files are exactly one core file (taxa or occurrence data) and zero or more extension files, linked by Extension ID == Core ID.]
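The core/extension link described above can be sketched as a simple grouping step. This is only an illustration, assuming rows have already been parsed into objects; the property names `id` and `coreId` and the function name are hypothetical, not taken from VESPER.

```javascript
// Minimal sketch: attach extension rows to the core row they reference.
// Assumes each core row has a unique `id` and each extension row a `coreId`
// (both property names are hypothetical).
function groupExtensionsByCoreId(coreRows, extensionRows) {
  const byId = new Map();
  for (const row of coreRows) {
    byId.set(row.id, { core: row, extensions: [] });
  }
  const orphans = []; // extension rows whose ID matches no core row
  for (const ext of extensionRows) {
    const entry = byId.get(ext.coreId);
    if (entry) {
      entry.extensions.push(ext);
    } else {
      orphans.push(ext);
    }
  }
  return { byId, orphans };
}
```

Collecting the unmatched extension rows separately is a design choice made here: such orphans are one possible data quality signal.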

• Zip files make things smaller
  – Good for network transport
  – But analysing the data means we have to make things big again

Zapped by Zip

[Diagram: zipped archive → expand a lot → expand even more (string copying, UTF-16, etc.)]
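A rough way to see the second expansion step is to compare the UTF-8 byte size of some text with its size at two bytes per UTF-16 code unit. This is a sketch under that assumption only; JavaScript engines may store strings more compactly, and the function name is hypothetical.

```javascript
// Minimal sketch: compare on-disk (UTF-8) size with a naive in-memory
// estimate of two bytes per UTF-16 code unit. Real engines may differ.
function estimateSizes(text) {
  const utf8Bytes = new TextEncoder().encode(text).length;
  const utf16Bytes = text.length * 2; // text.length counts UTF-16 code units
  return { utf8Bytes, utf16Bytes };
}
```

For plain ASCII text such as most scientific names, this estimate is double the UTF-8 size, before any copies made while splitting lines and fields.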

• Partial unzip

• Analyse fields listed in meta file
  – Disregard verbose fields

• Find combinations of fields that can be used to generate a visualisation

• List the available visualisations for a meta.xml and extract only the chosen fields

Zip Zapped
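The "extract only the chosen fields" step can be sketched as a column filter over already-decompressed text. This does not show the partial unzip itself, and it assumes a simple delimited file with no quoted delimiters or embedded newlines; the function name and parameters are hypothetical.

```javascript
// Minimal sketch: keep only the wanted column indices from delimited text.
// Assumes no quoted fields containing the delimiter or line breaks.
function extractColumns(csvText, wantedIndices, delimiter = ",") {
  const rows = [];
  for (const line of csvText.split(/\r?\n/)) {
    if (line === "") continue; // skip blank lines, e.g. a trailing newline
    const cells = line.split(delimiter);
    rows.push(wantedIndices.map((i) => (cells[i] !== undefined ? cells[i] : "")));
  }
  return rows;
}
```

Dropping verbose columns such as free-text remarks at this stage means they are never held as long-lived strings in memory.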

Implicit Taxonomy: acceptedNameUsageID, parentNameUsageID

Explicit Taxonomy: any of kingdom, order, family, genus, etc.

Map: decimalLongitude, decimalLatitude

Timeline: eventDate
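The field-to-visualisation mapping above can be sketched as a lookup over the field names found in a meta.xml. The rank list (including phylum and class), the requirement that both ID fields be present for the implicit taxonomy, and all names in the code are assumptions for illustration.

```javascript
// Minimal sketch: decide which visualisations a set of field names supports.
// RANKS is an assumed expansion of the slide's "kingdom, order, family, genus etc".
const RANKS = ["kingdom", "phylum", "class", "order", "family", "genus"];

function availableVisualisations(fieldNames) {
  const fields = new Set(fieldNames.map((f) => f.toLowerCase()));
  const has = (name) => fields.has(name.toLowerCase());
  const result = [];
  if (has("acceptedNameUsageID") && has("parentNameUsageID")) result.push("implicitTaxonomy");
  if (RANKS.some((rank) => has(rank))) result.push("explicitTaxonomy");
  if (has("decimalLongitude") && has("decimalLatitude")) result.push("map");
  if (has("eventDate")) result.push("timeline");
  return result;
}
```

For example, an occurrence file with only coordinates and dates would offer the map and timeline but neither taxonomy view.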

• Sunburst / Icicle plot
  – Some difficulties with high fan-out taxa
  – Though a lot of these are data quality issues

Taxonomy
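Sunburst and icicle plots need a nested tree. One way to build it from parent references is sketched below, assuming unique `taxonID` values and rows already parsed into objects; the function name and the output shape are hypothetical, not VESPER's own.

```javascript
// Minimal sketch: turn flat rows with parentNameUsageID into nested nodes.
// Assumes taxonID values are unique. Rows whose parent is missing or
// unknown become roots, which can itself hint at a broken taxonomy.
function buildTaxonTree(rows) {
  const nodes = new Map();
  for (const row of rows) {
    nodes.set(row.taxonID, { id: row.taxonID, name: row.scientificName, children: [] });
  }
  const roots = [];
  for (const row of rows) {
    const node = nodes.get(row.taxonID);
    const parent = nodes.get(row.parentNameUsageID);
    if (parent && parent !== node) {
      parent.children.push(node);
    } else {
      roots.push(node);
    }
  }
  return roots;
}
```

An unexpectedly large number of roots, or one node with a very long `children` array, corresponds to the high fan-out difficulty mentioned on the slide.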


• Based on the popular Leaflet.js library
  – And the Markercluster plugin
  – Some adaptations to show selected items

Geography

• Simple bar chart
  – With range slider
  – Zoom in and see yearly patterns (e.g. not much at Christmas)

Temporal
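The data behind such a bar chart could be produced by binning `eventDate` values, sketched here by year. It assumes dates begin with a four-digit year in ISO 8601 style; anything else is counted as unparsed rather than guessed. The function name is hypothetical.

```javascript
// Minimal sketch: count records per year from eventDate strings.
// Assumes values start with a four-digit year (e.g. "1999-12-25" or "2001").
function countByYear(eventDates) {
  const counts = new Map();
  let unparsed = 0;
  for (const value of eventDates) {
    const match = /^(\d{4})(?:[-/]|$)/.exec((value || "").trim());
    if (!match) {
      unparsed++;
      continue;
    }
    const year = Number(match[1]);
    counts.set(year, (counts.get(year) || 0) + 1);
  }
  return { counts, unparsed };
}
```

The same approach with a finer key (year and month) would give the within-year pattern the slide mentions.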

• Sanity check
  – Empty data count

Miscellaneous
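An empty data count can be sketched as a per-field tally of missing or blank values. This assumes rows are plain objects keyed by field name; the function name is hypothetical.

```javascript
// Minimal sketch: count empty (missing, null, or whitespace-only) values per field.
function countEmptyValues(rows, fieldNames) {
  const counts = {};
  for (const field of fieldNames) counts[field] = 0;
  for (const row of rows) {
    for (const field of fieldNames) {
      const value = row[field];
      if (value === undefined || value === null || String(value).trim() === "") {
        counts[field]++;
      }
    }
  }
  return counts;
}
```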

• Taxonomic fan-out for hollow curve anomalies

• Export selected IDs
  – These can be saved or sent somewhere else

Miscellaneous
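The fan-out statistic could be computed as below: count the children of each parent, then count how many parents share each fan-out value. In a hollow curve, most parents have few children and a few have many, so outliers stand out. Rows are assumed to be parsed objects with `parentNameUsageID`; the function name is hypothetical.

```javascript
// Minimal sketch: child counts per parent, plus a histogram of those counts.
function fanOutHistogram(rows) {
  const childCounts = new Map();
  for (const row of rows) {
    const parent = row.parentNameUsageID;
    if (!parent) continue; // rows without a parent are treated as roots
    childCounts.set(parent, (childCounts.get(parent) || 0) + 1);
  }
  const histogram = new Map(); // fan-out value -> number of parents with it
  for (const n of childCounts.values()) {
    histogram.set(n, (histogram.get(n) || 0) + 1);
  }
  return { childCounts, histogram };
}
```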

• Selections in one view are reflected in the other views for the same data
  – Multiple views, linking

Selection
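Linked selection across views is often built on a shared selection model that notifies every view when the selection changes. The sketch below shows that general pattern, not VESPER's actual implementation; the class and method names are hypothetical.

```javascript
// Minimal sketch: a shared selection model with listener callbacks.
class SelectionModel {
  constructor() {
    this.selected = new Set();
    this.listeners = [];
  }

  // Register a view callback: (selectedIds: Set, source: string) => void
  subscribe(listener) {
    this.listeners.push(listener);
  }

  // Replace the selection and notify every view, naming the originating view
  // so it can skip redundant work if it wishes.
  setSelection(ids, source) {
    this.selected = new Set(ids);
    for (const listener of this.listeners) {
      listener(this.selected, source);
    }
  }
}
```

Each view (taxonomy, map, timeline) would subscribe once and call `setSelection` when the user selects items, so all views over the same data stay in step.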

• JavaScript visualisations for DWCAs

• Quickly shows areas with quality issues

• Can handle large archives if only key fields are analysed

Conclusion

• http://www.soc.napier.ac.uk/~cs22/vesperDemo/vesper/demoNew.html
  – Feedback welcome

• Thanks to GBIF, Canadensys, EMBL for data

• Funded by BBSRC

• Ask for a demo

Fin