giddens ecn2013

Posted on 23-Jan-2015


Getting collection data, maps, and images online via open source and commercial solutions

Michael Giddens

Software developer with a focus on biodiversity informatics.

Follow me @silverbiology

What we do

• Design workflows and software to optimize image capture
• Analyze and process label images
• Create portals for entomological and scientific collections
• Develop interactive maps to tell stories about data
• Provide support and technical advice for NSF projects

Digitization & Data Capture

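• Seconds count
• 28,800 seconds in an 8-hour workday
• 100k @ 30 seconds = 34.7 days of round-the-clock work (about 104 eight-hour workdays)
• 100k @ 29 seconds = 33.5 days (about 101 workdays)
• …
• 100k @ 15 seconds = 17.3 days (about 52 workdays)
• Humans are not robots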
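The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch; the function name `digitization_days` is hypothetical, and it reports both 24-hour calendar days (the unit the slide's figures match) and 8-hour workdays.

```python
# Back-of-the-envelope throughput for a digitization project.
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400 s in a 24-hour calendar day
SECONDS_PER_WORKDAY = 8 * 60 * 60  # 28,800 s in an 8-hour workday


def digitization_days(n_specimens: int, seconds_each: float) -> tuple[float, float]:
    """Return (24-hour days, 8-hour workdays) needed to process n_specimens."""
    total_seconds = n_specimens * seconds_each
    return total_seconds / SECONDS_PER_DAY, total_seconds / SECONDS_PER_WORKDAY


if __name__ == "__main__":
    for secs in (30, 29, 15):
        days, workdays = digitization_days(100_000, secs)
        print(f"100k @ {secs} s = {days:.1f} days ({workdays:.0f} workdays)")
```

Rounded to one decimal place, 29 s and 15 s give 33.6 and 17.4 days; the slide's 33.5 and 17.3 appear to be truncated rather than rounded. Either way, saving one second per specimen across 100,000 specimens saves more than a full 24-hour day, or roughly three and a half 8-hour workdays.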

Solutions

• Look at every action as a micro task
• Find tasks to fill any wait time
• Stick to a single workflow
• Filename conventions are important
• Produce only the image sizes and formats you need
• Rename files using scanners or data entry, e.g. SilverImage

• Back up your images!!!
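Renaming files from scanned or typed data, rather than by hand, is straightforward to automate. The sketch below assumes a hypothetical filename convention (`INST-COLL-000123.jpg`) and a hypothetical barcode format of three space-separated parts; `name_from_barcode` and `rename_and_back_up` are illustrative names and do not describe how SilverImage actually works. It also copies each renamed file to a backup folder straight away.

```python
import re
import shutil
from pathlib import Path

# Hypothetical filename convention: INST-COLL-<6-digit catalog number>.<jpg|tif>,
# e.g. "ABC-ENT-000123.jpg". Replace the pattern with your own convention.
FILENAME_RE = re.compile(r"[A-Z]{2,6}-[A-Z]{2,6}-\d{6}\.(jpg|tif)")


def name_from_barcode(barcode: str, ext: str) -> str:
    """Build a conventional filename from a scanned barcode.

    Assumes (hypothetically) that the barcode reads "INST COLL NUMBER",
    e.g. "ABC ENT 123".
    """
    inst, coll, number = barcode.split()
    name = f"{inst.upper()}-{coll.upper()}-{int(number):06d}.{ext.lower().lstrip('.')}"
    if not FILENAME_RE.fullmatch(name):
        raise ValueError(f"{name!r} does not follow the filename convention")
    return name


def rename_and_back_up(image: Path, barcode: str, backup_dir: Path) -> Path:
    """Rename a freshly captured image, then copy it to backup_dir."""
    new_path = image.with_name(name_from_barcode(barcode, image.suffix))
    if new_path.exists():
        # Never overwrite: a duplicate name may mean a barcode was scanned twice.
        raise FileExistsError(new_path)
    image.rename(new_path)
    backup_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(new_path, backup_dir / new_path.name)  # copies data and timestamps
    return new_path
```

For example, `name_from_barcode("abc ent 123", ".JPG")` returns `ABC-ENT-000123.jpg`. Validating every generated name against one pattern is what keeps a single workflow consistent across operators.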

Things we learned

• Make sure your lighting environment does not change
• Dragon dictation software is not accurate enough for numbers or scientific terms
• Manually renaming files is slow
• Some student workers do not care as much as you do about your collection
• People get burned out

Data Processing

• Optical Character Recognition Engines
• Machine Learning
• Crowdsourcing
• Human in the Middle

Optical Character Recognition Engines

• Free
  – Tesseract
• Commercial
  – OmniPage
  – ABBYY
• Services
  – www.silverbiology.com
• Font training
• No handwriting recognition solution on the market
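Tesseract, the free engine listed above, is usually driven from its command-line interface. This is a minimal sketch assuming Tesseract 4 or later is installed and on `PATH`; the helper names are hypothetical, and `--psm 6` (treat the image as a single uniform block of text) is only a starting guess for cropped label images.

```python
import shutil
import subprocess


def tesseract_command(image_path: str, lang: str = "eng", psm: int = 6) -> list[str]:
    """Build a Tesseract CLI call that writes recognized text to stdout."""
    # "stdout" as the output base tells Tesseract to print instead of writing a file.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def ocr_label(image_path: str, lang: str = "eng", psm: int = 6) -> str:
    """Run Tesseract on one label image and return the raw, uncorrected text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return result.stdout
```

A font-trained model can be selected through `lang` (for instance `"eng+custom"`, where `custom` stands for whatever name your trained data file uses). The raw output still needs cleanup, since printed labels with small or unusual fonts and handwritten labels remain hard for any engine.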

Machine Learning

• Data Dictionaries
• Conditional logic / Decision Trees
• Past data to predict future data
• Label / Word Boundaries
• Orientation
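Two of these ideas, data dictionaries and conditional logic, can be sketched with the standard library alone. This is a hedged illustration rather than a production pipeline: the `STATES` dictionary, the date pattern, and the field rules are all assumptions. It snaps OCR'd words to the closest dictionary entry with `difflib.get_close_matches`, then applies simple if/else rules to guess which label field a line belongs to.

```python
import difflib
import re

# Hypothetical data dictionary; in practice it could be built from values
# already in the collection database ("past data to predict future data").
STATES = ["Louisiana", "Mississippi", "Kansas", "California", "Texas"]

# Day, month (Arabic or Roman numerals), year, e.g. "12.VII.1998" or "3-4-1975".
DATE_RE = re.compile(r"\b\d{1,2}[-./ ](?:[IVX]{1,4}|\d{1,2})[-./ ]\d{2,4}\b")


def correct_word(word: str, dictionary: list[str], cutoff: float = 0.8) -> str:
    """Replace an OCR'd word with its closest dictionary entry, if close enough."""
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def classify_line(line: str) -> str:
    """Guess which label field a line of OCR text belongs to, using simple rules."""
    if DATE_RE.search(line):
        return "date"
    if line.lower().startswith(("leg.", "coll.", "collector")):
        return "collector"
    words = re.findall(r"[A-Za-z]+", line)
    if any(correct_word(w, STATES) in STATES for w in words):
        return "locality"
    return "unknown"
```

With these rules, an OCR line such as `Lcuisiana: Baton Rouge` is classified as a locality because `Lcuisiana` is close enough to `Louisiana`. A real system would derive its dictionaries and rules from past records instead of hard-coding them.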

Crowdsourcing

Notes From Nature

• http://www.notesfromnature.org

Calbug – Essig Museum Collections

Ornithological – Natural History Museum

Atlas of Living Australia (ALA) Volunteer Program

• http://volunteer.ala.org.au
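Crowdsourcing projects commonly send the same record to several volunteers and reconcile their answers. The sketch below assumes simple majority voting after normalizing case and whitespace; it is not necessarily what Notes from Nature or the ALA program actually does, and `consensus` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def normalize(value: str) -> str:
    """Collapse case and whitespace so trivial differences don't split votes."""
    return " ".join(value.split()).casefold()


def consensus(transcriptions: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the majority transcription, or None if volunteers disagree too much.

    The winning value must be chosen by more than min_agreement of volunteers.
    Records that come back as None are natural candidates for expert review.
    """
    if not transcriptions:
        return None
    counts = Counter(normalize(t) for t in transcriptions)
    winner, votes = counts.most_common(1)[0]
    if votes / len(transcriptions) <= min_agreement:
        return None
    # Return the first original spelling that matches the winning normalized form.
    return next(t for t in transcriptions if normalize(t) == winner)
```

`consensus(["Baton Rouge", "baton  rouge", "Batan Rouge"])` returns `"Baton Rouge"`, while a one-to-one split returns `None` and can be routed to a person.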

Human In The Middle

• Rotating Images
• Tagging Areas
• Metadata tagging
• Identifying False Positives
• Verification Steps
• Bulk Validation
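Several of these steps come down to routing uncertain machine output to a person and batching identical decisions. This is a minimal sketch assuming each extracted value carries a confidence score between 0 and 1 (not every OCR or machine-learning stage provides one); `FieldGuess`, `triage`, and `bulk_batches` are hypothetical names, and the 0.9 threshold is arbitrary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FieldGuess:
    record_id: str
    field: str         # e.g. "stateProvince"
    value: str         # machine-extracted value
    confidence: float  # assumed 0..1 score from OCR or a classifier


def triage(guesses: list[FieldGuess], threshold: float = 0.9):
    """Split guesses into auto-accepted ones and ones a person should verify."""
    accepted = [g for g in guesses if g.confidence >= threshold]
    needs_review = [g for g in guesses if g.confidence < threshold]
    return accepted, needs_review


def bulk_batches(needs_review: list[FieldGuess]) -> dict[tuple[str, str], list[str]]:
    """Group review items that share a field and value, so one human decision
    (accept or correct) can be applied to every record in the batch."""
    batches: dict[tuple[str, str], list[str]] = defaultdict(list)
    for g in needs_review:
        batches[(g.field, g.value)].append(g.record_id)
    return dict(batches)
```

Grouping by (field, value) is what makes bulk validation possible: if many records all read `Lcuisiana`, one correction covers all of them.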

Web Portals

• In-House
• Specify 6 Portal
• Symbiota
• SilverCollection
  – California Academy of Sciences
  – Angelo State Natural History Museum
  – Louisiana State Arthropod Museum
  – Kansas State University Entomology Dept.
  – Mississippi Entomological Museum
  – NLBIF

Explore / Browse

• Taxonomy
• Taxonomy (Filtered)
• Family
• Genus
• Type Status
• Regions
• Collectors
• Custom

Custom Checklists

Spreadsheet Format
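A custom checklist exported in spreadsheet format can be as simple as a CSV of taxa with specimen counts. The sketch assumes records are dictionaries keyed by the Darwin Core terms `family`, `scientificName`, and `stateProvince`; the function names are hypothetical and a portal's real export may include different columns.

```python
import csv
import io
from collections import Counter
from typing import Optional


def checklist_rows(records: list[dict], state_province: Optional[str] = None):
    """Count specimens per (family, scientificName), optionally for one region."""
    counts = Counter(
        (r["family"], r["scientificName"])
        for r in records
        if state_province is None or r.get("stateProvince") == state_province
    )
    return sorted(counts.items())  # sorted by family, then scientific name


def checklist_csv(records: list[dict], state_province: Optional[str] = None) -> str:
    """Render the checklist as CSV text that opens directly in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["family", "scientificName", "specimenCount"])
    for (family, name), n in checklist_rows(records, state_province):
        writer.writerow([family, name, n])
    return buf.getvalue()
```

Filtering on `stateProvince` mirrors browsing by region; filtering on any other field (collector, type status) works the same way.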

Collecting Events

Images

Specimen Details

Reports

Interactive Maps

• Online service to map, analyze, and build applications with your data
• Simple to use
• Easily create distribution maps, heat maps, and category maps
• Access to a full geospatial query engine
• Visualizing ecological models
• Works well with lots of data

GBIF - 350 Million Records

http://www.gbif.org/occurrence

Visualizing two months in the life of seagull Eric

Blog on Lifewatch by Peter Desmet

Interactive Occurrence Data

Interactive Map Modes

• Density Maps
• Polygons
• Grids
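Grid and density views both start from binning occurrence points into cells. The sketch below assumes points are (latitude, longitude) pairs in decimal degrees and emits one GeoJSON polygon per cell with a `count` property, which web mapping tools can typically style as a grid or density layer; the cell size and function names are illustrative choices.

```python
import math
from collections import Counter


def grid_counts(points: list[tuple[float, float]], cell_deg: float = 1.0) -> Counter:
    """Count (lat, lon) points per square cell; keys are the cell's SW corner."""
    counts: Counter = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts


def cells_to_geojson(counts: Counter, cell_deg: float = 1.0) -> dict:
    """Turn grid counts into a GeoJSON FeatureCollection of square polygons."""
    features = []
    for (lat, lon), n in counts.items():
        # GeoJSON uses [lon, lat]; the ring runs counterclockwise and closes on itself.
        ring = [[lon, lat], [lon + cell_deg, lat], [lon + cell_deg, lat + cell_deg],
                [lon, lat + cell_deg], [lon, lat]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"count": n},
        })
    return {"type": "FeatureCollection", "features": features}
```

Using `math.floor` rather than `int` keeps cells correct in the western and southern hemispheres, where coordinates are negative.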

Useful Tools Provided By the Global Biodiversity Information Facility

http://tools.gbif.org

• Darwin Core Archive Assistant
• Darwin Core Archive Validator
• Higher Taxonomy Services
• Name Finder
• Name Parser
• GBIF API Services
• Integrated Publishing Toolkit (IPT)
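The GBIF API services can be queried over plain HTTPS. This sketch builds a request to the v1 occurrence search endpoint using the `scientificName`, `country`, `hasCoordinate`, `limit`, and `offset` parameters and pulls coordinates out of the `results` list. The helper names are hypothetical, and current limits and parameters should be checked against the GBIF API documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

GBIF_API = "https://api.gbif.org/v1"


def occurrence_search_url(scientific_name: str, country: Optional[str] = None,
                          limit: int = 20, offset: int = 0) -> str:
    """Build an occurrence search URL for records that have coordinates."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",
        "limit": limit,
        "offset": offset,  # increase to page through large result sets
    }
    if country:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "US"
    return f"{GBIF_API}/occurrence/search?{urllib.parse.urlencode(params)}"


def fetch_coordinates(scientific_name: str, country: Optional[str] = None,
                      limit: int = 20) -> list[tuple]:
    """Fetch one page of occurrences and return (latitude, longitude) pairs."""
    url = occurrence_search_url(scientific_name, country, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    return [(r.get("decimalLatitude"), r.get("decimalLongitude"))
            for r in data.get("results", [])]
```

The pairs returned by `fetch_coordinates` can be fed straight into a grid or density map.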

Global Names Architecture

http://www.globalnames.org

• Global Names Recognition and Discovery
• Global Names Index
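Name-finding services such as Global Names Recognition and Discovery do far more than pattern matching, but the core idea can be sketched crudely: look for a capitalized word followed by a lowercase epithet, then keep only candidates whose genus appears in a known list. This is a simplified illustration, not the GNRD algorithm; `find_candidate_names` and the genus list are assumptions.

```python
import re

# Capitalized genus (or an abbreviation like "A.") followed by a lowercase epithet.
BINOMIAL_RE = re.compile(r"\b([A-Z][a-z]+|[A-Z]\.)\s+([a-z]{3,})\b")


def find_candidate_names(text: str, known_genera: set[str]) -> list[str]:
    """Return "Genus epithet" strings whose genus is in known_genera.

    The genus check filters out ordinary capitalized phrases ("Collected near").
    Abbreviated genera ("A. mellifera") are kept as-is for later resolution.
    """
    names = []
    for genus, epithet in BINOMIAL_RE.findall(text):
        if genus in known_genera or genus.endswith("."):
            names.append(f"{genus} {epithet}")
    return names
```

A dictionary-free version would flag far more false positives, which is one reason services backed by a large name index, such as the Global Names Index, are worth using.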

Questions?

Michael Giddens www.silverbiology.com
