From documents to datasets -- mining the Junius Henderson Field Notes for species occurrence records
EXTRACTING OBSERVATIONS FROM CENTURY-OLD FIELD NOTEBOOKS
What Henderson Saw
Andrea Thomer (UIUC), Gaurav Vaidya (CU-B), Robert Guralnick (CU-B), David Bloom (UC-B) & Laura Russell (KU)
or
MINING THE JUNIUS HENDERSON FIELD NOTES FOR SPECIES OCCURRENCE RECORDS
From documents to datasets
Field notes and Biodiversity science
• Field work is central to biodiversity work
• Field notes:
  • Are central to field work
  • Are typically stored in archives
  • But contain data
• Data wants to be free!
Biodiversity science and “first person precision”
• We often forget that field notes store data
• Value of field notes is in the combination of qualitative/quantitative data (Kramer, 2011)
• Grinnell: “first person precision” (1912)
• How do we free the data, while also preserving the record of its context of production?
Junius Henderson
• A typical natural history “old-timer”
  • Had a mustache
  • Wore suspenders
  • Wrote snarky comments in his field notes about young whippersnappers and trains
• Studied clams
Influential in small but lasting ways, but not well-known beyond Boulder
Henderson’s field notes
• 13 notebooks, 1 locality notebook
• 1672 pages of notes total
• Prolific collector
• Numerous photographs
• 1905: Began field work for CU Museum
• 2000-2002: Transcribed by Dr. Peter Robinson
• 2006: NSIDC scanned the Henderson notebooks
• 2011-2012: Annotation and data extraction
The Henderson Field Note Project
• We were looking for a low-tech digitization project
• Rob knew of the existence of the transcribed notes
• “What can we accomplish with five hours of work each?”
• Goals:
  • Make notes freely available
  • Try to engage volunteers on the internet
  • Produce one “neat thing” (a visualization, a map, etc.)
Challenges in making notes available
• No time!
• No resources!
• No time!
• No repository!
• No platform!
• No time!
Solutions to challenges (ver. 1)
• No sleeping!
• Use free resources!
• Guerrilla takeover of Wikisource!
• Profit!
Wikisource
• Part of Wikimedia Foundation, as is Wikipedia
• Has its own “collections” or “accessions” policies
  • All docs from before 1923
  • Post-1922: Documentary sources, peer-reviewed scientific research, analytical & artistic works
• Support for “adding value” via transcription, translation, annotation, and more
Basic Project Steps
• Upload notebooks to Wikisource
• Match transcriptions to scans by hand
• Create templates to support annotation
• Advertise project; attract volunteers
• Write simple script to extract annotations
• Publish those via IPT installation as a DwC-A
• Sleep
Annotation Templates
• Anyone can annotate the transcribed text to tag elements
• Ex. “I saw a white-tailed jack rabbit” becomes:
  “I saw a {{taxon|Lepus townsendii|white tailed jack rabbit}}.”
Annotation Templates
{{taxon|Lepus townsendii|white tailed jack rabbit}}
• taxon: type of annotation
• Lepus townsendii: Wikipedia link
• white tailed jack rabbit: verbatim text
Note (on the Wikipedia link): “white tailed jack rabbit” would work here as well.
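A minimal sketch of how annotations in this three-part form could be pulled out of a page's wikitext with a regular expression. This is not the project's actual script: the `Annotation` type, the `extract_annotations` function, and the handling of a two-part form without verbatim text are assumptions made for illustration.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    kind: str      # type of annotation, e.g. "taxon"
    link: str      # Wikipedia link target
    verbatim: str  # text as written in the notebook


# Matches {{kind|link|verbatim}}; the third part is optional (an assumption).
TEMPLATE = re.compile(r"\{\{(\w+)\|([^|{}]+)(?:\|([^|{}]+))?\}\}")


def extract_annotations(wikitext: str) -> list[Annotation]:
    """Return every template-style annotation found in the wikitext."""
    found = []
    for match in TEMPLATE.finditer(wikitext):
        kind, link, verbatim = match.groups()
        # Assumption: with no verbatim part, the link doubles as the text.
        found.append(Annotation(kind, link.strip(), (verbatim or link).strip()))
    return found


line = "I saw a {{taxon|Lepus townsendii|white tailed jack rabbit}}."
for annotation in extract_annotations(line):
    print(annotation)
```

A regular expression is enough for flat templates like this one; nested templates would need a real wikitext parser.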
Basic Project Steps
• Upload notebooks to Wikisource
• Match transcriptions to scans by hand
• Create templates to support annotation
• Advertise project; attract volunteers
• Write simple script to extract annotations
• Write complex scripts to extract annotations and compile them into occurrences
• Extensively review occurrences
• Taxonomic referencing
• Publish those via IPT installation as a DwC-A
• Sleep
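To make the "compile them into occurrences" step concrete, here is a rough sketch under one simple assumed rule: every taxon annotation in a notebook entry becomes one occurrence, paired with that entry's first date annotation and all of its place annotations. The slides do not state the project's actual rule, the `Occurrence` class and `compile_occurrences` function are hypothetical, and the sample entry uses made-up values rather than text from the notebooks.

```python
from dataclasses import dataclass


@dataclass
class Occurrence:
    taxon: str          # Wikipedia link from the taxon annotation
    verbatim_name: str  # name as written in the notebook
    date: str           # first date annotation in the entry, or ""
    places: list[str]   # all place annotations in the entry


def compile_occurrences(entry_annotations):
    """Turn one entry's (kind, link, verbatim) tuples into occurrences."""
    dates = [link for kind, link, _ in entry_annotations if kind == "date"]
    places = [link for kind, link, _ in entry_annotations if kind == "place"]
    date = dates[0] if dates else ""
    return [
        Occurrence(link, verbatim, date, list(places))
        for kind, link, verbatim in entry_annotations
        if kind == "taxon"
    ]


# Illustrative entry only; the date and place values are invented.
entry = [
    ("date", "1905-07-01", "July 1, 1905"),
    ("place", "Boulder, Colorado", "Boulder"),
    ("taxon", "Lepus townsendii", "white tailed jack rabbit"),
]
for occurrence in compile_occurrences(entry):
    print(occurrence)
```

A rule this simple will mis-assign places and dates when an entry covers several stops, which is one reason the extracted occurrences need extensive review.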
Taxonomic Referencing
• Remember that “Wikipedia link”?
• We want to check if that is a valid taxonomic name
• How?
• Easy, right? Just check against a resolver!
• Hard! Which resolver? How to verify?
  1) Check name against ITIS and EOL.
  2) Possible outcomes:
     a) Both concordant! YAY!
     b) No results from either. Boo!
     c) Discordant results. Need HUMANS!
  3) This was LOTS of work (thanks, Gaurav!)
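The three outcomes can be sketched as a small classification function. This is an illustration, not the project's code: it assumes the ITIS and EOL lookups have already been run and each returned either a name or nothing, and it makes no calls to either service. Treating a result from only one resolver as discordant is also an assumption; the slides do not cover that case.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so trivial differences match."""
    return " ".join(name.split()).casefold()


def classify(itis_name: Optional[str], eol_name: Optional[str]) -> str:
    """Classify one name given the result from each resolver (None = no hit)."""
    if itis_name is None and eol_name is None:
        return "no results"
    if (
        itis_name is not None
        and eol_name is not None
        and normalize(itis_name) == normalize(eol_name)
    ):
        return "concordant"
    # Assumption: a hit from only one resolver also goes to a human.
    return "needs human review"


print(classify("Lepus townsendii", "Lepus townsendii"))
print(classify(None, None))
print(classify("Lepus townsendii", None))
```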
Results!
• 3 Notebooks posted and fully annotated

| | Notebook 1 | Notebook 2 | Notebook 3 |
|---|---|---|---|
| Downloaded on | March 27, 2012 | March 27, 2012 | March 27, 2012 |
| Pages processed | 112 of 114 | 120 of 123 | 120 of 122 |
| Number of entries | 62 of 64 | 62 of 63 | 98 of 99 |
| Number of annotations | 632 | 703 | 1007 |
| Taxon annotations | 349 (201 unique) | 224 (125 unique) | 514 (248 unique) |
| Place annotations | 219 (115 unique) | 419 (154 unique) | 401 (139 unique) |
| Date annotations | 64 (63 unique) | 60 (59 unique) | 92 (90 unique) |
| Dates in range | July 1905 to April 1907 | May 1907 to October 1908 | January 1909 to September 1909 |
Results!... With caveats
• 3 Notebooks posted and mostly annotated
• 1076 occurrences extracted
• A published Darwin Core Archive!
  • Most of our project’s Skype calls were about DwC term use
• A ZooKeys paper (hopefully)
• A lot more questions…
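As a rough illustration of what DwC term use means in practice, the sketch below writes occurrences to a CSV whose column headers are Darwin Core terms. The choice of terms is an assumption, not the mapping the project settled on; `occurrences_to_csv` and the sample record (including its identifier, date, and locality) are made up for the example. A full Darwin Core Archive also needs a `meta.xml` descriptor, which the IPT generates and which is not shown here.

```python
import csv
import io

# Darwin Core terms used as column headers (an assumed selection).
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "vernacularName",
    "eventDate",
    "verbatimLocality",
    "recordedBy",
]


def occurrences_to_csv(occurrences) -> str:
    """Serialize a list of dicts keyed by Darwin Core terms as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in occurrences:
        # Missing terms are written as empty cells rather than raising.
        writer.writerow({field: row.get(field, "") for field in DWC_FIELDS})
    return buffer.getvalue()


# Illustrative record only; the identifier, date, and locality are invented.
sample = [{
    "occurrenceID": "henderson-example-1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Lepus townsendii",
    "vernacularName": "white tailed jack rabbit",
    "eventDate": "1905-07-01",
    "verbatimLocality": "Boulder",
    "recordedBy": "Junius Henderson",
}]
print(occurrences_to_csv(sample))
```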
What challenges remain?
• How do we georeference these occurrences?
• How do we maintain ties between DwC records and field notes?
• How do we assign unique identifiers to wiki tags?
• Is Wikisource the best place for this data?
Why this could work for you too:
• Wikimedia projects really are community driven
• We can all be a part of this community – if we do the work
• Your lab, archive or library has as many or more potential contributors as our project
• There are many flexible transcription platforms in addition to Wikipedia
This entire project was only possible because people had
been making small steps towards digitization over the
last 10 years
Questions?
• References:
  • Grinnell J (1912) An Afternoon’s Field Notes. The Condor, 14(3), 104-107. Retrieved from http://www.jstor.org/stable/1362226.
  • Kramer KL (2011) The spoken and the unspoken. In M. R. Canfield (Ed.), Field Notes on Science & Nature. Cambridge, Massachusetts: Harvard University Press.
• For more about Henderson, see our blog! http://soyouthinkyoucandigitize.wordpress.com/category/henderson-project/