bhl tech status update tech director w.ulate 2015.12.11
TRANSCRIPT
Status Update: what are BHL’s current priorities?
William Ulate, Martin Kalfatovic
S. Dillon Ripley Center, Smithsonian Institution, Washington, DC
12 November 2015
Technical Group at MBG
Mike Lichtenberg, Developer
Trish Rose-Sandler, Data Analyst
William Ulate, Technical Director
Technical Support
MBG IT Division: manages servers, systems and telecommunications; installs software needed
And others: MBL, Smithsonian, Internet Archive, BHL-Australia, BHL-Europe
Technical Advisory Group
BHL Work Plan 2015 - 2017
http://bit.ly/2136eP7
Priorities
BHL Infrastructure at SIL
DOIs for Articles
NEH Art of Life ✓
IMLS Purposeful Gaming: to end by Nov. 30th
IMLS Mining Biodiversity: to end by Dec. 31st (1-year no-cost extension)
Art of Life
Currently...
We have now processed the 35 million pages that we committed to for the project.
Significantly more BHL images are now available on Flickr (see BHL blog post on tagging over 1 million images).
We launched a selection of 19th century BHL images on the Zooniverse platform this week (see BHL blog post on Science Gossip).
Science Gossip (sciencegossip.org/)
Purposeful Gaming
[Slide shows a sample of garbled OCR output from a scanned page]
OCR Improvements
Gaming
Transcription
Purposeful gaming and BHL: engaging the public in improving and enhancing access to digital texts
IMLS Grant Program: National Leadership Grants for Libraries
Partners: Missouri Botanical Garden, Harvard University, Cornell University, New York Botanical Garden
P.I.: Trish Rose-Sandler, Missouri Botanical Garden
Dates: Dec 2013 – Nov. 2015
Project objectives and benefits
Test new means of crowdsourcing to support the enhancement of content in BHL
Demonstrate whether digital games are an effective tool for analyzing and improving digital outputs from OCR and transcription
Benefits of gaming include:
improved access to content by providing richer and more accurate data;
an extension of limited staff resources; and
exposure of library content to communities who may not otherwise know about the collections.
Beanstalk and Smorball
Smorballgame.org
Beanstalkgame.org
Purposeful Gaming
Trish Rose-Sandler, Principal Investigator, Purposeful Gaming and BHL
Smorball and Beanstalk were designed as part of the Purposeful Gaming and BHL project, which explores how digital games can make scanned content more accessible and searchable for cultural institutions.
Based at the Missouri Botanical Garden in St. Louis, Missouri, “Purposeful Gaming and BHL” was established in 2013 through an Institute of Museum and Library Services (IMLS) grant and includes partners at Harvard University, Cornell University, and The New York Botanical Garden.
Best Serious Game Award at the Boston Festival of Indie Games 2015
Purposeful Gaming
OCR Improvements
German text interpreted by the OCR process as: “unb auf ben ©elnrgen be6 fublic{)en”
OCR Improvements
Different texts resulting from parsing the phrase: “und auf den Gebirgen des südlichen Deutschlands”
(“and on the mountains of southern Germany”)
#  IA OCR           OCR 2         Transcription 1  Transcription 2
1  unb              und           und              und              Ok
2  den              ben           den              den              Ok
3  ©elnrgen         ©ebirgen      Bebirgen         Gebirgen         X
4  be6              des           de5              des              Chk
5  fublic{)en       fublichen     Füdlichen        Südlichen        X
6  £)eittfc{)(anb6  Deutfchlanbs  Deutfchlands     Deutschlands     X
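The Ok / Chk / X marks in the comparison above appear to follow how many of the four readings agree on a word. The slides do not state the rule, so the sketch below is only an illustration under an assumed one: three or more matching readings count as "Ok", exactly two as "Chk" (check), and no agreement as "X". The function name classify and the thresholds are hypothetical, not part of the project's actual pipeline.

```python
from collections import Counter

# Readings of each word, in column order:
# IA OCR, OCR 2, Transcription 1, Transcription 2 (values copied from the slide).
READINGS = [
    ("unb", "und", "und", "und"),
    ("den", "ben", "den", "den"),
    ("©elnrgen", "©ebirgen", "Bebirgen", "Gebirgen"),
    ("be6", "des", "de5", "des"),
    ("fublic{)en", "fublichen", "Füdlichen", "Südlichen"),
    ("£)eittfc{)(anb6", "Deutfchlanbs", "Deutfchlands", "Deutschlands"),
]


def classify(readings):
    """Return (status, best_reading) for one word position.

    Assumed rule (inferred from the slide, not documented there):
    3+ identical readings -> "Ok", exactly 2 -> "Chk", otherwise "X".
    On a 2-2 tie the reading seen first is returned with status "Chk".
    """
    word, votes = Counter(readings).most_common(1)[0]
    if votes >= 3:
        return "Ok", word
    if votes == 2:
        return "Chk", word
    return "X", None  # no two sources agree, so no candidate is proposed


if __name__ == "__main__":
    for number, readings in enumerate(READINGS, start=1):
        status, word = classify(readings)
        print(number, status, word)
```

Under this assumed rule the six statuses match the marks on the slide. Words marked "X" are the ones where neither OCR engine nor transcription agrees with another source.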
http://miningbiodiversity.com/
http://miningbiodiversity.org/
Mining Biodiversity
Mining Biodiversity: Enriching Biodiversity Heritage with Text Mining and Social Media
One of the international projects that won in the third round of the 2013 Digging Into Data Challenge
Promote the development of innovative computational techniques to apply to big data in the humanities and social sciences
The National Centre for Text Mining (UK)
Missouri Botanical Garden (US)
Dalhousie University's Big Data Analytics Institute (Canada)
Social Media Lab (Canada)
MiBIO: Mining Biodiversity
1. Automatically correct OCR text errors.
2. Crowdsource annotation of legacy texts with semantic metadata.
3. Adapt text mining techniques to extract terminology, entities and significant events automatically and to track terminology evolution over time.
4. Use interactive visualization techniques to help users manage search results through next-generation browsing capabilities, assisted by a semantic similarity network of important terms and entities.
5. Design a social media layer, serving as an environment for diverse users to interact and collaborate on science, public education, awareness and outreach.
MiBIO: Mining Biodiversity
Crowdsource Markup
Display text                                Species Profile Model category
General/summary                             TaxonBiology
Geographic range                            Distribution
Habitat                                     Habitat
Food sources and feeding behavior           TrophicStrategy
Physical description (general)              Description
Physical description (detailed morphology)  DiagnosticDescription
Thank you
William Ulate
BHL Technical Director
Missouri Botanical Garden
[email protected]: william_ulate_r