keynote: unexpected repurposing

Unexpected Repurposing: the British Library's digital collections and UCL teaching, research and infrastructure Professor Melissa Terras Professor of Digital Humanities, UCL Dept of Information Studies Director, UCL Centre for Digital Humanities [email protected], @melissaterras

Upload: labsbl

Post on 11-Jan-2017


Page 1: Keynote: Unexpected repurposing

Unexpected Repurposing: the British Library's digital collections and UCL teaching, research and infrastructure

Professor Melissa Terras
Professor of Digital Humanities, UCL Dept of Information Studies
Director, UCL Centre for Digital Humanities
[email protected], @melissaterras

Page 4: Keynote: Unexpected repurposing

#openglam

Page 5: Keynote: Unexpected repurposing

British Library, 28th May 2008. https://web.archive.org/web/20110707135434/http://pressandpolicy.bl.uk/Press-Releases/The-British-Library-19th-Century-Book-Digitisation-Project-343.aspx

Returned to the library in 2012 and placed under a CC0 public domain licence for commercial and non-commercial use.

Page 10: Keynote: Unexpected repurposing

Optically Character Recognised (OCR) generated Text

Scanned Page

Page 11: Keynote: Unexpected repurposing

OCR XML generated by ABBYY FineReader

Page 13: Keynote: Unexpected repurposing

https://www.flickr.com/photos/britishlibrary

Page 14: Keynote: Unexpected repurposing

Image on Flickr Commons

https://goo.gl/AC43vs

Page 16: Keynote: Unexpected repurposing

http://blpublicdomain.wikispaces.com/home

Page 18: Keynote: Unexpected repurposing

https://historicaltexts.jisc.ac.uk/results?filter=service%7C%7Cbl&tab=date

Page 19: Keynote: Unexpected repurposing

Data: what can we do with 65,000 books?

224GB compressed ALTO XML
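
As a rough illustration of what working with this data involves: ALTO XML stores each recognised word in the CONTENT attribute of a String element, grouped into TextLine elements. The sketch below pulls plain text lines out of one page. The sample fragment and the function names are hypothetical and heavily trimmed; real British Library files carry namespaces, layout coordinates and confidence scores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO fragment for illustration only.
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock>
          <TextLine>
            <String CONTENT="The"/><SP/><String CONTENT="British"/><SP/><String CONTENT="Library"/>
          </TextLine>
          <TextLine>
            <String CONTENT="digitised"/><SP/><String CONTENT="books"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip an XML namespace, so '{http://...}String' becomes 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return one plain-text string per ALTO TextLine element."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            # Each String child holds one OCR word in its CONTENT attribute.
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

Matching on the local tag name keeps the sketch independent of whichever ALTO schema version a given file declares.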

Page 22: Keynote: Unexpected repurposing

http://www0.cs.ucl.ac.uk/staff/D.Mohamedally/

Page 23: Keynote: Unexpected repurposing

Staff and Students, working together

• James Baker, Adam Farquhar
• Melissa Terras, Dean Mohamedally, Tim Weyrich
• Stefan Alborzpour, Stelios Georgiou, Nektaria Stavrou, Wendy Wong, Jonathan Lloyd, Meral Sahin, Divya Surendran, James Durrant, Muhammad Rafdi, Ali Sarraf

Page 25: Keynote: Unexpected repurposing

Approach

• How can we search the dataset differently?
• Complex and multifaceted needs of humanities researchers
• Boolean and Advanced Search
• Microsoft Azure: 5 APIs were implemented that functionally scale to the data
• Offering unconventional services such as bulk download of text based on metadata queries, word frequency lists, and OCR text previews.
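
A minimal sketch of two of the services named above, a metadata-based selection followed by a word frequency list. This is not the student teams' Azure implementation; the record fields (title, year, text), the sample records and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical records; the real collection pairs metadata with OCR text.
BOOKS = [
    {"title": "Voyage Notes", "year": 1820, "text": "The whale, the sea, and the ship."},
    {"title": "Later Travels", "year": 1875, "text": "The sea! The railway and the city."},
]


def select_books(books, start_year, end_year):
    """Metadata query: keep books published within an inclusive year range."""
    return [book for book in books if start_year <= book["year"] <= end_year]


def word_frequencies(text, top_n=None):
    """Return (word, count) pairs, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(top_n)


selected = select_books(BOOKS, 1800, 1850)
combined_text = " ".join(book["text"] for book in selected)
print(word_frequencies(combined_text, top_n=2))
```

Lower-casing before counting is itself a normalisation decision, and one a researcher may want recorded alongside the resulting list.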

Page 30: Keynote: Unexpected repurposing

github.com/BL-publicdomain/blpublicdomain

Page 32: Keynote: Unexpected repurposing

picaguess.herokuapp.com, dx.doi.org/10.5281/zenodo.15980

James Baker, Tim Weyrich, Dean Mohamedally

Jonathan Lloyd, Meral Sahin, Divya Surendran

Page 33: Keynote: Unexpected repurposing

http://blbigdata.herokuapp.com/

James Baker, Tim Weyrich, Dean Mohamedally

Ali Sarraf, James Durrant, Muhammad Rafdi

Page 36: Keynote: Unexpected repurposing

github.com/UCL-dataspring

Page 38: Keynote: Unexpected repurposing

Method

• 65k books from the British Library:
• 17th - 19th century
• 224GB compressed ALTO XML
• UCL High Performance Computing
• Support from RITS and UCLDH
• 4 humanities researchers
• Turn research questions into computational queries
• Learn from the researchers about their needs, wants, desires, and method.

Page 39: Keynote: Unexpected repurposing

Results

Page 40: Keynote: Unexpected repurposing

Taking Humanities data to HPC…

https://www.flickr.com/photos/epublicist/3546059144

Page 42: Keynote: Unexpected repurposing

Case Study 1: History of Medicine, Oliver Duke-Williams, UCL

Page 43: Keynote: Unexpected repurposing

Case Study 2: History of Images, Will Finley, Sheffield

Page 44: Keynote: Unexpected repurposing

What did this tell us?

• Best practice recommendations:
– Derived datasets for home use
– Documenting decisions
– Fixed/defined dataset
– Normalisations

Page 48: Keynote: Unexpected repurposing

Common Queries

• searches for all variants of a word
• searches that return keywords in context traced over time
• NOT searches for a word or phrase that ignored another word or phrase
• searches for a word when in close proximity to a second word
• searches based on image metadata

…. All returned in a derived dataset, in context.
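
Three of these query types can be sketched in a few lines: keywords in context, proximity, and NOT searches. This is an illustrative sketch over a tokenised text, not the project's HPC code; the function names, window sizes and sample sentence are assumptions.

```python
import re


def tokenize(text):
    """Lower-case a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_in_context(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


def near(tokens, first, second, distance=5):
    """True if `first` occurs within `distance` tokens of `second`."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= distance
               for i in first_positions for j in second_positions)


def contains_but_not(tokens, wanted, unwanted):
    """NOT search: the text has `wanted` and does not have `unwanted`."""
    return wanted in tokens and unwanted not in tokens


sample = tokenize("The cholera outbreak reached the city in summer "
                  "and the fever followed")
print(keyword_in_context(sample, "cholera", window=2))
print(near(sample, "cholera", "city"))
print(contains_but_not(sample, "cholera", "plague"))
```

Returning the surrounding words along with each hit is what makes the result a derived dataset "in context" that a researcher can take away and read.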

Page 49: Keynote: Unexpected repurposing

Do try this at home…

1. Invest in research software engineer capacity to deploy and maintain openly licensed large-scale digital collections from across the GLAM sector in order to facilitate research in the arts, humanities and social and historical sciences.

2. Invest in training library staff to run these initial queries in collaboration with humanities faculty, to support work with subsets of data that are produced, and to document and manage resulting code and derived data.

Page 52: Keynote: Unexpected repurposing

github.com/UCL-dataspring

Page 54: Keynote: Unexpected repurposing

With thanks to

• BL Labs and Digital Curators: James Baker, Adam Farquhar, Mahendra Mahey, Ben O’Steen, Hana Lewis

• UCL CS Student Project Team: James Baker, Tim Weyrich, Dean Mohamedally

• Bluclobber Project Team: James Baker, James Hetherington, David Beavan, Anne Welsh, Helen O’Neill, Will Finley, Oliver Duke-Williams, Adam Farquhar.

• UCL Research IT Services: James Hetherington, Clare Gryce, Raquel Algere.
