BL Labs 2014 Symposium: The Mechanical Curator
![Page 1: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/1.jpg)
Building Bridges (and rapid depreciation)
![Page 2: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/2.jpg)
![Page 3: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/3.jpg)
![Page 4: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/4.jpg)
David Foster Wallace, on Ambition:
“You know, the whole thing about perfectionism. The perfectionism is very dangerous, because of course if your fidelity to perfectionism is too high, you never do anything.
Because doing anything results in— It’s actually kind of tragic because it means you sacrifice how gorgeous and perfect it is in your head for what it really is.”- As told to Leonard Lopate on WNYC on March 4, 1996.
(emphasis my own) http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
![Page 5: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/5.jpg)
The unifying theme to (pretty much) all the requests:
![Page 6: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/6.jpg)
Give me EVERYTHING!
![Page 7: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/7.jpg)
(that might be important to my work)
![Page 8: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/8.jpg)
Fetch!
![Page 9: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/9.jpg)
Why?
“Can’t they just find the things they want through the catalogue?”
![Page 10: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/10.jpg)
1. If they knew which bits of data were necessary, they would already know the answers.
![Page 11: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/11.jpg)
![Page 12: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/12.jpg)
“I am interested in travel accounts in Europe during the 19th Century”
![Page 13: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/13.jpg)
2. If a conventional search interface worked, they wouldn’t be asking.
![Page 14: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/14.jpg)
How does conventional search work anyway? Under what assumptions?
Starts with the Text:
“I quickly explained that many big jobs involve a few hazards.”
![Page 15: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/15.jpg)
Then it is Tokenised (with some assumptions on how this is possible):
“I”, “quickly”, “explained”, “that”, “many”, “big”, “jobs”, “involve”, “a”, “few”, “hazards”
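A minimal sketch of the tokenising step, assuming that words are runs of letters separated by spaces or punctuation (an assumption that holds for this English sentence but not for every language or script). The function name `tokenise` is illustrative, not taken from any particular search engine.

```python
import re

def tokenise(text):
    """Split text into word tokens: runs of letters or apostrophes."""
    return re.findall(r"[A-Za-z']+", text)

sentence = "I quickly explained that many big jobs involve a few hazards."
tokens = tokenise(sentence)
print(tokens)
```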
![Page 16: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/16.jpg)
Then, the most common words are removed as these are assumed to be unimportant. (Stopwords)
“quickly”, “explained”, “many”, “big”, “jobs”, “involve”, “few”, “hazards”
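Stopword removal can be sketched as a simple filter. The stopword list below is a tiny hypothetical one chosen for this sentence; real services ship much longer, language-specific lists.

```python
# Hypothetical, deliberately tiny stopword list (an assumption for this example).
STOPWORDS = {"i", "a", "that", "the", "of", "and", "to", "in"}

def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens whose lower-cased form is in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]

tokens = ["I", "quickly", "explained", "that", "many", "big",
          "jobs", "involve", "a", "few", "hazards"]
kept = remove_stopwords(tokens)
print(kept)
```

Anything on the list becomes unsearchable, which is exactly the kind of built-in assumption worth noticing.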
![Page 17: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/17.jpg)
Many full-text search services will also perform language-specific Stemming, that is, reducing each word to a root:
“quick”, “explain”, “many”, “big”, “job”, “involve”, “few”, “hazard”
(Look up the ‘Porter’ and ‘Snowball’ stemmers for more.)
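To make the idea concrete, here is a toy suffix-stripper. It is not the Porter or Snowball algorithm: it only knows three English suffixes, picked so that it reproduces the stems shown above. Real stemmers apply many more rules and can produce stems that are not dictionary words.

```python
def toy_stem(word):
    """Strip one of a few English suffixes; a toy stand-in for a real stemmer."""
    word = word.lower()
    for suffix in ("ly", "ed", "s"):
        if suffix == "s" and word.endswith("ss"):
            continue  # leave words such as "glass" alone
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["quickly", "explained", "many", "big", "jobs", "involve", "few", "hazards"]
stems = [toy_stem(w) for w in words]
print(stems)
```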
![Page 18: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/18.jpg)
Finally, an inverted index is created* and arranged on the assumption that you want to find the most Relevant results for future queries.
Search terms are passed through the same workflow.
(*Contemporary search engines are more complex of course, but the basics are still there.)
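A minimal sketch of an inverted index, assuming a tiny in-memory collection. The three documents, the stopword list and the "count the matching terms" ranking are all invented for illustration; stemming is left out to keep it short, so a query for "hazard" would not match "hazards" here.

```python
import re
from collections import defaultdict

STOPWORDS = {"i", "a", "that", "the", "of", "in"}  # hypothetical short list

def analyse(text):
    """The shared workflow: lower-case, tokenise, drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many query terms they contain (a crude 'relevance')."""
    scores = defaultdict(int)
    for term in analyse(query):  # the query goes through the same workflow
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    1: "I quickly explained that many big jobs involve a few hazards.",
    2: "A few travel accounts of Europe",
    3: "Big jobs in the library",
}
index = build_index(docs)
results = search(index, "big hazards")
print(results)
```

Note that the index can only ever answer "which documents contain these terms?"; it has no way to express any other kind of question.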
![Page 19: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/19.jpg)
Why on earth did I teach you about search?
All services are made with compromises and assumptions, and it is good to examine these from time to time.
The key assumption is that people will search for the most Relevant record that matches the text they entered.
![Page 20: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/20.jpg)
The most Relevant record that matches the text they entered.
![Page 21: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/21.jpg)
Why not:
All the works that likely cover a specific topic I define or fit an arbitrary algorithm I can supply.
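As a sketch of what "an arbitrary algorithm I can supply" might look like, the researcher's idea of a topic can be written as an ordinary function and run over every work. The three records, the word lists and the function names below are all invented for illustration and do not describe any British Library service.

```python
def select_works(works, predicate):
    """Return every work for which the researcher-supplied predicate is true."""
    return [w for w in works if predicate(w)]

def looks_like_19c_european_travel(work):
    """One researcher's (hypothetical) definition of the topic."""
    places = {"rome", "paris", "florence", "vienna", "alps"}
    travel = {"travelled", "journey", "voyage", "tour"}
    words = set(work["text"].lower().split())
    return 1800 <= work["year"] <= 1899 and bool(words & places) and bool(words & travel)

# Invented sample records standing in for a bulk collection.
works = [
    {"title": "Example travel diary", "year": 1843,
     "text": "our journey from paris to rome took nine days"},
    {"title": "Example cookery book", "year": 1850,
     "text": "boil the pudding for three hours"},
    {"title": "Example later travelogue", "year": 1923,
     "text": "a voyage to vienna"},
]
selected = [w["title"] for w in select_works(works, looks_like_19c_european_travel)]
print(selected)
```

The important part is that the predicate belongs to the researcher, which requires access to the whole collection rather than to a search box.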
![Page 22: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/22.jpg)
“That’s great and all but it’s all subjective; you can’t teach a computer that…”
![Page 23: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/23.jpg)
http://www.robertelliottsmith.com/?p=530
![Page 25: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/25.jpg)
“I am interested in travel accounts in Europe during the 19th Century”
![Page 26: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/26.jpg)
2013 Competition winners: http://labs.bl.uk/Ideas+for+Labs
Pieter Francois
![Page 27: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/27.jpg)
![Page 28: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/28.jpg)

![Page 29: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/29.jpg)

![Page 30: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/30.jpg)
2013 Competition winners: http://labs.bl.uk/Ideas+for+Labs
Dan Norton - “Mixing the Library. Information Interaction and the DJ”
Can a researcher record a session drawing from digital objects, in the same way a DJ does with music tracks?
![Page 31: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/31.jpg)
The other unifying themes to the requests:
“I need tools to help me interpret the vast amount of content you hold. You don’t provide any but make it impossible for others to do so.”
“I want to work on broad sweeps of content, rather than book-by-book. It would take too much time to get each one.”
“API? What’s that? I don’t care. Just give me the files.”
![Page 32: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/32.jpg)
So, a challenge was born…
If a researcher is given direct file access to a large amount of data, can it be useful?
What internal conventions would need to be removed? What external conventions added?
One way to try it out was to pretend to be a researcher and to ‘eat our own dogfood’.
![Page 33: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/33.jpg)
How has the depiction of faces changed in books over the 19th Century?
aka how well do modern photographic face detection routines work on 19th-century illustrations?
![Page 34: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/34.jpg)
![Page 35: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/35.jpg)
Success? Not really.
Many more female faces were found than male.
This did not mean that there are more images of women in the books than men!
![Page 36: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/36.jpg)
19C depictions of faces
• Often drawn more symmetrically - male faces were more likely to be exaggerated.
• Depiction is typically 'clean' and posed
• Fashion: beards, spectacles and hats - different to the modern photographic training data
![Page 37: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/37.jpg)
There was something else though...
People on their way past would occasionally pause and look over my shoulder.
Every day it dug up illustrations that surprised me and the team around me.
So… I wondered if anyone else might be surprised and intrigued by them too?
http://mechanicalcurator.tumblr.com/archive
![Page 38: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/38.jpg)
![Page 39: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/39.jpg)
![Page 40: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/40.jpg)
![Page 41: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/41.jpg)
How does machine learning work?
First, turn the raw data into numbers, something the computer can deal with:
e.g. when analysing text, assign a number to each word and form a ‘dictionary’
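A minimal sketch of such a ‘dictionary’, assuming each new word simply gets the next free integer. The function names are illustrative.

```python
def build_dictionary(tokens):
    """Assign each distinct word the next unused integer id."""
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
    return dictionary

def encode(tokens, dictionary):
    """Turn words into numbers, skipping words the dictionary has never seen."""
    return [dictionary[t] for t in tokens if t in dictionary]

tokens = "the cat sat on the mat".split()
dictionary = build_dictionary(tokens)
ids = encode(tokens, dictionary)
print(dictionary)
print(ids)
```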
![Page 42: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/42.jpg)
Process the numeric data in an effort to better expose the “important” information
- removing noise and tone variation from an image
- turning a grid of pixels into independent trackable ‘points of interest’
- hue, saturation, levels
- produce metrics
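As one small illustration of this kind of processing, the sketch below evens out tone variation by stretching a greyscale image's levels to the full 0-255 range, and computes a simple metric (the mean level). It assumes an image is just a nested list of grey values; real pipelines would use an imaging library.

```python
def stretch_levels(pixels):
    """Rescale grey levels so the darkest pixel is 0 and the lightest is 255."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0 for _ in row] for row in pixels]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in pixels]

def mean_level(pixels):
    """A simple metric: the average grey level of the image."""
    flat = [p for row in pixels for p in row]
    return sum(flat) / len(flat)

image = [[50, 100],
         [150, 200]]
stretched = stretch_levels(image)
print(stretched, mean_level(image))
```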
![Page 43: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/43.jpg)
Annotate - manually or automatically - what is useful and what is not in a portion of the data:
- Characteristics:
  - Spam or not?
  - Face at x,y,w,h
  - Positive, neutral and negative sentiment
- Scalar qualities
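One way to picture these annotations is as small records attached to items in the data. The record layout, field names and sample values below are assumptions made for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Annotation:
    item_id: str                                      # which email, page or review
    label: str                                        # e.g. "spam", "face", "negative"
    box: Optional[Tuple[int, int, int, int]] = None   # face at (x, y, w, h), if any
    score: Optional[float] = None                     # a scalar quality, if any

# Invented examples covering the kinds of annotation listed above.
annotations = [
    Annotation("email-001", "spam"),
    Annotation("email-002", "not_spam"),
    Annotation("page-097", "face", box=(120, 80, 64, 64)),
    Annotation("review-01", "negative", score=-0.6),
]
label_counts = Counter(a.label for a in annotations)
print(label_counts)
```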
![Page 44: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/44.jpg)
Pass most of the ‘known’ data through one of many machine learning algorithms, such as a Support Vector Machine (SVM) as implemented in libsvm.
Which one to use depends on what you want the computer to be able to do once trained.
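To show what "training" means in code, this sketch swaps in a perceptron, a much simpler linear classifier, in place of the SVM named above, so that it runs with the standard library alone. The feature choice (how often two words appear in an email) and the six training examples are invented.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights and a bias that separate the +1 examples from the -1 examples."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # wrong (or undecided): nudge towards y
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

def predict(model, x):
    """Return +1 (spam) or -1 (not spam) for one feature vector."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Invented features: (count of "free", count of "meeting") per email.
samples = [(3, 0), (2, 0), (4, 1), (0, 2), (0, 3), (1, 4)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = spam, -1 = not spam

model = train_perceptron(samples, labels)
print(model, predict(model, (5, 0)), predict(model, (0, 5)))
```

The returned weights and bias are the "trained profile": a small set of numbers that can be saved and reused on new data.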
![Page 45: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/45.jpg)
Test your trained machine with half of the rest of the data to see how it does.
e.g. if characterising email, does it correctly spot Spam?
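The train/test arrangement can be sketched as a three-way split plus an accuracy score. The 80% training fraction is an assumption (the talk only says "most"); the remainder is halved into a test set and a portion held back.

```python
import random

def split_known_data(items, train_fraction=0.8, seed=0):
    """Shuffle, keep most for training, then halve the rest: test and held back."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    rest = shuffled[n_train:]
    n_test = len(rest) // 2
    return shuffled[:n_train], rest[:n_test], rest[n_test:]

def accuracy(predicted, actual):
    """Fraction of predictions that match the known labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

train, test, held_back = split_known_data(range(100))
print(len(train), len(test), len(held_back))
print(accuracy(["spam", "ham", "spam", "ham"], ["spam", "ham", "ham", "ham"]))
```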
![Page 46: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/46.jpg)
Now, use the trained profile on real data!
Sometimes these profiles are shared; for example, Haar cascades trained on photographic datasets (face, body, etc.) are freely available.
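A hedged sketch of using one such shared profile: the frontal-face Haar cascade that ships with the `opencv-python` package. This assumes OpenCV is installed and is not necessarily the setup used in the talk; `detect_faces` and `as_annotations` are illustrative names.

```python
def detect_faces(image_path, scale_factor=1.1, min_neighbors=5):
    """Run OpenCV's bundled frontal-face Haar cascade over one image file."""
    import cv2  # imported here so the rest of the file works without OpenCV

    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    classifier = cv2.CascadeClassifier(cascade_path)
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    boxes = classifier.detectMultiScale(
        grey, scaleFactor=scale_factor, minNeighbors=min_neighbors
    )
    return [tuple(int(v) for v in box) for box in boxes]

def as_annotations(boxes):
    """Turn (x, y, w, h) boxes into simple 'face at x,y,w,h' records."""
    return [{"label": "face", "x": x, "y": y, "w": w, "h": h} for x, y, w, h in boxes]

print(as_annotations([(10, 20, 30, 40)]))
```

Because the cascade was trained on photographs, its results on engraved illustrations should be treated as an experiment rather than a measurement.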
![Page 47: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/47.jpg)
Why the second lesson?
Analysis starts with a bulk set of data, and a set of assumptions and ideas.
The usefulness of a stemming/tokenising search service is unquestioned and Libraries support metadata-level search.
No-one can support all assumptions and ideas!
![Page 48: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/48.jpg)
![Page 49: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/49.jpg)
Surprising? It was an experiment, after all...
![Page 50: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/50.jpg)
Accessible?
• In theory, the books were accessible.
• In practice, it was a real challenge to find anything viewable.
The chasm between digital and print: http://samplegenerator.cloudapp.net
![Page 51: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/51.jpg)
As this is all in the public domain anyway...
What’s the harm in making it a bit more accessible?
The Mechanical Curator twitter account has only got a handful of people following it after all. Maybe there isn’t much appetite for it?
![Page 53: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/53.jpg)
Impact?
Hard to measure:
- 20 million hits on average every month, over 200 million in 10 months*.
- Over 100,000 tags added.
- Hundreds of contributors.
- Iterative crowdsourcing is ongoing.
- Peter Balman’s aforementioned project
* Are image view stats really a good measure?
![Page 54: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/54.jpg)
![Page 56: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/56.jpg)
![Page 57: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/57.jpg)
Research and Technology
• Mario Klingemann: Pattern Recognition Software
• Collaborative PhD: ‘A History of the Printed Image 1750-1850: Applying Data Science Techniques to Printed Book Illustration’
• TSB Digital Innovation Contest: New tech for tracking Public Domain in the Wild
![Page 58: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/58.jpg)
Crowdsourcing & Apps
• Metadata Games
• Wikipedia Synoptic Index
• BL Georeferencer - 3221 maps referenced in a few weeks!
![Page 61: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/61.jpg)
Creative Uses
• David Normal installation at Burning Man Festival
• “Moments” by Joe Bell
• Colouring-in Pages for Children
![Page 62: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/62.jpg)
Tutorials
• Using Photoshop to Up-res images
• Converting images to vector graphics
![Page 63: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/63.jpg)
Collaborations with Colleagues
• Inspired by Flickr, a Sound Archive series
• Maps will be fed into the next phase of the Georeferencer
![Page 64: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/64.jpg)
Education
• Images included in Wikipedia Articles
• University of Minnesota English Literature Course Exercise on Tagging
• Art Therapy Courses
![Page 66: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/66.jpg)
The ‘British Library Big Data Experiment’
http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html
“What can a group of UCL Big Data CS students do when given access to cloud computing, all of the book data and a focus group of digital humanists?”
![Page 67: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/67.jpg)
Next phase will work with an undergraduate team with experience at image analysis.
We are hosting an event on the 18th of December 2014, on “Pattern Recognition”.
![Page 68: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/68.jpg)
![Page 69: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/69.jpg)
In summary, “Clarity”
It is clear that we can:
fail and fail quickly
build experiments that won’t last
open content
build bridges
![Page 70: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/70.jpg)
My contact details for later technical questions:
[email protected] @benosteen
Links:
http://labs.bl.uk
http://mechanicalcurator.tumblr.com
https://flickr.com/photos/britishlibrary
https://github.com/bl-labs
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
![Page 71: BL Labs 2014 Symposium: The Mechanical Curator](https://reader030.vdocuments.site/reader030/viewer/2022032616/55a3f6c91a28abe0718b47eb/html5/thumbnails/71.jpg)
Image credits:
Title image: from https://www.flickr.com/photos/britishlibrary/11223645575
Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and Manchester ... By T. Roscoe, assisted by the resident engineers of the line"
Author: Roscoe, Thomas.
Shelfmark: "British Library HMNTS 796.f.3."
https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor
https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge
https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale
Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year 1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624)
Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic], Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony, from its foundation https://www.flickr.com/photos/britishlibrary/11001417405
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html