Archives, algorithms and people
DESCRIPTION
How we put the BBC World Service radio archive online using machines and crowdsourcing. A talk given to the UK Museums on the Web conference, November 2013. One of the major challenges of a big digitisation project is that you simply swap an under-used physical archive for its digital equivalent. Without easy ways to navigate the data, there is no way for your users to get to the bits they want. We recently worked with the BBC World Service to generate metadata for their radio archive: 50,000 programmes from over 45 years. First we used algorithms to generate "good enough" topics to put the archive online, and then we used crowdsourcing to improve the data. Throughout 2013 we have been running this experiment to crowdsource improvements to the metadata that we automatically created. At http://worldservice.prototyping.bbc.co.uk people can search and browse for programmes, listen to them, and correct and add new topics. This talk describes how we went about this and what we've learnt with this massive online multimedia archive: about understanding audio, automatically generating topics and crowdsourcing improvements to the data.
TRANSCRIPT
![Page 1: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/1.jpg)
Tristan Ferne / @tristanf
Executive Producer
BBC Research & Development
Archives, algorithms and people
or
How we put the BBC World Service radio archive online using machines and crowdsourcing
![Page 2: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/2.jpg)
The BBC World Service archive
![Page 3: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/3.jpg)
1947-2012
![Page 4: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/4.jpg)
The missing metadata
Spelling mistake
Missing data
Sometimes incorrect data
No semantic data
![Page 5: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/5.jpg)
How it works
![Page 6: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/6.jpg)
Listening machines
![Page 7: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/7.jpg)
Noisy transcripts
![Page 8: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/8.jpg)
Algorithms
![Page 9: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/9.jpg)
Algorithms and people
![Page 10: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/10.jpg)
The prototype
![Page 11: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/11.jpg)
worldservice.prototyping.bbc.co.uk
![Page 12: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/12.jpg)
![Page 13: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/13.jpg)
Show Synopsis editing version
![Page 14: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/14.jpg)
![Page 15: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/15.jpg)
![Page 16: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/16.jpg)
worldservice.prototyping.bbc.co.uk
![Page 17: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/17.jpg)
Machine learning
![Page 18: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/18.jpg)
Results
![Page 19: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/19.jpg)
How much data?
70,000 programmes
36,000 listenable programmes
1m machine tags
3,000 users
70,000 tag edits
1,000 synopsis edits
71,000 edits
36% of programmes listened to
21% of programmes tagged
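The edit figures on this slide can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes that the total is the sum of tag edits and synopsis edits, which the slide's numbers support.

```python
# Figures as quoted on the slide; variable names are illustrative only.
tag_edits = 70_000
synopsis_edits = 1_000
users = 3_000

# Assumption: the slide's total is tag edits plus synopsis edits.
total_edits = tag_edits + synopsis_edits

# Mean edits per user. A mean hides how unevenly the work is spread.
mean_edits_per_user = total_edits / users

print(total_edits)
print(round(mean_edits_per_user, 1))
```

The mean is only a rough guide, since it says nothing about how the edits are distributed between users.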
![Page 20: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/20.jpg)
And four lost programmes
![Page 21: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/21.jpg)
How good is the data?
Tags are a large and sparse space
When is a tag correct?
When is a programme tagged completely?
How do you measure crowd-sourced data?
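One possible way to start answering these questions is to compare machine tags with whatever the crowd has said about them. The sketch below is not the method used in the project. It assumes a hypothetical data model in which each (programme, tag) pair can collect up and down votes, and it reports two numbers: coverage (how many machine tags have been judged at all, which reflects how sparse the space is) and precision (how many judged tags the crowd approved).

```python
def tag_quality(machine_tags, votes):
    """Estimate coverage and precision of machine tags from crowd votes.

    machine_tags: set of (programme_id, tag) pairs produced automatically.
    votes: dict mapping (programme_id, tag) -> (up_votes, down_votes).
    Both structures are hypothetical, not the prototype's real schema.
    """
    # A tag counts as judged only if it has received at least one vote.
    judged = [pt for pt in machine_tags if sum(votes.get(pt, (0, 0))) > 0]
    # A judged tag counts as approved if up votes outnumber down votes.
    approved = [pt for pt in judged if votes[pt][0] > votes[pt][1]]
    coverage = len(judged) / len(machine_tags) if machine_tags else 0.0
    # Precision is undefined when nothing has been judged yet.
    precision = len(approved) / len(judged) if judged else None
    return coverage, precision


# Made-up example data, for illustration only.
machine_tags = {
    ("p1", "Denmark"), ("p1", "Christmas"),
    ("p2", "Cricket"), ("p2", "Radio"),
}
votes = {("p1", "Denmark"): (3, 0), ("p1", "Christmas"): (0, 2)}

coverage, precision = tag_quality(machine_tags, votes)
print(coverage, precision)
```

Note that this only measures the tags people chose to look at. It cannot say whether a programme is tagged completely, because tags that are missing never appear in either input.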
![Page 22: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/22.jpg)
Who does the work?
1 person = 30% of edits
10 people = 70% of edits
10% of people = 98% of edits
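Concentration figures like these can be computed from an edit log by ranking users by the number of edits they made. The following is a minimal sketch on made-up data. The log format (one user id per edit) and the function names are assumptions, not the project's actual tooling.

```python
import math
from collections import Counter


def top_share(edit_counts, k):
    """Fraction of all edits made by the k most active users."""
    ranked = sorted(edit_counts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


def top_fraction_share(edit_counts, fraction):
    """Fraction of all edits made by the most active `fraction` of users."""
    # Always include at least one user, rounding the group size up.
    k = max(1, math.ceil(fraction * len(edit_counts)))
    return top_share(edit_counts, k)


# Hypothetical edit log: one user id per edit (not the real data).
edit_log = ["u1"] * 6 + ["u2"] * 2 + ["u3"] + ["u4"]
counts = list(Counter(edit_log).values())

print(top_share(counts, 1))             # share of the single most active user
print(top_share(counts, 2))             # share of the two most active users
print(top_fraction_share(counts, 0.1))  # share of the top 10% of users
```

On a real log, the same three calls with `k=1`, `k=10` and `fraction=0.1` would give figures directly comparable to those on the slide.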
![Page 23: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/23.jpg)
The shape of the archive
![Page 24: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/24.jpg)
![Page 25: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/25.jpg)
![Page 26: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/26.jpg)
Places mentioned
![Page 27: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/27.jpg)
Linking from the News
![Page 28: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/28.jpg)
The Last Danish Christmas Broadcast
“Entirely in Danish”
![Page 29: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/29.jpg)
What we’ve learnt
We can significantly improve the data
It’s cost-effective with re-usable technology
A crowdsourcing approach
![Page 30: Archives, algorithms and people](https://reader034.vdocuments.site/reader034/viewer/2022051515/558c98f4d8b42a55618b4649/html5/thumbnails/30.jpg)
Open questions
How good are the machine tags?
How much crowdsourcing do you need?
When is your data good enough?