D. Bretherton, D. A. Smith, J. Lambert, mc schraefel. MusicNet: Aligning Musicology’s Metadata
David Bretherton, Daniel Alexander Smith, Joe Lambert and mc schraefel (Music, and Electronics and Computer Science, University of Southampton). Music Linked Data Workshop, 12 May 2011, JISC, London.
Music Linked Data Workshop
12 May 2011 • JISC, London
MusicNet: Aligning Musicology’s Metadata
David Bretherton (Music), Daniel Alexander Smith, Joe Lambert and mc schraefel (Electronics and Computer Science)
http://musicnet.mspace.fm
David Bretherton
musicSpace, the precursor to MusicNet
Problem
Digitised data is often ‘siloed’.
Geographical dispersal has been replaced by virtual dispersal on the web. Data is now segregated into countless online repositories by:
– Media type (text, image, audio, video)
– Date of creation/publication
– Subject
– Language
– Copyright holder
– Ad hoc/insecure nature of project funding
Digitised data is often ‘siloed’.
Interoperability has generally not been given a high enough priority.
And because the datasets are ‘mature’, the data isn’t Linked Data.
Solution
‘musicSpace’ is a faceted browser
Demonstration
‘What recordings of works by Cage exist, which performers have recorded a particular work by Cage, and what else by Cage have they recorded?’
Screencast 1:
http://www.youtube.com/watch?v=keTN12OWies&hd=1
How musicSpace provided the motivation for MusicNet
Problem: you can align metadata fields, but this doesn’t align the data in those fields
Schubert Schubert, Franz Schubert, Franz Peter Shu-po-tʻe, ‡d 1797-1828 Schubert ‡d 1797-1828 F. P. Schubert Schubert, ... ‡d 1797-1828 Schubert, F. Schubert, F. ‡d 1797-1828 Schubert, Fr. Schubert, Fr. ‡d 1797-1828 Schubert, Franciszek. Schubert, Franc. ‡d 1797-1828 Schubert, Francois ‡d 1797-1828 Schubert, Franz P. ‡d 1797-1828
Schubert, Franz Peter Schubert, Franz Peter, ‡d 1797-1828 Schubert, Franz Peter ‡d 1797-1828 Schubert, Francois, ‡d 1797-1828 Schubert. Schubert ‡d 1797-1828 Shu-po-tʿe ‡d 1797-1828 Shubert, F. (Frant $s% ) ‡d 1797-1828 Shubert, F. ‡q (Frant $s% ), ‡d 1797-1828 Shubert, Frant $s% , ‡d 1797-1828 Shubert, Frant $s% ‡d 1797-1828 Shūberuto, F. Shūberuto, Furantsu ‡d 1797-1828 Subert, Franc ‡d 1797-1828 Subertas, F. (Francas), ‡d 1797-1828
Subertas, Francas Peteris, 1797-1828‡d Subert, F.
, .Subertas F ‡d 1797-1828 פרנץ, שוברט
シューベルト, F., 1797-1828 シューベルト , フランツ ‡d 1797-1828 舒柏特 , 弗朗茨 Schubert, Francois 1797-1828‡d
, Schubert Franz Peter 1797-1828‡d
Causes of ‘dirty’ data (for names)
Different naming conventions;
– e.g. ‘Bach, Johann Sebastian’ or ‘J. S. Bach’
Inclusion of non-name data in name field;
– e.g. ‘Schubert, Franz, 1797-1828. Songs’, or ‘Allen, Betty (Teresa)’
Different languages (and alphabets);
User input errors.
– e.g. ‘Bach, Johhan Sebastien’
Dirty data degrades the user experience
Searching for compositions by the composer Franz Schubert (1797–1828)...
Screencast 2:
http://www.youtube.com/watch?v=pFsYfz1vlAg&hd=1
MusicNet’s alignment tool
Prototype 1 (musicSpace era)
Used Alignment API & Google Docs
We used Alignment API to compare the names as strings, using WordNet to enable word stemming, synonym support, etc.
Alignment API produces a similarity measure for each possible match.
We planned to set a threshold for automatic approval.
Matches below that threshold would be sent to a Google Docs spreadsheet for expert review.
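A minimal sketch of the planned triage, assuming a generic string-similarity score in place of Alignment API. Python's difflib stands in for the real comparison here, and the function names and the 0.9 threshold are illustrative, not taken from the project:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs, threshold):
    """Split candidate matches into auto-approved and needs-expert-review."""
    approved, review = [], []
    for a, b in pairs:
        score = similarity(a, b)
        target = approved if score >= threshold else review
        target.append((a, b, score))
    return approved, review


candidates = [
    ("Schubert, Franz", "schubert, franz"),      # same string apart from case
    ("Bach, Johann Sebastian", "J. S. Bach"),    # same person, dissimilar strings
]
approved, review = triage(candidates, threshold=0.9)
```

With these inputs the first pair is approved automatically and the second is queued for review, even though both pairs name the same person in each case.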
Shortcoming: no threshold
False matches with high similarity measures:
True matches with low similarity measures:
Prototype 2 (building a custom tool for MusicNet)
Design considerations
From Prototype 1:
– A completely automated solution is out of the question (for the moment...).
– We needed a custom tool with a human-friendly UI (we also wanted keyboard shortcuts for speed).
– Access to additional metadata (i.e. context), so matches can be researched by the reviewer.
From experience with faceted browsers:
– Alphabetically sorted columns enable one to spot synonymous names at a glance.
· Normally sources give names surname first; duplication arises from the different representation of given names.
Alignment process
Data*
Suggested groups
Algorithm compares hash of alpha-only, lower-case version of name
No groups suggested
User verified* or rejected*
Synonym groups
Manual grouping (research*)
URIs Alternative names Back links*
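The group-suggestion step can be sketched roughly as follows. This is an illustration under assumptions: the slide does not name the hash function (MD5 is used here as a stand-in), and `name_key` and `suggest_groups` are hypothetical names.

```python
import hashlib
from collections import defaultdict


def name_key(name: str) -> str:
    """Hash of the lower-cased name with every non-letter character removed."""
    normalised = "".join(ch for ch in name.lower() if ch.isalpha())
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def suggest_groups(names):
    """Bucket names by key; buckets with more than one name become suggestions."""
    buckets = defaultdict(list)
    for name in names:
        buckets[name_key(name)].append(name)
    suggested = [group for group in buckets.values() if len(group) > 1]
    ungrouped = [group[0] for group in buckets.values() if len(group) == 1]
    return suggested, ungrouped


names = [
    "Schubert, Franz",
    "Schubert, Franz, 1797-1828",   # dates and punctuation are stripped
    "Schubert, Franz Peter",
    "Schubert, F.",
]
suggested, ungrouped = suggest_groups(names)
# suggested -> [["Schubert, Franz", "Schubert, Franz, 1797-1828"]]
# ungrouped -> ["Schubert, Franz Peter", "Schubert, F."]
```

A key like this only catches variants that differ in case, punctuation, spacing or digits. Names that differ in their letters, such as abbreviated or extra given names, fall through to the "no groups suggested" path and need manual grouping.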
UI of Prototype 2
Prototype 2 demo
Screencast 3:
http://www.youtube.com/watch?v=5f8iaryZMk0&hd=1
Daniel Alexander Smith
Linked Data
URI for everything
e.g. Beethoven is:
– http://musicnet.mspace.fm/person/367b107e07a7f9db8aed7c72d2ebeab2#id
– http://dbpedia.org/resource/Ludwig_van_Beethoven
– http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9#artist
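One common way to state that several URIs identify the same entity is an owl:sameAs triple. The sketch below is an assumption about how such links could be serialised as N-Triples, not a description of MusicNet's published format; `same_as_triples` is a hypothetical helper.

```python
# Standard OWL property URI; its use by MusicNet is an assumption here.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(canonical: str, equivalents):
    """Return one N-Triples line linking the canonical URI to each equivalent."""
    return [f"<{canonical}> <{OWL_SAME_AS}> <{uri}> ." for uri in equivalents]


beethoven = "http://musicnet.mspace.fm/person/367b107e07a7f9db8aed7c72d2ebeab2#id"
links = same_as_triples(
    beethoven,
    [
        "http://dbpedia.org/resource/Ludwig_van_Beethoven",
        "http://www.bbc.co.uk/music/artists/1f9df192-a621-4f54-8850-2c5373b7eac9#artist",
    ],
)
for line in links:
    print(line)
```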
Contribution
MusicNet provides links between composers in multiple scholarly repositories
We also link to MusicBrainz and BBC /music
This can be fed back into projects like musicSpace where disambiguation is a problem
MusicNet Published Data
Links between multiple URIs
Representations from each source
Machine-readable and standardised, so that applications can be built over this data
Human searchable and usable too
http://musicspace.mspace.fm
Provenance
Retains source of information
e.g. that Grove says “Schubert, Franz (Peter)” and the British Library says “Schubert, Franz” and “Schubert”
Provenance
When they don’t exist already, MusicNet provides individual URIs for a composer from each source, e.g.:
– http://musicnet.mspace.fm/person/7ca5e11353f11c7d625d9aabb27a6174#blcollection
Then links back to search URLs, e.g.:
– http://catalogue.bl.uk/F/?func=find-b&request=Schubert%2C+Franz&find_code=WNA
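A back link of this kind can be built by URL-encoding the name exactly as the source gives it. The sketch below copies the parameter names and values (`func`, `request`, `find_code`) from the example URL above; `bl_search_url` is a hypothetical helper, and whether the catalogue still accepts this query form is not verified here.

```python
from urllib.parse import urlencode

# Base address taken from the example URL on the slide.
BL_SEARCH_BASE = "http://catalogue.bl.uk/F/?"


def bl_search_url(name: str) -> str:
    """Build a British Library catalogue search URL for a name string."""
    query = urlencode({"func": "find-b", "request": name, "find_code": "WNA"})
    return BL_SEARCH_BASE + query


print(bl_search_url("Schubert, Franz"))
# http://catalogue.bl.uk/F/?func=find-b&request=Schubert%2C+Franz&find_code=WNA
```

Keeping the source's own spelling of the name in the query is what preserves provenance: the link returns the records as that repository catalogues them.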
Links from BBC /music
Harvested links from BBC to:
– DBpedia
– New York Times
– IMDB
– PBS
– etc.
Thank you for listening!