Harpring, Getty Vocabularies as LOD

Patricia Harpring, Managing Editor, Getty Vocabulary Program. American Art Collaborative meeting, April 2013. © 2013 J. Paul Getty Trust, author: Patricia Harpring. For educational purposes only. Do not distribute.

Upload: patriciaharpring

Post on 15-Jan-2015


DESCRIPTION

Patricia Harpring, Managing Editor, Getty Vocabulary Program. Discussion of issues and resolutions regarding the Getty Vocabularies entering the LOD cloud, scheduled in increments 2013-2015. Presented at the American Art Collaborative meeting, April 2013.

TRANSCRIPT

Page 1: Harpring, Getty Vocabularies as LOD

Patricia Harpring, Managing Editor, Getty Vocabulary Program

American Art Collaborative meeting, April 2013

Page 2: Harpring, Getty Vocabularies as LOD

The Getty Vocabularies are constructed to allow their use in linked data, but to date little linking has been done

All four Getty vocabularies are scheduled to be released as LOD in the coming months

CONA is the first Getty vocabulary to actually be linked to the other three Getty vocabularies

Issues in linking

Page 3: Harpring, Getty Vocabularies as LOD

Getty vocabularies comply with national and international standards for vocabulary construction, ISO and NISO

CCO (Cataloging Cultural Objects) and CDWA (Categories for the Description of Works of Art) standards for art information

Map to RDA and DACS (library and archives standards) and other standards

Page 4: Harpring, Getty Vocabularies as LOD

Growth of Getty vocabularies relies upon contributions from the expert user community

Getty vocabularies are “social” (contributors are the community) yet “authoritative”

Qualified contributors = repositories of art works, visual resources, art libraries, other experts

Page 5: Harpring, Getty Vocabularies as LOD

Contributions are made in bulk via prescribed XML format

Released in XML and relational tables, as annual full releases; updated versions every two weeks via Web Services

We plan to continue the XML and relational-table releases even when we have LOD, based on feedback from the existing user community (300-plus license holders)

Page 6: Harpring, Getty Vocabularies as LOD

AAT (Art & Architecture Thesaurus)

• Scope includes generic terms for work types, roles, materials, styles, cultures, techniques, attributes, abstract concepts
• Current totals: 36,114 records; 244,665 terms

Recent activity:
• Translations in Spanish, Dutch, Chinese, German, French, Italian, Portuguese
• Contributions from the conservation community organized by Getty Conservation Institute (GCI)

Page 7: Harpring, Getty Vocabularies as LOD

AAT is increasingly multilingual:
• Full translation in Spanish from Centro de Documentación de Bienes Patrimoniales, Chile
• Full translation in Dutch from the Rijksbureau voor Kunsthistorische Documentatie
• Chinese translation by the TELDAP (Taiwan E-Learning and Digital Archives Program) is underway = 8,000 terms
• German translation is being undertaken by the Institut für Museumsforschung in Berlin
• A Portuguese translation will begin soon
• 3,000 French terms from CHIN have been fully integrated; European full French translation is planned
• 3,000 Italian terms from ICCD

www.getty.edu/research/tools/vocabularies/ write to us: [email protected]

Page 8: Harpring, Getty Vocabularies as LOD

TGN (Getty Thesaurus of Geographic Names)

• Scope includes cities, nations, empires, archaeological sites, physical features
• 1,241,020 records; 1,799,859 names

Recent activity:
• Contributions from the National Geospatial-Intelligence Agency (NGA, formerly NIMA) and archaeological sites
• Greece, Italy, United Kingdom, India, Mexico, Chile, Egypt, New Zealand, the Netherlands

Page 9: Harpring, Getty Vocabularies as LOD

ULAN (Union List of Artist Names)

• Scope includes artists, architects, firms, studios, patrons, sitters; named and anonymous
• 222,851 records; 581,525 names

[Historical note: ULAN was conceived of by, and initiated under the leadership of, Eleanor Fink, today’s moderator. TGN was also born, and the three vocabs were brought together under one roof under her leadership.]

Recent activity:
• Processing contributions (Grove, ARTstor, others)
• ULAN contribution to the Virtual International Authority File. VIAF is a joint project with the Library of Congress and numerous international libraries to combine name authority files into a single name authority service

Page 10: Harpring, Getty Vocabularies as LOD

CONA (Cultural Objects Name Authority)

• Scope includes movable works (e.g., museum objects) and architecture
• CONA is accepting contributions and will grow over time
• The pilot release contains sample records
• 1,011 records; 1,887 titles/names

Page 11: Harpring, Getty Vocabularies as LOD

Catalog Level (item, group, etc.)
Object/Work Type
Title or Name
Creator
Creation Date
Measurements
Materials and Techniques
Depicted Subject
Current Location
Repository number for movable works
Sources

• These are the basic fields included in most museum records
• Compliant with CCO and CDWA, standards for best practice
• An OCLC survey of 9 North American museums for CCO compliance (i.e., CONA) discovered that all participating museums collected all of these fields, except subject (collected by only 2). But users strongly wish to retrieve by subject. How to remedy this? Contributing to CONA hopefully can improve this situation.

Default values are available for missing required information. E.g., “unavailable” for measurements.
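The default-value idea can be sketched in a few lines of Python. This is only an illustration, not the CONA loader: the field keys are our own shorthand for the slide's list, treating every listed field as required is an assumption, and the only default taken from the slide is “unavailable” for measurements.

```python
# Shorthand keys for the fields listed on the slide (names are assumptions).
REQUIRED_FIELDS = [
    "catalog_level", "work_type", "title", "creator", "creation_date",
    "measurements", "materials_techniques", "depicted_subject",
    "current_location", "repository_number", "sources",
]

# Only this default comes from the slide; the fallback below is a placeholder.
DEFAULTS = {"measurements": "unavailable"}
FALLBACK = "undetermined"


def apply_defaults(record):
    """Return a copy of record with every missing or empty required field filled in."""
    filled = dict(record)
    for name in REQUIRED_FIELDS:
        if not filled.get(name):
            filled[name] = DEFAULTS.get(name, FALLBACK)
    return filled


incoming = {"title": "Untitled drawing", "creator": "Koenig, Pierre"}
complete = apply_defaults(incoming)
print(complete["measurements"])  # unavailable
```

Filling defaults on a copy keeps the contributed record untouched, so an editor can still see which values were actually supplied.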

Page 12: Harpring, Getty Vocabularies as LOD

Simple entity relationship diagram

We are linking vocabularies to each other

[Diagram entities: ULAN, TGN, CONA Records, Source Records, AAT, Iconography Authority]

Page 13: Harpring, Getty Vocabularies as LOD

A critical feature that makes the vocabularies useful as authorities is that each vocabulary record is identified by a unique, persistent numeric ID

Terms and controlled lists also each have unique numeric IDs

[Screenshot callouts: subject_id=500013247; term_id=1500207490; nat_code=905040; role_id=31261; TGN subject_id=7006827; subject_id=500115332; rel_type_code=1553]
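Why persistent IDs matter for linking can be shown with a small sketch. The classes and the term IDs below are invented for illustration; only the subject_id 500086520 (the ULAN record for Pierre Koenig, shown later in this deck) comes from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    term_id: int   # unique numeric ID of the term itself
    label: str


@dataclass
class Record:
    subject_id: int                       # unique, persistent numeric ID of the record
    terms: list = field(default_factory=list)

    def preferred(self):
        """The first term is treated as the preferred one (an assumption)."""
        return self.terms[0].label


# Term IDs 1 and 2 are placeholders, not real Getty term IDs.
koenig = Record(500086520, [Term(1, "Koenig, Pierre")])
ulan = {koenig.subject_id: koenig}

# A linking record stores only the ID, never the label.
drawing = {"title": "Untitled drawing", "creator_id": 500086520}

# If the preferred label changes later, the link is unaffected.
koenig.terms.insert(0, Term(2, "Pierre Koenig"))
label = ulan[drawing["creator_id"]].preferred()
print(label)  # Pierre Koenig
```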

Page 14: Harpring, Getty Vocabularies as LOD

Another critical feature that makes the vocabularies useful in linking is their existing relationships

Thesaural relationships (AAT is the prototypical thesaurus, but all Getty vocabs are thesauri. The examples here are from ULAN.)
Equivalence
▪ Sèvres Porcelain Manufactory = Manufacture nationale de Sèvres
Hierarchical
▪ Sèvres Porcelain Manufactory is broader context for Eloy Brichard company
Associative
▪ Sèvres Porcelain Manufactory was directed by Robert, Louis-Rémy 1832-1879

Relationships beyond thesaural: Nationality/Culture/Ethnicity; Role; Geographic places; published sources; contributors

Examples are from ULAN – Thesaural and other relationships also exist in TGN, AAT, and CONA
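The three thesaural relationship types can be held as simple typed statements. The sketch below uses the ULAN examples from this slide; the tuple layout and the pairing of each type with a SKOS property are our illustrative assumptions, not the published Getty mapping.

```python
# (subject, relationship kind, object) statements from the ULAN examples.
RELATIONSHIPS = [
    ("Sèvres Porcelain Manufactory", "equivalence", "Manufacture nationale de Sèvres"),
    ("Eloy Brichard company", "hierarchical", "Sèvres Porcelain Manufactory"),
    ("Sèvres Porcelain Manufactory", "associative", "Robert, Louis-Rémy"),
]

# One plausible SKOS property per kind (an assumption for illustration).
SKOS_PROPERTY = {
    "equivalence": "skos:altLabel",   # another name for the same entity
    "hierarchical": "skos:broader",   # object is the broader context
    "associative": "skos:related",    # non-hierarchical association
}


def related(entity, kind):
    """Return the objects linked from entity by the given relationship kind."""
    return [obj for subj, k, obj in RELATIONSHIPS if subj == entity and k == kind]


for subj, kind, obj in RELATIONSHIPS:
    print(f"{subj} --{SKOS_PROPERTY[kind]}--> {obj}")
```

Keeping the relationship kind explicit is what lets a consumer distinguish "another name for" from "part of" and "associated with" when following links.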

Page 15: Harpring, Getty Vocabularies as LOD

[Joan Cobb, software architect, was unable to attend today]

We plan to publish all Getty vocabularies to the LOD cloud

Implementation project begins July 2013

First phase will focus on publishing vocabulary data as linked data

Subsequent phases will focus on how we use the data (e.g., using it on our own Web sites, collaboration with external sites, harvesting, visualization, etc.)

Current plan: the data will be published in SKOS-extended format under the ODC-BY 1.0 license

Page 16: Harpring, Getty Vocabularies as LOD

The majority of the work will be done by our in-house team, but we intend to establish an open community and welcome collaboration

Challenges exist: solutions will be included in the release because these are among the critical features that make our thesauri unique

Multilingual data – we already have terms in over 110 different languages and the list is growing

Sources and contributors at the subject (=record), term, and note levels
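A rough sketch of why these two challenges shape the data model: if each term and note is an object in its own right, it can carry its own language, sources, and contributors, separately from those of the record. All class names, IDs, labels, and source or contributor codes below are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    language: str                                   # language code for this term
    sources: list = field(default_factory=list)       # term-level sources
    contributors: list = field(default_factory=list)  # term-level contributors


@dataclass
class Note:
    text: str
    language: str
    sources: list = field(default_factory=list)
    contributors: list = field(default_factory=list)


@dataclass
class Subject:
    subject_id: int
    terms: list = field(default_factory=list)
    notes: list = field(default_factory=list)
    sources: list = field(default_factory=list)       # subject-level sources
    contributors: list = field(default_factory=list)  # subject-level contributors

    def labels_in(self, language):
        """All labels recorded for one language."""
        return [t.label for t in self.terms if t.language == language]


# Placeholder record: the ID, labels, and codes are not real Getty data.
subject = Subject(
    subject_id=1,
    terms=[
        Term("watercolor", "en", sources=["SRC-1"], contributors=["CONTRIB-A"]),
        Term("acuarela", "es", sources=["SRC-2"], contributors=["CONTRIB-B"]),
    ],
    notes=[Note("Example scope note.", "en", contributors=["CONTRIB-A"])],
    contributors=["CONTRIB-A", "CONTRIB-B"],
)
print(subject.labels_in("es"))  # ['acuarela']
```

A plain string label with a language tag could not carry its own source or contributor, which is the practical reason to treat terms as first-class objects.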

Page 17: Harpring, Getty Vocabularies as LOD

We will begin with AAT released as LOD and then move on to TGN, ULAN, and finally CONA, from late 2013 through 2015

The sequence was chosen to take advantage of the way the data is connected: AAT is linked to itself; TGN pulls from AAT; ULAN from AAT and TGN; CONA from all three

Intend to publish our lookup lists (e.g., languages, roles, nationalities, place types, sources) as linked data

Our AAT ontology is based on SKOS and SKOS-XL

We worked with Marcia Zeng to define the mapping and Pedro Szekely from ISI to develop the ontology

TGN and ULAN will use same core approach

CONA ontology must be in synch with other vocabs, but must also be aligned with other projects such as the American Art Collaborative, Europeana, and Arches (= a project of collaboration between GCI & World Monuments Fund to develop an open source system to inventory immovable cultural heritage)
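The release sequence follows directly from the dependencies between the vocabularies. As a sketch, the dependencies stated on this slide can be fed to a topological sort from Python's standard library (`graphlib`, Python 3.9+); the dictionary layout is ours, and AAT's links to itself are left out because a self-reference does not constrain the order.

```python
from graphlib import TopologicalSorter

# Each vocabulary maps to the vocabularies it pulls from (per the slide).
DEPENDS_ON = {
    "AAT": set(),
    "TGN": {"AAT"},
    "ULAN": {"AAT", "TGN"},
    "CONA": {"AAT", "TGN", "ULAN"},
}

# static_order() yields each vocabulary only after everything it depends on.
release_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(release_order)  # ['AAT', 'TGN', 'ULAN', 'CONA']
```

Because each vocabulary depends on all the ones before it, this is the only order in which every link target already exists as linked data when the next vocabulary is published.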

Page 18: Harpring, Getty Vocabularies as LOD

Many links cannot be made automatically

Matching ULAN Nat table to AAT

• Nationality/Culture/Race/Ethnicity in ULAN should be linked to AAT
• Nat list was never actually linked to AAT
• Project to match encounters issues, e.g., no match, ambiguous match
• Must be resolved by hand

[Screenshot callouts: no match; apparent match, but wrong (hessian is a type of burlap); ambiguous match; no match]
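The matching problem can be sketched as a string lookup that sorts each nationality value into one of the outcomes named above. This is a simplified illustration, not the actual matching project: the AAT IDs and the "Georgian" and "Ruritanian" entries are invented, and only the hessian case comes from the slide.

```python
def normalize(label):
    """Case-insensitive comparison key."""
    return label.strip().lower()


def build_index(aat_terms):
    """Map each normalized label to the list of AAT IDs that carry it."""
    index = {}
    for aat_id, label in aat_terms:
        index.setdefault(normalize(label), []).append(aat_id)
    return index


def classify(nat_label, index):
    """Return (outcome, candidate_ids) for one nationality value."""
    ids = index.get(normalize(nat_label), [])
    if not ids:
        return "no match", []
    if len(ids) > 1:
        return "ambiguous match", ids
    return "apparent match", ids


# Placeholder AAT terms with invented IDs.
AAT_TERMS = [
    (1, "hessian"),    # the textile (a type of burlap), not the people
    (2, "Georgian"),   # hypothetical: one sense of the word
    (3, "Georgian"),   # hypothetical: another sense of the word
]
index = build_index(AAT_TERMS)

for nat in ["Hessian", "Georgian", "Ruritanian"]:
    print(nat, classify(nat, index))
```

Note that "Hessian" comes back as a single apparent match even though it is the wrong concept, which is why even unique string matches still need review by hand.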

Page 19: Harpring, Getty Vocabularies as LOD

Issues in linking CONA

Some matches should be clear and done automatically: Pierre Koenig as artist on this drawing record, and the corresponding ULAN record.

Example below is display; we actually match on controlled fields

[Screenshot: CONA Editorial System]

Koenig, Pierre (American architect, 1925-2004) 500086520

Since CONA is linked to the other vocabularies, it is necessary to match incoming values to the AAT, ULAN, TGN, and CONA Iconography Authority when CONA records are processed for loading

The CVA/Processor was developed for editors to use if auto-links are not possible

Contribution Validation Application (CVA), software architect Gregg Garcia

Page 20: Harpring, Getty Vocabularies as LOD

• Unresolved auto-matches to other vocabularies are vetted in the CVA/Processor
• Editor may search other vocabularies in CONA CVA
• CVA presents editor with choices for linking
• E.g. below: CONA record has too little info for artist identification to allow an auto-link to Jan Smit. Which one is he? Or maybe none of these, and needs to be added on the spot as a stub in ULAN.
• New stub records for AAT, ULAN, TGN, or IA may be added and linked on the spot, filled in later by editors in other vocabs
• For links with a pattern, editor may write a ‘rule’ for CVA
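The vetting flow described above (auto-link when unambiguous, otherwise let an editor choose or add a stub) might be sketched as follows. This is not the CVA's code: the function, record layout, and IDs are invented, and matching on the name alone is a simplification of matching on controlled fields.

```python
def vet_creator(incoming_name, ulan, editor_pick):
    """Return (action, ulan_id) for one incoming creator name.

    editor_pick receives the candidate records and returns the chosen
    record, or None if none of them is the right person.
    """
    candidates = [r for r in ulan if r["name"] == incoming_name]
    if len(candidates) == 1:
        return "auto-link", candidates[0]["id"]
    chosen = editor_pick(candidates)
    if chosen is not None:
        return "editor link", chosen["id"]
    # No suitable record: add a stub now, to be filled in later by ULAN editors.
    stub_id = max((r["id"] for r in ulan), default=0) + 1
    ulan.append({"id": stub_id, "name": incoming_name, "stub": True})
    return "stub added", stub_id


# Placeholder ULAN records with invented IDs.
ulan = [
    {"id": 1, "name": "Smit, Jan", "note": "painter"},
    {"id": 2, "name": "Smit, Jan", "note": "printmaker"},
    {"id": 3, "name": "Koenig, Pierre", "note": "architect"},
]

print(vet_creator("Koenig, Pierre", ulan, lambda c: None))  # ('auto-link', 3)
print(vet_creator("Smit, Jan", ulan, lambda c: c[1]))       # ('editor link', 2)
print(vet_creator("Smit, Jan", ulan, lambda c: None))       # ('stub added', 4)
```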

Page 21: Harpring, Getty Vocabularies as LOD

Issues in linking CONA

• Editor could write a rule in CVA if there is a pattern
• In this case, for this particular contribution of European prints, when the incoming “Place of Publication” contains the value “Amsterdam” we can assume they always mean Amsterdam in the Netherlands, not Amsterdam, Ohio, or any of the other dozens of Amsterdams in the world.
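Such a rule is essentially a scoped condition plus a link target. The sketch below shows one way to express it; the class, the contribution name, and the TGN ID are placeholders for illustration and do not reflect how the CVA stores its rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkRule:
    contribution: str   # the rule applies only to this contribution batch
    field_name: str     # incoming field to inspect
    contains: str       # value the field must contain
    tgn_id: int         # TGN record to link to when the rule fires

    def apply(self, contribution: str, record: dict) -> Optional[int]:
        """Return the TGN ID to link to, or None if the rule does not apply."""
        if contribution != self.contribution:
            return None
        if self.contains in record.get(self.field_name, ""):
            return self.tgn_id
        return None


AMSTERDAM_NL = 9000001  # placeholder, not the real TGN ID for Amsterdam

rule = LinkRule("european-prints", "Place of Publication", "Amsterdam", AMSTERDAM_NL)

print(rule.apply("european-prints", {"Place of Publication": "Amsterdam"}))
print(rule.apply("other-batch", {"Place of Publication": "Amsterdam"}))
```

Scoping the rule to one contribution is the important design choice: the assumption that “Amsterdam” means the Dutch city holds for this batch of European prints, not for incoming data in general.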

Page 22: Harpring, Getty Vocabularies as LOD

Caveats in linking data

Observations re. linked data include the following (various authors):

In any discipline, LOD (Linked Open Data) is not one cloud, but many, with imperfect links (each cloud has dense internal connections but typically only sparse connections between clouds)

Impact of linkage error is underestimated by developers

The reason why it is hard to avoid linkage error is that humans are needed to manually create linkage or to proof auto links

Homogeneity is required to make accurate links, but such homogeneity does not occur

Even when terminology is standardized, differences in values between corresponding variables cause linking errors (underlying causes: errors created during collection of the data, during the data entry, or there are true changes of meaning or application of a particular value)

Page 23: Harpring, Getty Vocabularies as LOD

Caveats in linking data

Observations re. linked data include the following (our observations):

• In linking with CONA contributions, we link automatically where possible
• But for uncertain matches, we link by hand
• Even then, mistakes may be made when the incoming data has incorrect references or links

Page 24: Harpring, Getty Vocabularies as LOD

Caveats in linking data

Auto matches or by hand?
Data is not an exact match
Better to err on side of caution than make an incorrect link
These are the same person, but conflicting data means a human must confirm

Alvarez Algeciras, Germán (Spanish painter, exhibited 1871-1878) 500035166
birth: 1831 death: 1878

Álvarez de Algeciras y Jimenez, German (Spanish artist, 1848-; fl. bef. 1878) 500298289
birth: 1848 death: 1878

- Names not exact match to algorithm
- Matching based on fielded data, here display for ease of illustration
- Painter not = artist, tables allow match if all else matches
- Birth date not match, one estimated other is exact
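The cautious policy on this slide can be sketched as a field-by-field comparison in which any conflict sends the pair to a human. The function names and the role-equivalence table are illustrative assumptions; the record values are the two ULAN entries shown above.

```python
# Roles treated as compatible "if all else matches" (table contents assumed).
ROLE_EQUIVALENTS = {("painter", "artist"), ("artist", "painter")}


def roles_compatible(a, b):
    return a == b or (a, b) in ROLE_EQUIVALENTS


def conflicts(a, b):
    """List the fielded values on which two records disagree."""
    found = []
    if a["name"] != b["name"]:
        found.append("name")
    if a["nationality"] != b["nationality"]:
        found.append("nationality")
    if not roles_compatible(a["role"], b["role"]):
        found.append("role")
    if a["birth"] != b["birth"]:
        found.append("birth")
    if a["death"] != b["death"]:
        found.append("death")
    return found


def decide(a, b):
    """Auto-link only when nothing conflicts; otherwise ask a human."""
    found = conflicts(a, b)
    return ("auto-link", []) if not found else ("human review", found)


rec_a = {"id": 500035166, "name": "Alvarez Algeciras, Germán",
         "nationality": "Spanish", "role": "painter", "birth": 1831, "death": 1878}
rec_b = {"id": 500298289, "name": "Álvarez de Algeciras y Jimenez, German",
         "nationality": "Spanish", "role": "artist", "birth": 1848, "death": 1878}

print(decide(rec_a, rec_b))  # ('human review', ['name', 'birth'])
```

The painter/artist difference is absorbed by the table, but the name and birth-date differences are not, so the pair is held for confirmation rather than linked incorrectly.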

Page 25: Harpring, Getty Vocabularies as LOD

• Errors in linking that are in the incoming data may be caught, but in general we must rely upon contributors’ accuracy
• To which person should the link be made in ULAN?

[Screenshot: ULAN record]

• Contributed record was linked to LOC record for travel writer “Lazowski”
• But should be the French revolutionary

Inscribed title: Inauguration du buste de Marat au tombeau qui été élevé pour sa gloire et celle de Lazowski, place de la Réunion a Paris, l'an 2 de la Re p. Franc. une et indivisible /

Page 26: Harpring, Getty Vocabularies as LOD

Introduction to Controlled Vocabularies

Ebook or paperback available at www.getty.edu

Added a section on LOD in the revised edition

Page 27: Harpring, Getty Vocabularies as LOD

Patricia Harpring
Managing Editor
Getty Vocabulary Program

1200 Getty Center Drive
Los Angeles, CA 90049

310/440-6353
[email protected]