Subject Metadata

Upload: beatrice-brooks

Post on 25-Dec-2015


Subject Analysis

• Subject analysis: The process of ascertaining the “aboutness” of a document by describing its topic, the discipline in which the topic is treated, and the form of the document.

• Discipline: An area or a branch of knowledge. The discipline is distinct from the thing being studied by the discipline. A broad field of inquiry; the context in which any subject is treated

• Subject (Phenomena): Broadly, the things studied by disciplines

• Form: What the document is rather than what it contains

– Intellectual: method by which the document has been compiled: history, biography, textbook, Festschrift

– Presentation: manner in which subject content has been organized. Statistical compilation

– Physical form: Structure of the document as an artefact. Book, video.

Definitions

• Subject analysis is the part of indexing or cataloging that deals with

– the conceptual analysis of an item: what is it about? what is its form/genre/format?

– translating that analysis into a particular subject heading system

• Subject heading: a term or phrase used in a subject heading list to represent a concept, event, or name

Types of concepts to identify

• Topics

• Names of:

– Persons

– Corporate bodies

– Geographic areas

• Time periods

• Titles of works

• Form of the item

Subjects vs. forms/genres

• Subject: what the item is about

• Form: what the item is, rather than what it is about

– Physical character (video, map, miniature book)

– Type of data it contains (statistics)

– Arrangement of information (diaries, indexes)

– Style, technique (drama, romances)

• Genre: works with common theme, setting, etc.

– Mystery fiction; Comedy films

What is a Controlled Vocabulary?

• From Wikipedia: A controlled vocabulary is a carefully selected list of words and phrases … The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area. Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document’s text.

Controlled Vocabularies: Subject Heading lists vs. Thesauri

• Thesauri

– Created largely in indexing communities

– Made up of single terms and bound terms representing single concepts (usually called descriptors). Bound terms occur when some concepts can only be represented by two or more words (e.g. Type A Personality)

• Subject heading lists

– Created largely in library communities

– Consist of phrases and other precoordinated terms in addition to single terms

Controlled Vocabularies: Subject Heading lists vs. Thesauri

• Thesauri

– More strictly hierarchical. Because they are made up of single terms, each term usually has only one broader term

– Narrow in scope. Usually made up of terms from one specific subject area

– More likely to be multilingual. Because single terms are used, they are easier to maintain in multiple languages

• Subject heading lists

– Not strictly hierarchical. Some headings may have no broader and/or narrower terms

– More general in scope, covering a broad subject area, or the entire scope of knowledge

– Usually not multilingual

Translating key words & concepts into controlled vocabulary

• Controlled vocabulary

– Thesauri (examples)

• Art & Architecture Thesaurus (AAT)

• Thesaurus for Graphic Materials I: Subject Terms (TGMI)

• Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGMII)

• Thesaurus of Geographic Names (TGN)

– Subject heading lists (examples)

• Library of Congress Subject Headings (LCSH)

• Sears List of Subject Headings

• Medical Subject Headings (MeSH)

Keywords vs. Controlled Terms

• System should allow for both

• Keywords give access using “non-standard” terms

• Keywords include terms not yet in vocabularies; places or names not indexed

Drawbacks to Controlled Vocabulary

• Time to assign = $$

• Need for trained catalogers = $$

• Time lag to add relevant terms

• Time lag to delete outdated terms

– … so use both keywords and controlled terms

Why use controlled vocabulary?

Controlled vocabularies:

• identify a preferred way of expressing a concept

• allow for multiple entry points (i.e., cross-references) leading to the preferred term

• identify a term’s relationship to broader, narrower, and related terms

Function of keywords

Advantages:

• provide access to the words used in bibliographic records

Disadvantages:

• cannot compensate for complexities of language and expression

• cannot compensate for context

Keyword searching is enhanced by assignment of controlled vocabulary!

Vocabulary Control

• Vocabulary control is used to improve the effectiveness of information storage and retrieval systems, Web navigation systems, and other environments that seek to both identify and locate desired content via some sort of description using language. The primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval.

Need for Vocabulary Control

• The need for vocabulary control arises from two basic features of natural language

• Two or more words or terms can be used to represent a single concept

– Example:

• salinity/saltiness

• VHF/Very High Frequency

• Two or more words that have the same spelling can represent different concepts

– Example:

• Mercury (planet)

• Mercury (metal)

• Mercury (automobile)

• Mercury (mythical being)

Principles of Controlled Vocabularies

• Four important principles guide the design and development of controlled vocabularies.

• eliminating ambiguity

• controlling synonyms

• establishing relationships among terms where appropriate

• testing and validation of terms

Ambiguity

• Ambiguity occurs in natural language when a word or phrase (a homograph or polyseme) has more than one meaning.

• A controlled vocabulary must compensate for the problems caused by ambiguity by ensuring that each term has one and only one meaning

Synonymy

• A different problem occurs when a concept can be represented by two or more synonymous or nearly synonymous words or phrases. This is called synonymy. This means that desired content may be scattered around an information space or database because it can be described by different but equivalent terminology

• A controlled vocabulary must compensate for the problems caused by synonymy by ensuring that each concept is represented by a single preferred term. The vocabulary should list the other synonyms and variants as non-preferred terms with USE references to the preferred term.
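The USE references described above can be sketched as a simple lookup from non-preferred terms to preferred terms. This is a minimal illustration only: the table name `USE_REFERENCES`, the function `preferred_term`, and the choice of which synonym is the preferred one are assumptions, not part of any published vocabulary.

```python
# Hypothetical entry vocabulary: non-preferred term -> preferred term.
# Which term of each pair is "preferred" is assumed for illustration.
USE_REFERENCES = {
    "saltiness": "salinity",
    "very high frequency": "VHF",
}


def preferred_term(term: str) -> str:
    """Follow a USE reference if one exists; otherwise return the term unchanged."""
    cleaned = term.strip()
    return USE_REFERENCES.get(cleaned.lower(), cleaned)


print(preferred_term("Saltiness"))
print(preferred_term("salinity"))
```

Both calls resolve to the same preferred term, so content described with either word ends up under one heading.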

Type of vocabulary control

Controlled Lists

A list is a simple group of terms

Example:

Alabama
Alaska
Arkansas
California
Colorado
. . . .

Frequently used in Web site pick lists and pull down menus

What are these?

• Flying Horse • King Fisher • Royal Challenge

-- The meaning is not clear.

-- Need to eliminate ambiguity

What are these?

• Flying Horse • King Fisher • Royal Challenge • Heineken • Budweiser • Miller-lite • Bud-light

Drinks

• Flying Horse • King Fisher • Royal Challenge • Taj Mahal • Hayward’s 2000 • Heineken • Corona • Budweiser • Miller-lite • Bud-light

Synonym Rings

A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes

Synonym Rings-- Examples

Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms.

e.g., cholesterol:

Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
. . .

Synonym rings are used …

• Synonym rings are used to expand queries for content objects.

– If a user enters any one of these terms as a query to the system, all items are retrieved that contain any of the terms in the cluster.

An example from International SEMATECH; a search for Silicon would look like this:

Synonym rings are used …

• Synonym rings are often used in systems where the underlying content objects are left in their unstructured natural language format

– the control is achieved through the interface by drawing together similar terms into these clusters.

• Synonym rings are used in conjunction with search engines and provide a minimal amount of control of the diversity of the language found in the texts of the underlying documents.
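Query expansion with a synonym ring can be sketched in a few lines. The example below is an illustration under stated assumptions: `SYNONYM_RINGS`, `expand_query`, and `search` are hypothetical names, the ring reuses the cholesterol terms listed earlier, and matching is a naive case-insensitive substring test rather than what a real search engine would do.

```python
# One ring per concept; every term in a ring is treated as equivalent for retrieval.
SYNONYM_RINGS = [
    {
        "cholesterol",
        "blood cholesterol",
        "serum cholesterol",
        "good cholesterol",
        "bad cholesterol",
        "ldl",
    },
]


def expand_query(term):
    """Return the whole ring containing the term, or just the term itself."""
    key = term.strip().lower()
    for ring in SYNONYM_RINGS:
        if key in ring:
            return ring
    return {key}


def search(term, documents):
    """Retrieve every document that contains any term from the expanded query."""
    terms = expand_query(term)
    return [doc for doc in documents if any(t in doc.lower() for t in terms)]


docs = [
    "Serum cholesterol levels fell.",
    "LDL was measured.",
    "Rice straw is a by-product.",
]
print(search("LDL", docs))
```

The documents themselves are never re-indexed; the control lives entirely in the query step, which matches how the slides describe synonym rings.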

Search: Tilenol, Result: Tylenol

Synonym rings can be used for assigning keywords in metadata fields

IBM Homepage source code:

<meta name="Keywords" content="ibm, international business machines, internet, e-business, ebusiness, e-business on demand, ebusiness on demand, on demand, ibm on demand, on demand business, on demand enterprise, on demand services, ondemand, on-demand, personal computer, personal system, e-commerce, ecommerce, pc, workstation, mainframe, unix, linux, technical support, homepage, home page"/>

Where to find synonyms

• Search logs

• Dictionaries

• Existing authority files

– LC Name Authority File (NAF)

– The Union List of Artist Names (ULAN)

– The Getty Thesaurus of Geographic Names (TGN)

• Lexical databases, e.g., WordNet http://www.cogsci.princeton.edu/~wn/

Taxonomies

A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy

Example:

Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon

Frequently used in web navigation systems
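A taxonomy of this kind can be stored as child-to-parent links and walked to produce a navigation path (a "breadcrumb"). The sketch below assumes the example terms are nested one under another in the order shown; `PARENT` and `path_to_root` are hypothetical names, and a polyhierarchy would need a set of parents per term instead of a single one.

```python
# Child -> parent links for a strict (single-parent) hierarchy.
# The nesting of the example terms is assumed from their order on the slide.
PARENT = {
    "Organic chemistry": "Chemistry",
    "Polymer chemistry": "Organic chemistry",
    "Nylon": "Polymer chemistry",
}


def path_to_root(term):
    """Return the chain of terms from the top of the hierarchy down to the term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


print(" > ".join(path_to_root("Nylon")))
```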

United Nations Standard Products and Services Classification

Thesauri

A thesaurus is a controlled vocabulary with multiple types of relationships

Example:

Rice
UF Paddy
BT Cereals
BT Plant products
NT Brown rice
RT Rice straw

Thesauri

Relationship types:

• Use/Used For – indicates preferred term

• Hierarchy – indicates broader and narrower terms

• Associative – almost unlimited types of relationships may be used

The thesaurus is the most complex format for controlled vocabularies and is widely used.
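The Rice entry from the earlier thesaurus example can be modelled as a small record with one field per relationship type. This is only a sketch: the class `ThesaurusEntry` and the helper `build_entry_vocabulary` are hypothetical, and real thesaurus software would store far more (scope notes, sources, reciprocal links).

```python
from dataclasses import dataclass, field


@dataclass
class ThesaurusEntry:
    term: str                                      # the preferred term
    used_for: list = field(default_factory=list)   # UF: non-preferred variants
    broader: list = field(default_factory=list)    # BT
    narrower: list = field(default_factory=list)   # NT
    related: list = field(default_factory=list)    # RT


rice = ThesaurusEntry(
    term="Rice",
    used_for=["Paddy"],
    broader=["Cereals", "Plant products"],
    narrower=["Brown rice"],
    related=["Rice straw"],
)


def build_entry_vocabulary(entries):
    """Map every preferred and non-preferred term (lowercased) to its preferred term."""
    lookup = {}
    for entry in entries:
        lookup[entry.term.lower()] = entry.term
        for variant in entry.used_for:
            lookup[variant.lower()] = entry.term
    return lookup


entry_vocabulary = build_entry_vocabulary([rice])
print(entry_vocabulary["paddy"])
```

A searcher who types "paddy" is led to "Rice", and the BT, NT, and RT fields are then available for broadening or narrowing the search.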

National Monuments Record Thesauri-- Archaeological Objects Thesaurus

Use of Controlled Vocabularies in Information Storage and Retrieval Systems

Dublin Core

Content data for some elements may be selected from a controlled vocabulary, as indicated by best practice guidelines

Content        Intellectual Property    Instantiation
Coverage       Contributor              Date
Description    Creator                  Format
Type           Publisher                Identifier
Relation       Rights                   Language
Source
Subject
Title

Example from LOM (Learning Object Metadata)

5.2 Learning Resource Type

Explanation: Specific kind of learning object. The most dominant kind shall be first.

NOTE: The vocabulary terms are defined as in the OED:1989 and as used by educational communities of practice.

• Controlled terms

Value Space (ordered): exercise, simulation, questionnaire, diagram, figure, graph, index, slide, table, narrative text, exam, experiment, problem statement, self assessment, lecture

Build in a pick-list for creating metadata records
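A pick-list can be enforced at record-creation time by rejecting any value outside the controlled list. The sketch below uses the Learning Resource Type value space quoted above; the names `LEARNING_RESOURCE_TYPES` and `validate_resource_types` are hypothetical and not part of the LOM standard itself.

```python
# Controlled terms for LOM element 5.2, as listed on the slide.
LEARNING_RESOURCE_TYPES = (
    "exercise", "simulation", "questionnaire", "diagram", "figure",
    "graph", "index", "slide", "table", "narrative text", "exam",
    "experiment", "problem statement", "self assessment", "lecture",
)


def validate_resource_types(values):
    """Return the values unchanged if all are controlled terms, else raise ValueError."""
    invalid = [v for v in values if v not in LEARNING_RESOURCE_TYPES]
    if invalid:
        raise ValueError(f"not in the controlled list: {invalid}")
    return list(values)


# The most dominant kind goes first, so the caller's order is preserved.
print(validate_resource_types(["lecture", "slide"]))
```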

Build in a thesaurus for automatic assignment of subject terms

Build in a thesaurus to assist searching

Build in an illustrated thesaurus to assist searching

Advantages and Disadvantages of Particular Structures

• Lists:

– Simple to implement, use, and maintain

– Provide little or no guidance for the user

• Synonym Rings:

– Are constructed manually and are not used in indexing

– Can be useful in retrieval as they allow synonyms and near-synonyms to be treated equally in searching.

Advantages and Disadvantages of Particular Structures

• Taxonomies

– Good information about hierarchical relationships among terms

– Useful for both indexers and searchers who need to discover the most appropriate, specific terms for their purposes

– There is no entry vocabulary (i.e. USE/USED FOR terms)

– Taxonomies do not indicate other types of relationships among terms

• Thesauri

– Good information about hierarchical relationships among terms

– Good information about relationships among terms

– Entry vocabulary to help users locate the correct terms

– Thesauri are useful for both indexers and searchers who need to discover the most appropriate, specific terms for their purposes

– Thesauri are time-consuming and labor intensive to develop and maintain

Typical applications of Lists, Synonym Rings, Taxonomies, and Thesauri

• Lists

– Lists are frequently used to display small sets of terms that are to be used for quite narrowly defined purposes such as a web pull-down list or list of menu choices.

• Synonym Rings

– Synonym rings are frequently used behind the scenes to enhance retrieval, especially in an environment in which the indexing uses an uncontrolled vocabulary and/or there is no indexing, as when searching full text.

• Taxonomies

– Taxonomies are often created and used in indexing applications and for web navigation. Because of their (usually simple) hierarchical structure, they are effective at leading users to the most specific terms available in a particular domain.

• Thesauri

– Thesauri are the most typical form of controlled vocabulary developed for use in indexing and searching applications because they provide the richest structure and cross-reference environment. Thesauri can be narrow in scope and cover a limited domain or they can be broad in scope and widely applicable to many different types of content.

Subject Analysis

• Subject analysis is the abstracting and indexing of an item’s conceptual content

• A two-step process:

– ascertaining the subject

– translating the subject into controlled vocabulary

• Important considerations include: cataloger objectivity, cataloger’s background knowledge, and consistency in determining the content

Subject Analysis

• Finding (find a work of which subject is known)

• Collocating (find what repository has on subject)

• Evaluating (assist in making informed decision)

• Navigating (provide users with links to related terms)

Subject Analysis

• What is it about? (aboutness or subject)

• What is it for? (relevance or use)

• These can be the same question in some instances, but often the subject of a work can be quite separate from the use to which the searcher may put it or the reasons why the searcher considers it relevant.

There are a number of methods for determining the aboutness of an item

• The Purposive Method tries to determine the author's purpose in creating the work.

• The Figure-Ground Method tries to determine what is most central to the work (highly subjective).

• The Objective Method counts references to topics and presumes that commonly used topic words are central (this is one of the methods used by computers).

• The Appealing to Unity Method tries to determine what holds the work together.
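Of the four methods above, the Objective Method is the one that translates most directly into code: count the words and presume the most frequent ones are central. The sketch below is a deliberately crude illustration; the stopword list, the sample text, and the function name `candidate_topics` are all assumptions, and real systems use far more careful tokenization and weighting.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real lists are much longer.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "to"}


def candidate_topics(text, n=3):
    """Return the n most frequent non-stopword words as candidate topics."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]


sample = "Snow covers the town. Snow on the street, snow on the cars in the town."
print(candidate_topics(sample, n=2))
```

The counts say nothing about purpose or unity, which is why this method is usually combined with human judgment.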

This photograph is from the Library of Congress, and it was taken by Marion Post Wolcott in March 1940

What is this scene about?

• photo of a town covered in snow at night

• from 1940

• is this about winter?

• small town America?

• the introduction of electric lights?

• the depression?

The answer is it is about all of those things, and probably more. But it is a photo of a small town in the U.S. in the snow, it is a main street, we see automobiles and houses but also commercial buildings, footsteps in the snow, electric lights, and so on.

There is a fundamental difference between what an artifact is (a book or a photograph), what it is of, and what it is about. But all of those things usually get lumped together in subject headings and classifications.

Summarization for Subject Analysis

• Summarization is the process of deciding what an item is about and translating this into index terms from a subject language.

• This process should examine three distinct areas: the discipline in which the item was produced, the specific subjects or topics treated and the form of the item.

Summarization

• "Summarization" is the word used for a string of terms that describe the aboutness of an artifact.

• Discipline | Topic {Facet} | Form

• The photograph could be described as:

• Sociology | Depression; Winter; American town | Photograph

OR

• History | Winter; Small Town America | Photograph
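The Discipline | Topic | Form pattern is regular enough to assemble mechanically once the three parts have been decided. A minimal sketch, assuming topics are joined with semicolons as in the examples above; the function name `summarization` is hypothetical.

```python
def summarization(discipline, topics, form):
    """Build a 'Discipline | Topic; Topic | Form' summarization string."""
    return f"{discipline} | {'; '.join(topics)} | {form}"


print(summarization("History", ["Winter", "Small Town America"], "Photograph"))
```

The intellectual work is in choosing the discipline and topics; the string itself only records that decision in a consistent shape.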

Library of Congress Subject Headings (LCSH)

• Originally designed as a controlled vocabulary for representing the subject and form of books and serials in the LC collection

• Literary warrant: LC collection

• originally for use in LC catalogs

• now global standard for (i) library catalogs, (ii) bibliographic databases

• Approximately 259,000 headings

• c.10,000 new headings added each year

• Approximately 36% of headings are followed by LC Class numbers

LCSH Principles

• User and usage based

• Literary warrant

• Uniform headings

– Synonymous terms

– Spelling variants

– English vs. foreign language terms

– Scientific/technical vs. popular terms

– Currentness

• Unique headings

• Specific entry and co-extensivity

• Internal consistency

• Stability

• Precoordination: indexing terms are chosen and coordinated (“put together as a string”) at the time of cataloging

LCSH Headings can be:

• Personal names

– Individuals

– Families, dynasties, etc.

– Mythological, legendary or fictitious characters

• Corporate bodies

• Historical events

• Names of animals

• Other proper names

• Languages

• Ideas, events

• Prizes, awards

• Holidays, days of the week, etc.

• Ethnic groups, tribes, nationalities, etc.

• Religious, philosophical systems

• Geographic names

– Jurisdictional headings

– Geographic features

• You name it – it can be a subject heading

LCSH Conventions for Relationships

• UF: used for: specific see reference

• BT: broader term: specific see also reference

• NT: narrower term: specific see also reference

• SA: see also: general see also reference

• RT: related term: specific see also reference

Syndetic structure: references

• Equivalence relationships

• Hierarchical relationships

• Associative relationships

Equivalence or USE/UF references

• Link terms that are not authorized to their preferred form

• Example:

Baby sitting USE Babysitting

Categories of USE/UF references

• Synonyms and near synonyms

– Dining establishments USE Restaurants

• Variant spellings

– Haematology USE Hematology

• Singular/plural variants

– Salsa (Cookery) USE Salsas (Cookery)

Categories of USE/UF references

• Variant forms of expression

– Nonbank banks USE Nonbank financial institutions

• Alternate arrangement of terms

– Dogs—Breeds USE Dog breeds

• Earlier forms of headings

– Restaurants, lunch rooms, etc. USE Restaurants

Hierarchical references: broader terms and narrower terms

• Link authorized headings

• Show reciprocal relationships

• Allow users to enter at any level and be led to the next level of either more specific or more general topics

Three types of hierarchical references

• Genus/species (or class/class member)

Dog breeds NT Shih tzus
Shih tzus BT Dog breeds

• Whole/part

Foot NT Toes
Toes BT Foot

• Instance (or generic topic/proper-named example)

Mississippi River BT Rivers—United States
Rivers—United States NT Mississippi River
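Because hierarchical references are reciprocal, every BT link should have a matching NT link in the other direction. The check below is a sketch of how that rule could be verified over a small set of headings; the dictionaries `BROADER` and `NARROWER` and the function `missing_reciprocals` are hypothetical and only hold the three examples above.

```python
# BT links: heading -> its broader headings.
BROADER = {
    "Shih tzus": {"Dog breeds"},
    "Toes": {"Foot"},
    "Mississippi River": {"Rivers—United States"},
}

# NT links: heading -> its narrower headings.
NARROWER = {
    "Dog breeds": {"Shih tzus"},
    "Foot": {"Toes"},
    "Rivers—United States": {"Mississippi River"},
}


def missing_reciprocals(broader, narrower):
    """List (heading, relation, target) links that are missing their reciprocal."""
    missing = []
    for term, parents in broader.items():
        for parent in parents:
            if term not in narrower.get(parent, set()):
                missing.append((parent, "NT", term))
    for term, children in narrower.items():
        for child in children:
            if term not in broader.get(child, set()):
                missing.append((child, "BT", term))
    return missing


print(missing_reciprocals(BROADER, NARROWER))
```

An empty result means a user can enter at any level and move up or down without hitting a one-way link.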

Associative or related term references

• Link two headings associated in some manner other than hierarchy

• Currently made between

– Headings with overlapping meanings

• Carpets RT Rugs

– Headings for a discipline and the focus of that discipline

• Ornithology RT Birds

– Headings for persons and their field of endeavor

• Physicians RT Medicine

Entry in LCSH

Automobiles (May Subd Geog) [TL1-296.5]
UF Autos (Automobiles)
   Cars (Automobiles)
   Gasoline automobiles
   Motorcars (Automobiles)
BT Motor vehicles
   Transportation, Automotive
SA headings beginning with the word Automobile
NT A.C. Automobile
   Abarth automobiles
   Alfa Romeo automobile
   Etc.

Entry in LCSH

Librarians (May Subd Geog) [Z682 (Personnel)] [Z720 (Biography)]
BT Information scientists
   Library employees
RT Libraries
NT Academic librarians
   Acquisitions librarians
   Adult services librarians
   Bisexual librarians
   Etc.

Getty Vocabularies

• Structure & content are based upon standards (e.g., ISO, CDWA)

• Are compiled resources (not comprehensive)

• Growth through collaboration, inside Getty & outside

Getty Vocabularies

• Art & Architecture Thesaurus (AAT)

• Union List of Artist Names (ULAN)

• Getty Thesaurus of Geographic Names (TGN)

Types of terms in vocabularies

• personal names: Painter of the Wedding Procession (attributed to); Nikodemos (signed, as potter)

• geographic names: Athens

• object names: storage vessels, Panathenaic amphorae

• corporate names: J. Paul Getty Museum

• iconographic subjects and themes: Nike Crowning the Victor, with Judge on right and defeated opponent on left

• genre terms: Antiquities, ceremonies

• multilingual terms: Athínaí (Greek) = Athens (English) = Athenae (Latin)

Types of terms in vocabularies

• personal names

in the Union List of Artist Names you will find "Georgia O’Keeffe"

• geographic place names

in the Getty Thesaurus of Geographic Names you will find "Botswana"

• corporate names

in the Library of Congress Name Authority File you will find "Metropolitan Museum of Art (New York, N.Y.)"

• object names

in the Art & Architecture Thesaurus you will find "scroll paintings"

• iconographic subjects and themes

in ICONCLASS you will find the "education of Cupid by Venus and Mercury"

• genre terms

in the Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms you will find "political cartoons"

• multi-lingual terms

in the Multilingual Egyptological Thesaurus you will find the term "pottery" in English, "keramik" in German, and "céramique" in French.

Getty Vocabularies

• data value standards that provide terminology for use in cataloging, indexing and documentation practice. They are most effective when used in combination with data structure standards (e.g., CDWA) and data content standards (e.g., AACR2).

• thesauri built according to standards. They follow the rules and conventions prescribed by standards organizations such as ISO, NISO, and other codes of practice for thesaurus construction.

• designed for use in both indexing and retrieval. They are intended to bridge the language of the indexer and that of the searcher. If the vocabularies are available at the time of the search query, the searcher can consult the vocabulary to see what likely terms are available for the query.

Getty Vocabularies

• facilitators for information-sharing among different types of collections. For example, the AAT can be used to describe subject matter for books in a library, works of art in a museum, records in an archive, or images on the Web.

• application independent. The Vocabularies can be applied in the electronic environment in a variety of applications (e.g., databases and search engines) as well as in manual indexing systems, such as a card file.

• evolving and growing tools. Work with contributors allows for on-going community input and expansion of coverage in specialized subject areas.

AAT

• The focus of the AAT is on art and architecture, as the title suggests.

• However, the AAT can provide terminology for the description, documentation, and retrieval of visual and textual surrogates for art, and for related disciplines.

• The scope of the AAT is global, although currently it is richest in terminology used for art of Western Europe and North America.

• The AAT is growing and expanding coverage by incorporating additional data from a variety of Getty projects and external contributors. For example, a working group from the National Museum of African Art has added terminology for African styles/periods and object names.

AAT

• The AAT includes terminology related to:

– works of art (e.g., painting, sculpture, mixed media)

– architecture (e.g., the built and natural environment)

– material culture (e.g., furniture, costume, and equipment)

– forms and genre (e.g., document types, records)

– cultural traditions (e.g., events)

High-Backed Chair for Miss Cranston's tea rooms

AAT terms in Italics

• What is it? high-backed chair

• What is it made of? oak, horsehair

• How was it made? upholstered, stained, pierced

• Who made it? Charles Rennie Mackintosh, architect

• When was it made? 1898-99

• What style is it? Arts and Crafts

• What is it part of? tea room

• What condition is it in? reupholstered

• How was it used? dining

• What is it about? anthropomorphic

• Where did it come from? Miss Cranston's Argyle Street Tea Rooms

• Where is it? Glasgow School of Art, Glasgow

AAT does not include certain types of terminology

• Personal Names: Charles Rennie Mackintosh (ULAN)

• Corporate Names: Glasgow School of Art (Library of Congress authority files)

• Geographic Place Names: Glasgow (TGN)

• Building Names: Miss Cranston's Argyle Street Tearoom (local authority)

• Historic Events: Exhibition of Decorative Art, London, 1923 (Library of Congress authority files)

• Iconographic themes: Venus and Cupid (ICONCLASS)

Art & Architecture Thesaurus

• Contains around 34,000 concepts, 131,000 terms

• Records contain terms, notes, relationships, bibliography

• Scope ranges from antiquity to present

• Global, but preponderance of Western concepts

• Terms describe Art, Architecture, Decorative Arts, Material Culture, & Archival Materials

Elements of an AAT record

• concept: object, material, activity, style, attribute...

• parent concept: furnishings, mirrors, wall mirrors

• names/terms: pier glasses, pier mirrors, trumeaux

• related concepts: pier tables

• sources: Comstock, Helen. The Looking Glass in America, 1700-1825. Page 17.

• scope note: Tall, narrow mirrors intended to fill the pier, the space between two windows...

Note: The Focus of each vocabulary record is a concept - not a “term”

TGN

• The TGN is a structured vocabulary containing around 1,000,000 names and other information about places.

• The TGN includes all continents and nations of the modern political world, as well as historical places.

• It includes physical features and administrative entities, such as cities and nations.

• The emphasis in TGN is on places important for art and architecture.

Getty Thesaurus of Geographic Names

• Records for 912,000 places, 1,106,000 names

• Names, coordinates, relationships, dates & bibliography

• Includes all continents and nations of modern political world, historical places

• Includes physical features

• Includes inhabited places, other administrative and political entities

• Emphasis on places important to art & architectural history

Scope and range

Elements of a TGN record

Place

• names: Siena, Sena Julia

• notes: Founded as Etruscan hill town; later was Roman city of Sena Julia; thrived under Lombard kings; was medieval self-governing commune; was seat of Ghibelline power ...

• place types: inhabited place, provincial capital

• dates: settled by Etruscans (flourished 6th cen. BCE)

• parent place: Italy, Tuscany, Siena province

• geographic coordinates: 43 19 N, 011 21 E

• bibliography: Annuario Generale (1980); Dizionario Corografico Toscana (1977); Webster's Geographical Dictionary (1984); Hook, Siena (1979), 6 ff.; TCI: Toscana (1984), 479 ff.; Times Atlas of the World (1992), 183; Canby, Historic Places (1984), II, 861; Milanesi, Storia dell'Arte Senese (1969)

Focus is concept

ULAN

• The ULAN is a structured vocabulary that contains around 220,000 names and other information about artists.

• The coverage of the ULAN is from Antiquity to the present, and the scope is global.

• The scope of the ULAN includes any identified individual or "corporate body" (i.e., a group of people working together) involved in the design or creation of art and architecture.

Union List of Artist Names

• ULAN contains records for 120,000 ‘artists’, 293,000 names

• Records contain names, biographical information, relationships, & bibliography

• Scope is from Antiquity to the present

• Coverage is global, preponderance Western artists

• Identified individuals or groups of individuals working together (corporate bodies)

• Involved in the conception or production of visual arts & architecture

Scope and Range

Elements of a ULAN record

Artist

• names: Dosso Dossi, Giovanni de Lutero, Dosso da Ferrara, Giovanni di Niccolò

• life dates: born ca. 1490, active from 1512, died 1542

• related people: student of: Lorenzo Costa di Ottavio, from 1507

• bibliography: *Bénézit; Berenson; *Bolaffi; *Encyc. world art; Gibbons, DOSSO AND BATT. DOSSI (1968); Grove Dict of Art

• notes: Although early biographers, including Vasari, noted a birth date of ca. 1475, modern scholars agree that he cannot have been born much before 1490...

• geographic location: Ferrara (Italy), Venice (Italy)

• roles: painter, draftsman

Focus is concept

Cataloguing Cultural Objects as a tool for subject cataloguers

Aims

• practical guidance for subject cataloguers, indexers

• intra- and inter-indexer consistency

• user–indexer consistency

• retrieval effectiveness

Cataloguing Cultural Objects as a tool for subject cataloguers

Challenges

1. what does “subject” mean? -- i.e., what kinds of property of works should be indexed?

2. what kinds of method should be used to determine the subject(s) of works, and ...

3. ... to select terms that represent those subjects?

4. what kinds of control should be imposed on the lists of terms from which selection is made, and how should such authority control be implemented?

5. what metadata elements should be established for recording subject data?

Kinds of subject

Subjects, objects, images, texts

• subjects: e.g., people, things, events, places, concepts

• objects (works) [in museums, archives]: e.g., artworks, buildings, artifacts, documents, collections

– descriptive cataloguing: what the objects are

– subject cataloguing: what subjects the objects are of / about

Kinds of subject

• images [in visual resource collections]: visual representations of objects, e.g., photographs, slides, digital files

– descriptive cataloguing: what the images are; what objects the images are of

– subject cataloguing: what subjects the images are about

• texts [in libraries]: verbal representations of objects, e.g., books, journal articles

– descriptive cataloguing: what the texts are

– subject cataloguing: what objects the texts are about; what subjects the texts are about

CDWA Subject

• In CDWA, subject matter is analyzed according to a method based on the work of Erwin Panofsky

• Panofsky identified three main levels of meaning in art:
  – Pre-iconographic description
  – Iconographic identification
  – Iconographic interpretation or “iconology”

CDWA Subject

• Three sets of subcategories under the category Subject Matter in CDWA reflect this traditional art-historical approach to subject analysis

• Simplified and practical for purposes of retrieval

CDWA Subject

• CDWA levels of subject analysis
  – Subject matter–Description. A description of the work in terms of the generic elements of the image or images depicted in, on, or by it
  – Subject matter–Identification. The name of the subject depicted in or on a work of art: its iconography. Iconography is the named mythological, fictional, religious, or historical narrative subject matter of a work of art, or its non-narrative content in the form of persons, places, or things
  – Subject matter–Interpretation. The meaning or theme represented by the subject matter or iconography of a work of art.

Mantegna’s Adoration of the Magi

• Subject matter–Description: woman, baby, men, vessels, coins, turbans, etc.

• Subject matter–Identification: Known iconographic subject. Based on New Testament (Matthew 2). Balthasar, Melchior, Caspar, Mary, Jesus, Joseph

• Subject matter–Interpretation: Three Ages of Man (Youth, Middle Age, Old Age); Three Races of Man; Three Parts of the World
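The three levels on this slide can be kept apart in a record so that each can be searched on its own. The following is a minimal sketch, not part of CDWA itself: the class name `SubjectMatter` and its field names are assumptions chosen for illustration, and the terms are taken from the Mantegna example above (with the "etc." and the parenthetical ages left out).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubjectMatter:
    """Subject terms for one work, split by level of analysis (hypothetical structure)."""
    description: List[str] = field(default_factory=list)     # generic elements depicted
    identification: List[str] = field(default_factory=list)  # named iconographic subject
    interpretation: List[str] = field(default_factory=list)  # meaning or theme

    def all_terms(self) -> List[str]:
        """Flatten the three levels into one list, e.g. for a keyword index."""
        return self.description + self.identification + self.interpretation


adoration = SubjectMatter(
    description=["woman", "baby", "men", "vessels", "coins", "turbans"],
    identification=["Balthasar", "Melchior", "Caspar", "Mary", "Jesus", "Joseph"],
    interpretation=["Three Ages of Man", "Three Races of Man", "Three Parts of the World"],
)
```

Keeping the levels in separate fields lets a system offer a generic search ("turbans") and an iconographic search ("Caspar") without mixing the two.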

Kinds of subject

Representation
• representational (figurative) works
  – narrative subjects
    • stories
    • episodes in stories, i.e., events
  – non-narrative subjects
    • people, animals, plants
    • objects, e.g., buildings
    • activities; places; periods
    • [work types: portraits, still lifes, landscapes, genre scenes, architectural drawings ...]

Kinds of subject

• non-representational works
    • abstract works
    • buildings
    • furniture
    • decorative arts
  – “subject” / content =
    • meaning (symbolic, allegorical, thematic, conceptual)
    • form, composition
    • function, purpose, use

Kinds of subject

Ofness and aboutness
• what is the work of?
  – generically: description
    • e.g., “Nude standing woman seen from front, holding dagger in right hand”
  – specifically: identification
    • e.g., “The suicide of Lucretia”
• what is the work about?
  – interpretation
    • e.g., “virtuousness”

CCO recommendation #1

• subject data should be consistently given for all works, not just for representational ones
  – (even if those data end up overlapping with the content of other elements, e.g. Work Type)

Subject analysis

Ofness

• who? what? where? when?
  – people, objects/activities, places, times

• generic to specific

• left to right; top to bottom; foreground to background ...

Subject analysis

Aboutness
• what is the meaning of the work?
• what is expressed by the work?
• what do the objects, events, etc., depicted in the work symbolize?
• how may the image be interpreted?
• what was the intention of the work’s creator?
• how has the work been interpreted historically?

CCO recommendation #2

• take a methodical approach to subject analysis

Term selection

What kinds of terms? How many terms?
• factors that can’t help but affect the specificity of indexing:
  – quality and quantity of available scholarly information about the work
  – extent of indexer’s knowledge of the work
  – extent of indexer’s general pre-iconographic knowledge
  – depth of indexer’s indexing expertise
  – availability of time; money; human resources; technology at institution’s disposal

Term selection

• factors that should also affect the specificity of indexing
  – needs of end-users: expert and non-expert
  – characteristics of the collection
  – relative importance of the work
  – presence of unusual details in the work
  – institutional policies
    • number of terms to be assigned per work
    • method of subject analysis to be used
  – capabilities of system
    • e.g., to link NTs to BTs, preferred terms to synonyms and RTs, etc.

CCO recommendation #3a

• don’t be specific without the support of scholarly evidence
  – better to be general and accurate than specific and wrong

CCO recommendation #3b

• use subject terms that have been identified as “preferred” in established authority files (controlled vocabularies)

Authority control

Four kinds of authority file
• Personal and Corporate Body Authority
  – preferred forms of names of real people/bodies (as artists, patrons, subjects of works)
• Geographic Place Authority
  – preferred forms of names of real places

Authority control

• Concept Authority
  – preferred forms of genre terms
    • e.g. “still life,” “landscape”
  – preferred forms of generic subject terms
    • objects, materials, activities, agents, properties, styles, periods treated as subjects

Authority control

• Subject Authority
  – preferred forms of iconographical terms
    • proper names, uniform titles, standard labels ...
    • ... of characters, situations, events, themes, works (e.g., buildings) ...
    • ... in historical, mythological, religious, literary contexts

Authority control
• cf. AAT: Art & Architecture Thesaurus
  – terms for describing what objects / images are
  – project began 1980; funded by CLR, NEH, Mellon, then Getty from 1985; sponsored by ARLIS, CAA, SAH, etc.
  – current: version 3.0-Web, at http://www.getty.edu/research/conducting_research/vocabularies/aat/
• cf. ICONCLASS: Iconographic Classification System
  – terms for describing what objects / images are of / about
  – an iconographic classification system (not a vocabulary per se)
  – a collection of circa 24,000 ready-made definitions (in English) of objects, persons, events, situations, and abstract ideas that can be the subject of a work of art (emphasis is on Western art)
  – 1949: van de Waal (U. Leiden) began to develop ideas that led to ICONCLASS
  – 1973-85: published in 17 vols.
  – ICONCLASS Libertas Browser (KNAW, Amsterdam): web-accessible version, at http://www.iconclass.nl/

ICONCLASS

• Iconclass was developed by Henri van de Waal (1910-1972), Professor of Art History at the University of Leiden

• His ideas for a systematic overview of subjects, themes and motifs in Western art, which later became the Iconclass System, took form in the early 1950s.

• The complete Iconclass System was finished in the years after 1972 by a large group of scholars and was published between 1973 and 1985 by the Royal Netherlands Academy of Arts and Sciences (KNAW) of which Van de Waal was a member.

ICONCLASS

• Iconclass is a subject-specific classification system; it is a hierarchically ordered collection of definitions of objects, persons, events and abstract ideas that can be the subject of an image.

• Art historians, researchers and curators use it to describe, classify and examine the subject of images represented in various media such as paintings, drawings and photographs.

ICONCLASS

• Numerous institutions across the world use Iconclass to describe and classify their collections in a standardized manner.

• In turn, users ranging from art historians to museum visitors use Iconclass to search and retrieve images from these collections.

• As a research tool, Iconclass is also used to identify the significance of entire scenes or individual elements represented within an image.

The three main components of Iconclass are

• Classification System: 28,000 hierarchically ordered definitions divided into ten main divisions. Each definition consists of an alphanumeric classification code (notation) and the description of the iconographic subject (textual correlate). The definitions are used to index, catalogue and describe the subjects of images represented in works of art, reproductions, photographs and other sources.

• Alphabetical Index: 14,000 keywords used for locating the notation and its textual correlate needed to describe and/or index an image. This index is a valuable tool for iconographers in the identification, search and retrieval of subjects and scenes.

• Bibliography: 40,000 references to books and articles of iconographical interest.

Authority control

Kinds of source of terminology for local authority files
– distinguished by structure:
  • hierarchical vs. non-hierarchical
– by object type:
  • subjects vs. people/places
– by scope:
  • domain-specific vs. interdisciplinary
– by purpose:
  • authority control vs. end-user reference

CCO recommendation #4

• link the occurrences of subject terms in work records to the authority records for those terms
  – (in authority files that implement synonym control and hierarchical structure)
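One way to picture this recommendation: work records hold identifiers of authority records rather than free-text strings, so a search on any variant name, or on a broader term, still reaches the work. The sketch below is illustrative only. The record IDs (`s1` to `s4`), the class `AuthorityRecord`, and the functions `resolve`, `with_narrower` and `search` are all hypothetical; the names and hierarchy are borrowed from the Hercules authority record shown later in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class AuthorityRecord:
    record_id: str
    preferred: str
    variants: List[str] = field(default_factory=list)  # synonym control
    broader_id: Optional[str] = None                   # hierarchical structure


# Hypothetical IDs; terms taken from the Hercules example in this deck.
AUTHORITY: Dict[str, AuthorityRecord] = {
    "s1": AuthorityRecord("s1", "Classical mythology"),
    "s2": AuthorityRecord("s2", "Greek heroic legends", broader_id="s1"),
    "s3": AuthorityRecord("s3", "Story of Hercules", broader_id="s2"),
    "s4": AuthorityRecord("s4", "Hercules", ["Herakles", "Heracles", "Ercole"], broader_id="s3"),
}

# Work records store authority IDs, not free-text subject strings.
WORKS: Dict[str, List[str]] = {"Lansdowne Herakles": ["s4"]}


def resolve(name: str) -> Optional[str]:
    """Return the ID of the record whose preferred or variant name matches."""
    wanted = name.casefold()
    for rec in AUTHORITY.values():
        if wanted in (n.casefold() for n in [rec.preferred, *rec.variants]):
            return rec.record_id
    return None


def with_narrower(record_id: str) -> Set[str]:
    """Return record_id plus the IDs of every record below it in the hierarchy."""
    found = {record_id}
    changed = True
    while changed:
        changed = False
        for rec in AUTHORITY.values():
            if rec.broader_id in found and rec.record_id not in found:
                found.add(rec.record_id)
                changed = True
    return found


def search(name: str) -> List[str]:
    """Find works indexed with the named subject or any narrower subject."""
    record_id = resolve(name)
    if record_id is None:
        return []
    wanted = with_narrower(record_id)
    return [title for title, ids in WORKS.items() if wanted.intersection(ids)]
```

With this linkage, `search("Herakles")` and `search("Classical mythology")` both reach the work indexed under the preferred term "Hercules", which is the retrieval benefit the recommendation is after.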

Record structure

Metadata element sets
• cf. CDWA: Categories for the Description of Works of Art
  – ed. Baca, Harpring
  – funded by Getty, NEH, CAA
  – 2000: version 2.0; on web at http://www.getty.edu/research/conducting_research/standards/cdwa/
• cf. VRA Core Categories
  – ed. Lanzi, Whiteside
  – 2007: version 4.0; on web at http://www.vraweb.org/projects/vracore4/index.html

Record structure

Subject metadata elements recommended by CCO

• Description [free-text; non-repeatable]
• Subject [required; controlled; repeatable]
• Extent
  – for designating the part of the work to which the subject terms are applicable
• Subject Type
  – for distinguishing between description, identification, interpretation

CCO recommendation #5

• implement separate subject elements for display and for retrieval

Example

• Statue of Hercules (Lansdowne Herakles)

• Unknown Roman sculptor; after the School of Polykleitos

• about 125 CE
• marble
• height: 193.5 cm
• J. Paul Getty Museum (Los Angeles, CA)
• ©2004 J. Paul Getty Trust.

Example

Description: Herakles standing in contrapposto, holding his attributes, the skin of the Nemean lion and a club. This statue was found in Tivoli ca. 1790, in the ruins of Hadrian’s villa; it was in the collection of the Marquess of Lansdowne until 1951. It is related in appearance to works attributed to 4th-century BCE Greek sculptors; however, the work has an eclectic style that is purely Roman.

Subject--Description: religion/mythology; human figure; male; nude; lion skin; club

Subject--Identification: Hercules (Greek/Roman hero); Nemean Lion
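The example record can be modelled with the four subject elements listed under "Record structure", keeping the free-text Description (for display) apart from the controlled Subject terms (for retrieval). This is a sketch under stated assumptions: the class names `SubjectTerm` and `WorkSubject`, the field names, and the lower-case subject type strings are hypothetical, not CCO or CDWA syntax, and only some of the slide's terms are entered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubjectTerm:
    term: str                     # controlled value, used for retrieval
    subject_type: str             # "description", "identification" or "interpretation"
    extent: Optional[str] = None  # part of the work the term applies to, if any


@dataclass
class WorkSubject:
    description: str                                           # free text, non-repeatable, for display
    subjects: List[SubjectTerm] = field(default_factory=list)  # repeatable

    def validate(self) -> None:
        """Enforce the 'required' status of the Subject element."""
        if not self.subjects:
            raise ValueError("at least one controlled subject term is required")

    def index_terms(self, subject_type: str) -> List[str]:
        """Return the controlled terms of one subject type, in entry order."""
        return [s.term for s in self.subjects if s.subject_type == subject_type]


lansdowne = WorkSubject(
    description="Herakles standing in contrapposto, holding his attributes, "
                "the skin of the Nemean lion and a club.",
    subjects=[
        SubjectTerm("human figure", "description"),
        SubjectTerm("lion skin", "description"),
        SubjectTerm("club", "description"),
        SubjectTerm("Hercules (Greek/Roman hero)", "identification"),
        SubjectTerm("Nemean Lion", "identification"),
    ],
)
lansdowne.validate()
```

The display text can use the form "Herakles" that suits this object, while retrieval runs on the controlled term "Hercules (Greek/Roman hero)".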

Example of a Subject Authority record

Subject Names: Hercules (preferred); Herakles; Heracles; Ercole; Hercule; Hércules

Hierarchical Position: Classical mythology--Greek heroic legends--Story of Hercules--Hercules

Indexing Terms: Greek hero; king; strength; fortitude; perseverance; Argos; Thebes

Note: Probably based on an actual historical figure, a king of ancient Argos. The legendary figure was the son of Zeus and Alcmene ...

Related Subjects: Labors of Hercules; Love Affairs of Hercules; Zeus (Greek god); Alcmene (Greek heroine); Hera (Greek goddess)

Dates: Story developed in Argos, but was taken over at early date by Thebes; literary sources are late, though earlier texts may be surmised. Earliest: -1000 Latest: 9999

Sources: ICONCLASS http://www.iconclass.nl/; Grant, Michael and John Hazel. Gods and Mortals in Classical Mythology. Springfield, MA: G & C Merriam Company, 1973. Page: 212 ff.
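The fields of this authority record map onto a small data structure. The sketch below is an assumption-laden illustration: the class `SubjectAuthority` and its method names are hypothetical, the first name in the list is taken to be the preferred one, the "--" separator is read as a hierarchy delimiter with the broadest level first, and the Latest value 9999 is assumed to act as an open-ended sentinel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubjectAuthority:
    names: List[str]            # first entry is treated as the preferred form
    hierarchical_position: str  # levels separated by "--", broadest first
    earliest: int               # negative years are BCE
    latest: int                 # 9999 assumed to mean "no end date"

    @property
    def preferred(self) -> str:
        return self.names[0]

    def broader_terms(self) -> List[str]:
        """Every level above the record itself, broadest first."""
        return self.hierarchical_position.split("--")[:-1]

    def covers_year(self, year: int) -> bool:
        """True if the year falls inside the record's Earliest/Latest range."""
        return self.earliest <= year <= self.latest


hercules = SubjectAuthority(
    names=["Hercules", "Herakles", "Heracles", "Ercole", "Hercule", "Hércules"],
    hierarchical_position="Classical mythology--Greek heroic legends--Story of Hercules--Hercules",
    earliest=-1000,
    latest=9999,
)
```

Storing Earliest and Latest as integers alongside the free-text Dates note allows a date-range search without parsing the prose.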

Opportunities

• integrity and longevity of data

• consistent, reliable access to data

• exchange, sharing, reuse of data

• interoperability of systems

• easy migration of data to new systems

• communication, cooperation, collaboration

Questions

• should indexers be expected to do iconographical research to index aboutness?

• should cultural-historical questions about a work’s unintended meanings be answered by indexers?

• how may future users’ needs be predicted?

• what role for general knowledge-organization schemes?