Module 7b: Term Control and Semantic Relationships IMT530: Organization of Information Resources Winter 2008 Michael Crandall

Page 1: Module 7b:  Term Control and Semantic Relationships

Module 7b: Term Control and Semantic Relationships

IMT530: Organization of Information Resources

Winter 2008

Michael Crandall

Page 2: Module 7b:  Term Control and Semantic Relationships

Steps in Constructing Controlled Vocabularies (CVs)
• Define your domain
• Gather concepts
  – From user interviews, search logs, content analysis, preexisting vocabularies
• Select your approach
• Extract terminology
• Control your terms
• Organize your terms
• Maintain, maintain, maintain

Page 3: Module 7b:  Term Control and Semantic Relationships

Elements of Building CVs
• Select your approach
  – Pre- or post-coordinated (“sixteenth century lute music” vs. “sixteenth century” AND “lutes” AND “music”)
  – Open or closed (indexers can add terms or not)
  – Enumeration vs. synthesis (facets)
• Extract terms
  – Warrant (from users or domain or both)
• Control terms
  – Specificity (cats or Siamese cats?)
  – Control of homographs (qualifiers)
  – Term consistency and word form (plurals, etc.)
  – Multiword/phrase sequence and form (inverted, normal form?)
  – Term definitions (scope notes)
  – Syntax (citation order)
  – Semantic factoring
• Organize terms
  – Semantic relationships

Page 4: Module 7b:  Term Control and Semantic Relationships

Term Control

Page 5: Module 7b:  Term Control and Semantic Relationships

Term control
– Specificity (cats or Siamese cats?)
– Control of homographs (qualifications)
– Term consistency and word form (plurals, etc.)
– Multiword/phrase sequence and form (inverted, normal form?)
– Term definitions (scope notes)
– Syntax (citation order)
– Semantic factoring

Page 6: Module 7b:  Term Control and Semantic Relationships

Specificity
• Depends on user needs and time available
• Should be consistent throughout the CV to avoid user confusion
• May be influenced by choice of approach
  – If faceted, some facets may be more specific than others
  – If hierarchical, you should be consistent throughout

Page 7: Module 7b:  Term Control and Semantic Relationships

Homographs
• Sometimes a single word or phrase has multiple meanings, e.g., “power”, “drum”, “Java”, “Jupiter”
• Controlled vocabularies “disambiguate” these terms so that each term has a single meaning
  – In thesauri and subject heading lists, parenthetical qualifiers are added, e.g., these LCSH terms: “Power (Mechanics)”; “Power (Christian theology)”; “Power (Social sciences)”; “Power (Philosophy)”
  – In taxonomies and classifications, the meaning of a homograph is contextualized by its placement in a particular hierarchy. Following the example above, Power will appear in the Philosophy, Christianity, Social Sciences, and Mechanics hierarchies, and the terms are disambiguated by their location (and thus their different notation)
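The parenthetical-qualifier convention can be sketched in a few lines of Python. This is only an illustration, not how LCSH is actually maintained: the function names `split_qualifier` and `homographs` are made up for this example, and the extra unqualified heading “Creativity” is added just for contrast.

```python
import re
from collections import defaultdict

# Headings from the LCSH example on the slide, plus one unqualified term.
HEADINGS = [
    "Power (Mechanics)",
    "Power (Christian theology)",
    "Power (Social sciences)",
    "Power (Philosophy)",
    "Creativity",
]

def split_qualifier(heading):
    """Return (base term, qualifier); qualifier is None when there is none."""
    match = re.fullmatch(r"(.+?) \((.+)\)", heading)
    if match:
        return match.group(1), match.group(2)
    return heading, None

def homographs(headings):
    """Map each qualified base term to the list of its qualifiers."""
    groups = defaultdict(list)
    for heading in headings:
        base, qualifier = split_qualifier(heading)
        if qualifier is not None:
            groups[base].append(qualifier)
    return dict(groups)

if __name__ == "__main__":
    for base, qualifiers in homographs(HEADINGS).items():
        print(base, "->", qualifiers)
```

Each full heading (base term plus qualifier) stays unique, which is the point of the qualifier: the bare word “Power” is ambiguous, the qualified heading is not.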

Page 8: Module 7b:  Term Control and Semantic Relationships

Word Form
• Single-word form should be consistent
  – Choose verbs or nouns
  – Singular or plural
  – Standard form
• Phrases should be in a standard form
  – Either direct (Constitutional government)
  – Or inverted (Government, constitutional)
    • The inverted form allows closer grouping of like terms in an alphabetic display; it is not used much anymore

Page 9: Module 7b:  Term Control and Semantic Relationships

Scope Notes
• Scope notes are term definitions in a thesaurus or controlled vocabulary
• Scope notes tell indexers the precise meaning of a term, and help users know whether they are searching on the correct term

Page 10: Module 7b:  Term Control and Semantic Relationships

Syntax
• Syntax describes how terms are built (especially how multiple concepts may be combined), and citation order (the order of facets)
  – Syntax is an issue when concepts are pre-coordinated in an indexing term (whether the syntax is consistent or not)
  – Syntax is an issue for CVs that use synthesis with facets, in that the rules for synthesis (also called citation order in classification schemes) determine term syntax
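As a rough illustration of citation order, the sketch below builds a synthesized term from facet values in one fixed order, whatever order the indexer supplies them in. The facet names (`period`, `instrument`, `genre`), their order, and the `" -- "` separator are assumptions made for this example, not rules from any particular scheme.

```python
# Assumed citation order for the lute-music example; a real scheme defines its own.
CITATION_ORDER = ("period", "instrument", "genre")

def synthesize(facets, order=CITATION_ORDER, separator=" -- "):
    """Combine facet values into one term, always following the citation order."""
    unknown = set(facets) - set(order)
    if unknown:
        raise ValueError(f"facets not in citation order: {sorted(unknown)}")
    return separator.join(facets[name] for name in order if name in facets)

if __name__ == "__main__":
    # The facets are given in a different order than the citation order.
    print(synthesize({"genre": "Music",
                      "period": "Sixteenth century",
                      "instrument": "Lutes"}))
```

Because the order lives in one place, two indexers describing the same work produce the same string, which is what consistent syntax buys you.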

Page 11: Module 7b:  Term Control and Semantic Relationships

Semantic Factoring
• “The process of analyzing some or all of the categories of an ontology into a collection of primitives.” Sowa, J. F. (2003). Ontology. Glossary. http://www.jfsowa.com/ontology/gloss.htm
• Essentially, you are trying to decompose terms into their elemental concepts, to minimize duplication and maximize reuse
  – For example: ship = vehicle + water transport
  – Not always possible, especially with non-concrete concepts
• “Creating a thesaurus without doing semantic factoring is like trying to put together furniture from Ikea without following the instructions. You will get interesting configurations, but you will not save time.” Ezzo, J. (2005). Bella and Yakov and Tillie's Panties: What I Learned in “Construction and Maintenance of Indexing Languages and Thesauri.” Bulletin of the American Society for Information Science and Technology, 31(4), April/May 2005. http://www.asis.org/Bulletin/Apr-05/ezzo.html
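One way to picture semantic factoring is as a table from compound terms to their primitives. In the minimal sketch below, only the “ship” entry comes from the slide; the “truck” entry and the function names are hypothetical, added to show a primitive being reused.

```python
# Compound term -> set of primitive concepts.
FACTORS = {
    "ship": {"vehicle", "water transport"},   # from the slide
    "truck": {"vehicle", "land transport"},   # hypothetical second term
}

def primitives(factors):
    """All distinct primitives used anywhere in the vocabulary."""
    return sorted(set().union(*factors.values()))

def terms_with(primitive, factors):
    """Compound terms that reuse a given primitive."""
    return sorted(term for term, parts in factors.items() if primitive in parts)

if __name__ == "__main__":
    print(primitives(FACTORS))
    print(terms_with("vehicle", FACTORS))
```

Two compound terms need only three primitives here, and “vehicle” is defined once and reused, which is the duplication-versus-reuse trade the slide describes.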

Page 12: Module 7b:  Term Control and Semantic Relationships

Relationships in CVs

Page 13: Module 7b:  Term Control and Semantic Relationships

Relationships in Controlled Vocabularies
• There are three major types of relationships between subject concepts
  – Equivalence relationships
  – Hierarchical relationships
  – Associative relationships

Page 14: Module 7b:  Term Control and Semantic Relationships

Equivalence Relationships
• In natural language, one word or phrase can refer to more than one concept, and multiple terms can refer to a single concept
• In other words, there is no one-to-one correspondence between words/phrases and concepts

Page 15: Module 7b:  Term Control and Semantic Relationships

Preferred Terms and Cross References (Synonyms)
• Controlled vocabularies create a one-to-one relationship between terms and concepts by controlling synonyms: multiple words or phrases that share similar meaning
• To do this we:
  – Select a preferred term (descriptor, subject heading)
  – Create cross references from non-preferred terms (entry vocabulary, lead-in terms)

Page 16: Module 7b:  Term Control and Semantic Relationships

Example Equivalence Display
• Sample display for descriptor (preferred term) “Creativity” from the ERIC Thesaurus:

Creativity
  UF Creative ability
     Originality

• If you searched on “Originality” or “Creative ability” in the ERIC database, you would see these references:
  – “Creative ability” see “Creativity” OR
  – “Originality” use “Creativity”
• In other words, you would be led from the unused (lead-in) terms to the used (preferred) term.
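The lead-in-to-preferred-term lookup can be sketched as a small index built from UF entries. The code below is illustrative only; it is not how the ERIC database works internally, and `build_use_index` and `resolve` are names invented for this example. Case-insensitive matching is also an assumption.

```python
# UF ("used for") entries: preferred term -> lead-in terms, from the ERIC example.
UF = {"Creativity": ["Creative ability", "Originality"]}

def build_use_index(uf):
    """Invert UF entries so every lead-in term points to its preferred term."""
    use = {}
    for preferred, lead_ins in uf.items():
        use[preferred.casefold()] = preferred
        for lead_in in lead_ins:
            use[lead_in.casefold()] = preferred
    return use

def resolve(term, use_index):
    """Return the preferred term for a search term, or None if it is unknown."""
    return use_index.get(term.strip().casefold())

if __name__ == "__main__":
    index = build_use_index(UF)
    for query in ("Originality", "creative ability", "Creativity", "Genius"):
        print(query, "->", resolve(query, index))
```

Note that the index is many-to-one: several entry terms lead to exactly one descriptor, never the other way around.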

Page 17: Module 7b:  Term Control and Semantic Relationships

Equivalence Relationships - Summary
• Exist between words or phrases that share the same (or similar) meaning
• Equivalent terms are considered synonymous (whether they actually are or are not)
• When controlling vocabulary, one equivalent term is selected as a preferred term (e.g., descriptor); the other equivalent terms are treated as “lead-in” terms or cross references
• References used in the CV to show equivalence relationships include “UF” (used for), “Use”, “See”, and “Search under”

Page 18: Module 7b:  Term Control and Semantic Relationships

Hierarchical Relationships
• Hierarchical relationships:
  – May be strictly defined as:
    • Genus-species (also called class inclusion or “is-a”) relationships
    • Whole-part relationships (sometimes these are treated as associative relationships)

Page 19: Module 7b:  Term Control and Semantic Relationships

Hierarchical Relationships
• Hierarchical relationships:
  – May be illustrated by set notation: Set G (green) is a subset of Set B (blue)
  – All Gs are also Bs (in other words, a G is a B)
  – Using a real-world analogy, if Gs are gorillas and Bs are animals, all gorillas are animals
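The subset idea maps directly onto Python sets. In this small sketch the individual names are hypothetical placeholders; only the gorilla/animal relationship comes from the slide.

```python
# Hypothetical members; what matters is that every gorilla is also an animal.
gorillas = {"Koko", "Binti"}
animals = gorillas | {"Tweety", "Flipper"}

def is_subset(narrower, broader):
    """True when every member of `narrower` is also a member of `broader`."""
    return narrower <= broader

if __name__ == "__main__":
    print(is_subset(gorillas, animals))  # all gorillas are animals
    print(is_subset(animals, gorillas))  # but not all animals are gorillas
```

The relationship only runs one way, which is why a genus-species hierarchy has a broader side and a narrower side.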

Page 20: Module 7b:  Term Control and Semantic Relationships

Ideal CV Hierarchical Relationships

• Ideally, all hierarchical relationships indicated in a controlled vocabulary are also controlled and defined as genus-species (and sometimes also whole-part) relationships

• ALL other relationships between terms are associative relationships

• In real life CVs, this is not always the case!

Page 21: Module 7b:  Term Control and Semantic Relationships

References for Hierarchical Relationships
• Hierarchically related terms are shown by the BT (broader term), NT (narrower term), and sometimes See also/Search also references.
• Examples of two entries in the ERIC thesaurus:

Creativity
  BT Psychological characteristics

Psychological characteristics
  NT Creativity
     Intelligence
     Cognitive style
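BT and NT are two views of the same link, so one can be derived from the other. The sketch below stores only BT references and generates the NT display from them. It assumes each term has a single broader term, which is a simplification (real vocabularies may allow more than one), and `narrower_terms` is a name made up for this example.

```python
# BT references from the ERIC example: narrower term -> broader term.
# Assumption for this sketch: one broader term per term.
BT = {
    "Creativity": "Psychological characteristics",
    "Intelligence": "Psychological characteristics",
    "Cognitive style": "Psychological characteristics",
}

def narrower_terms(bt):
    """Derive the reciprocal NT lists from the BT references."""
    nt = {}
    for narrower, broader in bt.items():
        nt.setdefault(broader, []).append(narrower)
    return nt

if __name__ == "__main__":
    for broader, narrower in narrower_terms(BT).items():
        print(broader)
        for term in narrower:
            print("  NT", term)
```

Deriving one side from the other keeps the two displays from drifting apart during maintenance.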

Page 22: Module 7b:  Term Control and Semantic Relationships

BTs & NTs

• In the previous slide, both Creativity and Psychological characteristics are preferred terms

• Each has its own display; the Creativity display (Creativity as a preferred term display) shows the reference to the broader, preferred term “Psychological characteristics”

Page 23: Module 7b:  Term Control and Semantic Relationships

Testing for Hierarchical Relationships

• To test for a hierarchical relationship between terms, use the ‘is-a’ test.

• The relationship between “robin” and “bird”? (A robin is a (type of) bird, so the relationship is hierarchical; Bird is the broader term, Robin is the narrower)

• The relationship between Water and Hydronomy? (Water is not a hydronomy or a type of hydronomy; Hydronomy is not a water or a type of water; so the relationship here is an associative relationship)

Page 24: Module 7b:  Term Control and Semantic Relationships

Examples of Hierarchical Relationships
• What is the relationship between these sets of terms?
  – books and library materials
  – water and floods
  – buildings and chimneys
  – painting and acrylic paints
  – water and groundwater

Page 25: Module 7b:  Term Control and Semantic Relationships

Answers
• Books and library materials (hierarchical)
• Water and floods (associative, because a flood is not the same type of thing as water; one way you can tell is that one is a count noun and the other is not. Hierarchical may be acceptable depending on context)
• Buildings and chimneys (hierarchical if you include whole-part relationships; associative if you don’t)
• Painting and acrylic paints (associative)
• Water and groundwater (hierarchical)

Page 26: Module 7b:  Term Control and Semantic Relationships

More on Hierarchical Relationships
• A characteristic of the relationship between terms that are strictly hierarchically related (genus-species only, not whole-part) is Hierarchical Force
• When a narrower term (NT) is hierarchically related to a broader term, the narrower term inherits all of the characteristics of the terms above it in the hierarchy
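Hierarchical force means the “is-a” test keeps holding as you climb the hierarchy. A minimal sketch, assuming a single broader term per term: the Robin/Bird pair and the gorilla/animal pair come from earlier slides, while the Bird-to-Animal link and the function names are added here for illustration.

```python
# narrower term -> broader term (genus-species links only).
BT = {
    "Robin": "Bird",       # from the "is-a" test slide
    "Bird": "Animal",      # assumed link, added for this sketch
    "Gorilla": "Animal",   # from the set-notation slide
}

def broader_chain(term, bt):
    """List every term above `term`, nearest first."""
    chain = []
    seen = {term}
    while term in bt:
        term = bt[term]
        if term in seen:
            raise ValueError(f"cycle in hierarchy at {term!r}")
        seen.add(term)
        chain.append(term)
    return chain

def is_a(narrower, broader, bt):
    """True if `broader` appears anywhere above `narrower`."""
    return broader in broader_chain(narrower, bt)

if __name__ == "__main__":
    print(broader_chain("Robin", BT))
    print(is_a("Robin", "Animal", BT))
```

If a link in the chain were whole-part or associative instead, the test at the top would stop being true, which is why hierarchical force applies only to strict genus-species links.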

Page 27: Module 7b:  Term Control and Semantic Relationships

Associative Relationships
• Include all relationships not encompassed by equivalence and hierarchical relationships
• In controlled vocabularies, these relationships are shown by the following references:
  – Related Term (RT), See also (SA)
• Examples of types of associative relationships (there are many of these!):
  – Thing and property (rubber, elasticity)
  – Complementary activities (teaching, learning)
  – Agent and activity (artist, painting)

Page 28: Module 7b:  Term Control and Semantic Relationships

Associative Relationships
• Many of these are semantic relationships
• Some of these are syntactic relationships too:
  – Children see related term Games
• Problems: when to stop? How close in meaning or syntactic relation do two terms have to be to show them in a CV?
• Note: associative relationships are rarely shown in classifications and taxonomies

Page 29: Module 7b:  Term Control and Semantic Relationships

Example Associative Relationship Display
• From the ERIC thesaurus:

Comprehension
  RT Concept formation
     Misconceptions
     Scientific literacy
     Thinking skills

• Again, remember that both Comprehension and all of the RTs are preferred terms; however, this is the display for the preferred term Comprehension
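Unlike BT/NT, the RT reference is conventionally symmetric: if A is related to B, B is related to A. The sketch below stores each pair once and records it in both directions. It assumes reciprocal RT entries as a thesaurus convention and does not claim to reproduce the actual ERIC records for the other terms; `build_rt` is an invented name.

```python
from collections import defaultdict

# RT pairs taken from the Comprehension display.
PAIRS = [
    ("Comprehension", "Concept formation"),
    ("Comprehension", "Misconceptions"),
    ("Comprehension", "Scientific literacy"),
    ("Comprehension", "Thinking skills"),
]

def build_rt(pairs):
    """Record every related-term pair in both directions."""
    rt = defaultdict(set)
    for first, second in pairs:
        if first == second:
            raise ValueError(f"a term cannot be related to itself: {first!r}")
        rt[first].add(second)
        rt[second].add(first)
    return rt

if __name__ == "__main__":
    rt = build_rt(PAIRS)
    print(sorted(rt["Comprehension"]))
    print(sorted(rt["Misconceptions"]))
```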

Page 30: Module 7b:  Term Control and Semantic Relationships

Some Guidelines
• Does the taxonomy cover the domain appropriately?
• Is it within scope?
• Do draft definitions for concepts express them clearly?
• Are duplicate concepts removed?
• Are basic-level concepts represented?
• Does extracted terminology express them?
• Is the structure useful and sensible?

Page 31: Module 7b:  Term Control and Semantic Relationships

Questions?

• If not, take a break!!!

Page 32: Module 7b:  Term Control and Semantic Relationships

Exercise 7b

• Take your concept lists from the last exercise, and use those in Exercise 7b to begin building a controlled vocabulary

• Do as much as you can in class today; work on the rest during the week

• Each group should send me your initial controlled vocabularies by email by next Friday