1 cs 430: information discovery lecture 5 descriptive metadata 1 libraries catalogs dublin core

Post on 17-Jan-2016


1

CS 430: Information Discovery

Lecture 5

Descriptive Metadata 1

Library Catalogs

Dublin Core

2

Course Administration

3

Descriptive Metadata

• Catalog: metadata records that have a consistent structure, organized according to systematic rules.

• Abstract: a free text record that summarizes a longer document.

• Indexing record: less formal than a catalog record, but more structured than a simple abstract.

Some methods of information discovery search descriptive metadata about the objects.

Metadata typically consists of a catalog or indexing record, or an abstract, one record for each object.

4

Descriptive Metadata

• Usually stored separately from the objects that it describes, but sometimes embedded in the objects.

• Usually the metadata is a set of text fields.

Textual metadata can be used to describe non-textual objects, e.g., software, images, music

5

Descriptive metadata

Information discovery is often most effective when applied to metadata rather than raw information

• Allows fielded searching

author = "Goethe"

• Suitable for non-textual material

type = "picture" and subject = "Ithaca"

• Can be used with controlled vocabulary

language = "en"
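The three queries above can be sketched as exact-match lookups over per-object records. This is a minimal illustration, not a description of any particular catalog system: the records, the field names, and the function fielded_search are all made up for the example, and every field is assumed to hold a list of values.

```python
# Hypothetical records: each maps a field name to a list of values.
RECORDS = [
    {"author": ["Goethe"], "title": ["Faust"], "type": ["text"], "language": ["de"]},
    {"title": ["Cascadilla Gorge"], "type": ["picture"],
     "subject": ["Ithaca", "waterfalls"], "language": ["en"]},
    {"author": ["Goethe"], "title": ["Theory of Colours"], "type": ["text"],
     "language": ["en"]},
]


def fielded_search(records, **criteria):
    """Return the records in which every named field contains the wanted value.

    Matching is exact and case-insensitive; criteria are combined with AND.
    A record that lacks a requested field never matches.
    """
    hits = []
    for record in records:
        if all(
            wanted.lower() in (value.lower() for value in record.get(field, []))
            for field, wanted in criteria.items()
        ):
            hits.append(record)
    return hits


# author = "Goethe"
goethe = fielded_search(RECORDS, author="Goethe")
# type = "picture" and subject = "Ithaca"
pictures = fielded_search(RECORDS, type="picture", subject="Ithaca")

print([r["title"][0] for r in goethe])
print([r["title"][0] for r in pictures])
```

Note that the picture is found through its textual subject field alone; nothing in the search looks at the image itself.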

6

Origins of Library Catalogs

Bibliographic Objective:

• To bring together like items

• To differentiate among similar ones

Sir Anthony Panizzi, Keeper of Printed Books (1837-56) and later Principal Librarian (1856-66) at the British Museum.

His Ninety-One Rules (1841) were the basis of modern catalogue rules.

7

Origins of Library Catalogs

Information Discovery:

• to enable a person to find a book of which either the author, title or subject is known

• to show what the library has by a given author, on a given subject, or in a given kind of literature

• to assist in the choice of a book as to its edition (bibliographically) or to its character (literary or topical).

Charles Ammi Cutter, Librarian of the Boston Athenaeum

Rules for a Dictionary Catalog, 1874

8

Origins of Library Catalogs

Classification:

Division of subject matter into a hierarchy. Typically used in libraries to provide a subject-based order for shelving books.

Melvil Dewey, Acting Librarian of Amherst College (1874)

The Dewey Decimal system of book classification uses the numbers 000 to 999 to cover the general fields of knowledge, and decimals to fit special subjects.
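As a rough sketch of how the hierarchy works, the first digit of a Dewey number selects one of ten main classes, and the digits after the decimal point refine the subject. The class labels below are the modern top-level headings as commonly published, not something taken from this lecture, and main_class is a hypothetical helper name.

```python
# Assumed labels for the ten main classes (modern wording, not from the lecture).
DEWEY_MAIN_CLASSES = {
    0: "Computer science, information and general works",
    1: "Philosophy and psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts and recreation",
    8: "Literature",
    9: "History and geography",
}


def main_class(dewey_number: str) -> str:
    """Return the main class for a number such as '509.123'.

    The part before the decimal point must be exactly three ASCII digits
    (000 to 999); anything after the point is ignored here.
    """
    whole = dewey_number.strip().split(".")[0]
    if not (len(whole) == 3 and whole.isascii() and whole.isdigit()):
        raise ValueError(f"not a Dewey number: {dewey_number!r}")
    return DEWEY_MAIN_CLASSES[int(whole[0])]


print(main_class("509.123"))
```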

9

Technology

Materials to be catalogued:

• Originally books

• Extended to serials, maps, music, etc., but concepts still rely heavily on experience with books

Form of catalog:

• Entries in books (Panizzi)

• Index cards (Cutter)

• Online databases (Kilgour)

[Library Cataloguing will be continued in Lecture 6.]

10

Catalogs as Investments

Costs:

• Conventional catalog records are created by skilled librarians (cost estimate: $100 per record).

• OCLC's catalog has 43 million records. At $100 per record, that is roughly $4.3 billion, so the total investment is several billion dollars.

Cataloguing Standards:

• Enable libraries to share records

• Combine records of the past with records created today

• Allow readers and librarians to move between libraries

11

Dublin Core

Simple set of metadata elements for online information

• 15 basic elements

• intended for all types and genres of material

• all elements optional

• all elements repeatable

Developed by an international group chaired by Stuart Weibel since 1995.

(Diane Hillmann and Carl Lagoze of Cornell are very active in this group.)

12

13

Dublin Core

publisher: OCLC

creator: Weibel, Stuart L.

creator: Miller, Eric J.

title: Dublin Core Reference Page

date: 1996-05-28

format: text/html (MIME type)

language: en (English)

identifier: http://purl.org/dc/documents/rec-dces-199809.htm#

14

Dublin Core with Meta Tags

<meta name="publisher" content="OCLC">

<meta name="creator" content="Weibel, Stuart L.">

<meta name="creator" content="Miller, Eric J.">

<meta name="title" content="Dublin Core Reference Page">

<meta name="date" content="1996-05-28">

<meta name="format" content="text/html">

<meta name="language" content="en">

<meta name="identifier" content="http://purl.org/dc/documents/rec-dces-199809.htm#">
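One way to produce and read back tags like these is sketched below, using only the Python standard library. The names to_meta_tags and MetaCollector are hypothetical, and the sketch follows the slide in using bare element names in the name attribute. Values are escaped on the way out so that quotes or ampersands in a title cannot break the tag.

```python
from html import escape
from html.parser import HTMLParser


def to_meta_tags(pairs):
    """Render (element, value) pairs as one <meta> tag per line."""
    return "\n".join(
        f'<meta name="{escape(name, quote=True)}" '
        f'content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> tags, in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.pairs.append((attributes["name"], attributes["content"]))


# Repeated elements (two creators) simply become repeated tags.
pairs = [
    ("publisher", "OCLC"),
    ("creator", "Weibel, Stuart L."),
    ("creator", "Miller, Eric J."),
    ("title", "Dublin Core Reference Page"),
    ("language", "en"),
]
html_head = to_meta_tags(pairs)

collector = MetaCollector()
collector.feed(html_head)
print(collector.pairs == pairs)
```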

15

Dublin Core elements

1. Title The name given to the resource by the creator or publisher.

2. Creator The person or organization primarily responsible for the intellectual content of the resource. For example, authors in the case of written documents, artists, photographers, or illustrators in the case of visual resources.

3. Subject The topic of the resource. Typically, subject will be expressed as keywords or phrases that describe the subject or content of the resource. The use of controlled vocabularies and formal classification schemes is encouraged.

16

Dublin Core elements

4. Description A textual description of the content of the resource, including abstracts in the case of document-like objects or content descriptions in the case of visual resources.

5. Publisher The entity responsible for making the resource available in its present form, such as a publishing house, a university department, or a corporate entity.

6. Contributor A person or organization not specified in a creator element who has made significant intellectual contributions to the resource but whose contribution is secondary to any person or organization specified in a creator element (for example, editor, transcriber, and illustrator).

17

Dublin Core elements

7. Date A date associated with the creation or availability of the resource.

8. Type The category of the resource, such as home page, novel, poem, working paper, preprint, technical report, essay, dictionary.

9. Format The data format of the resource, used to identify the software and possibly hardware that might be needed to display or operate the resource.

10. Identifier A string or number used to uniquely identify the resource. Examples for networked resources include URLs and URNs.

18

Dublin Core elements

11. Source Information about a second resource from which the present resource is derived.

12. Language The language of the intellectual content of the resource.

13. Relation An identifier of a second resource and its relationship to the present resource. This element permits links between related resources and resource descriptions to be indicated. Examples include an edition of a work (IsVersionOf), or a chapter of a book (IsPartOf).

19

Dublin Core elements

14. Coverage The spatial locations and temporal durations characteristic of the resource.

15. Rights A rights management statement, an identifier that links to a rights management statement, or an identifier that links to a service providing information about rights management for the resource.
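The fifteen elements, each optional and repeatable, could be modeled as a mapping from element name to a list of values. This is only a sketch under that assumption: the class DublinCoreRecord and its methods are invented for illustration and do not correspond to any standard library or official schema.

```python
from collections import defaultdict

# The fifteen element names, in the order the lecture lists them.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


class DublinCoreRecord:
    """A record in which every element is optional and repeatable."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, element, value):
        """Append a value to an element; returns self so calls can be chained."""
        element = element.lower()
        if element not in DC_ELEMENTS:
            raise KeyError(f"not a Dublin Core element: {element!r}")
        self._values[element].append(value)
        return self

    def get(self, element):
        """Return all values of an element (an empty list if it is absent)."""
        return list(self._values.get(element.lower(), []))

    def items(self):
        """Yield (element, value) pairs in the canonical element order."""
        for element in DC_ELEMENTS:
            for value in self._values.get(element, []):
                yield element, value


record = (
    DublinCoreRecord()
    .add("publisher", "OCLC")
    .add("creator", "Weibel, Stuart L.")
    .add("creator", "Miller, Eric J.")
    .add("title", "Dublin Core Reference Page")
    .add("date", "1996-05-28")
)
print(record.get("creator"))
print(record.get("rights"))
```

Rejecting unknown names is a design choice for the example; a more permissive store would accept any key and leave validation to a later step.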

20

Qualifiers

Element qualifiers

Example: Date

DC.Date -> Created: 1997-11-01

DC.Date -> Issued: 1997-11-15

DC.Date -> Available: 1997-12-01/1998-06-01

DC.Date -> Valid: 1998-01-01/1998-06-01
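The qualified dates above mix single days with start/end ranges written as two dates separated by a slash. A small sketch of how such values might be interpreted follows; parse_dc_date and is_within are hypothetical helpers, and treating a single date as a one-day range is an assumption made for the example.

```python
from datetime import date


def parse_dc_date(value):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DD/YYYY-MM-DD' into (start, end).

    A single date is treated as a range that starts and ends on that day.
    """
    start_text, _, end_text = value.partition("/")
    start = date.fromisoformat(start_text.strip())
    end = date.fromisoformat(end_text.strip()) if end_text else start
    if end < start:
        raise ValueError(f"range ends before it starts: {value!r}")
    return start, end


# The qualified values from the slide, keyed by qualifier.
QUALIFIED_DATES = {
    "created": "1997-11-01",
    "issued": "1997-11-15",
    "available": "1997-12-01/1998-06-01",
    "valid": "1998-01-01/1998-06-01",
}


def is_within(qualifier, day):
    """Report whether a day falls inside the range recorded for a qualifier."""
    start, end = parse_dc_date(QUALIFIED_DATES[qualifier])
    return start <= day <= end


print(is_within("available", date(1997, 12, 15)))
print(is_within("valid", date(1997, 12, 15)))
```

The two checks show why the qualifier matters: on the same day the resource is already available but not yet valid.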

21

Qualifiers

Value qualifiers

Example: Subject

DC.Subject -> DDC: 509.123

DC.Subject -> LCSH: Digital libraries--United States

22

23

Dublin Core with qualifiers

<title>Digital Libraries and the Problem of Purpose</title>

<creator>David M. Levy</creator>

<publisher>Corporation for National Research Initiatives</publisher>

<date date-type = "publication">January 2000</date>

<type resource-type = "work">article</type>

<identifier uri-type = "DOI">10.1045/january2000-levy</identifier>

<identifier uri-type = "URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>

<language>English</language>

<rights>Copyright (c) David M. Levy</rights>
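A record in this style can be read with a standard XML parser. The sketch below assumes the elements are wrapped in a root element (here a made-up <record> tag, since the slide shows none) and that each element carries at most one qualifier attribute; parse_qualified is a hypothetical function name.

```python
import xml.etree.ElementTree as ET

# A subset of the slide's elements, kept short for the example.
FRAGMENT = """
<title>Digital Libraries and the Problem of Purpose</title>
<creator>David M. Levy</creator>
<date date-type="publication">January 2000</date>
<type resource-type="work">article</type>
<identifier uri-type="DOI">10.1045/january2000-levy</identifier>
<identifier uri-type="URL">http://www.dlib.org/dlib/january00/01levy.html</identifier>
"""


def parse_qualified(fragment):
    """Return (element, qualifier, value) triples in document order.

    qualifier is an (attribute, value) pair, or None for unqualified elements.
    """
    root = ET.fromstring(f"<record>{fragment}</record>")  # assumed wrapper
    statements = []
    for child in root:
        qualifier = next(iter(child.attrib.items()), None)
        statements.append((child.tag, qualifier, (child.text or "").strip()))
    return statements


statements = parse_qualified(FRAGMENT)
for element, qualifier, value in statements:
    print(element, qualifier, value)
```

A client that does not understand a qualifier can ignore the second item of each triple and still obtain a usable unqualified record.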

24

Limits of Dublin Core

Complex objects

• Article within a journal

• A thumbnail of another image

• The March 28 final edition of a newspaper

[Figure: a complete object, its sub-objects, and their metadata records]

25

Flat v. linked records

Flat record

All information about an item is held in a single Dublin Core record, including information about related items

convenient for access and preservation

information is repeated -- maintenance problem

Linked record

Related information is held in separate records with a link from the item record

less convenient for access and preservation

information is stored once

Compare with normal forms in relational databases
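The trade-off can be made concrete with two small in-memory layouts. Everything here is illustrative: the article titles, the key "serial-1", the replacement name, and the helper functions are invented; only the serial name and ISSN come from the lecture's own example.

```python
# Flat: each article record repeats the details of the serial it appears in.
flat_records = [
    {"title": "Article A", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
    {"title": "Article B", "serial_name": "D-Lib Magazine", "issn": "1082-9873"},
]

# Linked: serial details are stored once and referenced by a key.
serials = {"serial-1": {"serial_name": "D-Lib Magazine", "issn": "1082-9873"}}
linked_records = [
    {"title": "Article A", "serial": "serial-1"},
    {"title": "Article B", "serial": "serial-1"},
]


def resolve(record, serials):
    """Build the flat view of a linked record (one extra lookup per access)."""
    merged = {key: value for key, value in record.items() if key != "serial"}
    merged.update(serials[record["serial"]])
    return merged


def rename_serial_flat(records, old_name, new_name):
    """Rename a serial in flat records; returns how many records were edited."""
    touched = 0
    for record in records:
        if record["serial_name"] == old_name:
            record["serial_name"] = new_name
            touched += 1
    return touched


def rename_serial_linked(serials, key, new_name):
    """Rename a serial in the linked layout; exactly one record is edited."""
    serials[key]["serial_name"] = new_name
    return 1


# Before any change, both layouts describe the same thing.
same_view = resolve(linked_records[0], serials) == flat_records[0]

NEW_NAME = "D-Lib Magazine (renamed)"  # hypothetical new name
flat_touched = rename_serial_flat(flat_records, "D-Lib Magazine", NEW_NAME)
linked_touched = rename_serial_linked(serials, "serial-1", NEW_NAME)
print(same_view, flat_touched, linked_touched)
```

The flat layout needs no join to read a record but must edit every copy on a change; the linked layout edits one place but pays a lookup on every read, which is the same trade-off that normalization makes in a relational database.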

26

Dublin Core with flat record extension

Continuation

<relation rel-type = "InSerial">

<serial-name>D-Lib Magazine</serial-name>

<issn>1082-9873</issn>

<volume>6</volume>

<issue>1</issue>

</relation>

27

Events

Version 1

New material

Version 2

Should Version 2 have its own record, or should extra information be added to the Version 1 record?

How are these represented in Dublin Core?

28

Minimalist versus structuralist

Minimalist

15 elements, no qualifiers, suitable for non-professionals

encourage creators to provide metadata

Structuralist

15 elements, qualifiers, RDF, detailed coding rules

will require trained metadata experts

[For an example of how complex Dublin Core can become, see the source of: http://purl.org/dc/documents/rec-dces-199809.htm#]

29

Dublin Core in many languages

See:

Thomas Baker, Languages for Dublin Core, D-Lib Magazine, December 1998, http://www.dlib.org/dlib/december98/12baker.html
