Class 5 – Methods of description, representation and classification

Exercise overview

Last week we explored cataloging models and learned about different types of metadata. We explored the world of cataloging principles and got a glimpse of how these principles inform metadata representation and encoding standards. We learned about the development of cataloging practices and standards in libraries, archives and museums (LAM) and the challenges associated with developing and maintaining standards that interoperate across these different domains.

This week we will continue exploring our metadata model by learning about a metadata standard called Dublin Core and finding out more about the MARC bibliographic standard. Our focus this week is on data fields/structure and data content/values, and on how these metadata schemas and storage mechanisms help us create and work with representations. In this class we will compare what we learned about RDA/AACR2 with our knowledge of Dublin Core to evaluate how well a metadata standard suits an information need. Next week we will begin exploring how these metadata standards are encoded or represented in digital documents.

Instructions:

Work individually or in groups to complete the worksheet. When you get to a section that requires you to select a resource to explore, pick one resource (please don’t always choose the first one!). When asked to ‘discuss as a group’, consider your response and continue completing the worksheet.

We’re going to work with computer code today, so here’s an important note as you follow the exercises. Computer code is shown on numbered lines enclosed in boxes. The line numbers are simply a reference during instruction and should not be copied into your program. For example, a line that reads 56. p { visibility:hidden; } should simply be typed in as

p { visibility:hidden; }
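If you paste a numbered listing, the reference numbers have to come off before the code will run. A minimal Python sketch, assuming every number takes the form shown above (digits, a period, then one space at the start of the line); the helper name `strip_line_numbers` is hypothetical:

```python
import re

# Assumed numbering format: optional leading whitespace, digits, a period, one space.
# Only that prefix is removed, so any indentation inside the code itself survives.
_LINE_NUMBER = re.compile(r"^\s*\d+\.\s")

def strip_line_numbers(listing: str) -> str:
    """Remove reference numbers from each line of a copied code listing."""
    return "\n".join(_LINE_NUMBER.sub("", line, count=1) for line in listing.splitlines())

print(strip_line_numbers("56. p { visibility:hidden; }"))  # p { visibility:hidden; }
```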

Metadata Standards and Web Services Page 1

Erik Mitchell


Suggested readings

1. Mitchell, E. (2013). Chapter 1: Metadata developments in libraries and other cultural heritage institutions. In Library Linked Data: Research and Adoption. Library Technology Reports, 49(5), July/August 2013.

2. Review: Understanding Metadata. NISO Press. http://www.niso.org/publications/press/UnderstandingMetadata.pdf

3. Powell & Johnston (2007). Guidelines for implementing Dublin Core in XML. http://dublincore.org/documents/dc-xml-guidelines/

4. Read/skim: Taylor, Arlene G. and Daniel N. Joudrey. (2009). “Systems for Vocabulary Control.” The Organization of Information, 3rd edition.

5. Read/skim to supplement RDA description: Tillett, Barbara. What is FRBR? http://www.loc.gov/cds/downloads/FRBR.PDF

6. Read section 0 of RDA: RDA Toolkit, Section 0, Introduction, 1–12. Retrieved from http://access.rdatoolkit.org. See the TXT file from this week for login information.

7. Good resource for cataloging rules: https://sites.google.com/site/opencatalogingrules/

8. Gilliland, Anne J. (2012). Setting the Stage. http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html

9. Explore: Introduction to the Dewey Decimal Classification. http://www.oclc.org/dewey/versions/ddc22print/intro.pdf

10. Explore: Introduction to Library of Congress Subject Headings. http://www.tulane.edu/~techserv/lcsh%20introd.html

11. Explore: Library of Congress Main Classes. http://www.loc.gov/catdir/cpso/lcco/

12. Harden, Jean. (2012). Inadvertent RDA: New Catalogers’ Errors in AACR2. Journal of Library Metadata, 12(2-3). http://www.tandfonline.com.proxy-um.researchport.umd.edu/doi/full/10.1080/19386389.2012.700597

Understanding types of metadata

Take a few minutes to explore/review the definitions of metadata in our NISO reading from last week (http://www.niso.org/publications/press/UnderstandingMetadata.pdf). Fill out Table 1, describing the purpose each type of metadata serves.

Table 1: Metadata types

Metadata type | Metadata use
Administrative |
Descriptive |
Preservation |
Structural |
Technical |

Key Questions

Question 1. What type of metadata is most often used in creating library online catalogs?

Question 2. The NISO document comments on different uses of metadata. In addition to discovery, what other uses are mentioned, and which types of metadata would be useful for each function?

Question 3. In addition to Dublin Core, the NISO document mentions a number of other metadata standards and their functions. Using the document as your guide, fill out the table below:

Table 2: Metadata standards, their functions and examples

Metadata standard | Primary type (e.g. descriptive, structural…) | Example application
Dublin Core | |
Text Encoding Initiative (TEI) | |
Metadata Encoding and Transmission Standard (METS) | |
Metadata Object Description Schema (MODS) | |
Encoded Archival Description (EAD) | |
Visual Resources Association (VRA Core) | |

As you can see, metadata standards can fit both very general and very specific uses, can serve a wide range of needs, and can be loosely or strictly structured. Designing or selecting a metadata standard often involves a process akin to data modeling, in which you design or select a database or metadata standard that meets your information needs.

Conceptual model

With an understanding of the principles underlying information organization, let's turn our attention to models that help us understand the elements of representation. For a more in-depth discussion of the building blocks of metadata, you can refer to Chapter 4 of Metadata Standards and Web Services in Libraries, Archives, and Museums.

Step 1: Spend a few minutes with the Mitchell article, paying attention to the tables on page 9, then complete Table 3 and answer the following questions:

Key Questions

Question 4. What are the five elements of a metadata standard? For each element, what is an example standard, and what need does the element fill?

Table 3: Exploration of metadata standards elements

Element | Example | What need does the element fill?
Data model | |
Content rules | |
Metadata schema / vocabularies | |
Data serialization | |
Data exchange | |

Question 5. What are some of the challenges in metadata confronting libraries, archives and museums identified in the Mitchell article?

Reflect back on the cataloging principles studied in class 4 (e.g. the Paris Principles, IFLA, ISAD(G)). These principles are foundational to the entire model represented here: they inform the data model required to represent relationships, specify rules for creating content, suggest metadata schema requirements, and identify ways in which the created metadata should be stored, exchanged and used. Consider, for example, the inherent complexity in the IFLA guidelines as compared to the Paris Principles. In order to create and manage new relationships between Works, Expressions, Manifestations and Items, we need to think about each of these metadata building blocks and consider how they need to change to accommodate those relationships.

Data modeling

Data modeling is the process of designing or selecting an information system that is capable of storing the metadata you need to adequately represent your resource. Imagine the data model required to store all of the information YouTube needs to keep track of its videos, users, comments, ratings and license data. Data modeling may involve a relational database like MySQL, a text-file format like MARC (Machine-Readable Cataloging) or a graph database. (Note: a graph database describes a specific relationship between two things, like “Erik likes the UMD iSchool,” and is a common structure in social network systems.)
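To make the contrast between a relational table and a graph concrete, here is a minimal in-memory Python sketch (no real database involved) that stores the “Erik likes the UMD iSchool” example both as a fixed-column row and as subject-predicate-object triples. The class, the `partOf` predicate and the `objects_of` helper are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Relational style: the columns are fixed in advance, one row per fact.
@dataclass
class LikeRow:
    user: str
    organization: str

# Graph style: every fact is a (subject, predicate, object) edge.
Triple = tuple[str, str, str]

rows = [LikeRow(user="Erik", organization="UMD iSchool")]

triples: list[Triple] = [
    ("Erik", "likes", "UMD iSchool"),
    # A new kind of relationship needs no schema change, just another triple.
    ("UMD iSchool", "partOf", "University of Maryland"),
]

def objects_of(graph: list[Triple], subject: str, predicate: str) -> list[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in graph if s == subject and p == predicate]

print(objects_of(triples, "Erik", "likes"))  # ['UMD iSchool']
```

The row layout would need a new table (or new columns) to hold a different kind of relationship, whereas the graph simply gains another triple; that flexibility is one reason graph structures suit social network systems.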

In LIS, data modeling has historically focused on describing books and archival resources, but the increasing adoption of electronic and non-print media has challenged our profession’s ability to support representations of these objects using our standard metadata systems. This semester we are going to explore metadata standards that still serve the bulk of book cataloging (e.g. MARC, AACR2, RDA) and standards that support digital and archival resources, like Dublin Core (DC). For the scope of our class, data modeling includes all four aspects of the metadata model we explored last week (fields/structure, content/values, format, data exchange).

Creating metadata

In our work with HTML documents we worked with first-class (aka “primary source”) information objects. They were not representations or derivatives but rather were used as the ‘endpoint.’ HTML documents in real life tend to be complex because they contain not only ‘first-class’ information but also representation metadata (e.g. meta tags), contextualization metadata (e.g. <p>, <h1> and <title> elements) and sometimes surrogate metadata (e.g. abstracts and summaries). In contrast, other metadata standards such as Dublin Core or Qualified Dublin Core are designed to contain descriptive, administrative, or technical metadata about an information object.

Some key uses of descriptive metadata include:

1. Discovery by searching or browsing

2. Identification of resource values and differences

3. Co-location of similar resources

4. Provision of location information

5. Tracking of rights information

6. Differentiation of dissimilar resources


Reflect on the user needs explored in our FRBR model (e.g. Find, Identify, Select, Acquire) and the connection of these uses to the process of metadata creation. In this class we will explore how one specific metadata standard, Dublin Core, helps fulfill these needs.

Identifying metadata

Representations are useful for text-based resources but are almost always essential for media- and image-based resources, including still images, movies and audio recordings. Representations often, but not always, contain metadata that describes the information object. Let's return to one of our readings for this week, the overview of metadata types by Gilliland:

http://www.getty.edu/research/publications/electronic_publications/intrometadata/setting.html

With this article in mind, review the resource at http://1.usa.gov/qZFg3R. Note: if the resource is unavailable, you can find PDF files with the printed metadata in the class module.

Step 2: As you look at the resource, notice the tabs for “about this item,” “obtaining copies,” and “access original.” Find the link to the MARC record and use it to see more metadata about the record. If you have trouble identifying what the MARC fields mean (e.g. 245), you can check out http://www.oclc.org/us/en/bibformats/en. Review the page and classify its metadata according to Gilliland’s types of metadata.
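While reading the MARC record, a small lookup table can serve as a reminder of what the most common tags mean. The sketch below is deliberately incomplete (only a handful of MARC 21 bibliographic tags), and the helper `describe_tag` is hypothetical; the OCLC page above remains the reference to consult.

```python
# A small, incomplete lookup of common MARC 21 bibliographic tags.
MARC_TAGS = {
    "100": "Main Entry - Personal Name",
    "245": "Title Statement",
    "250": "Edition Statement",
    "260": "Publication, Distribution, etc. (Imprint)",
    "300": "Physical Description",
    "500": "General Note",
    "650": "Subject Added Entry - Topical Term",
    "856": "Electronic Location and Access",
}

def describe_tag(tag: str) -> str:
    """Return a label for a MARC tag, or say that it is not in this short list."""
    return MARC_TAGS.get(tag, f"{tag}: not in this short list")

for tag in ("245", "650", "999"):
    print(tag, "->", describe_tag(tag))
```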

Question 6. What are some example metadata elements for each type of metadata?

a. Descriptive metadata:

b. Technical metadata:

c. Administrative metadata:

d. Structural metadata:

e. Rights metadata:

Exploring representation schemas – Dublin Core and MARC

No matter what type of metadata we are implementing, we need to follow a schema (i.e. a standard set of elements that enables other people and systems to understand our metadata). One of the most common schemas in use, both in libraries and in other information fields, is the Dublin Core standard. Dublin Core consists of 15 core elements that were selected/designed to be applicable to a wide range of information resources. While working with Dublin Core can get complex, the basic activity of identifying and assigning metadata to each element is pretty simple. Let's begin by exploring this schema and how it implements these different types of metadata.
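Because Dublin Core is a fixed list of 15 elements, checking which fields of an existing record have a Dublin Core home is straightforward. A minimal Python sketch, assuming a record is represented as a dictionary from field names to lists of values; the record and the helper `split_record` are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set (DCMES).
DCMES = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Every DCMES element is optional and repeatable, so values are lists.
Record = dict[str, list[str]]

def split_record(record: Record) -> tuple[Record, Record]:
    """Separate fields that match a DCMES element name from those that do not."""
    mapped = {k: v for k, v in record.items() if k.lower() in DCMES}
    unmapped = {k: v for k, v in record.items() if k.lower() not in DCMES}
    return mapped, unmapped

# Hypothetical record: field names as they might appear on a catalog page.
record = {
    "Title": ["Example photograph"],
    "creator": ["Unknown photographer"],
    "Call Number": ["LOT 0000"],
}
mapped, unmapped = split_record(record)
print(sorted(mapped))    # ['Title', 'creator']
print(sorted(unmapped))  # ['Call Number']
```

This matches on field names only; deciding that a field such as a call number could reasonably go into `identifier` is still a cataloger's judgment.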

Question 7. In a web browser, retrieve the page http://dublincore.org/documents/dces/. This is the basic Dublin Core element set: a set of fifteen properties, or elements, commonly used to describe resources. Review the metadata on the digital image page above and map the metadata from the LOC image page onto the 15 Dublin Core properties.

Table 4: DC cataloging

Dublin Core property | Value from LOC digital image
contributor |
coverage |
creator |
date |
description |
format |
identifier |
language |
publisher |
relation |
rights |
source |
subject |
title |
type |

Key Questions

Review the types of metadata you found and answer the following questions:

Question 8. Are there metadata elements on the LOC page that you were not able to map to one of the fifteen DC properties? What were they, and what type of metadata (e.g. descriptive, administrative, etc.) were they?

Question 9. Were there DC properties that you did not understand or did not use? What were they?


Dublin Core is a very general standard that does not dive into the detail often found in other metadata schemas. For example, while Dublin Core has 15 core elements, the MARC metadata schema has several hundred! In addition, the MARC schema allows us to refine metadata elements through special attributes (indicators and subfield codes).

One-to-One principle

One of the most complex decisions catalogers have to make when working with digital objects is deciding which object they are cataloging. In our Library of Congress example we are cataloging a digital object that represents a print resource. In addition, we are dealing with only a limited set of files from that digital object (notice, for example, that the administrative metadata discusses a larger collection). Deciding which resource you are cataloging, and then creating metadata that pertains only to that object, is known as the one-to-one principle (http://wiki.dublincore.org/index.php/Glossary/One-to-One_Principle). While this principle is an important factor in reducing ambiguity in cataloging, as you may have seen, it can be difficult to establish and hold to in practice. In fact, some standards (MODS, for example) take a more pragmatic approach and suggest that it is acceptable to mix description of print and digital objects as long as the appropriate metadata elements are used.
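The one-to-one principle can be sketched as two separate records, one for the print original and one for its digital surrogate, linked rather than merged. This Python sketch assumes the simple DC `source` element carries the link; the identifiers use the reserved `urn:example:` namespace, and all values and the `derived_from` helper are hypothetical.

```python
# One record per resource: the print original and its digital surrogate are
# described separately and linked, not merged. All values are hypothetical.
print_original = {
    "identifier": ["urn:example:print-item-1"],
    "title": ["Example photograph"],
    "format": ["1 photographic print"],
}

digital_surrogate = {
    "identifier": ["urn:example:digital-file-1"],
    "title": ["Example photograph"],
    "format": ["image/jpeg"],
    # DC source: a related resource from which the described resource is derived.
    "source": ["urn:example:print-item-1"],
}

catalog = {rec["identifier"][0]: rec for rec in (print_original, digital_surrogate)}

def derived_from(record: dict, catalog: dict) -> list[dict]:
    """Follow `source` links to the records of the resources it was derived from."""
    return [catalog[i] for i in record.get("source", []) if i in catalog]

for original in derived_from(digital_surrogate, catalog):
    print(original["format"])  # ['1 photographic print']
```

Each record's `format` describes only its own resource, which is the point of the principle; a MODS-style record would instead hold both descriptions in one place.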

Contents and values

The foundation of good metadata is proper identification of metadata elements and accurate transcription/translation of metadata from an information resource into those elements. Remember that data content refers to the formatting of metadata in a field, and data values refers to the use of controlled vocabularies, authority files, subject heading lists and taxonomies as field values. In the coming weeks we will explore controlled vocabularies in more detail, particularly in the context of subject heading assignment. For the time being, however, let's expand our understanding of the Dublin Core metadata schema by exploring the use of controlled vocabularies and content formatting.

Value schemes in metadata

In addition to defining the elements that make up a metadata schema (e.g. title, creator, contributor), metadata schemas also include rules that control what content goes into each element and how it is formatted. The Dublin Core standard calls these two types of content control “Vocabulary Encoding Schemes” and “Syntax Encoding Schemes.” Vocabulary schemes help create good metadata by providing a predefined list of acceptable values for a given metadata element (e.g. language names for the element Language). In contrast, syntax schemes help create good metadata by defining the formatting of content within a field (e.g. a specific way of writing the date in the Date element). Vocabulary encoding schemes include subject heading lists and classification schemes (e.g. Library of Congress Subject Headings (LCSH) and the Dewey Decimal Classification (DDC)), thesauri of place names (e.g. TGN) and Internet media type categories (e.g. MIME types).
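A vocabulary encoding scheme boils down to membership in a controlled list. The Python sketch below checks and normalizes values for the Language element against a tiny, illustrative subset of ISO 639-2 (bibliographic) language codes; a real implementation would load the full code list, and the helper name is hypothetical.

```python
from typing import Optional

# Illustrative subset of ISO 639-2 (bibliographic) codes; not the full list.
LANGUAGE_CODES = {"eng": "English", "fre": "French", "ger": "German", "spa": "Spanish"}

def normalize_language(value: str) -> Optional[str]:
    """Return the controlled code for a code or a language name, else None."""
    v = value.strip().lower()
    if v in LANGUAGE_CODES:
        return v
    for code, name in LANGUAGE_CODES.items():
        if name.lower() == v:
            return code
    return None  # not in the vocabulary: flag for a cataloger to review

print(normalize_language("English"))  # eng
print(normalize_language(" FRE "))    # fre
print(normalize_language("Klingon"))  # None
```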

Step 3: Let's quickly explore one of these vocabularies, the Getty Thesaurus of Geographic Names (TGN).

a. Using your web browser, open http://www.getty.edu/vow/TGNSearchPage.jsp

b. Search for Washington DC

Key Questions

Question 10. What is the preferred name in TGN for Washington DC?

Question 11. What are some other names?

Question 12. What “Place Types” does TGN map to Washington DC?

Question 13. What are some other ways to identify Washington DC? Are there any unique identifiers or system-actionable metadata that you might use?

Question 14. How might you use the information contained in this vocabulary to improve the metadata functions related to FRBR (e.g., Find, Identify, Select, Acquire)?

Content schemes in metadata

Syntax encoding schemes focus on defining how a particular field is formatted. Two fields for which formatting is particularly important are dc:date and dc:identifier.
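Syntax encoding schemes, by contrast, are about shape rather than membership. This Python sketch formats and checks dates against the W3C Date and Time Formats (W3CDTF) profile referenced by `dcterms:W3CDTF`. It assumes the six granularities of that profile (year; year and month; full date; and full date plus hours and minutes, seconds, or fractional seconds with a time zone designator). The regular expression and the `to_w3cdtf` helper are illustrative; the pattern checks shape only, not whether a month or day actually exists.

```python
import re
from datetime import date, datetime, timezone

# Shape of W3CDTF values, from year-only down to fractional seconds. A time,
# when present, must carry a time zone designator ("Z" or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"^\d{4}"                        # YYYY
    r"(-\d{2}"                       # -MM
    r"(-\d{2}"                       # -DD
    r"(T\d{2}:\d{2}"                 # Thh:mm
    r"(:\d{2}(\.\d+)?)?"             # optional :ss and .s
    r"(Z|[+-]\d{2}:\d{2})"           # TZD
    r")?)?)?$"
)

def to_w3cdtf(value: date) -> str:
    """Format a date as YYYY-MM-DD, or an aware datetime as YYYY-MM-DDThh:mm:ssTZD."""
    if isinstance(value, datetime):  # check first: datetime is a subclass of date
        if value.tzinfo is None:
            raise ValueError("a W3CDTF time needs a time zone designator")
        text = value.isoformat(timespec="seconds")
        return text[:-6] + "Z" if text.endswith("+00:00") else text
    return value.isoformat()

print(to_w3cdtf(date(2012, 3, 15)))                                     # 2012-03-15
print(to_w3cdtf(datetime(2012, 3, 15, 10, 30, tzinfo=timezone.utc)))    # 2012-03-15T10:30:00Z
print(bool(W3CDTF.match("2012-03")), bool(W3CDTF.match("03/15/2012")))  # True False
```

Calling `to_w3cdtf(date.today())` gives today's date at day granularity.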

Step 4: To get a sense of these content formatting schemes, let's also become acquainted with the Dublin Core Metadata Registry, a website that lists all of the fields for Dublin Core.

a. In your web browser, go to http://dublincore.org

b. Click on DCMI Specifications and find “The Dublin Core Metadata Registry” on the webpage.

c. Go to the Metadata Registry and click on the “Browse | Search” link.

d. Select “Syntax Encoding Schemes” and click OK

e. Click on “dcterms:W3CDTF” and explore the link to the W3C specification (hint: it is labeled “see” on the page).

Key Questions

Question 15. How would you format today’s date using the W3C Date and Time Formats (W3CDTF)?

Question 16. Referring back to the DC metadata registry, pull up the list of properties (also known as fields or elements in metadata speak). Which elements would be a good fit for this content scheme?

While we have focused largely on Dublin Core in this exercise, there is a much larger scheme known as Qualified Dublin Core (QDC) that contains many more elements. These elements are often refinements of the simple Dublin Core set. For example, while Dublin Core has the element "date," Qualified Dublin Core also has the refinements "created," "valid," "available," "issued," and "modified." As you can see with these elements, QDC metadata moves past descriptive metadata elements into administrative element types.
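Because each qualified term refines a simple element, a QDC record can be reduced to simple Dublin Core by mapping every refinement onto its parent element (DCMI has called this the “dumb-down” principle). A Python sketch, assuming a hand-written mapping that covers only the date refinements listed above plus a few others; the `dumb_down` helper and the record are hypothetical.

```python
# Hand-written mapping from qualified terms to the simple DC element they refine.
# Only a few refinements are included here.
REFINES = {
    "created": "date", "valid": "date", "available": "date",
    "issued": "date", "modified": "date",
    "alternative": "title",
    "abstract": "description", "tableOfContents": "description",
}

def dumb_down(qualified: dict[str, list[str]]) -> dict[str, list[str]]:
    """Collapse refinements onto their parent element, keeping every value."""
    simple: dict[str, list[str]] = {}
    for prop, values in qualified.items():
        parent = REFINES.get(prop, prop)  # unrefined elements pass through
        simple.setdefault(parent, []).extend(values)
    return simple

# Hypothetical QDC record.
qdc = {
    "title": ["Example report"],
    "created": ["2011-06-01"],
    "modified": ["2012-01-15"],
}
print(dumb_down(qdc))
# {'title': ['Example report'], 'date': ['2011-06-01', '2012-01-15']}
```

Both dates survive, but the distinction between creation and modification is lost; that distinction is exactly the extra precision QDC adds.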


Metadata terminology

Before we become too immersed in metadata land, we need to clarify some terminology that we first learned when working with HTML.

Element: A specific metadata field for which there is a definition and specific use. In Dublin Core, elements are also referred to as properties.

Properties: See Elements.

Fields: See Elements. (Note that the word “field” is very common in MARC lingo.)

Attributes: A refining concept for an element that gives the main element context. Also called Classes in Dublin Core, attributes provide context for the value of an element.

Classes: See Attributes.

Values: A value is either a string (a “literal,” e.g. text) or a pointer to a string (a “non-literal,” e.g. a unique identifier or URI) that belongs to an element or an attribute. Values are the “data” in metadata.
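The three terms can be seen side by side in an XML record. The record below is hypothetical and only loosely follows the style of the DCMI XML guidelines (Powell & Johnston): `dcterms:created` is the element, the `xsi:type` attribute names the syntax encoding scheme that gives the value context, and the text content is the value. The Python sketch uses only the standard library ElementTree parser; the `describe` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Hypothetical record: dcterms elements with an optional xsi:type attribute.
RECORD = """<metadata
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dcterms:title>Example photograph</dcterms:title>
  <dcterms:created xsi:type="dcterms:W3CDTF">2012-03-15</dcterms:created>
</metadata>"""

def describe(xml_text: str) -> list[tuple]:
    """List (element, attribute value, value) for each child of the record."""
    rows = []
    for el in ET.fromstring(xml_text):
        element = el.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
        scheme = el.get(XSI_TYPE)          # None when the attribute is absent
        rows.append((element, scheme, el.text))
    return rows

for row in describe(RECORD):
    print(row)
# ('title', None, 'Example photograph')
# ('created', 'dcterms:W3CDTF', '2012-03-15')
```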

Metadata services – Creation, extraction, conversion

As the semester progresses, we will spend time exploring metadata-rich services. We have already created one (our JavaScript bookmarklet), which used technical metadata (e.g. the URL) to check whether a web resource we had found was available via UMD.

Step 5: Before we wrap up this week, let's return to our NISO metadata document to get a sense of what types of metadata services exist.

a. Return to the NISO document and explore pages 10-12, which cover metadata creation and services.

Key Questions

Question 17. In addition to manual metadata creation (i.e. what you did earlier in this worksheet), what other ways exist to create metadata?


Question 18. The NISO document also explores ways to assess metadata quality. We have already considered one way (adherence to value and content rules). What are some other ways to assess quality?

Question 19. The Dublin Core element and class dictionary that we have been exploring in this exercise is an example of what type of metadata service?

Automatic metadata processes

As our last exercise this week, let's explore how automated tools can help us create, manipulate, validate, and index metadata. To do this we will focus on an automatic metadata generation tool.

Step 6: Let's look at an automatic metadata generation tool. Find the brief URL for the LC resource we cataloged above. Now visit http://www.ukoln.ac.uk/metadata/dcdot/ and enter that URL (http://1.usa.gov/qZFg3R) in the metadata harvester box. Review the generated metadata and compare it to your Dublin Core record above.
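Tools like DC-dot work by harvesting whatever metadata a web page already exposes. How DC-dot itself is implemented is not described here, so the Python sketch below is only an assumption about the general approach: it parses an HTML string (a hypothetical page) with the standard library `html.parser` and collects the `<title>` text and `<meta name=... content=...>` pairs. The `MetaHarvester` class is hypothetical, and fetching a live page would additionally require something like `urllib.request.urlopen`.

```python
from html.parser import HTMLParser

class MetaHarvester(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs from HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, list[str]] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            # "DC.creator" and "creator" both end up under "creator".
            name = attrs["name"].lower().removeprefix("dc.")
            self.fields.setdefault(name, []).append(attrs["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and data.strip():
            self.fields.setdefault("title", []).append(data.strip())

# Hypothetical page standing in for a fetched resource.
PAGE = """<html><head><title>Example photograph</title>
<meta name="DC.creator" content="Unknown photographer">
<meta name="description" content="A hypothetical page.">
</head><body><p>Body text is not harvested.</p></body></html>"""

harvester = MetaHarvester()
harvester.feed(PAGE)
print(harvester.fields)
```

Anything the page does not state (rights, coverage, relation) simply never appears in the output, which is worth keeping in mind when comparing a generated record with your own.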

Table 5: Manual vs automatic cataloging

Dublin Core property | Manually cataloged value | Automatically generated value
contributor | |
coverage | |
creator | |
date | |
description | |
format | |
identifier | |
language | |
publisher | |
relation | |
rights | |
source | |
subject | |
title | |
type | |

Key questions

Question 20. Which fields contained identical or similar values?

Question 21. Which fields were widely different? Did the automatic metadata generator do a better job of creating an accurate representation, or did you?
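A rough version of this field-by-field comparison can be done in code. This Python sketch labels each property as identical, similar, different or missing using `difflib.SequenceMatcher`; the 0.8 similarity threshold is an arbitrary assumption, and the `compare` helper and both sample records are hypothetical.

```python
from difflib import SequenceMatcher

def compare(manual: dict[str, str], automatic: dict[str, str],
            threshold: float = 0.8) -> dict[str, str]:
    """Label each property 'identical', 'similar', 'different' or 'missing'."""
    result = {}
    for prop in sorted(set(manual) | set(automatic)):
        a, b = manual.get(prop, ""), automatic.get(prop, "")
        if not a or not b:
            result[prop] = "missing"     # one of the two records lacks a value
        elif a == b:
            result[prop] = "identical"
        elif SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
            result[prop] = "similar"     # close enough after ignoring case
        else:
            result[prop] = "different"
    return result

# Hypothetical single-valued records.
manual_record = {"title": "Example photograph", "format": "image/jpeg", "date": "2012-03-15"}
auto_record = {"title": "Example Photograph", "format": "text/html", "creator": "Unknown"}
print(compare(manual_record, auto_record))
# {'creator': 'missing', 'date': 'missing', 'format': 'different', 'title': 'similar'}
```

String similarity only measures how alike two values look; whether either value is accurate for the resource still needs a human judgment.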

Metadata evaluation

While there are a number of measures of metadata quality, in this class we will focus on four that relate directly to the quality of the metadata in relation to the resource it is representing. These four criteria are:

1. Specificity (i.e. creating metadata that fits the most specific appropriate level of description)

2. Completeness (i.e. how fully our representation describes our resource)

3. Consistency (i.e. how uniformly metadata is generated and formatted)

4. Accuracy (i.e. is the metadata representation correct?)

Step 7: With these four criteria in mind, take a few moments individually or in your group to compare the metadata record your group created with the record the automatic metadata generation tool created. For each criterion, rate the metadata on a scale of 1-3 (1 = poor, 2 = acceptable, 3 = excellent).

Table 6: Evaluation of manual/automatic metadata

Metadata quality | Manually created | Automatically generated
Specificity (Is the metadata appropriately granular?) | |
Completeness (Is the record complete?) | |
Consistency (Are fields formatted properly?) | |
Accuracy (Is the generated metadata correct?) | |
Total score | |
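The arithmetic behind the total score is a simple sum of four ratings, giving a range of 4 to 12 per record. A Python sketch with hypothetical ratings (not results from any actual comparison); the `total_score` helper is hypothetical.

```python
CRITERIA = ("specificity", "completeness", "consistency", "accuracy")

def total_score(ratings: dict[str, int]) -> int:
    """Sum the four ratings after checking each is present and rated 1, 2 or 3."""
    for criterion in CRITERIA:
        score = ratings[criterion]  # KeyError if a criterion was not rated
        if score not in (1, 2, 3):
            raise ValueError(f"{criterion} must be rated 1, 2 or 3, not {score}")
    return sum(ratings[c] for c in CRITERIA)

# Hypothetical ratings, for illustration only.
manual = {"specificity": 3, "completeness": 2, "consistency": 2, "accuracy": 3}
automatic = {"specificity": 1, "completeness": 2, "consistency": 3, "accuracy": 2}

scores = {"manual": total_score(manual), "automatic": total_score(automatic)}
print(scores)  # {'manual': 10, 'automatic': 8}
print(max(scores, key=scores.get))  # manual (a tie would return the first key)
```

Summing treats all four criteria as equally important; weighting them differently could change which process comes out on top.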

Key Questions

Question 22. When you add the scores for each evaluation, which process comes out on top?

Question 23. Do you agree with the overall score? Why or why not?

Question 24. Were there areas where the manual or automatic processes were better? Why do you think this is?

Summary

This week we began exploring metadata standards, services and evaluative approaches in more detail. We found that there are metadata standards for different domains and communities, and that the issues of metadata interoperability, transformation and quality control are considerable. In the coming weeks we will continue working with our metadata models through the encoding and transformation processes.
