Markup and Metadata Session 5 LIS 60639 Implementation of Digital Libraries Dr. Yin Zhang

1. Guidance for Building Good Digital Collections: Metadata

NISO (2007). A Framework of Guidance for Building Good Digital Collections: metadata-related sections.

Why do we need metadata for DLs?

• One of the most challenging aspects of the digital environment is the identification of resources available on the Web.

• The existence of searchable descriptive metadata increases the likelihood that digital content will be discovered and used.

• The description of individual objects and sets of objects helps to locate an object and collocate similar/related objects.

• Examples of metadata-based access tools:

– library catalogs,

– archival finding aids,

– museum inventory control systems, and

– search utilities such as Google.

What is metadata and who should be responsible for creating it?

• Metadata is structured information associated with an object for purposes of discovery, description, use, management, and preservation.

• Metadata creation is an incremental process that should be a shared responsibility among various parts of an institution. Different types of metadata can be added by different people at various stages of an information object’s life cycle.

– At the creation stage, metadata about an object’s authors, contributors, source, and intended audience could be provided by the original authors. Creators of digital objects should be encouraged to embed as much metadata as possible within the object before it is shared or distributed.

– At the organization stage, metadata about subjects, publishing history, and access rights could be recorded by catalogers or indexers.

– At the access and usage stage, evaluative information such as reviews and annotations could be added by the user.

Types of metadata

• Three basic types of metadata:

– Descriptive metadata helps users find and obtain objects, distinguish one object or group of objects from another, and discover the subject or contents.

– Administrative metadata helps collection managers keep track of objects for such purposes as file management, rights management, and preservation.

– Structural metadata documents relationships within and among objects and enables users to navigate complex objects, such as the pages and chapters of a book.
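
To make the three categories concrete, the following minimal Python sketch models one digital object whose metadata is grouped into descriptive, administrative, and structural parts. The class names, field names, and sample values are hypothetical illustrations, not elements of any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class DescriptiveMetadata:
    """Helps users find, identify, and select the object."""
    title: str
    creator: str
    subjects: list[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Supports file management, rights management, and preservation."""
    file_format: str
    rights: str


@dataclass
class StructuralMetadata:
    """Records how the parts of a complex object fit together (here, page order)."""
    pages: list[str] = field(default_factory=list)

    def page_count(self) -> int:
        return len(self.pages)


@dataclass
class DigitalObjectRecord:
    identifier: str  # hypothetical local identifier
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


record = DigitalObjectRecord(
    identifier="example:0001",
    descriptive=DescriptiveMetadata(
        title="Sample Digitized Pamphlet",
        creator="Unknown",
        subjects=["Local history"],
    ),
    administrative=AdministrativeMetadata(file_format="image/tiff", rights="Public domain"),
    structural=StructuralMetadata(pages=["page-001.tif", "page-002.tif"]),
)

print(record.descriptive.title)
print(record.structural.page_count())
```

Keeping the three groups separate mirrors the point that different people may supply each kind of metadata at different stages of an object's life cycle.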

Metadata standards

• Various metadata standards have been developed for describing different types of objects and for different purposes.

• A typology of data standards created by Anne Gilliland helps us understand how different metadata standards are related and work together.

• A metadata schema provides a labeling, tagging or coding system used for recording cataloging information or structuring descriptive records. A metadata schema establishes and defines data elements and the rules governing the use of data elements to describe a resource. - http://gondolin.rutgers.edu/MIC/text/how/catalog_glossary.htm

See Table 4 for a list of DL metadata schemas

• Depending upon the nature of DL collections, a single metadata scheme may not suffice for all needs. A combination of metadata schemes may be the best solution.

Typology of data standards created by Anne Gilliland

Metadata we should know from 60002

• Metadata schemas:

– MARC

– Dublin Core

• Data value standards:

– LCSH

• Data content standards:

– AACR2

• Data format/technical interchange standards:

– MARC21

Choosing metadata standards

• The decisions about which metadata standard(s) to adopt and what levels of description to apply must be made within the context of

– the organization's purpose for creating the collection,

– the available human and technical resources,

– the users and intended usage, and

– approaches adopted within the particular field of inquiry or knowledge domain.

Metadata Principles

1. Good metadata conforms to community standards in a way that is appropriate to the materials in the collection, users of the collection, and current and potential future uses of the collection.

2. Good metadata supports interoperability.

3. Good metadata uses authority control and content standards to describe objects and collocate related objects.

4. Good metadata includes a clear statement of the conditions and terms of use for the digital object.

5. Good metadata supports the long-term curation and preservation of objects in collections.

6. Good metadata records are objects themselves and therefore should have the qualities of good objects, including authority, authenticity, archivability, persistence, and unique identification.

Discussion and Reflection

• Issues raised in this reading

• How such issues are addressed in your DL case

XML and metadata

• Given the large body of legacy metadata, workflows, and expertise built up around bibliographic data such as MARC, an important question is how these legacy records and systems will be moved toward a more XML-centric metadata schema.

• In the early 2000s, the Library of Congress took an important first step by offering an XML version of MARC, MARC21XML (also called MARCXML). MARC21XML was designed to be MARC expressed in XML: it is a lossless XML format for MARC data that brings many of the benefits of XML.

• MARCXML (MARC21XML) homepage: http://www.loc.gov/standards/marcxml/

MARC21XML

Metadata Object Description Schema (MODS) metadata format

• MODS homepage: http://www.loc.gov/standards/mods/

• The Library of Congress recognized the need for a metadata schema that would be compatible with the legacy MARC data while providing a new way of representing and grouping bibliographic data.

• These efforts led to the development of the Metadata Object Description Schema (MODS) metadata format. MODS represents

– the next natural step in the evolution of MARC into XML.

– a much simpler alternative that retains its compatibility with MARC.

• Developed as a subset of the current MARC21 specification, MODS was created as a richer alternative to other metadata schemas like Dublin Core.

• Differences between MARC21XML and MODS:

– MARC21XML faithfully transfers MARC structures into XML, whereas the structure of MODS allows metadata elements to be regrouped and reorganized within a metadata record.

– MODS uses textual element labels rather than the numeric field tags of MARC21XML. This makes MODS records more readable than traditional MARC or MARC21XML records and promotes a design in which element descriptions can be reused throughout the metadata schema.
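
The contrast between numeric MARC tags and textual MODS labels can be sketched in a few lines of Python with the standard-library ElementTree module. The sketch maps only two fields (245 $a to titleInfo/title, and 100 $a to a personal name), loosely following the Library of Congress MARC-to-MODS mapping; it ignores indicators, role terms, punctuation cleanup, and everything else a real conversion (typically done with LC's XSLT stylesheets) would handle. The sample record is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"

# A tiny, hypothetical MARCXML record: fields are identified by numeric tags.
MARCXML_SAMPLE = f"""
<record xmlns="{MARC_NS}">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Doe, Jane.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>
"""


def subfield_value(record: ET.Element, tag: str, code: str) -> Optional[str]:
    """Return the first subfield `code` of datafield `tag`, or None if absent."""
    for datafield in record.findall(f"{{{MARC_NS}}}datafield"):
        if datafield.get("tag") != tag:
            continue
        for subfield in datafield.findall(f"{{{MARC_NS}}}subfield"):
            if subfield.get("code") == code:
                return subfield.text
    return None


def marcxml_to_mods(marc_xml: str) -> ET.Element:
    """Map 245$a and 100$a onto MODS elements with textual labels (partial sketch)."""
    record = ET.fromstring(marc_xml.strip())
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title = subfield_value(record, "245", "a")
    if title is not None:
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    author = subfield_value(record, "100", "a")
    if author is not None:
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", type="personal")
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author
    return mods


ET.register_namespace("mods", MODS_NS)
mods_record = marcxml_to_mods(MARCXML_SAMPLE)
print(ET.tostring(mods_record, encoding="unicode"))
```

The printed MODS record says `titleInfo/title` and `name/namePart` where the MARC source said `245 $a` and `100 $a`, which is the readability gain described above.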

MODS applications

• A number of digital library efforts are looking at formalizing MODS support to either replace or augment Dublin Core-only metadata systems. Likewise, groups like the Digital Library Federation have started recommending that organizations and software designers provide MODS-based OAI harvesting capability to allow for a higher level of metadata granularity.

• E-print systems like DSpace have looked at ways of utilizing MODS either as an internal storage format or as a metadata format supported for OAI harvesting.

• Digital repositories like Fedora currently utilize a MODS-like metadata schema as the internal storage schema.

• Interest has grown in using MODS for ILS development, due in large part to the open-source development work on Evergreen, an open-source ILS built around MODS.

• MODS was developed as part of a number of larger, ongoing metadata initiatives at the Library of Congress:

– It was developed, in part, as an extension format to the Metadata Encoding and Transmission Standard (METS) to provide a MARC-like bibliographic metadata component for METS-generated records.

– The MODS schema was chosen as one of the registered metadata formats for SRU/SRW (Search/Retrieve via URL and Search/Retrieve Web Service), the next-generation search protocol designed as a successor to Z39.50.

– While MODS was created to work as a stand-alone metadata format that could be used for original record creation, translating MARC data into XML, or facilitating the harvesting of library materials, it was also created as part of a larger ongoing strategy at the Library of Congress to create a set of more diverse, lightweight XML formats that could be used with the library community's current legacy data.
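
As a rough illustration of how a client might ask an SRU server for MODS records, the Python sketch below builds an SRU 1.1 searchRetrieve URL with the standard parameters operation, version, query (a CQL query), maximumRecords, and recordSchema. The endpoint URL is hypothetical, and the schema name a given server expects (a short name such as "mods" or a full schema identifier) varies by server, so it is treated here as an assumption to check against the server's explain response.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_sru_url(base_url: str, cql_query: str, max_records: int = 10,
                  record_schema: str = "mods") -> str:
    """Build an SRU 1.1 searchRetrieve request URL (no request is sent)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                # a CQL query string
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,     # server-specific schema name (assumption)
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint; a real client would fetch this URL and parse the XML response.
url = build_sru_url("https://sru.example.org/catalog", 'dc.title = "digital libraries"')
print(url)

# Round-trip check: decoding the query string recovers the original parameters.
print(parse_qs(urlparse(url).query)["query"])
```

Because every parameter travels in the URL, an SRU request can be tested from a browser, which is part of its appeal over the session-based Z39.50.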

METS (Metadata Encoding and Transmission Standard)

• METS page: http://www.loc.gov/mets

• METS is not a metadata format utilized for bibliographic description of objects.

• METS acts as a container object for the many pieces of metadata needed to describe a single digital object:

– the individual who submitted the digital object may only be responsible for adding information to the bibliographic metadata,

– the digital repository itself is generating metadata related to the structural information of the digital object, that is, assembling information about the files that make up the entire digital object (metadata, attached items, etc.).

– METS provides a method for binding these objects together so that they can be transferred to other systems or utilized within the local digital repository system as part of a larger application profile.

A very basic METS document utilized at Oregon State University for archiving structural information about digitized text in DSpace
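
The container idea can also be sketched with Python's ElementTree. This minimal sketch is not a reconstruction of the Oregon State example: it binds a descriptive section (dmdSec) wrapping a small MODS record, a file inventory (fileSec), and the structural map (structMap, the one section METS requires) that orders the page images. The title, file URLs, and USE value are hypothetical, and real METS documents usually also carry a header and administrative sections.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
XLINK_NS = "http://www.w3.org/1999/xlink"


def m(tag: str) -> str:
    """Qualify an element name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(title: str, page_files: list[str]) -> ET.Element:
    mets = ET.Element(m("mets"))

    # Descriptive metadata: a minimal MODS record wrapped inside the METS document.
    dmd = ET.SubElement(mets, m("dmdSec"), ID="DMD1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="MODS")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title

    # File inventory: one <file> per page image, located by URL.
    file_grp = ET.SubElement(ET.SubElement(mets, m("fileSec")), m("fileGrp"), USE="master")
    for i, href in enumerate(page_files, start=1):
        file_el = ET.SubElement(file_grp, m("file"), ID=f"FILE{i}", MIMETYPE="image/tiff")
        ET.SubElement(file_el, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})

    # Structural map: the object as an ordered sequence of pages pointing at the files.
    struct_map = ET.SubElement(mets, m("structMap"), TYPE="physical")
    book = ET.SubElement(struct_map, m("div"), TYPE="book", DMDID="DMD1")
    for i in range(1, len(page_files) + 1):
        page = ET.SubElement(book, m("div"), TYPE="page", ORDER=str(i))
        ET.SubElement(page, m("fptr"), FILEID=f"FILE{i}")
    return mets


for prefix, uri in (("mets", METS_NS), ("mods", MODS_NS), ("xlink", XLINK_NS)):
    ET.register_namespace(prefix, uri)

doc = build_mets(
    "Sample Digitized Pamphlet",
    ["https://repository.example.org/p1.tif", "https://repository.example.org/p2.tif"],
)
print(ET.tostring(doc, encoding="unicode"))
```

The descriptive record and the file list are produced separately and linked only through IDs (DMDID, FILEID), which is what lets different people or systems contribute each part.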

Free metadata

• The library community has historically promoted the idea that metadata should be freely accessible between systems.

• The Z39.50 protocol represented one such manifestation of this belief.

• Over the last decade, many information providers have "freed" their metadata, embracing the mashup concept either as a business model or as a user service.

• Large information providers such as search engines (Google, Yahoo!, MSN) and social networking services such as Flickr and del.icio.us have moved to offer APIs (application programming interfaces) that make remote-system access easier and encourage remote users to use their services.

Sharing metadata

• Nearly all digital repository platforms provide some method of sharing metadata within the larger user community.

• Tools, methods, and protocols for crosswalking harvested content to different metadata schemas:

– XSLT (Extensible Stylesheet Language Transformations) is commonly used in XML metadata crosswalking. XSLT is a W3C technology designed to work with XSL (eXtensible Stylesheet Language), a style-sheet language for XML. XSLT offers a simple method for transforming an XML document into other formats.

– OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) makes metadata available for harvesting.
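
The harvesting side of OAI-PMH is a plain HTTP request/response loop: the harvester sends a verb such as ListRecords with a metadataPrefix (oai_dc, unqualified Dublin Core, is the format every OAI-PMH repository must support), then keeps following the resumptionToken until it comes back empty. The Python sketch below shows that loop. The repository URL and canned responses are hypothetical, the fetch function is injected so the example runs offline, and the simplified records omit the headers and datestamps that real responses contain.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records(base_url: str, fetch: Callable[[str], str],
                 metadata_prefix: str = "oai_dc") -> Iterator[ET.Element]:
    """Yield every <record> returned by an OAI-PMH ListRecords request sequence.

    `fetch` takes a URL and returns the response body. It is passed in so the
    sketch runs without network access; a real harvester would do an HTTP GET.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        list_el = root.find(f"{OAI}ListRecords")
        if list_el is None:  # e.g. an <error> response such as noRecordsMatch
            return
        yield from list_el.findall(f"{OAI}record")
        token = list_el.find(f"{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # an absent or empty resumptionToken ends the list
        # Follow-up requests carry only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


# --- Usage with canned, simplified responses (real records also have headers) ---
def fake_page(titles: list[str], token: str) -> str:
    records = "".join(
        "<record><metadata>"
        "<oai_dc:dc xmlns:oai_dc='http://www.openarchives.org/OAI/2.0/oai_dc/'"
        " xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        f"<dc:title>{t}</dc:title></oai_dc:dc></metadata></record>"
        for t in titles
    )
    return ("<OAI-PMH xmlns='http://www.openarchives.org/OAI/2.0/'><ListRecords>"
            f"{records}<resumptionToken>{token}</resumptionToken></ListRecords></OAI-PMH>")


BASE = "https://repository.example.org/oai"  # hypothetical endpoint
RESPONSES = {
    f"{BASE}?verb=ListRecords&metadataPrefix=oai_dc": fake_page(["First record"], "page2"),
    f"{BASE}?verb=ListRecords&resumptionToken=page2": fake_page(["Second record"], ""),
}

titles = [rec.find(f".//{DC}title").text for rec in list_records(BASE, RESPONSES.__getitem__)]
print(titles)
```

Harvested oai_dc records like these are what a crosswalk (for example, an XSLT stylesheet) would then transform into a local schema.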

Challenges for metadata crosswalking

• Metadata consistency: The crosswalking process must assume that metadata in one format has been created consistently in order to develop rules or algorithms for how that information should be represented in other metadata formats.

• Schema granularity: Crosswalking very rarely occurs between two metadata schemas that share the same level of granularity. For example:

Dublin Core Creator corresponds to MARC21 fields 100, 110, 111, 700, 710, 711, 720

• The “spare parts”: Because metadata crosswalking is rarely a lossless process, one often has to decide what information is "lost" during crosswalking. “Spare parts” are the unmappable data that cannot be carried through the crosswalk.

• Dealing with localisms: Localisms are metadata added so that records sort or display in a specific way within a local system.
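
The granularity and spare-parts problems can be seen in a toy Python crosswalk. Only the Creator row (MARC 100, 110, 111, 700, 710, 711, 720) comes from the example above; the 245 and 650 rows, the deliberately incomplete coverage, and the sample field values are assumptions for illustration. Personal and corporate names collapse into one creator list, and a field outside the table (here a local 590 note) ends up among the spare parts.

```python
# Toy MARC21 -> Dublin Core table. Several MARC name fields collapse into one DC element.
MARC_TO_DC = {
    "100": "creator", "110": "creator", "111": "creator",
    "700": "creator", "710": "creator", "711": "creator", "720": "creator",
    "245": "title",
    "650": "subject",
}


def crosswalk(marc_fields: list[tuple[str, str]]) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Map (tag, value) pairs to Dublin Core; also return the unmappable "spare parts"."""
    dc: dict[str, list[str]] = {}
    spare_parts: list[tuple[str, str]] = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            spare_parts.append((tag, value))  # nothing to map to: lost in the crosswalk
        else:
            dc.setdefault(element, []).append(value)
    return dc, spare_parts


fields = [
    ("100", "Doe, Jane"),                         # personal name
    ("710", "Example University Library"),        # corporate name
    ("245", "An example title"),
    ("590", "Local note: kept in reading room"),  # local note, not in the toy table
]
dc_record, spare = crosswalk(fields)
print(dc_record)
print(spare)
```

Running the crosswalk in the other direction would be lossy too: from dc:creator alone there is no way to tell whether a name belongs in 100, 110, or 111.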

Discussion and Reflection

• Issues raised in this reading

• How such issues are addressed in your DL case

3. Markup and Metadata

Witten & Bainbridge (2003). Ch. 5 Markup and metadata: Elements of organization

Markup and metadata

• If documents are the digital library's basic building blocks, markup and metadata are its basic elements of organization.

– Markup is used to specify the structure of individual documents and control how they look when presented to the user.

– Metadata is used to expedite access to relevant parts of the collection through searching and browsing.

• Markup controls two complementary aspects of an electronic document, its structure and its appearance:

– Structural markup makes certain aspects of the document structure explicit: typically section divisions, headings, subsection structure, enumerated and bulleted lists, and so on. These structural items can be considered metadata for the document.

– Appearance is controlled by presentation or formatting markup, which dictates how the document appears typographically: page size, page headers and footers, fonts, line spacing, how section headers look, where figures appear, and so on.

– Structure and appearance are related by the design of the document, that is, a catalog (often called a style sheet) of how each structural item should be presented.
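
The division of labor between structural markup and a style sheet can be shown with a toy Python sketch: one structurally marked-up document is rendered under two different "style sheets", so the appearance changes while the structure does not. The element names and both styles are invented for illustration; real systems express such rules in CSS or XSL, not Python.

```python
import xml.etree.ElementTree as ET

# Structural markup: each element says what a piece of text *is*, not how it looks.
DOC = """<doc>
  <heading>Markup and metadata</heading>
  <para>Markup specifies structure.</para>
  <item>section divisions</item>
  <item>headings</item>
</doc>"""

# Two hypothetical "style sheets": each maps a structural element to a presentation rule.
PLAIN_STYLE = {
    "heading": lambda text: text.upper(),
    "para": lambda text: text,
    "item": lambda text: f"* {text}",
}
WIKI_STYLE = {
    "heading": lambda text: f"== {text} ==",
    "para": lambda text: f"    {text}",
    "item": lambda text: f"- {text}",
}


def render(xml_text: str, style: dict) -> str:
    """Apply a style sheet to each top-level structural element, in document order."""
    root = ET.fromstring(xml_text)
    return "\n".join(style[child.tag]((child.text or "").strip()) for child in root)


print(render(DOC, PLAIN_STYLE))
print()
print(render(DOC, WIKI_STYLE))
```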

Markup languages and styles

• HTML – HyperText Markup Language

• XML - Extensible Markup Language

• Presenting marked-up documents

– Cascading style sheets: CSS for HTML

– Extensible stylesheet language: XSL for XML

Extracting metadata

• Automatic extraction of information from text, known as text mining, is an active research topic.

• Plain text documents are designed for people. Readers extract information by understanding their content. Fully automatic comprehension of arbitrary documents by computer programs remains a challenge.

• Structured markup languages such as XML help make key aspects of documents accessible to computers and people alike.

• Fortunately, it is often unnecessary to understand a document in order to extract useful metadata from it.
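
The title and meta tags of an HTML page are a simple case of this: they can be harvested by following the markup alone. The Python sketch below does so with the standard-library html.parser module. It is a deliberately minimal illustration that assumes pages actually carry such tags; it is not a text-mining technique, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name=... content=...> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name") and attributes.get("content") is not None:
            self.meta[attributes["name"].lower()] = attributes["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page: no understanding of the body text is needed to get these fields.
PAGE = """<html><head><title>Annual Report 2006</title>
<meta name="author" content="Example Historical Society">
<meta name="keywords" content="local history, newspapers">
</head><body><p>Body text the extractor never has to understand.</p></body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)
print(parser.title)
print(parser.meta)
```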

Discussion and Reflection

• Summary:

• Issues raised in this reading

• How such issues are addressed in your DL case

4. Trends in Metadata Practices

Carole L. Palmer, Oksana L. Zavalina, & Megan Mustafoff (2007). Trends in metadata practices: A longitudinal study of collection federation. In Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries (pp. 386-395). New York: ACM.

Background

• With the increasing focus on interoperability for distributed digital content, resource developers need to take into consideration how they will contribute to large federated collections, potentially at the national and international level.

• At the same time, their primary objectives are usually to meet the needs of their own institutions and user communities.

• This tension between local practices and needs and the more global potential of digital collections has been an object of study for the IMLS Digital Collections and Content (IMLS DCC) project.

Aim and approach

• Our practical aim has been to provide integrated access to over 160 IMLS-funded digital collections through a centralized collection registry and metadata repository (hereafter referred to as the “IMLS DCC”) based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

• During the course of development, the research team has investigated how collections and items can best be represented to meet the needs of local resource developers and aggregators of distributed content, as well as the diverse user communities they may serve.

• Research methods:

Surveys, interviews, and case studies. Additional data were collected to investigate item and collection description and subject access issues through content analysis, focus groups, and usability studies.

Locally Developed Schemes

• Whether used alone or together with other schemes, locally developed schemes were applied by 29% of projects in 2003 (n=94) and in 2006 (n=59).

• Projects chose to apply a local scheme for a number of reasons:

– customization was needed to capture information unique to the materials,

– information already recorded in a database or some other local information source was to be imported, or

– existing standards did not allow projects to adhere to their goals.

• All (100%, n=17) of the projects using locally developed schemes indicated access as the primary purpose of their project in their grant proposals, while only 56.9% of other projects (n=51) listed access as a primary purpose of their grant.

Schemes for New Content and Mapping

• Schemes for new content:

– There are some significant differences in scheme use between projects that have added new types of data and those that have not.

– Use of Dublin Core was more frequent than MARC for projects adding new content.

• Mapping:

– 63.4% (n=56) of projects have mapped their metadata.

– Dublin Core was mapped to most often, with 63% (n=35) of projects mapping to Dublin Core.

– 26% (n=35) of projects have mapped to MARC.

– “Other Standards” and MODS were the next highest, at 14% and 12% (n=35), respectively.

– Overall, 41% (n=35) of projects that have done mapping have mapped to multiple schemes.

Decision factors on choosing metadata

• Choice of metadata scheme(s) was influenced by the following factors:

– the overall degree to which a standard had been adopted by peer institutions (an important consideration),

– compatibility with local systems,

– content management system software, which also influenced text encoding decisions, and

– knowledge and skill: many library-based digital collection developers chose MARC because it allowed for more granularity in description than Dublin Core while also being the easiest to implement, since their staff were already proficient in MARC.

Problems

• The three most commonly reported problems with description were:

– consistent application of the chosen metadata scheme within a project,

– identification and application of controlled vocabularies, and

– integration of sets of data, schemes, and vocabularies either within an institution or among collaborators.

• In addition, there were clear tensions between local practices and what was perceived as the best for interoperability.

• One project that began with Dublin Core decided against using it partway into the grant, favoring MARC and TEI for representing the texts in their collection. Later they ended up mapping their metadata back to Dublin Core for OAI interoperability.

Problems and solutions

• Some of the unique content in digital collections in the IMLS DCC cannot be adequately described by existing metadata schemes; resource developers often look for examples of how similar content has been described by other projects.

• Local, home-grown metadata schemes are often developed when no suitable standard can be identified. Folksonomies and social tagging collected from the end-user community were named as one of the term sources for such home-grown schemes.

• Smaller digital collections that do not have resources to develop local schemes sometimes end up compromising the richness of description to implement Dublin Core.

• The unstable standards environment has made it difficult to advance without shifts, reconsiderations, and adaptations in original metadata plans to support interoperability and shareability of metadata.

Discussion and Reflection

• Issues raised in this reading

• How such issues are addressed in your DL case