Metadata for Digitization and Preservation

Upload: sarah-roberts

Post on 28-Mar-2015

Page 1: Metadata for Digitization and Preservation. Introduction What is metadata and why it matters The key elements How metadata is created Where metadata is

Metadata for Digitization and Preservation

Introduction

What is metadata and why it matters
The key elements
How metadata is created
Where metadata is stored
Metadata standards
How much will it cost?

What is metadata?

Tony Gill (ARTstor): Metadata refers to structured descriptions, stored as computer data, that attempt to describe the essential properties of other discrete computer data objects.

Big picture definition: the sum total of what can be said about any information object at any level of aggregation

What is metadata for?

The World Wide Web Consortium says metadata is used:

to provide a means to discover that the data set exists and how it might be obtained or accessed
to document the content, quality, and features of a data set, indicating its fitness for use

Therefore we need to think about content, context and structure.

Why Does Metadata Matter?

“Doing research on the Web is like using a library assembled piecemeal by packrats and vandalized nightly.” – R. Ebert, Internet Life

Finding the needle in the haystack
Managing thousands of identical-looking needles
Finding visual materials without viewing them
Expanding use
Preserving content and context

Key Elements

Administrative Metadata: used in managing and administering information resources
Descriptive Metadata: used to describe or identify information resources
Preservation Metadata: related to the preservation management of information resources
Technical Metadata: related to how a system functions or how metadata behaves
Use Metadata: related to the level and type of use of information resources

Structure of metadata

[Diagram: three-level hierarchy of Collection, Work, and Item]

How metadata is created

By software tools
    From resource content, e.g. catalogues or databases
    From creation tool, e.g. digital camera or file header

By human intervention
    Description by resource creator/owner
    Description by third-party provider, e.g. technical metadata

Creating and maintaining good metadata is time-consuming and costly.

Where metadata is stored

Embedded in the resource
    XIF information with TIFF images, viewable in Photoshop
    File headers or invisible copyright watermarking

Linked to resource
    Created as record in database format

Metadata Standards

Dublin Core: http://vads.ahds.ac.uk/guides/creating_guide/sect43.html

DIG35 (for technical metadata): www.i3a.org/I_dig35.html

Categories for the Description of Works of Art (CDWA): www.getty.edu/research/institute/standards/cdwa/

Visual Resources Association Core Categories: www.vraweb.org/

SEPIA working group: www.knaw.nl/ecpa/sepia/workinggroups/wp5/cataloguing.html

Resource Description Framework (RDF)

Encoded Archival Description (EAD)

How much will it cost?

How long is a piece of string?
Depends upon the stop points
There is no one-size-fits-all or one-cost framework
Depends upon the description already in place and how well the collection is currently indexed
In-house measurement

Balance skill, time, and automation
Photographs: descriptive metadata will take at least 5 minutes per photograph and usually no more than 30 minutes
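
The 5 to 30 minute range per photograph can be turned into a rough staffing estimate. The sketch below is only an illustration: the function name and the collection size of 1,200 photographs are assumptions, not figures from the presentation.

```python
def descriptive_metadata_hours(photo_count, min_minutes=5, max_minutes=30):
    """Return (low, high) staff hours for describing photo_count photographs."""
    low = photo_count * min_minutes / 60
    high = photo_count * max_minutes / 60
    return low, high


# Hypothetical collection of 1,200 photographs.
low, high = descriptive_metadata_hours(1200)
print(f"{low:.0f} to {high:.0f} staff hours")
```

The wide spread between the two figures is the point: the estimate only narrows once an in-house sample has been timed.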

Traditional Functions

Traditionally we applied these functions to:

Paper based and microform based information resources

Monographs, serials, photographs, etc.

Access provided through local library services

Including inter-library loan

New Functions

Apply these functions to:

Web documents, online serials, digital images, digital collections, web sites, digital audio and video, born digital material, etc.

Access provided via the web and email

Why are these digital objects different?

Information explosion
Multiple versions
Instant access
Less physical control over collection
Some are surrogates
Increased user expectations
Preservation is more complex

Why do we need metadata to do these things?

Provides the necessary tools to manage, preserve and provide access to information in the digital environment

Our jobs have not fundamentally changed, but our collections and our users have

What is metadata?

Metadata is data that facilitates the management, description, and preservation of a digital object or aggregation of digital objects.

The creation of metadata is governed by a body of standards, best practices and schemas that, when appropriately applied, work together to facilitate the management, description, and preservation of digital objects.

Types of metadata

Descriptive
Technical
Structural
Administrative
Preservation

About Metadata

Sets
Encoding standards/schema

Metadata set = rules
Encoding schema = representation

Metadata Sets

AACR2
Dublin Core
Visual Resources Association
Metadata Object Description Schema (MODS)
Text Encoding Initiative
Encoded Archival Description

Encoding Standards/Schema

HTML
MARC
Metadata Encoding and Transmission Standard (METS)
Resource Description Framework (RDF)
XML
Z39.50

Choosing Sets and Schema: Interoperability

Why is interoperability important?
How is it achieved?
    Crosswalks/mapping
    Standardization
        Schema
        Controlled vocabulary

Open Archives Initiative (OAI)
    Common elements harvested and made searchable from one interface
    Very basic level of description; work is under way to improve it
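
Harvesting under OAI is done with plain HTTP requests defined by the OAI Protocol for Metadata Harvesting (OAI-PMH). As a minimal sketch, the function below builds a `ListRecords` request for unqualified Dublin Core (`oai_dc`); the repository address and the function name are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (nothing is fetched here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one set defined by the repository.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, used only for illustration.
print(list_records_url("https://repository.example.org/oai"))
```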

Choosing an Encoding Schema

The more digitized objects you have, the more complex they are, and the more data sharing you do, the more important it will be to use an encoding schema

XML is the most prevalent encoding schema

All metadata schemas already have XML-based encoding schemas available

Factors in Metadata Decisions for Digitization Projects

Audience
Workflow and Timelines
Preservation
Interoperability
Number of and complexity of digitized objects

What Do You Want To Do?

Digitize for access only?
    Descriptive
    Some administrative

Digitize for preservation?
    Descriptive
    Administrative
    Technical
    Eventually preservation

What Materials Are You Digitizing?

The more complex the material, the more complex your metadata

Structural metadata becomes vital

For example….

Complex Digital Objects

Original = 150-page book with 7 chapters
Digitization results in 4 versions of the same content
    150 master TIFF images
    150 JPEG access images
    150 JPEG thumbnail images
    7 ASCII text transcripts (one per chapter)

Files to manage = 457
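
The total follows directly from the four versions. A short worked check, using only the figures on the slide (the variable names are illustrative):

```python
pages = 150
chapters = 7

# One entry per digital version of the same book.
files_by_version = {
    "master TIFF images": pages,
    "JPEG access images": pages,
    "JPEG thumbnail images": pages,
    "ASCII text transcripts": chapters,
}

total_files = sum(files_by_version.values())
print(total_files)  # 150 + 150 + 150 + 7 = 457
```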

Complex Digital Objects and Structure

Which images belong in which chapter?

Which digital version is which?

Where is chapter 3 in each version?

There is technical metadata for each digital version AND each digital file. How do we relate the correct metadata to the correct version/file?

Digitization and Metadata

Descriptive metadata for access and administration

Technical metadata for preservation

Structural metadata for control over complex digitized objects

Preservation metadata for management within a digital archive

Descriptive Metadata

Information that users will need in order to gain access to the digitized material

Should facilitate access to the original source material whenever possible

Access via a web interface search engine

User friendly

Standardized

Well written

Common Descriptive Metadata Sets for Digitization Projects

Visual Resources Association

Metadata Object Description Schema (MODS)

Encoded Archival Description

Text Encoding Initiative

Dublin Core

MARC

Choosing a Set

Should we use MARC?
    Integrated into existing work
    Rules for creation already exist
    Less technical infrastructure necessary
    Complex: more training
    Time consuming

Should we use something else?
    Collaborating? Interoperability concerns?
    Staff expertise
    Size of project
    Exhibit and web access

Choosing a Schema

Can we use both?
    MARC for collection level
    Metadata for item level

MARC for all
    Crosswalked to web accessible database

Database for all
    Crosswalked to MARC
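
A crosswalk is, at its simplest, a lookup table from the fields of one schema to the elements of another. The sketch below maps a handful of MARC tags to Dublin Core elements. The mapping table is a simplified assumption made for illustration, not a published crosswalk; the `(tag, value)` record format is a stand-in for real MARC parsing, and the sample values are made up.

```python
from collections import defaultdict

# Simplified, illustrative crosswalk (an assumption for this sketch);
# a real project would follow a published MARC to Dublin Core crosswalk.
MARC_TO_DC = {
    "100": "creator",
    "245": "title",
    "260": "publisher",
    "520": "description",
    "650": "subject",
}


def crosswalk(marc_fields):
    """Map (tag, value) pairs to a dict of repeatable Dublin Core elements."""
    dc_record = defaultdict(list)
    unmapped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            unmapped.append(tag)  # keep track of what the crosswalk drops
        else:
            dc_record[element].append(value)
    return dict(dc_record), unmapped


# Made-up sample record.
marc_fields = [
    ("100", "Carroll, Lewis"),
    ("245", "Alice's Adventures in Wonderland"),
    ("650", "Fantasy fiction"),
    ("650", "Children's stories"),
    ("300", "192 p."),
]
dc_record, unmapped = crosswalk(marc_fields)
print(dc_record)
print("Unmapped MARC tags:", unmapped)
```

Reporting the unmapped tags makes visible what is lost when a richer schema is crosswalked to a simpler one.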

Implementation

What informational elements do you need?
    List them, making sure to think through web design, audience and access issues

What descriptive schema will you use?
    MARC
    Dublin Core
    VRA
    MODS

Implementation

Build database or implement content management system for metadata storage

Map the fields to the schema you have chosen

Document the mapping

Create a Style Guide for your project

Staff create the metadata manually according to the Style Guide and established work processes

Metadata is reviewed for quality

Implementation

Metadata is stored and made web accessible

XML (if supported)

Back-ups, “master” metadata record, and/or web access

Dublin Core

Title
Creator
Subject/Keywords
Description
Publisher
Contributor
Date
Audience
Resource Type
Format
Resource Identifier
Source
Language
Relation
Coverage
Rights Management

Characteristics of the Dublin Core

All elements optional

All elements repeatable

All elements displayable in any order

Extensible (a starting place for richer description)

International
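
Because every element is optional and repeatable, a Dublin Core record can be as small as a few elements, with some of them repeated. The sketch below builds such a record with Python's standard library; the `record` wrapper element and the sample values are assumptions for illustration, while the namespace URI is the standard one for the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the dc: prefix


def build_dc_record(fields):
    """fields: list of (element_name, value) pairs, kept in the given order."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = build_dc_record([
    ("title", "Alice's Adventures in Wonderland"),
    ("creator", "Lewis Carroll"),
    ("subject", "Fantasy fiction"),      # subject appears twice:
    ("subject", "Children's stories"),   # elements are repeatable
])
print(ET.tostring(record, encoding="unicode"))
```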

Extensibility

Refining mechanism for elements
    Qualifiers improve the sharpness of description

Means for extending the element set
    Complementary packages of other types of metadata (administrative, rights management, discipline-specific, etc.)

Technical Metadata

Information about a file that facilitates management and preservation of the file

Technical information about:

Master file (TIFF)

Scanning specifications (resolution, bit depth, etc.)

Derivative

Storage – compression

NISO Metadata

Purpose: To define a standard set of metadata elements for digital images

Facilitate interoperability

Support long term management of and continuing access to digital images

Tagged Image File Format: Background and Metadata

TIFF is a specification for a file format

Spec includes a “directory” or “header” section which consists of several metadata fields

A TIFF can consist of several images

Directory/Header information is unique for each image
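
The header and directory layout can be read with a few lines of code. The sketch below follows the TIFF 6.0 layout (byte-order mark, the number 42, then a chain of image file directories made of 12-byte entries). The function names and the in-memory sample are assumptions for illustration: the sample holds only two directories and no pixel data, so it is not a viewable image, and each value field is returned raw, which is only meaningful here because the sample uses LONG values that fit in the field.

```python
import struct

IMAGE_WIDTH, IMAGE_LENGTH, LONG = 256, 257, 4  # tag and type codes from TIFF 6.0


def read_ifds(data):
    """Return one {tag: (type, count, raw_value)} dict per image directory."""
    byte_order = {b"II": "<", b"MM": ">"}[data[:2]]
    magic, offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a TIFF file")
    directories = []
    while offset:  # an offset of 0 marks the end of the directory chain
        (entry_count,) = struct.unpack_from(byte_order + "H", data, offset)
        entries = {}
        for i in range(entry_count):
            tag, field_type, count, value = struct.unpack_from(
                byte_order + "HHII", data, offset + 2 + 12 * i)
            entries[tag] = (field_type, count, value)
        directories.append(entries)
        (offset,) = struct.unpack_from(
            byte_order + "I", data, offset + 2 + 12 * entry_count)
    return directories


def make_ifd(entries, next_offset):
    """Pack a little-endian directory for the sample below."""
    body = struct.pack("<H", len(entries))
    for entry in entries:
        body += struct.pack("<HHII", *entry)
    return body + struct.pack("<I", next_offset)


# Sample: 8-byte header, first directory at offset 8, second at offset 38.
sample = (
    b"II" + struct.pack("<HI", 42, 8)
    + make_ifd([(IMAGE_WIDTH, LONG, 1, 2400), (IMAGE_LENGTH, LONG, 1, 3000)], 38)
    + make_ifd([(IMAGE_WIDTH, LONG, 1, 300), (IMAGE_LENGTH, LONG, 1, 375)], 0)
)

for number, directory in enumerate(read_ifds(sample), start=1):
    width, length = directory[IMAGE_WIDTH][2], directory[IMAGE_LENGTH][2]
    print(f"image {number}: {width} x {length} pixels")
```

Each directory carries its own width and length, which is what "unique for each image" means in practice.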

Tagged Image File Format: Background and Metadata

The TIFF spec is implemented differently by different applications

Scanning software

Usually “bundled” with your scanner

Controls the scanner or camera and passes information to computer storage or image editing software

Outputs image files in specific image file formats

Determines what “flavor” TIFF is produced

Determines what metadata fields are utilized and how they are utilized

Tagged Image File Format: Background and Metadata

Other software, such as Photoshop, may add to the TIFF metadata

Tags can be added, using particular software

TiffKit (no longer supported)

Black Ice Software Development Kit

Captiva’s Input Accel

Others

Technical Metadata -- Options

Options?

Use as much as you can; create manually using database and/or XML based on the NISO draft and the LC encoding schema

or

Use DC: Format element

Using DC Format for Technical Metadata Elements

File size
Quality (bit depth, resolution)
Extent (pixel dimensions, play time, pagination)
Compression
Checksum value (error detection)
Object producer (name of scanning technician, vendor who scanned)
Creation Hardware (digital camera, flatbed scanner, etc.)
Creation Software (name and version)
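
Two of these elements, file size and checksum value, can be gathered automatically rather than typed in. A minimal sketch using Python's standard library follows; the choice of SHA-256 is an assumption (the slide does not name an algorithm), and the temporary file merely stands in for a master image.

```python
import hashlib
import os
import tempfile


def file_size_and_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return (size in bytes, hex digest), reading the file in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return os.path.getsize(path), digest.hexdigest()


# Throwaway file standing in for a master image.
with tempfile.NamedTemporaryFile(delete=False, suffix=".tif") as tmp:
    tmp.write(b"not a real TIFF, just sample bytes")
    sample_path = tmp.name

size, checksum = file_size_and_checksum(sample_path)
print(size, checksum)
os.remove(sample_path)
```

Recomputing the checksum later and comparing it with the stored value is how the "error detection" mentioned in the list is carried out.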

Encoding: METS

Metadata Encoding and Transmission Standard

Product of Making of America project

Digital Library Federation Initiative

Provides an XML schema for encoding metadata necessary for:

management of digital library objects

exchange of those objects (OAIS)

Brings all the metadata together

Encoding: METS

Five Sections of a METS document

Descriptive

Administrative

File Group

Structural Map

Behavior

Encoding: METS Five Sections

Descriptive Metadata

may point to descriptive metadata external to the METS document

MARC

may embed the descriptive metadata within the METS document

Encoding: METS

<!-- Dublin Core embedded as XML: mdWrap holds it inside xmlData -->
<dmdSec ID="dmd002">
    <mdWrap MIMETYPE="text/xml" MDTYPE="DC" LABEL="Dublin Core Metadata">
        <xmlData>
            <dc:title>Alice's Adventures in Wonderland</dc:title>
            <dc:creator>Lewis Carroll</dc:creator>
            <dc:date>between 1872 and 1890</dc:date>
            <dc:publisher>McCloughlin Brothers</dc:publisher>
            <dc:type>text</dc:type>
        </xmlData>
    </mdWrap>
</dmdSec>

<!-- MARC record embedded as Base64-encoded binary data -->
<dmdSec ID="dmd003">
    <mdWrap MIMETYPE="application/marc" MDTYPE="MARC" LABEL="OPAC Record">
        <binData>MDI0ODdjam0gIDIyMDA1ODkgYSA0NU0wMDAxMDA...(etc.)</binData>
    </mdWrap>
</dmdSec>

Encoding: METS Five Sections

Administrative Metadata

information regarding how the files were created and stored

intellectual property

metadata regarding the original

information regarding provenance of the digital object (technical metadata)

may be external or internally encoded

Encoding: METS

<amdSec>
    <!-- technical metadata; the file below points here through ADMID -->
    <techMD ID="AMD001">
        <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img. Data">
            <xmlData>
                <niso:MIMEtype>image/tiff</niso:MIMEtype>
                <niso:Compression>LZW</niso:Compression>
                <niso:PhotometricInterpretation>8</niso:PhotometricInterpretation>
                <niso:Orientation>1</niso:Orientation>
                <niso:ScanningAgency>NYU Press</niso:ScanningAgency>
            </xmlData>
        </mdWrap>
    </techMD>
</amdSec>

<file ID="FILE001" ADMID="AMD001">
    <FLocat LOCTYPE="URL">http://dlib.nyu.edu/press/testimg.tif</FLocat>
</file>
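
The link between a file and its administrative metadata is the `ADMID` attribute, which holds the `ID` of the metadata section that describes the file. The sketch below follows that link in a cut-down fragment. The fragment is an assumption made for illustration: namespaces are omitted, a `mets` wrapper element is added so the text parses as one document, and the technical metadata sits in a `techMD` element inside `amdSec`, as the METS schema arranges it.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment modelled on the example; simplified for this sketch.
SAMPLE = """
<mets>
  <amdSec>
    <techMD ID="AMD001">
      <mdWrap MIMETYPE="text/xml" MDTYPE="NISOIMG" LABEL="NISO Img. Data">
        <xmlData>
          <MIMEtype>image/tiff</MIMEtype>
          <Compression>LZW</Compression>
        </xmlData>
      </mdWrap>
    </techMD>
  </amdSec>
  <file ID="FILE001" ADMID="AMD001">
    <FLocat LOCTYPE="URL">http://dlib.nyu.edu/press/testimg.tif</FLocat>
  </file>
</mets>
"""


def technical_metadata_for(root, file_id):
    """Follow a file's ADMID to the metadata section(s) with the matching ID."""
    file_el = next(el for el in root.iter("file") if el.get("ID") == file_id)
    results = {}
    # ADMID may list several IDs separated by spaces.
    for admid in file_el.get("ADMID", "").split():
        section = next(el for el in root.iter() if el.get("ID") == admid)
        for xml_data in section.iter("xmlData"):
            for field in xml_data:
                results[field.tag] = field.text
    return results


root = ET.fromstring(SAMPLE)
print(technical_metadata_for(root, "FILE001"))
```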

Encoding: METS Five Sections

File Groups

used to group together related files

One file group lists all of the files which comprise a single electronic version of the digital library object

Master document (TIFF)

Access copy or copies

Perhaps a transcript

Encoding: METS

<fileGrp>
    <fileGrp ID="VERS1">
        <file ID="FILE001" MIMETYPE="application/xml" SIZE="257537" CREATED="2001-06-10">
            <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.xml</FLocat>
        </file>
    </fileGrp>

Encoding: METS

    <fileGrp ID="VERS2">
        <file ID="FILE002" MIMETYPE="audio/wav" SIZE="64232836" CREATED="2001-05-17" GROUPID="AUDIO1">
            <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.wav</FLocat>
        </file>
    </fileGrp>

Encoding: METS

    <fileGrp ID="VERS3" VERSDATE="2001-05-18">
        <file ID="FILE003" MIMETYPE="audio/mpeg" SIZE="8238866" CREATED="2001-05-18" GROUPID="AUDIO1">
            <FLocat LOCTYPE="URL">http://dlib.nyu.edu/tamwag/beame.mp3</FLocat>
        </file>
    </fileGrp>
</fileGrp>

Encoding: METS Five Sections

Structural Map

outlines the intellectual structure of the content of the digital resource

Encoding: METS

<structMap TYPE="logical">
    <div ID="div1" LABEL="Oral History: Mayor Abraham Beame" TYPE="oral history">
        <div ID="div1.1" LABEL="Interviewer Introduction" ORDER="1">
            <fptr FILEID="FILE001">
                <area FILEID="FILE001" BEGIN="INTVWBG" END="INTVWND" BETYPE="IDREF" />
            </fptr>

Encoding: METS

            <fptr FILEID="FILE002">
                <area FILEID="FILE002" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
            </fptr>
            <fptr FILEID="FILE003">
                <area FILEID="FILE003" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
            </fptr>
        </div>
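
The structural map is what answers a question such as "where is this section in each version?". As a sketch, the code below reads a namespace-free copy of the oral history excerpt and lists, for one labelled division, the matching span in every file. The closing tags for the outer `div` and `structMap` are added here so the excerpt parses, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the excerpt, closed off so that it is well-formed.
SAMPLE = """
<structMap TYPE="logical">
  <div ID="div1" LABEL="Oral History: Mayor Abraham Beame" TYPE="oral history">
    <div ID="div1.1" LABEL="Interviewer Introduction" ORDER="1">
      <fptr FILEID="FILE001">
        <area FILEID="FILE001" BEGIN="INTVWBG" END="INTVWND" BETYPE="IDREF" />
      </fptr>
      <fptr FILEID="FILE002">
        <area FILEID="FILE002" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
      </fptr>
      <fptr FILEID="FILE003">
        <area FILEID="FILE003" BEGIN="00:00:00" END="00:01:47" BETYPE="TIME" />
      </fptr>
    </div>
  </div>
</structMap>
"""


def locate_section(struct_map, label):
    """Return (file ID, begin, end, type) for each file area under the labelled div."""
    for div in struct_map.iter("div"):
        if div.get("LABEL") == label:
            return [
                (area.get("FILEID"), area.get("BEGIN"), area.get("END"), area.get("BETYPE"))
                for area in div.iter("area")
            ]
    return []


struct_map = ET.fromstring(SAMPLE)
for location in locate_section(struct_map, "Interviewer Introduction"):
    print(location)
```

The same division resolves to an ID range in the XML transcript and to a time range in the two audio files, which is how one logical section is tied to three physical versions.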

Encoding: METS Five Sections

Behavior
    used to associate executable behaviors with content
    defines the behaviors
    can contain executable code to run the behaviors

METS: Encoding

<METS:behaviorSec ID="DISS1.0" STRUCTID="S1" BTYPE="uva-bdef-image-w:101"
        CREATED="2002-05-25T08:32:00" LABEL="Watermark Behaviors"
        GROUPID="DISS1" ADMID="AUDREC1" STATUS="A">
    <METS:interfaceDef LABEL="Photo Watermark Behavior Definition"
            LOCTYPE="URN" xlink:href="uva-bdef-image-w:101"/>
    <METS:mechanism LABEL="Watermarking Behavior Mechanism for Images"
            LOCTYPE="URN" xlink:href="uva-bmech-image-w:112"/>
</METS:behaviorSec>

Preservation Metadata

If you are digitizing with preservation in mind, ALL metadata is preservation oriented

Metadata must be of the highest quality that is possible

Incorporate the creation and management of metadata into your project at the planning stage

Preservation Metadata

Designed to facilitate the process of preservation and management in a digital repository

Generally implemented at the time a digital resource is moved to a digital archive

Several schemas under development for particular operating environments and/or programs

Preservation Metadata Sets

CEDARS – Consortium of University Research Libraries,

Exemplars in Digital Archives project

www.leeds.ac.uk/cedars/guideto/metadata/

NLA -- National Library of Australia

www.nla.gov.au/preserve/pmeta.html

NEDLIB – Networked European Deposit Library

www.kb.nl/coop/nedlib/results/D4.2/D4.2.htm

OCLC Digital Archive

www.oclc.org/digitalarchive/about/works/metadata/

Preservation Metadata

Inference that there is a core of metadata necessary for preservation regardless of the preservation strategy

More work needs to be done to identify the particular elements necessary for particular preservation strategies

Metadata Wrap up

New tools for new resources

Metadata schema = rules

Encoding schema = mark up and storage

Descriptive Metadata

Use an established metadata schema

Create a project style guide to facilitate standardized, high quality creation

Store in content management software or database to provide web access

Document the database design and map fields to DC (or other schema) within the documentation

Encode and back up using XML, if technically feasible

Technical and Structural

Use TIFF
    Document the scanning software used, as TIFF has many different “flavors”

Use as much of the NISO draft standard as possible and watch for implementation developments, or...
    Use descriptive schema to collect technical information

Structural metadata (METS) to manage numerous, complex digital objects, or...
    Documented file naming and structures

Planning

Plan for the costs associated with good metadata
    Creation and research
    Technical resources (staff, hardware, software, backups)

Get a team of appropriate people together
    Identify goals and elements, and research appropriate schema and encoding
    Style Guide for descriptive metadata
    Create the highest quality, most thorough metadata possible in your situation
    Document mappings

Some Conclusions

Metadata is a work in progress at both the community level and the project level

Use standards

Technical metadata will be easier to implement in time

Structural metadata is vital for large projects with complex digital objects

Preservation metadata isn’t standardized yet