© tefko saracevic, rutgers university1 metadata considerations for digital libraries
Post on 20-Dec-2015
238 views
TRANSCRIPT
© Tefko Saracevic, Rutgers University 1
metadata
considerations for digital libraries
© Tefko Saracevic, Rutgers University 2
the Web
• fastest growing technology in history• explosive growth of WWW provided
– ubiquity of information and access– but also information chaos & anarchy
•growing difficulty in identifying, searching & retrieving
•‘lost in an ocean’ metaphors
© Tefko Saracevic, Rutgers University 3
problem
• to organize & search the Web needed: knowledge about the structure of data– but Web data & databases fuzzy– structures vary widely; no consistency – constantly evolve over time– lack of agreement about meaning of even
simple terms & concepts in structure
© Tefko Saracevic, Rutgers University 4
solution• some standardized description or
language to increase functionality– a mechanism for a more precise
description of things on the Web
• going from machine-readable to machine-understandable– missing in original Web architecture
METADATA METADATA !!
© Tefko Saracevic, Rutgers University 5
metadata
© Tefko Saracevic, Rutgers University 6
what?• metadata: ‘data about data’
– machine understandable information for the Web - emphasis on machine
– description of what a text (or any object) part is all about•e.g. labeling title, author, source …
• many evolving standards suggested to be applied in various domains
© Tefko Saracevic, Rutgers University 7
where?
• in volatile digital environments– metadata describe electronic
resources, texts & multimedia– metadata exist or have meaning
only in relation to the referenced document or object•provide information about the object
© Tefko Saracevic, Rutgers University 8
why?
• to standardize description of what is what in electronic resources in order
•to aid in identification, organization, & location of a great variety
•to enable effective search of variety of objects (documents) distributed all over
•sometimes also to provide controls (e.g. validation, rights, provenance, ratings ...)
© Tefko Saracevic, Rutgers University 9
importance
• standard metadata descriptions are a prerequisite to – common use– effective searching– ‘intelligent’ roaming by agents– validation, ratings,
© Tefko Saracevic, Rutgers University 10
markup languages• SGML - granddaddy (standard in 1986)
– marks elements within documents•derived from old markups for typesetting•adapted by communities producing
electronic documents•machine independent - reason for success
– transportable from one hardware & software to another; substitutes strings
•many extensions & specific applications
© Tefko Saracevic, Rutgers University 11
principles• ALL markup language must specify
•what markup means•what markup is allowed•what markup is required•how markup is distinguished from text
• all markup languages & applications follow these principles
•underlying concepts are fairly simple but they get very confusing real fast.
© Tefko Saracevic, Rutgers University 12
specifications• types of documents defined by
DTD Document Type Definitions – many types & applications
formulated •vary greatly in complexity and use
• RDF - Resource Description Framework
– a common syntax, data model & scheme for describing
© Tefko Saracevic, Rutgers University 13
extensions• HTML - most famous & successful
– allows for metatags in the Head•not used much, even discouraged•in the body could be indirect
• XML - the next big thing (hopefully)•data format for structured document
interchange & interoperability on WWW•increases functionality of SGML &
combines with ease of use of HTML
© Tefko Saracevic, Rutgers University 14
who specifies standards?
• formal groups– national & international standards
organizations - ISO, ANSI, NISO
• informal groups– WWW Consortium (W3C)– Dublin Core– Library of Congress
© Tefko Saracevic, Rutgers University 15
proliferation
• currently: proliferation of metadata standards activities -many domains– a lot of confusion & incompatibility– in document description & libraries
•coordination through liaisons & a number of projects in the U.S & internatioanly
– strength: domain experts involvement– weakness: limited perspective; re-invention
© Tefko Saracevic, Rutgers University 16
libraries
• in libraries metadata has a very long tradition long preceding the Web (but not called metadata)
– cataloging rules, standards
• MARC (Machine Readable Cataloging)•enabled worldwide exchange of cataloging
records•but long standing problems with searching
© Tefko Saracevic, Rutgers University 17
sample of projects• Encoded Archival Description (EAD)
• Text Encoding Initiative (TEI)• Federal Geographic Data
Committee (FGDC) - geospacial data
• Z39.50 standards - searching• crosswalks: mapping e.g. DC to MARC
© Tefko Saracevic, Rutgers University 18
Dublin Core (DC)• international initiative to describe
a core set of Web resources – a set of 15 elements
Title; Creator; Subject; Description; Publisher; Contributor; Date; Type; Format; Identifier; Source; Language; Relation; Coverage; Rights
• wide interest & a lot of work but not widely applied on the Web
© Tefko Saracevic, Rutgers University 19
library interoperability
• library catalogs bound by proprietary software & hardware
• middleware needed– protocols (based on Z39.50) provide
for interaction of clients with many servers (catalogs)
• problems remain with semantic interoperability
© Tefko Saracevic, Rutgers University 20
digitization
• metadata assignment (cataloging) a key component in digitization or electronic publishing
• choices: a spectrum of possibilities to select & apply metadata
• search for automation - e.g. templates
• connection with cataloging, indexing
© Tefko Saracevic, Rutgers University 21
decisions, decision
– how & what to plan for metadata creation in conjunction with dl?
– target audience?– scope and depth? – what to adopt? plug-in in a scheme?– how to integrate metadata projects?– needed skills? training? staffing?
© Tefko Saracevic, Rutgers University 22
$$$$• costs of metadata: HUGE
– involved operations– time, personnel, effort– learning many new things included– making decisions complex &
involved
• cooperative activities essential• libraries pushed out of libraries
© Tefko Saracevic, Rutgers University 23