cornell cs 502 metadata for the web issues and simple answers cs 502 – 20030219 carl lagoze –...

27
Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Upload: lora-claribel-dalton

Post on 19-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata for the WebIssues and Simple Answers

CS 502 – 20030219Carl Lagoze – Cornell University

Page 2: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

“Metadata is data about data”

Page 3: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata is semi-structured data conforming to commonlyagreed upon models, providing operational interoperability

in a heterogeneous environment

Page 4: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Some untested hypotheses

• Metadata is useful for…– People– Machines

• More metadata is better• (semi) automated digital libraries and simple

metadata

Page 5: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Some known facts

• Number and variety of metadata vocabularies will continue to increase

• The Tower of Babel is a franchise– There is not one common view of reality

• “The one thing I know about metadata is that it is expensive” (Bill Arms)

• “I hate metadata projects because they make every other digital library project more expensive” (Michael Lesk)

Page 6: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Are metadata and data distinguishable?

• Objectivity?• Intellectual property?• Structure?• Aboutness?

Page 7: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

The fiction of classification

…there is no classification of the universe that is not fictional and

conjectural.

Jorge Luis Borges

Page 8: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Lenses and Views

• All classification does and should provide a biased lens or view of reality

• Each view emphasizes certain characteristics and hides others

GeospatialRights

Museum

Page 9: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Reality is Complex

Created by:George Castaldo

Created on:1994

Created by:Leonardo da Vinci

Created on:1506

Relationship?

Page 10: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Objects are Related

IFLA Entity Model

Page 11: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Entities, Events, and Agents

Photographer

Camera type Software

Computer artist

Page 12: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Haven’t we done metadata already?

Page 13: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

What’s wrong with this model?

• Expensive– Complex (even for its original goal?) – Professional intervention (assumes single community

of expertise)

• Monolithic– One size fits all approach– Reflects its centralized system origins

• Bias towards physical artifacts– Fixed resources– Incomplete handling of resource evolution and other

resource relationships

• Anglo-centric

Page 14: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Web Challenge to Traditional Cataloging

• Scale

• Permanence

• Authenticity

• Organizational Context

• Custodial Control

• Variety

Page 15: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Internet Commons includes Multiple Communities

ScientificData

HomePages Geo

InternetCommons

Library

Museums

Commerce

Whatever...

Page 16: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata Takes Many Forms

resourcediscovery

documentadministration

rightsmanagement

contentrating

security andauthentication

archivalstatus

products andservices

databaseschemas

process controlor description

Page 17: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata Challenges

• Accommodate multiple varieties of metadata– community-specific functionality, creation,

administration, access

• Tensions– functionality and simplicity – extensibility and interoperability– human and machine creation and use

Page 18: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Interoperability has many facets

• Semantics– Meaning/classification/ontology

• Models/Structure– Entities and relationships

• Syntax– grammars to convey semantics and structure

Page 19: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Warwick Framework: Containing Chaos

• Conceptual Architecture for metadata from the Warwick Metadata Workshop (DC-2)

• Conceptual architecture to support the specification, collection, encoding, and exchange of modular metadata

• Provide context for metadata efforts (including Dublin Core)– avoids the “black-hole” of comprehensive element

sets– focuses interoperability issues at package level

Page 20: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata Container

Container

Package

Dublin Core

Package

MARC record

Package

Indirect Reference

Package

Terms and Conditions

URI

Page 21: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Modularization Allows Distributed Management

• Communities of expertise (not software vendors) are responsible for:– Semantics– Registration– Administration– Access management– Authority of data– Sharing and Distribution

Page 22: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Realities of Web search and discovery

• Search systems are motivated by advertising• Index coverage is unpredictable and limited• Too much recall, too little precision• Index spam abounds• Resources (and their names) are volatile

Page 23: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata: Part of a Solution

• Structured data about data– helps to impose order on chaos– enables automated discovery/manipulation

• Variety across various dimensions:– specialization– decentralization– democratization

Page 24: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Web Metadata Models:Drill-Down Searching Paradigm

• Moving along a specificity spectrum• Inter-domain vs. intra-domain terms, models,

query mechanisms• One size doesn't fit all

– Cognitive models of searching and browsing

Page 25: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Drill-down search paradigm

DomainIndependent

view

DomainSpecific

View

Page 26: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Metadata:Part of the problem

cost

functionality

AACR2/MARC

googleDublin Core

Page 27: Cornell CS 502 Metadata for the Web Issues and Simple Answers CS 502 – 20030219 Carl Lagoze – Cornell University

Cornell CS 502

Why hasn’t metadata worked on the Web?

• Its all about trust• People are lazy• Metadata is hard• No perceived benefit

– “Reverse tragedy of the commons”

• No agreement on one way to describe things

• “Metacrap” - http://www.well.com/~doctorow/metacrap.htm