Page 1: Breaking down the walls Moving libraries from collectors to portals Carl Lagoze Cornell University lagoze@cs.cornell.edu

Breaking down the walls

Moving libraries from collectors to portals

Carl Lagoze
Cornell University
lagoze@cs.cornell.edu

Page 2:

The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library’s Web site, this approach would make available the ever-increasing body of research materials distributed across the Internet. The Library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation.

LC21: Digital Strategy for the Library of Congress, page 5

Page 3:

LC21: Digital Strategy for the Library of Congress, page 5

Page 4:

Some of the most fundamental aspects of library operations entail the existence of a border, across which objects of information are transferred and maintained. Such a perimeter, demarcating a single, distributed digital library (the "control zone"), needs to be created and managed by the academic library community at the earliest opportunity.

Ross Atkinson, Library Quarterly, 1996

Towards a Virtual Control Zone

Page 5:

Why distributed collections?

• Scale of the Web
• Prevalence of new publishing models and agents
• Increasing complexity of licensing and access management
• Dynamic nature of content

Page 6:

Towards Hybrid Portals

• Traditional portal (e.g., Yahoo!)
  – linkage without responsibility
• Hybrid portal
  – assertion of (some semblance of) a curatorial role over linked objects

Page 7:

New models have cultural/organizational ramifications…

• Performance and ranking metrics – "bigger is better"
• Levels of confidence
• Trust

Page 8:

…that can be assisted by new technical foundations

• Digital object architectures
  – that enable aggregating and customizing content for local access and management
• Metadata frameworks
  – that model changes of objects and their management over time
• OAI Harvesting Protocol
  – for exchange of structured information
• Preservation models
  – that enable non-cooperative and cooperative offsite monitoring

Page 9:

Digital Object Architectures: aggregating & localizing distributed content

Acknowledgements:
– Naomi Dushay
– Sandy Payette
– Thornton Staples (U. Va.)
– Ross Wayland (U. Va.)

Page 10:

From Mediators to Value-Added Surrogates

• Wiederhold – mediators between raw data and end-user applications for integration and transformation

• Paepcke – mediators as foundation for digital library interoperability

• Payette and Lagoze – mediators (V-A surrogates) to aggregate and create a localized service layer for distributed resources

Page 11:

FEDORA Digital Object Model

Page 12:

Establishing a Virtual Control Zone

Page 13:

V-A Surrogate Applications

• Access management
  – Shared responsibility among trusted partners
• Enhanced and customized functionality
  – Examples: reference linking, format translation, special needs
• Preservation
  – Monitoring "significant" events and acting on them

Page 14:

[Figure: DigitalObject A, whose structural characteristics are a RealAudio video, a PowerPoint presentation, and SMIL synchronization metadata, is reached through tools by two context brokers.]

ContextBroker A presents DigitalObject A as:
• View Slides
• View Video
• View synchronized presentation using applet

ContextBroker B presents DigitalObject A as:
• Get Transcript of Audio
• Search for keyword
• Get Slides translated to French
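The idea that two context brokers expose different behaviours over the same underlying object can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `DigitalObject` and `ContextBroker`, the file names, and the lambda "tools" are all hypothetical and are not taken from the FEDORA prototype.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DigitalObject:
    """Hypothetical container for the structural parts of one object."""
    name: str
    parts: Dict[str, str] = field(default_factory=dict)


@dataclass
class ContextBroker:
    """Hypothetical broker: maps behaviour names to tools over an object."""
    name: str
    tools: Dict[str, Callable[[DigitalObject], str]] = field(default_factory=dict)

    def behaviours(self) -> List[str]:
        # The behaviours this broker advertises for any object it mediates.
        return sorted(self.tools)

    def invoke(self, behaviour: str, obj: DigitalObject) -> str:
        if behaviour not in self.tools:
            raise KeyError(f"{self.name} does not offer {behaviour!r}")
        return self.tools[behaviour](obj)


# File names are made up for illustration.
obj_a = DigitalObject(
    name="DigitalObject A",
    parts={"video": "lecture.rm", "slides": "lecture.ppt", "sync": "lecture.smil"},
)

broker_a = ContextBroker("ContextBroker A", {
    "View Slides": lambda o: f"rendering {o.parts['slides']}",
    "View Video": lambda o: f"streaming {o.parts['video']}",
})

broker_b = ContextBroker("ContextBroker B", {
    "Get Transcript of Audio": lambda o: f"transcribing audio of {o.parts['video']}",
    "Get Slides translated to French": lambda o: f"translating {o.parts['slides']} to fr",
})
```

The same `obj_a` is passed to both brokers; only the set of tools differs, which is the sense in which a surrogate adds a localized service layer without copying the content.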

Page 15:

Where we are now…

• Ongoing FEDORA reference prototype
  – http://www.cs.cornell.edu/cdlrg/FEDORA.html
  – Policy enforcement research
  – Content mediation
• Proposed joint deployment with University of Virginia
  – Open source scalable implementation of FEDORA architecture
  – Testing and deployment with a number of research library partners

Page 16:

Event-Aware Metadata Frameworks: describing changes over time

• Acknowledgements:
  – Dan Brickley (ILRT, Bristol)
  – Martin Doerr (FORTH, Crete)
  – Jane Hunter (DSTC, Brisbane)

Page 17:

Distributed Content: The Metadata Challenge

• From fixed, contained physical artifacts to fluid, distributed digital objects

• Need for basis of trust and authenticity in network environment

• Decentralization and specialization of resource description and need for mapping formalisms

Page 18:

Multi-entity nature of object description

[Figure: one object described through several entities, labelled Photographer, Camera type, Software, and Computer artist]

Page 19:

Attribute/Value approaches to metadata…

Hamlet has a creator Shakespeare
(subject; implied verb; metadata noun; literal)

The playwright of Hamlet was Shakespeare
("playwright" acts as a metadata adjective)

[Figure: resource R1 with dc:title “Hamlet” and dc:creator.playwright “Shakespeare”]

Page 20:

…run into problems for richer descriptions…

Hamlet has a creator Stratford
("birthplace" used as the metadata adjective)

The playwright of Hamlet was Shakespeare, who was born in Stratford

[Figure: resource R1 with dc:creator.playwright “Shakespeare” and dc:creator.birthplace “Stratford”]

Page 21:

…because of their failure to model entity distinctions

[Figure: resource R1 has title “Hamlet” and creator R2; R2 has name “Shakespeare” and birthplace “Stratford”]
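The contrast between the flat attribute/value record and the entity-based description can be sketched as follows. This is a minimal illustration: the triple layout and the helper names `values` and `birthplaces_of_creators` are assumptions made for the example, not part of any Dublin Core or RDF toolkit.

```python
# Flat attribute/value record: every value hangs off the one resource R1,
# so "birthplace" appears to describe the play rather than the person.
flat_r1 = {
    "dc:title": "Hamlet",
    "dc:creator.playwright": "Shakespeare",
    "dc:creator.birthplace": "Stratford",
}

# Entity-based description as (subject, property, value) triples.
# R2 is a separate entity for the creator, so person-level facts attach to it.
triples = [
    ("R1", "title", "Hamlet"),
    ("R1", "creator", "R2"),
    ("R2", "name", "Shakespeare"),
    ("R2", "birthplace", "Stratford"),
]


def values(subject, prop):
    """Return all values of one property for one subject."""
    return [v for s, p, v in triples if s == subject and p == prop]


def birthplaces_of_creators(resource):
    """Follow resource -> creator -> birthplace, a two-step path."""
    return [
        place
        for creator in values(resource, "creator")
        for place in values(creator, "birthplace")
    ]
```

In the entity-based form, `values("R1", "birthplace")` is empty, while `birthplaces_of_creators("R1")` reaches "Stratford" through R2, which is the distinction the flat record cannot express.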

Page 22:

ABC/Harmony Event-aware metadata model

• Recognizing inherent lifecycle aspects of description (esp. of digital content)
• Modeling incorporates time (events and situations) as first-class objects
  – Supplies clear attachment points for agents, roles, occurrent properties
• Resource description as a “story-telling” activity

Page 23:

Resource-centric Metadata

Title Anna Karenina

Author Leo Tolstoy

Illustrator Orest Vereisky

Translator Margaret Wettlin

Date Created 1877

Date Translated 1978

Description Adultery & Depression

Birthplace Moscow

Birthdate 1828

?

Page 24:

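[Figure: event-aware description graph for “Anna Karenina”. Node and arc labels: a “creation” event (“1877”, “author” “Leo Tolstoy”, “Russian”); a “translation” event (“1978”, “translator” “Margaret Wettlin”, “English”); “illustrator” “Orest Vereisky”; “Leo Tolstoy” linked to "Moscow" and “1828”; description “Tragic adultery and the search for meaningful love”.]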
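A small sketch may help show what treating events as first-class objects buys. It assumes a hypothetical `Event` record with roles attached to the event rather than to the resource; attaching the language to each event is one reading of the figure, not something the slides state, and none of the names below come from the ABC/Harmony vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event node: agents and their roles attach here."""
    kind: str
    date: str
    agents: Tuple[Tuple[str, str], ...]  # (role, name) pairs
    language: str = ""


# Values taken from the Anna Karenina example in the slides.
events = [
    Event("translation", "1978", (("translator", "Margaret Wettlin"),), "English"),
    Event("creation", "1877", (("author", "Leo Tolstoy"),), "Russian"),
]


def agents_with_role(events: List[Event], role: str) -> List[str]:
    """Find agents by the role they played in any event."""
    return [name for e in events for r, name in e.agents if r == role]


def story(events: List[Event]) -> List[str]:
    """Tell the resource's history as a date-ordered list of events."""
    lines = []
    for e in sorted(events, key=lambda e: e.date):
        who = ", ".join(f"{role} {name}" for role, name in e.agents)
        lines.append(f"{e.date}: {e.kind} ({who}, {e.language})")
    return lines
```

Because each date and role belongs to an event, "Date Translated" and "Translator" no longer need to be separate resource-level fields, and the description reads as a sequence of events.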

Page 25:

Queries over descriptive graphs

List details of events where Lagoze is a participating agent

SELECT ?title, ?type, ?time, ?place, ?name
FROM http://ilrt.org/discovery/harmony/oai.rdf
WHERE (web::type ?event abc::Event)
      (abc::context ?event ?context)
      …..
AND ?name ~ lagoze
USING web FOR http://www.w3.org/1999/02/22-rdf-syntax-ns#

Rudolf Squish – http://swordfish.rdfweb.org/rdfquery
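How a query of this kind is evaluated over a descriptive graph can be sketched with a tiny triple-pattern matcher. This is not the Squish engine: the functions `match` and `query`, the `ex:` property names, and the sample triples are all hypothetical, and treating `~` as a case-insensitive substring test is an assumption.

```python
def match(pattern, triple, bindings):
    """Try to unify one (s, p, o) pattern with one triple.

    Terms starting with "?" are variables; returns extended bindings or None.
    """
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None  # constant term does not match
    return new


def query(patterns, triples):
    """Join all patterns: keep every binding set that satisfies each one."""
    results = [{}]
    for pattern in patterns:
        results = [
            b2
            for b in results
            for t in triples
            if (b2 := match(pattern, t, b)) is not None
        ]
    return results


# Illustrative data only; "ex:" properties are made up for the sketch.
triples = [
    ("event1", "rdf:type", "abc:Event"),
    ("event1", "ex:agentName", "Carl Lagoze"),
    ("event2", "rdf:type", "abc:Event"),
    ("event2", "ex:agentName", "A. N. Other"),
]

patterns = [
    ("?event", "rdf:type", "abc:Event"),
    ("?event", "ex:agentName", "?name"),
]

# The filter step, standing in for "AND ?name ~ lagoze".
hits = [b for b in query(patterns, triples) if "lagoze" in b["?name"].lower()]
```

Each pattern narrows the set of variable bindings, and the final filter keeps only events whose agent name contains "lagoze".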

Page 26:

Where we are now

• Stabilization of model
• Collaboration with museum/CIDOC community for joint modeling principles
• Plans
  – RDF API for model elements
  – UI for metadata creation
  – Query engine testing

Page 27:

Open Archives Initiative: facilitating exchange of structured information

• Acknowledgements:
  – Herbert Van de Sompel
  – OAI Steering and Technical Committees

Page 28:

Open Archives Initiative

• Testing the hypotheses
  – exposing metadata in various forms will facilitate creation of value-added services
  – key to deployable DL infrastructure is low-entry cost
  – individual communities can/will customize common infrastructure

Page 29:

Where we’ve come from

• Late 1999 Santa Fe UPS meeting – increase impact of eprint initiatives through federation

• Santa Fe Convention – metadata harvesting among eprint archives

• Increasing interest outside the eprint community
  – Research libraries
  – Museums
  – Publishers

Page 30:

Progress over the past year

• OAI workshops at US and EC DL conferences

• Organizational stability
  – Executive committee and steering committee
• September 2000 technical meeting
  – Reframe and rethink technical solutions for broader domain

• Extensive testing and refinement of technical infrastructure

Page 31:

Technical Infrastructure – key technical features

• Deploy-now technology – 80/20 rule
• Two-party model – providers and consumers
• Simple HTTP encoding
• XML schema for some degree of protocol conformance
• Extensibility
  – Multiple item-level metadata
  – Collection-level metadata

Page 32:

OAI protocol requests

Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets

Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord

[Figure: a harvester (service provider) sends requests to a repository (data provider)]
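The "simple HTTP encoding" of these six requests can be sketched as plain query strings. This is a hedged illustration, not the protocol specification: the base URL is hypothetical, and the argument names `metadataPrefix`, `identifier`, and `set` follow the author's recollection of the OAI protocol and should be checked against the published document at openarchives.org.

```python
from urllib.parse import urlencode

# Hypothetical repository endpoint, used only for illustration.
BASE_URL = "http://repository.example.org/oai"

SUPPORTING = {"Identify", "ListMetadataFormats", "ListSets"}
HARVESTING = {"ListRecords", "ListIdentifiers", "GetRecord"}


def oai_request(verb, **params):
    """Build the GET URL a harvester would send to a repository."""
    if verb not in SUPPORTING | HARVESTING:
        raise ValueError(f"unknown protocol request: {verb}")
    query = {"verb": verb}
    # Drop arguments the caller left unset.
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}?{urlencode(query)}"


identify_url = oai_request("Identify")
records_url = oai_request("ListRecords", metadataPrefix="oai_dc", set=None)
record_url = oai_request("GetRecord", identifier="oai:example:1", metadataPrefix="oai_dc")
```

A harvester would typically start with the supporting requests to learn what a repository offers, then issue harvesting requests; the response in each case is an XML document.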

Page 33:

Where we are now

• “Stable” 1.0 protocol specification
• Hopefully, self-documenting infrastructure
  – http://www.openarchives.org
• 27 registered data providers
• Increasing number of tools available
• Research initiatives
  – NSF-funded NSDL
  – EC-funded Cyclades
  – Andrew W. Mellon service proposals
  – EC-funded community building

Page 34:

Where do we go from here

• Controlling the stampede
• Maintaining the organizational model – lean and mean while encouraging community-specific exploitation
• Encouraging testing, especially through deployment and especially service development
• Encouraging metadata diversification – this isn’t just about Dublin Core!!!
  – Preservation
  – Document access
  – Authentication

Page 35:

OAI & Metadata Research

• Dictionary of metadata terms (Tom Baker)
• Mandating usage rules has only limited effectiveness
• Compiling usage of those terms is vital to machine understanding and interoperability
  – Provide context heuristics for search engine and indexer processing
• Large-scale deployment of OAI and web crawling enables (partial) automation of usage compilation (e.g., data mining of term usage)
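One simple form of automated usage compilation is to count which terms harvested records actually use and what their values look like. The sketch below assumes records have already been parsed into dictionaries; the sample records, the `shape` abstraction, and the function names are illustrative assumptions rather than a description of any existing tool.

```python
import re
from collections import Counter, defaultdict

# Illustrative records only; real ones would come from harvested metadata.
harvested_records = [
    {"title": ["An example report"], "creator": ["A. N. Other"], "date": ["2001"]},
    {"title": ["A second example"], "creator": ["Someone Else"], "date": ["2001-03-15"]},
    {"title": ["A third example"], "date": ["March 2001"]},
]


def shape(value):
    """Abstract a value into a pattern: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def compile_usage(records):
    """Count how many records use each term and which value shapes occur."""
    term_counts = Counter()
    shapes = defaultdict(Counter)
    for record in records:
        for term, values in record.items():
            term_counts[term] += 1
            for value in values:
                shapes[term][shape(value)] += 1
    return term_counts, shapes


term_counts, shapes = compile_usage(harvested_records)
```

Here the three `date` values fall into three different shapes, which is the kind of observed practice that a mandated usage rule would miss and that an indexer could use as a context heuristic.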

Page 36:

Preservation Models: monitoring threats to distributed content

• Acknowledgements:
  – Bill Arms
  – Peter Botticelli (CUL)
  – Anne Kenney (CUL)

Page 37:

Preservation & Remote Control

• Organizational issues
  – “assured preservation” may not be possible without direct custodial control.
  – what are the levels of acceptability and for which types of resources?
• Technical issues
  – what are the technologies for remote control at the various levels of assurance deemed acceptable by the library?
  – what is the probability of a reasonable level of preservation in the context of such technologies?

Page 38:

Cost vs. Functionality

[Figure: plot of cost against functionality, with the Open Archival Information System (OAIS) as a labelled element]

Page 39:

Leveraging Current Work

• Event-based metadata
• Metadata harvesting
• Longevity and threats to digital resources

Page 40:

Level 0 Experiment

[Figure: three Web Sites feed Selective Web Crawling, which produces Event Records; a Policy Enforcer holding the pairs P1 A1, P2 A2, P3 A3 turns them into actions]

Page 41:

Level 1 Experiment

[Figure: Web Sites feed Selective Web Crawling, and OAI Data Providers supply Preservation Metadata in response to an OAI Protocol Request; both produce Event Records; a Policy Enforcer holding the pairs P1 A1, P2 A2, P3 A3 turns them into actions]
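The policy enforcer in the Level 0 and Level 1 diagrams can be sketched as a list of rules applied to incoming event records. Reading each "P" as a policy (a condition on an event) and each "A" as the action it triggers is an assumption, as are the class names, event kinds, and site names below.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EventRecord:
    """Hypothetical record of something observed at a remote site."""
    site: str
    kind: str  # e.g. "changed", "missing", "unchanged"


@dataclass
class Rule:
    """One policy/action pair, assumed to correspond to P1/A1 and so on."""
    name: str
    policy: Callable[[EventRecord], bool]
    action: Callable[[EventRecord], str]


def enforce(rules: List[Rule], events: List[EventRecord]) -> List[str]:
    """Apply every rule to every event and collect the triggered actions."""
    return [rule.action(e) for e in events for rule in rules if rule.policy(e)]


rules = [
    Rule("P1/A1", lambda e: e.kind == "missing",
         lambda e: f"notify curator: {e.site} is missing"),
    Rule("P2/A2", lambda e: e.kind == "changed",
         lambda e: f"re-crawl {e.site}"),
]

# Events could come from crawling (Level 0) or harvested metadata (Level 1).
events = [
    EventRecord("site-a.example", "changed"),
    EventRecord("site-b.example", "missing"),
    EventRecord("site-c.example", "unchanged"),
]

actions = enforce(rules, events)
```

The enforcer itself does not care whether an event record was produced by a crawler or by a cooperating data provider, which is what lets the two experiment levels share it.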

Page 42:

One of Six Core Integration Demonstration Projects for the NSDL

Page 43:

How Big might the NSDL be?

The NSDL aims to be comprehensive -- all branches of science, all levels of education, very broadly defined.

Five year targets:

1,000,000 different users

10,000,000 digital objects

100,000 independent sites

Requires: low-cost, scalable technology; automated collection building and maintenance

Page 44:

Levels of Interoperability: Metadata Harvesting

Agreements on simple protocol and metadata standard(s)

Example:

Metadata harvesting protocol of the Open Archives Initiative (MHP)

• Moderate-quality services

• Low cost of entry to participating sites

Moderately large numbers of loosely collaborating sites

Promising but still an emerging approach

Page 45:

Levels of Interoperability: Gathering

Robots gather collections automatically with no participation from individual sites

Examples:

Web search services (e.g., Google)

CiteSeer (a.k.a. ResearchIndex)

• Restricted but useful services

• Zero cost of entry to gathered sites

Very large numbers of independent sites

Only suitable for open access collections