![Page 1: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/1.jpg)
Open Archives Initiative
Where we are, where we are going
Carl Lagoze
4th OAF Workshop, September 2003
![Page 2: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/2.jpg)
Where we are now
• De facto standard for Internet information exchange
• Deployed extensively and internationally
– (Digital) libraries
– Museums
– Eprint repositories
– Research projects
![Page 3: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/3.jpg)
Protocol Stability
• OAI-PMH has been stable since its release
– No functional changes, just typographic edits
– Validation of the leadership/participation model
• No plans for a 3.0 release
– Core protocol will not be extended
– Minor 2.x release could occur (more later)
– Additional implementation guidelines (more later)
![Page 4: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/4.jpg)
NSDL and OAI-PMH
![Page 5: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/5.jpg)
The NSDL Context
• National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library
• Major National Science Foundation project targeted at applying the Web and the Internet to STEM education
• $25M over six years to over 100 projects
– Collections
– Services
– Targeted Research
– Core Integration
![Page 6: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/6.jpg)
NSDL technical guidelines
• Aggregation rather than collection
– Core integration team will not manage any collections
• Spectrum of interoperability
– Accommodate diversity of participation models
– Open interfaces and standards permitting plug-in of an array of value-added services
• One library, many portals
– Accommodate multiple quality and selection metrics
– Tailor presentation of content and nature of services to audience needs
![Page 7: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/7.jpg)
Spectrum of interoperability

| Level | Agreements | Example |
|---|---|---|
| Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z39.50 |
| Harvesting | Digital libraries expose metadata; simple metadata harvesting protocol and registry | Open Archives |
| Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines |
![Page 8: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/8.jpg)
Translating to initial goals
• This is a big task that no one has done before!
• Work on the priorities
– Focus on one point on the spectrum of interoperability
  • Metadata harvesting
  • Incorporate NSF-funded collections and selected other collections
– Leverage existing (or at least emerging) technologies and protocols
  • OAI, uPortal, Shibboleth, SDLIP, InQuery
– Provide reliable base-level services
  • Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence
• Plant some seeds for the future
– Machine-assisted metadata generation
– Automated collection aggregation
– Web gathering strategies
![Page 9: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/9.jpg)
Metadata Repository (MR)
• Central storage of all metadata about all resources in the NSDL
– Defines the extent of the NSDL collection
– Metadata includes collections, items, annotations, etc.
• MR main functions
– Aggregation
– Normalization
– Redistribution
• Ingest of metadata by various means
– Harvesting, manual, automatic, cross-walking
• Open access to MR contents for service builders via OAI-PMH
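Open access via OAI-PMH means a service builder can pull MR contents with an ordinary harvesting loop. The Python sketch below is a minimal illustration, not NSDL code: it assumes a hypothetical base URL, keeps only each record header's identifier and datestamp, and takes the HTTP fetch as an injected function so it runs offline. It issues `ListRecords` and then follows `resumptionToken`s until an empty token ends the list.

```python
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def build_request(base_url: str, **params: str) -> str:
    """Return an OAI-PMH GET request URL for a verb and its arguments."""
    return f"{base_url}?{urlencode(params)}"

def parse_list_records(xml_text: str) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Return (identifier, datestamp) pairs and the resumptionToken, if any."""
    root = ET.fromstring(xml_text)
    error = root.find("oai:error", OAI_NS)
    if error is not None:
        if error.get("code") == "noRecordsMatch":
            return [], None  # OAI-PMH reports an empty result as an error
        raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
    records = []
    for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
        records.append((
            header.findtext("oai:identifier", default="", namespaces=OAI_NS),
            header.findtext("oai:datestamp", default="", namespaces=OAI_NS),
        ))
    # An absent or empty resumptionToken both mean "this was the last page".
    token = root.findtext("oai:ListRecords/oai:resumptionToken", default="",
                          namespaces=OAI_NS)
    return records, (token.strip() or None)

def harvest(base_url: str, fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc") -> Iterator[Tuple[str, str]]:
    """Yield every record header, following resumptionTokens page by page."""
    url = build_request(base_url, verb="ListRecords", metadataPrefix=metadata_prefix)
    while True:
        records, token = parse_list_records(fetch(url))
        yield from records
        if token is None:
            return
        # resumptionToken is an exclusive argument: send it with the verb only.
        url = build_request(base_url, verb="ListRecords", resumptionToken=token)
```

With network access, `fetch` could simply be `lambda u: urllib.request.urlopen(u).read().decode("utf-8")`; injecting it keeps the paging logic testable.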
![Page 10: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/10.jpg)
Metadata Strategy
• Collect and redistribute any native (XML) metadata format
• Provide crosswalks to Dublin Core from standard formats
– DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD
• Concentrate on collection-level metadata
• Use automatic generation to augment item-level metadata
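A crosswalk is essentially a field-mapping table applied record by record. As a hedged illustration only, the Python sketch below maps a handful of MARC (tag, subfield) pairs onto unqualified Dublin Core inside an `oai_dc` container. The mapping table is a small assumed subset; real crosswalks, including whatever NSDL used for MARC, handle many more fields, indicators and repeated values, and a full `oai_dc` record would also carry a schema location.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

# Illustrative subset of a MARC-to-DC mapping: (tag, subfield) -> DC element.
MARC_TO_DC = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("650", "a"): "subject",
    ("520", "a"): "description",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
}

def marc_to_oai_dc(fields):
    """Map (tag, subfield, value) triples onto an unqualified oai_dc record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for tag, code, value in fields:
        element = MARC_TO_DC.get((tag, code))
        if element is None:
            continue  # unmapped fields are dropped, so the crosswalk is lossy
        # Trim trailing cataloguing punctuation such as " /" or " :".
        ET.SubElement(root, f"{{{DC}}}{element}").text = value.rstrip(" /:;,")
    return root

record = marc_to_oai_dc([
    ("245", "a", "Open Archives Initiative /"),
    ("100", "a", "Lagoze, Carl"),
    ("999", "z", "local holdings note"),
])
print(ET.tostring(record, encoding="unicode"))
```

The dropped `999` field is the point of the example: mapping down to unqualified DC discards whatever the table does not cover.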
![Page 11: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/11.jpg)
Importing metadata into the MR
Collections → Harvest → Staging area → Cleanup and crosswalks → Database load → Metadata Repository
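The import flow above amounts to a staged load job. The sketch below is a hypothetical, much-simplified rendering in Python with an in-memory SQLite database: the table names `staging` and `mr`, the three-column schema and the whitespace cleanup are assumptions standing in for NSDL's real cleanup and crosswalk step.

```python
import sqlite3

def import_batch(conn, harvested):
    """Run one import batch: staging area -> cleanup -> database load."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging "
                 "(identifier TEXT, datestamp TEXT, title TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS mr "
                 "(identifier TEXT PRIMARY KEY, datestamp TEXT, title TEXT)")
    # Staging area: keep harvested records exactly as received.
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", harvested)
    rows = conn.execute(
        "SELECT identifier, datestamp, title FROM staging ORDER BY rowid").fetchall()
    # Cleanup: collapse runs of whitespace and drop records with no usable title.
    cleaned = [(ident, stamp, " ".join(title.split()))
               for ident, stamp, title in rows if title and title.strip()]
    # Database load: a later copy of the same identifier replaces the earlier one.
    conn.executemany("INSERT OR REPLACE INTO mr VALUES (?, ?, ?)", cleaned)
    conn.execute("DELETE FROM staging")
    conn.commit()
```

Keeping a separate staging table means a bad batch can be inspected or discarded before it touches the repository proper.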
![Page 12: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/12.jpg)
Exporting metadata from the MR
Metadata Repository → SQL queries → Create OAI server tables → OAI server → Harvest → NSDL services
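On the export side, harvesters do not query the MR directly; SQL queries first build tables shaped for the OAI server. A hypothetical Python/SQLite sketch of that idea: `build_oai_table` materializes an `oai_records` table from an assumed `mr` table, and `list_identifiers` answers selective harvesting with the OAI-PMH `from`/`until` arguments, whose bounds are inclusive. Plain string comparison works here only because the datestamps are assumed to be `YYYY-MM-DD`.

```python
import sqlite3

def build_oai_table(conn):
    """Materialize an OAI server table from the MR with a SQL query."""
    conn.execute("DROP TABLE IF EXISTS oai_records")  # also drops its index
    conn.execute(
        "CREATE TABLE oai_records AS "
        "SELECT identifier, datestamp, title FROM mr WHERE title IS NOT NULL")
    conn.execute("CREATE INDEX oai_datestamp ON oai_records (datestamp)")

def list_identifiers(conn, from_=None, until=None):
    """Answer a selective ListIdentifiers request with inclusive date bounds."""
    sql = "SELECT identifier FROM oai_records WHERE 1 = 1"
    args = []
    if from_ is not None:
        sql += " AND datestamp >= ?"
        args.append(from_)
    if until is not None:
        sql += " AND datestamp <= ?"
        args.append(until)
    sql += " ORDER BY datestamp, identifier"
    return [row[0] for row in conn.execute(sql, args)]
```

Rebuilding the server table from the MR decouples the OAI server's schema from the repository's internal one.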
![Page 13: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/13.jpg)
NSDL and OAI-PMH: Two years later
• Concepts are good, practice is hard
• Issues
– Metadata is hard
  • http://www.well.com/~doctorow/metacrap.htm
– XML is hard
– Protocols are hard
  • Static repositories (more later)
– IP (intellectual property) is relevant (more later)
![Page 14: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/14.jpg)
Some Essential Metadata Questions
• Review the original (DC) metadata assumptions
– Metadata is essential for good resource discovery
– “Joe Sixpack” could create metadata
• Account for current realities
– 2003 is not 1994
– Google etc. keep getting better
![Page 15: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/15.jpg)
Metadata Space
(Figure: diagram with axes labeled Automatic Indexing; Contextual Information (e.g., web links); Metadata Richness/Quality/Trust.)
![Page 16: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/16.jpg)
Metadata Triage
![Page 17: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/17.jpg)
Reconsidering the Dublin Core Requirement
• Questions about the utility of unqualified DC
– The conundrum…
  • Specification too loose to serve the intended interoperability goal
  • But more complex metadata may be too hard
• Limited energy for interoperability
– Data providers implement the required DC at the expense of better metadata
• Use of the protocol for purposes other than resource discovery
![Page 18: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/18.jpg)
Rethinking record-oriented model
(Figure: diagram with components labeled Base Web Graph, NSDL Selections, Descriptive Metadata, Annotations, Branding, Collection (Semantic), People and Organizations, and Equivalence.)
Implications for record-oriented harvesting????
![Page 19: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/19.jpg)
Topology Evolution
(Figure: four repositories, each with an OAI-PMH server, harvested by a Linking Service, a Browse Service and a Search Service, each with its own OAI-PMH harvester.)
Simple Data Provider, Service Provider Topology
![Page 20: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/20.jpg)
Topology Evolution (cont.)
(Figure: four collections, each exposing an OAI-PMH server, are harvested by a Metadata Repository that has both an OAI-PMH harvester and an OAI-PMH server; a Search Service and a Browse Service harvest the Metadata Repository through their own OAI-PMH harvesters.)
Metadata Aggregator
![Page 21: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/21.jpg)
Topology Evolution (cont.)
(Figure: OAI-PMH nodes, each pairing an OAI-PMH harvester with an OAI-PMH server, harvesting from one another.)
OAI-PMH p2p network
![Page 22: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/22.jpg)
OAI-P2pMH Issues
• Document (metadata) location
– Exploit unique identifiers; use efficient key-based location mechanisms (distributed hash tables)
• Provenance-based queries
– Metadata records may go through refinement and/or translation phases as they move through value-added aggregators
– Exploit provenance guidelines
• Network harvesting
– Broadcast query (Gnutella) is inefficient
– Exploit techniques for efficient routing of queries (P-trees)
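Key-based location can be illustrated with consistent hashing, the idea underlying DHTs such as Chord: hash each peer and each OAI identifier onto the same ring, and let the first peer at or after an identifier's position own it. This Python sketch is a deliberately simplified toy (no routing tables, no replication, hypothetical peer names), meant only to show why unique identifiers turn location into a lookup rather than a broadcast.

```python
import bisect
import hashlib

def ring_position(text: str) -> int:
    """Hash a string onto a 160-bit identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")

class Ring:
    """Consistent-hashing ring mapping OAI identifiers to peers."""

    def __init__(self, peers):
        self._points = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._points]

    def locate(self, identifier: str) -> str:
        """Return the first peer at or after the identifier's ring position."""
        i = bisect.bisect_left(self._positions, ring_position(identifier))
        return self._points[i % len(self._points)][1]  # wrap past the top of the ring
```

Adding a peer only moves the identifiers that now fall between the new peer and its predecessor; every other identifier keeps its owner, so membership changes stay cheap.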
![Page 23: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/23.jpg)
OAI-PMH and Intellectual Property
• The protocol exists in a context where information providers have concerns about use of their intellectual property
• OAI-PMH is nominally about metadata, but…
– Rich metadata is an intellectual product
– The protocol can be used to transmit anything (e.g., content) that can be encoded in XML
– Generally, metadata leads to content, so…
![Page 24: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/24.jpg)
OAI-rights effort
• The goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework.
• The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH.
– No changes to the core protocol
![Page 25: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/25.jpg)
OAI-rights Effort (cont.)
• Extensible, providing a general framework for expressing rights statements within OAI-PMH
– Not an effort to develop a new rights expression language
• Use Creative Commons licenses as a motivating and deployable example.
• Release of specification by 2nd quarter ’04
• Invited OAI-rights group
– Standard OAI development model
![Page 26: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/26.jpg)
Dimensions of OAI-PMH and rights: Entity Association
• Metadata: concern in NSDL for (re)use of rich metadata
• Content: predominant application of the protocol to resource discovery and ultimate access makes this important
![Page 27: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/27.jpg)
Dimensions of OAI-PMH and rights: Aggregation Association
• OAI-PMH aggregations
– Repository
– Set
– Item
• Rights association with an aggregation may provide shortcut (e.g., the rights for all resources in a repository/set…)
• Cost of shortcut is pseudo-statefulness, possibly complex overriding rules
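The cost of aggregation-level shortcuts shows up as soon as a harvester has to work out which statement governs a given record. The Python sketch below assumes one hypothetical overriding rule (item beats set, set beats repository, and conflicting set-level statements are an error); the slides do not fix any such rule, so treat it as one possible policy. Note the pseudo-statefulness: repository- and set-level statements must be remembered from earlier responses in order to interpret a single record.

```python
from typing import Dict, Iterable

def effective_rights(item_id: str, item_sets: Iterable[str], repo_rights: str,
                     set_rights: Dict[str, str], item_rights: Dict[str, str]) -> str:
    """Resolve the rights statement for one item; the most specific level wins."""
    if item_id in item_rights:  # 1. an item-level statement overrides everything
        return item_rights[item_id]
    # 2. statements attached to any set the item belongs to
    candidates = sorted({set_rights[s] for s in item_sets if s in set_rights})
    if len(candidates) > 1:  # item sits in sets with different rights
        raise ValueError(f"conflicting set-level rights for {item_id}: {candidates}")
    if candidates:
        return candidates[0]
    return repo_rights  # 3. fall back to the repository-wide statement
```

Even this small policy needs an explicit answer for items in several sets, which is the "possibly complex overriding rules" cost in practice.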
![Page 28: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/28.jpg)
Dimensions of OAI-PMH and rights: Binding
• Choices
– Exploit mechanisms in metadata formats, e.g., DC-rights
– Restrict the rights statements to some more specific protocol mechanism
– Allow some mixture of these methods
• DC-rights problems
– Semantics are restricted to rights about the resource
– Can’t embed XML in a DC value
– What if DC is not required?
• Burden on harvesters if rights embedding is not explicit but scattered across several locations
![Page 29: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/29.jpg)
OAI-PMH Static Repositories
• Provide a lightweight mechanism for data provider participation
• Intended for relatively small and static collections
• Two components
– Static Repository XML format
  • Semantically equivalent to Identify and ListRecords
  • Invisible to harvester
– Static Repository Gateway
  • Virtual data provider for static repository data
  • Unique baseURL for each “contained” static repository
![Page 30: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/30.jpg)
Static Repositories and Static Repository Gateway
(Figure: static repositories SR1, SR2 and SR3 sit behind a Static Repository Gateway, which a harvester queries with HTTP GET at one baseURL per static repository:)
http://srg.org/sr1/verb=....
http://srg.org/sr2/verb=....
http://srg.org/sr3/verb=....
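A gateway's job is mostly dispatch: map the per-repository baseURL to one static repository document and answer the requested verb from it. The Python sketch below is hypothetical in every name (the `STATIC_REPOS` table stands in for already-fetched static repository files), returns plain dicts instead of OAI-PMH XML, and assumes a conventional `?verb=` query string after each per-repository baseURL.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical stand-ins for three static repository files the gateway has
# already fetched and parsed, keyed by the path segment of their baseURL.
STATIC_REPOS = {
    "sr1": {"repositoryName": "Static Repository 1", "records": ["oai:sr1:a", "oai:sr1:b"]},
    "sr2": {"repositoryName": "Static Repository 2", "records": ["oai:sr2:a"]},
    "sr3": {"repositoryName": "Static Repository 3", "records": ["oai:sr3:a"]},
}

def handle(url: str, repos: dict) -> dict:
    """Answer one OAI-PMH request on behalf of the static repository in the path."""
    parts = urlsplit(url)
    name = parts.path.strip("/")
    if name not in repos:
        return {"error": "unknown static repository"}  # a real gateway might send HTTP 404
    repo = repos[name]
    query = parse_qs(parts.query)
    verb = query.get("verb", [""])[0]
    base_url = f"{parts.scheme}://{parts.netloc}/{name}"  # unique per static repository
    if verb == "Identify":
        return {"baseURL": base_url, "repositoryName": repo["repositoryName"]}
    if verb == "ListRecords":
        if "metadataPrefix" not in query:
            return {"error": "badArgument"}
        return {"baseURL": base_url, "records": list(repo["records"])}
    return {"error": "badVerb"}  # other verbs are omitted from this sketch
```

Because the harvester only ever sees the gateway's baseURLs, the static files themselves stay invisible to it, matching the division of labour described above.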
![Page 31: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/31.jpg)
Static Repositories Open Issue
Relationship to RSS?????
![Page 32: Open Archives Initiative](https://reader035.vdocuments.site/reader035/viewer/2022081420/568148ad550346895db5c02d/html5/thumbnails/32.jpg)
Conclusions
• Interoperability and lowest common denominator
• Rapid advances in automated methods
– Moore’s law
– Smart algorithms
– Benefits of scale
• Combining human effort and automated methods
– Extracting order from chaos
– Learning from order
• Move beyond resource discovery