Lecture 14: Interoperability for information discovery
CS 502 Computing Methods for Digital Libraries, Cornell University – Computer Science. Herbert Van de Sompel, [email protected]. Acknowledgements: Carl Lagoze.
1 herbert van de sompel
Lecture 14 Interoperability for information discovery
CS 502 Computing Methods for Digital Libraries
Cornell University – Computer Science
Herbert Van de Sompel
[email protected]
Acknowledgements: Carl Lagoze
Why interoperability?
The distributed information environment creates enormous challenges regarding the provision of coherent services. Addressing these challenges requires some form of interoperability.
Information resources: distributed
[Figure: OPAC, FTXT, e-print, and A&I resources, labelled "distributed"]
Information resources: different authorities, technologies
[Figure: the same OPAC, FTXT, e-print, and A&I resources, labelled "range of authorities, technologies"]
Information resources: challenges re coherent services
[Figure: the same OPAC, FTXT, e-print, and A&I resources, labelled "challenges re integrated access"]
Interoperability helps!
• UNICODE
• XML
• XML Schema, XML Namespaces
• HTTP
• Metadata formats (MARC, Dublin Core, …)
• This lecture: interoperability for resource discovery (searching)
• Next week: interoperability for reference linking
Interoperability: technical & organizational
• reaching interoperability is an organizational challenge (not only technical)
• the challenge is to create incentives for independent digital libraries to adopt specifications
Functionality versus cost of acceptance
[Figure: protocols plotted by functionality against cost of acceptance: OAI Metadata Harvesting, SDLIP, Z39.50]
This picture does not show the eventual benefit of a protocol in the creation of coherent services.
[Figure: federated searching across A&I, image, FTXT, OPAC, and e-print resources]
Z39.50
http://www.loc.gov/z3950/agency/
Aims of Z39.50
• Permits one computer (the Z39.50 client) to search and retrieve information on another computer serving a database (the Z39.50 target)
• Important both technically and for its wide use in library systems and A&I databases
• Most development has concentrated on bibliographic data
• Most implementations emphasize searches that use a bibliographic set of attributes to search databases of MARC records
Technical history of Z39.50
• Developed for X.25 networks (connection-oriented); support for running over TCP was fitted later
• The original concept dates from about 1980, when repeating a search was an expensive computation
• NISO standard
Z39.50 - principles
Abstract view of database searching.
• Server stores a set of databases with searchable indexes
• Interactions are based on a session
• The client opens a connection with the server, carries out a sequence of interactions and then closes the connection.
• During the course of the session, both the server and the client remember the state of their interaction.
Z39.50 - state
• The server carries out the search and builds a results set.
• The server saves the results set.
• Subsequent messages from the client can reference the results set.
• Thus the client can refine a large set by increasingly precise requests, or can request a presentation of any record in the set, without searching the entire database again.
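The stateful interaction described above can be sketched with a toy in-memory server. This is only an illustration of named results sets that outlive a single request; the class `ToyTarget`, its method signatures, and the sample records are hypothetical and do not follow the actual Z39.50 wire protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyTarget:
    """Hypothetical in-memory stand-in for a server holding one database."""
    records: list[dict]
    result_sets: dict[str, list[dict]] = field(default_factory=dict)

    def search(self, name: str, field_name: str, term: str,
               within: str | None = None) -> int:
        # Search the whole database, or only an earlier results set if given.
        pool = self.result_sets[within] if within else self.records
        hits = [r for r in pool if term.lower() in r[field_name].lower()]
        self.result_sets[name] = hits  # the server keeps the set for later requests
        return len(hits)

    def present(self, name: str, start: int, count: int) -> list[dict]:
        # Return records from a saved results set without searching again.
        return self.result_sets[name][start:start + count]


target = ToyTarget([
    {"title": "Evangeline", "author": "Longfellow"},
    {"title": "The Song of Hiawatha", "author": "Longfellow"},
    {"title": "Leaves of Grass", "author": "Whitman"},
])
broad = target.search("s1", "author", "longfellow")
narrow = target.search("s2", "title", "evangeline", within="s1")
first = target.present("s2", 0, 1)
```

The second search runs against the saved set `s1` rather than the full database, which is the saving the slide refers to.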
Z39.50 - services
• init -- client connects to the server and exchanges initial information, e.g., preferred message size
• explain -- client asks the server which databases are available for searching, which fields are available, which syntaxes and formats are supported, etc.
• search -- client presents a query to a database; there is a choice of syntaxes for specifying searches (Boolean queries are widely implemented)
• manipulation of results sets -- e.g., sort or delete
• present -- requests the server to send specified records from the results set to the client in a specified format
• scan -- browse indexes
Z39.50 – sample query
In the database named Books find all records for which the access point title contains the value evangeline and the access point author contains the value longfellow.
Z39.50 defines a rich variety of search access points (use attributes) that can be extended by implementers.
http://lcweb.loc.gov/z3950/agency/defns/bib1.html
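One way to write the sample query down is Prefix Query Format (PQF), a textual notation used by the YAZ toolkit. The slides do not mention PQF, so treat this as an assumed notation rather than the lecture's own. The sketch uses the Bib-1 use attributes 4 (title) and 1003 (author); the helper names are hypothetical. The database name (Books) is not part of the query string, since it is passed separately in the search request.

```python
# Bib-1 use attribute numbers for the two access points in the sample query.
BIB1_USE = {"title": 4, "author": 1003}


def pqf_term(access_point: str, value: str) -> str:
    """Render one access point and value as a PQF term (attribute type 1 = Use)."""
    return f'@attr 1={BIB1_USE[access_point]} "{value}"'


def pqf_and(*terms: str) -> str:
    """Combine terms with the prefix-notation AND operator."""
    query = terms[0]
    for term in terms[1:]:
        query = f"@and {query} {term}"
    return query


query = pqf_and(pqf_term("title", "evangeline"),
                pqf_term("author", "longfellow"))
print(query)  # @and @attr 1=4 "evangeline" @attr 1=1003 "longfellow"
```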
z39.50 – issues
• No general acceptance (for instance, not supported by web browsers, hence httpd/Z39.50 gateways)
• Merging results for distributed search
• Semantic interoperability (cf. Z39.50 profiles)
Simple Digital Library Interoperability Protocol
http://www-diglib.stanford.edu/~testbed/doc2/SDLIP/
SDLIP
• Compromise between a full-scale, all-encompassing approach such as Z39.50 and the “anything goes” approach typical of ad-hoc search interface design on the web
• Developed jointly by Stanford, Berkeley, and UC Santa Barbara
• Heavily influenced by DASL from IETF
SDLIP – search middleware
SDLIP – managing complexity via separating interfaces
SDLIP - interfaces
• Search Interface – defines a simple query language; the protocol can then include other languages
• Result Interface – parking meter metaphor (the server decides on the length of the session)
• Source Metadata Interface – allows a client to query a library proxy about its capabilities (subcollections, attributes that can be searched, …)
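The parking meter metaphor of the Result Interface can be illustrated with a small sketch in which the server, not the client, decides how long result state is kept. Everything here (the `ResultStore` class, its methods, the injectable clock) is a hypothetical illustration and not the SDLIP API.

```python
from __future__ import annotations

import time
from typing import Callable


class ResultStore:
    """Hypothetical store that keeps result sets until a server-chosen expiry."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds  # chosen by the server
        self.clock = clock
        self._sets: dict[str, tuple[float, list[str]]] = {}

    def save(self, set_id: str, docs: list[str]) -> float:
        """Store a result set and return the lease length granted to the client."""
        self._sets[set_id] = (self.clock() + self.lease_seconds, docs)
        return self.lease_seconds

    def fetch(self, set_id: str, start: int, count: int) -> list[str]:
        """Return a slice of a stored set; raise KeyError once the lease has run out."""
        if set_id not in self._sets:
            raise KeyError(f"result set {set_id!r} unknown")
        expires_at, docs = self._sets[set_id]
        if self.clock() >= expires_at:
            del self._sets[set_id]  # expired state is discarded
            raise KeyError(f"result set {set_id!r} expired")
        return docs[start:start + count]
```

A client that wants more results after the lease has expired would have to repeat the search, which keeps the server free to bound the state it holds.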
Open Archives Initiative Metadata Harvesting Protocol
http://www.openarchives.org
OAMH protocol
• Low-barrier framework for repository interoperability
• Minimal burden for parties providing databases
• Plug-in concept to allow community and service specialization (placeholders – about, descriptor; parallel metadata formats; …)
OAMH - metadata harvesting
[Figure: a harvester collects metadata from A&I, image, OPAC, e-print, and FTXT resources]
OAMH – federated services
[Figure: metadata (Author, Title, Abstract, Identifier) from A&I, image, FTXT, e-print, and OPAC resources]
OAMH – protocol
[Figure: the harvester (service provider) sends requests to the repository (data provider), which sends replies; the figure also carries the label "6"]
OAMH – core concepts
• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• OAMH protocol: HTTP based; reply: XML Schema, self-contained
• shared metadata format (Dublin Core) and parallel, community-specific metadata formats
• authentication etc.: on purpose outside of the protocol
OAMH – protocol tools
[Figure: harvester (service provider) and repository (data provider); labels: Datestamp, Identifier, Set; Records]
OAMH – protocol tools
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
[Figure: harvester (service provider) and repository (data provider)]
OAMH – a supporting protocol request
Request: ListMetadataFormats
Reply:
• ListMetadataFormats / Time / Request
• repeated for each metadata format: Format prefix, Format XML schema
[Figure: harvester (service provider) and repository (data provider)]
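A harvester has to turn such a reply into something it can use, for instance a mapping from format prefix to schema location. The sketch below parses a simplified sample reply with Python's standard library. The element names, URLs, and timestamp in `SAMPLE_REPLY` are illustrative assumptions that mirror the reply outline on the slide; they are not taken from the protocol specification.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sample reply; element names are assumptions.
SAMPLE_REPLY = """\
<ListMetadataFormats>
  <responseDate>2001-04-10T12:00:00</responseDate>
  <requestURL>http://repository.example.org/oai?verb=ListMetadataFormats</requestURL>
  <metadataFormat>
    <metadataPrefix>dc</metadataPrefix>
    <schema>http://example.org/schemas/dc.xsd</schema>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>marc</metadataPrefix>
    <schema>http://example.org/schemas/marc.xsd</schema>
  </metadataFormat>
</ListMetadataFormats>
"""


def parse_formats(xml_text: str) -> dict:
    """Map each format prefix to the URL of its XML schema."""
    root = ET.fromstring(xml_text)
    return {
        fmt.findtext("metadataPrefix"): fmt.findtext("schema")
        for fmt in root.iter("metadataFormat")
    }


formats = parse_formats(SAMPLE_REPLY)
```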
OAMH – a harvesting protocol request
Request: ListRecords, with arguments from=a, until=b, set=klm, metadataPrefix=dc
Reply:
• ListRecords / Time / Request
• repeated for each record: Identifier, Datestamp, Metadata
[Figure: harvester (service provider) and repository (data provider)]
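Because the protocol is HTTP based, a request like the one above can be expressed as a URL with the request name and its arguments as query parameters. The following is a minimal sketch assuming the request name travels in a `verb` parameter; the base URL, the helper name, and the sample dates are made up for illustration, while `set=klm` and `metadataPrefix=dc` come from the slide.

```python
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical repository address


def list_records_url(base_url, metadata_prefix, from_=None, until=None, set_=None):
    """Build a ListRecords request URL, leaving out arguments that are not given."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    optional = {"from": from_, "until": until, "set": set_}
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{base_url}?{urlencode(params)}"


url = list_records_url(BASE_URL, "dc",
                       from_="2001-01-01", until="2001-03-31", set_="klm")
print(url)
```

Omitting `from_`, `until`, and `set_` yields a request for all records in the given metadata format.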
OAMH – applications?
• federated services [S&R, SDI, alerting, linking, ...]
• database synchronization
• harvesting the deep Web
• ...
OAMH – issues
• sync between data provider and service provider (cf. Web page harvesters)
• everyone harvesting from everyone? (cf. brokers)
• CAN attractive cross-community services be built?
• gaining critical mass
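The synchronization issue can be made concrete with a toy harvester that uses datestamps to fetch only what changed since its last visit. This is a sketch under simplifying assumptions: the repository is an in-memory list, datestamps are ISO date strings (which compare correctly as text), and the names `list_records` and `Harvester` are hypothetical.

```python
def list_records(repository, from_=None):
    """Toy data provider: return records with a datestamp on or after `from_`."""
    return [r for r in repository if from_ is None or r["datestamp"] >= from_]


class Harvester:
    """Toy service provider that keeps a local copy of harvested metadata."""

    def __init__(self):
        self.index = {}           # identifier -> metadata
        self.last_harvest = None  # newest datestamp seen so far

    def harvest(self, repository):
        records = list_records(repository, from_=self.last_harvest)
        for r in records:
            self.index[r["identifier"]] = r["metadata"]  # insert or overwrite
            if self.last_harvest is None or r["datestamp"] > self.last_harvest:
                self.last_harvest = r["datestamp"]
        return len(records)


repo = [
    {"identifier": "a", "datestamp": "2001-01-10", "metadata": {"title": "A"}},
    {"identifier": "b", "datestamp": "2001-02-01", "metadata": {"title": "B"}},
]
harvester = Harvester()
first_run = harvester.harvest(repo)
repo.append({"identifier": "c", "datestamp": "2001-03-05", "metadata": {"title": "C"}})
second_run = harvester.harvest(repo)
```

The second run re-fetches record `b` because the lower bound is inclusive, which is harmless since records are overwritten by identifier. Records removed from the repository are not noticed by this scheme, which is one reason synchronization is listed as an open issue.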
Some thoughts
• There is not (and never will be) one right solution (technical vs. cost vs. complexity vs. ??) (cf. metadata)
• Distributed technical solutions have organizational ramifications
• A revival of the simplicity approach: specifications that require little effort by the digital libraries involved, but that can eventually lead to appealing services (when combined with brute force computing and intelligent tools)