1
Awareness Services for Digital Libraries
Arturo Crespo
Hector Garcia-Molina
Stanford University
2
Motivation
Our objective: create the next generation of Data Repositories tailored to Digital Library needs:
– Persistence, Distribution, Intellectual Property, Indexing and Cataloging, Replication, ...
[Figure: Data Storage and its clients, with labels Indexers, Replica, Naming]
3
Data Stores and Clients
Clients: DB Indexer, CS Indexer
Data Stores: DB Tech Reports, AI Tech Reports, HCI Tech Reports
4
Data Store Services
Object access
– Via a handle
Object awareness
– Clients must be aware of changes at the store
5
A Case Study: CS-TR and SIFT
SIFT: a selective dissemination service
CS-TR: a digital library of technical reports from about 50 universities
– Awareness based on timestamps
Problems:
– File system timestamps
– Application timestamps
– Deletions
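The deletion problem listed above can be illustrated with a small sketch. It is not taken from CS-TR or SIFT; the `StoredObject` record, the integer timestamps, and the handles are all hypothetical. It only shows why a pull based on "modified since my last sync" cannot report objects that no longer exist at the store.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    handle: str
    modified_at: int  # assumed logical timestamp maintained by the store


def changed_since(store, last_sync):
    """Return handles whose timestamp is newer than the client's last sync."""
    return sorted(h for h, obj in store.items() if obj.modified_at > last_sync)


# Hypothetical store contents: tr-3 existed earlier but has since been deleted.
store = {
    "tr-1": StoredObject("tr-1", modified_at=5),
    "tr-2": StoredObject("tr-2", modified_at=12),
}
client_known = {"tr-1", "tr-2", "tr-3"}

# The timestamp query reports the update to tr-2 ...
updates = changed_since(store, last_sync=10)
# ... but says nothing about tr-3; finding it requires comparing full handle sets.
missed_deletions = sorted(client_known - set(store))
```

Here the timestamp query alone leaves the client holding a stale copy of `tr-3`, which is one reason the later slides look at schemes that compare the complete state.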
6
The Problem
How can a Data Storage Client detect the changes that have happened in remote Data Storages since its last update?
There is no “perfect algorithm”:
– The best algorithm for this problem depends on the characteristics of the relationship between the Data Storage and the client
7
The Design Space
Ratio of Data Storages per Client
Stateful versus Stateless Data Storages in relation to the Clients
Push versus Pull Model
Update Frequency
– How often the repository changes
– How often the client is updated
Client awareness of Data Storages
Complexity of the Algorithm
8
Standard Mechanisms for Client Updating
Key Query Algorithm
Snapshot Differential Algorithm
Timestamps and Versions
Logs
Triggers
Signatures
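As an illustration of one entry in this list, the following is a minimal sketch of a snapshot differential: the client keeps the previous snapshot and compares it with a fresh one. Representing a snapshot as a dictionary from handle to content is an assumption made for the example, and `snapshot_diff` is a hypothetical name.

```python
def snapshot_diff(old, new):
    """Compare two snapshots (handle -> content) and classify the differences."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(h for h in old.keys() & new.keys() if old[h] != new[h])
    return added, modified, deleted


# Hypothetical snapshots taken at two different times.
previous = {"tr-1": "draft", "tr-2": "final", "tr-3": "final"}
current = {"tr-1": "final", "tr-2": "final", "tr-4": "draft"}

added, modified, deleted = snapshot_diff(previous, current)
```

Unlike the timestamp approach, this detects deletions, at the cost of transferring or storing a full snapshot on every comparison.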
9
Contributions
Survey of the spectrum of awareness options
– Advantages and disadvantages of each one
– All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm
Enhancements for signature-based schemes
– Reduced computation
– Reduced communication costs
10
Related Work
Database replica maintenance
Remote file comparison
Deployment of programs over the network
11
The UNI-AWARE Algorithm
A unified algorithm that “covers” known schemes:
– Snapshot algorithm
– Timestamps and versions
– Logs
– Triggers
– Signatures
Algorithm is tailored to a specific scheme through the definition of “custom functions”
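The slides do not give the UNI-AWARE pseudocode, so the following is only a sketch of the general idea of tailoring one comparison loop through a custom function. The names `detect_changes`, `by_version`, and `by_signature`, and the object layout with `body` and `version` fields, are assumptions made for this example and are not the paper's definitions.

```python
import zlib


def detect_changes(store, client_tokens, token_of):
    """Generic comparison loop; `token_of` is the scheme-specific custom function."""
    current = {handle: token_of(obj) for handle, obj in store.items()}
    changed = sorted(
        h for h, tok in current.items()
        if h not in client_tokens or client_tokens[h] != tok
    )
    deleted = sorted(h for h in client_tokens if h not in current)
    return changed, deleted, current


def by_version(obj):
    """Custom function for a version-number scheme."""
    return obj["version"]


def by_signature(obj):
    """Custom function for a signature scheme (CRC-32 of the content)."""
    return zlib.crc32(obj["body"])


# Hypothetical store contents and two client views of an earlier state.
store = {
    "tr-a": {"body": b"alpha, revised", "version": 2},
    "tr-b": {"body": b"beta", "version": 1},
}
version_view = {"tr-a": 1, "tr-b": 1, "tr-c": 4}
signature_view = {"tr-a": zlib.crc32(b"alpha"), "tr-b": zlib.crc32(b"beta")}

changed_v, deleted_v, _ = detect_changes(store, version_view, by_version)
changed_s, deleted_s, _ = detect_changes(store, signature_view, by_signature)
```

The same loop serves both schemes; only the function that turns an object into a comparable token differs.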
12
UNI-AWARE: Signature Algorithm
Signature: a token associated with each document that has a high probability of being unique and that changes when the content of the document changes
Example: CRC, checksums
Advantages:
– Robust, as it does not require metadata maintenance
– Easy to manage consistently when the store fails or an object migrates
13
UNI-AWARE: Signature Algorithm
[Figure: message exchange between Client and DataStore. Labels: All signatures transferred; Request Documents; Document; Signature]
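The exchange on this slide can be sketched as follows, under stated assumptions: the `DataStore` and `Client` classes and their method names are hypothetical, CRC-32 stands in for the signature function, and network messages are replaced by direct method calls. The store recomputes signatures on demand, which reflects the point that no extra metadata has to be stored reliably.

```python
import zlib


class DataStore:
    def __init__(self, docs):
        self.docs = dict(docs)  # handle -> document bytes

    def all_signatures(self):
        """Step 1: signatures are recomputed from content and sent to the client."""
        return {h: zlib.crc32(body) for h, body in self.docs.items()}

    def fetch(self, handles):
        """Step 3: return the documents the client asked for."""
        return {h: self.docs[h] for h in handles}


class Client:
    def __init__(self):
        self.docs = {}
        self.sigs = {}

    def sync(self, store):
        remote = store.all_signatures()
        # Step 2: request only documents whose signature is new or different.
        wanted = [h for h, sig in remote.items() if self.sigs.get(h) != sig]
        self.docs.update(store.fetch(wanted))
        # Handles missing from the signature list were deleted at the store.
        for handle in set(self.docs) - set(remote):
            del self.docs[handle]
        self.sigs = remote
        return sorted(wanted)


store = DataStore({"tr-1": b"v1", "tr-2": b"v1"})
client = Client()
first = client.sync(store)

store.docs["tr-1"] = b"v2"   # modification
del store.docs["tr-2"]       # deletion
store.docs["tr-3"] = b"new"  # addition
second = client.sync(store)
```

Every sync transfers one signature per document, which is the cost the distributed variant on the next slides tries to reduce.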
14
DIST-UNI-AWARE Algorithm
Objective: reduce the amount of data exchanged between data store and clients
DIST-UNI-AWARE:
– Unified algorithm that can be tailored to different schemes:
» Hierarchical signatures
» Hierarchical timestamps
15
DIST-UNI-AWARE
[Figure: message exchange between Client and DataStore. Labels: Signatures of Buckets transferred; Request more Signatures; Request Documents; Document; Signature]
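A two-level version of the bucket idea might look like the sketch below. The bucket assignment (CRC-32 of the handle modulo the bucket count), the way a bucket signature is derived from its members, and all function names are assumptions for illustration; the slides do not specify them. Both sides are passed in as dictionaries so the comparison can run locally.

```python
import zlib


def doc_signature(body):
    return zlib.crc32(body)


def bucket_of(handle, n_buckets):
    """Deterministic bucket assignment (assumed scheme)."""
    return zlib.crc32(handle.encode("utf-8")) % n_buckets


def bucket_signatures(docs, n_buckets):
    """One signature per bucket, computed over its members' handles and signatures."""
    entries = [[] for _ in range(n_buckets)]
    for handle in sorted(docs):
        entry = f"{handle}:{doc_signature(docs[handle])}"
        entries[bucket_of(handle, n_buckets)].append(entry)
    return [zlib.crc32("\n".join(e).encode("utf-8")) for e in entries]


def hierarchical_diff(store_docs, client_docs, n_buckets=4):
    store_sigs = bucket_signatures(store_docs, n_buckets)
    client_sigs = bucket_signatures(client_docs, n_buckets)
    dirty = {i for i in range(n_buckets) if store_sigs[i] != client_sigs[i]}
    signatures_sent = n_buckets  # round 1: bucket signatures only
    changed = []
    for handle in sorted(set(store_docs) | set(client_docs)):
        if bucket_of(handle, n_buckets) not in dirty:
            continue
        if handle in store_docs:
            signatures_sent += 1  # round 2: document signatures of dirty buckets
        store_sig = doc_signature(store_docs[handle]) if handle in store_docs else None
        client_sig = doc_signature(client_docs[handle]) if handle in client_docs else None
        if store_sig != client_sig:
            changed.append(handle)  # modified, added, or deleted
    return changed, signatures_sent


client_docs = {f"tr-{i}": b"v1" for i in range(8)}
store_docs = dict(client_docs)
store_docs["tr-3"] = b"v2"

changed, sent = hierarchical_diff(store_docs, client_docs)
unchanged, baseline = hierarchical_diff(client_docs, client_docs)
```

When nothing has changed, only the bucket signatures cross the network; when one document changes, only the signatures in its bucket follow.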
16
Advantages of Signature Algorithms
Support the push and pull models
No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed
Efficient in its use of network, client, and data store resources
Scales well in the number of clients and documents
17
DIST-UNI-AWARE: Enhancements
Increase group split factor
Client sends additional information at split time
Clustering of changed objects
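The effect of the split factor can be made concrete with a rough cost model. The assumptions are mine rather than the slides': a single changed document, groups split evenly at every level, and one round trip carrying `split_factor` signatures per level. `cost_for_one_change` is a hypothetical helper.

```python
def cost_for_one_change(n_docs, split_factor):
    """Return (round trips, signatures transferred) to locate one changed document."""
    if split_factor < 2:
        raise ValueError("split_factor must be at least 2")
    rounds, signatures, group_size = 0, 0, n_docs
    while group_size > 1:
        # Each round splits the suspect group into `split_factor` subgroups.
        group_size = -(-group_size // split_factor)  # ceiling division
        rounds += 1
        signatures += split_factor
    return rounds, signatures


binary = cost_for_one_change(1000, 2)
decimal = cost_for_one_change(1000, 10)
wide = cost_for_one_change(1000, 32)
```

Under this model, 1000 documents need 10 round trips and 20 signatures with a split factor of 2, 3 round trips and 30 signatures with a factor of 10, and 2 round trips and 64 signatures with a factor of 32. A larger split factor trades more signatures per message for fewer round trips.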
18
Conclusions
Awareness mechanism for digital libraries
Separation of storage functionality and other services
Awareness schemes must be resilient to computer environment changes and bugs
UNI-AWARE and DIST-UNI-AWARE
19
Reference
Arturo Crespo, Hector Garcia-Molina. "Awareness Services for Digital Libraries." ECDL'97. http://www-db.stanford.edu/~crespo/publications/