Awareness Services for Digital Libraries


Arturo Crespo

Hector Garcia-Molina

Stanford University


Motivation

Our objective: create the next generation of Data Repositories tailored to the needs of Digital Libraries:
– Persistence, distribution, intellectual property, indexing and cataloging, replication, ...

(Figure: Data Storage and Data Storage Clients: Indexers, Replica, Naming.)


Data Stores and Clients

(Figure: Clients (DB Indexer, CS Indexer) and Data Stores (DB Tech Reports, AI Tech Reports, HCI Tech Reports).)


Data Store Services

Object access
– Via a handle

Object awareness
– Clients must be aware of changes at the store
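The two services can be made concrete with a minimal in-memory sketch. The class name and its methods are hypothetical. So is the append-only change list that provides awareness: it is only one of the mechanisms surveyed later in the talk, not the store design the authors propose.

```python
from typing import Dict, List, Optional


class InMemoryDataStore:
    """Hypothetical store offering the two services: object access and awareness."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}   # handle -> object content
        self.changes: List[str] = []          # hypothetical append-only change list

    def put(self, handle: str, content: bytes) -> None:
        self.objects[handle] = content
        self.changes.append(handle)

    def delete(self, handle: str) -> None:
        self.objects.pop(handle, None)
        self.changes.append(handle)

    def get(self, handle: str) -> Optional[bytes]:
        """Object access: fetch an object by its handle (None if absent)."""
        return self.objects.get(handle)

    def position(self) -> int:
        """Opaque marker a client saves after each update."""
        return len(self.changes)

    def changed_since(self, marker: int) -> List[str]:
        """Object awareness: handles inserted, updated, or deleted after marker."""
        return sorted(set(self.changes[marker:]))
```

A client stores the value of `position()` after each update and later asks `changed_since(marker)` which handles it needs to revisit.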


A Case Study: CS-TR and SIFT

SIFT: a selective dissemination service

CS-TR: a digital library of technical reports from about 50 universities
– Awareness based on timestamps

Problems:
– File system timestamps
– Application timestamps
– Deletions
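The deletion problem shows up in a small sketch of timestamp-based awareness. It assumes a hypothetical store that keeps one last-modified time per report, and a client that asks for everything newer than its last synchronization. The backup-restore case is an assumed scenario showing how file system timestamps can mislead; it is not taken from the CS-TR study.

```python
from typing import Dict, Set, Tuple

# Hypothetical store contents: handle -> (content, last-modified time).
Store = Dict[str, Tuple[str, int]]


def changed_since(store: Store, last_sync: int) -> Set[str]:
    """Timestamp-based awareness: handles whose timestamp is newer than last_sync."""
    return {h for h, (_, ts) in store.items() if ts > last_sync}


# The client last synchronized at t=20 and holds tr-1 and tr-2.
client_copy = {"tr-1": "v1", "tr-2": "v1"}
# Since then: tr-1 was updated at t=30, tr-2 was deleted,
# and tr-3 was restored from a backup that kept its old t=10 timestamp.
store: Store = {"tr-1": ("v2", 30), "tr-3": ("v1", 10)}

reported = changed_since(store, last_sync=20)
# Only tr-1 is reported. The deletion of tr-2 leaves no timestamp behind,
# and tr-3 is missed because its timestamp predates the last sync.
```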


The Problem

How can a Data Storage Client detect the changes that have happened in remote Data Storages since its last update?

There is no “perfect algorithm”:
– The best algorithm for this problem depends on the characteristics of the relationship between the Data Storage and the client


The Design Space

Ratio of Data Storages per client

Stateful versus stateless Data Storages (with respect to their clients)

Push versus pull model

Update frequency:
– How often the repository changes
– How often the client is updated

Client awareness of Data Storages

Complexity of the algorithm


Standard Mechanisms for Client Updating

Key Query Algorithm

Snapshot Differential Algorithm

Timestamps and Versions

Logs

Triggers

Signatures
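The snapshot differential idea fits in a few lines when both snapshots are Python dicts keyed by object handle. This is a simplified in-memory version written for illustration. Published snapshot differential algorithms also handle snapshots too large to fit in memory, which this sketch does not.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # handle -> object content (or any comparable value)


def snapshot_diff(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots and return (inserted, deleted, updated) handles."""
    inserted = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = sorted(h for h in old.keys() & new.keys() if old[h] != new[h])
    return inserted, deleted, updated
```

Unlike pure timestamp comparison, diffing full snapshots reports deletions. The cost is that the client must keep a copy of the entire previous snapshot.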


Contributions

Survey of the spectrum of awareness options
– Advantages and disadvantages of each one
– All mechanisms can be captured by a single algorithm: the UNI-AWARE algorithm

Enhancements for signature-based schemes
– Reduced computation
– Reduced communication costs


Related Work

Database replica maintenance

Remote file comparison

Deployment of programs over the network


The UNI-AWARE Algorithm

A unified algorithm that “covers” known schemes:
– Snapshot algorithm
– Timestamps and versions
– Logs
– Triggers
– Signatures

The algorithm is tailored to a specific scheme by defining a set of “custom functions”.
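One way to picture a unified algorithm specialized by custom functions is a single synchronization loop parameterized by a pair of functions. The sketch assumes two hypothetical custom functions:
– `summarize`: what state the client keeps per object
– `changed`: which handles differ between two summaries

These names and this split are illustrative only. They are not the actual UNI-AWARE interface.

```python
import zlib
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set

Docs = Dict[str, str]  # handle -> document content


@dataclass
class AwarenessScheme:
    """Hypothetical pair of 'custom functions' that specializes the generic loop."""
    summarize: Callable[[Docs], Dict[str, Any]]  # per-handle state kept by the client
    changed: Callable[[Dict[str, Any], Dict[str, Any]], Set[str]]


def diff_keys(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Handles that were inserted, deleted, or whose summarized state differs."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def aware_sync(scheme: AwarenessScheme, store: Docs, client: Docs,
               last_state: Dict[str, Any]) -> Dict[str, Any]:
    """One generic awareness round: summarize, compare, then refetch or drop."""
    current = scheme.summarize(store)
    for handle in scheme.changed(last_state, current):
        if handle in store:
            client[handle] = store[handle]   # inserted or updated object
        else:
            client.pop(handle, None)         # deleted object
    return current  # the client keeps this for the next round


# Two schemes expressed through the same loop (illustrative only).
SNAPSHOT = AwarenessScheme(summarize=dict, changed=diff_keys)
SIGNATURE = AwarenessScheme(
    summarize=lambda docs: {h: zlib.crc32(c.encode()) for h, c in docs.items()},
    changed=diff_keys,
)
```

With `dict` as `summarize`, this becomes a snapshot scheme in which the client keeps full copies. With a CRC, it becomes a signature scheme in which the client keeps only 32-bit tokens. The loop itself does not change.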


UNI-AWARE: Signature Algorithm

Signature: a token associated with each object that is unique with high probability and changes whenever the content of the object changes

Examples: CRCs, checksums

Advantages:
– Robust: no metadata has to be maintained
– Easy to keep consistent when a store fails or an object migrates
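A CRC-based signature, one of the examples listed, is available in Python's standard library as `zlib.crc32`. The helper names below are hypothetical. The point is that a signature is derived from content alone, so it can always be recomputed rather than maintained as separate metadata.

```python
import zlib
from typing import Dict


def signature(content: bytes) -> int:
    """CRC-32 of the object's bytes. It changes when the content changes,
    with high probability; distinct contents can still collide."""
    return zlib.crc32(content)


def store_signatures(objects: Dict[str, bytes]) -> Dict[str, int]:
    """Signatures depend only on content, so they can be recomputed at any
    time, e.g. after a store failure or after objects migrate."""
    return {handle: signature(content) for handle, content in objects.items()}
```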


UNI-AWARE: Signature Algorithm

(Figure: exchange between Client and DataStore. Labels: “All signatures transferred”, “Request Documents”; legend: Document, Signature.)
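The flat exchange in the figure can be sketched as follows. The sketch assumes CRC-32 signatures and a hypothetical `DataStore` class that answers two requests: one returning every signature, and one returning selected documents. The returned count makes the communication cost visible, since every signature crosses the network on every round even when nothing changed.

```python
import zlib
from typing import Dict, Iterable, List, Tuple


class DataStore:
    """Hypothetical store that answers the two requests in the figure."""

    def __init__(self, docs: Dict[str, bytes]) -> None:
        self.docs = docs

    def all_signatures(self) -> Dict[str, int]:
        # Signatures are recomputed on demand, so nothing extra must be stored.
        return {h: zlib.crc32(d) for h, d in self.docs.items()}

    def request_documents(self, handles: Iterable[str]) -> Dict[str, bytes]:
        return {h: self.docs[h] for h in handles}


def signature_sync(store: DataStore, client: Dict[str, bytes]) -> Tuple[int, List[str]]:
    """Flat signature round; returns (#signatures transferred, handles fetched)."""
    remote = store.all_signatures()                  # step 1: every signature
    for h in list(client):
        if h not in remote:
            del client[h]                            # deleted at the store
    wanted = sorted(h for h, sig in remote.items()
                    if h not in client or zlib.crc32(client[h]) != sig)
    client.update(store.request_documents(wanted))   # step 2: changed documents only
    return len(remote), wanted
```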


DIST-UNI-AWARE Algorithm

Objective: reduce the amount of data exchanged between the data store and its clients

DIST-UNI-AWARE:
– A unified algorithm that can be tailored to different schemes:
» Hierarchical signatures
» Hierarchical timestamps


DIST-UNI-AWARE

(Figure: exchange between Client and DataStore. Labels: “Signatures of Buckets transferred”, “Request more Signatures”, “Request Documents”; legend: Document, Signature.)
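A two-level version of the bucket exchange might look like the sketch below. It makes three assumptions for illustration:
– Objects are assigned to buckets by hashing the handle
– A bucket's signature is a CRC over its members' (handle, signature) pairs
– The hierarchy has only one level of buckets

Member signatures are requested only for buckets whose signatures differ, and then only changed documents are fetched.

```python
import zlib
from typing import Dict, List, Tuple

Docs = Dict[str, bytes]  # handle -> document content


def bucket_of(handle: str, num_buckets: int) -> int:
    """Hypothetical bucket assignment that store and client compute independently."""
    return zlib.crc32(handle.encode()) % num_buckets


def group(docs: Docs, num_buckets: int) -> List[Dict[str, int]]:
    """Per-bucket maps of handle -> document signature (CRC-32)."""
    groups: List[Dict[str, int]] = [{} for _ in range(num_buckets)]
    for handle, content in docs.items():
        groups[bucket_of(handle, num_buckets)][handle] = zlib.crc32(content)
    return groups


def bucket_signature(members: Dict[str, int]) -> int:
    """Signature of a bucket: CRC-32 over its sorted (handle, signature) pairs."""
    return zlib.crc32(repr(sorted(members.items())).encode())


def hierarchical_sync(store_docs: Docs, client: Docs, num_buckets: int) -> Tuple[int, List[str]]:
    """Two-level round; returns (#signatures sent by the store, handles fetched)."""
    remote = group(store_docs, num_buckets)  # computed at the store
    local = group(client, num_buckets)       # computed at the client
    sent = num_buckets                       # step 1: one signature per bucket
    fetched: List[str] = []
    for b in range(num_buckets):
        if bucket_signature(remote[b]) == bucket_signature(local[b]):
            continue                         # unchanged bucket: members never sent
        sent += len(remote[b])               # step 2: "request more signatures"
        for handle in local[b]:
            if handle not in remote[b]:
                del client[handle]           # deleted at the store
        for handle, sig in sorted(remote[b].items()):
            if local[b].get(handle) != sig:
                client[handle] = store_docs[handle]  # step 3: "request documents"
                fetched.append(handle)
    return sent, sorted(fetched)
```

Take 100 documents in 10 buckets with one changed document. The store sends 10 bucket signatures plus the member signatures of a single bucket, instead of all 100 signatures.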


Advantages of Signature Algorithms

Support both the push and pull models

No need for reliable storage of additional data structures: if signatures are lost or corrupted, they can be recomputed

Efficient use of network, client, and data store resources

Scale well with the number of clients and documents


DIST-UNI-AWARE: Enhancements

Increase group split factor

Client sends additional information at split time

Clustering of changed objects
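The trade-off behind increasing the split factor can be estimated with a deliberately simplified model. This model is an assumption of this example, not the analysis in the paper:
– N objects sit in a tree of signatures
– Exactly one object has changed
– Each round, the store sends one signature per child of the changed group

A larger split factor needs fewer rounds but sends more signatures per round.

```python
from typing import Tuple


def tree_cost(num_docs: int, split_factor: int) -> Tuple[int, int]:
    """Simplified model with exactly one changed document.

    Returns (rounds, signatures transferred) needed to narrow the root group
    down to the single changed document.
    """
    if split_factor < 2:
        raise ValueError("split_factor must be at least 2")
    rounds, transferred, group_size = 0, 0, num_docs
    while group_size > 1:
        # The store sends one signature per child of the changed group.
        transferred += min(split_factor, group_size)
        # Only the child containing the change is split further.
        group_size = -(-group_size // split_factor)  # ceiling division
        rounds += 1
    return rounds, transferred


for f in (10, 100, 1000):
    print(f, tree_cost(10**6, f))
# 10 (6, 60)
# 100 (3, 300)
# 1000 (2, 2000)
```

Under this model, raising the split factor from 10 to 1000 cuts the number of round trips from 6 to 2 but increases the signatures sent from 60 to 2000.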


Conclusions

Awareness mechanisms for digital libraries

Separation of storage functionality from other services

Awareness schemes must be resilient to changes in the computing environment and to bugs

UNI-AWARE and DIST-UNI-AWARE


Reference

Arturo Crespo, Hector Garcia-Molina. "Awareness Services for Digital Libraries." ECDL'97. http://www-db.stanford.edu/~crespo/publications/
