“Good Enough” Database Caching

Hongfei Guo, University of Wisconsin-Madison


Motivation: Scaling Google

Motivation: Scaling A DBMS By Caching

(Figure: application servers running app-specific code use a caching DBMS in front of the backend DBMS; updates reach the cache asynchronously.)

How to tell whether the cached data is “good enough” for an application?

NO data quality requirements from the applications!
NO data quality guarantees from the caching DBMS!

The Thesis

Apps: Specify data quality requirements in queries
Cache: Enforces data quality constraints [SIGMOD 2004] [SIGMOD 2004 Demo]
Cache admin: Specifies the local data quality to be maintained by the cache (data quality-centric database caching model) [TR 2005] [submitted for publication]
Data quality-aware adaptive cache management [ongoing work]

(Figure: application servers, caching DBMS and backend DBMS.)

Data Quality Metrics (informal)

Currency: the time elapsed since this copy became stale
Consistency: a query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database
C&C: Currency & Consistency
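The two informal metrics can be made concrete with a small bookkeeping model. The sketch below is only an illustration under our own assumptions. Each cached copy records the master time it was synchronized at (`synced_at`) and its stale point, which is the time of the first master update after that, or None. These field names are hypothetical and do not come from the talk. A copy reflects the master for every time in [synced_at, stale_point), so a set of copies is snapshot consistent exactly when those intervals share a point.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class CachedCopy:
    """Hypothetical per-object bookkeeping (not defined in the talk)."""
    synced_at: float               # master time at which the copy was taken
    stale_point: Optional[float]   # first master update after synced_at; None if still current


def currency(copy: CachedCopy, now: float) -> float:
    """Currency = now - stale point; 0 while the copy has not become stale."""
    if copy.stale_point is None or copy.stale_point > now:
        return 0.0
    return float(now - copy.stale_point)


def snapshot_consistent(copies: Sequence[CachedCopy]) -> bool:
    """True iff some master time t lies in every copy's validity interval
    [synced_at, stale_point), i.e. all copies could come from one snapshot."""
    latest_start = max(c.synced_at for c in copies)
    earliest_end = min(float("inf") if c.stale_point is None else c.stale_point
                       for c in copies)
    return latest_start < earliest_end


a = CachedCopy(synced_at=0, stale_point=10)
b = CachedCopy(synced_at=5, stale_point=None)
c = CachedCopy(synced_at=12, stale_point=None)
print(currency(a, now=30))            # 20.0
print(snapshot_consistent([a, b]))    # True: both reflect the master at t = 5
print(snapshot_consistent([a, c]))    # False: a went stale before c was taken
```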

Roadmap

Background
Specifying data quality constraints in SQL
Data quality-centric caching model
Enforcing data quality constraints
Other research
Future directions

Specifying Data Quality Constraints in SQL

[Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004]

Currency requirements
Consistency requirements
Extend SQL to specify relaxed C&C requirements
Formal semantics of C&C constraints

Currency Requirements

Example 1: The caching database keeps BookCopy.
Customer A is about to purchase: he wants the data to be exactly current (high data quality is preferred).
Customer B is browsing: it is OK if the data is no more than 3 days out of sync (quick response time is preferred).

Different apps may have different currency requirements for the same query.
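Example 1 can be phrased as a per-application bound on the same query. This is a minimal sketch that assumes the cache can report how stale BookCopy is. The `bookcopy_currency` argument below is a hypothetical input, not a real API. Customer A's bound is zero and customer B's is three days, so the same cached copy is acceptable to one customer and not to the other.

```python
from datetime import timedelta

# Hypothetical currency bounds for the same BookCopy query (Example 1).
CURRENCY_BOUNDS = {
    "A_purchase": timedelta(0),       # must be exactly current
    "B_browse": timedelta(days=3),    # up to 3 days out of sync is acceptable
}


def answer_locally(app: str, bookcopy_currency: timedelta) -> bool:
    """Use the cached BookCopy only if its currency is within the app's bound;
    otherwise the query must go to the backend DBMS."""
    return bookcopy_currency <= CURRENCY_BOUNDS[app]


staleness = timedelta(hours=6)
print(answer_locally("A_purchase", staleness))   # False: go to the backend
print(answer_locally("B_browse", staleness))     # True: the cache is good enough
```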

Consistency Requirements

Example 2:

BookCopy
bid  title      author
1    databases  Raghu
2    databases  Ullman

ReviewCopy
rid  bid  text
1    1    …
2    1    …
3    2    …

SELECT * FROM Books B, Reviews R
WHERE B.bid = R.bid AND B.title = 'Databases'

Query result
bid  title      author  bid  rid  text
1    databases  Raghu   1    1    …
1    databases  Raghu   1    2    …
2    databases  Ullman  2    3    …

Possible consistency requirements:
The whole query result must be consistent.
Books must be consistent, and Reviews must be consistent.
Each book must be consistent with its reviews.

Different apps may have different consistency requirements for the same query.

Proposed SQL Syntax

SELECT * FROM Books B, Reviews R
WHERE B.bid = R.bid AND B.title = 'Databases'
followed by one of:
CURRENCY BOUND 10 min ON (B, R) BY B.bid   (each book consistent with its reviews)
CURRENCY BOUND 10 min ON (B), 30 min ON (R)   (Books consistent; Reviews consistent)
CURRENCY BOUND 10 min ON (B, R)   (the whole result consistent)

Each clause has a currency bound (e.g., 10 min), a consistency class (the tables listed in ON (...)) and an optional group-by (BY B.bid).
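One way to read these clauses is as a check over cached rows that are grouped into consistency classes. For illustration only, the sketch below assumes that every cached row carries the master time it was synced at, and that rows with equal timestamps come from the same snapshot. Under that assumption, `now - synced_at` is an upper bound on currency. The data mirrors Example 2, with hypothetical timestamps in minutes.

```python
from collections import defaultdict

# Example 2 data with hypothetical sync timestamps (minutes) as the last field.
books = [(1, "databases", "Raghu", 100), (2, "databases", "Ullman", 95)]   # (bid, title, author, ts)
reviews = [(1, 1, "...", 100), (2, 1, "...", 100), (3, 2, "...", 95)]      # (rid, bid, text, ts)


def satisfies(groups, bound, now):
    """Every group must come from one snapshot (a single timestamp) whose
    age now - ts, an upper bound on currency, is within the bound."""
    for rows in groups:
        stamps = {row[-1] for row in rows}
        if len(stamps) != 1 or now - stamps.pop() > bound:
            return False
    return True


now = 104
# CURRENCY BOUND 10 min ON (B, R): the whole result is one consistency class.
whole_result = satisfies([books + reviews], 10, now)

# CURRENCY BOUND 10 min ON (B, R) BY B.bid: one class per book and its reviews.
per_book = defaultdict(list)
for b in books:
    per_book[b[0]].append(b)
for r in reviews:
    per_book[r[1]].append(r)
each_book = satisfies(per_book.values(), 10, now)

print(whole_result, each_book)   # False True
```

Under the same assumptions, ON (B), 30 min ON (R) would check `books` and `reviews` as two separate groups, each with its own bound.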

Specifying Data Quality Constraints in SQL: Contributions

Extend SQL to express C&C constraints: single-block queries, multi-block (i.e., nested) queries, timeline constraints
Formal semantics of C&C constraints
Provides a correctness standard for using replicated or cached data

Roadmap

Background
Specifying data quality constraints in SQL
Data quality-centric caching model
Enforcing data quality constraints
Other research
Future directions

Data Quality-Centric Caching Model [Guo, Larson and Ramakrishnan, submitted]

Cache data quality properties
Cache property specification
Maintenance and “safety”

Cache Properties (= Contract)

Why Define Cache Properties?
Query processing
Cache maintenance

Cache Properties (P+3C)

Presence: per object
Consistency: a set of objects
Completeness: per predicate
Currency: object staleness

Basic Concepts

(Figure: a master database evolving through snapshots H1 and H2, and a cache holding View 1, View 2 and View 3; labels mark objects and tables.)

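Cache Property Examples

(Figure: cache views compared against master database snapshots H1 and H2, with objects labeled present, complete and consistent, and the stale point of a cached object marked.)

Currency = now − stale point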

Specifying Cache Properties

Specified as integrity constraints:
Presence constraint
Consistency constraint
Completeness constraint
Presence correlation constraint
Consistency correlation constraint

Presence Constraint

AuthorCopy (caching DBMS; a copy of the backend DBMS table Authors):
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT (control-table; authorId is the control-key):
authorId
1
2
3

AuthorCopy is a partially materialized view [Zhou et al. 2005]:
CREATE VIEW AuthorCopy AS SELECT * FROM Authors
CREATE TABLE AuthorList_PCT (authorId int)
ALTER VIEW AuthorCopy ADD PRESENCE
  ON authorId IN (SELECT authorId FROM AuthorList_PCT)
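The semantics of a presence constraint can be imitated in an ordinary SQL engine, although not with the proposed ALTER VIEW syntax. The sketch below uses Python's built-in sqlite3 module. It emulates the partially materialized view AuthorCopy with a plain table that a pull query refills. The refresh function name is an assumption made for illustration. So is the choice to put only authors 1 and 3 in the control-table at first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Stand-in for the backend table Authors.
cur.execute("CREATE TABLE Authors (authorId INTEGER PRIMARY KEY, name TEXT, city TEXT)")
cur.executemany("INSERT INTO Authors VALUES (?, ?, ?)",
                [(1, "Alice", "Madison"), (2, "Bob", "Madison"), (3, "Cedric", "Seattle")])

# Presence control-table: the control-key values whose rows must be cached.
cur.execute("CREATE TABLE AuthorList_PCT (authorId INTEGER)")
cur.executemany("INSERT INTO AuthorList_PCT VALUES (?)", [(1,), (3,)])

# Emulated partially materialized view: same columns as Authors, initially empty.
cur.execute("CREATE TABLE AuthorCopy AS SELECT * FROM Authors WHERE 0")


def refresh_author_copy():
    """Hypothetical pull refresh: materialize exactly the rows selected by the control-table."""
    cur.execute("DELETE FROM AuthorCopy")
    cur.execute("INSERT INTO AuthorCopy SELECT * FROM Authors "
                "WHERE authorId IN (SELECT authorId FROM AuthorList_PCT)")


refresh_author_copy()
present = cur.execute("SELECT authorId FROM AuthorCopy ORDER BY authorId").fetchall()
print(present)   # [(1,), (3,)]

# Adding a control-key value and refreshing brings that author into the cache.
cur.execute("INSERT INTO AuthorList_PCT VALUES (2)")
refresh_author_copy()
present_after = cur.execute("SELECT authorId FROM AuthorCopy ORDER BY authorId").fetchall()
```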

Consistency Constraint

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT: authorId 1, 2, 3
CityList_CsCT (consistency control-table): city Madison

CREATE TABLE CityList_CsCT (city string)
ALTER VIEW AuthorCopy ADD CONSISTENCY
  ON city IN (SELECT city FROM CityList_CsCT)

The Madison rows form one cache region, whose rows must be mutually consistent.
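A consistency constraint can be checked by grouping rows into the cache regions that the control-table names. This sketch makes the hypothetical assumption that each cached row records the time it was synced, and it treats equal timestamps as the same snapshot. Cedric's Seattle row is outside any controlled region, so it may be refreshed independently.

```python
from collections import defaultdict

# Hypothetical cache state: authorId -> (name, city, synced_at in minutes).
author_copy = {
    1: ("Alice", "Madison", 120),
    2: ("Bob", "Madison", 120),
    3: ("Cedric", "Seattle", 90),
}
city_list_csct = {"Madison"}   # consistency control-table


def consistency_violations(cache, controlled_cities):
    """Return the controlled cities whose rows do not all come from one snapshot."""
    stamps = defaultdict(set)
    for _name, city, synced_at in cache.values():
        if city in controlled_cities:
            stamps[city].add(synced_at)
    return sorted(city for city, seen in stamps.items() if len(seen) > 1)


print(consistency_violations(author_copy, city_list_csct))   # []
author_copy[2] = ("Bob", "Madison", 110)                      # Bob refreshed on his own
print(consistency_violations(author_copy, city_list_csct))   # ['Madison']
```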

Completeness Constraint

Backend DBMS, Authors:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

AuthorList_PCT: authorId 1, 3
CityList_CpCT (completeness control-table): city Madison

CREATE TABLE CityList_CpCT (city string)
ALTER VIEW AuthorCopy ADD COMPLETENESS
  ON city IN (SELECT city FROM CityList_CpCT)
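With these control-tables, AuthorCopy must hold the union of two sets of rows. The first is the presence-controlled keys. The second is every backend author in a completeness-controlled city. So Bob (authorId 2, Madison) must be cached even though he is not in AuthorList_PCT. Here is a minimal sketch of that reading, using plain Python sets and hypothetical helper names:

```python
# Backend Authors from the slide: authorId -> (name, city).
authors = {1: ("Alice", "Madison"), 2: ("Bob", "Madison"), 3: ("Cedric", "Seattle")}
author_list_pct = {1, 3}        # presence control-table
city_list_cpct = {"Madison"}    # completeness control-table


def required_keys(backend, pct, cpct):
    """Keys AuthorCopy must contain: presence-controlled keys that exist at the
    backend, plus all backend rows whose city is completeness-controlled."""
    keys = set(pct) & set(backend)
    keys |= {k for k, (_name, city) in backend.items() if city in cpct}
    return sorted(keys)


def is_complete(cached_keys, backend, city):
    """The cache is complete for `city` iff it holds every backend row with that city."""
    return {k for k, (_name, c) in backend.items() if c == city} <= set(cached_keys)


print(required_keys(authors, author_list_pct, city_list_cpct))   # [1, 2, 3]
print(is_complete([1, 3], authors, "Madison"))                   # False: Bob is missing
```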

Presence Correlation Constraint

AuthorCopy:
authorId  name    city
1         Alice   Madison
2         Bob     Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   2         ccc
444   3         ddd
555   3         eee

AuthorList_PCT: authorId 1, 2, 3

ALTER VIEW BookCopy ADD PRESENCE
  ON authorId IN (SELECT authorId FROM AuthorCopy)

Cache graph: AuthorList_PCT → AuthorCopy → BookCopy (both edges on authorId)
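A presence correlation constraint makes presence flow along the cache graph: a book is cached exactly when its author is present in AuthorCopy. The sketch below follows that chain on the slide's BookCopy rows. Leaving author 2 out of the control-table is a hypothetical change that makes the effect visible.

```python
# BookCopy candidates from the slide: (isbn, authorId, title).
books = [(111, 1, "aaa"), (222, 1, "bbb"), (333, 2, "ccc"), (444, 3, "ddd"), (555, 3, "eee")]
author_list_pct = {1, 3}   # hypothetical: author 2 not requested


def correlated_presence(parent_keys, rows, key_index):
    """Rows whose correlation key (at key_index) is present in the parent view."""
    return [row for row in rows if row[key_index] in parent_keys]


# AuthorCopy: PRESENCE ON authorId IN (SELECT authorId FROM AuthorList_PCT)
author_copy_keys = set(author_list_pct)
# BookCopy: PRESENCE ON authorId IN (SELECT authorId FROM AuthorCopy)
book_copy = correlated_presence(author_copy_keys, books, key_index=1)
print([isbn for isbn, _, _ in book_copy])   # [111, 222, 444, 555]
```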

Consistency Correlation Constraint

(Same AuthorCopy, BookCopy and AuthorList_PCT contents as in the presence correlation example; cache graph AuthorList_PCT → AuthorCopy → BookCopy on authorId.)

ALTER VIEW BookCopy ADD CONSISTENCY ROOT

Cache Schema Example

(Figure: cache graph with nodes AuthorList_PCT, AuthorCopy, BookCopy, ReviewerList_PCT, ReviewerCopy and ReviewCopy; edges are labeled with the join columns authorId, isbn, reviewId and reviewerId.)

Pull-Maintenance

Refresh a region by pulling query results.
When refreshing a region, also refresh the affected closure:
All overlapping regions
All correlated regions

Pull-Maintenance

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   4         eee

AuthorList_PCT: authorId 1, 3, 4
TitleList_CsCT: title aaa
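In this example, the presence regions of BookCopy (one per authorId in AuthorList_PCT) overlap the consistency region for title aaa. Books 111 and 444 share that title but belong to authors 1 and 3. The sketch below computes the overlapping part of the affected closure with a breadth-first search. It leaves out correlated regions in other views, and the region representation is our own assumption.

```python
from collections import deque

books = [(111, 1, "aaa"), (222, 1, "bbb"), (333, 1, "ccc"), (444, 3, "aaa"), (555, 4, "eee")]

# Hypothetical region table: region id -> set of isbns it covers.
regions = {("author", a): {isbn for isbn, aid, _ in books if aid == a} for a in (1, 3, 4)}
regions.update({("title", t): {isbn for isbn, _, title in books if title == t}
                for t in ("aaa",)})


def affected_closure(start, regions):
    """Regions to refresh together with `start`: transitively follow overlaps
    (regions that share at least one row)."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for other, rows in regions.items():
            if other not in seen and rows & regions[current]:
                seen.add(other)
                queue.append(other)
    return seen


print(sorted(affected_closure(("author", 1), regions)))
# [('author', 1), ('author', 3), ('title', 'aaa')]
```

This way of finding overlaps inspects the rows themselves.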

Pull-Maintenance

AuthorCopy:
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

BookCopy:
isbn  authorId  title
111   1         aaa
222   1         bbb
333   1         ccc
444   3         aaa
555   3         eee

Cache graph: AuthorList_PCT → AuthorCopy → BookCopy (on authorId)

Inefficient Pulling

AuthorCopy:
authorId  name    city
1         Alice   Madison
3         Cedric  Seattle

AuthorBookCopy (joined to AuthorCopy on authorId and to BookCopy on isbn):
authorId  isbn
1         111
1         222
1         333
3         111
3         555

BookCopy:
isbn  price  title
111   10     aaa
222   20     bbb
333   30     ccc
555   50     eee

Shared-row problem: book 111 belongs to both author 1 and author 3.

Issues

Inefficient pulling: calculating the affected closure requires checking the rows.
Efficient pulling: the affected closure does NOT depend on the instance of a view; it only requires a forward pull among correlated views.
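When the affected closure does not depend on the instance, it can be computed from the cache schema alone. The sketch below takes the correlation edges as a hypothetical adjacency list. It uses the AuthorList_PCT → AuthorCopy → BookCopy chain from the examples, plus an assumed BookCopy → ReviewCopy edge. It returns the views to pull after a given view without looking at any rows.

```python
from collections import deque

# Hypothetical correlation edges of a cache graph: view -> correlated child views.
children = {
    "AuthorList_PCT": ["AuthorCopy"],
    "AuthorCopy": ["BookCopy"],
    "BookCopy": ["ReviewCopy"],   # assumed edge, for illustration only
}


def forward_pull(view, children):
    """Views to refresh after `view`: its descendants, in breadth-first order.
    Only the schema graph is consulted."""
    order, seen, queue = [], {view}, deque(children.get(view, []))
    while queue:
        v = queue.popleft()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        queue.extend(children.get(v, []))
    return order


print(forward_pull("AuthorCopy", children))   # ['BookCopy', 'ReviewCopy']
```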

Theoretical Results

Definition (Safe PMV): A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints:
For any pair of regions in V, either they don't overlap or one is contained in the other.
If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V, and no pair of regions in X is contained in any one region defined on V.
(A property of every instance.)

Cache schema design rules (syntactically checkable conditions, polynomial):
Rule 1: A cache graph is a DAG.
Rule 2: Only red nodes can have independent completeness or consistency control-tables.
Rule 3: Every PMV with more than one parent must be a red circle.
Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray.
Rule 5: A PMV cannot have non-compatible control-tables.

Theorem: Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.
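Because the design rules are syntactic, they can be checked on the cache graph in time linear in its size. The sketch below checks Rule 1 (acyclicity, using Kahn's topological sort) and Rule 3. It uses a hypothetical encoding in which each node carries a (color, shape) pair. The node attributes and the example graph are assumptions, and the sketch does not model Rules 2, 4 and 5.

```python
from collections import deque

# Hypothetical cache graph: node -> (color, shape); edges are (parent, child).
nodes = {
    "AuthorCopy": ("red", "circle"),
    "BookCopy": ("gray", "circle"),
    "ReviewerCopy": ("red", "circle"),
    "ReviewCopy": ("gray", "circle"),
}
edges = [("AuthorCopy", "BookCopy"), ("BookCopy", "ReviewCopy"), ("ReviewerCopy", "ReviewCopy")]


def rule1_is_dag(nodes, edges):
    """Rule 1: the cache graph is a DAG (Kahn's algorithm visits every node)."""
    indegree = {n: 0 for n in nodes}
    out = {n: [] for n in nodes}
    for parent, child in edges:
        indegree[child] += 1
        out[parent].append(child)
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for child in out[n]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


def rule3_violations(nodes, edges):
    """Rule 3: every PMV with more than one parent must be a red circle."""
    parents = {n: 0 for n in nodes}
    for _parent, child in edges:
        parents[child] += 1
    return [n for n, attrs in nodes.items() if parents[n] > 1 and attrs != ("red", "circle")]


print(rule1_is_dag(nodes, edges))       # True
print(rule3_violations(nodes, edges))   # ['ReviewCopy']
```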

Data Quality-Centric Caching Model: Contributions

Four cache properties
Specifying cache properties
Cache property unit: cache region
Safe views and efficient pulling
Provides an abstraction layer (contract) between query processing and cache maintenance

Roadmap

Background
Specifying data quality constraints in SQL
Data quality-centric caching model
Enforcing data quality constraints
Other research
Future directions

Enforcing Data Quality Constraints

Overview
Simple case: view-level consistency [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004] [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004 Demo]; implemented in the MS SQL Server code base
General case: row-level consistency [Guo, Larson and Ramakrishnan, submitted]

Extension to MTCache Framework

(Figure: the MTCache framework [Larson et al. 2004], extended. Queries with relaxed C&C requirements enter the caching DBMS, whose query optimizer and execution engine use cache region metadata, heartbeat tables, local materialized views and shadow databases and return results; the caching DBMS sits in front of the backend DBMS.)

Simple Case Assumptions

Fully materialized views
Each view is consistent
Push-based maintenance, e.g., MS replication service

C&C Tracking Mechanism

Consistency tracking: cache region (CR)
The unit of update propagation
Data mutually consistent all the time
Properties, e.g., estimated delay, estimated interval

Currency tracking: heartbeat table

(Figure: views V1 to V5 at the backend and in the cache, grouped into cache regions CR1 and CR2; heartbeat tables with columns Cid and Timestamp hold times from 12:00 to 12:30.)

For queries with relaxed C&C requirements, the optimizer produces the best plan that:
Satisfies consistency requirements
Includes run-time currency checking
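The currency-tracking idea can be sketched under two assumptions of ours, which are consistent with push-based maintenance. First, the backend periodically writes the current time into one heartbeat row per cache region. Second, that row is propagated to the cache together with the region's updates. The cached data then reflects the master at least as of the propagated timestamp, so `now - heartbeat` is an upper bound on the region's currency. The class and method names are hypothetical.

```python
class HeartbeatTracker:
    """Hypothetical sketch of per-region heartbeat bookkeeping (times in minutes)."""

    def __init__(self):
        self.backend = {}   # region id -> latest heartbeat written at the backend
        self.cache = {}     # region id -> heartbeat last propagated to the cache

    def beat(self, region, now):
        """Backend side: record the current time in the region's heartbeat row."""
        self.backend[region] = now

    def propagate(self, region):
        """Replication side: ship the heartbeat along with the region's pending updates."""
        self.cache[region] = self.backend[region]

    def currency_bound(self, region, now):
        """Cache side: upper bound on the region's currency."""
        return now - self.cache[region]


hb = HeartbeatTracker()
hb.beat("CR1", 720)      # 12:00
hb.propagate("CR1")
hb.beat("CR1", 730)      # 12:10, not yet propagated
print(hb.currency_bound("CR1", 735))   # 15
```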

Extension to the Optimizer

Compile-time consistency checking
Run-time currency checking
Cost estimation

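Consistency Checking

Enforced at optimization time: a sub-plan is pruned immediately if it violates the consistency constraints.

Example: Q1: $\sigma(\mathit{Books} \bowtie \mathit{Reviews})$ CURRENCY 5 ON (Books, Reviews)
(Figure: a candidate plan for Q1 that merge-joins a local scan of Reviews with a remote query on Books. Q1 requires Books and Reviews to be mutually consistent, but the two inputs come from different sources, so this sub-plan is pruned.)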
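The pruning rule can be illustrated with a simplified model. Every input of a sub-plan is labeled with its source, which is either a local cache region or a single remote query, and each source is assumed to read one snapshot. A sub-plan is pruned if the tables of one consistency class come from different sources. The source labels below are hypothetical, and the two plans correspond to the Q1 example.

```python
def violates_consistency(plan_sources, consistency_classes):
    """plan_sources maps each table a (sub-)plan reads to its source id.
    Prune if any consistency class draws its tables from more than one source."""
    for cls in consistency_classes:
        sources = {plan_sources[t] for t in cls if t in plan_sources}
        if len(sources) > 1:
            return True
    return False


# Q1: CURRENCY 5 ON (Books, Reviews) puts both tables in one consistency class.
classes = [{"Books", "Reviews"}]
merge_join_plan = {"Books": "remote:q1", "Reviews": "local:CR_reviews"}   # slide's plan
all_remote_plan = {"Books": "remote:q1", "Reviews": "remote:q1"}

print(violates_consistency(merge_join_plan, classes))   # True: pruned
print(violates_consistency(all_remote_plan, classes))   # False: kept
```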

Run-time Currency Checking

When view V matches expression E, E is replaced by a SwitchUnion with two branches:
Local plan using V
Remote plan requesting E
Currency guard: checks whether local view V satisfies the currency requirement; if it does, the local branch runs, otherwise the remote branch.
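The SwitchUnion pattern can be sketched in a few lines. The guard is evaluated once when the operator runs, and only the chosen branch is executed. In this sketch, plans are plain Python callables, and the guard compares a hypothetical heartbeat-based staleness bound with the currency bound B. A real engine works on operator trees rather than functions.

```python
from typing import Callable, Iterable, List


def switch_union(guard: Callable[[], bool],
                 local_plan: Callable[[], Iterable],
                 remote_plan: Callable[[], Iterable]) -> List:
    """Run the local plan if the guard passes at run time, else the remote plan."""
    return list(local_plan() if guard() else remote_plan())


def currency_guard(view_heartbeat: float, now: float, bound: float) -> Callable[[], bool]:
    """Local view V qualifies if now - heartbeat (an upper bound on its currency) <= B."""
    return lambda: now - view_heartbeat <= bound


def scan_local_view() -> List[str]:
    """Stand-in for the local plan using V."""
    return ["row from local view V"]


def query_backend() -> List[str]:
    """Stand-in for the remote plan requesting E."""
    return ["row from backend"]


print(switch_union(currency_guard(100, 104, 10), scan_local_view, query_backend))
# ['row from local view V']
print(switch_union(currency_guard(100, 104, 3), scan_local_view, query_backend))
# ['row from backend']
```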

Cost Estimation

Cost for the SwitchUnion operator:

$$C = p \cdot C_{local} + (1 - p) \cdot C_{remote} + C_{cg}$$

where $p$ is the probability that the local branch will be used, $C_{local}$ is the cost of executing the local branch, $C_{remote}$ is the cost of executing the remote branch, and $C_{cg}$ is the cost of currency checking.

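Estimating p

Compute $p$ from three parameters: $f$, the estimated refresh interval; $d$, the estimated minimal delay; and $B$, the currency bound.

$$p = \begin{cases} 0 & \text{if } B - d \le 0 \\ \dfrac{B - d}{f} & \text{if } 0 < B - d \le f \\ 1 & \text{if } B - d > f \end{cases}$$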
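Here is a worked instance of the two formulas, as a sketch with made-up numbers. One possible reading of the estimate of p is our interpretation and is not stated on the slides. Suppose the view is refreshed every f time units and each refresh delivers data that is already d old. Its staleness then sweeps uniformly from d to d + f, so the fraction of time it stays within B is (B - d)/f, clipped to [0, 1].

```python
def local_probability(f: float, d: float, B: float) -> float:
    """p: probability that the local branch is used (refresh interval f,
    minimal delay d, currency bound B), following the piecewise definition."""
    slack = B - d
    if slack <= 0:
        return 0.0
    if slack <= f:
        return slack / f
    return 1.0


def switch_union_cost(p: float, c_local: float, c_remote: float, c_cg: float) -> float:
    """C = p * C_local + (1 - p) * C_remote + C_cg."""
    return p * c_local + (1 - p) * c_remote + c_cg


p = local_probability(f=10, d=2, B=7)                               # (7 - 2) / 10 = 0.5
print(p, switch_union_cost(p, c_local=10, c_remote=100, c_cg=1))    # 0.5 56.0
```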

Changing The Assumptions

Fully materialized views → partially materialized views
Consistent views → row-level consistency
Push-based maintenance → pull-based maintenance

This requires more general algorithms and run-time checks for consistency constraints that cannot be validated at compile time.

Run-time C&C Checking

When view V matches expression E, E is replaced by a SwitchUnion over a local plan using V and a remote plan requesting E. A C&C guard now makes the choice:
Consistency guard: checks whether local view V satisfies the consistency requirement
Currency guard: checks whether local view V satisfies the currency requirement
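For partially materialized views with row-level consistency, the guard must also look at the rows that the local plan would read. The following minimal sketch makes three hypothetical assumptions. Every local row is tagged with the heartbeat timestamp of the region it was refreshed with. Equal timestamps mean one snapshot. The presence of the needed rows has already been verified.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, float]   # hypothetical: (key, heartbeat timestamp of the row's region)


def cc_guard(rows: List[Row], now: float, bound: float) -> bool:
    """Consistency guard: all rows share one snapshot.
    Currency guard: that snapshot is at most `bound` old."""
    stamps = {ts for _key, ts in rows}
    if len(stamps) > 1:
        return False          # rows were refreshed at different times
    return not stamps or now - stamps.pop() <= bound


def run_with_cc_guard(local_rows: List[Row], remote: Callable[[], List[Row]],
                      now: float, bound: float) -> List[Row]:
    """SwitchUnion-style choice: local rows if the C&C guard passes, else the remote plan."""
    return local_rows if cc_guard(local_rows, now, bound) else remote()


print(cc_guard([(1, 100.0), (2, 95.0)], now=104.0, bound=10.0))    # False: inconsistent
print(cc_guard([(1, 100.0), (2, 100.0)], now=104.0, bound=10.0))   # True
```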

Performance Evaluation Goals

Currency guard overhead
Consistency guard overhead: simple checks; a spectrum of checks ranging from simple to complicated

Experimental Setting

The back-end hosts a TPC-D database, tpcd1gh, with scale factor 1.0 (~1 GB).
The cache server has a shadow of tpcd1gh.
Two local views: custCopy and orderCopy.
LAN connection between the cache and the backend server.

Queries Used

Qa: key select
SELECT * FROM Customers C WHERE c_custkey = 1 CURRENCY 10 ON (C)

Qb: join query
SELECT * FROM Customers C, Orders O WHERE c_custkey = o_custkey AND c_custkey = 1 CURRENCY 10 ON (C), 20 ON (O)

Qc: non-key select
SELECT * FROM Customers C WHERE c_nationkey = 1 CURRENCY 10 ON (C)

Currency Guards Overhead

(Chart: execution time (ms, 0 to 250) of Qa, Qb and Qc with local and remote plans, split into query time and currency-guard time. Overhead labels shown: 15.26%, 21.3%, 3.66%, 3.59%, 4.31%, 0.41%.)

Simple Consistency Guards Overhead

(Chart: execution time (ms, 0 to 80) of Qa, Qb and Qc with local and remote plans, split into query time and consistency-guard time. Overhead labels shown: 16.56%, 14.00%, 1.72%, 1.59%, 1.66%, 1.6%.)

Single Table Consistency Guard Overhead

(Chart, using Qa: execution time (ms, 0 to 7) for cases A11a, A11b, A12, S11 and S12 with local and remote plans, split into query time and consistency-guard time. Overhead labels shown: 62.85%, 16.98%, 71.41%, 6.06%, 8.79%, 7.48%, 2.33%, 4.95%, 58.32%, 23.77%.)

Enforcing Data Quality Constraints: Contributions

Algorithms for enforcing C&C constraints in query processing
Implemented a prototype in the MS SQL Server code base for a restricted case
Provides DBMS guarantees for C&C requirements

Related Work

Relaxing data quality
Distributed databases: Read-only transactions [Garcia-Molina et al. 1982]; Demarcation protocol [Barbará et al. 1992]; TACC [Yu et al. 2000]; Epsilon-serializability [Pu et al. 1992]
Warehousing and web views: WebViews [Labrinidis et al. 2003]; FAS [Röhm et al. 2002]; Obsolescent views [Gal 1999]; Distributed views [Segev et al. 1990]
Replica management: Quasi-copies [Alonso et al. 1998], [Gallersdörfer et al. 1995]; Good-enough views [Seligman et al. 1997]; TRAPP [Olson et al. 2000]

Caching
Database caching: DBCache [Altinel et al. 2003]; Constraint-based database caching [Härder et al. 2004]; Mid-Tier caching [TimesTen 2002]; Shared-storage caching [Khalil et al. 2002]
Others: Semantic caching [Dar et al. 1996]; Cache in Postgres [Stonebraker et al. 1990]; Predicate-based caching [Keller et al. 1996]; WATCHMAN [Scheuermann et al. 1996]; Cache investment [Kossmann et al. 2000]; DECAF [Kiernan and Carey 2000]; Proxy caching [Luo et al. 2001]

Uniqueness of our approach (query-centric):
Query: specifies fine-grained C&C constraints
Admin: flexible data quality control in terms of granularity and properties
Caching DBMS: provides C&C guarantees for individual queries

Other Research

UW:
Indexing large-scale, dynamic one-dimensional intervals [in preparation]: a family of data structures; Differed index
Evaluating different locking protocols for database caching [ongoing]
Quality-of-service evaluation of multicast streaming protocols [SIGMETRICS 2002]

MS: SchemaGen project [software released]: designed and implemented a relational schema generator for annotated XML schemas

MSR-Redmond: RECYCLE project: added support for update statistics for query result caching in SQL Server

Future Directions

Improve the current prototype: read-write transactions? Timeline constraints?
Automate cache design/tuning: how to get a good cache schema?
Apply “good enough” to other forms of replication: indexing data?

Summary

Problem: gap between applications and the caching DBMS

A comprehensive solution:
Specifying data quality constraints
Data quality-centric cache model
Enforcing data quality constraints
Data quality-aware adaptive cache management

Questions?