A Transactional Model for Data Warehouse Maintenance

Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner. Published in ER’2002, Finland. Database Systems Research Group, Worcester Polytechnic Institute.


Page 1: A Transactional Model for Data Warehouse Maintenance

Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner

Published in ER’2002, Finland

Database Systems Research Group, Worcester Polytechnic Institute

Page 2: Data Warehousing

[Figure: architecture diagram with the components Data Warehouse, DWMS, and three Wrapper / Base pairs]

- Data integration from remote base sources
  - Difficult and labor-intensive
  - Better to do it only ONCE and materialize the results
  - Share the materialized data among many applications

Page 3: Data Warehouse Maintenance

- Motivation: keep the data warehouse (DW) up to date
- Base sources change over time
  - Source data updates: insert, delete, update
  - Source schema changes: add, drop, rename
- Basic idea: incremental maintenance instead of re-computation
  - Re-computation may take weeks

Page 4: General Maintenance Algorithms

- View Maintenance (VM)
  - Incrementally incorporate source data updates
  - [BLT86], [GMS93], [ZGH+95], [SBC+00]
- View Synchronization (VS)
  - Rewrite the data warehouse view definition after the schema of the source changed
  - [NLR98], [LNR02]
- View Adaptation (VA)
  - Adapt the view extent after the view definition changed
  - [NR99], [GMR+01]

Page 5: DW Maintenance Example

    CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer C, FlightRes F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';

Customer

| Name | Address |
|---|---|
| Dave | WPI |
| Ellen | MA |

FlightRes

| Name | Age | FlightNo | Dest |
|---|---|---|---|
| Dave | 22 | AA8384 | Asia |
| Steve | 22 | UA77788 | Europe |

View: Asia_Traveller

| Name | Address | FlightNo |
|---|---|---|
| Dave | WPI | AA8384 |

Source update: Insert ('Steve', 'Boston')

Maintenance query: SELECT FlightNo FROM FlightRes WHERE Name = 'Steve'
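The example above can be replayed as a small Python sketch using the standard `sqlite3` module. This is an illustration under stated assumptions, not the authors' implementation: `Asia_Traveller` is stored as an ordinary table standing in for the materialized view, the helper name `maintain_on_customer_insert` is hypothetical, and the `Dest = 'Asia'` filter in the probe is taken from the view definition rather than from the query shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Name TEXT, Address TEXT);
    CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
    -- Plain table standing in for the materialized view (assumption).
    CREATE TABLE Asia_Traveller (Name TEXT, Address TEXT, FlightNo TEXT);
    INSERT INTO Customer VALUES ('Dave', 'WPI'), ('Ellen', 'MA');
    INSERT INTO FlightRes VALUES ('Dave', 22, 'AA8384', 'Asia'),
                                 ('Steve', 22, 'UA77788', 'Europe');
    INSERT INTO Asia_Traveller VALUES ('Dave', 'WPI', 'AA8384');
""")


def maintain_on_customer_insert(conn, name, address):
    """Apply the source insert, then add only the new join rows to the view."""
    conn.execute("INSERT INTO Customer VALUES (?, ?)", (name, address))
    # Maintenance query: probe FlightRes for the inserted customer only,
    # instead of recomputing the whole join.
    flights = conn.execute(
        "SELECT FlightNo FROM FlightRes WHERE Name = ? AND Dest = 'Asia'",
        (name,),
    ).fetchall()
    delta = [(name, address, flight_no) for (flight_no,) in flights]
    conn.executemany("INSERT INTO Asia_Traveller VALUES (?, ?, ?)", delta)
    return delta


delta = maintain_on_customer_insert(conn, "Steve", "Boston")
view = conn.execute("SELECT * FROM Asia_Traveller ORDER BY Name").fetchall()
```

With the data from the slide the delta is empty, because Steve's only reservation is to Europe, so the view keeps its single row for Dave.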

Page 6: Maintenance Anomaly Problem

Customer

| Name | Address |
|---|---|
| Dave | WPI |
| Ellen | MA |

FlightRes

| Name | Age | FlightNo | Dest |
|---|---|---|---|
| Dave | 22 | AA8384 | Asia |
| Steve | 22 | UA77788 | Europe |

View: Asia_Traveller

| Name | Address | FlightNo |
|---|---|---|
| Dave | WPI | AA8384 |

1. Insert ('Steve', 'Boston')

2. Rename (FlightRes, FlightReservation)

3. SELECT FlightNo FROM FlightRes WHERE Name = 'Steve'

Broken query!
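The three numbered steps can be reproduced locally with Python's `sqlite3` module. This is only a sketch: a single in-memory database stands in for the two autonomous sources, which is an assumption made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Name TEXT, Address TEXT);
    CREATE TABLE FlightRes (Name TEXT, Age INTEGER, FlightNo TEXT, Dest TEXT);
    INSERT INTO Customer VALUES ('Dave', 'WPI'), ('Ellen', 'MA');
    INSERT INTO FlightRes VALUES ('Dave', 22, 'AA8384', 'Asia'),
                                 ('Steve', 22, 'UA77788', 'Europe');
""")

# 1. Data update at the Customer source.
conn.execute("INSERT INTO Customer VALUES ('Steve', 'Boston')")

# 2. Schema change at the FlightRes source, before the warehouse has reacted.
conn.execute("ALTER TABLE FlightRes RENAME TO FlightReservation")

# 3. The maintenance query for step 1 still uses the old relation name.
try:
    conn.execute("SELECT FlightNo FROM FlightRes WHERE Name = 'Steve'").fetchall()
    query_broken = False
    error_message = ""
except sqlite3.OperationalError as exc:
    query_broken = True
    error_message = str(exc)
```

The query in step 3 fails because the relation it names no longer exists, which is the "broken query" case on the slide.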

Page 7: Inside Broken Query

- Two transactions
  - Base update transaction: $w(B_i)\,c(B_i)$
  - DW maintenance transaction: $r(B_1)\,r(B_2)\dots r(B_n)\,w(DW)\,c(DW)$
- Read-write conflicts between the two transactions
  - Two independent transactions: $w(B_i)$ / $r(B_i)$
  - Data update $w(B_i)$: incorrect query results [ZGH+95]
  - Schema change $w(B_i)$: broken query

Page 8: A Transactional Approach

- A global transaction model
  - DWMS_Transaction: integrates both the base update transaction and its corresponding DW maintenance transaction
  - $w(B_i)\,c(B_i)\,r(B_1)\,r(B_2)\dots r(B_n)\,w(DW)\,c(DW)$
- Maintenance anomaly
  - Rephrased as read-write conflicts between DWMS_Transactions
  - $w(B_i)\,c(B_i)\,r(B_1)\,r(B_2)\dots r(B_j)\dots r(B_n)\,w(DW)\,c(DW)$
  - $w(B_j)\,c(B_j)\,r(B_1)\,r(B_2)\dots r(B_n)\,w(DW)\,c(DW)$
  - The conflict is between $r(B_j)$ of the first transaction and $w(B_j)$ of the second

Page 9: Serializability of DWMS_Transaction

- Theorem
  - A history S of DWMS_Transactions is serializable iff it is equivalent to some serial schedule S' of the same DWMS_Transactions.
- Basis for solving anomaly problems
  - To solve the anomaly problem, all DWMS_Transactions must be serializable.
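One common way to test whether a history is equivalent to a serial schedule is the textbook precedence-graph check for conflict serializability. The sketch below uses that standard technique as an illustration; the slides do not say that this is the test used by the authors, and the two-source histories and the names `T1`, `T2`, `B1`, `B2` are made up for the example. Commit operations are left out because they do not affect the conflicts.

```python
from itertools import combinations


def precedence_graph(history):
    """history: list of (txn, op, item) in execution order, op is 'r' or 'w'."""
    edges = set()
    for (_, (t1, op1, x1)), (_, (t2, op2, x2)) in combinations(enumerate(history), 2):
        # Two operations conflict if they belong to different transactions,
        # access the same item, and at least one of them is a write.
        if t1 != t2 and x1 == x2 and "w" in (op1, op2):
            edges.add((t1, t2))  # the earlier operation's transaction goes first
    return edges


def is_conflict_serializable(history):
    """True iff the precedence graph of the history has no cycle."""
    edges = precedence_graph(history)
    remaining = {txn for txn, _, _ in history}
    while remaining:
        # Transactions with no incoming edge from another remaining transaction.
        sources = {n for n in remaining
                   if not any(a in remaining and b == n for a, b in edges)}
        if not sources:
            return False  # every remaining node has a predecessor: a cycle
        remaining -= sources
    return True


# T1 updates source B1, T2 updates source B2; each then maintains the DW.
serial = [
    ("T1", "w", "B1"), ("T1", "r", "B1"), ("T1", "r", "B2"), ("T1", "w", "DW"),
    ("T2", "w", "B2"), ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
# T2's source write slips in before T1's maintenance read of B2.
anomalous = [
    ("T1", "w", "B1"), ("T1", "r", "B1"), ("T2", "w", "B2"), ("T1", "r", "B2"),
    ("T1", "w", "DW"), ("T2", "r", "B1"), ("T2", "r", "B2"), ("T2", "w", "DW"),
]
```

In the interleaved history, T1 reads B2 after T2 has written it while T2 reads B1 after T1 has written it, so the graph has edges in both directions and no serial order exists.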

Page 10: Traditional Serializability Algorithms

- Lock-based
  - Reads / writes acquire locks for access to shared resources
  - Transactions block each other
- Multiversion-based
  - Write on one version, read on another version
  - Transactions do not block each other

Page 11: Traditional Serializability Algorithms

- Lock-based
  - Reads / writes would need to lock data in the sources
  - Not desirable in a DW environment
    - Data sources are autonomous
    - Not realistic to impose locking on them
- Multiversion-based
  - Transactions do not block each other
  - Desirable in a DW environment
    - DW and data sources do not block each other
  - Need to maintain versions somewhere

Page 12: TxnWrap: A Multiversion Algorithm

View definition over the sources:

    CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer C, FlightRes F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';

Customer

| Name | Address |
|---|---|
| Dave | WPI |
| Ellen | MA |

FlightRes

| Name | Age | FlightNo | Dest |
|---|---|---|---|
| Dave | 22 | AA8384 | Asia |
| Steve | 22 | UA77788 | Europe |

View: Asia_Traveller

| Name | Address | FlightNo |
|---|---|---|
| Dave | WPI | AA8384 |

View definition over the wrapper relations:

    CREATE VIEW Asia_Traveller AS
    SELECT C.Name, C.Address, F.FlightNo
    FROM Customer' C, FlightRes' F
    WHERE C.Name = F.Name AND F.Dest = 'Asia';

Wrapper: Customer'

| Name | Address |
|---|---|
| Dave | WPI |
| Ellen | MA |

Meta Relation

| Rel | Attr |
|---|---|
| Cust' | Name |
| Cust' | Address |

Wrapper: FlightRes' (column names abbreviated and rows elided on the slide)

| N. | A. | F. | D. |
|---|---|---|---|
| … | … | … | … |

Meta Relation

| Rel | Attr |
|---|---|
| Fli' | Name |
| … | … |

Page 13: Versioned Wrapper

Semantics: the lifetime of a tuple is #born <= time < #dead

Wrapper for Customer

Relation Customer'

| Name | Address | #born | #dead |
|---|---|---|---|
| Dave | WPI | 0 | |
| Ellen | MA | 0 | |

Meta Relation

| Rel | Attr | Rel' | Attr' | #born | #dead |
|---|---|---|---|---|---|
| C' | Name | - | - | 0 | |
| C' | Addr. | - | - | 0 | |
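The lifetime rule `#born <= time < #dead` can be written as a small predicate. The following Python sketch is illustrative only; the class name `VersionedTuple` and the choice of `None` for an empty `#dead` cell (a tuple that has not been deleted yet) are assumptions, not details given on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VersionedTuple:
    values: tuple
    born: int
    dead: Optional[int] = None  # None models an empty #dead cell (assumption)


def visible(t: VersionedTuple, time: int) -> bool:
    """True iff born <= time < dead, where an empty dead means 'still alive'."""
    return t.born <= time and (t.dead is None or time < t.dead)


# Relation Customer' from the slide: both tuples born at 0, not yet dead.
customer_v = [
    VersionedTuple(("Dave", "WPI"), born=0),
    VersionedTuple(("Ellen", "MA"), born=0),
]
snapshot_0 = [t.values for t in customer_v if visible(t, 0)]

# A tuple deleted at time 1 is visible at time 0 but not at time 1.
dave_deleted = VersionedTuple(("Dave", "WPI"), born=0, dead=1)
```

The half-open interval means a reader at exactly the deletion time no longer sees the tuple, while a reader at exactly the insertion time does.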

Page 14: Source Updates on Versioned Wrapper

Transaction 1:

    DELETE FROM Customer C WHERE C.Name = 'Dave';
    INSERT ('Steve', 'Boston');

Transaction 2:

    Drop Customer.Address;

Relation Customer' (Init)

| Name | Address | #born | #dead |
|---|---|---|---|
| Dave | WPI | 0 | |
| Ellen | MA | 0 | |

Relation Customer' (state 1)

| Name | Address | #born | #dead |
|---|---|---|---|
| Dave | WPI | 0 | 1 |
| Ellen | MA | 0 | |
| Steve | Boston | 1 | |

Relation Customer' (state 2)

| Name | Address | #born | #dead |
|---|---|---|---|
| Dave | WPI | 0 | 1 |
| Ellen | MA | 0 | |
| Steve | Boston | 1 | |

Meta Relation (state 2)

| Rel | Attr | Rel' | Attr' | #born | #dead |
|---|---|---|---|---|---|
| C' | Name | - | - | 0 | |
| C' | Addr. | - | - | 0 | 2 |
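The state transitions above can be traced with a short Python sketch. It assumes, as the numbers on the slide suggest, that the transaction number is used as the version timestamp, and that an empty `#dead` cell is represented by `None`. The helper names are hypothetical, and this is not the TxnWrap code itself.

```python
def apply_delete(relation, predicate, txn_id):
    """A delete does not remove the row; it closes its lifetime at txn_id."""
    for row in relation:
        if row["dead"] is None and predicate(row):
            row["dead"] = txn_id


def apply_insert(relation, values, txn_id):
    """An insert adds a row whose lifetime starts at txn_id."""
    relation.append({**values, "born": txn_id, "dead": None})


def apply_drop_attribute(meta, rel, attr, txn_id):
    """A schema change only touches the meta relation, never the data rows."""
    for row in meta:
        if row["rel"] == rel and row["attr"] == attr and row["dead"] is None:
            row["dead"] = txn_id


# Initial state of the wrapper for Customer.
customer_v = [
    {"Name": "Dave", "Address": "WPI", "born": 0, "dead": None},
    {"Name": "Ellen", "Address": "MA", "born": 0, "dead": None},
]
meta = [
    {"rel": "C'", "attr": "Name", "born": 0, "dead": None},
    {"rel": "C'", "attr": "Address", "born": 0, "dead": None},
]

# Transaction 1: delete Dave, insert Steve (state 1).
apply_delete(customer_v, lambda r: r["Name"] == "Dave", txn_id=1)
apply_insert(customer_v, {"Name": "Steve", "Address": "Boston"}, txn_id=1)

# Transaction 2: drop Customer.Address (state 2).
apply_drop_attribute(meta, "C'", "Address", txn_id=2)
```

Because nothing is physically removed, a maintenance query that belongs to an earlier transaction can still read the state it expects.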

Page 15: DW Maintenance Query Rewritten for Versioned Wrapper

The maintenance query issued in Transaction 2:

    SELECT Name, Address
    FROM Customer
    WHERE condition;

Rewritten versioned maintenance query:

    SELECT Name, Address
    FROM Customer'
    WHERE condition AND
          #born <= 2 AND #dead > 2;

Relation Customer' (state 1)

| Name | Address | #born | #dead |
|---|---|---|---|
| Dave | WPI | 0 | 1 |
| Ellen | MA | 0 | |
| Steve | Boston | 1 | |
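The rewritten query can be tried against the table above with Python's `sqlite3` module. In this sketch the table name `Customer_v`, the column names `born` and `dead`, and the use of `NULL` for an empty `#dead` cell are assumptions chosen to get valid SQL identifiers; the unspecified `condition` from the slide is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Versioned wrapper relation; NULL in dead means 'not deleted' (assumption).
    CREATE TABLE Customer_v (Name TEXT, Address TEXT, born INTEGER, dead INTEGER);
    INSERT INTO Customer_v VALUES ('Dave',  'WPI',    0, 1),
                                  ('Ellen', 'MA',     0, NULL),
                                  ('Steve', 'Boston', 1, NULL);
""")


def read_as_of(conn, txn_id):
    """Versioned read: only rows whose lifetime contains txn_id are returned."""
    return conn.execute(
        "SELECT Name, Address FROM Customer_v "
        "WHERE born <= ? AND (dead IS NULL OR dead > ?) "
        "ORDER BY Name",
        (txn_id, txn_id),
    ).fetchall()


as_of_2 = read_as_of(conn, 2)  # state seen by Transaction 2
as_of_0 = read_as_of(conn, 0)  # initial state, before Transaction 1
```

Reading as of version 2 returns Ellen and Steve, while reading as of version 0 still returns Dave and Ellen, so each maintenance transaction sees a consistent snapshot without locking the source.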

Page 16: Performance Evaluation

- Implementation
  - In Java
  - Platform: Oracle, JDBC on Windows NT
  - Embedded in the DyDa [CCZ+01] system at WPI
- Testbed
  - 6 data sources with one relation each
  - Each relation has 4 attributes and 100,000 tuples
  - One materialized join view over these data sources
  - TxnWrap vs. compensation (SWEEP [AAS+97] & DyDa)

Page 17: Data Update Processing

[Figure: chart comparing SWEEP and TxnWrap; axes labeled "# Concurrent DUs" and "Time (s)"]

Page 18: Schema Change Processing

[Figure: chart with the series DyDa, Abort of DyDa, TxnWrap, and Abort of TxnWrap; axes labeled "Time Interval (s)" and "Time (s)"]

Page 19: Related Work

- View maintenance
  - View Maintenance / Synchronization / Adaptation
- Maintenance anomaly
  - ECA [ZGH+95] and SWEEP [AAS+97] handle only concurrent data updates
  - Compensation-based: performance degrades at high load
- Multi-version algorithms
  - 2-version, n-version, unlimited-version algorithms [MPL92]

Page 20: Conclusions

- Identify the maintenance anomaly problem in a mixed model environment
- Design a global transaction model, DWMS_Transaction, that integrates both the source update transaction and the maintenance transaction
- Rephrase the maintenance anomaly in terms of serializability of DWMS_Transactions
- Propose a multiversion algorithm to achieve serializability
- Implement the maintenance solution in DyDa
- Achieve stable performance under various workloads

Page 21: Other Activities and Future Work

- Batching of updates into more complex maintenance plans
- Parallelism of maintenance processes
- Support more complex views, e.g., aggregation
- Generalize to more change types
- Provide alternate view synchronization algorithms
- Discovery of changes by non-cooperating sources
- Discovery of metadata in terms of source relationships of distributed sources
- Move beyond the relational middle-layer model

Page 22: Questions?