CS639: Data Management for Data Science
Midterm Review 1: Relational Databases and Relational Algebra
Theodoros Rekatsinas



Today’s Lecture

1. Review Relational Databases and Relational Algebra

2. Next Lecture: Review MapReduce and NoSQL systems


Data science workflow

Section 2

https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext


Uncertainty and Randomness

• Data represents the traces of real-world processes.
• The collected traces correspond to a sample of those processes.

• There is randomness and uncertainty in the data collection process.

• The process that generates the data is stochastic (random).
  • Example: Let’s toss a coin! What will the outcome be? Heads or tails? There are many factors that make a coin toss a stochastic process.

• The sampling process introduces uncertainty.
  • Example: Errors in sensor position due to GPS error, errors due to the angles of laser travel, etc.


Models

• Data represents the traces of real-world processes.

• Part of the data science process: We need to model the real world.

• A model is a function f_θ(x)
  • x: input variables (can be a vector)
  • θ: model parameters


Modeling Uncertainty and Randomness

• Data represents the traces of real-world processes.

• There is randomness and uncertainty in the data collection process.

• A model is a function f_θ(x)
  • x: input variables (can be a vector)
  • θ: model parameters

• Models should rely on probability theory to capture uncertainty and randomness!
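The coin-toss example can be turned into a tiny probabilistic model. The sketch below assumes a Bernoulli model with a single parameter θ (the probability of heads); the function names, the sample size, and the seed are illustrative choices and do not come from the slides.

```python
import random

def simulate_tosses(theta, n, seed=0):
    """Draw n coin tosses; each toss is heads (1) with probability theta."""
    rng = random.Random(seed)
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def estimate_theta(tosses):
    """Maximum-likelihood estimate of theta: the observed fraction of heads."""
    return sum(tosses) / len(tosses)

def f(theta, x):
    """Model f_theta(x): probability of outcome x (1 = heads, 0 = tails)."""
    return theta if x == 1 else 1.0 - theta

# The sample is only a trace of the underlying process, so the estimate
# is close to, but usually not exactly, the true parameter 0.5.
sample = simulate_tosses(theta=0.5, n=1000)
theta_hat = estimate_theta(sample)
print(theta_hat, f(theta_hat, 1) + f(theta_hat, 0))
```

The estimate changes with the seed, which is the sampling uncertainty the slides describe.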


The Relational Model: Schemata

• Relational Schema:

Students(sid: string, name: string, gpa: float)

• Relation name: Students
• Attributes: sid, name, gpa
• String, float, int, etc. are the domains of the attributes


The Relational Model: Data

Student

sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

An attribute (or column) is a typed data entry present in each tuple in the relation

The number of attributes is the arity of the relation


The Relational Model: Data

Student

sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

A tuple or row (or record) is a single entry in the table having the attributes specified by the schema

The number of tuples is the cardinality of the relation


The Relational Model: Data

Student

sid  name   gpa
001  Bob    3.2
002  Joe    2.8
003  Mary   3.8
004  Alice  3.5

A relational instance is a set of tuples all conforming to the same schema

In practice DBMSs relax the set requirement, and use multisets.
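The vocabulary on these slides (schema, tuple, arity, cardinality, set vs. multiset) can be made concrete with a short sketch. It assumes a Python `NamedTuple` as a stand-in for the Students schema and a Python `set` as a stand-in for a relational instance; neither choice comes from the slides.

```python
from typing import NamedTuple

class Student(NamedTuple):
    """Schema Students(sid: string, name: string, gpa: float)."""
    sid: str
    name: str
    gpa: float

# A relational instance: a set of tuples that all conform to the schema.
students = {
    Student("001", "Bob", 3.2),
    Student("002", "Joe", 2.8),
    Student("003", "Mary", 3.8),
    Student("004", "Alice", 3.5),
}

arity = len(Student._fields)   # number of attributes
cardinality = len(students)    # number of tuples

# Set semantics: inserting an existing tuple changes nothing.
students_again = students | {Student("001", "Bob", 3.2)}

# Multiset (bag) semantics, as most DBMSs use in practice: duplicates are kept.
bag = list(students) + [Student("001", "Bob", 3.2)]

print(arity, cardinality, len(students_again), len(bag))
```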


To Reiterate

• A relational schema describes the data that is contained in a relational instance

Let R(f1: Dom1, …, fm: Domm) be a relational schema; then an instance of R is a subset of Dom1 × Dom2 × … × Domm

In this way, a relational schema R is a total function from attribute names to types


One More Time

• A relational schema describes the data that is contained in a relational instance

A relation R of arity t is a function: R: Dom1 × … × Domt → {0,1}

I.e. returns whether or not a tuple of matching types is a member of it

Then, the schema is simply the signature of the function

Note here that order matters, attribute name doesn’t… We’ll (mostly) work with the other model (last slide) in which attribute name matters, order doesn’t!
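The two views of a relation can be contrasted in a few lines. This is a minimal sketch: `make_relation` is a hypothetical helper that builds the indicator function R: Dom1 × … × Domt → {0,1}, and plain dictionaries stand in for the named-attribute view.

```python
def make_relation(tuples):
    """Return the indicator function R: Dom1 x ... x Domt -> {0, 1}."""
    members = frozenset(tuples)

    def R(*candidate):
        # 1 if the positional tuple is a member of the relation, else 0.
        return 1 if candidate in members else 0

    return R

student = make_relation([("001", "Bob", 3.2), ("002", "Joe", 2.8)])

# Positional view: order matters, attribute names do not exist.
in_relation = student("001", "Bob", 3.2)
swapped = student("Bob", "001", 3.2)

# Named view: attribute names matter, order does not.
row_a = {"sid": "001", "name": "Bob", "gpa": 3.2}
row_b = {"gpa": 3.2, "name": "Bob", "sid": "001"}
same_row = row_a == row_b

print(in_relation, swapped, same_row)
```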


A relational database

• A relational database schema is a set of relational schemata, one for each relation

• A relational database instance is a set of relational instances, one for each relation

Two conventions:
1. We call relational database instances simply databases
2. We assume all instances are valid, i.e., satisfy the domain constraints


RDBMS Architecture

How does a SQL engine work?

SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution

1. SQL Query: Declarative query (from user)
2. Relational Algebra (RA) Plan: Translate to relational algebra expression
3. Optimized RA Plan: Find logically equivalent - but more efficient - RA expression
4. Execution: Execute each operator of the optimized plan!


Relational Algebra (RA)

• Five basic operators:
  1. Selection: σ
  2. Projection: Π
  3. Cartesian Product: ×
  4. Union: ∪
  5. Difference: −

• Derived or auxiliary operators:
  • Intersection, complement
  • Joins (natural, equi-join, theta join, semi-join)
  • Renaming: ρ
  • Division


Note that RA Operators are Compositional!

Students(sid, sname, gpa)

SELECT DISTINCT sname, gpa
FROM Students
WHERE gpa > 3.5;

How do we represent this query in RA?

Π_{sname,gpa}(σ_{gpa>3.5}(Students))

σ_{gpa>3.5}(Π_{sname,gpa}(Students))

Are these logically equivalent?
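One way to explore the question is to run both plans on a small instance. The sketch below assumes rows are Python dictionaries and implements σ and Π as plain functions; the sample data reuses the Student table from the earlier slides, with the name column labeled sname. Agreement on one instance illustrates the equivalence but does not prove it. The general argument is that the projection keeps gpa, so the selection condition can still be evaluated afterwards.

```python
students = [
    {"sid": "001", "sname": "Bob", "gpa": 3.2},
    {"sid": "002", "sname": "Joe", "gpa": 2.8},
    {"sid": "003", "sname": "Mary", "gpa": 3.8},
    {"sid": "004", "sname": "Alice", "gpa": 3.5},
]

def select(rows, pred):
    """sigma: keep the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    """Pi with set semantics: keep only attrs and drop duplicate rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def high_gpa(row):
    return row["gpa"] > 3.5

# Plan A: select first, then project.
plan_a = project(select(students, high_gpa), ["sname", "gpa"])
# Plan B: project first, then select (possible because gpa survives the projection).
plan_b = select(project(students, ["sname", "gpa"]), high_gpa)

print(plan_a)
print(plan_b)
```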


Natural Join (⋈)

• Notation: R1 ⋈ R2

• Joins R1 and R2 on equality of all shared attributes
  • If R1 has attribute set A, and R2 has attribute set B, and they share attributes A ∩ B = C, can also be written: R1 ⋈_C R2

• Our first example of a derived RA operator:
  • Meaning: R1 ⋈ R2 = Π_{A∪B}(σ_{C=D}(ρ_{C→D}(R1) × R2))
  • Where:
    • The rename ρ_{C→D} renames the shared attributes in one of the relations
    • The selection σ_{C=D} checks equality of the shared attributes
    • The projection Π_{A∪B} eliminates the duplicate common attributes

Students(sid, name, gpa)
People(ssn, name, address)

SQL:

SELECT DISTINCT sid, S.name, gpa, ssn, address
FROM Students S, People P
WHERE S.name = P.name;

RA: Students ⋈ People
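The derived definition of the natural join (rename, Cartesian product, selection, projection) can be followed step by step in code. This is a sketch under stated assumptions: rows are dictionaries, the renamed copy of a shared attribute gets the suffix `_1` (assumed not to clash with an existing attribute), and the People rows are made-up sample data.

```python
def natural_join(r1, r2):
    """R1 join R2 = Pi_{A u B}(sigma_{C=D}(rho_{C->D}(R1) x R2)) over dict rows."""
    if not r1 or not r2:
        return []
    a, b = list(r1[0]), list(r2[0])
    shared = [c for c in a if c in b]                   # C = A intersect B
    out_attrs = a + [c for c in b if c not in shared]   # A union B
    result, seen = [], set()
    for t1 in r1:
        # rho: rename the shared attributes of R1 so the product has no clashes.
        renamed = {(c + "_1" if c in shared else c): v for c, v in t1.items()}
        for t2 in r2:                                    # Cartesian product
            pair = {**renamed, **t2}
            # sigma: the renamed and original shared attributes must agree.
            if all(pair[c + "_1"] == pair[c] for c in shared):
                # Pi: keep one copy of each shared attribute, drop duplicates.
                row = tuple(pair[c] for c in out_attrs)
                if row not in seen:
                    seen.add(row)
                    result.append(dict(zip(out_attrs, row)))
    return result

students = [
    {"sid": "001", "name": "Bob", "gpa": 3.2},
    {"sid": "002", "name": "Joe", "gpa": 2.8},
]
people = [
    {"ssn": "111", "name": "Bob", "address": "1 Main St"},
    {"ssn": "222", "name": "Ann", "address": "2 Oak Ave"},
]

joined = natural_join(students, people)
print(joined)
```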


Example: Converting SQL Query -> RA

Students(sid, sname, gpa)
People(ssn, sname, address)

SELECT DISTINCT gpa, address
FROM Students S, People P
WHERE gpa > 3.5 AND S.sname = P.sname;

Π_{gpa,address}(σ_{gpa>3.5}(S ⋈ P))


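RA Expressions Can Get Complex!

[Figure: RA expression tree over the relations Person, Purchase, Person, and Product, with selections sname=fred and sname=gizmo, projections Π pid and Π ssn, joins on seller-ssn=ssn, pid=pid, and buyer-ssn=ssn, and a final projection Π name.]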


RA has Limitations!

• Cannot compute “transitive closure”

• Find all direct and indirect relatives of Fred
• Cannot express in RA!!!

• Need to write C program, use a graph engine, or modern SQL…

Name1  Name2  Relationship
Fred   Mary   Father
Mary   Joe    Cousin
Mary   Bill   Spouse
Nancy  Lou    Sister
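As one illustration of the "modern SQL" route, recursive common table expressions can express transitive closure. The sketch below uses Python's built-in `sqlite3` module; the table name `Relatives`, the lowercase column names, and the choice to follow edges only from Name1 to Name2 are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Relatives (name1 TEXT, name2 TEXT, relationship TEXT)")
conn.executemany(
    "INSERT INTO Relatives VALUES (?, ?, ?)",
    [
        ("Fred", "Mary", "Father"),
        ("Mary", "Joe", "Cousin"),
        ("Mary", "Bill", "Spouse"),
        ("Nancy", "Lou", "Sister"),
    ],
)

# Recursive CTE: start from Fred's direct relatives, then keep following edges.
# UNION (not UNION ALL) drops duplicates, so the recursion also stops on cycles.
query = """
WITH RECURSIVE reachable(name) AS (
    SELECT name2 FROM Relatives WHERE name1 = 'Fred'
    UNION
    SELECT r.name2 FROM Relatives AS r JOIN reachable ON r.name1 = reachable.name
)
SELECT name FROM reachable ORDER BY name;
"""
relatives = [row[0] for row in conn.execute(query)]
print(relatives)
```

The number of join steps depends on the data, which is why no fixed RA expression can compute the same result for every instance.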


SQL Time!


Find all the distinct names of all companies that are based in Japan.


Find the distinct names of all companies that are based in Japan and that sold a product to an AI based in Cupertino.


Find the distinct names of all companies that have sold at least six distinct products.


Find the distinct names of all companies that have not sold even a single product.


Find the distinct names of all companies such that every product they have ever sold costs at least 10 thousand dollars. Companies that have not sold any products should not be counted, as they are losers.
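The schema used for these exercises did not survive in this transcript, so the sketch below runs three of the questions against a hypothetical schema, Company(cname, country) and Product(pname, price, seller), with made-up rows. The table names, column names, and data are all assumptions; the queries show one possible shape of an answer under that schema, not the lecture's solutions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company (cname TEXT PRIMARY KEY, country TEXT);
CREATE TABLE Product (pname TEXT, price REAL, seller TEXT);
INSERT INTO Company VALUES ('Acme', 'Japan'), ('Bolt', 'USA'), ('Cedar', 'Japan');
INSERT INTO Product VALUES ('p1', 12000, 'Acme'), ('p2', 15000, 'Acme'),
                           ('p3', 500, 'Bolt'), ('p4', 20000, 'Bolt');
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Companies based in Japan.
in_japan = names(
    "SELECT DISTINCT cname FROM Company WHERE country = 'Japan' ORDER BY cname")

# Companies that have not sold even a single product: NOT EXISTS.
no_sales = names("""
    SELECT DISTINCT cname FROM Company AS c
    WHERE NOT EXISTS (SELECT 1 FROM Product AS p WHERE p.seller = c.cname)
    ORDER BY cname""")

# Companies whose every sold product costs at least 10,000. Grouping over
# Product automatically leaves out companies that sold nothing.
all_expensive = names("""
    SELECT seller FROM Product
    GROUP BY seller
    HAVING MIN(price) >= 10000
    ORDER BY seller""")

print(in_japan, no_sales, all_expensive)
```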


Logical vs. Physical Optimization

SQL Query → Relational Algebra (RA) Plan → Optimized RA Plan → Execution

• Logical optimization (we will only see this one):
  • Find equivalent plans that are more efficient
  • Intuition: Minimize # of tuples at each step by changing the order of RA operators

• Physical optimization:
  • Find algorithm with lowest IO cost to execute our plan
  • Intuition: Calculate based on physical parameters (buffer size, etc.) and estimates of data size (histograms)


Recall: Logical Equivalence of RA Plans

• Given relations R(A,B) and S(B,C):

• Here, projection & selection commute:
  • σ_{A=5}(Π_A(R)) = Π_A(σ_{A=5}(R))

• What about here?
  • σ_{A=5}(Π_B(R)) ?= Π_B(σ_{A=5}(R))
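A quick experiment suggests the answer to the second question: once Π_B has discarded A, the selection on A can no longer be evaluated, so the two sides are not interchangeable. The sketch below assumes dictionary rows and a two-tuple instance of R chosen for illustration.

```python
R = [{"A": 5, "B": 1}, {"A": 7, "B": 2}]

def select(rows, attr, value):
    """sigma_{attr=value}; raises KeyError if attr was projected away."""
    return [r for r in rows if r[attr] == value]

def project(rows, attrs):
    """Pi with set semantics over dict rows."""
    out = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

# First case: the projection keeps A, so the two operators commute.
lhs = select(project(R, ["A"]), "A", 5)
rhs = project(select(R, "A", 5), ["A"])

# Second case: selecting first and then projecting on B is fine...
right = project(select(R, "A", 5), ["B"])

# ...but projecting on B first removes the attribute the selection needs.
try:
    select(project(R, ["B"]), "A", 5)
    left_is_defined = True
except KeyError:
    left_is_defined = False

print(lhs, rhs, right, left_is_defined)
```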


Translating to RA

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

Π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))

[Plan tree: Π_{A,D} at the root, above σ_{A<10}, above the join of T(C,D) with the join of R(A,B) and S(B,C).]


Logical Optimization

• Heuristically, we want selections and projections to occur as early as possible in the plan
  • Terminology: “push down selections” and “pushing down projections.”

• Intuition: We will have fewer tuples in a plan.
  • Could fail if the selection condition is very expensive (say runs some image processing algorithm).
  • Projection could be a waste of effort, but more rarely.


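Optimizing RA Plan

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

Π_{A,D}(σ_{A<10}(T ⋈ (R ⋈ S)))

[Plan tree: Π_{A,D} at the root, above σ_{A<10}, above the join of T(C,D) with the join of R(A,B) and S(B,C).]

Push down selection on A so it occurs earlier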


Optimizing RA Plan

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

Π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))

[Plan tree: Π_{A,D} at the root, above the join of T(C,D) with the join of σ_{A<10}(R(A,B)) and S(B,C).]

Push down selection on A so it occurs earlier


Optimizing RA Plan

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

Π_{A,D}(T ⋈ (σ_{A<10}(R) ⋈ S))

[Plan tree: Π_{A,D} at the root, above the join of T(C,D) with the join of σ_{A<10}(R(A,B)) and S(B,C).]

Push down projection so it occurs earlier


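Optimizing RA Plan

R(A,B) S(B,C) T(C,D)

SELECT R.A, T.D
FROM R, S, T
WHERE R.B = S.B
  AND S.C = T.C
  AND R.A < 10;

Π_{A,D}(T ⋈ Π_{A,C}(σ_{A<10}(R) ⋈ S))

[Plan tree: Π_{A,D} at the root, above the join of T(C,D) with Π_{A,C} applied to the join of σ_{A<10}(R(A,B)) and S(B,C).]

We eliminate B earlier!

In general, when is an attribute not needed…?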
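The effect of the two pushdowns can be measured on a toy instance. The sketch below is an illustration under assumptions: rows are dictionaries, the join is a simple nested-loop natural join, and the contents of R, S, and T are synthetic. It compares the sizes of the intermediate results in the original plan and in the optimized plan, and checks that both return the same answer.

```python
def select(rows, pred):
    return [r for r in rows if pred(r)]

def project(rows, attrs):
    """Pi with set semantics over dict rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def join(r1, r2):
    """Natural join by nested loops over two lists of dict rows."""
    if not r1 or not r2:
        return []
    shared = [a for a in r1[0] if a in r2[0]]
    return [{**t1, **t2} for t1 in r1 for t2 in r2
            if all(t1[a] == t2[a] for a in shared)]

# Synthetic instances of R(A,B), S(B,C), T(C,D).
R = [{"A": a, "B": a % 5} for a in range(100)]
S = [{"B": b, "C": b} for b in range(5)]
T = [{"C": c, "D": 10 * c} for c in range(5)]

def small_a(row):
    return row["A"] < 10

# Original plan: Pi_{A,D}(sigma_{A<10}(T join (R join S)))
rs = join(R, S)
trs = join(T, rs)
naive = project(select(trs, small_a), ["A", "D"])
naive_sizes = [len(rs), len(trs)]

# Optimized plan: Pi_{A,D}(T join Pi_{A,C}(sigma_{A<10}(R) join S))
rs_small = project(join(select(R, small_a), S), ["A", "C"])
trs_small = join(T, rs_small)
pushed = project(trs_small, ["A", "D"])
pushed_sizes = [len(rs_small), len(trs_small)]

print(naive_sizes, pushed_sizes)
```

On this instance the joins in the optimized plan handle a tenth of the tuples, and the tuples they carry no longer include B.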


Please go over the examples here:

• https://courses.cs.washington.edu/courses/cse544/99sp/homeworks/sample/sample.html

• Only the first 4 questions!


Transaction Properties: ACID

• Atomic
  • State shows either all the effects of txn, or none of them

• Consistent
  • Txn moves from a state where integrity holds, to another where integrity holds

• Isolated
  • Effect of txns is the same as txns running one after another (i.e., looks like batch mode)

• Durable
  • Once a txn has committed, its effects remain in the database

ACID continues to be a source of great debate!


ACID: Atomicity

• TXN’s activities are atomic: all or nothing

• Intuitively: in the real world, a transaction is something that would either occur completely or not at all

• Two possible outcomes for a TXN

  • It commits: all the changes are made

  • It aborts: no changes are made


Transactions

• A key concept is the transaction (TXN): an atomic sequence of db actions (reads/writes)

Atomicity: An action either completes entirely or not at all

Transfer $3k from a10 to a20:
1. Debit $3k from a10
2. Credit $3k to a20

Before:
Acct  Balance
a10   20,000
a20   15,000

After:
Acct  Balance
a10   17,000
a20   18,000

Written naively, in which states is atomicity preserved?
• Crash before 1,
• After 1 but before 2,
• After 2.

DB always preserves atomicity!
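The transfer example can be tried against a real engine. This sketch uses Python's built-in `sqlite3` module and its connection context manager, which commits when the block succeeds and rolls back when an exception escapes it. The table name `Accounts`, the `transfer` helper, and the `crash_after_debit` flag are hypothetical, and a raised exception only stands in for a crash between steps 1 and 2; it is not a real process failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (acct TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)",
                 [("a10", 20000), ("a20", 15000)])
conn.commit()

def transfer(conn, src, dst, amount, crash_after_debit=False):
    """Debit src and credit dst as one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE acct = ?",
                     (amount, src))
        if crash_after_debit:
            raise RuntimeError("simulated failure between debit and credit")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE acct = ?",
                     (amount, dst))

def balances(conn):
    return dict(conn.execute("SELECT acct, balance FROM Accounts ORDER BY acct"))

# Failure after step 1 but before step 2: the debit is rolled back.
try:
    transfer(conn, "a10", "a20", 3000, crash_after_debit=True)
except RuntimeError:
    pass
after_failure = balances(conn)

# A complete run: both changes become visible together.
transfer(conn, "a10", "a20", 3000)
after_commit = balances(conn)

print(after_failure, after_commit)
```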


ACID: Consistency

• The tables must always satisfy user-specified integrity constraints
  • Examples:
    • Account number is unique
    • Stock amount can’t be negative
    • Sum of debits and of credits is 0

• How consistency is achieved:
  • Programmer makes sure a txn takes a consistent state to a consistent state
  • System makes sure that the txn is atomic


ACID: Isolation

• A transaction executes concurrently with other transactions

• Isolation: the effect is as if each transaction executes in isolation of the others.

  • E.g. Should not be able to observe changes from other transactions during the run


Challenge: Scheduling Concurrent Transactions

• The DBMS ensures that the execution of {T1, …, Tn} is equivalent to some serial execution

A set of TXNs is isolated if their effect is as if all were executed serially

• One way to accomplish this: Locking
  • Before reading or writing, transaction requires a lock from DBMS, holds until the end

• Key Idea: If Ti wants to write to an item x and Tj wants to read x, then Ti, Tj conflict. Solution via locking:
  • only one winner gets the lock
  • loser is blocked (waits) until winner finishes

What if Ti and Tj need X and Y, and Ti asks for X before Tj, and Tj asks for Y before Ti? -> Deadlock! One is aborted…

All concurrency issues handled by the DBMS…
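The slides do not say how a DBMS notices a deadlock. One common textbook technique, used here purely as an illustration, is to keep a waits-for graph (an edge Ti → Tj means Ti is blocked on a lock held by Tj) and look for a cycle. The function name and the graph encoding below are assumptions of this sketch.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a waits-for cycle, or None."""
    visited, on_path = set(), []

    def dfs(txn):
        if txn in on_path:                      # reached a txn already on the path
            return on_path[on_path.index(txn):]
        if txn in visited:                      # explored before, no cycle through it
            return None
        visited.add(txn)
        on_path.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = dfs(other)
            if cycle:
                return cycle
        on_path.pop()
        return None

    for txn in waits_for:
        cycle = dfs(txn)
        if cycle:
            return cycle
    return None

# Ti holds X and waits for Y; Tj holds Y and waits for X.
deadlocked = find_deadlock({"Ti": ["Tj"], "Tj": ["Ti"]})

# Tj only waits for nothing: Ti is blocked for a while, but there is no deadlock.
no_deadlock = find_deadlock({"Ti": ["Tj"], "Tj": []})

print(deadlocked, no_deadlock)
```

Once a cycle is found, aborting any one transaction on it releases its locks and lets the others proceed.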


ACID: Durability

• The effect of a TXN must continue to exist (“persist”) after the TXN
  • And after the whole program has terminated
  • And even if there are power failures, crashes, etc.
  • And etc…

• Means: Write data to disk


Ensuring Atomicity & Durability

• DBMS ensures atomicity even if a TXN crashes!

• One way to accomplish this: Write-ahead logging (WAL)

• Key Idea: Keep a log of all the writes done.
  • After a crash, the partially executed TXNs are undone using the log

Write-ahead Logging (WAL): Before any action is finalized, a corresponding log entry is forced to disk

We assume that the log is on “stable” storage

All atomicity issues also handled by the DBMS…
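The key idea, log first and undo after a crash, fits in a toy model. The sketch below is a deliberate simplification and not how a real DBMS implements WAL: `TinyDB` is a hypothetical class, an in-memory list stands in for the log on stable storage, a dictionary stands in for the data pages, and recovery only undoes uncommitted writes.

```python
class TinyDB:
    """Toy key-value store with an undo log (illustration only)."""

    def __init__(self):
        self.data = {}   # stand-in for the database pages
        self.log = []    # stand-in for the log on stable storage

    def write(self, txn, key, value):
        # WAL rule: record the old value in the log BEFORE changing the data.
        self.log.append(("WRITE", txn, key, self.data.get(key)))
        self.data[key] = value

    def commit(self, txn):
        self.log.append(("COMMIT", txn))

    def recover(self):
        """Undo, newest first, every write of a txn that never committed."""
        committed = {rec[1] for rec in self.log if rec[0] == "COMMIT"}
        for rec in reversed(self.log):
            if rec[0] == "WRITE" and rec[1] not in committed:
                _, _, key, old = rec
                if old is None:
                    self.data.pop(key, None)   # the key did not exist before
                else:
                    self.data[key] = old

db = TinyDB()
db.write("T0", "a10", 20000)
db.write("T0", "a20", 15000)
db.commit("T0")

# T1 debits a10, then the system crashes before the credit and the commit.
db.write("T1", "a10", 17000)
before_recovery = dict(db.data)

db.recover()
after_recovery = dict(db.data)
print(before_recovery, after_recovery)
```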


Challenges for ACID properties

• In spite of failures: Power failures, but not media failures

• Users may abort the program: need to “roll back the changes”
  • Need to log what happened

• Many users executing concurrently
  • Can be solved via locking (we’ll see this next lecture!)

And all this with… Performance!!