lecture 4 ra - github pages...•need to write c program, use a graph engine, or modern sql… name1...

40
CS639: Data Management for Data Science Lecture 4: Relational Algebra Theodoros Rekatsinas 1

Upload: others

Post on 18-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

  • CS639:DataManagementfor

    DataScienceLecture4:RelationalAlgebra

    TheodorosRekatsinas1

  • 2

    Announcements• Assignment1

    • HintsandGrading

  • Today’sLecture

    1. TheRelationalModel&RelationalAlgebra

    2. RelationalAlgebraPt.II

    3

  • 1.TheRelationalModel&RelationalAlgebra

    4

  • Whatyouwilllearnaboutinthissection

    1. TheRelationalModel

    2. RelationalAlgebra:BasicOperators

    5

  • 6

    Levelsofabstraction

  • Motivation

    TheRelationalmodelisprecise,implementable,andwecanoperateonit

    (query/update,etc.)

    Databasemapsinternallyintothisprocedurallanguage.

  • TheRelationalModel:Schemata

    • RelationalSchema:

    Students(sid: string, name: string, gpa: float)

    AttributesString,float,int,etc.arethedomains oftheattributes

    Relationname

  • 9

    TheRelationalModel:Data

    sid name gpa

    001 Bob 3.2

    002 Joe 2.8

    003 Mary 3.8

    004 Alice 3.5

    Student

    Anattribute (orcolumn)isatypeddataentrypresentineachtupleintherelation

    Thenumberofattributesisthearity oftherelation

  • 10

    TheRelationalModel:Data

    sid name gpa

    001 Bob 3.2

    002 Joe 2.8

    003 Mary 3.8

    004 Alice 3.5

    Student

    Atuple orrow (orrecord)isasingleentryinthetablehavingtheattributesspecifiedbytheschema

    Thenumberoftuplesisthecardinality oftherelation

  • 11

    TheRelationalModel:DataStudent

    Arelationalinstance isaset oftuplesallconformingtothesameschema

    InpracticeDBMSsrelaxthesetrequirement,andusemultisets.

    sid name gpa

    001 Bob 3.2

    002 Joe 2.8

    003 Mary 3.8

    004 Alice 3.5

  • • Arelationalschema describesthedatathatiscontainedinarelationalinstance

    ToReiterate

    LetR(f1:Dom1,…,fm:Domm)bearelationalschema then,aninstanceofRisasubsetofDom1 xDom2 x…xDomn

    Inthisway,arelationalschema Risatotalfunctionfromattributenames totypes

  • • Arelationalschema describesthedatathatiscontainedinarelationalinstance

    OneMoreTime

    ArelationRofarity t isafunction:R:Dom1 x…xDomt à {0,1}

    Then,theschemaissimplythesignatureofthefunction

    I.e.returnswhetherornotatupleofmatchingtypesisamemberofit

    Noteherethatordermatters,attributenamedoesn’t…We’ll(mostly)workwiththeothermodel(lastslide)in

    whichattributenamematters,orderdoesn’t!

  • Arelationaldatabase

    • Arelationaldatabaseschema isasetofrelationalschemata,oneforeachrelation

    • Arelationaldatabaseinstance isasetofrelationalinstances,oneforeachrelation

    Twoconventions:1. Wecallrelationaldatabaseinstancesassimplydatabases2. Weassumeallinstancesarevalid,i.e.,satisfythedomainconstraints

  • RemembertheCMS

    • RelationDBSchema• Students(sid:string,name:string,gpa:float)• Courses(cid:string,cname:string,credits:int)• Enrolled(sid:string,cid:string,grade:string)

    Sid Name Gpa101 Bob 3.2123 Mary 3.8

    Students

    cid cname credits564 564-2 4308 417 2

    Coursessid cid Grade123 564 A

    Enrolled

    RelationInstances

    15

    Notethattheschemasimposeeffectivedomain/typeconstraints,i.e.Gpacan’tbe“Apple”

  • 2nd PartoftheModel:Querying

    “FindnamesofallstudentswithGPA>3.5”

    Wedon’ttellthesystem howorwhere togetthedata- justwhatwewant,i.e.,Queryingisdeclarative

    SELECT S.nameFROM Students SWHERE S.gpa > 3.5;

    Tomakethishappen,weneedtotranslatethedeclarativequeryintoaseriesofoperators…we’llseethisnext!

  • Virtuesofthemodel

    • Physicalindependence(logicaltoo),Declarative

    • Simple,elegantclean:Everythingisarelation

  • RelationalAlgebra

  • RDBMSArchitecture

    HowdoesaSQLenginework?

    SQLQuery

    RelationalAlgebra(RA)

    Plan

    OptimizedRAPlan Execution

    Declarativequery(fromuser)

    Translatetorelationalalgebraexpression

    Findlogicallyequivalent- butmoreefficient- RAexpression

    Executeeachoperatoroftheoptimizedplan!

  • RDBMSArchitecture

    HowdoesaSQLenginework?

    SQLQuery

    RelationalAlgebra(RA)

    Plan

    OptimizedRAPlan Execution

    RelationalAlgebraallowsustotranslatedeclarative(SQL)queriesintopreciseandoptimizableexpressions!

  • • Fivebasicoperators:1. Selection: s2. Projection:P3. CartesianProduct:´4. Union:È5. Difference:-

    • Derivedorauxiliaryoperators:• Intersection,complement• Joins(natural,equi-join,thetajoin,semi-join)• Renaming: r• Division

    RelationalAlgebra(RA)

  • Keepinmind:RAoperatesonsets!

    • RDBMSsusemultisets,howeverinrelationalalgebraformalismwewillconsidersets!

    • Also:wewillconsiderthenamedperspective,whereeveryattributemusthaveauniquename• àattributeorderdoesnotmatter…

    NowontothebasicRAoperators…

  • • Returnsalltupleswhichsatisfyacondition• Notation: sc(R)• Examples• sSalary >40000 (Employee)• sname =“Smith” (Employee)

    • Theconditionccanbe=,,³,

    1.Selection(𝜎)

    SELECT *FROM StudentsWHERE gpa > 3.5;

    SQL:

    RA:𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

    Students(sid,sname,gpa)

  • sSalary >40000 (Employee)

    SSN Name Salary1234545 John 2000005423341 Smith 6000004352342 Fred 500000

    SSN Name Salary5423341 Smith 6000004352342 Fred 500000

    Anotherexample:

  • • Eliminatescolumns,thenremovesduplicates• Notation:P A1,…,An (R)• Example:projectsocial-securitynumberandnames:• P SSN,Name (Employee)• Outputschema:Answer(SSN,Name)

    2.Projection(Π)

    SELECT DISTINCTsname,gpa

    FROM Students;

    SQL:

    RA:Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

    Students(sid,sname,gpa)

  • P Name,Salary (Employee)

    SSN Name Salary1234545 John 2000005423341 John 6000004352342 John 200000

    Name SalaryJohn 200000John 600000

    Anotherexample:

  • NotethatRAOperatorsareCompositional!

    SELECT DISTINCTsname,gpa

    FROM StudentsWHERE gpa > 3.5;

    Students(sid,sname,gpa)

    HowdowerepresentthisqueryinRA?

    Π45$67,"#$(𝜎"#$&'.)(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

    𝜎"#$&'.)(Π45$67,"#$(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠))

    Aretheselogicallyequivalent?

  • • EachtupleinR1witheachtupleinR2• Notation:R1´ R2• Example:• Employee´ Dependents

    • Rareinpractice;mainlyusedtoexpressjoins

    3.Cross-Product(×)

    SELECT *FROM Students, People;

    SQL:

    RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

    Students(sid,sname,gpa)People(ssn,pname,address)

  • ssn pname address1234545 John 216 Rosse

    5423341 Bob 217 Rosse

    sid sname gpa001 John 3.4

    002 Bob 1.3

    𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠×𝑃𝑒𝑜𝑝𝑙𝑒

    ×

    ssn pname address sid sname gpa1234545 John 216 Rosse 001 John 3.4

    5423341 Bob 217 Rosse 001 John 3.4

    1234545 John 216 Rosse 002 Bob 1.3

    5423341 Bob 216 Rosse 002 Bob 1.3

    People StudentsAnotherexample:

  • • Changestheschema,nottheinstance• A‘special’operator- neitherbasicnorderived• Notation:r B1,…,Bn (R)

    • Note:thisisshorthandfortheproperform(sincenames,notordermatters!):• r A1àB1,…,AnàBn (R)

    Renaming(𝜌)

    SELECTsid AS studId,sname AS name,gpa AS gradePtAvg

    FROM Students;

    SQL:

    RA:𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

    Students(sid,sname,gpa)

    Wecareaboutthisoperatorbecause weareworkinginanamedperspective

  • sid sname gpa001 John 3.4

    002 Bob 1.3

    𝜌4?@ABA,5$67,"C$A7D?EF"(𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠)

    Students

    studId name gradePtAvg001 John 3.4

    002 Bob 1.3

    Students

    Anotherexample:

  • • Notation:R1⋈R2

    • JoinsR1 andR2 onequalityofallsharedattributes• IfR1 hasattributesetA,andR2 hasattributesetB,andtheyshareattributesA⋂B=C,canalsobewritten:R1⋈ 𝐶R2

    • OurfirstexampleofaderivedRA operator:• Meaning:R1⋈ R2 =PAUB(sC=D(𝜌J→L(R1)´ R2))• Where:

    • Therename𝜌J→L renamesthesharedattributesinoneoftherelations• TheselectionsC=Dchecksequalityofthesharedattributes• TheprojectionPAUBeliminatestheduplicatecommonattributes

    NaturalJoin(⋈)

    SELECT DISTINCTssid, S.name, gpa,ssn, address

    FROM Students S,People P

    WHERE S.name = P.name;

    SQL:

    RA:𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

    Students(sid,name,gpa)People(ssn,name,address)

  • ssn P.name address1234545 John 216 Rosse

    5423341 Bob 217 Rosse

    sid S.name gpa001 John 3.4

    002 Bob 1.3

    𝑆𝑡𝑢𝑑𝑒𝑛𝑡𝑠 ⋈ 𝑃𝑒𝑜𝑝𝑙𝑒

    sid S.name gpa ssn address001 John 3.4 1234545 216 Rosse

    002 Bob 1.3 5423341 216 Rosse

    PeoplePStudentsSAnotherexample:

  • NaturalJoin

    • GivenschemasR(A,B,C,D),S(A,C,E),whatistheschemaofR⋈S?

    • GivenR(A,B,C),S(D,E),whatisR⋈S?

    • GivenR(A,B),S(A,B),whatisR⋈S?

  • Example:ConvertingSQLQuery->RA

    SELECT DISTINCTgpa,address

    FROM Students S,People P

    WHERE gpa > 3.5 ANDsname = pname;

    Π"#$,$AAC744(𝜎"#$&'.)(𝑆 ⋈ 𝑃))

    Students(sid,sname,gpa)People(ssn,sname,address)

  • LogicalEquivalenceofRAPlans

    • GivenrelationsR(A,B)andS(B,C):

    • Here,projection&selectioncommute:• 𝜎EM)(ΠE(𝑅)) = ΠE(𝜎EM)(𝑅))

    • Whatabouthere?• 𝜎EM)(ΠP(𝑅))?= ΠP(𝜎EM)(𝑅))

  • 1.Union(È) and2.Difference(–)

    • R1È R2• Example:• ActiveEmployeesÈ RetiredEmployees

    • R1– R2• Example:• AllEmployees -- RetiredEmployees

    R1 R2

    R1 R2

  • WhataboutIntersection(Ç) ?

    • Itisaderivedoperator• R1Ç R2=R1– (R1– R2)• Alsoexpressedasajoin!• Example

    • UnionizedEmployeesÇ RetiredEmployees

    R1 R2

  • RAExpressionsCanGetComplex!

    PersonPurchasePersonProduct

    sname=fred sname=gizmo

    P pidP ssn

    seller-ssn=ssn

    pid=pid

    buyer-ssn=ssn

    P name

  • RAhasLimitations!

    • Cannotcompute“transitiveclosure”

    • FindalldirectandindirectrelativesofFred• CannotexpressinRA!!!

    • NeedtowriteCprogram,useagraphengine,ormodernSQL…

    Name1 Name2 RelationshipFred Mary FatherMary Joe CousinMary Bill SpouseNancy Lou Sister