
Post on 20-Dec-2015


Lecture 4: Relational algebra

www.cl.cam.ac.uk/Teaching/current/Databases/


Today’s lecture

• What’s the (core) relational algebra?

• How can we write queries using the relational algebra?

• How powerful is the relational algebra?


Relational query languages

• Query languages allow the manipulation and retrieval of data from a database

• The relational model supports simple, powerful query languages
  – Strong formal foundation
  – Allows for much (provably correct) optimisation

• NOTE: Query languages are not (necessarily) programming languages


Formal relational query languages

• Two formal query languages
  1. Relational Algebra
     • Simple ‘operational’ model, useful for expressing execution plans
  2. Relational Calculus
     • Logical model (‘declarative’), useful for theoretical results

• Both languages were introduced by Codd in a series of papers

• They have equivalent expressive power

They are the key to understanding SQL query processing!


Preliminaries

• A query is applied to relation instances, and the result of a query is also a relation instance
  – Schemas of relations are fixed (cf. types)
  – The query will then execute over any valid instance
  – The schema of the result can also be determined


Example relation instances

• A database of boats, sailors, and reservations

R1
    sid  bid  day
    22   101  101001
    99   103  111201

S1
    sid  sname  rating  age
    11   Sue    7       26
    22   Tim    8       26
    33   Bob    9       28
    55   Kim    10      28

S2
    sid  sname    rating  age
    10   Myleene  6       23
    22   Tim      8       26
    99   Julia    100     20
    88   Gavin    100     21

B1
    bid  colour
    101  red
    102  blue
    103  green


Core relational algebra

• Five basic operator classes:
  1. Selection
     • Selects a subset of rows
  2. Projection
     • Picking certain columns
  3. Renaming
     • Renaming attributes
  4. Set-theoretic operations
     • The familiar operations: union, intersection, difference, …
  5. Products and joins
     • Combining relations in useful ways


Selection

• Selects rows that satisfy a condition, written

    R1 = σ_c(R2)

• where c is a condition involving the attributes of R2, e.g.

    σ_{rating>8}(S2)

  returns the relation instance

    sid  sname  rating  age
    99   Julia  100     20
    88   Gavin  100     21


Selection cont.

• Note:
  1. The schema of the result is exactly the same as the schema of the input relation instance
  2. There are no duplicates in the resulting relation instance (why?)
  3. The resulting relation instance can be used as the input for another relational algebra operator, e.g.

    σ_{sname=“Julia”}(σ_{rating>8}(S2))
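Selection can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the lecture: a relation is modelled as a (schema, rows) pair, rows is a Python set of tuples (so duplicates cannot occur), and the name select is hypothetical.

```python
# Assumed representation (not from the slides): schema is a tuple of field
# names, rows is a set of tuples, so duplicate rows cannot exist.
S2_SCHEMA = ("sid", "sname", "rating", "age")
S2_ROWS = {
    (10, "Myleene", 6, 23),
    (22, "Tim", 8, 26),
    (99, "Julia", 100, 20),
    (88, "Gavin", 100, 21),
}


def select(schema, rows, condition):
    """Selection: keep the rows for which condition(row_as_dict) holds."""
    kept = {row for row in rows if condition(dict(zip(schema, row)))}
    return schema, kept  # the schema is unchanged


# sigma_{rating>8}(S2)
schema, high_rated = select(S2_SCHEMA, S2_ROWS, lambda r: r["rating"] > 8)

# sigma_{sname="Julia"}(sigma_{rating>8}(S2)): the output feeds the next operator
_, julia = select(schema, high_rated, lambda r: r["sname"] == "Julia")
```

Because the input is already a set and selection only discards rows, the result cannot contain duplicates.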


Projection

Deletes fields that are not in the projection list

    R1 = π_A(R2)

where A is a list of attributes from the schema of R2, e.g.

    π_{sname,rating}(S2)

returns the relation instance

    sname    rating
    Myleene  6
    Tim      8
    Julia    100
    Gavin    100


Projection cont.

• Note:
  1. The projection operator has to eliminate duplicates (why?)
  2. Aside: Real systems don’t normally perform duplicate elimination unless the user explicitly asks for it (why not?)
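A small Python sketch may help with the first "why?". It assumes, purely for illustration, that a relation is a (schema, rows) pair with rows held in a set; the function name project is hypothetical.

```python
# Assumed representation (not from the slides): rows are a set of tuples.
S2_SCHEMA = ("sid", "sname", "rating", "age")
S2_ROWS = {
    (10, "Myleene", 6, 23),
    (22, "Tim", 8, 26),
    (99, "Julia", 100, 20),
    (88, "Gavin", 100, 21),
}


def project(schema, rows, attrs):
    """Projection: keep only the listed fields, in the listed order."""
    positions = [schema.index(a) for a in attrs]
    # Building a set merges rows that became identical once the other
    # fields were dropped; this is the duplicate elimination step.
    return tuple(attrs), {tuple(row[p] for p in positions) for row in rows}


# pi_{sname,rating}(S2): all four rows stay distinct
_, names_ratings = project(S2_SCHEMA, S2_ROWS, ["sname", "rating"])

# pi_{rating}(S2): Julia and Gavin both project to (100,), so only 3 rows remain
_, ratings = project(S2_SCHEMA, S2_ROWS, ["rating"])
```

Dropping a key field such as sid can make previously distinct rows identical, so the result is only a set if those copies are merged. Detecting them generally needs sorting or hashing, which is one plausible reason real systems skip the step unless asked.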


Renaming

    R1 = ρ_{A:=B}(R2)

• Returns a relation instance identical to R2 except that field A is renamed B
• For example, ρ_{sname:=nom}(S1)

    sid  nom  rating  age
    11   Sue  7       26
    22   Tim  8       26
    33   Bob  9       28
    55   Kim  10      28
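As a rough Python illustration, renaming touches only the schema and leaves the rows alone. The (schema, rows) representation and the function name rename are assumptions made for this sketch.

```python
# Assumed representation (not from the slides): schema tuple plus a set of rows.
S1_SCHEMA = ("sid", "sname", "rating", "age")
S1_ROWS = {
    (11, "Sue", 7, 26),
    (22, "Tim", 8, 26),
    (33, "Bob", 9, 28),
    (55, "Kim", 10, 28),
}


def rename(schema, rows, old, new):
    """Renaming: field `old` is called `new` in the result; rows are untouched."""
    if old not in schema or new in schema:
        raise ValueError("old field must exist and new name must be unused")
    return tuple(new if name == old else name for name in schema), rows


# rho_{sname:=nom}(S1)
renamed_schema, renamed_rows = rename(S1_SCHEMA, S1_ROWS, "sname", "nom")
```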


Familiar set operations

• We have the familiar set-theoretic operators, e.g. ∪, ∩, -
• There is a restriction on their input relation instances: they must be union compatible
  – Same number of fields
  – Same field names and domains
• E.g. S1 ∪ S2 is valid, but S1 ∪ R1 is not
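The union-compatibility check can be sketched in Python as follows. The (schema, rows) representation and the helper names are assumptions for illustration. The check compares field names only, because this toy representation does not record domains.

```python
# Assumed representation (not from the slides): (schema, set of rows) pairs.
S1 = (("sid", "sname", "rating", "age"),
      {(11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28), (55, "Kim", 10, 28)})
S2 = (("sid", "sname", "rating", "age"),
      {(10, "Myleene", 6, 23), (22, "Tim", 8, 26), (99, "Julia", 100, 20), (88, "Gavin", 100, 21)})
R1 = (("sid", "bid", "day"), {(22, 101, 101001), (99, 103, 111201)})


def check_union_compatible(a, b):
    """Simplified check: same field names in the same order (domains not modelled)."""
    if a[0] != b[0]:
        raise ValueError(f"not union compatible: {a[0]} vs {b[0]}")


def union(a, b):
    check_union_compatible(a, b)
    return a[0], a[1] | b[1]


def intersection(a, b):
    check_union_compatible(a, b)
    return a[0], a[1] & b[1]


def difference(a, b):
    check_union_compatible(a, b)
    return a[0], a[1] - b[1]


_, s1_or_s2 = union(S1, S2)          # Tim appears in both inputs but only once here
_, s1_and_s2 = intersection(S1, S2)  # only Tim's row
_, s1_not_s2 = difference(S1, S2)    # Sue, Bob, Kim
```

Calling union(S1, R1) raises ValueError in this sketch, mirroring the statement that the two are not union compatible.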


Cartesian products

    A × B

• Concatenate every row of A with every row of B
• What do we do if A and B have some field names in common?
  – Several choices, but we’ll simply assume that the resulting duplicate field names will have the suffixes .1 and .2


Example

    S1 × R1

    sid.1  sname  rating  age  sid.2  bid  day
    11     Sue    7       26   22     101  101001
    11     Sue    7       26   99     103  111201
    22     Tim    8       26   22     101  101001
    22     Tim    8       26   99     103  111201
    33     Bob    9       28   22     101  101001
    33     Bob    9       28   99     103  111201
    55     Kim    10      28   22     101  101001
    55     Kim    10      28   99     103  111201

Note the field names sid.1 and sid.2!
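The product and its naming convention can be sketched in Python as below. The (schema, rows) representation and the function name product are assumptions for illustration; the .1/.2 suffixes follow the convention used in the example.

```python
# Assumed representation (not from the slides): (schema, set of rows) pairs.
S1 = (("sid", "sname", "rating", "age"),
      {(11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28), (55, "Kim", 10, 28)})
R1 = (("sid", "bid", "day"), {(22, 101, 101001), (99, 103, 111201)})


def product(a, b):
    """Cartesian product: every row of a concatenated with every row of b."""
    (schema_a, rows_a), (schema_b, rows_b) = a, b
    common = set(schema_a) & set(schema_b)
    # Field names shared by both inputs get the suffix .1 (left) or .2 (right).
    schema = tuple(f"{n}.1" if n in common else n for n in schema_a) + tuple(
        f"{n}.2" if n in common else n for n in schema_b
    )
    rows = {x + y for x in rows_a for y in rows_b}
    return schema, rows


schema, rows = product(S1, R1)  # 4 sailors x 2 reservations = 8 rows
```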


Theta join

• Theoretically, it is a derived operator

    R1 ⋈_c R2 ≝ σ_c(R1 × R2)

• E.g., S1 ⋈_{sid.1<=sid.2} R1

    sid.1  sname  rating  age  sid.2  bid  day
    11     Sue    7       26   22     101  101001
    11     Sue    7       26   99     103  111201
    22     Tim    8       26   22     101  101001
    22     Tim    8       26   99     103  111201
    33     Bob    9       28   99     103  111201
    55     Kim    10      28   99     103  111201


Theta join cont.

1. The result schema is the same as for a cross-product

2. Sometimes this operator is called a conditional join

3. Most commonly the condition is an equality on field names, e.g. S1 ⋈_{sid.1=sid.2} R1
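Since theta join is derived, a Python sketch can define it literally as a selection over a product. The (schema, rows) representation and the function names are assumptions for illustration, not a description of how a real engine evaluates joins.

```python
# Assumed representation (not from the slides): (schema, set of rows) pairs.
S1 = (("sid", "sname", "rating", "age"),
      {(11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28), (55, "Kim", 10, 28)})
R1 = (("sid", "bid", "day"), {(22, 101, 101001), (99, 103, 111201)})


def product(a, b):
    """Cartesian product; shared field names get the suffixes .1 and .2."""
    (schema_a, rows_a), (schema_b, rows_b) = a, b
    common = set(schema_a) & set(schema_b)
    schema = tuple(f"{n}.1" if n in common else n for n in schema_a) + tuple(
        f"{n}.2" if n in common else n for n in schema_b
    )
    return schema, {x + y for x in rows_a for y in rows_b}


def theta_join(a, b, condition):
    """Theta join defined as a selection applied to the Cartesian product."""
    schema, rows = product(a, b)
    return schema, {r for r in rows if condition(dict(zip(schema, r)))}


# S1 join_{sid.1<=sid.2} R1
schema, le_rows = theta_join(S1, R1, lambda r: r["sid.1"] <= r["sid.2"])

# S1 join_{sid.1=sid.2} R1
_, eq_rows = theta_join(S1, R1, lambda r: r["sid.1"] == r["sid.2"])
```

The result schema is the product's schema, as stated in point 1; only the set of rows shrinks.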


Equi- and natural join

• Equi-join is a special case of theta join where the condition is equality of field names, e.g. S1 ⋈_{sid} R1
• Natural join is an equi-join on all common fields where the duplicate fields are removed. It is written simply A ⋈ B

    sid.1  sname  rating  age  sid.2  bid  day
    22     Tim    8       26   22     101  101001


Natural join cont.

• Note that the common fields appear only once in the resulting relation instance

• This operator appears very frequently in real-life queries

• It is always implemented directly by the query engine (why?)
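A natural join can be sketched in Python without materialising the full product. The (schema, rows) representation and the name natural_join are assumptions for illustration. The nested loop is the simplest possible strategy, not what a real engine necessarily does.

```python
# Assumed representation (not from the slides): (schema, set of rows) pairs.
S1 = (("sid", "sname", "rating", "age"),
      {(11, "Sue", 7, 26), (22, "Tim", 8, 26), (33, "Bob", 9, 28), (55, "Kim", 10, 28)})
R1 = (("sid", "bid", "day"), {(22, 101, 101001), (99, 103, 111201)})


def natural_join(a, b):
    """Natural join: match on all common fields and keep each of them once."""
    (schema_a, rows_a), (schema_b, rows_b) = a, b
    common = [n for n in schema_a if n in schema_b]
    extra = [n for n in schema_b if n not in schema_a]
    rows = set()
    for x in rows_a:
        dx = dict(zip(schema_a, x))
        for y in rows_b:
            dy = dict(zip(schema_b, y))
            if all(dx[n] == dy[n] for n in common):
                # Append only the fields of b that a does not already have.
                rows.add(x + tuple(dy[n] for n in extra))
    return schema_a + tuple(extra), rows


schema, rows = natural_join(S1, R1)
```

With S1 and R1 the only match is Tim (sid 22), and sid appears once in the result schema.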


Division

• Not a primitive operator, but useful to express queries such as

    Find sailors who have reserved all the boats

• Consider the simple case, where relation A has fields x and y, and relation B has field y
• A/B is the set of xs (sailors) such that for every y (boat) in B, there is a row (x,y) in A


Division cont.

• Can you code this up in the relational algebra?

x’s that are disqualified: π_x((π_x(A) × B) - A)

Thus: π_x(A) - π_x((π_x(A) × B) - A)
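The formula can be checked step by step with a short Python sketch. The data below is hypothetical (it is not one of the lecture's relation instances), and A and B are modelled simply as sets of tuples; the name divide is also an assumption.

```python
def divide(a, b):
    """A/B for A over (x, y) and B over (y,), following the formula step by step."""
    xs = {x for x, _ in a}                          # pi_x(A)
    all_pairs = {(x, y) for x in xs for (y,) in b}  # pi_x(A) x B
    disqualified = {x for x, _ in all_pairs - a}    # pi_x((pi_x(A) x B) - A)
    return xs - disqualified                        # pi_x(A) - disqualified


# Hypothetical data: (sid, bid) reservations and the set of all boats.
reservations = {(22, 101), (22, 102), (22, 103), (99, 103)}
boats = {(101,), (102,), (103,)}

reserved_all = divide(reservations, boats)
```

Sailor 99 is disqualified because the pairs (99, 101) and (99, 102) are in the product but missing from the reservations, so only sailor 22 remains.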


Example 1

Find names of sailors who’ve reserved boat 103

Solution 1: π_{sname}(σ_{bid=103}(Reserves) ⋈ Sailors)

Solution 2: π_{sname}(σ_{bid=103}(Reserves ⋈ Sailors))

Which is more efficient?

Query optimisation
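A Python sketch can make the difference between the two solutions concrete. It assumes, for illustration only, that Reserves is the instance R1 and Sailors is the instance S2, that relations are (schema, rows) pairs, and that the helper names are hypothetical.

```python
# Assumed instances (illustration only): Reserves = R1, Sailors = S2.
RESERVES = (("sid", "bid", "day"), {(22, 101, 101001), (99, 103, 111201)})
SAILORS = (("sid", "sname", "rating", "age"),
           {(10, "Myleene", 6, 23), (22, "Tim", 8, 26),
            (99, "Julia", 100, 20), (88, "Gavin", 100, 21)})


def select(rel, condition):
    schema, rows = rel
    return schema, {r for r in rows if condition(dict(zip(schema, r)))}


def project(rel, attrs):
    schema, rows = rel
    pos = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(r[p] for p in pos) for r in rows}


def natural_join(a, b):
    (sa, ra), (sb, rb) = a, b
    common = [n for n in sa if n in sb]
    extra = [n for n in sb if n not in sa]
    rows = set()
    for x in ra:
        dx = dict(zip(sa, x))
        for y in rb:
            dy = dict(zip(sb, y))
            if all(dx[n] == dy[n] for n in common):
                rows.add(x + tuple(dy[n] for n in extra))
    return sa + tuple(extra), rows


# Solution 1: select first, so the join only sees reservations for boat 103.
pushed_down = select(RESERVES, lambda r: r["bid"] == 103)
solution1 = project(natural_join(pushed_down, SAILORS), ["sname"])

# Solution 2: join everything, then select.
joined = natural_join(RESERVES, SAILORS)
solution2 = project(select(joined, lambda r: r["bid"] == 103), ["sname"])
```

Both expressions return the same relation, but Solution 1 feeds one row into the join where Solution 2 builds a two-row intermediate result first. On larger instances that gap is what makes pushing the selection down attractive.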


Example 2

Find names of sailors who’ve reserved a red boat

π_{sname}(σ_{colour=“red”}(Boats) ⋈ Reserves ⋈ Sailors)

Better: π_{sname}(π_{sid}(π_{bid}(σ_{colour=“red”}(Boats)) ⋈ Reserves) ⋈ Sailors)


Example 3

Find sailors who’ve reserved a red or a green boat

let T = σ_{colour=“red” ∨ colour=“green”}(Boats)

in π_{sname}(T ⋈ Reserves ⋈ Sailors)


Example 4

Find sailors who’ve reserved a red and a green boat

let T1 = π_{sid}(σ_{colour=“red”}(Boats) ⋈ Reserves)

    T2 = π_{sid}(σ_{colour=“green”}(Boats) ⋈ Reserves)

in π_{sname}((T1 ∩ T2) ⋈ Sailors)

NOTE: Can’t just trivially modify last solution!
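The following Python sketch illustrates why the "or" solution cannot simply be turned into an "and" solution by swapping the connective. The reservation data is hypothetical, relations are reduced to plain sets of tuples, and sids_for is a made-up helper.

```python
# Boats as in B1; the (sid, bid) reservations are hypothetical.
boats = {(101, "red"), (102, "blue"), (103, "green")}
reserves = {(22, 101), (22, 103), (99, 103)}


def sids_for(colour):
    """pi_sid(sigma_{colour=...}(Boats) join Reserves), written with comprehensions."""
    bids = {bid for bid, c in boats if c == colour}
    return {sid for sid, bid in reserves if bid in bids}


# Example 3: red OR green, equivalent to one selection with a disjunction.
red_or_green = sids_for("red") | sids_for("green")

# Example 4: red AND green needs an intersection of two separate results.
red_and_green = sids_for("red") & sids_for("green")

# The trivial modification: no single boat is both red and green.
naive = {bid for bid, c in boats if c == "red" and c == "green"}
```

The selection condition is evaluated one row at a time, and no single Boats row has two colours, so the naive version selects nothing. The "and" in the query ranges over two different reservations, which is why two separate sets of sids are intersected.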


Example 5

Find the names of sailors who’ve reserved at least two boats

let T = ρ_{sid.1:=sid}(π_{sid.1,sname,bid}(Sailors ⋈ Reserves))

in

π_{sname.1}(σ_{sid.1=sid.2 ∧ bid.1≠bid.2}(T × T))
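The self-product trick can be tried out in Python. The instance of T below is hypothetical, and the comprehension stands in for the selection over T × T followed by the projection onto the sailor name.

```python
# Hypothetical instance of T over (sid, sname, bid).
T = {
    (22, "Tim", 101),
    (22, "Tim", 103),
    (99, "Julia", 103),
}

# Pair every row of T with every row of T, keep pairs that describe the same
# sailor with two different boats, then project onto the name.
at_least_two = {
    name1
    for (sid1, name1, bid1) in T
    for (sid2, name2, bid2) in T
    if sid1 == sid2 and bid1 != bid2
}
```

Julia has a single reservation, so every pairing of her row with itself fails the bid inequality, and only Tim is returned.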


Example 6

Find the names of sailors who’ve reserved all boats

let T = π_{sid,bid}(Reserves) / π_{bid}(Boats)

in π_{sname}(T ⋈ Sailors)


Computational limitations

• Suppose we have a relation SequelOf of movies and their immediate sequels

• We want to compute the relation ‘isFollowedBy’ …

    movie         sequel
    Naked Gun     Naked Gun 2½
    Naked Gun 2½  Naked Gun 33⅓
    Rocky         Rocky II
    Rocky II      Rocky III
    Rocky III     Rocky IV
    Rocky IV      Rocky V


Computational limitations

• We could compute

    π_{fst,thd}(ρ_{movie:=fst,sequel:=snd}(SequelOf) ⋈ ρ_{movie:=snd,sequel:=thd}(SequelOf))

• This provides us with sequels-of-sequels
• We could write three joins to get sequels-of-sequels-of-sequels and union the results
• What about Friday the 13th (9 sequels)?
• In general we need to be able to write an arbitrarily large union…
• The relational algebra needs to be extended to handle these sorts of queries
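What the algebra is missing can be seen in a Python sketch that computes isFollowedBy by repeating the join-and-union step until nothing new appears. The loop is exactly the part that a fixed relational algebra expression cannot express. The function names are assumptions for illustration, and relations are plain sets of (movie, sequel) pairs.

```python
SEQUEL_OF = {
    ("Naked Gun", "Naked Gun 2½"),
    ("Naked Gun 2½", "Naked Gun 33⅓"),
    ("Rocky", "Rocky II"),
    ("Rocky II", "Rocky III"),
    ("Rocky III", "Rocky IV"),
    ("Rocky IV", "Rocky V"),
}


def sequels_of_sequels(pairs):
    """One join of the relation with itself, projected onto (first, third)."""
    return {(a, c) for (a, b) in pairs for (b2, c) in pairs if b == b2}


def is_followed_by(pairs):
    """Transitive closure: keep joining and unioning until a fixed point."""
    closure = set(pairs)
    while True:
        step = {(a, c) for (a, b) in closure for (b2, c) in pairs if b == b2}
        if step <= closure:  # nothing new, so the closure is complete
            return closure
        closure |= step


two_steps = sequels_of_sequels(SEQUEL_OF)
followed_by = is_followed_by(SEQUEL_OF)
```

The number of loop iterations depends on the data (the length of the longest chain of sequels), whereas any single algebra expression contains a fixed number of joins.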


Summary

You should now understand:

• The core relational algebra
  – Operations and semantics
  – Union compatibility

• Computational limitations of the relational algebra

Next lecture: Relational calculus