lecture 07: relational algebra

31
1 Lecture 07: Relational Algebra

Upload: karsen

Post on 05-Jan-2016

78 views

Category:

Documents


0 download

DESCRIPTION

Lecture 07: Relational Algebra. Outline. Relational Algebra (Section 6.1). Relational Algebra. Formalism for creating new relations from existing ones Its place in the big picture:. Declarative query language. Algebra. Implementation. Relational algebra. SQL, relational calculus. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Lecture 07: Relational Algebra

1

Lecture 07:Relational Algebra

Page 2: Lecture 07: Relational Algebra

2

Outline

• Relational Algebra (Section 6.1)

Page 3: Lecture 07: Relational Algebra

3

Relational Algebra

• Formalism for creating new relations from existing ones

• Its place in the big picture:

Declarativequery

language

Declarativequery

languageAlgebraAlgebra ImplementationImplementation

SQL,relational calculus

Relational algebra

Page 4: Lecture 07: Relational Algebra

4

Relational Algebra• Five operators:

– Union: – Difference: -– Selection:– Projection: – Cartesian Product:

• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:

Page 5: Lecture 07: Relational Algebra

5

1. Union and 2. Difference

• R1 R2Example: – ActiveEmployees RetiredEmployees

• R1 – R2Example:– AllEmployees − RetiredEmployees

Page 6: Lecture 07: Relational Algebra

6

What about Intersection ?

• It is a derived operatorR1 R2 = R1 – (R1 – R2)

• Also expressed as a join (will see later)Example– UnionizedEmployees RetiredEmployees

Page 7: Lecture 07: Relational Algebra

7

3. Selection

• Returns all tuples which satisfy a condition

• Notation: c(R)

• Examples– Salary > 40000 (Employee)

– name = “Smith” (Employee)

• The condition c can be =, <, , >, , <>

[in SQL: SELECT * FROM Employee WHERE Salary > 40000]

Page 8: Lecture 07: Relational Algebra

8

Selection Example

EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000

SSN Name DepartmentID Salary888888888 Alice 2 45,000

Find all employees with salary more than $40,000.Salary > 40000 (Employee)

Page 9: Lecture 07: Relational Algebra

9

4. Projection• Eliminates columns, then removes duplicates

• Notation: A1,…,An (R)

• Example: project to social-security number and names:– SSN, Name (Employee)

– Output schema: Answer(SSN, Name)

[In SQL: SELECT DISTINCT SSN, Name FROM Employee]

Page 10: Lecture 07: Relational Algebra

10

Projection Example

EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000

SSN Name999999999 John777777777 Tony888888888 Alice

SSN, Name (Employee)

Page 11: Lecture 07: Relational Algebra

11

5. Cartesian Product

• Combine each tuple in R1 with each tuple in R2• Notation: R1 R2• Example:

– Employee Dependents

• Very rare in practice; mainly used to express joins

[In SQL: SELECT * FROM R1, R2]

Page 12: Lecture 07: Relational Algebra

12

Cartesian Product Example Employee Name SSN John 999999999 Tony 777777777 Dependents EmployeeSSN Dname 999999999 Emily 777777777 Joe Employee x Dependents Name SSN EmployeeSSN Dname John 999999999 999999999 Emily John 999999999 777777777 Joe Tony 777777777 999999999 Emily Tony 777777777 777777777 Joe

Page 13: Lecture 07: Relational Algebra

13

Relational Algebra• Five operators:

– Union: – Difference: -– Selection:– Projection: – Cartesian Product:

• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:

Page 14: Lecture 07: Relational Algebra

14

Renaming

• Changes the schema, not the instance

• Schema: R(A1, …, An )

• Notation: B1,…,Bn (R)

• Example:– LastName, SocSocNo (Employee)

– Output schema: Answer(LastName, SocSocNo)

[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]

Page 15: Lecture 07: Relational Algebra

15

Renaming Example

EmployeeName SSNJohn 999999999Tony 777777777

LastName SocSocNoJohn 999999999Tony 777777777

LastName, SocSocNo (Employee)

Page 16: Lecture 07: Relational Algebra

16

Natural Join• Notation: R1 R2⋈• Meaning: R1 R2 = ⋈ A(C(R1 R2))

• Where:– The selection C checks equality of all common attributes– The projection eliminates the duplicate common attributes

[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2

WHERE R1.B = R2.B

Schema: R1(A,B), R2(B,C)]

Page 17: Lecture 07: Relational Algebra

17

Natural Join Example

EmployeeName SSNJohn 999999999Tony 777777777

DependentsSSN Dname999999999 Emily777777777 Joe

Name SSN DnameJohn 999999999 EmilyTony 777777777 Joe

Employee Dependents = Name, SSN, Dname( SSN=SSN2(Employee x SSN2, Dname(Dependents))

Page 18: Lecture 07: Relational Algebra

18

Natural Join

• R= S=

• R ⋈ S=

A B

X Y

X Z

Y Z

Z V

B C

Z U

V W

Z V

A B C

X Z U

X Z V

Y Z U

Y Z V

Z V W

Page 19: Lecture 07: Relational Algebra

19

Natural Join

• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?

• Given R(A, B, C), S(D, E), what is R ⋈ S ?

• Given R(A, B), S(A, B), what is R ⋈ S ?

Page 20: Lecture 07: Relational Algebra

20

Theta Join

• A join that involves a predicate

• R1 ⋈ R2 = (R1 R2)

• Here can be any condition

Page 21: Lecture 07: Relational Algebra

21

Eq-join

• A theta join where is an equality

R1 ⋈A=B R2 = A=B (R1 R2)

• Example:– Employee ⋈SSN=SSN Dependents

• Most useful join in practice(difference to natural join?)

Page 22: Lecture 07: Relational Algebra

22

Semijoin

• R ⋉ S = A1,…,An (R ⋈ S)

• Where A1, …, An are the attributes in R

• Example:– Employee ⋉ Dependents

Page 23: Lecture 07: Relational Algebra

23

Semijoins in Distributed Databases

• Semijoins are used in distributed databases

SSN Name

. . . . . .

SSN Dname Age

. . . . . .

EmployeeDependents

network

Employee ⋈ssn=ssn (age>71 (Dependents))Employee ⋈ssn=ssn (age>71 (Dependents))

T = SSN age>71 (Dependents)R = Employee T⋉

Answer = R ⋈ Dependents

Page 24: Lecture 07: Relational Algebra

24

Complex RA Expressions

Person Purchase Person Product

name=fred name=gizmo

pid ssn

seller-ssn=ssn

pid=pid

buyer-ssn=ssn

name

Page 25: Lecture 07: Relational Algebra

25

Application: Query Rewriting for Optimization

Reserves Sailors

sid=sid

bid=100 rating > 5

sname

Reserves Sailors

sid=sid

bid=100

sname

rating > 5(Scan;write to temp T1)

(Scan;write totemp T2)

The earlier we process selections, less tuples we need to manipulatehigher up in the tree (predicate pushdown)Disadvantages?

Page 26: Lecture 07: Relational Algebra

26

Algebraic Laws (Examples)

• Commutative and Associative Laws– R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T

– R S = S R, R (S T) = (R S) T

• Laws involving selection– C AND C’(R) = C( C’(R)) = C(R) ∩ C’(R)

– C (R S) = C (R) S • When C involves only attributes of R

• Laws involving projections– M(N(R)) = M,N(R)

Page 27: Lecture 07: Relational Algebra

27

Operations on Bags

A bag = a set with repeated elements

All operations need to be defined carefully on bags• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}

• C(R): preserve the number of occurrences

• A(R): no duplicate elimination

• Cartesian product, join: no duplicate elimination

Important ! Relational Engines work on bags, not sets !

Page 28: Lecture 07: Relational Algebra

28

Finally: RA has Limitations !

• Cannot compute “transitive closure”

• Find all direct and indirect relatives of Fred• Cannot express in RA !!! Need to write C program

Name1 Name2 Relationship

Fred Mary Father

Mary Joe Cousin

Mary Bill Spouse

Nancy Lou Sister

Page 29: Lecture 07: Relational Algebra

29

Formulating queries in RA

• Consider a database for student enrollment for courses, and books used in the courses– STUDENT (SSN, Name, Major, Bdate)

– COURSE (Course#, Cname, Dept)

– ENROLL (SSN, Course#, Quarter, Grade)

– BOOK_ADOPTION (Course#, Quarter, Book_ISBN)

– TEXT (Book_ISBN, Book_Title, Publisher, Author)

Page 30: Lecture 07: Relational Algebra

30

Formulating queries in RA

• Specify the following queries in relational algebra– List the number of courses (Course#) taken by all

students named ‘John Smith’ in Winter 1999 (i.e., Quarter = W99)

– List any department which has all its adopted books published by ‘BC Publishing’

Page 31: Lecture 07: Relational Algebra

31

Formulating Queries in RA

Course# (Quarter=W99 ((Name= ‘John Smith’ (STUDENT) ⋈ ENROLL))

OtherDept = Dept ((Publisher <> ‘PS Publishers’

(BOOK_ADOPTION ⋈ TEXT)) ⋈ COURSE) AllDept = Dept (BOOK_ADOPTION ⋈ COURSE)Answer = AllDept - OtherDept

And how will you express it in SQL?

WHY?