slides relationalalgebra
TRANSCRIPT
1
Lecture 07:Relational Algebra
2
Outline
• Relational Algebra (Section 6.1)
3
Relational Algebra
• Formalism for creating new relations from existing ones
• Its place in the big picture:
Declarativequery
languageAlgebra Implementation
SQL,relational calculus
Relational algebra
4
Relational Algebra• Five operators:
– Union: – Difference: -– Selection:– Projection: – Cartesian Product:
• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:
5
1. Union and 2. Difference
• R1 R2Example: – ActiveEmployees RetiredEmployees
• R1 – R2Example:– AllEmployees − RetiredEmployees
6
What about Intersection ?
• It is a derived operatorR1 R2 = R1 – (R1 – R2)
• Also expressed as a join (will see later)Example– UnionizedEmployees RetiredEmployees
7
3. Selection• Returns all tuples which satisfy a condition• Notation: c(R)• Examples
– Salary > 40000 (Employee)– name = “Smith” (Employee)
• The condition c can be =, <, , >, , <>
[in SQL: SELECT * FROM Employee WHERE Salary > 40000]
8
Selection Example
EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000
SSN Name DepartmentID Salary888888888 Alice 2 45,000
Find all employees with salary more than $40,000.Salary > 40000 (Employee)
9
4. Projection• Eliminates columns, then removes duplicates• Notation: A1,…,An (R)• Example: project to social-security number and
names:– SSN, Name (Employee)– Output schema: Answer(SSN, Name)
[In SQL: SELECT DISTINCT SSN, Name FROM Employee]
10
Projection Example
EmployeeSSN Name DepartmentID Salary999999999 John 1 30,000777777777 Tony 1 32,000888888888 Alice 2 45,000
SSN Name999999999 John777777777 Tony888888888 Alice
SSN, Name (Employee)
11
5. Cartesian Product
• Combine each tuple in R1 with each tuple in R2• Notation: R1 R2• Example:
– Employee Dependents• Very rare in practice; mainly used to express
joins
[In SQL: SELECT * FROM R1, R2]
12
Cartesian Product Example Employee Name SSN John 999999999 Tony 777777777 Dependents EmployeeSSN Dname 999999999 Emily 777777777 Joe Employee x Dependents Name SSN EmployeeSSN Dname John 999999999 999999999 Emily John 999999999 777777777 Joe Tony 777777777 999999999 Emily Tony 777777777 777777777 Joe
13
Relational Algebra• Five operators:
– Union: – Difference: -– Selection:– Projection: – Cartesian Product:
• Derived or auxiliary operators:– Intersection, complement– Joins (natural,equi-join, theta join, semi-join)– Renaming:
14
Renaming
• Changes the schema, not the instance• Schema: R(A1, …, An )
• Notation: B1,…,Bn (R)• Example:
– LastName, SocSocNo (Employee)– Output schema: Answer(LastName, SocSocNo)
[in SQL: SELECT Name AS LastName, SSN AS SocSocNo FROM Employee]
15
Renaming Example
EmployeeName SSNJohn 999999999Tony 777777777
LastName SocSocNoJohn 999999999Tony 777777777
LastName, SocSocNo (Employee)
16
Natural Join• Notation: R1 R2⋈• Meaning: R1 R2 = ⋈ A(C(R1 R2))
• Where:– The selection C checks equality of all common attributes– The projection eliminates the duplicate common attributes
[in SQL: SELECT DISTINCT R1.A, R1. B, R2.C FROM R1, R2WHERE R1.B = R2.BSchema: R1(A,B), R2(B,C)]
17
Natural Join ExampleEmployeeName SSNJohn 999999999Tony 777777777
DependentsSSN Dname999999999 Emily777777777 Joe
Name SSN DnameJohn 999999999 EmilyTony 777777777 Joe
Employee Dependents = Name, SSN, Dname( SSN=SSN2(Employee x SSN2, Dname(Dependents))
18
Natural Join
• R= S=
• R ⋈ S=
A B
X Y
X Z
Y Z
Z V
B C
Z U
V W
Z V
A B CX Z UX Z VY Z UY Z VZ V W
19
Natural Join
• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S ?
• Given R(A, B, C), S(D, E), what is R ⋈ S ?
• Given R(A, B), S(A, B), what is R ⋈ S ?
20
Theta Join
• A join that involves a predicate• R1 ⋈ R2 = (R1 R2)• Here can be any condition
21
Eq-join
• A theta join where is an equalityR1 ⋈A=B R2 = A=B (R1 R2)
• Example:– Employee ⋈SSN=SSN Dependents
• Most useful join in practice(difference to natural join?)
22
Semijoin
• R ⋉ S = A1,…,An (R ⋈ S)
• Where A1, …, An are the attributes in R• Example:
– Employee ⋉ Dependents
23
Semijoins in Distributed Databases
• Semijoins are used in distributed databases
SSN Name
. . . . . .
SSN Dname Age
. . . . . .
EmployeeDependents
network
Employee ⋈ssn=ssn (age>71 (Dependents))
T = SSN age>71 (Dependents)R = Employee T⋉
Answer = R ⋈ Dependents
24
Complex RA Expressions
Person Purchase Person Product
name=fred name=gizmo
pid ssn
seller-ssn=ssn
pid=pid
buyer-ssn=ssn
name
25
Application: Query Rewriting for Optimization
Reserves Sailors
sid=sid
bid=100 rating > 5
sname
Reserves Sailors
sid=sid
bid=100
sname
rating > 5(Scan;write to temp T1)
(Scan;write totemp T2)
The earlier we process selections, less tuples we need to manipulatehigher up in the tree (predicate pushdown)Disadvantages?
26
Algebraic Laws (Examples)
• Commutative and Associative Laws– R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T– R S = S R, R (S T) = (R S) T
• Laws involving selection– C AND C’(R) = C( C’(R)) = C(R) ∩ C’(R)
– C (R S) = C (R) S • When C involves only attributes of R
• Laws involving projections– M(N(R)) = M,N(R)
⋈ ⋈ ⋈ ⋈ ⋈ ⋈⋈ ⋈
27
Operations on Bags
A bag = a set with repeated elementsAll operations need to be defined carefully on bags• {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f}• {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}• C(R): preserve the number of occurrences
• A(R): no duplicate elimination• Cartesian product, join: no duplicate eliminationImportant ! Relational Engines work on bags, not sets !
28
Finally: RA has Limitations !
• Cannot compute “transitive closure”
• Find all direct and indirect relatives of Fred• Cannot express in RA !!! Need to write C program
Name1 Name2 Relationship
Fred Mary Father
Mary Joe Cousin
Mary Bill Spouse
Nancy Lou Sister