introduction to data management cse...
TRANSCRIPT
![Page 1: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/1.jpg)
1
Introduction to Data Management CSE 344
Lecture 7: SQL Wrap-up Relational Algebra
CSE 344 - Winter 2016
![Page 2: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/2.jpg)
Announcements
• Webquiz 3 is open, due on Sunday
• Homework 3 is posted, due on Tuesday, 2/2 – We are using Microsoft Azure Cloud services! – Use the promotion code you received
CSE 344 - Winter 2016 2
![Page 3: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/3.jpg)
3
What We Learned Last Time
• Subqueries can occur in every clause: – SELECT – FROM – WHERE
• Monotone queries: SELECT-FROM-WHERE – Existential quantifier
• Non-monotone queries – Universal quantifier – Aggregation
CSE 344 - Winter 2016
![Page 4: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/4.jpg)
4
Practice these queries in SQL
Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)
Find drinkers that frequent some bar that serves some beer they like.
Find drinkers that frequent only bars that serves some beer they like.
Find drinkers that frequent only bars that serves only beer they like.
x: ∃y. ∃z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)
x: ∀y. Frequents(x, y)⇒ (∃z. Serves(y,z)∧Likes(x,z))
x: ∀y. Frequents(x, y)⇒ ∀z.(Serves(y,z) ⇒ Likes(x,z))
Ullman’s drinkers-bars-beers example
Find drinkers that frequent some bar that serves only beers they like.
x: ∃y. Frequents(x, y)∧∀z.(Serves(y,z) ⇒ Likes(x,z))
![Page 5: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/5.jpg)
5
Example 1 Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)
Find drinkers that frequent some bar that serves some beer they like.
SELECT DISTINCT X.drinker FROM Frequents X, Serves Y, Likes Z WHERE X.bar = Y.bar AND Y.beer = Z.beer AND X.drinker = Z.drinker
![Page 6: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/6.jpg)
6
Example 2 Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)
Find drinkers that frequent some bar that serves only beers they don’t like.
SELECT DISTINCT Y.drinker FROM Frequents Y WHERE NOT EXISTS (SELECT *
FROM Serves Z, Likes U WHERE Y.bar=Z.bar AND Y.drinker=U.drinker AND Z.beer = U.beer)
![Page 7: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/7.jpg)
7
Example 3 Likes(drinker, beer) Frequents(drinker, bar) Serves(bar, beer)
Find drinkers that frequent only bars that serves some beer they like.
SELECT X.drinker FROM Frequents X WHERE X.drinker NOT IN (SELECT Y.drinker
FROM Frequents Y WHERE NOT EXISTS ( SELECT * FROM Serves Z, Likes U WHERE Y.bar=Z.bar AND Y.drinker=U.drinker AND Z.beer = U.beer))
(Recall: In example 2, we found drinkers that frequent some bar that serves only beers they don’t like)
![Page 8: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/8.jpg)
Unnesting Aggregates
Find the number of companies in each city
SELECT DISTINCT X.city, (SELECT count(*) FROM Company Y WHERE X.city = Y.city) FROM Company X
SELECT city, count(*) FROM Company GROUP BY city
Equivalent queries
Note: no need for DISTINCT (DISTINCT is the same as GROUP BY)
CSE 344 - Winter 2016 8
Product (pname, price, cid) Company(cid, cname, city)
![Page 9: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/9.jpg)
Unnesting Aggregates
Find the number of products made in each city SELECT DISTINCT X.city, (SELECT count(*) FROM Product Y, Company Z WHERE Z.cid=Y.cid
AND Z.city = X.city) FROM Company X
9
Product (pname, price, cid) Company(cid, cname, city)
SELECT X.city, count(*) FROM Company X, Product Y WHERE X.cid=Y.cid GROUP BY X.city
NOT equivalent ! You should know why!
CSE 344 - Winter 2016
![Page 10: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/10.jpg)
GROUP BY v.s. Nested Queries
SELECT product, Sum(quantity) AS TotalSales FROM Purchase WHERE price > 1 GROUP BY product
SELECT DISTINCT x.product, (SELECT Sum(y.quantity) FROM Purchase y WHERE x.product = y.product AND y.price > 1) AS TotalSales FROM Purchase x WHERE x.price > 1
Why twice ? 10
Purchase(pid, product, quantity, price)
CSE 344 - Winter 2016
![Page 11: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/11.jpg)
More Unnesting
CSE 344 - Winter 2016 11
Author(login,name) Wrote(login,url)
Find authors who wrote ≥ 10 documents:
![Page 12: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/12.jpg)
More Unnesting
CSE 344 - Winter 2016 12
SELECT DISTINCT Author.name FROM Author WHERE (SELECT count(Wrote.url) FROM Wrote WHERE Author.login=Wrote.login) >= 10
This is SQL by a novice
Attempt 1: with nested queries
Author(login,name) Wrote(login,url)
Find authors who wrote ≥ 10 documents:
![Page 13: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/13.jpg)
More Unnesting
CSE 344 - Winter 2016 13
Attempt 1: with nested queries
Author(login,name) Wrote(login,url)
Find authors who wrote ≥ 10 documents:
SELECT Author.name FROM Author, Wrote WHERE Author.login=Wrote.login GROUP BY Author.name HAVING count(wrote.url) >= 10
This is SQL by
an expert
Attempt 2: using GROUP BY and HAVING
![Page 14: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/14.jpg)
Finding Witnesses
For each city, find the most expensive product made in that city
CSE 344 - Winter 2016 14
Product (pname, price, cid) Company(cid, cname, city)
![Page 15: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/15.jpg)
Finding Witnesses
SELECT x.city, max(y.price) FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city;
Finding the maximum price is easy…
But we need the witnesses, i.e. the products with max price
CSE 344 - Winter 2016 15
For each city, find the most expensive product made in that city
Product (pname, price, cid) Company(cid, cname, city)
![Page 16: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/16.jpg)
Finding Witnesses To find the witnesses, compute the maximum price in a subquery
CSE 344 - Winter 2016 16
SELECT DISTINCT u.city, v.pname, v.price FROM Company u, Product v, (SELECT x.city, max(y.price) as maxprice FROM Company x, Product y WHERE x.cid = y.cid GROUP BY x.city) w WHERE u.cid = v.cid and u.city = w.city and v.price=w.maxprice;
Product (pname, price, cid) Company(cid, cname, city)
![Page 17: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/17.jpg)
Finding Witnesses
Or we can use a subquery in where clause
CSE 344 - Winter 2016 17
SELECT u.city, v.pname, v.price FROM Company u, Product v WHERE u.cid = v.cid and v.price >= ALL (SELECT y.price FROM Company x, Product y WHERE u.city=x.city and x.cid=y.cid);
Product (pname, price, cid) Company(cid, cname, city)
![Page 18: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/18.jpg)
Finding Witnesses
There is a more concise solution here:
CSE 344 - Winter 2016 18
SELECT u.city, v.pname, v.price FROM Company u, Product v, Company x, Product y WHERE u.cid = v.cid and u.city = x.city and x.cid = y.cid GROUP BY u.city, v.pname, v.price HAVING v.price = max(y.price);
Product (pname, price, cid) Company(cid, cname, city)
![Page 19: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/19.jpg)
Where We Are
• Motivation for using a DBMS for managing data • SQL, SQL, SQL
– Declaring the schema for our data (CREATE TABLE) – Inserting data one row at a time or in bulk (INSERT/.import) – Modifying the schema and updating the data (ALTER/UPDATE) – Querying the data (SELECT)
• Next step: More knowledge of how DBMSs work – Client-server architecture – Relational algebra, query execution, and physical tuning
CSE 344 - Winter 2016 19
![Page 20: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/20.jpg)
Where We Are
• Motivation for using a DBMS for managing data • SQL, SQL, SQL
– Declaring the schema for our data (CREATE TABLE) – Inserting data one row at a time or in bulk (INSERT/.import) – Modifying the schema and updating the data (ALTER/UPDATE) – Querying the data (SELECT)
• Next step: More knowledge of how DBMSs work – Client-server architecture – Relational algebra and query execution
CSE 344 - Winter 2016 20
![Page 21: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/21.jpg)
Where We Are
• Motivation for using a DBMS for managing data • SQL, SQL, SQL
– Declaring the schema for our data (CREATE TABLE) – Inserting data one row at a time or in bulk (INSERT/.import) – Modifying the schema and updating the data (ALTER/UPDATE) – Querying the data (SELECT)
• Next step: More knowledge of how DBMSs work – Client-server architecture – Relational algebra and query execution
CSE 344 - Winter 2016 21
![Page 22: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/22.jpg)
Query Evaluation Steps
Parse & Check Query
Decide how best to answer query: query
optimization
Query Execution
SQL query
Return Results
Translate query string into internal
representation
Check syntax, access control,
table names, etc.
Query Evaluation
CSE 344 - Winter 2016 22
Logical plan à physical plan
![Page 23: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/23.jpg)
The WHAT and the HOW
• SQL = WHAT we want to get form the data
• Relational Algebra = HOW to get the data we want
• The passage from WHAT to HOW is called query optimization – SQL -> Relational Algebra -> Physical Plan – Relational Algebra = Logical Plan
23 CSE 344 - Winter 2016
![Page 24: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/24.jpg)
Overview: SQL = WHAT
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
It’s clear WHAT we want, unclear HOW to get it
Product(pid, name, price) Purchase(pid, cid, store) Customer(cid, name, city)
24 CSE 344 - Winter 2016
![Page 25: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/25.jpg)
Overview: Relational Algebra = HOW
25
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 26: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/26.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
26
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 27: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/27.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
27
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 28: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/28.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
T2( . . . .) T3(. . . )
28
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 29: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/29.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
T2( . . . .)
T4(name,name)
Final answer
T3(. . . )
29
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 30: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/30.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
T2( . . . .)
T4(name,name)
Final answer
T3(. . . )
Execution order is now clearly specified
30
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 31: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/31.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
T2( . . . .)
T4(name,name)
Final answer
T3(. . . )
Execution order is now clearly specified
31
Logical plan Many physical details are still left open!
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
![Page 32: Introduction to Data Management CSE 344courses.cs.washington.edu/courses/cse344/16wi/lectures/lec07-sqlw… · Introduction to Data Management CSE 344 Lecture 7: SQL Wrap-up Relational](https://reader034.vdocuments.site/reader034/viewer/2022043014/5fb350aa30198255871fcf21/html5/thumbnails/32.jpg)
Overview: Relational Algebra = HOW
Product Purchase
pid=pid
price>100 and city=‘Seattle’
x.name,z.name
δ
cid=cid
Customer
Π
σ
T1(pid,name,price,pid,cid,store)
T2( . . . .)
T4(name,name)
Final answer
T3(. . . )
Execution order is now clearly specified
32
SELECT DISTINCT x.name, z.name FROM Product x, Purchase y, Customer z WHERE x.pid = y.pid and y.cid = z.cid and x.price > 100 and z.city = ‘Seattle’
Logical plan Many physical details are still left open!
Physical plan Concrete algorithm for each operator