SQL, Data Storage Technologies, and Web-Data Integration
Week 4
Today’s Agenda
• Review
• Intro to SQL Continued
  – SELECT, GROUP BY, HAVING, DELETE, UPDATE
• “Advanced” SQL
  – Joins, Functions, Locking tables, Transactions
Week 3 Review
• Physical database design
  – Column options
    • NOT NULL, DEFAULT, AUTO_INCREMENT, PRIMARY KEY
• Client/Server Architecture
• Connecting to SQL
  – command line mysql client
• Introduction to SQL
  – SHOW, USE, CREATE, INSERT
Selecting Data
• Syntax
  – SELECT column_name, … FROM TableName
  – Use a “*” in place of the column_name list to retrieve all columns
  – Examples:
    • Show me all the data stored about our donors
    • mysql> SELECT * FROM Donor;
    • What are the names of all our donors?
    • mysql> SELECT name FROM Donor;
Being More Specific
• Syntax
  – SELECT column_name, … FROM TableName WHERE condition
  – Example
    • Show me all the donors with the name “Jake Johnson”
    • mysql> SELECT * FROM Donor WHERE name = 'Jake Johnson';
Conditional Operators
SQL Symbol                   Definition
=                            Matches if the values are equal
!=                           Matches if the values are not equal
>                            Matches if the left value is greater than the right value
<                            Matches if the left value is less than the right value
>=                           Matches if the left value is greater than or equal to the right value
<=                           Matches if the left value is less than or equal to the right value
IN (value, …)                Matches if the value is among the values listed
BETWEEN value1 AND value2    Matches if the value is between value1 and value2 or equal to one of them
LIKE pattern                 Matches if the value matches the pattern, which may contain wildcard characters
The LIKE comparison
• Uses wildcard characters to match column data
  – ‘_’ represents any one character
    • SELECT name FROM Donor WHERE name LIKE '_ob';
    • Matches “Bob”, “Rob”, “Job”, etc.
  – ‘%’ represents any number of characters (including none)
    • SELECT name FROM Fruit WHERE name LIKE '%apple';
    • Matches “Pineapple” and “Apple”
More LIKE
• Find all donors whose name starts with a “J”:
• mysql> SELECT * FROM Donor WHERE name LIKE 'J%';
• Use “AND” or “OR” to add multiple restrictions in your WHERE clause
• Find all donors whose name starts with a “J” and who have a 206 area code:
• mysql> SELECT * FROM Donor WHERE name LIKE 'J%' AND phone_number LIKE '206%';
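The LIKE and AND examples above can be tried without a MySQL server. This is a minimal sketch that assumes an in-memory SQLite database (via Python's sqlite3 module) as a stand-in for MySQL; the sample donors and phone numbers are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT, phone_number TEXT)"
)
# Made-up sample rows for illustration only.
conn.executemany(
    "INSERT INTO Donor (name, phone_number) VALUES (?, ?)",
    [
        ("Jake Johnson", "206-555-0101"),
        ("Jill Smith", "425-555-0102"),
        ("Bob Jones", "206-555-0103"),
    ],
)


def names(sql):
    """Run a query and return the first column of each row as a list."""
    return [row[0] for row in conn.execute(sql)]


# % matches any run of characters, so this finds names starting with "J".
starts_with_j = names("SELECT name FROM Donor WHERE name LIKE 'J%' ORDER BY name")

# _ matches exactly one character; the trailing % allows the rest of the name.
ob_names = names("SELECT name FROM Donor WHERE name LIKE '_ob%'")

# AND requires both conditions to hold for a row to be returned.
j_and_206 = names(
    "SELECT name FROM Donor WHERE name LIKE 'J%' AND phone_number LIKE '206%'"
)

print(starts_with_j)
print(ob_names)
print(j_and_206)
```

Only Jake Johnson satisfies both restrictions, because Jill Smith has a different area code and Bob Jones does not start with "J".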
Other Constraints
• GROUP BY
  – Fun with aggregates
• HAVING
Ordering Your Data
• Syntax
  – SELECT column_name, … FROM TableName ORDER BY column_name, …
• Example
  – List all donors in alphabetical order
  – mysql> SELECT * FROM Donor ORDER BY name;
More Ordering
• Feel free to combine with other constraints, such as WHERE
  – mysql> SELECT * FROM Donor WHERE name LIKE 'J%' ORDER BY name;
• You can order by more than one column
  – SELECT * FROM Donor ORDER BY lastname, firstname;
• Set the sort direction with ASC (ascending, the default) or DESC (descending)
  – mysql> SELECT * FROM Donor WHERE name LIKE 'J%' ORDER BY name DESC;
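Ordering by several columns, each with its own direction, can be sketched as follows. The example assumes SQLite (through Python's sqlite3 module) in place of MySQL and a hypothetical Donor table with lastname and firstname columns, as in the multi-column example above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, lastname TEXT, firstname TEXT)"
)
conn.executemany(
    "INSERT INTO Donor (lastname, firstname) VALUES (?, ?)",
    [("Smith", "Zed"), ("Jones", "Amy"), ("Smith", "Al")],
)

# Sort by lastname first; firstname only breaks ties between equal lastnames.
ascending = conn.execute(
    "SELECT lastname, firstname FROM Donor ORDER BY lastname, firstname"
).fetchall()

# Each column can carry its own direction.
mixed = conn.execute(
    "SELECT lastname, firstname FROM Donor ORDER BY lastname DESC, firstname ASC"
).fetchall()

print(ascending)
print(mixed)
```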
Grouping your data
• The GROUP BY clause groups data together so that aggregate functions can be performed on it.
• Very common for reports and statistics
• More interesting with large sets of data
Piping SQL commands to MySQL
• Sometimes we have a big file of SQL commands that we want to run.
• Quit your mysql client application
  – mysql> quit;
• Retrieve, and then upload to your dante account:
  – https://courses.washington.edu/wtcampus/spring/examples/sql/donation.sql
• Look at the big file of SQL commands
  – $ less donation.sql
• Use Unix input redirection to send the file of commands to MySQL
  – $ /usr/local/mysql-5.0.67-linux-i686/bin/mysql -u uwnetid -p uwnetid < donation.sql
• The “<” operator takes all the lines of text from donation.sql and sends them to MySQL as its input
Back to Grouping
• What does the Donation table look like?
• mysql> DESCRIBE Donation;
Column Name      Type            Options
DONATIONID       Int unsigned    Primary key, auto_increment
date             Datetime        Not null
amount           Decimal(5,2)    Not null
processorName    Varchar(255)
DONORID          Int unsigned
Group By
• Syntax
  – SELECT column_name, … FROM TableName GROUP BY column_name, …
• Example
  – What is the total amount donated by each donor?
  – mysql> SELECT donorid, SUM(amount) FROM Donation GROUP BY donorid;
GROUP BY with other constraints
• GROUP BY must come after any WHERE clause
• GROUP BY must come before any LIMIT or ORDER BY clause
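The clause order described above (WHERE, then GROUP BY, then ORDER BY and LIMIT) can be seen in a small runnable sketch. It assumes SQLite via Python's sqlite3 module as a stand-in for MySQL, and the donation amounts are invented for the example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, donorid INTEGER)"
)
conn.executemany(
    "INSERT INTO Donation (amount, donorid) VALUES (?, ?)",
    [(50.0, 1), (120.0, 1), (175.0, 2), (25.0, 2), (10.0, 3)],
)

# One output row per donorid, with that donor's amounts summed.
totals = conn.execute(
    "SELECT donorid, SUM(amount) FROM Donation GROUP BY donorid ORDER BY donorid"
).fetchall()

# WHERE comes before GROUP BY; ORDER BY and LIMIT come after it.
top_two = conn.execute(
    "SELECT donorid, SUM(amount) FROM Donation "
    "WHERE amount >= 25 "
    "GROUP BY donorid "
    "ORDER BY SUM(amount) DESC "
    "LIMIT 2"
).fetchall()

print(totals)
print(top_two)
```

The WHERE clause drops the 10.00 donation before any grouping happens, so donor 3 never forms a group in the second query.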
The Groupies
Aggregate Function    Definition
AVG(column)           Returns the average of the column values.
COUNT(column)         Returns the number of rows in which the column is not NULL.
MAX(column)           Returns the maximum value of the column.
MIN(column)           Returns the minimum value of the column.
STD(column)           Returns the standard deviation of the column values.
SUM(column)           Returns the sum of the column values.
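A quick way to see how these aggregates behave, in particular that COUNT(column) skips NULLs while COUNT(*) does not, is the sketch below. It assumes SQLite (Python's sqlite3 module) rather than MySQL, so STD is left out because SQLite has no such built-in; the rows are invented.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, processorName TEXT)"
)
conn.executemany(
    "INSERT INTO Donation (amount, processorName) VALUES (?, ?)",
    [(50.0, "Ann"), (175.0, None), (25.0, "Ann")],  # one NULL processorName
)

# With no GROUP BY, the aggregates run over the whole table as one group.
row = conn.execute(
    "SELECT COUNT(*), COUNT(processorName), MIN(amount), MAX(amount), "
    "SUM(amount), AVG(amount) FROM Donation"
).fetchone()
total_rows, named_rows, smallest, largest, total, average = row

print(total_rows, named_rows)   # COUNT(*) counts rows; COUNT(column) skips NULLs
print(smallest, largest, total)
print(round(average, 2))
```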
GROUP BY
• SELECT column_name (or aggregate function), … FROM TableName WHERE clause GROUP BY column_name, …
• You can GROUP BY multiple columns
  – Example:
    • How many of our donors have the same name?
    • SELECT fname, lname, COUNT(*) FROM Donor GROUP BY fname, lname;
HAVING clause
• Syntax
  – SELECT column_name, … FROM TableName HAVING statement
• “statement” is the same set of conditionals that the WHERE clause has
  – So what is the difference between HAVING and WHERE?
HAVING vs. WHERE
• The WHERE clause is applied as MySQL is looking through its table
• The HAVING clause is applied to the rows returned by the WHERE clause (after any grouping)
• mysql> SELECT * FROM Donor HAVING name = 'Jake Johnson';
  – Twice as slow! First scan the Donor table for all the rows, then scan all the rows again for names matching 'Jake Johnson'.
HAVING
• Let's say we're interested in sending a letter to our top donors – those who donated more than $150.
• Use the GROUP BY clause and the SUM aggregate function to get a list of the total amounts.
• Adding the HAVING clause we can further restrict the results.
• mysql> SELECT donorid, SUM(amount) FROM Donation GROUP BY donorid HAVING SUM(amount) > 150;
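The difference between filtering rows (WHERE) and filtering groups (HAVING) shows up clearly on a small data set. The sketch below assumes SQLite through Python's sqlite3 module in place of MySQL, with made-up donation amounts.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; amounts are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, donorid INTEGER)"
)
conn.executemany(
    "INSERT INTO Donation (amount, donorid) VALUES (?, ?)",
    [(100.0, 1), (100.0, 1), (160.0, 2), (40.0, 3), (40.0, 3)],
)

# WHERE tests each individual donation before grouping:
# only single donations above 150 survive.
where_filtered = conn.execute(
    "SELECT donorid, SUM(amount) FROM Donation "
    "WHERE amount > 150 GROUP BY donorid ORDER BY donorid"
).fetchall()

# HAVING tests each group after summing:
# donors whose total is above 150 survive.
having_filtered = conn.execute(
    "SELECT donorid, SUM(amount) FROM Donation "
    "GROUP BY donorid HAVING SUM(amount) > 150 ORDER BY donorid"
).fetchall()

print(where_filtered)
print(having_filtered)
```

Donor 1 never gave more than 100 at once, so the WHERE version misses them, while the HAVING version finds them through their 200 total.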
Deleting rows
• Syntax
  – DELETE FROM TableName [where_clause]
• Example (with no WHERE clause, this deletes every row in the table)
  – mysql> DELETE FROM Donation;
Deleting is too Easy!
• Rows are hard to create, easy to destroy!
• Always use a WHERE clause!
• Example:
  – mysql> DELETE FROM Donor WHERE name = 'Jake Johnson';
• Best to write a SELECT with the same WHERE clause first, to check which rows will be removed
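One way to follow the "SELECT first" advice is to reuse the exact same WHERE clause for the preview and the DELETE. The sketch below assumes SQLite via Python's sqlite3 module instead of MySQL; the donors are invented, and the preview-then-delete flow is just one possible habit, not a required procedure.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Donor (name) VALUES (?)",
    [("Jake Johnson",), ("Sue Lee",), ("Bob Jones",)],
)

# Keep the condition in one place so the preview and the delete cannot drift apart.
where_clause = "name = ?"
params = ("Jake Johnson",)

# Step 1: preview exactly which rows the DELETE would remove.
preview = conn.execute(
    f"SELECT donorid, name FROM Donor WHERE {where_clause}", params
).fetchall()

# Step 2: run the DELETE with the same condition.
cursor = conn.execute(f"DELETE FROM Donor WHERE {where_clause}", params)
deleted = cursor.rowcount  # number of rows the DELETE removed
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM Donor").fetchone()[0]
print(preview, deleted, remaining)
```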
Updating data
• Syntax
  – UPDATE TableName SET column_name = value [where_clause]
• Let’s learn our lesson from delete, and always use the WHERE clause
• Example:
  – mysql> UPDATE Donor SET address = '123 Home Lane', phone_number = '555-1212' WHERE donorid = 1;
Practice: Using the Aggregates
• How many donations has each donor made?
• What is the maximum donation amount made by each donor?
• What are the donorIDs for the top ten donors?
• Of those who donated in 2003, what are the donorIDs of the ten worst donors in 2003?
• What is the total amount of donations we’ve received in 2004?
• What are the donorIDs for the ten best and ten worst donors?
“Advanced” SQL
• Joining tables
  – Inner vs. Outer
• Built in functions
• Table locking
• Transactions
Joining
• Our queries on the Donation table only return the DonorID.
• Typically, we want to know the Donor’s name, not their ID.
• We could do two selects and collate the data
  – SELECT donorid, SUM(amount) FROM Donation GROUP BY donorid;
  – SELECT donorid, name FROM Donor;
  – match these up in our code
• Or, we could simply do this with one query
Joining
• So far we’ve seen SELECTs on a single table
  – How is this any better than using a Berkeley DB or text file on a local computer?
• Joins allow us to select information from more than one table and model the relationships in the conceptual model
  – We don’t want to know the donorIDs, we want to know the donor names!
Simple Joining
• SELECT * FROM Table1, Table2;
  – Listing multiple tables after “FROM” joins those tables together
  – This effectively creates a new schema, with new tuples.
• Donor(donorid, name, address)
• Donation(donationid, amount, donorid)
• DonorDonation(donorid, name, address, donationid, amount, donorid)
Cartesian (Cross) Product
• New “virtual” schema:
  – DonorDonation(donorid, name, address, donationid, amount, donorid)
• New tuples:
  – (1, “Bob”, “123 St.”, 8, 50.00, 1)
  – (2, “Sue”, “456 Pl.”, 8, 50.00, 1)
  – (3, “Joe”, “789 Rd.”, 8, 50.00, 1)
• Every row in table A is joined with every row in table B (A x B).
• mysql> SELECT * FROM Donor, Donation;
• 500 Donors x 3000 Donations = 1.5 Million rows!!
Enforcing the relationship
  – (1, “Bob”, “123 St.”, 8, 50.00, 1)
  – (2, “Sue”, “456 Pl.”, 8, 50.00, 1)
  – (3, “Joe”, “789 Rd.”, 8, 50.00, 1)
• Knowing we have a Donation of (8, 50.00, 1), we are only interested in the row where the Donor was (1, “Bob”, “123 St.”)
• Solution: Use a WHERE clause just like we did before
Enforcing the Relationship
• Our new, new “virtual” schema:
  – DonorDonation(Donor.donorid, Donor.name, Donor.address, Donation.donationid, Donation.amount, Donation.donorid)
• Select just the tuples that have matching donorIDs:
  – mysql> SELECT * FROM Donor, Donation WHERE Donor.donorid = Donation.donorid;
Enforcing the relation
• Tuples now only have data where the donorIDs match
  – (1, “Bob”, “123 St.”, 8, 50.00, 1)
  – (2, “Sue”, “456 Pl.”, 19, 175.00, 2)
  – (2, “Sue”, “456 Pl.”, 33, 25.00, 2)
• Our Donor - Donation relationship is now successfully modeled
  – One Donor (e.g., “Sue”) has one or many Donations (e.g., 175.00, 25.00)
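The jump from the full cross product to the matched tuples can be reproduced with the three donors and three donations used on these slides. This sketch assumes SQLite (Python's sqlite3 module) as a stand-in for MySQL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, donorid INTEGER)"
)
conn.executemany(
    "INSERT INTO Donor VALUES (?, ?, ?)",
    [(1, "Bob", "123 St."), (2, "Sue", "456 Pl."), (3, "Joe", "789 Rd.")],
)
conn.executemany(
    "INSERT INTO Donation VALUES (?, ?, ?)",
    [(8, 50.0, 1), (19, 175.0, 2), (33, 25.0, 2)],
)

# No WHERE clause: every Donor row is paired with every Donation row.
cross_count = conn.execute("SELECT COUNT(*) FROM Donor, Donation").fetchone()[0]

# The equality test keeps only the pairs that belong together.
matched = conn.execute(
    "SELECT Donor.name, Donation.donationid, Donation.amount "
    "FROM Donor, Donation "
    "WHERE Donor.donorid = Donation.donorid "
    "ORDER BY Donation.donationid"
).fetchall()

print(cross_count)
print(matched)
```

Joe has no donations, so he appears in the cross product but drops out once the equality test is applied.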
Refining your join
• Just like a single table SELECT statement, you can refine your multiple table SELECT statement
  – AND, OR, GROUP BY, HAVING, ORDER BY, LIMIT
• Example: What are the names of the top five donors that have donated more than $150, and how much have they donated?
SELECT Donor.name, SUM(Donation.amount)
FROM Donor, Donation
WHERE Donor.donorid = Donation.donorid
GROUP BY Donor.donorid
HAVING SUM(Donation.amount) > 150
ORDER BY SUM(Donation.amount) DESC
LIMIT 5;
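The query above can be run end to end on a toy data set. The sketch below assumes SQLite via Python's sqlite3 module rather than MySQL, and the donors and amounts are invented; the SQL itself is the query from the slide.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, donorid INTEGER)"
)
conn.executemany(
    "INSERT INTO Donor VALUES (?, ?)", [(1, "Bob"), (2, "Sue"), (3, "Joe")]
)
conn.executemany(
    "INSERT INTO Donation (amount, donorid) VALUES (?, ?)",
    [(50.0, 1), (120.0, 1), (175.0, 2), (25.0, 2), (10.0, 3)],
)

# Join, group per donor, keep totals above 150, largest first, at most five rows.
top_donors = conn.execute(
    "SELECT Donor.name, SUM(Donation.amount) "
    "FROM Donor, Donation "
    "WHERE Donor.donorid = Donation.donorid "
    "GROUP BY Donor.donorid "
    "HAVING SUM(Donation.amount) > 150 "
    "ORDER BY SUM(Donation.amount) DESC "
    "LIMIT 5"
).fetchall()

print(top_donors)
```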
More than two tables
• SELECT * FROM Donor, Donation, Processor WHERE Donor.donorid = Donation.donorid AND Donation.processorid = Processor.processorid;
• Order of the tables is not important
The Equality Test
• Typically you need an equality test for each extra table you add to the FROM clause.
• The equality checks are almost always between the primary key and the foreign keys of tables. (That’s why the foreign keys are there!)
Outer Joins
• The joins we’ve looked at only return a Donor who has made a Donation
• What if we want to know which Donors have not made any Donations?
• The solution is to use an Outer Join (MySQL supports this through the Left Join command)
Outer Joins
• An Outer Join will take all the rows from the Left table (or the Right, depending on the SQL/RDBMS), without requiring a match on the other table.
Outer Joins
• SELECT columns FROM Table1 LEFT JOIN Table2 ON equality_test [WHERE|GROUP BY|etc.]
• Example:
  – mysql> SELECT Donor.name, Donation.donationid FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid;
Outer Joins
• An outer join will NULL-fill any columns from Table2 where the ON condition doesn’t match.
  – (1, “Bob”, “123 St.”, 8, 50.00, 1)
  – (2, “Sue”, “456 Pl.”, 19, 175.00, 2)
  – (3, “Joe”, “789 Rd.”, NULL, NULL, NULL)
• If a tuple from Table1 can be joined with any tuple from Table2, it will not be NULL-filled.
Outer Joins
• What if we want to know which Donors have not made any Donations?
• mysql> SELECT Donor.name, Donation.amount FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid WHERE Donation.amount IS NULL;
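The NULL-filling behaviour, and the IS NULL trick for finding donors with no donations, can be checked with the sample donors from these slides. This sketch assumes SQLite (Python's sqlite3 module) in place of MySQL; SQL NULL comes back to Python as None.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE Donation ("
    "donationid INTEGER PRIMARY KEY, amount REAL NOT NULL, donorid INTEGER)"
)
conn.executemany(
    "INSERT INTO Donor VALUES (?, ?)", [(1, "Bob"), (2, "Sue"), (3, "Joe")]
)
conn.executemany(
    "INSERT INTO Donation VALUES (?, ?, ?)",
    [(8, 50.0, 1), (19, 175.0, 2), (33, 25.0, 2)],
)

# Every donor appears; Joe has no donation, so his Donation columns are NULL.
all_rows = conn.execute(
    "SELECT Donor.name, Donation.donationid "
    "FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid "
    "ORDER BY Donor.donorid, Donation.donationid"
).fetchall()

# amount is declared NOT NULL, so a NULL here can only come from the outer join.
no_donations = [
    row[0]
    for row in conn.execute(
        "SELECT Donor.name, Donation.amount "
        "FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid "
        "WHERE Donation.amount IS NULL"
    )
]

print(all_rows)
print(no_donations)
```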
A Lot of Typing
• SELECT Donor.name, SUM(Donation.amount), Processor.name FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid LEFT JOIN Processor ON Donation.processorid = Processor.processorid GROUP BY Donor.donorid
Less Typing with Aliases
• You can give your Tables nicknames:
• SELECT Dr.name, SUM(Dn.amount), P.name FROM Donor AS Dr LEFT JOIN Donation AS Dn ON Dr.donorid = Dn.donorid LEFT JOIN Processor AS P ON Dn.processorid = P.processorid GROUP BY Dr.donorid
Renaming Output
• You can give your selected columns nicknames too
• SELECT Donor.name, SUM(Donation.amount) AS total, Processor.name FROM Donor LEFT JOIN Donation ON Donor.donorid = Donation.donorid LEFT JOIN Processor ON Donation.processorid = Processor.processorid GROUP BY Donor.donorid ORDER BY total;
• You can’t always use aggregate functions directly in your ORDER BY or HAVING clause; in those cases, refer to the alias (here, total) instead
Joining to Other Databases
• Sometimes you may want to share data between databases
• Example: You have a “Users” database that is shared between two applications, each of which has its own database.
• SELECT C.name, CN.nickname FROM Users.Customer AS C, CustomerNickname AS CN WHERE C.customerid = CN.customerid;
SQL Functions
• MySQL provides a lot of functions to munge the results of a query
• Example, returning a date
  – SELECT date FROM Donation WHERE donationid=1;
    • 2004-10-13 15:52:08
    • Not very “pretty” for a user to see
SQL Functions
• Use the DATE_FORMAT() function instead!
  – SELECT DATE_FORMAT(date, '%m/%d/%y') FROM Donation WHERE donationid=1;
    • 10/13/04
    • Prettier!
  – SELECT DATE_FORMAT(date, '%M %D, %Y') FROM Donation WHERE donationid=1;
    • October 13th, 2004
Some Common Functions
Function       Result
ABS            returns the absolute value of a number
CONCAT         returns the string formed by joining together all of the function arguments
DATE_ADD       returns the date formed by adding a given amount of time to the date
DATE_SUB       returns the date formed by subtracting a given amount of time from the date
DATE_FORMAT    returns a date formatted as you specify in the format string. This is one of the most useful functions for printing dates in the format you desire
FORMAT         returns a neatly formatted number with commas and the specified number of decimal places
ISNULL         returns 1 if the value is NULL, zero otherwise
LENGTH         returns the length of a string in bytes (CHAR_LENGTH returns the number of characters)
NOW            returns the current date and time
Example using functions
• Return a nicely formatted list of donation dates made in the last 10 days
• mysql> SELECT DATE_FORMAT(date, '%m/%d/%Y') FROM Donation WHERE date > DATE_SUB(NOW(), INTERVAL 10 DAY);
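The same "format the date, filter to the last 10 days" idea can be tried locally, with one caveat: the sketch below assumes SQLite (Python's sqlite3 module), which has no DATE_FORMAT, DATE_SUB, or NOW(). It uses SQLite's strftime() and datetime() with a '-10 days' modifier as rough equivalents, and a fixed, made-up "current" timestamp so the result is repeatable.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the dates are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donation (donationid INTEGER PRIMARY KEY, date TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO Donation (date) VALUES (?)",
    [("2004-10-13 15:52:08",), ("2004-09-01 09:00:00",)],
)

# Hypothetical "current time", fixed so the example always gives the same result.
# With MySQL this role is played by NOW().
current_time = "2004-10-20 00:00:00"

# strftime() plays the role of DATE_FORMAT(); datetime(x, '-10 days') plays the
# role of DATE_SUB(x, INTERVAL 10 DAY).
recent = [
    row[0]
    for row in conn.execute(
        "SELECT strftime('%m/%d/%Y', date) FROM Donation "
        "WHERE date > datetime(?, '-10 days') ORDER BY date",
        (current_time,),
    )
]

print(recent)
```

Note that the format codes differ between the two systems, so a format string written for MySQL's DATE_FORMAT should not be assumed to work unchanged in strftime().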
Practice: Joining Tables
• Who Processed the most Donations?
• Which Donors have made no Donations?
• Which Division received the most money?
• Which Donor gave the most to Healthcare?
mysql> SOURCE non-profit.sql
Transactions and Table Locking
• Usually, a database is being used concurrently by many different users
  – Example: Multiple processors will be entering donation information
• Very important to maintain data integrity
• Transactions or Table Locking can help provide that data integrity
Table Locking
• Example: Say one of our data integrity checks is to make sure no two Donors have the same name and address
• In our application, we would probably do something like:
– Check for any "Donor" with the same name and address.
– If there are no matches, insert our new "Donor."
Table Locking
• A two step process is okay for one user, very bad for more than one user
• What if this happens:
User 1                                           User 2
Check for any donor with the name "John Doe."
                                                 Check for any donor with the name "John Doe."
No matches
                                                 No matches
Insert new record
                                                 Insert new record
• Both users insert, so the table ends up with two "John Doe" records
Table Locking
• One solution is to use table locking
• User 1 would lock the table so that only s/he can use the table
• They can then check and insert to the table while User 2 waits for the table to become available
Table Locking
User 1                                           User 2
Lock table
Check for any donor with the name "John Doe."    Wait for table to unlock
No matches                                       Wait for table to unlock
Insert new record                                Wait for table to unlock
Unlock table                                     Wait for table to unlock
                                                 Lock table
                                                 Check for any donor with the name "John Doe."
                                                 Match exists – do NOT insert
                                                 Unlock table
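The lock, check, insert, unlock sequence in the table above can be imitated locally. This is only a sketch under a loud assumption: it uses SQLite (Python's sqlite3 module), which has no LOCK TABLES statement, so BEGIN IMMEDIATE (which takes a write lock on the whole database file) stands in for locking the Donor table. With a zero timeout, User 2 gets an error instead of waiting, which makes the blocked attempt visible.

```python
import os
import sqlite3
import tempfile

# Assumption: a temporary SQLite file stands in for the shared MySQL server.
# timeout=0 makes a blocked connection fail at once instead of waiting;
# isolation_level=None lets us issue BEGIN and COMMIT ourselves.
path = os.path.join(tempfile.mkdtemp(), "donors.db")
user1 = sqlite3.connect(path, timeout=0, isolation_level=None)
user2 = sqlite3.connect(path, timeout=0, isolation_level=None)
user1.execute("CREATE TABLE Donor (donorid INTEGER PRIMARY KEY, name TEXT)")


def add_if_missing(conn, name):
    """Check, then insert. The caller is expected to hold the write lock."""
    matches = conn.execute(
        "SELECT donorid FROM Donor WHERE name = ?", (name,)
    ).fetchall()
    if matches:
        return False  # match exists: do NOT insert
    conn.execute("INSERT INTO Donor (name) VALUES (?)", (name,))
    return True


user1.execute("BEGIN IMMEDIATE")  # User 1 "locks the table"
try:
    user2.execute("BEGIN IMMEDIATE")  # User 2 tries to lock while User 1 holds it
    user2_blocked = False
except sqlite3.OperationalError:
    user2_blocked = True  # in MySQL, User 2 would wait here instead

user1_inserted = add_if_missing(user1, "John Doe")
user1.execute("COMMIT")  # User 1 "unlocks the table"

user2.execute("BEGIN IMMEDIATE")  # now User 2 can take the lock
user2_inserted = add_if_missing(user2, "John Doe")
user2.execute("COMMIT")

donor_count = len(user2.execute("SELECT donorid FROM Donor").fetchall())
print(user2_blocked, user1_inserted, user2_inserted, donor_count)
```

Because User 2's check runs only after User 1 has released the lock, it sees the new record and skips the insert, so the table ends with a single "John Doe".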
Table Locking
• Why not lock everything?
• Slower
  – Processing overhead to lock tables
  – Users forced to wait for tables to be unlocked
• Potentially Dangerous
  – What if you forget to unlock your table?
  – What if your application crashes?
More on Table Locking
• Syntax:
  – LOCK TABLE TableName (WRITE|READ), …
  – UNLOCK TABLES
• Use WRITE for when you need to insert into the table, and READ for when you just need to query the table
• You must lock any table you plan to use between your LOCK TABLE and UNLOCK TABLES commands.
  – mysql> LOCK TABLE Donor WRITE, Donation READ;
Transactions
• Transactions work by isolating a set of commands such that no other command can alter the data currently being worked on.
  – Treats a set of commands as one command
  – Works like table locking in this sense
  – Also slow like table locking in this sense
Transactions
• Transactions aren’t automatically available on all of your database tables.
  – You have to be using the InnoDB or BDB table types
  – Default table type is MyISAM
• Let’s create one!
  – mysql> CREATE TABLE Account (ACCOUNTID INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, balance DOUBLE) ENGINE = InnoDB;
Transactions
• Start a transaction with BEGIN
• Transaction isn’t completed until a COMMIT
• Any commands in between are treated as one command
• If something goes wrong, you can ROLLBACK
  – reverts the database to the state it was in when you began
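A BEGIN / COMMIT / ROLLBACK cycle can be sketched with an Account table like the one created earlier. The example assumes SQLite via Python's sqlite3 module instead of MySQL with InnoDB; the transfer() helper, the balances, and the "no negative balance" rule are all hypothetical, chosen only to give the transaction something to undo.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL/InnoDB.
# isolation_level=None lets us issue BEGIN, COMMIT and ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (accountid INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])


def transfer(src, dst, amount):
    """Hypothetical helper: move money between accounts as one unit of work."""
    conn.execute("BEGIN")
    try:
        conn.execute(
            "UPDATE Account SET balance = balance - ? WHERE accountid = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE Account SET balance = balance + ? WHERE accountid = ?",
            (amount, dst),
        )
        (new_balance,) = conn.execute(
            "SELECT balance FROM Account WHERE accountid = ?", (src,)
        ).fetchone()
        if new_balance < 0:
            raise ValueError("insufficient funds")
        conn.execute("COMMIT")  # both updates become permanent together
        return True
    except Exception:
        conn.execute("ROLLBACK")  # both updates are undone together
        return False


first_ok = transfer(1, 2, 30.0)    # fits within the balance, so it commits
second_ok = transfer(1, 2, 500.0)  # would overdraw account 1, so it rolls back
balances = conn.execute(
    "SELECT accountid, balance FROM Account ORDER BY accountid"
).fetchall()

print(first_ok, second_ok, balances)
```

After the failed transfer, neither account shows the 500.00 change, which is the "treated as one command" property in action.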
Transactions
• Transactions are fairly complex
  – Not covered in-depth for “Intro to SQL” course
• Transactions are fairly powerful as well
  – Some external resources
  – MySQL Manual: “1.8.5.3 Transactions and Atomic Operations” http://dev.mysql.com/doc/mysql/en/ANSI_diff_Transactions.html
  – Sams Publishing: “MySql Transactions Overview” http://www.samspublishing.com/articles/article.asp?p=29312