
SQL INTRODUCTION

SQL is an acronym for Structured Query Language. It is a standard relational query language (standardized by both ANSI and ISO) used for interaction with databases. SQL was developed by IBM in the 1970s and has its roots in the relational algebra defined by Codd in 1972. SQL functionality goes beyond the relational algebra: it allows retrieving data, inserting data, modifying existing data and deleting data in an RDBMS. SQL features arithmetic operators for addition, subtraction, multiplication and division, and comparison operators (=, <>, >, >=, <, <=). SQL also defines several aggregate functions such as MAX, MIN, AVG, COUNT, and SUM.

SQL defines many keywords, which can be divided into several categories. The first category covers data retrieval (the SELECT keyword). The second covers data manipulation (the INSERT, UPDATE, and DELETE keywords). The third is the transactional category, featuring keywords like COMMIT and ROLLBACK. Another category is the SQL Data Definition Language, featuring keywords like CREATE and DROP. Yet another category controls the authorization and permission aspects of the RDBMS (the GRANT and REVOKE keywords).

SQL is pronounced either as the letters "S-Q-L" or as "sequel". SQL uses the -- character sequence to start a single-line comment. SQL keywords are not case sensitive, so the following SQL queries are equivalent:

SELECT * FROM Users

select * from Users

There are many SQL implementations, also called SQL dialects or SQL extensions. For example, the MS SQL Server version of SQL is called Transact-SQL, Oracle's procedural extension of SQL is called PL/SQL, and the MS Access version of SQL is called JET SQL. This SQL Tutorial will show you how to use SQL and its commands. You will be able to apply most of the knowledge gathered from this SQL tutorial to any Relational Database Management System.

RDBMS AND TABLES

RDBMS is an acronym for Relational Database Management System. The data in an RDBMS is stored in database objects called tables. The database tables are the primary data storage for every RDBMS and essentially they are collections of related data entries. For example, a table called Users might store information about many persons, and each entry in this table will represent one unique user. Even though all user entries in the Users table are unique, they are related in the sense that they describe similar objects.

Table Users

FirstName  LastName   DateOfBirth
John       Smith      12/12/1969
David      Stonewall  01/03/1954
Susan      Grant      03/03/1970

Each database table consists of columns and rows. Each table column defines the type of data stored in it, and this data type is valid for all rows in the table. A table row is a collection of data having one entry for each column in the table. An RDBMS stores its data in a group of tables, which may or may not be related by common fields (database table columns). An RDBMS also provides operations to insert, update and delete the information stored in the database tables. MS SQL Server, DB2, Oracle and MySQL are all Relational Database Management Systems. I'll be using the words RDBMS and database interchangeably throughout this SQL Tutorial, so whenever I use the word database I mean RDBMS and the other way around.
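To make the column and row vocabulary concrete, here is a minimal sketch that builds the Users table and inspects it. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, which this tutorial does not otherwise use; the TEXT column types and the ISO date strings are also assumptions made for the sketch.

```python
import sqlite3

# In-memory SQLite database; SQLite is an assumption made for this sketch.
conn = sqlite3.connect(":memory:")

# Each column has a name and a declared type that applies to every row.
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, DateOfBirth TEXT)"
)

# Each row supplies one entry per column. Dates are written as ISO strings
# (YYYY-MM-DD), which is an assumption and not the tutorial's own format.
rows = [
    ("John", "Smith", "1969-12-12"),
    ("David", "Stonewall", "1954-01-03"),
    ("Susan", "Grant", "1970-03-03"),
]
conn.executemany("INSERT INTO Users VALUES (?, ?, ?)", rows)

# PRAGMA table_info is SQLite-specific; index 1 of each result is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(Users)")]
row_count = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
print(columns, row_count)
```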

SQL SELECT

SQL SELECT is without a doubt the most frequently used SQL command, which is why we are starting our tutorial with it. The SQL SELECT command is used to retrieve data from one or more database tables. To illustrate the usage of the SELECT command we are going to use the Users table defined in the previous chapter:

FirstName  LastName   DateOfBirth
John       Smith      12/12/1969
David      Stonewall  01/03/1954
Susan      Grant      03/03/1970

The SQL statement below shows a simple usage of the SQL SELECT command:

SELECT FirstName, LastName, DateOfBirth FROM Users

Let's examine this SQL statement. The statement starts with the SELECT keyword followed by a list of table columns. This list of columns specifies which columns you want to retrieve from your table. The list of columns is followed by the SQL keyword FROM and the table name (the table we are selecting data from). There is a special syntax that can be used with the SELECT command if you want to retrieve all columns from a table. To do that, replace the list of columns with the * symbol and voila, you've selected all columns from the table:

SELECT * FROM Users

It's a prudent programming practice to explicitly specify the list of columns after the SELECT command: the query then retrieves only the data you need, which can improve performance on wide tables, and its output does not change when columns are later added to the table. The SELECT INTO statement retrieves data from a database table and inserts it into another table. Consider the SELECT INTO example below:

SELECT FirstName, LastName, DateOfBirth INTO UsersBackup FROM Users

The first part of the statement looks familiar and is just selecting several columns. The second part of this SQL statement is the important part, which specifies that the rows are inserted into the UsersBackup table. The last part specifies which table to get the rows from. In SQL Server, whose syntax this tutorial uses, SELECT INTO creates the UsersBackup table with the selected columns, so UsersBackup must not already exist when the statement runs. You can use the following SQL query to make an exact copy of the data in the Users table:

SELECT * INTO UsersBackup FROM Users

So far you have learnt how to specify which columns to select and from which table, but you might be wondering how many rows of data will actually be returned by these SQL statements. The answer is simple: all of them. But what if you have a table with 5 million rows, and you only need to select a few rows satisfying certain criteria? Fortunately there is a way to conditionally select data from a table or several tables. Enter the SQL WHERE command.
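The SELECT statements from this chapter can be tried out with a short script. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database rather than SQL Server. SQLite has no SELECT ... INTO, so the backup step uses CREATE TABLE ... AS SELECT as a stand-in; the ISO date strings are also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, DateOfBirth TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("John", "Smith", "1969-12-12"),
        ("David", "Stonewall", "1954-01-03"),
        ("Susan", "Grant", "1970-03-03"),
    ],
)

# Explicit column list: only the named columns come back.
names = conn.execute("SELECT FirstName, LastName FROM Users").fetchall()

# The * shorthand returns every column of every row.
everything = conn.execute("SELECT * FROM Users").fetchall()

# Stand-in for SELECT * INTO UsersBackup FROM Users (not supported by SQLite):
# this creates UsersBackup from the query result.
conn.execute("CREATE TABLE UsersBackup AS SELECT * FROM Users")
backup = conn.execute("SELECT * FROM UsersBackup").fetchall()

print(names)
print(sorted(backup) == sorted(everything))
```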

SQL WHERE

The SQL WHERE keyword is used to select data conditionally, by adding it to an already existing SQL SELECT query. The WHERE keyword can also be used with the UPDATE and DELETE commands to restrict which rows they change, but for now we'll stick with conditionally retrieving data, as we already know how to use the SELECT keyword. To better illustrate the applications of the WHERE keyword, we are going to add 2 columns to the Users table we used in the previous chapters and we'll also add a few more rows with actual data entries:

FirstName  LastName   DateOfBirth  Email              City
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles

Consider the following SQL query:

SELECT FirstName, LastName, City FROM Users WHERE City = 'Los Angeles'

The result of the SQL expression above will be the following:

FirstName  LastName  City
Susan      Grant     Los Angeles
Stephen    Grant     Los Angeles

Our SQL query used the "=" (Equal) operator in our WHERE criteria:

City = 'Los Angeles'

As you can see, we have selected only the users whose entries have the value 'Los Angeles' in the City column. You may also have noticed that we put the Los Angeles string value in single quotes. Whenever you use string (character) values in your SQL queries, you have to put them between single quotes. In standard SQL, double quotes delimit identifiers such as column names, so the SQL query below will typically fail because it uses double quotes instead of single quotes for the string value:

SELECT FirstName, LastName FROM Users WHERE City = "Los Angeles"

But what do we do if we want to retrieve all users having LastName O'Neil? The SQL statement below will fail:

SELECT FirstName, LastName FROM Users WHERE LastName = 'O'Neil'

The reason for the failure is the single quote which is part of the string we used in our WHERE criteria. The SQL engine will try to interpret our SQL statement and will consider the single quote inside the string as the end of that string. The remaining part of the SQL statement will be Neil', which cannot be interpreted correctly, thus we'll get an error. So how do we deal with strings containing single quotes? The answer is simple: replace every single quote in the string with two single quotes. When two single quotes appear together inside a string, SQL interprets them as one single quote. Here is our improved SQL statement, which will work correctly:

SELECT FirstName, LastName FROM Users WHERE LastName = 'O''Neil'

We used the = (Equal) operator in the examples above, but you can use any of the following comparison operators in conjunction with the SQL WHERE keyword:

<> (Not Equal)

SELECT FirstName, LastName FROM Users WHERE FirstName <> 'John'

> (Greater than)

SELECT FirstName, LastName FROM Users WHERE DateOfBirth > '02/03/1970'

>= (Greater or Equal)

SELECT FirstName, LastName FROM Users WHERE DateOfBirth >= '02/03/1970'

< (Less than)

SELECT FirstName, LastName FROM Users WHERE DateOfBirth < '02/03/1970'

<= (Less or Equal)

SELECT FirstName, LastName FROM Users WHERE DateOfBirth <= '02/03/1970'

In addition to the comparison operators, you can use WHERE along with logical operators. SQL logical operators are used to combine two or more criteria in the WHERE clause of an SQL statement. If we want to select all users from our Users table who live in New York and were born after 10/10/1975, we will use the following SQL query:

SELECT FirstName, LastName, DateOfBirth, Email, City FROM Users WHERE City = 'New York' AND DateOfBirth > '10/10/1975'

Here is the result of the above SELECT:

FirstName  LastName  DateOfBirth  Email              City
Paul       O'Neil    09/17/1982   [email protected]  New York

As you can see, we now have two criteria combined with the AND logical operator, which means that both conditions have to be true. If we want to select all users from our Users table who live in New York or were born after 10/10/1975, we will use the following SQL query:

SELECT FirstName, LastName, DateOfBirth, Email, City FROM Users WHERE City = 'New York' OR DateOfBirth > '10/10/1975'

The result is:

FirstName  LastName  DateOfBirth  Email              City
John       Smith     12/12/1969   [email protected]  New York
Paul       O'Neil    09/17/1982   [email protected]  New York

This time the two criteria are joined with OR, which means that all rows satisfying at least one of them will be returned. Both returned rows are New York residents, because Paul O'Neil is the only user born after 10/10/1975 and he also lives in New York. You can use the NOT logical operator in your SQL statements too. Consider the following example:

SELECT FirstName, LastName, DateOfBirth, Email, City FROM Users WHERE City NOT LIKE '%York%'

This statement will select all users whose city name doesn't contain the string York (the LIKE keyword is introduced next).

LIKE (similar to)

SELECT FirstName, LastName FROM Users WHERE FirstName LIKE 'S%'

We'll talk about the LIKE keyword in more detail later, but for now it's enough to know that the SQL statement above returns all users whose first name starts with the letter S. Inside a LIKE expression the % character is a wildcard that matches any sequence of characters (note that the syntax I've used is for SQL Server, and different SQL implementations may use different wildcard characters). You can use the WHERE keyword along with the BETWEEN keyword, which defines a range:

SELECT FirstName, LastName FROM Users WHERE DateOfBirth BETWEEN '02/03/1970' AND '10/10/1972'

You can use the WHERE keyword along with the IN keyword, which defines a list of values to match:

SELECT FirstName, LastName FROM Users WHERE City IN ('Los Angeles', 'New York')

The SQL statement above will return all users from Los Angeles and New York.
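The WHERE conditions from this chapter can be checked against the sample data. The sketch below assumes Python's built-in sqlite3 module with an in-memory SQLite database rather than SQL Server. It leaves out the Email column and stores dates as ISO strings (YYYY-MM-DD) so that plain string comparison matches date order; both are assumptions made for the sketch. The first_names helper is hypothetical. The sketch also shows a bound parameter (the ? placeholder), which lets the driver handle the quote in O'Neil instead of doubling it by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, DateOfBirth TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        ("John", "Smith", "1969-12-12", "New York"),
        ("David", "Stonewall", "1954-01-03", "San Francisco"),
        ("Susan", "Grant", "1970-03-03", "Los Angeles"),
        ("Paul", "O'Neil", "1982-09-17", "New York"),
        ("Stephen", "Grant", "1974-03-03", "Los Angeles"),
    ],
)


def first_names(sql, params=()):
    """Hypothetical helper: run a query and return FirstName values, sorted."""
    return sorted(row[0] for row in conn.execute(sql, params))


base = "SELECT FirstName FROM Users WHERE "

in_la = first_names(base + "City = 'Los Angeles'")
# Doubling the quote inside the literal ...
escaped = first_names(base + "LastName = 'O''Neil'")
# ... or passing the value as a bound parameter gives the same row.
bound = first_names(base + "LastName = ?", ("O'Neil",))
both = first_names(base + "City = 'New York' AND DateOfBirth > '1975-10-10'")
either = first_names(base + "City = 'New York' OR DateOfBirth > '1975-10-10'")
not_york = first_names(base + "City NOT LIKE '%York%'")
starts_s = first_names(base + "FirstName LIKE 'S%'")
in_range = first_names(base + "DateOfBirth BETWEEN '1970-02-03' AND '1972-10-10'")
in_list = first_names(base + "City IN ('Los Angeles', 'New York')")

print(in_la, escaped, bound, both, either)
print(not_york, starts_s, in_range, in_list)
```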

SQL INSERT INTO

The SQL INSERT INTO clause is used to insert data into a SQL table. SQL INSERT INTO is frequently used and has the following generic syntax:

INSERT INTO Table1 (Column1, Column2, Column3…)
VALUES (ColumnValue1, ColumnValue2, ColumnValue3 …)

The SQL INSERT INTO clause has two parts: the first specifies the table we are inserting into and gives the list of columns we are inserting values for, and the second specifies the values inserted into the columns listed in the first part. Consider the Users table from the previous chapter:

FirstName  LastName   DateOfBirth  Email              City
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles

If we want to enter a new data row into the Users table, we can do it with the following SQL INSERT INTO statement:

INSERT INTO Users (FirstName, LastName, DateOfBirth, Email, City)
VALUES ('Frank', 'Drummer', '10/08/1955', '[email protected]', 'Seattle')

If we select all the data from the Users table after we have executed the SQL INSERT INTO above, we'll get the following result:

FirstName  LastName   DateOfBirth  Email              City
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Frank      Drummer    10/08/1955   [email protected]  Seattle

One interesting question about SQL INSERT INTO is whether it is possible to insert values for only some of the columns, instead of all of them. The answer is yes, as long as the columns that we are skipping can hold NULL values or have a default value specified. Here is an example of using SQL INSERT INTO to insert a new row and supply values for only the first 4 columns of the Users table:

INSERT INTO Users (FirstName, LastName, DateOfBirth, Email)
VALUES ('Frank', 'Drummer', '10/08/1955', '[email protected]')

In the above example we assumed that the last column, City, can hold NULL values. The result of the SQL INSERT above will be:

FirstName  LastName   DateOfBirth  Email              City
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Frank      Drummer    10/08/1955   [email protected]  NULL

If you are inserting a new row and you are supplying values for all columns, then you can skip the entire column list in your statement. For example, the following two SQL INSERT statements are equivalent:

INSERT INTO Users
VALUES ('Frank', 'Drummer', '10/08/1955', '[email protected]', 'Seattle')

INSERT INTO Users (FirstName, LastName, DateOfBirth, Email, City)
VALUES ('Frank', 'Drummer', '10/08/1955', '[email protected]', 'Seattle')

However, if you skip the column list and do not supply a value for every column, then your SQL INSERT INTO will fail with an error:

INSERT INTO Users
VALUES ('Frank', 'Drummer', '10/08/1955', '[email protected]')

The above SQL INSERT will produce an error, because we haven't specified a value for the City column.
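The three INSERT cases above (full column list, partial column list, and a missing value without a column list) can be reproduced in a few lines. This is a sketch assuming Python's built-in sqlite3 module and an in-memory SQLite database; the address frank@example.com is a hypothetical placeholder, and the exception type is specific to SQLite, so other systems will report the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# All columns are nullable here, so skipped columns are stored as NULL.
conn.execute(
    "CREATE TABLE Users "
    "(FirstName TEXT, LastName TEXT, DateOfBirth TEXT, Email TEXT, City TEXT)"
)

# Full column list: every column receives a value.
conn.execute(
    "INSERT INTO Users (FirstName, LastName, DateOfBirth, Email, City) "
    "VALUES ('Frank', 'Drummer', '1955-10-08', 'frank@example.com', 'Seattle')"
)

# Partial column list: City is skipped and ends up NULL (None in Python).
conn.execute(
    "INSERT INTO Users (FirstName, LastName, DateOfBirth, Email) "
    "VALUES ('Frank', 'Drummer', '1955-10-08', 'frank@example.com')"
)

cities = [row[0] for row in conn.execute("SELECT City FROM Users ORDER BY rowid")]

# No column list and only four values for five columns: the statement is rejected.
try:
    conn.execute(
        "INSERT INTO Users "
        "VALUES ('Frank', 'Drummer', '1955-10-08', 'frank@example.com')"
    )
    rejected = False
except sqlite3.OperationalError as exc:
    rejected = True
    print("insert rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]
print(cities, row_count)
```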

SQL DISTINCT

The SQL DISTINCT command, used along with the SELECT keyword, retrieves only unique data entries depending on the column list you have specified after it. To illustrate the usage of the DISTINCT keyword, we'll use our Users table introduced in the previous chapters.

FirstName  LastName   DateOfBirth  Email              City
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles

Our Users table has several users in it, and it would be interesting to retrieve a list of all cities where our users live. If we use the statement below, we will get our city list, but there will be repetitions in it, because in some cases more than one user lives in the same city:

SELECT City FROM Users

So how do we get a list of all cities without repeating them? As you have guessed, we'll use the DISTINCT keyword:

SELECT DISTINCT City FROM Users

The result of the SQL DISTINCT expression above will look like this:

City
New York
San Francisco
Los Angeles

Essentially, the DISTINCT keyword removes the duplicates from the result set returned by your SELECT SQL statement. You can use the DISTINCT keyword with more than one column. Please consider the example below:

SELECT DISTINCT LastName, City FROM Users

The above statement returns all unique combinations of the LastName and City columns. Here is what the result of this statement will be:

LastName   City
Smith      New York
Stonewall  San Francisco
Grant      Los Angeles
O'Neil     New York

If you have a look at the original table above, you'll notice that there are two users with identical last names (Grant), who happen to live in the same city (Los Angeles). Because the combination of LastName and City values for both these users is not unique, we got only one row for it when we used the DISTINCT keyword. On the other hand, if we add one more column (Email) after the DISTINCT keyword:

SELECT DISTINCT LastName, Email, City FROM Users

We'll retrieve both users with last name Grant, simply because they have different emails and thus their entries are unique as far as our SQL statement is concerned:

LastName   Email              City
Smith      [email protected]  New York
Stonewall  [email protected]  San Francisco
Grant      [email protected]  Los Angeles
O'Neil     [email protected]  New York
Grant      [email protected]  Los Angeles
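The effect of adding columns after DISTINCT can be verified by counting the rows each query returns. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database; the example.com addresses are hypothetical placeholders, chosen only so that each user has a different email.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (LastName TEXT, Email TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        # Email addresses are hypothetical placeholders.
        ("Smith", "john@example.com", "New York"),
        ("Stonewall", "david@example.com", "San Francisco"),
        ("Grant", "susan@example.com", "Los Angeles"),
        ("O'Neil", "paul@example.com", "New York"),
        ("Grant", "stephen@example.com", "Los Angeles"),
    ],
)

# Without DISTINCT every row contributes its city, duplicates included.
all_cities = conn.execute("SELECT City FROM Users").fetchall()

# DISTINCT on one column: each city appears once.
cities = sorted(row[0] for row in conn.execute("SELECT DISTINCT City FROM Users"))

# DISTINCT on two columns: the two Grant / Los Angeles rows collapse into one.
pairs = conn.execute("SELECT DISTINCT LastName, City FROM Users").fetchall()

# Adding Email makes every combination unique again.
triples = conn.execute("SELECT DISTINCT LastName, Email, City FROM Users").fetchall()

print(len(all_cities), cities, len(pairs), len(triples))
```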

SQL UPDATE

So far we have looked at retrieving and inserting data in a SQL database, but we have not talked about modifying existing data. The SQL UPDATE command is used to modify data stored in database tables. If you want to update the email of one of the users in our Users table, you'll use a SQL statement like the one below:

UPDATE Users
SET Email = '[email protected]'
WHERE Email = '[email protected]'

Let's examine the statement above. The first line has the keyword UPDATE followed by the name of the table we are updating. The second line defines the changes made to the database fields, using the keyword SET followed by the column name, an equal sign and the new value for this column. You can have more than one assignment after the SET keyword. For example, if you want to update both the email and the city, you will use the SQL statement below:

UPDATE Users
SET Email = '[email protected]', City = 'San Francisco'
WHERE Email = '[email protected]'

The third line is our WHERE clause, which specifies which record(s) to update. In our case it says to update the Email field of the row having email [email protected]. What happens if you remove the WHERE clause and your SQL query looks like this:

UPDATE Users
SET Email = '[email protected]'

The answer is that all Email entries in the Users table will be changed to [email protected]. Most likely you will not want to do something like this, but you might have a case when you need to update several table rows at once. For example, if one of the company's offices has been moved from San Francisco to Los Angeles, you might want to update all users with City San Francisco to Los Angeles (we assume that the employees have moved too). To do that, use the following SQL statement:

UPDATE Users
SET City = 'Los Angeles'
WHERE City = 'San Francisco'

In both UPDATE examples with a WHERE clause above, I've changed a table field to a new value using the same field in the WHERE clause criteria. This was purely coincidental, and you can update different field(s) than the one used in your WHERE criteria, for example:

UPDATE Users
SET Email = '[email protected]'
WHERE FirstName = 'Stephen' AND LastName = 'Grant'

When updating, make sure that the WHERE clause criteria you have specified match only the rows you want to change. Using the example above, if you didn't have FirstName = 'Stephen' in your WHERE criteria, you would have updated 2 records (Susan Grant and Stephen Grant), because both these users have the same last name.
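One way to guard against an UPDATE that touches more rows than intended is to count the rows matched by the WHERE clause first, and to check how many rows the UPDATE reports afterwards. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database; the example.com addresses are hypothetical placeholders, and rowcount is a feature of the Python driver, not of SQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, Email TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?, ?)",
    [
        # Email addresses are hypothetical placeholders.
        ("John", "Smith", "john@example.com", "New York"),
        ("David", "Stonewall", "david@example.com", "San Francisco"),
        ("Susan", "Grant", "susan@example.com", "Los Angeles"),
        ("Paul", "O'Neil", "paul@example.com", "New York"),
        ("Stephen", "Grant", "stephen@example.com", "Los Angeles"),
    ],
)

# A WHERE clause on LastName alone would match both Grants.
too_broad = conn.execute(
    "SELECT COUNT(*) FROM Users WHERE LastName = 'Grant'"
).fetchone()[0]

# Narrowing the criteria to first and last name updates exactly one row.
cursor = conn.execute(
    "UPDATE Users SET Email = ? WHERE FirstName = 'Stephen' AND LastName = 'Grant'",
    ("stephen.grant@example.com",),
)
updated = cursor.rowcount

# Updating several rows at once is fine when that is the intent.
moved = conn.execute(
    "UPDATE Users SET City = 'Los Angeles' WHERE City = 'San Francisco'"
).rowcount

susan_email = conn.execute(
    "SELECT Email FROM Users WHERE FirstName = 'Susan'"
).fetchone()[0]

print(too_broad, updated, moved, susan_email)
```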

SQL DELETE

You already know how to retrieve, insert and update data in a SQL database table. In this chapter we'll learn how to delete data from a table using the SQL DELETE command. Using our Users table we will illustrate the SQL DELETE usage. One of the users in the Users table (Stephen Grant) has just left the company, and your boss has asked you to delete his record. How do you do that? Consider the SQL statement below:

DELETE FROM Users
WHERE LastName = 'Grant'

The first line in the SQL DELETE statement above specifies the table that we are deleting the record(s) from. The second line (the WHERE clause) specifies exactly which rows we delete (in our case all rows which have a LastName of 'Grant'). As you can see, DELETE SQL queries have a very simple syntax and in fact are very close to natural language. But wait, there is something wrong with the statement above! The problem is that we have more than one user with the last name 'Grant', and all users with this last name will be deleted. Because we don't want to do that, we need to find a table field or combination of fields that uniquely identifies the user Stephen Grant. Looking at the Users table, an obvious candidate for such a unique field is the Email column (it's not likely that different users share the same email). Our improved SQL query, which deletes only Stephen Grant's record, will look like this:

DELETE FROM Users
WHERE Email = '[email protected]'

What happens if you don't specify a WHERE clause in your DELETE query?

DELETE FROM Users

The answer is that all records in the Users table will be deleted. The SQL TRUNCATE statement below will leave the table in the same state as the last DELETE statement:

TRUNCATE TABLE Users

The TRUNCATE statement will delete all rows in the Users table, without deleting the table itself. Be very careful when using DELETE and TRUNCATE: once the change has been committed it cannot be undone, and rows deleted from your table are gone forever if you don't have a backup.
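The difference between an uncommitted and a committed DELETE can be seen with the COMMIT and ROLLBACK keywords mentioned in the introduction. The sketch below assumes Python's built-in sqlite3 module with its default transaction handling and an in-memory SQLite database; the example.com addresses and the count_users helper are hypothetical. SQLite has no TRUNCATE statement, so only DELETE is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (FirstName TEXT, LastName TEXT, Email TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        # Email addresses are hypothetical placeholders.
        ("John", "Smith", "john@example.com"),
        ("David", "Stonewall", "david@example.com"),
        ("Susan", "Grant", "susan@example.com"),
        ("Paul", "O'Neil", "paul@example.com"),
        ("Stephen", "Grant", "stephen@example.com"),
    ],
)
conn.commit()  # make the sample data permanent


def count_users():
    """Hypothetical helper: number of rows currently visible in Users."""
    return conn.execute("SELECT COUNT(*) FROM Users").fetchone()[0]


# Too broad: removes both Grants. The driver opens a transaction implicitly.
conn.execute("DELETE FROM Users WHERE LastName = 'Grant'")
after_broad_delete = count_users()

# Nothing was committed yet, so the mistake can still be rolled back.
conn.rollback()
after_rollback = count_users()

# The precise delete, identified by email, followed by a commit.
conn.execute("DELETE FROM Users WHERE Email = ?", ("stephen@example.com",))
conn.commit()
after_commit = count_users()

print(after_broad_delete, after_rollback, after_commit)
```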

SQL ORDER BY

The SQL ORDER BY clause is used to order the data sets retrieved from a SQL database. The ordering of the selected data can be done by one or more columns in a table. If we want to sort our Users table by the FirstName column, we'll have to use the following ORDER BY SQL statement:

SELECT * FROM Users ORDER BY FirstName

The result of the ORDER BY statement above will be the following:

FirstName  LastName   DateOfBirth  Email              City
David      Stonewall  01/03/1954   [email protected]  San Francisco
John       Smith      12/12/1969   [email protected]  New York
Paul       O'Neil     09/17/1982   [email protected]  New York
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Susan      Grant      03/03/1970   [email protected]  Los Angeles

As you can see, the rows are ordered alphabetically by the FirstName column. You can use ORDER BY to order the retrieved data by more than one column. For example, if you want to order by both the LastName and DateOfBirth columns, you would do it with the following ORDER BY statement:

SELECT * FROM Users ORDER BY LastName, DateOfBirth

Here is the result of this ORDER BY statement:

FirstName  LastName   DateOfBirth  Email              City
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco

When using ORDER BY with more than one column, you need to separate the list of columns following ORDER BY with commas. What will happen if we reverse the order of the columns specified after the ORDER BY statement, as in the statement below?

SELECT * FROM Users ORDER BY DateOfBirth, LastName

This ORDER BY statement will return the same rows as the one with the reversed column order, but they will be ordered differently. Here is the result:

FirstName  LastName   DateOfBirth  Email              City
David      Stonewall  01/03/1954   [email protected]  San Francisco
John       Smith      12/12/1969   [email protected]  New York
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York

The ORDER BY clause first sorts the retrieved data by the first column, then by the next one, and so forth. In all the ORDER BY examples so far, we were sorting alphabetically for character columns (FirstName, LastName) and from earlier to later date for the DateOfBirth column. What do we do if we want to order our data alphabetically but this time backwards? In order to accomplish that we need to use the DESC SQL keyword:

SELECT * FROM Users ORDER BY FirstName DESC

Here is the result:

FirstName  LastName   DateOfBirth  Email              City
Susan      Grant      03/03/1970   [email protected]  Los Angeles
Stephen    Grant      03/03/1974   [email protected]  Los Angeles
Paul       O'Neil     09/17/1982   [email protected]  New York
John       Smith      12/12/1969   [email protected]  New York
David      Stonewall  01/03/1954   [email protected]  San Francisco

When you add the keyword DESC after a column name in the ORDER BY clause, you are still ordering by this column but the result is returned in reverse order. The opposite of the DESC keyword is the ASC keyword, which orders by the specified column in ascending order. But how did our previous statements know to order the data alphabetically, when we didn't specify the ASC keyword? The answer is simple: when you don't specify ASC or DESC after a column in the ORDER BY column list, the ordering is ASC (alphabetical, from low to high) by default. It's important to remember that whenever you are ordering by more than one column, you need to specify ASC and/or DESC after each column if you need a specific ordering. For example, the statement below will order by both DateOfBirth and LastName, but only LastName will be in descending order:

SELECT * FROM Users ORDER BY DateOfBirth, LastName DESC

If you want to order descending by both columns, you need to change the ORDER BY statement to this:

SELECT * FROM Users ORDER BY DateOfBirth DESC, LastName DESC
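The orderings described in this chapter can be checked by reading back just the first names in the order each query returns them. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database; the Email and City columns are omitted, dates are stored as ISO strings (YYYY-MM-DD) so that they sort chronologically as text, and the ordered helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users (FirstName TEXT, LastName TEXT, DateOfBirth TEXT)"
)
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("John", "Smith", "1969-12-12"),
        ("David", "Stonewall", "1954-01-03"),
        ("Susan", "Grant", "1970-03-03"),
        ("Paul", "O'Neil", "1982-09-17"),
        ("Stephen", "Grant", "1974-03-03"),
    ],
)


def ordered(order_by):
    """Hypothetical helper: first names in the order produced by ORDER BY."""
    sql = "SELECT FirstName FROM Users ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]


by_first = ordered("FirstName")                    # ascending by default
by_first_desc = ordered("FirstName DESC")          # reversed
by_last_then_dob = ordered("LastName, DateOfBirth")
by_dob_then_last = ordered("DateOfBirth, LastName")
# DESC applies only to the column it follows; DateOfBirth stays ascending.
mixed = ordered("LastName DESC, DateOfBirth ASC")

print(by_first, by_first_desc)
print(by_last_then_dob, by_dob_then_last, mixed)
```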

SQL AGGREGATE FUNCTIONS

SQL aggregate functions return a single value, computed from the values in a table column. In this chapter we are going to introduce a new table called Sales, which will have the following columns and data:

OrderID  OrderDate   OrderPrice  OrderQuantity  CustomerName
1        12/22/2005  160         2              Smith
2        08/10/2005  190         2              Johnson
3        07/13/2005  500         5              Baldwin
4        07/15/2005  420         2              Smith
5        12/22/2005  1000        4              Wood
6        10/2/2005   820         4              Smith
7        11/03/2005  2000        2              Baldwin

The SQL COUNT function returns the number of rows in a table satisfying the criteria specified in the WHERE clause. If we want to count how many orders the customer with CustomerName of Smith has made, we will use the following SQL COUNT expression:

SELECT COUNT (*) FROM Sales WHERE CustomerName = 'Smith'

Let's examine the SQL statement above. The COUNT keyword is followed by parentheses surrounding the * character. You can replace the * with any of the table's columns, and your statement will return the same result as long as the WHERE condition is the same and that column contains no NULL values (COUNT applied to a column skips rows where the column is NULL). The result of the above SQL statement will be the number 3, because the customer Smith has made 3 orders in total. If you don't specify a WHERE clause when using COUNT, your statement will simply return the total number of rows in the table, which in our case is 7:

SELECT COUNT(*) FROM Sales

How can we get the number of unique customers that have ordered from our store? We need to use the DISTINCT keyword along with the COUNT function to accomplish that:

SELECT COUNT (DISTINCT CustomerName) FROM Sales

The SQL SUM function is used to select the sum of the values in a numeric column. Using the Sales table, we can get the sum of all orders with the following SQL SUM statement:

SELECT SUM(OrderPrice) FROM Sales

As with the COUNT function, we put the table column that we want to sum within parentheses after the SUM keyword. The result of the above SQL statement is the number 5090 (160 + 190 + 500 + 420 + 1000 + 820 + 2000). If we want to know how many items we have sold in total (the sum of OrderQuantity), we need to use this SQL statement:

SELECT SUM(OrderQuantity) FROM Sales

The SQL AVG function retrieves the average value for a numeric column. If we need the average number of items per order, we can retrieve it like this:

SELECT AVG(OrderQuantity) FROM Sales

Of course you can use the AVG function with the WHERE clause, thus restricting the data you operate on:

SELECT AVG(OrderQuantity) FROM Sales WHERE OrderPrice > 200

The above SQL expression will return the average OrderQuantity for all orders with OrderPrice greater than 200, which is 17/5 (that is, 3.4). The SQL MIN function selects the smallest number from a numeric column. In order to find out the minimum price paid for any of the orders in the Sales table, we use the following SQL expression:

SELECT MIN(OrderPrice) FROM Sales

The SQL MAX function retrieves the maximum numeric value from a numeric column. The MAX SQL statement below returns the highest OrderPrice from the Sales table:

SELECT MAX(OrderPrice) FROM Sales
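The figures quoted in this chapter can be recomputed directly from the Sales data. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database; the INTEGER column types, the ISO date strings and the scalar helper are assumptions made for the sketch. Note that SQLite returns a floating-point AVG for integer columns, which is not true of every database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (OrderID INTEGER, OrderDate TEXT, OrderPrice INTEGER, "
    "OrderQuantity INTEGER, CustomerName TEXT)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2005-12-22", 160, 2, "Smith"),
        (2, "2005-08-10", 190, 2, "Johnson"),
        (3, "2005-07-13", 500, 5, "Baldwin"),
        (4, "2005-07-15", 420, 2, "Smith"),
        (5, "2005-12-22", 1000, 4, "Wood"),
        (6, "2005-10-02", 820, 4, "Smith"),
        (7, "2005-11-03", 2000, 2, "Baldwin"),
    ],
)


def scalar(sql):
    """Hypothetical helper: return the single value an aggregate query yields."""
    return conn.execute(sql).fetchone()[0]


smith_orders = scalar("SELECT COUNT(*) FROM Sales WHERE CustomerName = 'Smith'")
all_orders = scalar("SELECT COUNT(*) FROM Sales")
customers = scalar("SELECT COUNT(DISTINCT CustomerName) FROM Sales")
total_price = scalar("SELECT SUM(OrderPrice) FROM Sales")
total_items = scalar("SELECT SUM(OrderQuantity) FROM Sales")
avg_items = scalar("SELECT AVG(OrderQuantity) FROM Sales WHERE OrderPrice > 200")
min_price = scalar("SELECT MIN(OrderPrice) FROM Sales")
max_price = scalar("SELECT MAX(OrderPrice) FROM Sales")

print(smith_orders, all_orders, customers, total_price, total_items)
print(avg_items, min_price, max_price)
```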

SQL GROUP BY

The SQL GROUP BY statement is used together with the SQL aggregate functions to group the retrieved data by one or more columns. The GROUP BY concept is one of the most complicated concepts for people new to the SQL language, and the easiest way to understand it is by example. We want to retrieve a list of unique customers from our Sales table, and at the same time get the total amount each customer has spent in our store.

OrderID  OrderDate   OrderPrice  OrderQuantity  CustomerName
1        12/22/2005  160         2              Smith
2        08/10/2005  190         2              Johnson
3        07/13/2005  500         5              Baldwin
4        07/15/2005  420         2              Smith
5        12/22/2005  1000        4              Wood
6        10/2/2005   820         4              Smith
7        11/03/2005  2000        2              Baldwin

You already know how to retrieve a list of unique customers using the DISTINCT keyword:

SELECT DISTINCT CustomerName FROM Sales

The SQL statement above works just fine, but it doesn't return the total amount of money spent by each of the customers. In order to accomplish that we will use both the SUM SQL function and the GROUP BY clause:

SELECT CustomerName, SUM(OrderPrice) FROM Sales GROUP BY CustomerName

We have 2 columns specified in our SELECT list: CustomerName and SUM(OrderPrice). The problem is that SUM(OrderPrice) returns a single value, while we have many customers in our Sales table. The GROUP BY clause comes to the rescue, specifying that the SUM function has to be executed for each unique CustomerName value. In this case the GROUP BY clause acts similarly to the DISTINCT statement, but for the purpose of using it along with SQL aggregate functions. The result set retrieved from the statement above will look like this:

CustomerName  OrderPrice
Baldwin       2500
Johnson       190
Smith         1400
Wood          1000

You can group by more than one column, for example:

SELECT CustomerName, OrderDate, SUM(OrderPrice) FROM Sales GROUP BY CustomerName, OrderDate

When grouping, keep in mind that all columns that appear in your SELECT column list and are not aggregated (used along with one of the SQL aggregate functions) have to appear in the GROUP BY clause too.
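The per-customer totals above can be reproduced with a short script. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database; the INTEGER column types and ISO date strings are assumptions, and only the columns needed for grouping are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (OrderDate TEXT, OrderPrice INTEGER, CustomerName TEXT)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        ("2005-12-22", 160, "Smith"),
        ("2005-08-10", 190, "Johnson"),
        ("2005-07-13", 500, "Baldwin"),
        ("2005-07-15", 420, "Smith"),
        ("2005-12-22", 1000, "Wood"),
        ("2005-10-02", 820, "Smith"),
        ("2005-11-03", 2000, "Baldwin"),
    ],
)

# One output row per distinct CustomerName, with SUM computed inside each group.
totals = dict(
    conn.execute(
        "SELECT CustomerName, SUM(OrderPrice) FROM Sales GROUP BY CustomerName"
    )
)

# Grouping by two columns: one row per distinct (CustomerName, OrderDate) pair.
per_day = conn.execute(
    "SELECT CustomerName, OrderDate, SUM(OrderPrice) "
    "FROM Sales GROUP BY CustomerName, OrderDate"
).fetchall()

print(totals)
print(len(per_day))
```

In this data set no customer ordered twice on the same day, so grouping by both columns yields one group per order.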

SQL HAVING

The SQL HAVING clause is used in conjunction with the SELECT clause to specify a search condition for a group or aggregate. The HAVING clause behaves like the WHERE clause, but is applicable to groups, that is, to the rows in the result set that represent groups. In contrast, the WHERE clause is applied to individual rows, not to groups. To clarify how exactly HAVING works, we'll use the Sales table:

OrderID  OrderDate   OrderPrice  OrderQuantity  CustomerName
1        12/22/2005  160         2              Smith
2        08/10/2005  190         2              Johnson
3        07/13/2005  500         5              Baldwin
4        07/15/2005  420         2              Smith
5        12/22/2005  1000        4              Wood
6        10/2/2005   820         4              Smith
7        11/03/2005  2000        2              Baldwin

In the previous chapter we retrieved a list of all customers along with the total amount each customer has spent, using the following statement:

SELECT CustomerName, SUM(OrderPrice) FROM Sales GROUP BY CustomerName

This time we want to select all unique customers who have spent more than 1200 in our store. To accomplish that, we'll modify the SQL statement above by adding the HAVING clause at the end of it:

SELECT CustomerName, SUM(OrderPrice)
FROM Sales
GROUP BY CustomerName
HAVING SUM(OrderPrice) > 1200

The result of the SELECT query after we added the HAVING search condition is below:

CustomerName  OrderPrice
Baldwin       2500
Smith         1400

Another useful example of the HAVING clause is selecting all customers who have ordered more than 5 items in total across all their orders. Our HAVING statement will look like this:

SELECT CustomerName, SUM(OrderQuantity)
FROM Sales
GROUP BY CustomerName
HAVING SUM(OrderQuantity) > 5

You can have both WHERE and HAVING in one SELECT statement. For example, suppose you want to select all customers who have spent more than 1000 after 10/01/2005. The SQL statement including both the HAVING and WHERE clauses will look like this:

SELECT CustomerName, SUM(OrderPrice)
FROM Sales
WHERE OrderDate > '10/01/2005'
GROUP BY CustomerName
HAVING SUM(OrderPrice) > 1000

Here is something very important to keep in mind. The WHERE clause search condition is applied to each individual row in the Sales table. After that, the HAVING clause is applied to the rows of the grouped result. The important thing to remember is that the grouping is done only on the rows that satisfied the WHERE clause condition.
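The three HAVING statements from this chapter can be run side by side in a short sketch. It assumes Python's sqlite3 module with an in-memory database (the tutorial itself is not tied to any engine) and stores the dates as ISO strings, so the date filter is written as '2005-10-01' rather than '10/01/2005'. The helper name run is made up for this example.

```python
import sqlite3

# Assumption: in-memory SQLite, with the tutorial's Sales rows and ISO dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (OrderID INTEGER PRIMARY KEY, OrderDate TEXT, "
    "OrderPrice INTEGER, OrderQuantity INTEGER, CustomerName TEXT)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "2005-12-22", 160, 2, "Smith"),
        (2, "2005-08-10", 190, 2, "Johnson"),
        (3, "2005-07-13", 500, 5, "Baldwin"),
        (4, "2005-07-15", 420, 2, "Smith"),
        (5, "2005-12-22", 1000, 4, "Wood"),
        (6, "2005-10-02", 820, 4, "Smith"),
        (7, "2005-11-03", 2000, 2, "Baldwin"),
    ],
)


def run(sql):
    """Hypothetical helper: run a query and return all rows as tuples."""
    return conn.execute(sql).fetchall()


# HAVING filters the groups produced by GROUP BY.
big_spenders = run(
    "SELECT CustomerName, SUM(OrderPrice) FROM Sales "
    "GROUP BY CustomerName HAVING SUM(OrderPrice) > 1200 "
    "ORDER BY CustomerName"
)
many_items = run(
    "SELECT CustomerName, SUM(OrderQuantity) FROM Sales "
    "GROUP BY CustomerName HAVING SUM(OrderQuantity) > 5 "
    "ORDER BY CustomerName"
)
# WHERE removes rows first; only the surviving rows are grouped and summed.
late_spenders = run(
    "SELECT CustomerName, SUM(OrderPrice) FROM Sales "
    "WHERE OrderDate > '2005-10-01' "
    "GROUP BY CustomerName HAVING SUM(OrderPrice) > 1000 "
    "ORDER BY CustomerName"
)

print(big_spenders, many_items, late_spenders)
```

In the last query Smith drops out even though Smith's overall total is 1400: only the two Smith orders placed after the cutoff date are grouped, and they add up to 980.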

RELATIONS, KEYS AND NORMALIZATION

So far, all of our SQL examples have dealt with a single table. In real life, when working with databases, you’ll have to work with many tables which are interrelated. The true power of Relational Database Management Systems is the fact that they are relational. The relationships in an RDBMS help ensure that there is no redundant data.

What is redundant data, you might ask? I’ll answer with an example. An online store offers computers for sale, and the easiest way to track the sales is to keep them in a database. You can have a table called Product, which will hold information about each computer: model name, price and manufacturer. You also need to keep some details about the manufacturer, like their website and their support email. If you store the manufacturer details in the Product table, you will have the manufacturer contact info repeated for each computer model the manufacturer produces:

Model           Price  Manufacturer  ManufacturerWebsite     ManufacturerEmail
Inspiron B120   $499   Dell          http://www.dell.com     [email protected]
Inspiron B130   $599   Dell          http://www.dell.com     [email protected]
Inspiron E1705  $949   Dell          http://www.dell.com     [email protected]
Satellite A100  $549   Toshiba       http://www.toshiba.com  [email protected]
Satellite P100  $934   Toshiba       http://www.toshiba.com  [email protected]

To get rid of the redundant manufacturer data in the Product table, we can create a new table called Manufacturer, which will have only one entry (row) for each manufacturer, and we can link (relate) this table to the Product table. To create this relation we need to add an additional column to the Product table that references the entries in the Manufacturer table.

A relationship between 2 tables is established when the data in one of the columns in the first table matches the data in a column in the second table. To explain this further we have to understand two SQL relational concepts: Primary Key and Foreign Key. A Primary Key is a column or a combination of columns that uniquely identifies each row in a table. A Foreign Key is a column or a combination of columns whose values match a Primary Key in a different table. In the most common scenario the relationship between 2 tables matches the Primary Key in one of the tables with a Foreign Key in the second table. Consider the new Product and Manufacturer tables below:

Manufacturer

ManufacturerID  Manufacturer  ManufacturerWebsite     ManufacturerEmail
1               Dell          http://www.dell.com     [email protected]
2               Toshiba       http://www.toshiba.com  [email protected]

Product

Model           Price  ManufacturerID
Inspiron B120   $499   1
Inspiron B130   $599   1
Inspiron E1705  $949   1
Satellite A100  $549   2
Satellite P100  $934   2

The first table is Manufacturer, which has 2 entries, for Dell and Toshiba respectively. Each of these entries has a ManufacturerID value, which is a unique integer. Because the ManufacturerID column is unique within the Manufacturer table, we can use it as the Primary Key of this table.
The Product table retains the Model and the Price columns, but has a new column called ManufacturerID, which matches the values of the ManufacturerID column in the Manufacturer table. All values in the ManufacturerID column in the Product table have to match one of the values of the Manufacturer table’s Primary Key (for example, you can’t have a ManufacturerID with a value of 3 in the Product table, simply because there is no manufacturer with this ManufacturerID defined in the Manufacturer table).

I’m sure you’ve noticed that we used the same name for the Primary Key in the first table as for the Foreign Key in the second. This was done on purpose to show the relationship between the 2 tables based on these columns. Of course you can give the 2 columns different names, but then somebody who sees your database for the first time won’t immediately see that the 2 tables are related.

But how do we ensure that the Product table doesn’t have invalid entries like the last entry below:

Model           Price  ManufacturerID
Inspiron B120   $499   1
Inspiron B130   $599   1
Inspiron E1705  $949   1
Satellite A100  $549   2
Satellite P100  $934   2
ThinkPad Z60t   $849   3

We do not have a manufacturer with a ManufacturerID of 3 in our Manufacturer table, hence this entry in the Product table is invalid. The answer is that you have to enforce referential integrity between the 2 tables. Different RDBMS have different ways to enforce referential integrity, and I will not go into more detail, as it is not needed to understand the concept of relationships between tables.

There are 3 types of relations between tables: One-To-Many, Many-To-Many and One-To-One. The relation we created above is One-To-Many and is the most common of the 3 types. In a One-To-Many relation a row in one of the tables can have many matching rows in the second table, but a row in the second table can match only one row in the first table. In our example, each manufacturer (a row in the Manufacturer table) produces several different computer models (several rows in the Product table), but each particular product (a row in the Product table) has only one manufacturer (a row in the Manufacturer table).

The second type is the Many-To-Many relation. In this relation many rows from the first table can match many rows in the second, and the other way around. To define this type of relation you need a third table whose primary key is composed of the 2 foreign keys from the other 2 tables. To clarify this relation let’s review the following example. We have an Article table (ArticleID is the primary key) and a Category table (CategoryID is the primary key). Every article in the Article table can belong to multiple categories. To accommodate that, we create a new table called ArticleCategory, which has only 2 columns, ArticleID and CategoryID (these 2 columns together form the primary key of this table). This new table, sometimes called a junction table, defines the Many-To-Many relationship between the 2 main tables. One article can belong to multiple categories, and every category may contain more than one article.
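The Article, Category and ArticleCategory arrangement described above might be sketched as follows. This is only an illustration under stated assumptions: it uses Python's sqlite3 module with an in-memory SQLite database, and the Title and Name columns as well as the sample rows are invented for the example, since the tutorial names only the key columns.

```python
import sqlite3

# Assumption: in-memory SQLite; Title, Name and all sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Article  (ArticleID  INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE Category (CategoryID INTEGER PRIMARY KEY, Name  TEXT);
-- Junction table: its primary key is the pair of foreign keys.
CREATE TABLE ArticleCategory (
    ArticleID  INTEGER REFERENCES Article(ArticleID),
    CategoryID INTEGER REFERENCES Category(CategoryID),
    PRIMARY KEY (ArticleID, CategoryID)
);
INSERT INTO Article  VALUES (1, 'Intro to SQL'), (2, 'Indexing basics');
INSERT INTO Category VALUES (10, 'Databases'), (20, 'Performance');
INSERT INTO ArticleCategory VALUES (1, 10), (2, 10), (2, 20);
""")

# One article can belong to several categories...
categories_of_article_2 = [
    name
    for (name,) in conn.execute(
        "SELECT Name FROM Category "
        "JOIN ArticleCategory ON Category.CategoryID = ArticleCategory.CategoryID "
        "WHERE ArticleCategory.ArticleID = 2 ORDER BY Name"
    )
]

# ...and one category can contain several articles.
articles_in_category_10 = [
    title
    for (title,) in conn.execute(
        "SELECT Title FROM Article "
        "JOIN ArticleCategory ON Article.ArticleID = ArticleCategory.ArticleID "
        "WHERE ArticleCategory.CategoryID = 10 ORDER BY Title"
    )
]

# The composite primary key stops the same link from being stored twice.
try:
    conn.execute("INSERT INTO ArticleCategory VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(categories_of_article_2, articles_in_category_10, duplicate_rejected)
```

Making the two foreign keys the primary key means each article and category pair can be recorded only once, which is usually what a junction table needs.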
In the One-To-One relation each row in the first table may match only one row in the second, and the other way around. This relationship is very uncommon, simply because if you have this type of relation you may as well keep all the info in one single table.

By dividing the data into 2 tables we successfully removed the redundant manufacturer details from the initial Product table, adding an integer column referencing the new Manufacturer table instead. The process of removing redundant data by creating relations between tables is known as Normalization. The normalization process uses formal methods to design the database as a set of interrelated tables.
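How referential integrity is enforced differs between systems, as noted earlier in this chapter. As one possible illustration, the sketch below assumes SQLite through Python's sqlite3 module, where a foreign key constraint on Product.ManufacturerID rejects the invalid ThinkPad row. The PRAGMA foreign_keys = ON step is specific to SQLite, and the email column is left out because the addresses are not legible in this copy of the tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite-specific: foreign key checks are off unless enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Manufacturer (
    ManufacturerID      INTEGER PRIMARY KEY,
    Manufacturer        TEXT,
    ManufacturerWebsite TEXT
);
CREATE TABLE Product (
    Model          TEXT,
    Price          INTEGER,
    -- Foreign key: every value must exist in Manufacturer.ManufacturerID.
    ManufacturerID INTEGER NOT NULL REFERENCES Manufacturer(ManufacturerID)
);
INSERT INTO Manufacturer VALUES
    (1, 'Dell', 'http://www.dell.com'),
    (2, 'Toshiba', 'http://www.toshiba.com');
INSERT INTO Product VALUES
    ('Inspiron B120', 499, 1),
    ('Inspiron B130', 599, 1),
    ('Inspiron E1705', 949, 1),
    ('Satellite A100', 549, 2),
    ('Satellite P100', 934, 2);
""")

# There is no manufacturer with ManufacturerID 3, so this insert should fail.
try:
    conn.execute("INSERT INTO Product VALUES ('ThinkPad Z60t', 849, 3)")
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("Rejected:", error)

product_count = conn.execute("SELECT COUNT(*) FROM Product").fetchone()[0]
print(product_count)
```

Other database systems declare the same kind of constraint with their own syntax and may enforce it by default.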

SQL JOIN

In all SQL examples so far we’ve selected data from a single table. In the previous chapter we learned about the concepts of Primary Key and Foreign Key, and how database tables relate to one another. Using that knowledge we can move forward and learn how to select data from more than one table in one SQL statement.

The SQL JOIN clause is used to retrieve data from 2 or more tables joined by common fields. The most common scenario is that a primary key in one of the tables matches a foreign key in the second table. We will use the 2 related tables Product and Manufacturer from the previous chapter to illustrate how to use JOIN. Consider the SQL JOIN statement below:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice
FROM Manufacturer JOIN Product
ON Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

The first obvious thing about this SQL statement is that it contains columns from 2 different tables in the SELECT column list. Then the FROM clause is followed by a JOIN clause. The JOIN clause has 2 parts, the first one stating the tables we are joining:

Manufacturer JOIN Product

And the second part, which specifies which columns we are joining on:

ON Manufacturer.ManufacturerID = Product.ManufacturerID

Because the Price column is a parameter of the AVG function in our SQL statement, we need to use a GROUP BY clause for the rest of the columns in the SELECT list. As you might already have guessed, our SQL statement selects a list of all manufacturers and the average price of their products. The result will look like this:

Manufacturer  ManufacturerWebsite     ManufacturerEmail  AvgPrice
Dell          http://www.dell.com     [email protected]  $682.33
Toshiba       http://www.toshiba.com  [email protected]  $741.50

You can specify the JOIN condition in the WHERE clause instead of the FROM clause, without using the JOIN keyword, like this:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice
FROM Manufacturer, Product
WHERE Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

It’s a better programming practice to specify your JOIN conditions in the FROM clause. When joining tables you’ll have to make sure that there is no ambiguity in the column names. In our example both the Manufacturer and Product tables have a column named ManufacturerID; that’s why we prefixed this column’s name with the respective table name followed by a dot. Another thing worth mentioning in our example is the following part of the SQL statement:

AVG(Price) AS AvgPrice

Because the column produced by the AVG function doesn’t have its own name, we made up a name for it: AvgPrice. In SQL terms, we created an alias for the new column.

There are 2 main types of SQL JOINs: INNER JOINs and OUTER JOINs. In our example we didn’t specify the type of the JOIN, and by doing that we used INNER JOIN by default. The INNER JOIN and JOIN clauses are interchangeable in general (keep in mind that different RDBMS have different syntax for their JOIN clause). The INNER JOIN clause retrieves rows from both tables only where there is a match between the columns we are joining on. If we add a new manufacturer to our Manufacturer table, but we don’t add any products for it in the Product table, and we run our JOIN statement from above, the result will be the same as it was before adding the new manufacturer. This happens because we don’t have a match for this new manufacturer in the Product table, and because we are using INNER JOIN, which returns only the matching rows. The final result is that this manufacturer without products doesn’t appear in the retrieved data.

Wait, you say, what if I want to get the list of all manufacturers no matter whether they have any products listed in the Product table? How can I do that? The answer is to use an OUTER JOIN. The OUTER JOIN clause returns all rows from at least one of the joined tables, provided that these rows meet the search conditions specified in the WHERE and HAVING clauses (if any). In order to get all manufacturers and their average product price, without worrying that some of the manufacturers do not have any products listed yet, we will use the following OUTER JOIN SQL statement:

SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice
FROM Manufacturer LEFT OUTER JOIN Product
ON Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

The only difference in our new statement is that we added the keywords LEFT OUTER in front of the JOIN keyword. The SQL OUTER JOIN has 2 sub-types called LEFT OUTER JOIN (or simply LEFT JOIN) and RIGHT OUTER JOIN (or simply RIGHT JOIN). When we use the LEFT OUTER JOIN clause we indicate that we want to get all rows from the left table listed in our FROM clause (we will also call it the first table), even if they don’t have a match in the right (second) table.

What values will be returned for the columns selected from the second table when there is no match, you may ask? Related to our example, the question is: what average product price will our SQL query return for manufacturers which don’t have any products in the Product table? The answer is simple: NULL. The result of our LEFT OUTER JOIN query will be the following if we added Sony to the Manufacturer table, but we didn’t add any Sony products to the Product table:

Manufacturer  ManufacturerWebsite     ManufacturerEmail  AvgPrice
Dell          http://www.dell.com     [email protected]  $682.33
Toshiba       http://www.toshiba.com  [email protected]  $741.50
Sony          http://www.sony.com     [email protected]  NULL

The RIGHT OUTER JOIN, or simply RIGHT JOIN, does exactly the opposite of what the LEFT JOIN does. The RIGHT OUTER JOIN gets all rows from the right (second) table listed in our FROM clause, even if they don’t have a match in the left (first) table, and returns NULL values for the columns from the left table that have no match.

Finally, a table can be joined to itself, and to accomplish that you need to give the table 2 different aliases in the FROM clause.
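The difference between INNER JOIN and LEFT OUTER JOIN discussed in this chapter can be checked with a short sketch. It assumes Python's sqlite3 module and an in-memory SQLite database, trims the SELECT list to the manufacturer name and average price, rounds the average with ROUND, and adds an ORDER BY so the row order is predictable. None of these choices come from the tutorial.

```python
import sqlite3

# Assumption: in-memory SQLite with the tutorial's tables, plus Sony without products.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manufacturer (
    ManufacturerID      INTEGER PRIMARY KEY,
    Manufacturer        TEXT,
    ManufacturerWebsite TEXT
);
CREATE TABLE Product (
    Model          TEXT,
    Price          INTEGER,
    ManufacturerID INTEGER REFERENCES Manufacturer(ManufacturerID)
);
INSERT INTO Manufacturer VALUES
    (1, 'Dell', 'http://www.dell.com'),
    (2, 'Toshiba', 'http://www.toshiba.com'),
    (3, 'Sony', 'http://www.sony.com');
INSERT INTO Product VALUES
    ('Inspiron B120', 499, 1),
    ('Inspiron B130', 599, 1),
    ('Inspiron E1705', 949, 1),
    ('Satellite A100', 549, 2),
    ('Satellite P100', 934, 2);
""")

# Same statement twice; only the join type placed in {join} changes.
QUERY = """
SELECT Manufacturer, ROUND(AVG(Price), 2) AS AvgPrice
FROM Manufacturer {join} Product
  ON Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer
ORDER BY Manufacturer
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT OUTER JOIN")).fetchall()

print("INNER:", inner_rows)  # Sony is missing: it has no matching Product rows
print("LEFT: ", left_rows)   # Sony is present, with a NULL (None) average
```

In Python the SQL NULL comes back as None, which is how the Sony row shows up in the LEFT OUTER JOIN result.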

SQL UNION

The SQL UNION operator is used to combine the results of two or more SELECT statements into a single result. All the statements combined with UNION must have the same structure. This means that they need to have the same number of columns, and corresponding columns must have the same or compatible data types (implicitly convertible to the same type or explicitly converted to the same type). The columns in each SELECT statement must be in exactly the same order too. This is what a simple UNION statement looks like:

SELECT Column1, Column2 FROM Table1
UNION
SELECT Column1, Column2 FROM Table2

The column names in the result of a UNION are always the same as the column names in the first SELECT statement in the UNION. By default the UNION operator removes duplicate rows from the result set. You have the option to use the ALL keyword after the UNION keyword, which will force all rows, including duplicates, to be returned in your result set.
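The duplicate handling of UNION and UNION ALL can be seen in a small sketch. It assumes Python's sqlite3 module with an in-memory database; Table1 and Table2 are the placeholder names from the statement above, while the column types and sample rows are invented for the illustration.

```python
import sqlite3

# Assumption: in-memory SQLite; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT);
CREATE TABLE Table2 (Column1 INTEGER, Column2 TEXT);
INSERT INTO Table1 VALUES (1, 'a'), (2, 'b');
INSERT INTO Table2 VALUES (2, 'b'), (3, 'c');
""")

# UNION removes the duplicate row (2, 'b').
union_rows = conn.execute(
    "SELECT Column1, Column2 FROM Table1 "
    "UNION "
    "SELECT Column1, Column2 FROM Table2 "
    "ORDER BY Column1"
).fetchall()

# UNION ALL keeps every row from both SELECT statements.
union_all_rows = conn.execute(
    "SELECT Column1, Column2 FROM Table1 "
    "UNION ALL "
    "SELECT Column1, Column2 FROM Table2 "
    "ORDER BY Column1"
).fetchall()

print(union_rows)
print(union_all_rows)
```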

SQL NESTED QUERIES

A SQL nested query is a SELECT query that is nested inside a SELECT, UPDATE, INSERT, or DELETE SQL query. Here is a simple example of SQL nested query:

SELECT Model
FROM Product
WHERE ManufacturerID IN (SELECT ManufacturerID FROM Manufacturer WHERE Manufacturer = 'Dell')

The nested query above will select all models from the Product table manufactured by Dell:

Model
Inspiron B120
Inspiron B130
Inspiron E1705

You can have more than one level of nesting in a single query.
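Both the nested query from this chapter and a query with two levels of nesting can be tried in the sketch below. It assumes Python's sqlite3 module with an in-memory database holding the tutorial's Product and Manufacturer rows. The second query (models priced above the average Dell price) is an invented illustration, not a statement from the tutorial.

```python
import sqlite3

# Assumption: in-memory SQLite with the tutorial's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manufacturer (ManufacturerID INTEGER PRIMARY KEY, Manufacturer TEXT);
CREATE TABLE Product (Model TEXT, Price INTEGER, ManufacturerID INTEGER);
INSERT INTO Manufacturer VALUES (1, 'Dell'), (2, 'Toshiba');
INSERT INTO Product VALUES
    ('Inspiron B120', 499, 1),
    ('Inspiron B130', 599, 1),
    ('Inspiron E1705', 949, 1),
    ('Satellite A100', 549, 2),
    ('Satellite P100', 934, 2);
""")

# One level of nesting: the inner SELECT supplies the list for IN.
dell_models = [
    model
    for (model,) in conn.execute(
        "SELECT Model FROM Product "
        "WHERE ManufacturerID IN "
        "  (SELECT ManufacturerID FROM Manufacturer WHERE Manufacturer = 'Dell') "
        "ORDER BY Model"
    )
]

# Two levels of nesting (hypothetical example): models that cost more than
# the average price of Dell products.
above_dell_average = [
    model
    for (model,) in conn.execute(
        "SELECT Model FROM Product "
        "WHERE Price > "
        "  (SELECT AVG(Price) FROM Product "
        "   WHERE ManufacturerID IN "
        "     (SELECT ManufacturerID FROM Manufacturer WHERE Manufacturer = 'Dell')) "
        "ORDER BY Model"
    )
]

print(dell_models)
print(above_dell_average)
```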

CREATE DATABASE

The CREATE DATABASE statement is used to create a new SQL database and has the following syntax:

CREATE DATABASE DatabaseName

The CREATE DATABASE implementation and syntax vary substantially between different RDBMS implementations.

The CREATE TABLE statement is used to create a new database table. Here is what a simple CREATE TABLE statement looks like:

CREATE TABLE TableName
(
Column1 DataType,
Column2 DataType,
Column3 DataType,
….
)

The DataType specified after each column name is a placeholder for the real data type of the column. The following CREATE TABLE statement creates the Users table we used in one of the first chapters:

CREATE TABLE Users
(
FirstName CHAR(100),
LastName CHAR(100),
DateOfBirth DATE
)

The CREATE TABLE statement above creates a table with 3 columns: FirstName of type CHAR with a length of 100 characters, LastName of type CHAR with a length of 100 characters, and DateOfBirth of type DATE.

The ALTER TABLE statement is used to change a table definition by adding, modifying or dropping columns. Below you can see the syntax of an ALTER TABLE statement which adds a new column to the table:

ALTER TABLE TableName
ADD ColumnName DataType

If we want to delete the newly added ColumnName column, we can do it with the following ALTER TABLE statement:

ALTER TABLE TableName
DROP COLUMN ColumnName
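The CREATE TABLE and ALTER TABLE statements above might be exercised as in the sketch below. It assumes SQLite through Python's sqlite3 module, the Email column is a made-up name for the illustration, and column_names is a hypothetical helper built on SQLite's PRAGMA table_info. Dropping a column is attempted only when the linked SQLite library is version 3.35.0 or newer, since older SQLite versions do not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def column_names(table):
    """Hypothetical helper: list a table's columns via PRAGMA table_info."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# The Users table from the tutorial.
conn.execute(
    "CREATE TABLE Users ("
    "FirstName CHAR(100), LastName CHAR(100), DateOfBirth DATE)"
)
columns_created = column_names("Users")

# Add a new column; Email is an invented name for this example.
conn.execute("ALTER TABLE Users ADD Email CHAR(100)")
columns_after_add = column_names("Users")

# Drop it again where the SQLite library supports DROP COLUMN (3.35.0+).
if sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE Users DROP COLUMN Email")
columns_final = column_names("Users")

print(columns_created)
print(columns_after_add)
print(columns_final)
```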

SQL VIEWS

A SQL View is a virtual table which is based on a SQL SELECT query. Essentially a view is very close to a real database table (it has columns and rows just like a regular table), except that real tables store data, while views don’t. The view’s data is generated dynamically when the view is referenced. A view references one or more existing database tables or other views. In effect every view is a filter over the table data referenced in it, and this filter can restrict both the columns and the rows of the referenced tables. Here is an example of how to create a SQL view using the already familiar Product and Manufacturer tables:

CREATE VIEW vwAveragePrice AS
SELECT Manufacturer, ManufacturerWebsite, ManufacturerEmail, AVG(Price) AS AvgPrice
FROM Manufacturer JOIN Product
ON Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer, ManufacturerWebsite, ManufacturerEmail

A view can be referenced and used from another view, from a SQL query, and from a stored procedure. You reference a view as you would reference any real SQL database table:

SELECT * FROM vwAveragePrice
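That a view stores no data of its own and is re-evaluated whenever it is referenced can be seen in the sketch below. It assumes Python's sqlite3 module with an in-memory SQLite database, omits the email column, and inserts one invented product row (named 'Hypothetical Model') to show the view's result changing.

```python
import sqlite3

# Assumption: in-memory SQLite with the tutorial's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Manufacturer (
    ManufacturerID      INTEGER PRIMARY KEY,
    Manufacturer        TEXT,
    ManufacturerWebsite TEXT
);
CREATE TABLE Product (Model TEXT, Price INTEGER, ManufacturerID INTEGER);
INSERT INTO Manufacturer VALUES
    (1, 'Dell', 'http://www.dell.com'),
    (2, 'Toshiba', 'http://www.toshiba.com');
INSERT INTO Product VALUES
    ('Inspiron B120', 499, 1),
    ('Inspiron B130', 599, 1),
    ('Inspiron E1705', 949, 1),
    ('Satellite A100', 549, 2),
    ('Satellite P100', 934, 2);

CREATE VIEW vwAveragePrice AS
SELECT Manufacturer, ManufacturerWebsite, AVG(Price) AS AvgPrice
FROM Manufacturer JOIN Product
  ON Manufacturer.ManufacturerID = Product.ManufacturerID
GROUP BY Manufacturer, ManufacturerWebsite;
""")


def average_prices():
    """Query the view like a table and return {manufacturer: average price}."""
    return dict(conn.execute("SELECT Manufacturer, AvgPrice FROM vwAveragePrice"))


view_before = average_prices()

# Invented row: a third Toshiba product changes the underlying table data.
conn.execute("INSERT INTO Product VALUES ('Hypothetical Model', 617, 2)")
view_after = average_prices()

print(view_before["Toshiba"], view_after["Toshiba"])
```

The view definition is not touched between the two calls; only the Product table changes, and the view reflects the new average.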

SQL INDEXES

Indexes in databases are very similar to indexes in libraries. Indexes allow information within a database to be located fast, much like they do in libraries. If all books in a library are indexed alphabetically, then you don’t need to browse the whole library to find a particular book. Instead you’ll simply take the first letter of the book title and find this letter’s section in the library, starting your search from there, which will narrow down your search significantly.

An index can be created on a single column or on a combination of columns in a database table. A table index is a database structure that arranges the values of one or more columns of a table in a specific order. The index holds pointers to the values stored in the specified column or combination of columns of the table. These pointers are ordered according to the sort order specified in the index. Here is how to use the CREATE INDEX SQL statement to create an index called idxModel on the Model column of the Product table:

CREATE INDEX idxModel
ON Product (Model)

The syntax for creating indexes varies greatly among different RDBMS, which is why we will not discuss this matter further. There are some general rules which describe when to use indexes. When dealing with relatively small tables, indexes do not improve performance. In general, indexes improve performance when they are created on fields used in table joins. Use indexes when most of your database queries retrieve relatively small datasets, because if your queries retrieve most of the data most of the time, the indexes will actually slow down data retrieval. Use indexes for columns that have many different values (there are not many repeated values within the column). Although indexes improve search performance, they slow down updates, and this might be something worth considering.
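As one way to see the idxModel index in action, the sketch below assumes SQLite through Python's sqlite3 module. PRAGMA index_list and EXPLAIN QUERY PLAN are SQLite-specific tools; other systems expose index metadata and query plans differently, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Assumption: in-memory SQLite with the tutorial's Product rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (Model TEXT, Price INTEGER, ManufacturerID INTEGER)")
conn.executemany(
    "INSERT INTO Product VALUES (?, ?, ?)",
    [
        ("Inspiron B120", 499, 1),
        ("Inspiron B130", 599, 1),
        ("Inspiron E1705", 949, 1),
        ("Satellite A100", 549, 2),
        ("Satellite P100", 934, 2),
    ],
)

# The statement from the tutorial.
conn.execute("CREATE INDEX idxModel ON Product (Model)")

# SQLite-specific: list the indexes defined on the table.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Product')")]

# SQLite-specific: ask the planner how it would run a lookup by Model.
lookup = "SELECT Price FROM Product WHERE Model = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + lookup, ("Satellite A100",)).fetchall()
plan_text = " ".join(str(row[-1]) for row in plan)

price = conn.execute(lookup, ("Satellite A100",)).fetchone()[0]

print(index_names)
print(plan_text)
print(price)
```

On a table this small the index makes no practical difference to speed, in line with the rule about small tables above; the sketch only shows that the planner can use the index for lookups on Model.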

SQL TRAINING

Our SQL Tutorial is a great way to kick-start your SQL training and learn about SQL. Why would I need SQL training, you may ask? If you are an IT professional then you have to know SQL. If you don't believe me, just go to Monster.com or any other job site and check the requirements in any software or web development posting. Chances are that you will see SQL in there. Furthermore, you might see specific database platforms and servers required, for example MS SQL Server, Oracle and DB2. These specific SQL technologies might require further reading; for example, you can take a SQL Server training course or Oracle training.

There are many ways to train your SQL skills. There are hundreds of SQL books published, there are many SQL training videos, there are SQL training audio books and of course SQL e-books. SQL-Tutorial.com is a very useful form of SQL training simply because it’s online and you can access it from any computer with an Internet connection. Another good thing is that our site is free (compare that with paying for SQL training DVDs priced at several hundred dollars).

http://www.sqltraining.org/

SQL HOSTING

What is SQL Hosting?

The term SQL hosting refers to an online hosting service which offers a database backend for your website. Using a web hosting company that offers SQL database hosting can help you learn SQL and develop dynamic websites with a SQL database backend.

SQL hosting on Linux

Most Linux SQL hosting providers offer a MySQL database as a standard feature of their hosting plans. Sometimes you will see PostgreSQL offered with Linux plans as well. Oracle is another RDBMS that runs on Linux/UNIX, but SQL hosting plans with Oracle are rather rare.

SQL Hosting on Windows

If you prefer Windows as an operating system then you have more choices of SQL hosting back ends. MS SQL Server is the most popular database server for Windows, which is why many hosting companies offer SQL hosting featuring MS SQL Server (these hosting plans are frequently called SQL Server Hosting). SQL Server traditionally ran on Windows only (versions since SQL Server 2017 also run on Linux). You can find Windows SQL hosting with Oracle database support, but these offerings tend to be expensive, which is why not many companies offer Oracle SQL hosting. A good option for Windows SQL hosting is MySQL, and most Windows hosting businesses offer MySQL database support. Another popular SQL hosting choice for Windows is MS Access, but this database is recommended only for small websites with a limited number of daily visitors.

Getting SQL Hosting

The days of static websites are long gone, and behind every popular site nowadays sits a SQL database backend. If you want to develop a quality, easy-to-maintain website, you need a SQL hosting account with an enterprise-level database server like MS SQL Server, Oracle or MySQL. SQL hosting plans have become more affordable in recent years, and you can order SQL hosting for less than $10 per month.

http://www.sqlhosting.net/

SQL REPLICATION

What is SQL Replication?

SQL replication is a technology designed to allow storing identical data in multiple locations. First let’s examine why replication may be useful and how it solves common data distribution problems. There are several classical examples in which SQL replication solves business problems. One of the most popular is the case when a business has mobile employees who need to access data from their portable computers while they are away from the office. Another example is when the workforce of a business is distributed around the world and all employees need to access one and the same set of data, but network connectivity is of poor quality. In both of the above examples using SQL replication is the right thing to do. Replication is used in many other scenarios as well, for example as a backup solution, and for offloading database-intensive processing like reporting and data mining from the main live databases.

Types of SQL Replication

MS SQL Server supports 3 main types of SQL replication: Snapshot replication, Transactional replication and Merge replication.

Snapshot replication, as its name suggests, uses a fresh copy of the database every time you run it. This replication type is relatively simple; however, it doesn’t have many applications in real life and is mainly used to create an initial copy of a database, which will then be used by the more complex replication types.

The second SQL replication type is Transactional replication, which sits somewhere between Snapshot and Merge replication as far as complexity is concerned. Transactional replication uses Snapshot replication to get a starting copy of the data that needs to be replicated, but from this point on it applies to this copy only the data that has changed since the latest update.

Merge replication is used in situations which require updates to be made to any of the replication copies. In a scenario like that, the data from the different replication locations ultimately needs to be merged into a central location.

Ankit Gupta [email protected]

http://www.sql-tutorial.net/sql-replication.asp
