[301] database 1 · names age score parker 26 21 heidy 22 22 shirly 27 22 arla 21 22 bella 22 22...

90
[301] Database 1 Tyler Caraza-Harter

Upload: others

Post on 07-Nov-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

[301] Database 1Tyler Caraza-Harter

Page 2: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

To download...

download

Page 3: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

301 Progress

Languages learned• Python [Programming Language]• HTML [Markup Language]

Data storage• CSV files• JSON files

Page 4: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

301 Progress

Languages learned• Python [Programming Language]• HTML [Markup Language]• SQL [Query Language]

Data storage• CSV files• JSON files• SQL databases

structured query language

Page 5: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Learning Objectives Today

SQL Data• schemas: tables, columns, types• advantages over JSON/CSV

SQL Queries• select, where, limit, sort by• sqlite3 module• Pandas/DB integration

Page 6: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Outline

Tabular Data: CSVs vs. Databases

Common SQL Databases

Example: Madison bus-route data

SQL: Structured Query Language

Querying from Python

Page 7: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Characteristics • one table

CSV SQL Database

State Capital Population AreaWI Madison 5795000 65498… … … …

State CapitalWI Madison… …

State PopulationWI 5795000… …

State AreaWI 65498… …

Characteristics • collection of tables, each named

County Pop un_empDane 536416 0.02

… … …

capitals populations

counties areas

Page 8: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Characteristics • one table• columns sometimes named

CSV SQL Database

State Capital Population AreaWI Madison 5795000 65498… … … …

State CapitalWI Madison… …

State PopulationWI 5795000… …

State AreaWI 65498… …

Characteristics • collection of tables, each named• columns always named

County Pop un_empDane 536416 0.02

… … …

capitals populations

counties areas

Page 9: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Characteristics • one table• columns sometimes named• everything is a string

CSV SQL Database

State Capital Population Areastring string string stringstring string string stringstring string string stringstring string string stringstring string string stringstring string string stringstring string string string

State Capitaltext texttext texttext texttext text

State Populationtext integertext integertext integertext integer

State Areatext integertext integertext integertext integer

Characteristics • collection of tables, each named• columns always named• types per column (enforced)

County Pop un_emptext integer realtext integer realtext integer realtext integer real

capitals populations

counties areas

no text allowed

Page 10: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

Database

A B Ctext integer realtext integer realtext integer realtext integer real

same fields and sametypes in every column

CSV

A,B,C string,string,string string,string,string string,string,string string,string,string

JSON

[{"A":"val", "B":10, "C":3.14}, {"A":"val"}, {"A":"v2", "B": 9, "C":False},

everything is a string types, but…missing values

types may differ across columns

Page 11: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

Database

regular file

program 1 program 2

Page 12: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

Database

regular file

program 1 program 2

yikes!

this is OK

writes

writeswrites

writes

yikes!

Page 13: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

3. Queries

Databaseregular file

Python code to find actor who appeared

in most movies

which actor appeared in the most movies?

Christopher Lee

Page 14: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

3. Queries

Databaseregular file

Python code to find actor who appeared

in most moviesChristopher Lee

question formulated in SQL (structured query language)

Page 15: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

3. Queries

4. Performance

Let's play a game where we pretend to be a database!

Page 16: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Question 1:

How many people are 23 or younger?

Question 2:

How many people scored 23 or less?

names age scoreParker ? ?Heidy ? ?Shirly ? ?Arla ? ?Bella ? ?Bill ? ?Hollis ? ?Maurita ? ?Milda ? ?Pearline ? ?Teresa ? ?Ceola ? ?Milford ? ?Alisha ? ?Antonetta ? ?Ryan ? ?Karma ? ?Lashandra ? ?Breana ? ?Sara ? ?

Page 17: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Question 1:

How many people are 23 or younger?

Question 2:

How many people scored 23 or less?

names age scoreParker 26 21Heidy 22 22Shirly 27 22Arla 21 22Bella 22 22Bill 28 22Hollis 26 23Maurita 22 24Milda 22 25Pearline 29 25Teresa 25 25Ceola 30 26Milford 25 26Alisha 30 27Antonetta 28 28Ryan 25 28Karma 23 28Lashandra 24 29Breana 22 30Sara 28 30

Page 18: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Question 1:

How many people are 23 or younger?

Question 2:

How many people scored 23 or less?

names age scoreParker 26 21Heidy 22 22Shirly 27 22Arla 21 22Bella 22 22Bill 28 22Hollis 26 23Maurita 22 24Milda 22 25Pearline 29 25Teresa 25 25Ceola 30 26Milford 25 26Alisha 30 27Antonetta 28 28Ryan 25 28Karma 23 28Lashandra 24 29Breana 22 30Sara 28 30

Which question took longer to answer? Why?

Page 19: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

names age scoreParker 26 21Heidy 22 22Shirly 27 22Arla 21 22Bella 22 22Bill 28 22Hollis 26 23Maurita 22 24Milda 22 25Pearline 29 25Teresa 25 25Ceola 30 26Milford 25 26Alisha 30 27Antonetta 28 28Ryan 25 28Karma 23 28Lashandra 24 29Breana 22 30Sara 28 30

names age scoreArla 21 22Heidy 22 22Bella 22 22Maurita 22 24Milda 22 25Breana 22 30Karma 23 28Lashandra 24 29Teresa 25 25Milford 25 26Ryan 25 28Parker 26 21Hollis 26 23Shirly 27 22Sara 28 30Bill 28 22Antonetta 28 28Pearline 29 25Alisha 30 27Ceola 30 26

DBs can keep multiple copies of the same data• which organizations to use

are configured (indexing)• which copy to use is used is

automatically determined based on the question being asked

copy 1 copy 2

Page 20: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Why use a database?

1. More Structure

2. Sharing

3. Queries

4. Performance

Why not use a database?

It’s often overkill.For many situations, a simple JSON or CSV is easier to use.

Page 21: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Outline

Tabular Data: CSVs vs. Databases

Common SQL Databases

Example: Madison bus-route data

SQL: Structured Query Language

Querying from Python

Page 22: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Popular SQL Databases

There are minor differences in how you use these (e.g., what column types are available and how you query for data).

Most experience with one DB will translate to work with other DBs.

Page 23: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Popular SQL Databases

Why learn SQLite?• easy to install/use• sqlite3 module comes with Python• it’s public domain• several billion deployments

There are minor differences in how you use these (e.g., what column types are available and how you query for data).

Most experience with one DB will translate to work with other DBs.

in CS 301

Page 24: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Popular SQL Databases

Why learn SQLite?• easy to install/use• sqlite3 module comes with Python• it’s public domain• several billion deployments

in CS 301

https://www.sqlite.org/mostdeployed.html - Every Android device - Every iPhone and iOS device - Every Mac - Every Windows 10 machine - Every Firefox, Chrome, and Safari web browser - Every instance of Skype - Every instance of iTunes - Every Dropbox client

Page 25: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Outline

Tabular Data: CSVs vs. Databases

Common SQL Databases

Example: Madison bus-route data

SQL: Structured Query Language

Querying from Python

Page 26: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Madison Bus Data: http://data-cityofmadison.opendata.arcgis.com/datasets/metro-transit-ridership-by-route-weekday

"Metro Transit ridership by route weekday. March, 2015. Caution should be used with this data. Daily bus stop boardings were estimated using a 12-day sample of weekday farebox records and AVL logs, and the GTFS file, from March 2015 from Metro Transit."

Page 27: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQLite DatabaseFile: bus.db

routes Table

Page 28: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQLite Databaseboarding

TableFile: bus.db

routes Table

Page 29: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

ty-mac:lec-31$ sqlite3 bus.dbSQLite version 3.23.1 2018-04-10 17:39:29Enter ".help" for usage hints.sqlite> .tables.tablesboarding routes sqlite> select * from routes;select * from routes;0|63|8052|1|http://www.cityofmadison.com/Metro/schedules/Route01/|323791|64|8053|2|http://www.cityofmadison.com/Metro/schedules/Route02/|96906…sqlite> select route_url from routes;select route_url from routes;http://www.cityofmadison.com/Metro/schedules/Route01/http://www.cityofmadison.com/Metro/schedules/Route02/http://www.cityofmadison.com/Metro/schedules/Route03/...sqlite> select * from boarding;select * from boarding;0|1163|27|43.073655|-89.385427|1.031|1163|47|43.073655|-89.385427|0.112|1163|75|43.073655|-89.385427|0.34...sqlite> select * from boarding limit 5;select * from boarding limit 5;0|1163|27|43.073655|-89.385427|1.031|1163|47|43.073655|-89.385427|0.112|1163|75|43.073655|-89.385427|0.343|1164|6|43.106465|-89.340021|10.594|1167|3|43.077867|-89.369993|3.11sqlite> .schema.schemaCREATE TABLE IF NOT EXISTS "boarding" (...

sqlite3 tool [DEMO]

commands pasted for later review

https://github.com/tylerharter/caraza-harter-com/raw/

master/tyler/cs301/fall18/materials/code/lec-31/bus.db

Download bus.db:

Page 30: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> a SQLite command

sqlite> a SQL query

OR

ty-mac:lec-31$ sqlite3 bus.db

Page 31: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> a SQLite command

sqlite> a SQL query

OR

ty-mac:lec-31$ sqlite3 bus.db

Page 32: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> .help

.archive ... Manage SQL archives: ".archive --help" for details

.auth ON|OFF Show authorizer callbacks

.backup ?DB? FILE Backup DB (default "main") to FILE

.bail on|off Stop after hitting an error. Default OFF

.binary on|off Turn binary output on or off. Default OFF

.cd DIRECTORY Change the working directory to DIRECTORY…

get list of SQLite commands (all begin with a period)

ty-mac:lec-31$ sqlite3 bus.db

Page 33: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> .tables

boarding routes

print list of tables in the database

ty-mac:lec-31$ sqlite3 bus.db

Page 34: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> .schema

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

prints SQL code to create those tables

ty-mac:lec-31$ sqlite3 bus.db

Page 35: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

Page 36: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

table names

Page 37: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

look for column names in parens

columns • index• StopID• Route• Lat• Lon• Daily Boardings

Page 38: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

types...

Page 39: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

CREATE TABLE IF NOT EXISTS "boarding" ("index" INTEGER, "StopID" INTEGER, "Route" INTEGER, "Lat" REAL, "Lon" REAL, "DailyBoardings" REAL);CREATE INDEX "ix_boarding_index"ON "boarding" ("index");CREATE TABLE IF NOT EXISTS "routes" ("index" INTEGER, "OBJECTID" INTEGER, "trips_routes_route_id" INTEGER, "route_short_name" INTEGER, "route_url" TEXT, "ShapeSTLength" REAL);CREATE INDEX "ix_routes_index"ON "routes" ("index");

Note: we’re glossing over lots of details because we'll use pandas to create these

Page 40: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

columns in table boarding:• index• StopID• Route• Lat• Lon• DailyBoardings

columns in table routes:• index• OBJECTID• trips_routes_route_id,• route_short_name• route_url• ShapeSTLength

Page 41: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

columns in table boarding:• index• StopID• Route• Lat• Lon• DailyBoardings

columns in table routes:• index• OBJECTID• trips_routes_route_id,• route_short_name• route_url• ShapeSTLength

Once we identify the table and column names, how can ask questions?

Page 42: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> a SQLite command (starts with ".")

sqlite> a SQL query;

OR

ty-mac:lec-31$ sqlite3 bus.db

Page 43: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3 prompt

sqlite> a SQLite command (starts with ".")

sqlite> a SQL query;

OR

ty-mac:lec-31$ sqlite3 bus.db

Page 44: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Outline

Tabular Data: CSVs vs. Databases

Common SQL Databases

Example: Madison bus-route data

SQL: Structured Query Language

Querying from Python

Page 45: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

table 1 table 2 table 3

Page 46: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

table 1 table 2 table 3

FROM: which table?

col1 col2 col3

Page 47: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

table 1 table 2 table 3

FROM: which table?SELECT: which columns?

col1 col2 col3

Page 48: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

table 1 table 2 table 3

FROM: which table?SELECT: which columns?WHERE: which rows?

col1 col2 col3

Page 49: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

table 1 table 2 table 3

FROM: which table?SELECT: which columns?WHERE: which rows?LIMIT: how many rows?

col1 col2 col3

Page 50: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Overview: Narrowing Down

A B

C D

table 1 table 2 table 3

FROM: which table?SELECT: which columns?WHERE: which rows?LIMIT: how many rows?

A B

C D

col1 col2 col3

col1 col3

a query result looks like a table

Page 51: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

SELECT

Syntax for SELECT (case and spacing don’t matter):

FROM ;

Page 52: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from ;

Page 53: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from

optional stuff ;

Page 54: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from table name ;

Page 55: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from boarding;

Page 56: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from boarding;

which columns

Page 57: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *

Syntax for SELECT (case and spacing don’t matter):

from boarding;

star means all of them

Result:

Page 58: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select Route, DailyBoardings

Syntax for SELECT (case and spacing don’t matter):

from boarding;

Result:

Page 59: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *

Syntax for SELECT (case and spacing don’t matter):

from routes;

Result:

Page 60: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select route_url

Syntax for SELECT (case and spacing don’t matter):

from routes;

Result:

Page 61: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from

optional stuff ;

Page 62: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select

Syntax for SELECT (case and spacing don’t matter):

from

optional stuff ;

where order by limit

Page 63: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

Syntax for SELECT (case and spacing don’t matter):

Result:

select *from boarding;

Page 64: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80;

Syntax for SELECT (case and spacing don’t matter):

Result:

Page 65: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80;

Syntax for SELECT (case and spacing don’t matter):

Result:

note SQL only has oneequal sign for equality!

Page 66: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID;

Syntax for SELECT (case and spacing don’t matter):

Result:

Page 67: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID DESC;

Syntax for SELECT (case and spacing don’t matter):

Result:

descending meansbiggest first

Page 68: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID ASC;

Syntax for SELECT (case and spacing don’t matter):

Result:

ascending meansbiggest first

Page 69: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID ASClimit 3;

Syntax for SELECT (case and spacing don’t matter):

Result:

only show the top N results

3 results

Page 70: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID ASClimit 3;

Syntax for SELECT (case and spacing don’t matter):

Result:

Page 71: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

SQL Queries: How to ask a DB questions

select *from boardingwhere Route = 80order by StopID ASClimit 3;

Syntax for SELECT (case and spacing don’t matter):

Result:

You can use any combination of where, order by, and limit.But whichever you use, they must appear in that order!

Page 72: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Outline

Tabular Data: CSVs vs. Databases

Common SQL Databases

Example: Madison bus-route data

SQL: Structured Query Language

Querying from Python

Page 73: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Modules we’ve learned this semester

• math• collections• json• csv• sys• os• copy• recordclass• requests• bs4 (BeautifulSoup)• pandas• sqlite3 directly access SQLite databases (comes with Python)

integrates with SQLite

Page 74: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

database filename• represented as a string• will create if doesn’t already exist

Page 75: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

connect for databases is analogous to open for files

Page 76: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

a connection object for databases is analogous to file object for files

Page 77: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

conn.close()

close it at the end

Page 78: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute(query_string)

conn.close()

instead of reading (as to a file), we execute SQL queries on the database

Page 79: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

conn.close()

the SQL query is just a stringthat is passed to the execute

Page 80: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

conn.close()

results is an iterator over tupleswe get by running the query

Page 81: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

for row in results: print(row)

conn.close()

(0, 1163, 27, 43.073655, -89.385427, 1.03)(1, 1163, 47, 43.073655, -89.385427, 0.11)(2, 1163, 75, 43.073655, -89.385427, 0.34)(3, 1164, 6, 43.106465, -89.340021, 10.59)(4, 1167, 3, 43.077867, -89.36999300000001, 3.11)(5, 1167, 4, 43.077867, -89.36999300000001, 2.23)...

Output:

Page 82: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

for row in results: print(row)

conn.close()

(0, 1163, 27, 43.073655, -89.385427, 1.03)(1, 1163, 47, 43.073655, -89.385427, 0.11)(2, 1163, 75, 43.073655, -89.385427, 0.34)(3, 1164, 6, 43.106465, -89.340021, 10.59)(4, 1167, 3, 43.077867, -89.36999300000001, 3.11)(5, 1167, 4, 43.077867, -89.36999300000001, 2.23)...

Output:

select * from sqlite_master

(sqlite_master describes other tables)

Page 83: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

for row in results: print(row)

conn.close()

Instead of looping over tuples, Pandas can execute the query

and give a DataFrame of results

Page 84: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3import pandas as pd conn = sqlite3.connect("file.db")

results = conn.execute("select * from boarding")

for row in results: print(row)

conn.close()

Instead of looping over tuples, Pandas can execute the query

and give a DataFrame of results

Page 85: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

sqlite3import sqlite3import pandas as pd conn = sqlite3.connect("file.db")

df = pd.read_sql("select * from boarding", conn)

conn.close()

df:

run this query on this database

Page 86: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Demo 1: How Many People Ride the Bus

Goal: add up all boardings across all bus stops/routes

Input:• bus.db• use DailyBoardings column in boarding table• Challenge: use conn.execute call (instead of pandas)

Output:• total riders

Page 87: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Demo 2: West-most Bus Route

Goal: which Madison bus goes farthest west?

Input:• bus.db

Output:• route number of bus

that goes farthest west

smallerlongitude

biggerlongitude

bigger latitude

smaller latitude

N

S

W E

Page 88: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Demo 3: Heart of Madison

Goal: what is the central-most location of all bus pickups?

Input:• bus.db

Output:• a latitude and longitude

Page 89: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Demo 4: Fifa

Goal: load Fifa.csv to a SQLite DB, then query it

Queries:• who are the youngest players?• who are the oldest players?• who are the five oldest players?• how many players are from Brazil?• who are the oldest players from Brazil?• who are the 5 oldest players from Brazil?• what percent of leagues have players from Brazil? DISTINCT

Page 90: [301] Database 1 · names age score Parker 26 21 Heidy 22 22 Shirly 27 22 Arla 21 22 Bella 22 22 Bill 28 22 Hollis 26 23 Maurita 22 24 Milda 22 25 Pearline 29 25 Teresa 25 25

Demo 5: Vocabulary Quiz

Goal: quiz user on words looked up while reading a Kindle

Input (vocab.db):• table of kindle words lookups• table of definitions

Output:• random word• real definition• fake definitions