building a sql database that works

Post on 18-Dec-2014


Building a SQL Database that works

Josh Berkus
PostgreSQL Experts, Inc.
OpenSourceBridge 2009

How Not To Do It: four popular methods

1. One Big Spreadsheet

2. EAV & E-Blob

ID   Property  Setting
407  Eyes      Brown
407  Height    73in
407  Married?  TRUE
408  Married?  FALSE
408  Smoker    FALSE
408  Age       37
409  Height    66in

ID   Properties
407  <eyes="brown"><height="73"><married="1"><smoker="1">
408  <hair="brown"><age="49"><married="0"><smoker="0">
409  <age="37"><height="66"><hat="old"><teeth="gold">

3. Incremental Development

4. Leave It to the ORM

E.F. Codd, Database Engineer, IBM, 1970

IBM Databases Run Amok

1. losing data
2. duplicate data
3. wrong data
4. crappy performance
5. downtime for database redesign whenever anyone made an application change

The Relational Model

All the Relational Model You Need to Know in less than 10 minutes

Set (Bag) Theory

Relations
[Diagram: a relation (table, view, rowset)]

Tuples
[Diagram: the relation contains tuples (rows)]

Attributes
[Diagram: each tuple is made up of attributes]

Domains (types)
[Diagram: each attribute has a domain such as INT, DATE or TEXT]

Keys
[Diagram: some attributes are marked as keys]

Constraints
[Diagram: a constraint on an attribute, e.g. Attribute > 5]

Foreign Key Constraint

Derived Relation (query)
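The vocabulary above maps onto SQL roughly as follows. This is a minimal sketch and not part of the original slides: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for PostgreSQL, and the `score` column, the helper `accepted`, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE users (                      -- a relation (table)
    login     TEXT PRIMARY KEY,           -- attribute with domain TEXT, used as key
    last_name TEXT NOT NULL
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,          -- key
    login   TEXT NOT NULL
            REFERENCES users (login),     -- foreign key constraint
    posted  DATE NOT NULL,                -- attribute with domain DATE
    score   INTEGER CHECK (score > 5)     -- constraint, like "Attribute > 5"
);
-- Each inserted row is a tuple.
INSERT INTO users VALUES ('jberkus', 'Berkus');
INSERT INTO posts VALUES (1, 'jberkus', '2009-05-02', 7);
""")


def accepted(sql):
    """Return True if the statement satisfies every constraint."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Violates the CHECK constraint (score is not > 5).
low_score_ok = accepted("INSERT INTO posts VALUES (2, 'jberkus', '2009-05-03', 3)")
# Violates the foreign key (no such user).
unknown_user_ok = accepted("INSERT INTO posts VALUES (3, 'nobody', '2009-05-03', 9)")

# A query produces a derived relation from the stored ones.
derived = conn.execute(
    "SELECT u.last_name, p.posted FROM posts p JOIN users u ON u.login = p.login"
).fetchall()
```

Both bad inserts are refused by the database itself, so no application code has to remember the rules.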

Problem 1: Losing Data

The Atomic Age

Simple User Table
[Diagram: Users, Admins]
● name (text)
● email (text)
● login (text)
● password (text)
● status (char) (user, inactive, admin)

Non-Atomic Attributes
[Diagram: Users, Admins]
● name (text)
● email (text)
● login (text)
● password (text)
● status (char) (user, inactive, admin)

What's Atomic?

The simplest form of a datum, which is not divisible without loss of information.

name: Josh Berkus
status: a

SELECT SUBSTR(name, STRPOS(name, ' ')) ...
… WHERE status = 'a' or status = 'u' ...

status: i

… WHERE ??? ...

What's Atomic?

The simplest form of a datum, which is not divisible without loss of information.

first_name: Josh
last_name: Berkus
active: TRUE
access: a

Table Atomized!
[Diagram: Users, Admins]
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)
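With the attributes split out, questions that were awkward against a single status column become plain WHERE clauses. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module instead of PostgreSQL; the e-mail addresses, passwords and the third user are placeholders invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    first_name TEXT, last_name TEXT, email TEXT, login TEXT,
    password TEXT, active BOOLEAN, access CHAR(1)
)""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Placeholder data, not taken from the slides.
        ("Josh", "Berkus", "josh@example.com", "jberkus", "placeholder", True, "a"),
        ("Selena", "Deckelman", "selena@example.com", "selena", "placeholder", True, "u"),
        ("Former", "Admin", "former@example.com", "formeradmin", "placeholder", False, "a"),
    ],
)

# "Which admins are active?" needs no knowledge of status-code combinations.
active_admins = [
    row[0] for row in conn.execute("SELECT login FROM users WHERE active AND access = 'a'")
]

# "Which admins are inactive?" could not be asked of a status column that
# holds only one of user / inactive / admin.
inactive_admins = [
    row[0] for row in conn.execute("SELECT login FROM users WHERE NOT active AND access = 'a'")
]

# Sorting by last name needs no string parsing.
by_last_name = [row[0] for row in conn.execute("SELECT last_name FROM users ORDER BY last_name")]
```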

Atomic, Shmomic. Who Cares?

● Atomic Values:
  – retain data
  – make joins easier
  – make constraints easier
● Non-atomic Values:
  – make data loss more likely
  – increase CPU usage
  – make you more likely to forget something

Transactions

Splitting a Bulletin Board Thread: the hard way

INSERT INTO threads VALUES ( .... );
if $dbh('success') then
  for $these_posts.date > $cutdate loop
    UPDATE posts SET thread = $newthread
    WHERE id = $these_posts.id;
    if not $dbh('success') then
      for $these_posts.id > $last_id loop
        UPDATE posts SET thread = $oldthread
        WHERE id = $these_posts.id;
      DELETE FROM threads
      WHERE id = $newthread;

Splitting a Bulletin Board Thread: the transactional way

BEGIN;
INSERT INTO threads VALUES ( .... );
$newthread = currval( .... );
UPDATE posts SET thread = $newthread
  WHERE thread = $oldthread
  AND date > $cutdate;
END;
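The transactional version can be made runnable along these lines. This is a sketch under assumptions: SQLite via Python's sqlite3 module stands in for PostgreSQL, `lastrowid` plays the role of the sequence lookup, and the function name `split_thread`, the "no posts moved" check and the sample data are hypothetical.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN, so the
# transaction boundaries below are exactly the ones we write.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE posts (
    id     INTEGER PRIMARY KEY,
    thread INTEGER NOT NULL REFERENCES threads (id),
    posted TEXT NOT NULL
);
INSERT INTO threads VALUES (1, 'Dinner?');
INSERT INTO posts VALUES (1, 1, '09:28'), (2, 1, '09:37'), (3, 1, '09:44');
""")


def split_thread(conn, old_thread, new_title, cutdate):
    """Move posts newer than cutdate into a new thread, all or nothing."""
    conn.execute("BEGIN")
    try:
        new_thread = conn.execute(
            "INSERT INTO threads (title) VALUES (?)", (new_title,)
        ).lastrowid
        moved = conn.execute(
            "UPDATE posts SET thread = ? WHERE thread = ? AND posted > ?",
            (new_thread, old_thread, cutdate),
        ).rowcount
        if moved == 0:
            raise ValueError("no posts after the cut date")
        conn.execute("COMMIT")
        return new_thread
    except Exception:
        conn.execute("ROLLBACK")  # also undoes the INSERT into threads
        raise


new_id = split_thread(conn, 1, "Dessert?", "09:30")

try:
    split_thread(conn, 1, "Empty", "23:59")  # nothing to move: rolled back
    failed = False
except ValueError:
    failed = True

thread_count = conn.execute("SELECT count(*) FROM threads").fetchone()[0]
moved_posts = [
    row[0]
    for row in conn.execute("SELECT id FROM posts WHERE thread = ? ORDER BY id", (new_id,))
]
```

The failed call leaves no empty thread behind, which is the cleanup the hard way has to do by hand.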

Problem 2: Duplicate Data

name
  Josh Berkus
  Joshua Berkus
  Josh Berkus
  Berkus, Josh

user_name
  Josh Berkus
  Josh Berkus

admin
  Josh Berkus
  Berkus

user_name
  Josh Berkus
  Joshua Berkus
  Josh

Where are my Keys?

Candidate (Natural) Keys
[Diagram: Users, Admins]
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)

A Good Key

● Should have to be unique because the application requires it to be.

● Expresses a unique predicate which describes the tuple (row):

– user with login “jberkus”

– post from “jberkus” on “2009-05-02 13:41:22” in thread “Making your own wine”

● If you can't find a good key, your table design is missing data.

Surrogate Key
[Diagram: Users, Admins]
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)
● user_id (serial)
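A surrogate key does not replace the natural key; the natural key still has to be declared unique or duplicates creep back in. A minimal sketch, assuming SQLite via Python's sqlite3 module (where `INTEGER PRIMARY KEY` is the closest analogue of PostgreSQL's serial); the helper `add_user` and the trimmed column list are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,   -- surrogate key, generated by the database
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    login      TEXT NOT NULL UNIQUE   -- natural key: "user with login X"
)""")


def add_user(first_name, last_name, login):
    """Insert a user; return the generated user_id, or None for a duplicate login."""
    try:
        cur = conn.execute(
            "INSERT INTO users (first_name, last_name, login) VALUES (?, ?, ?)",
            (first_name, last_name, login),
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


first = add_user("Josh", "Berkus", "jberkus")
# Same person under a slightly different name: the UNIQUE login stops it.
duplicate = add_user("Joshua", "Berkus", "jberkus")
user_count = conn.execute("SELECT count(*) FROM users").fetchone()[0]
```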

We All Just Want to Be Normal

Abby Normal

login    level  last_name
jberkus  u      Berkus
selena   a      Deckelman

login    title    posted  level
jberkus  Dinner?  09:28   u
selena   Dinner?  09:37   u
jberkus  Dinner?  09:44   a

How can I be “Normal”?
1. Each piece of data only appears in one relation
   – except as a “foreign key” attribute
2. No “repeated” attributes

login    level  last_name
jberkus  u      Berkus
selena   a      Deckelman

login    title    posted
jberkus  Dinner?  09:28
selena   Dinner?  09:37
jberkus  Dinner?  09:44
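Once level lives only in the users relation, a post's level is looked up with a join, and one UPDATE changes it everywhere. A minimal sketch of the two normalized tables above, assuming SQLite via Python's sqlite3 module; the composite primary key on posts and the helper `posts_with_level` are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    login     TEXT PRIMARY KEY,
    level     TEXT NOT NULL,
    last_name TEXT NOT NULL
);
CREATE TABLE posts (
    login  TEXT NOT NULL REFERENCES users (login),  -- the only shared attribute
    title  TEXT NOT NULL,
    posted TEXT NOT NULL,
    PRIMARY KEY (login, title, posted)
);
INSERT INTO users VALUES ('jberkus', 'u', 'Berkus'), ('selena', 'a', 'Deckelman');
INSERT INTO posts VALUES
    ('jberkus', 'Dinner?', '09:28'),
    ('selena',  'Dinner?', '09:37'),
    ('jberkus', 'Dinner?', '09:44');
""")


def posts_with_level(conn):
    """Derive each post's level from users instead of storing a copy."""
    return conn.execute("""
        SELECT p.login, p.posted, u.level
        FROM posts p JOIN users u ON u.login = p.login
        ORDER BY p.posted
    """).fetchall()


before = posts_with_level(conn)
conn.execute("UPDATE users SET level = 'a' WHERE login = 'jberkus'")
after = posts_with_level(conn)
```

There is no way for two posts by the same user to disagree about the level, which is what went wrong in the "Abby Normal" tables.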

Problem 3: Wrong Data

Constraints

Users Run Amok

● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)

Ensure Your Data is Consistent, no matter where it came from

first_name  last_name  email               login    password    active  level
Josh        Berkus     josh@pgexperts.com  jberkus  jehosaphat  TRUE    a
NULL        Kelley     kelley@ucb          k        NULL        TRUE    u
Mark        Twain      www.pm.org          samuel   halleys     NULL    I
S           F          gavin@sf.gov        gavin    twitter     FALSE   x

Users Under Constraint

● first_name (text) length() > 1
● last_name (text) length() > 1
● email (text) ILIKE '%@%.%'
● login (text) length() > 5
● password (text) length() > 5
● active (boolean) NOT NULL
● access (char) IN ( 'a','u' )

note: email and other validators would, of course, be more complex
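The constraint list translates almost directly into CHECK clauses. The sketch below is an illustration under assumptions, not the author's schema: it runs on SQLite via Python's sqlite3 module, so `LIKE` (case-insensitive for ASCII in SQLite) replaces PostgreSQL's `ILIKE`, and the sample rows follow one reading of the flattened table in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE users (
    first_name TEXT    CHECK (length(first_name) > 1),
    last_name  TEXT    CHECK (length(last_name) > 1),
    email      TEXT    CHECK (email LIKE '%@%.%'),
    login      TEXT    CHECK (length(login) > 5),
    password   TEXT    CHECK (length(password) > 5),
    active     BOOLEAN NOT NULL,
    access     CHAR(1) CHECK (access IN ('a', 'u'))
)""")
# Note: a CHECK is satisfied when its expression is NULL, so any column
# that must be present also needs NOT NULL (only active has it here).

rows = [
    ("Josh", "Berkus", "josh@pgexperts.com", "jberkus", "jehosaphat", True, "a"),
    (None, "Kelley", "kelley@ucb", "k", None, True, "u"),
    ("Mark", "Twain", "www.pm.org", "samuel", "halleys", None, "I"),
    ("S", "F", "gavin@sf.gov", "gavin", "twitter", False, "x"),
]


def accepted(row):
    """Return True if the row passes every constraint."""
    try:
        conn.execute("INSERT INTO users VALUES (?, ?, ?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = [accepted(row) for row in rows]
stored = conn.execute("SELECT count(*) FROM users").fetchone()[0]
```

Only the first row gets in; the other three are rejected regardless of which application or script tried to insert them.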

Foreign Keys

[Diagram: Users, Admins and Posts, linked through the attribute login]

Posts Table

● title (text) NOT NULL
  REFERENCES threads ( title ) ON DELETE CASCADE ON UPDATE CASCADE
● posted (timestamp) NOT NULL
● user (text) NOT NULL
  REFERENCES users ( login ) ON DELETE CASCADE ON UPDATE CASCADE
● content (text) NOT NULL

Beautiful Cascades

users.login
  jberkus
  selena
  jerkyboy

posts.content
  I'm going crazy!
  What's up?
  Why?
  OSB! It's too much!
  I told you so ...
  www.pornking.com
  www.whitehouse.com
  www.whiteslavery.com
  www.lolcats.com
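What the cascade buys you can be shown in a few lines. A minimal sketch, assuming SQLite via Python's sqlite3 module (which needs `PRAGMA foreign_keys = ON` before it enforces anything); the assignment of posts to users, the placeholder spam content and the renamed login `sdeckelman` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE users (login TEXT PRIMARY KEY);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    "user"  TEXT NOT NULL REFERENCES users (login)
            ON DELETE CASCADE ON UPDATE CASCADE,  -- quoted: reserved word in PostgreSQL
    content TEXT NOT NULL
);
INSERT INTO users VALUES ('jberkus'), ('selena'), ('jerkyboy');
INSERT INTO posts ("user", content) VALUES
    ('jberkus',  'I''m going crazy!'),
    ('selena',   'What''s up?'),
    ('jerkyboy', 'spam link 1'),
    ('jerkyboy', 'spam link 2');
""")

# Deleting the spammer removes every one of his posts (ON DELETE CASCADE).
conn.execute("DELETE FROM users WHERE login = 'jerkyboy'")

# Renaming a login carries through to her posts (ON UPDATE CASCADE).
conn.execute("UPDATE users SET login = 'sdeckelman' WHERE login = 'selena'")

remaining = conn.execute('SELECT "user", content FROM posts ORDER BY id').fetchall()
```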

Problem 4: Crappy Performance

Things We've Already Done

● Atomicization
  – less CPU on parsing, calculations
● Normalization
  – less data duplication
  – smaller tables
● Transactions
  – more batches, less iteration
  – less locking

Denormalized Derived Relations: materialized views for the win
[Diagram: Users, Admins, Posts and a derived relation user_postcount]
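One common way to keep a derived relation such as user_postcount materialized is a summary table maintained by triggers. The sketch below assumes that approach (the slides do not say how the view is built) and uses SQLite via Python's sqlite3 module, which has no materialized views of its own; the trigger names and sample posts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (login TEXT PRIMARY KEY);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    login   TEXT NOT NULL REFERENCES users (login),
    content TEXT NOT NULL
);
-- The denormalized, precomputed relation.
CREATE TABLE user_postcount (
    login     TEXT PRIMARY KEY REFERENCES users (login),
    postcount INTEGER NOT NULL DEFAULT 0
);

CREATE TRIGGER users_after_insert AFTER INSERT ON users BEGIN
    INSERT INTO user_postcount (login, postcount) VALUES (NEW.login, 0);
END;
CREATE TRIGGER posts_after_insert AFTER INSERT ON posts BEGIN
    UPDATE user_postcount SET postcount = postcount + 1 WHERE login = NEW.login;
END;
CREATE TRIGGER posts_after_delete AFTER DELETE ON posts BEGIN
    UPDATE user_postcount SET postcount = postcount - 1 WHERE login = OLD.login;
END;
""")

conn.executemany("INSERT INTO users VALUES (?)", [("jberkus",), ("selena",)])
conn.executemany(
    "INSERT INTO posts (login, content) VALUES (?, ?)",
    [("jberkus", "first"), ("selena", "second"), ("jberkus", "third")],
)
conn.execute("DELETE FROM posts WHERE content = 'third'")

# Cheap read from the summary table ...
cached = conn.execute(
    "SELECT login, postcount FROM user_postcount ORDER BY login"
).fetchall()

# ... which must always agree with the expensive aggregate it replaces.
computed = conn.execute("""
    SELECT u.login, count(p.id)
    FROM users u LEFT JOIN posts p ON p.login = u.login
    GROUP BY u.login ORDER BY u.login
""").fetchall()
```

The base tables stay normalized; only the derived relation is denormalized, and the database keeps it in sync.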

Problem 5: Database Changes Cause Application Downtime (and vice-versa)

Stuff We've Already Done to make our data “agile”
● Atomicization
  – data isn't in specific interface version formats
● Normalization
  – where to extend data is more obvious
  – create a new table if you have to
● Transactions
  – prevent partial failures from changed schema

Views

Extending the Users Table
● first_name (text)
● last_name (text)
● email (text)
● login (text)
● password (text)
● active (boolean)
● access (char)
● created (timestamp)
● last_login (timestamp)

CREATE VIEW oldapp.users AS
SELECT first_name || ' ' || last_name AS name,
       email, login, password, active, access
FROM users;
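A runnable version of the compatibility view might look like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module has no schemas in the PostgreSQL sense, so the view is called `oldapp_users` here, the concatenated column is aliased `name` to match the old table, and the sample row is placeholder data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    first_name TEXT, last_name TEXT, email TEXT, login TEXT, password TEXT,
    active BOOLEAN, access CHAR(1),
    created TIMESTAMP, last_login TIMESTAMP      -- the new columns
);
-- The old application keeps seeing the shape it was written against.
CREATE VIEW oldapp_users AS
SELECT first_name || ' ' || last_name AS name,
       email, login, password, active, access
FROM users;
INSERT INTO users VALUES
    ('Josh', 'Berkus', 'josh@example.com', 'jberkus', 'placeholder',
     1, 'a', '2009-06-17 09:00:00', '2009-06-18 13:41:22');
""")

cur = conn.execute("SELECT * FROM oldapp_users")
columns = [description[0] for description in cur.description]
row = cur.fetchone()
```

The new columns are invisible through the view, so the old application needs no change when the table grows.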

The Rest: you already know
● Write Migrations
  – deploy these in transactions, if supported
  – if not, write rollback scripts
● Write Tests
  – use a realistic staging environment
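Deploying a migration inside a transaction means a failure halfway through leaves the schema exactly as it was. A minimal sketch, assuming SQLite via Python's sqlite3 module (SQLite, like PostgreSQL, can roll back DDL); the helpers `migrate` and `columns` and the deliberately broken second migration are hypothetical.

```python
import sqlite3

# isolation_level=None so that BEGIN / COMMIT / ROLLBACK are issued only by us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (login TEXT PRIMARY KEY)")


def migrate(conn, statements):
    """Apply all statements or none; return True on success."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return False


def columns(conn, table):
    """List the column names of a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


good = migrate(conn, [
    "ALTER TABLE users ADD COLUMN created TIMESTAMP",
    "ALTER TABLE users ADD COLUMN last_login TIMESTAMP",
])

# The second statement fails, so the first one must not stick either.
bad = migrate(conn, [
    "ALTER TABLE users ADD COLUMN nickname TEXT",
    "ALTER TABLE no_such_table ADD COLUMN x TEXT",
])

final_columns = columns(conn, "users")
```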

Some Other Tips

● Pick a naming scheme
  – and stick to it
● Don't do your joins in the application
  – the database does them better
● Repurposing fields will bite you
  – sure you'll remember when that changed
● Don't micro-optimize
  – INT3, anyone?
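To make the tip about joins concrete, here is the same result computed both ways. This is an illustrative sketch, assuming SQLite via Python's sqlite3 module; the function names and sample data are hypothetical. The application-side version issues one extra query per post, while the database-side version is a single statement the planner can optimize.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (login TEXT PRIMARY KEY, last_name TEXT NOT NULL);
CREATE TABLE posts (
    id      INTEGER PRIMARY KEY,
    login   TEXT NOT NULL REFERENCES users (login),
    content TEXT NOT NULL
);
INSERT INTO users VALUES ('jberkus', 'Berkus'), ('selena', 'Deckelman');
INSERT INTO posts (login, content) VALUES
    ('jberkus', 'first'), ('selena', 'second'), ('jberkus', 'third');
""")


def join_in_application(conn):
    """The discouraged way: loop over posts and look up each user separately."""
    result = []
    posts = conn.execute("SELECT login, content FROM posts ORDER BY id").fetchall()
    for login, content in posts:
        (last_name,) = conn.execute(
            "SELECT last_name FROM users WHERE login = ?", (login,)
        ).fetchone()
        result.append((last_name, content))
    return result


def join_in_database(conn):
    """The recommended way: one query, joined by the database."""
    return conn.execute("""
        SELECT u.last_name, p.content
        FROM posts p JOIN users u ON u.login = p.login
        ORDER BY p.id
    """).fetchall()


app_rows = join_in_application(conn)
db_rows = join_in_database(conn)
```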

More Information
● me
  – josh@pgexperts.com
  – www.pgexperts.com
  – it.toolbox.com/blogs/database-soup
● postgresql: www.postgresql.org
● tutorial at OSCON
  – monday, 8:30 am!
  – see you in San Jose!

This presentation copyright 2009 Josh Berkus, licensed for distribution under the Creative Commons Attribution License, except for photos, most of which were stolen from other people's websites via images.google.com. Thanks, Google!
