
Page 1: Database Design, a Practical Guide

Database Design, a Practical Guide

Gus Björklund ([email protected])
Wizard, Progress Software Corporation


© 2007 Progress Software Corporation. Database Design: A Practical Guide

Ask questions as we go if I am not being clear.

Warning: there is a mistake in these slides.


To every rule, there is an exception!

Rules are made to be broken


Topics

Theory:
• What is Database Design
• Basic Elements
• Representing the data model as Tables

Practice:
• An Example

Some Other Topics


First, a little theory


What do we mean by database design?

A process for defining a model of a subset of the “real” [1] world, then representing it as data in tables in a relational database

At least, that’s the definition we will use for the purposes of this talk.

[1] Well, for small values of “real”, anyway.


Basic Elements

What do we put in our model?

Just 3 things:
• Entities
• Attributes
• Relationships

The “entity-relationship model” was described by Peter Chen in 1976.

See http://bit.csc.lsu.edu/~chen/chen.html


Basic Elements 1: Entities

Can be thought of as nouns:
• People
  – author, composer, performer, seller, buyer
• Places
  – home, IP address, URL, destination, factory, store
• Things
  – song, recording, instrument, car, invoice

Is “telephone number” a place or a thing?


Basic Elements 2: Attributes

Can be thought of as adjectives (but only loosely):
• Length
• Color
• Horsepower
• Part number
• Song Title
• Publication Date
• Size
• Fabric
• Owner

Is “telephone number” an attribute or an entity?

Entities have attributes


Basic Elements 3: Relationships

Can be thought of as verbs:
• has a
• owns
• contains
• supervises
• performs
• called
• sold
• purchased
• proved

Entities are connected by relationships

Is “telephone number” a relationship?


In May 1995, Andrew Wiles published a proof of Fermat’s Last Theorem

Relationships have attributes too


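In May 1995, Andrew Wiles published a proof of Fermat’s Last Theorem

• Andrew Wiles: entity
• a proof of Fermat’s Last Theorem: entity
• published: relationship
• In May 1995: attribute (of the relationship)

Relationships have attributes too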


What goes in an entity?

Identifying attributes
• Must be able to uniquely identify the entity
• Can have more than one way to id
• Id can be composite

Descriptive attributes
• the values you need to keep track of
• generally should be simple, not complex
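
A minimal sketch of identifying versus descriptive attributes, using Python's standard-library sqlite3 module. The table and column names (`track`, `disc_upc`, `track_num`, `duration`) are assumptions made for illustration and are not part of the talk. The identifier here is composite: a disc UPC plus a track number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: a track is identified by the composite (disc_upc, track_num);
# duration is a simple descriptive attribute.
conn.execute(
    """
    CREATE TABLE track (
        disc_upc  TEXT    NOT NULL,
        track_num INTEGER NOT NULL,
        duration  TEXT,
        PRIMARY KEY (disc_upc, track_num)
    )
    """
)
conn.execute("INSERT INTO track VALUES ('8697-07416-2', 1, '4:30')")
conn.execute("INSERT INTO track VALUES ('8697-07416-2', 2, '6:27')")

# A second row with the same identifying values is rejected.
try:
    conn.execute("INSERT INTO track VALUES ('8697-07416-2', 1, '5:00')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

track_count = conn.execute("SELECT COUNT(*) FROM track").fetchone()[0]
```

Neither column identifies a track on its own, but the pair does, which is what "Id can be composite" means in practice.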


What to include in your model

The things your application has to keep track of
• Telephones, wires, switches

The actions your application or its users perform
• Make calls, send telephone bills, collect payments

Some attributes of the things and actions
• Originating number, date and time of call, duration, called number

Keep it simple
Be accurate
Keep it up to date


What to include in your model

Consider the goals of the system

Everything you include should be there for a reason you can state
• in no more than two sentences

Everything should have a clear name
• if you can’t name it, it doesn’t belong

Talk to the stakeholders!!!


What to leave out of your model

The real world has properties that don’t matter (to your application)

The real world has relationships that don’t matter

Things happen in the real world that don’t matter

Keep it simple
• If you can’t say why you need it, leave it out


Logical vs Physical Data Models

Logical entities often require multiple tables to represent them
• Tables can be thought of as logical or physical
• It depends on your point of view

There is also the physical storage database layout
• storage areas
• data extents
• disks
• etc.

We aren’t going to talk about the physical database layout

We will talk about tables


Mapping Your Model to a Database

Simply put,

Entities become tables
• Identifiers become indexes

Attributes become columns
• Data types: pick appropriate ones

Relationships become tables or foreign keys
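
The mapping rules on this slide can be sketched as follows. This is an illustration in Python with SQLite, not the Progress schema from the talk; the names `manufacturer`, `disc`, `manufacturer_id`, and `disc_id` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript(
    """
    -- Entity "manufacturer" becomes a table; its attributes become columns.
    CREATE TABLE manufacturer (
        manufacturer_id INTEGER PRIMARY KEY,
        name            TEXT NOT NULL
    );
    -- The identifying attribute becomes a unique index.
    CREATE UNIQUE INDEX manufacturer_name ON manufacturer (name);

    -- Relationship "disc is made by manufacturer" becomes a foreign key.
    CREATE TABLE disc (
        disc_id         INTEGER PRIMARY KEY,
        upc             TEXT NOT NULL,
        title           TEXT NOT NULL,
        manufacturer_id INTEGER NOT NULL REFERENCES manufacturer (manufacturer_id)
    );
    """
)
conn.execute("INSERT INTO manufacturer (manufacturer_id, name) VALUES (1, 'Sony BMG')")
conn.execute(
    "INSERT INTO disc (upc, title, manufacturer_id) VALUES (?, ?, ?)",
    ("8697-07416-2", "The Essential Joshua Bell", 1),
)

# A disc that points at a non-existent manufacturer violates the relationship.
try:
    conn.execute(
        "INSERT INTO disc (upc, title, manufacturer_id) VALUES (?, ?, ?)",
        ("000-000000-0", "Orphan", 99),
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

disc_count = conn.execute("SELECT COUNT(*) FROM disc").fetchone()[0]
```

A one-to-many relationship fits in a foreign-key column like this one; a many-to-many relationship needs a table of its own.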


“In theory, there is no difference between theory and practice, but in practice there is.”

Jan van de Snepscheut


Now for some practice.


An example

A music store (perhaps the pTunes store)
• Buys compact disc recordings from distributors
• Has inventory
• Allows customers to search for what they want
  – Maybe in an in-store kiosk or on the web
• Sells compact discs to customers


What should we do first?


Activities

We buy discs from a distributor
Orders are sent to a distributor
Orders are delivered to the store
Orders may be cancelled
We sell discs to customers in sales transactions
Customers buy discs in sales transactions
Customers search for what they want to buy

Which of these must be remembered by the system?


What do we need to keep track of?

Discs we have
Discs we sold
Discs we know about and can get
Discs we have ordered
Information needed to do our income tax
• what we paid for stock
• when we bought it
• what we sold it for
• when we sold it


Disc entities

UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date purchased: March 19, 2007
Date sold: June 9, 2007


Disc table might look like this

upc           manufacturer    cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12  ?


What’s wrong?

Is upc a unique identifier?
Might have bought from a distributor
Have no information about what is on the disc
• How do customers search?
Don’t know when disc was made
Could be more than one tax jurisdiction
• provincial tax, city tax
Don’t know if disc is on order
Don’t know who bought it
Duplicated data
Etc., etc.
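
The first question, whether upc is a unique identifier, can be checked with a small experiment. The sketch below assumes a first-attempt table that declares `upc` as its primary key (a hypothetical schema, written in Python with SQLite) and then tries to store two physical copies of the same title.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical first-attempt table that treats upc as the identifier.
conn.execute(
    "CREATE TABLE disc (upc TEXT PRIMARY KEY, manufacturer TEXT, cost REAL, date_purch TEXT)"
)
conn.execute("INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-03-19')")

# A second physical copy of the same title carries the same UPC, so it cannot be stored.
try:
    conn.execute("INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-06-09')")
    second_copy_stored = True
except sqlite3.IntegrityError:
    second_copy_stored = False
```

The UPC identifies a title, not an individual disc in inventory, so each copy needs some other identifier.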


Disc entities take 2

UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Distributor: Bob’s Wholesale CD’s
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date ordered: March 19, 2007
Date received: March 20, 2007
Date sold: June 9, 2007
Disc Title: “The Essential Joshua Bell”
Artist: Joshua Bell
Track 1: “Danse Russe”
Track 2: “Violin Concerto in E Minor”
Track 3: “Nocturne in C-sharp Minor”
etc.


Example: Now What’s wrong?

This is getting messy
Activities combined with disc’s attributes
Have duplicated information
How many tracks can there be?
What if there is more than one artist?
Don’t have all the information a customer might want to use to search


Discs revisited

Discs have titles
Discs have pictures on the cover
Discs contain tracks
Discs are made by manufacturers
Discs are purchased from distributors
Discs are ordered from distributors
Discs are delivered to the store
Discs are sold to customers


“Discs contain tracks …”

Tracks contain songs
Tracks occur in order
Tracks have a duration
Songs are performed in performances
Songs have performers (usually)
Songs have composers
Songs have names (titles)
Songs have a key (but not always)
Performances are done by performers
Performers can be groups (bands, orchestras, etc.)
Performances are performed in a location or venue


We seem to need these entities

Discs
Manufacturers
Distributors
Orders
Customers
Inventory

Tracks
Songs
Performers
Groups ?


Songs have names (titles).

Are names properties of songs?

Or are they entities related to songs?

Or are they something else?


Song data (track 1)

Title:         “Danse Russe” from Swan Lake, Op. 20
Time:          4:30
Composer:      Peter Tchaikovsky
Category:      Classical, violin, orchestra
Performers:    Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number:  1
Disc upc:      8697-07416-2


Song data (track 2)

Title:         Violin Concerto in E Minor, Op. 64
Time:          6:27
Composer:      Felix Mendelssohn
Category:      Classical, violin, orchestra
Performers:    Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number:  2
Disc upc:      8697-07416-2


Performance data

Title:       Violin Concerto in E Minor, Op. 64
Time:        6:27
Composer:    Felix Mendelssohn
Category:    Classical, violin, orchestra
Performers:  Joshua Bell, Sir Roger Norrington, Camerata Salzburg


Performance data take 2

Title:                 Violin Concerto in E Minor, Op. 64
Time:                  6:27
Composer:              Felix Mendelssohn
Category:              Classical, violin, orchestra
Performers:            Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date:      ?
Performance Location:  ?


Performer data

id  name
1   Joshua Bell
2   Sir Roger Norrington
3   Camerata Salzburg
4   Michael Tilson Thomas
5   Berlin Philharmonic
6   Bono
7   The Edge
8   Adam Clayton
9   Larry Mullen


Performance to Performer Relationship

performance id  performer id
1               1
1               2
1               3
1               …
2               1
2               4
2               5
2               …
325             6
325             7
325             8
325             9
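
A relationship table like this one is the usual way to store a many-to-many relationship. The sketch below loads the first few rows exactly as they appear on the slides into SQLite from Python. The table names `performer` and `performance_performer` and the two helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE performer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Relationship table: one row per (performance, performer) pair.
    CREATE TABLE performance_performer (
        performance_id INTEGER NOT NULL,
        performer_id   INTEGER NOT NULL REFERENCES performer (id),
        PRIMARY KEY (performance_id, performer_id)
    );
    """
)
conn.executemany(
    "INSERT INTO performer VALUES (?, ?)",
    [(1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
     (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic")],
)
conn.executemany(
    "INSERT INTO performance_performer VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (2, 5)],
)

def performers_of(performance_id):
    """Names of everyone who took part in one performance."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM performance_performer AS pp
        JOIN performer AS p ON p.id = pp.performer_id
        WHERE pp.performance_id = ?
        ORDER BY p.id
        """,
        (performance_id,),
    )
    return [name for (name,) in rows]

def performances_of(performer_id):
    """Ids of every performance one performer took part in."""
    rows = conn.execute(
        "SELECT performance_id FROM performance_performer "
        "WHERE performer_id = ? ORDER BY performance_id",
        (performer_id,),
    )
    return [pid for (pid,) in rows]
```

The same table answers the question in both directions: `performers_of(1)` lists three performers, and `performances_of(1)` shows that performer 1 appears in performances 1 and 2.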


Performance data take 3

Performance id:  2
Title:           Violin Concerto in E Minor, Op. 64
Time:            6:27
Composer:        Felix Mendelssohn
Category:        Classical, violin, orchestra


Track to Performance Relationship

Disc upc      Track Num  Performance id
8697-07416-2  1          1
8697-07416-2  2          2
…             …          …
314-510347-2  1          325


Relationships (so far):

[Diagram]
disc to track: one to many
track to performance: one to one
performance to performer: many to many


What happened to Songs?


Relationships (take 2):

[Diagram]
disc to track: one to many
track to song: one to one
performance to performer: many to many
song to performance: one to many


Relationships (take 3):

[Diagram of the disc, track, performance, performer, and song entities]


What about “business entities”?

Where are they?


Business entities

[Diagram of the disc, track, performance, performer, and song entities]

Here is one kind of “business entity”


Business entities

[Diagram of the disc, track, performance, performer, and song entities]

Here is a different kind of “business entity”


Business entities

[Diagram of the disc, track, performance, performer, and song entities]

Here is still another kind of “business entity”


Should you use arrays?


Indexes

Enforce uniqueness
Make searches faster
Enable fast retrieval of entities by their identities
Enable finding entities with certain attributes
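
To see an index being used for a search on an attribute, one option is to ask the database for its query plan. The sketch below does this with SQLite from Python; the `song` table and the `song_composer` index are hypothetical names, and other databases (including Progress) expose query plans through different tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE song (song_id INTEGER PRIMARY KEY, title TEXT, composer TEXT)")
conn.execute("CREATE INDEX song_composer ON song (composer)")
conn.executemany(
    "INSERT INTO song (title, composer) VALUES (?, ?)",
    [("Danse Russe", "Peter Tchaikovsky"),
     ("Violin Concerto in E Minor, Op. 64", "Felix Mendelssohn")],
)

# Ask SQLite how it would run the search; the last column of each plan row
# is a human-readable description of the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM song WHERE composer = ?",
    ("Felix Mendelssohn",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
uses_index = "song_composer" in plan_text

titles = [
    title
    for (title,) in conn.execute(
        "SELECT title FROM song WHERE composer = ?", ("Felix Mendelssohn",)
    )
]
```

Checking the plan for the queries you actually run is a cheap way to confirm that an index earns its keep.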


What indexes do we need for the music store database?


Tables

0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances


What indexes do we need?

0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do


What should we do next?


Other Topics

Normalization
Unique keys
Word indexes
Naming
Customisation


Normalization

Oversimplified, it means:
• Don’t duplicate data

Attributes should be simple
• have only one value
• be necessary
• not derived data
• don’t repeat

Complicated attributes are often entities in their own right
• For example, addresses might be
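
As a small illustration of "don't duplicate data", the sketch below stores each manufacturer name once and has discs refer to it by id. It uses Python with SQLite; the schema and the rename to "Sony Music" are hypothetical and only serve to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE disc (
        upc             TEXT NOT NULL,
        manufacturer_id INTEGER NOT NULL REFERENCES manufacturer (id)
    );
    INSERT INTO manufacturer VALUES (1, 'Sony BMG'), (2, 'Island Records');
    INSERT INTO disc VALUES ('8697-07416-2', 1), ('8697-07416-2', 1), ('314-510347-2', 2);
    """
)

# The name is stored once, so a rename touches exactly one row...
renamed = conn.execute(
    "UPDATE manufacturer SET name = 'Sony Music' WHERE id = 1"
).rowcount

# ...and every disc sees the new name through the join.
names = [
    name
    for (name,) in conn.execute(
        "SELECT m.name FROM disc AS d "
        "JOIN manufacturer AS m ON m.id = d.manufacturer_id "
        "ORDER BY d.rowid"
    )
]
```

Had the name been repeated in every disc row, the same rename would have to find and change each copy, and any row it missed would silently disagree with the others.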


Unique keys

EVERY table must have a unique key
EVERY row needs a unique identifier
• that never changes even if moved to another database (i.e. if you replicate)
Often, users don’t need to see it
Use a UUID or sequence or maybe datetime
Unique key is the ONLY way to identify rows unambiguously
ROWIDs are temporary and can change
Use the same method throughout
• You’ll be glad you did
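
One way to follow the "use a UUID" advice is sketched below in Python, using the standard `uuid` module with SQLite. The `disc_key` column and the `add_disc` helper are assumptions for illustration; a sequence would serve equally well within a single database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disc (disc_key TEXT PRIMARY KEY, upc TEXT NOT NULL)")

def add_disc(upc):
    """Insert one physical disc and return its generated key."""
    key = str(uuid.uuid4())  # random UUID, generated without consulting the database
    conn.execute("INSERT INTO disc (disc_key, upc) VALUES (?, ?)", (key, upc))
    return key

# Two copies of the same title share a UPC but get distinct keys.
first = add_disc("8697-07416-2")
second = add_disc("8697-07416-2")
disc_count = conn.execute("SELECT COUNT(*) FROM disc").fetchone()[0]
```

Because the key is generated independently of any one database, it can stay the same when the row is copied or replicated elsewhere.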


Word indexes

Can be used to hold multiple status or attribute values
• Conflicts with normalisation
• Flexible
Easy to add new ones
Queries are fast

Example:
• Category: classical, violin, orchestral, concerto
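
The idea behind a word index can be sketched in a few lines of Python: map each word to the set of rows that contain it. This is only an in-memory analogue, not how the database engine implements word indexes, and the category values and row ids below are hypothetical.

```python
from collections import defaultdict

def build_word_index(rows):
    """Map each lower-cased word to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for word in text.replace(",", " ").lower().split():
            index[word].add(row_id)
    return index

def rows_with_all(index, *words):
    """Return the ids of rows that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()

# Hypothetical category values keyed by performance id.
categories = {
    1: "classical, violin, orchestral",
    2: "classical, violin, orchestral, concerto",
    325: "rock",
}
word_index = build_word_index(categories)
```

With this structure, `rows_with_all(word_index, "violin", "concerto")` is a set intersection rather than a scan of every row, and adding a new category word needs no schema change.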


Naming

• What is in the column “GL01262”?

Good names are crucial to understanding


Naming

Table and column names should have clear meanings everyone can understand
• “GL01262” vs “dateEntered”

Names with dashes cause inconvenience with SQL
• “order-date”

Booleans should be named for truth value
• “backOrdered”

No double negations
• “notOutOfStock”

Good names are crucial to understanding
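
The inconvenience of dashed names in SQL can be shown with a short experiment. The sketch below uses SQLite from Python with a hypothetical `orders` table; the exact error differs between SQL engines, but the need to quote the name is common.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A dashed name must be quoted every time it is used.
conn.execute('CREATE TABLE orders ("order-date" TEXT, dateEntered TEXT)')
conn.execute("INSERT INTO orders VALUES ('2007-03-19', '2007-03-19')")

quoted = conn.execute('SELECT "order-date" FROM orders').fetchone()[0]

# Unquoted, the name is not read as a single identifier, so SQLite rejects the statement.
try:
    conn.execute("SELECT order-date FROM orders")
    unquoted_ok = True
except sqlite3.OperationalError:
    unquoted_ok = False

# A name without dashes needs no special treatment.
plain = conn.execute("SELECT dateEntered FROM orders").fetchone()[0]
```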


Making tables customizable

We will look at 4 ways:

Spare columns
Separate table with spare columns
Separate table with name/value pairs
Name/value pairs in word-indexed column


Table and columns

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver


Spare columns in table

custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?


Spare columns in table

custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?

What data types should you use?
How many spare columns?
Wasted columns when not used
How do you know what each spare got used for?
How do you know how many unused spares you have?


Separate table for spare columns

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  extra1  extra2  extra3
001      frozen  ?       0.0
002      ?       125.46  0.12


Separate table for spare columns

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  status  owed    discount
001      frozen  ?       0.0
002      ?       125.46  0.12


Separate table with name/value pairs

custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  name      value
001      status    frozen
002      owed      125.46
002      discount  0.12
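
The name/value layout above can be tried out with the sketch below, written in Python with SQLite. The table name `customer_extra`, the helper `extras_for`, and the added `newsletter` attribute are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (custnum TEXT PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE customer_extra (
        custnum TEXT NOT NULL REFERENCES customer (custnum),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (custnum, name)
    );
    INSERT INTO customer VALUES
        ('001', 'Bob', 'Phoenix'), ('002', 'Alice', 'Boston'), ('003', 'Eve', 'Denver');
    INSERT INTO customer_extra VALUES
        ('001', 'status', 'frozen'), ('002', 'owed', '125.46'), ('002', 'discount', '0.12');
    """
)

def extras_for(custnum):
    """Return the customer's extra attributes as a dict of name -> value (all text)."""
    rows = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ?", (custnum,)
    )
    return dict(rows.fetchall())

# Adding a new kind of attribute needs no schema change, only a new row.
conn.execute("INSERT INTO customer_extra VALUES ('003', 'newsletter', 'yes')")
```

The flexibility has a cost: every value is stored as text, so the application has to know that `owed` and `discount` are numbers and convert them itself.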


Name/value pairs in word-indexed column

custnum  name   city     extra1
001      Bob    Phoenix  status: frozen
002      Alice  Boston   owed: 125.46, discount: 0.12
003      Eve    Denver   ?


Modeling Tools

PCase
Enterprise Architect
Power Designer
ConceptDraw
Erwin
Rational

Pencil and paper!

Blackboard!


Summary

Understand the requirements
Leave out what is not needed
Review the design with stakeholders
Evolve the design as changes come up
Test to make sure it works
• Can it do everything that is needed?
• Does it perform adequately?
Expect changes to come


Homework

Papers:
• Wiles, A.: “Modular elliptic curves and Fermat’s Last Theorem”, Annals of Mathematics 141 (3): 443-551
• Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS Vol 1, No 1, 1976

Wikipedia articles to start from:
• entity-relationship model
• data model

Books:
• Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann.


Questions