Database Design, a Practical Guide
Gus Björklund ([email protected])
Wizard, Progress Software Corporation
© 2007 Progress Software Corporation
Ask questions as we go if I am not being clear.
Warning: there is a mistake in these slides.
To every rule, there is an exception!
Rules are made to be broken
Topics
Theory:
• What is Database Design
• Basic Elements
• Representing the data model as Tables
Practice:
• An Example
Some Other Topics
First, a little theory
What do we mean by database design?
A process for defining a model of a subset of the “real” [1] world, then representing it as data in tables in a relational database
At least, that’s the definition we will use for the purposes of this talk.
[1] Well, for small values of real, anyway.
Basic Elements
What do we put in our model?
Just 3 Things:
• Entities
• Attributes
• Relationships
The “entity-relationship model” was described by Peter Chen in 1976.
See http://bit.csc.lsu.edu/~chen/chen.html
Basic Elements 1: Entities
Can be thought of as nouns
• People
– author, composer, performer, seller, buyer
• Places
– home, IP address, URL, destination, factory, store
• Things
– song, recording, instrument, car, invoice
Is “telephone number” a place or a thing?
Basic Elements 2: Attributes
Can be thought of as adjectives (but only loosely):
• Length
• Color
• Horsepower
• Part number
• Song Title
• Publication Date
• Size
• Fabric
• Owner
Is “telephone number” an attribute or an entity?
Entities have attributes
Basic Elements 3: Relationships
Can be thought of as verbs:
• has a
• owns
• contains
• supervises
• performs
• called
• sold
• purchased
• proved
Entities are connected by relationships
Is “telephone number” a relationship?
In May 1995, Andrew Wiles published a proof of Fermat’s Last Theorem.
Relationships have attributes too
In May 1995 (attribute), Andrew Wiles (entity) published (relationship) a proof of Fermat’s Last Theorem (entity).
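One way to picture the sentence above in code, as a minimal sketch only. The class and field names (`Person`, `Proof`, `Published`, `when`) are made up for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Entity."""
    name: str


@dataclass(frozen=True)
class Proof:
    """Entity."""
    theorem: str


@dataclass(frozen=True)
class Published:
    """Relationship between a Person and a Proof."""
    author: Person
    work: Proof
    when: str  # attribute of the relationship, not of either entity


fact = Published(Person("Andrew Wiles"), Proof("Fermat's Last Theorem"), "1995-05")
print(f"{fact.author.name} published a proof of {fact.work.theorem} in {fact.when}")
```

The date belongs to neither the person nor the proof; it only makes sense on the link between them.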
What goes in an entity?
Identifying attributes
• Must be able to uniquely identify the entity
• Can have more than one way to id
• Id can be composite
Descriptive attributes
• the values you need to keep track of
• generally should be simple, not complex
What to include in your model
The things your application has to keep track of
• Telephones, wires, switches
The actions your application or its users perform
• Make calls, send telephone bills, collect payments
Some attributes of the things and actions
• Originating number, date and time of call, duration, called number
Keep it simple
Be accurate
Keep it up to date
What to include in your model
Consider the goals of the system
Everything you include should be there for a reason you can state
• in no more than two sentences
Everything should have a clear name
• if you can’t name it, it doesn’t belong
Talk to the stakeholders!!!
What to leave out of your model
The real world has properties that don’t matter (to your application)
The real world has relationships that don’t matter
Things happen in the real world that don’t matter
Keep it simple
• If you can’t say why you need it, leave it out
Logical vs Physical Data Models
Logical entities often require multiple tables to represent them
• Tables can be thought of as logical or physical
• It depends on your point of view
There is also the physical storage database layout
• storage areas
• data extents
• disks
• etc.
We aren’t going to talk about the physical database layout
We will talk about tables
Mapping Your Model to a Database
Simply put,
Entities become tables
• Identifiers become indexes
Attributes become columns
• Pick appropriate data types
Relationships become tables or foreign keys
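The mapping can be sketched with the telephone example from the earlier slide. This is only an illustration using Python's built-in `sqlite3` rather than a Progress database; the table and column names (`telephone`, `phone_call`, `owner`, and so on) and the sample numbers are assumptions, not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    -- Entity -> table. The identifying attribute becomes the primary key,
    -- which the database backs with a unique index.
    CREATE TABLE telephone (
        number TEXT PRIMARY KEY,
        owner  TEXT NOT NULL              -- descriptive attribute -> column
    );

    -- Relationship ("called") -> table with foreign keys to both entities.
    -- The relationship's own attributes become columns too.
    CREATE TABLE phone_call (
        id                 INTEGER PRIMARY KEY,
        originating_number TEXT NOT NULL REFERENCES telephone(number),
        called_number      TEXT NOT NULL REFERENCES telephone(number),
        started_at         TEXT NOT NULL,     -- date and time of call
        duration_seconds   INTEGER NOT NULL
    );
""")

conn.execute("INSERT INTO telephone VALUES ('555-0100', 'Alice')")
conn.execute("INSERT INTO telephone VALUES ('555-0101', 'Bob')")
conn.execute(
    "INSERT INTO phone_call (originating_number, called_number, started_at, duration_seconds) "
    "VALUES ('555-0100', '555-0101', '2007-06-09T10:15:00', 120)"
)

# A call to a number that is not a known telephone is rejected.
try:
    conn.execute(
        "INSERT INTO phone_call (originating_number, called_number, started_at, duration_seconds) "
        "VALUES ('555-0100', '555-9999', '2007-06-09T10:20:00', 30)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("dangling call rejected:", rejected)
```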
“In theory, there is no difference between theory and practice, but in practice there is.”
Jan van de Snepscheut
Now for some practice.
An example
A Music store (perhaps the pTunes store)
• Buys compact disc recordings from distributors
• Has inventory
• Allows customers to search for what they want
– Maybe in an in-store kiosk or on the web
• Sells compact discs to customers
What should we do first?
Activities
We buy discs from a distributor
Orders are sent to a distributor
Orders are delivered to the store
Orders may be cancelled
We sell discs to customers in sales transactions
Customers buy discs in sales transactions
Customers search for what they want to buy
Which of these must be remembered by the system?
What do we need to keep track of
Discs we have
Discs we sold
Discs we know about and can get
Discs we have ordered
Information needed to do our income tax
• what we paid for stock
• when we bought it
• what we sold it for
• when we sold it
Disc entities
UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date purchased: March 19, 2007
Date sold: June 9, 2007
Disc table might look like this
upc           manufacturer    cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12
What’s wrong?
Is upc a unique identifier?
Might have bought from a distributor
Have no information about what is on the disc
• How do customers search?
Don’t know when disc was made
Could be more than one tax jurisdiction
• provincial tax, city tax
Don’t know if disc is on order
Don’t know who bought it
Duplicated data
Etc., etc.
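The first question can be tried out directly. The sketch below, using Python's built-in `sqlite3`, declares upc as the identifier and then tries to store two copies of the same title, as in the table on the previous slide. The snake_case column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE disc (
        upc          TEXT PRIMARY KEY,   -- treat upc as the unique identifier
        manufacturer TEXT,
        cost         REAL,
        date_purch   TEXT,
        date_sold    TEXT
    )
""")

# First copy of the disc.
conn.execute(
    "INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-03-19', '2007-06-09')"
)

# Second physical copy of the same title: same upc, different purchase date.
try:
    conn.execute(
        "INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-06-09', NULL)"
    )
    second_copy_stored = True
except sqlite3.IntegrityError:
    second_copy_stored = False

print("second copy stored:", second_copy_stored)
```

A UPC identifies a title, not a physical copy, so it cannot identify rows in a table that holds one row per copy.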
Disc entities take 2
UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Distributor: Bob’s Wholesale CD’s
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date ordered: March 19, 2007
Date received: March 20, 2007
Date sold: June 9, 2007
Disc Title: “The Essential Joshua Bell”
Artist: Joshua Bell
Track 1: “Danse Russe”
Track 2: “Violin Concerto in E Minor”
Track 3: “Nocturne in C-sharp Minor”
etc.
Example: Now What’s wrong?
This is getting messy
Activities combined with disc’s attributes
Have duplicated information
How many tracks can there be?
What if there is more than one artist?
Don’t have all the information a customer might want to use to search
Discs revisited
Discs have titles
Discs have pictures on the cover
Discs contain tracks
Discs are made by manufacturers
Discs are purchased from distributors
Discs are ordered from distributors
Discs are delivered to the store
Discs are sold to customers
“Discs contain tracks …”
Tracks contain songs
Tracks occur in order
Tracks have a duration
Songs are performed in performances
Songs have performers (usually)
Songs have composers
Songs have names (titles)
Songs have a key (but not always)
Performances are done by performers
Performers can be groups (bands, orchestras, etc.)
Performances are performed in a location or venue
We seem to need these entities
Discs
Manufacturers
Distributors
Orders
Customers
Inventory
Tracks
Songs
Performers
Groups?
Songs have names (titles).
Are names properties of songs?
Or are they entities related to songs?
Or are they something else?
Song data (track 1)
Title “Danse Russe” from Swan Lake, Op.20
Time 4:30
Composer Peter Tchaikovsky
Category Classical, violin, orchestra
Performers Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number 1
Disc upc 8697-07416-2
Song data (track 2)
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number 2
Disc upc 8697-07416-2
Performance data
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance data take 2
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date ?
Performance Location ?
Performer data
id name
1 Joshua Bell
2 Sir Roger Norrington
3 Camerata Salzburg
4 Michael Tilson Thomas
5 Berlin Philharmonic
6 Bono
7 The Edge
8 Adam Clayton
9 Larry Mullen
Performance to Performer Relationship
performance id performer id
1 1
1 2
1 3
1 …
2 1
2 4
2 5
2 …
325 6
325 7
325 8
325 9
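The two tables above can be tried out as follows. This is a sketch with Python's built-in `sqlite3`, loaded with some of the rows exactly as they appear on the slides; the table names `performer` and `performance_performer` and the helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (performance, performer) pair: a many-to-many relationship.
    CREATE TABLE performance_performer (
        performance_id INTEGER NOT NULL,
        performer_id   INTEGER NOT NULL,
        PRIMARY KEY (performance_id, performer_id)   -- composite identifier
    );
""")
conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    (1, "Joshua Bell"), (2, "Sir Roger Norrington"), (3, "Camerata Salzburg"),
    (4, "Michael Tilson Thomas"), (5, "Berlin Philharmonic"),
])
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 4), (2, 5),
])


def performers_of(performance_id):
    """Names of everyone who took part in one performance."""
    rows = conn.execute(
        "SELECT p.name FROM performance_performer pp "
        "JOIN performer p ON p.id = pp.performer_id "
        "WHERE pp.performance_id = ? ORDER BY p.id",
        (performance_id,),
    )
    return [name for (name,) in rows]


def performances_of(performer_id):
    """Ids of every performance one performer took part in."""
    rows = conn.execute(
        "SELECT performance_id FROM performance_performer "
        "WHERE performer_id = ? ORDER BY performance_id",
        (performer_id,),
    )
    return [pid for (pid,) in rows]


print(performers_of(1))
print(performances_of(1))
```

The same table answers the question in both directions, which is what a list of performers stored inside each performance row cannot do easily.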
Performance data take 3
Performance id 2
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Track to Performance Relationship
Disc upc Track Num Performance id
8697-07416-2 1 1
8697-07416-2 2 2
… … …
314-510347-2 1 325
Relationships (so far):
[Diagram: disc, track, performance, and performer entities; legend: one to one, one to many, many to many]
What happened to Songs?
Relationships (take 2):
[Diagram: disc, track, song, performance, and performer entities; legend: one to one, one to many, many to many]
Relationships (take 3):
[Diagram: disc, track, performance, performer, and song entities]
What about “business entities”?
Where are they?
Business entities
[Diagram: disc, track, performance, performer, and song entities]
Here is one kind of “business entity”
Business entities
[Diagram: disc, track, performance, performer, and song entities]
Here is a different kind of “business entity”
Business entities
[Diagram: disc, track, performance, performer, and song entities]
Here is still another kind of “business entity”
Should you use arrays?
Indexes
Enforce uniqueness
Make searches faster
Enable fast retrieval of entities by their identities
Enable finding entities with certain attributes
What indexes do we need for the music store database?
Tables
0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances
What indexes do we need
0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do
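As a rough illustration of the third item, the sketch below adds an index for one expected query and asks the database whether it would use it. It uses Python's built-in `sqlite3`, not a Progress database; the `song` table layout and the index name `song_by_composer` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (
        id       INTEGER PRIMARY KEY,   -- unique row identifier
        title    TEXT NOT NULL,
        composer TEXT NOT NULL
    );
    -- Index for a query we expect to run: "find songs by composer".
    CREATE INDEX song_by_composer ON song (composer);
""")
conn.executemany("INSERT INTO song (title, composer) VALUES (?, ?)", [
    ("Danse Russe", "Peter Tchaikovsky"),
    ("Violin Concerto in E Minor, Op. 64", "Felix Mendelssohn"),
])

# Ask SQLite how it would run the query; the last column of each plan row
# is a human-readable description of one step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT title FROM song WHERE composer = ?",
    ("Felix Mendelssohn",),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)

titles = [t for (t,) in conn.execute(
    "SELECT title FROM song WHERE composer = ?", ("Felix Mendelssohn",))]
print(titles)
```

The exact wording of the plan differs between SQLite versions, but it should name `song_by_composer` rather than describe a scan of the whole table.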
What should we do next ?
Other Topics
Normalization
Unique keys
Word indexes
Naming
Customisation
Normalization
Oversimplified, it means:
• Don’t duplicate data
Attributes should be simple
• have only one value
• be necessary
• not derived data
• don’t repeat
Complicated attributes are often entities in their own right
• For example, addresses might be
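The "don't repeat" rule can be shown on the repeated track attributes from the "Disc entities take 2" slide. This plain-Python sketch splits one wide record into a disc row and one row per track; the key names (`track_1`, `track_num`, and so on) and the `normalize` helper are assumptions made for the example.

```python
# One wide record with a repeating attribute (track_1, track_2, ...).
disc_record = {
    "upc": "8697-07416-2",
    "title": "The Essential Joshua Bell",
    "track_1": "Danse Russe",
    "track_2": "Violin Concerto in E Minor",
    "track_3": "Nocturne in C-sharp Minor",
}


def normalize(record):
    """Split one wide record into a disc row and one row per track."""
    disc = {k: v for k, v in record.items() if not k.startswith("track_")}
    tracks = [
        {"upc": record["upc"], "track_num": int(k.split("_")[1]), "title": v}
        for k, v in record.items()
        if k.startswith("track_")
    ]
    tracks.sort(key=lambda t: t["track_num"])
    return disc, tracks


disc, tracks = normalize(disc_record)
print(disc)
for track in tracks:
    print(track)
```

After the split, the number of tracks is no longer limited by how many track columns the disc record happens to have.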
Unique keys
EVERY table must have a unique key
EVERY row needs a unique identifier
• that never changes even if moved to another database (i.e. if you replicate)
Often, users don’t need to see it
Use a UUID or sequence or maybe datetime
Unique key is the ONLY way to identify rows unambiguously
ROWIDs are temporary and can change
Use the same method throughout
• You’ll be glad you did
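A minimal sketch of the UUID option, using Python's standard `uuid` and `sqlite3` modules. The `customer` table and the `add_customer` helper are assumptions for illustration; a sequence would serve the same purpose within a single database.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT NOT NULL)")


def add_customer(name):
    """Insert a customer under a freshly generated UUID and return that key."""
    row_id = str(uuid.uuid4())
    conn.execute("INSERT INTO customer (id, name) VALUES (?, ?)", (row_id, name))
    return row_id


bob_id = add_customer("Bob")
other_bob_id = add_customer("Bob")  # same name, still a distinct row

print(bob_id)
print(other_bob_id)
```

Because the key is generated without reference to the database it is stored in, a row can keep the same identifier when it is copied or replicated elsewhere.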
Word indexes
Can be used to hold multiple status or attribute values
• Conflicts with normalisation
• Flexible
Easy to add new ones
Queries are fast
Example:
• Category: classical, violin, orchestral, concerto
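To show the idea behind the example, here is a toy word index in plain Python: a map from each word to the rows that contain it. It is not how any particular database implements word indexes, and rows 2 and 3 are invented sample data; only row 1 comes from the slide.

```python
import re
from collections import defaultdict

# Row id -> contents of the "category" column.
categories = {
    1: "classical, violin, orchestral, concerto",
    2: "classical, piano",   # invented sample row
    3: "rock, guitar",       # invented sample row
}

# Build the index: word -> set of row ids whose category contains that word.
word_index = defaultdict(set)
for row_id, text in categories.items():
    for word in re.findall(r"\w+", text.lower()):
        word_index[word].add(row_id)


def rows_containing(*words):
    """Return ids of rows whose category contains every given word."""
    sets = [word_index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


print(rows_containing("classical"))
print(rows_containing("classical", "violin"))
```

Adding a new status or category value needs no schema change, which is the flexibility the slide mentions; the cost is that one column now holds several values.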
Naming
• What is in the column “GL01262” ?
Good names are crucial to understanding
Naming
Table and column names should have clear meanings everyone can understand
• “GL01262” vs “dateEntered”
Names with dashes cause inconvenience with SQL
• “order-date”
Booleans should be named for truth value
• “backOrdered”
No double negations
• “notOutOfStock”
Good names are crucial to understanding
Making tables customizable
We will look at 4 ways:
Spare columns
Separate table with spare columns
Separate table with name/value pairs
Name/value pairs in word-indexed column
Table and columns
custnum name city
001 Bob Phoenix
002 Alice Boston
003 Eve Denver
Spare columns in table
custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?
What data types should you use?
How many spare columns?
Wasted columns when not used
How do you know what each spare got used for?
How do you know how many unused spares you have?
Separate table for spare columns
custnum name city
001 Bob Phoenix
002 Alice Boston
003 Eve Denver
custnum extra1 extra2 extra3
001 frozen ? 0.0
002 ? 125.46 0.12
Separate table for spare columns
custnum name city
001 Bob Phoenix
002 Alice Boston
003 Eve Denver
custnum status owed discount
001 frozen ? 0.0
002 ? 125.46 0.12
Separate table with name/value pairs
custnum name city
001 Bob Phoenix
002 Alice Boston
003 Eve Denver
custnum name value
001 status frozen
002 owed 125.46
002 discount 0.12
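The name/value layout above can be tried with Python's built-in `sqlite3`. The table name `customer_extra` and the helper `extras_for` are assumptions; the rows are the ones shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        custnum TEXT PRIMARY KEY,
        name    TEXT,
        city    TEXT
    );
    -- One row per customized attribute instead of one column per attribute.
    CREATE TABLE customer_extra (
        custnum TEXT NOT NULL REFERENCES customer(custnum),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (custnum, name)
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("001", "Bob", "Phoenix"), ("002", "Alice", "Boston"), ("003", "Eve", "Denver"),
])
conn.executemany("INSERT INTO customer_extra VALUES (?, ?, ?)", [
    ("001", "status", "frozen"),
    ("002", "owed", "125.46"),
    ("002", "discount", "0.12"),
])


def extras_for(custnum):
    """Collect one customer's name/value pairs into a dict."""
    rows = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ?", (custnum,)
    )
    return dict(rows.fetchall())


print(extras_for("002"))
print(extras_for("003"))
```

Customers with no extras take up no rows at all, but every value is stored as text, so the application has to know that "owed" is a number and "status" is not.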
Name/value pairs in word-indexed column
custnum  name   city     extra1
001      Bob    Phoenix  status: frozen
002      Alice  Boston   owed: 125.46, discount: 0.12
003      Eve    Denver   ?
Modeling Tools
PCase
Enterprise Architect
Power Designer
ConceptDraw
Erwin
Rational
Pencil and paper!
Blackboard!
Summary
Understand the requirements
Leave out what is not needed
Review the design with stakeholders
Evolve the design as changes come up
Test to make sure it works
• Can it do everything that is needed?
• Does it perform adequately?
Expect changes to come
Homework
Papers:
• Wiles, A.: “Modular elliptic curves and Fermat's Last Theorem”, Annals of Mathematics 141 (3): 443-551
• Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS Vol 1, No 1, 1976
Wikipedia articles to start from:
• entity-relationship model
• data model
Books:
• Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann.
Questions