relational data modeling mis2502 data analytics. what is a model? representation of something in the...

31
RELATIONAL DATA MODELING MIS2502 Data Analytics

Upload: abigail-nash

Post on 06-Jan-2018

216 views

Category:

Documents


0 download

DESCRIPTION

Modeling a database A representation of the structure of the data Describes the data contained in the database Explains how the data interrelates A student is part of a section, which is part of a course

TRANSCRIPT

Page 1: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

RELATIONAL DATA MODELINGMIS2502Data Analytics

Page 2: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

What is a model?• Representation of something in the real world

Page 3: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Modeling a database• A representation of the structure of the data

• Describes the data contained in the database

• Explains how the data interrelates• A student is part of a section, which is part of a course

Page 4: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Why bother modeling?• Creates a blueprint before you start building the database

• Gets the story straight: easy for non-technical people to understand

• Minimize having to go back and make changes in the implementation stage

Page 5: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The process of analysis and design• Systems Analysis

• Analysis of complex, large-scale systems and the interactions within those systemshttp://en.wikipedia.org/wiki/Systems_analysis

• Systems Design• The process of defining the hardware

and software architectures, components, models, interfaces, and data for a computer system to satisfy specified requirementshttp://en.wikipedia.org/wiki/Systems_design

Notice that they are not the

same!

Page 6: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Basically…

• Systems Analysis is the process of modeling the problem• Requirements-oriented • What should we do?

• Systems Design is the process of modeling a solution• Functionality-oriented• How should we do it?

This is where we define and understand the business scenario.

This is where we implement that scenario as a

database.

In the context of database development.

Page 7: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Start with a problem statement• “We want a database to track orders.”

• That’s too vague to create a useful system, so we then gather requirements to learn more

• Gather documentation• About the business process• About existing systems

• Conduct interviews• Employees directly involved in the process• Other stakeholders (i.e., customers)• Management

Why are each of these

important?

Are there others?

Page 8: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Start with a problem statement• Refine the problem statement

• Getting iterative feedback from the client

• End up with a scenario like this:• The system must track customer orders• Multiple products can go into an order• A customer is described by their name, address, and a unique

Customer ID number• An order is described by the date in which it was placed, what was

bought, and how much it costs

The specification “what was bought” is a little vague, and that will cause us a problem a little later.

But let’s leave it for now…

Page 9: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The Entity Relationship Diagram (ERD)

• The primary way of modeling a relational database• Part of the “analysis” process

• Implemented as a picture with three key elements

Entity

Relationship

A uniquely identifiable thing (i.e., person, order)

Describes how two entities relate to one another (i.e., makes)

AttributeA characteristic of an entity or relationship (i.e., first name, order number)

Page 10: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

A very simple example

Customer

First name

makes Order

Last name

City

State Zip PriceProduct name

Order Date

Order number

CustomerID

Page 11: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The primary key• Entities need to be uniquely identifiable

• So you can tell them apart when you retrieve them• Use a primary key

• An attribute (or a set of attributes) that uniquely identifies an entity

Order number

CustomerID

Uniquely identifies a customer

Uniquely identifies an order

How about these as primary keys for Customer:

First name and/or last name?

Social security number?

Page 12: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Last component: Cardinality• Defines the rules of the association between entities

Customer makes Order

This is a one-to-many relationship:One customer can have many orders

One order can only belong to one customer

at least – oneat most - many

at least – oneat most - one

Page 13: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Crows Feet Notation

Customer makes Order

There are other ways of denoting cardinality, but this one is pretty standard.

So called because this…

…looks something like this

There are also variations of the crows feet notion!

Page 14: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Cardinality is defined by business rules

• What would the cardinality be in these situations?

Order contains Product? ?

Course has Section? ?

Employee has Office? ?

Page 15: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

But we have a problem with our ERD

This assumes every order contains only one product.So if I want two products, I have to make two orders!

The problem: Product is defined as an attribute, not an entity.(Because we didn’t define our requirements clearly enough?)

Customer

First name

makes Order

Last name

City

State Zip PriceProduct name

Order Date

Order number

CustomerID

Page 16: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Here’s a solution

• Now• A customer can make multiple orders• An order can contain multiple products• A product can be part of multiple

orders

Customer

First name

makes Order

Last name

City

State Zip

Price

Product name

Order Date

Order number

Product

contains

CustomerID

Quantity

Page 17: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Implementing the ERD• As a database schema

• A map of the tables and fields in the database• This is what is implemented in the database

management system• Part of the “design” process

• A schema actually looks a lot like the ERD• Entities become tables• Attributes become fields• Relationships can become additional tables

Page 18: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Structure of a databaseData element DescriptionCharacter Single letter or number

(“A”, “Z”, “1”)Field Set of related characters

(first name)Record Set of related fields

(all information about a customer)Table Set of related records

(all customers in the company)Database Set of related tables

(all information about the company)

Page 19: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The Rules

• Primary key field of “1” table put into “many” table as foreign key field

1:many relationships

• Create new table• 1:many relationships with original tables

many:many relationships

• Primary key field of one table put into other table as foreign key field

1:1 relationships

1. Create a table for every entity

2. Create table fields for every entity’s attributes

3. Implement relationships between the tables

Page 20: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Our Order Database schema

• Order-Product is a decomposed many-to-many relationship• Order-Product has a 1:n relatonship with Order and Product• Now an order can have multiple products, and a product can be

associated with multiple orders

Original 1:n relationship

Original n:n relationship

Page 21: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

What the Customer and Order tables look like

CustomerID FirstName LastName City State Zip

1001 Greg House Princeton NJ 09120

1002 Lisa Cuddy Plainsboro NJ 09123

1003 James Wilson Pittsgrove NJ 09121

1004 Eric Foreman Warminster PA 19111

OrderNumber

OrderDate CustomerID

101 3-2-2011 1001

102 3-3-2011 1002

103 3-4-2011 1001

104 3-6-2011 1004

Note that there are no repeating records

Every customer is uniqueEvery order is unique

This is an example of normalization.

Customer Table

Order Table

Page 22: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Normalization• Organizing data to minimize redundancy (repeated data)

• This is good for two reasons• The database takes up less space• You have a lower chance of inconsistencies in your data

• If you want to make a change to a record, you only have to make it in one place• The relationships take care of the rest

• But you will usually need to link the separate tables together in order to retrieve information

Page 23: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

To figure out who ordered what• Match the Customer IDs of the two tables, starting with the table

with the foreign key (Order):

• We now know which order belonged to which customer• This is called a join

• But it’s an inefficient way to store data (redundancies)• So we normalize

Order Number

OrderDate Customer ID

Customer ID

FirstName LastName City State Zip

101 3-2-2011 1001 1001 Greg House Princeton NJ 09120

102 3-3-2011 1002 1002 Lisa Cuddy Plainsboro NJ 09123

103 3-4-2011 1001 1001 Greg House Princeton NJ 09120

104 3-6-2011 1004 1004 Eric Foreman Warminster PA 19111

Order Table Customer Table

Page 24: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Now the many:many relationship

OrderNumber

OrderDate Customer ID

101 3-2-2011 1001

102 3-3-2011 1002

103 3-4-2011 1001

104 3-6-2011 1004

Order Table

ProductID ProductName Price

2251 Cheerios 3.99

2282 Bananas 1.29

2505 Eggo Waffles 2.99

Product Table

OrderProductID

Order number

Product ID Quantity

1 101 2251 2

2 101 2282 3

3 101 2505 1

4 102 2251 5

5 102 2282 2

6 103 2505 3

7 104 2505 8

Order-Product Table

This table relates Order and Product to

each other!

Page 25: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

To figure out what each order contains

• Match the Product IDs and Order IDs of the tables, starting with the table with the foreign keys (Order-Product):

OrderProductID

OrderNumber

Product ID

Quantity Order Number

Order Date

Customer ID

Product ID

Product Name

Price

1 101 2251 2 101 3-2-2011 1001 2251 Cheerios 3.99

2 101 2282 3 101 3-2-2011 1001 2282 Bananas 1.29

3 101 2505 1 101 3-2-2011 1001 2505 Eggo Waffles 2.99

4 102 2251 5 102 3-3-2011 1002 2251 Cheerios 3.99

5 102 2282 2 102 3-3-2011 1002 2282 Bananas 1.29

6 103 2505 3 103 3-4-2011 1001 2505 Eggo Waffles 2.99

7 104 2505 8 104 3-6-2011 1004 2505 Eggo Waffles 2.99

Order-Product Table Order Table Product Table

Now there is redundant product data as a result of the join!

Page 26: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Why redundant data is a big dealCustomer ID

Product ID

Product Name

Price

1001 2251 Cheerios 3.99

1001 2282 Bananas 1.29

1001 2505 Eggo Waffles 2.99

1002 2251 Cheerios 3.99

1002 2282 Bananas 1.29

1001 2505 Eggo Waffles 2.99

1004 2505 Eggo Waffles 2.99

The redundant data seems harmless, but:

What if the price of “Eggo Waffles”

changes?

And what if Greg House changes his

address?

And if there are 1,000,000 records?

Order Number

Order Date Customer ID

Customer ID

First Name Last Name City State Zip

101 3-2-2011 1001 1001 Greg House Princeton NJ 09120

102 3-3-2011 1002 1002 Lisa Cuddy Plainsboro NJ 09123

103 3-4-2011 1001 1001 Greg House Princeton NJ 09120

104 3-6-2011 1004 1004 Eric Foreman Warminster PA 19111

Page 27: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Best practices for normalization• Create new entities when

there are collections of related attributes, especially when they would repeat

• For example, consider a modified Product entity

Price

Product nameProduct

Vendor Name

Vendor Address

Vendor Phone

Price

Product nameProduct

Vendor Name

Vendor Address

Vendor Phone

Vendor

sells

Don’t do this…

…do this.

Then you won’t have to repeat vendor

information for each product.

Vendor ID

? Why did we introduce VendorID?

Page 28: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

Best practices for normalization• Create new entities to

enforce data entry standards

Customer

First name

Last name

City

State Zip

CustomerID

Customer

First name

Last name

City ID

Zip

CustomerID

City

City Name

StateID

State

State Name

This is fine…

…but this can be even better.

! The city name is entered only once in the City table; CityID is used in Customer table

Page 29: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

City and State as “lookup tables”• Why this can be a better way of doing it

CustomerID FirstName LastName CityID StateID Zip

1001 Greg House 1 1 09120

1002 Lisa Cuddy 2 1 09123

1003 James Wilson 3 1 09121

1004 Eric Foreman 4 2 19111

CityID CityName

1 Princeton

2 Plainsboro

3 Pittsgrove

4 Warminster

StateID StateName Abbr

1 New Jersey NJ

2 Pennsylvania PA

This helps prevent inconsistent spellings

(Pennsylvania is always entered as “2”)

Customer

City

State

Customer

First name

Last name

City ID

Zip

CustomerID

City

City Name

StateID

State

State Name

Page 30: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The three-way relationship• Sometimes

three entities are necessary to capture what happens in a transaction

• This would be modeled as an many-to-many-to-many relationship

Mechanic

Model

Make

Salary

Description

Car

Performs Repair dateRepair

Charge

NameEmployee ID

Repair code

VIN

Page 31: RELATIONAL DATA MODELING MIS2502 Data Analytics. What is a model? Representation of something in the real world

The many:many:many table• The many-to-many-to-many relationship would still be

represented as a separate table• Just with three foreign keys, instead of two

RepairID RepairCode EmployeeID VIN Repair date

1 101 9112 10192919201 2011-3-1

2 201 2313 19292919291 2011-3-2

3 302 1231 102010023 2011-3-3

4 223 2132 393848383 2011-3-4

Mechanic

Model

Make

Salary

Description

Car

Performs Repair dateRepair

Charge

NameEmployee ID

Repair code

VIN