Data Modeling Fundamentals
Version 1.1
Cristi Salcescu
Subjects
• Relational Modeling
• Dimensional Modeling
• Object Modeling
What is data modeling?
• Apply structure
• Organize
Relational Modeling
• Tables
  – Columns
  – Rows
• Keys
  – Primary Key
  – Foreign Key (Referential Integrity)
  – Surrogate Key
  – Composite Key
    • a key that contains more than one column
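The key types above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not the deck's own schema: the sample rows are invented, and the `UNIQUE (Serial, Number)` constraint is an assumption used to illustrate a composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE Persons (
    Id INTEGER PRIMARY KEY,                   -- surrogate primary key
    LastName TEXT,
    FirstName TEXT
);
CREATE TABLE Policies (
    Id INTEGER PRIMARY KEY,
    Serial TEXT,
    Number TEXT,
    IdPerson INTEGER REFERENCES Persons(Id),  -- foreign key
    UNIQUE (Serial, Number)                   -- composite key (assumed for illustration)
);
""")
conn.execute("INSERT INTO Persons (Id, LastName, FirstName) VALUES (1, 'Doe', 'Jane')")
conn.execute("INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AA', '001', 1)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a key constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Person 99 does not exist, so referential integrity rejects the row.
orphan_ok = try_insert(
    "INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AA', '002', 99)")
# The pair ('AA', '001') already exists, so the composite key rejects the row.
duplicate_ok = try_insert(
    "INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AA', '001', 1)")
```

Both inserts are expected to be rejected, one by the foreign key and one by the composite key.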
Types of Relations
• One-to-Many
• Many-to-One
• Many-to-Many
• One-to-One
• Recursive
One-to-Many
Persons: Id, LastName, FirstName
Policies: Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser
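A minimal sketch of the one-to-many relation between Persons and Policies, using Python's built-in `sqlite3`. Only a few of the columns are kept, the rows are invented, and it is assumed that `IdPerson` points at `Persons.Id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Policies (
    Id INTEGER PRIMARY KEY,
    Serial TEXT,
    Number TEXT,
    IdPerson INTEGER REFERENCES Persons(Id)  -- many policies point to one person
);
INSERT INTO Persons VALUES (1, 'Doe', 'Jane'), (2, 'Roe', 'Rick');
INSERT INTO Policies VALUES (10, 'AA', '001', 1), (11, 'AA', '002', 1), (12, 'AB', '001', 2);
""")

# Count the policies that belong to each person.
rows = conn.execute("""
    SELECT p.LastName, COUNT(pol.Id)
    FROM Persons p
    JOIN Policies pol ON pol.IdPerson = p.Id
    GROUP BY p.Id
    ORDER BY p.Id
""").fetchall()
policies_per_person = dict(rows)  # {'Doe': 2, 'Roe': 1}
```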
Many-to-Many
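A many-to-many relation is usually stored through a linking table that holds one row per pair. A minimal sketch with `sqlite3`, assuming a policy can cover several persons; `PoliciesPersons` is a hypothetical table that does not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, Number TEXT);
-- Hypothetical linking table: one row per (policy, insured person) pair
CREATE TABLE PoliciesPersons (
    IdPolicy INTEGER REFERENCES Policies(Id),
    IdPerson INTEGER REFERENCES Persons(Id),
    PRIMARY KEY (IdPolicy, IdPerson)
);
INSERT INTO Persons VALUES (1, 'Doe'), (2, 'Roe');
INSERT INTO Policies VALUES (10, '001'), (11, '002');
INSERT INTO PoliciesPersons VALUES (10, 1), (10, 2), (11, 1);
""")


def persons_on(policy_id):
    """Last names of all persons linked to one policy."""
    rows = conn.execute("""
        SELECT p.LastName FROM Persons p
        JOIN PoliciesPersons pp ON pp.IdPerson = p.Id
        WHERE pp.IdPolicy = ? ORDER BY p.Id
    """, (policy_id,))
    return [r[0] for r in rows]


def policies_of(person_id):
    """Numbers of all policies linked to one person."""
    rows = conn.execute("""
        SELECT pol.Number FROM Policies pol
        JOIN PoliciesPersons pp ON pp.IdPolicy = pol.Id
        WHERE pp.IdPerson = ? ORDER BY pol.Id
    """, (person_id,))
    return [r[0] for r in rows]
```

With the sample rows, `persons_on(10)` returns two names and `policies_of(1)` returns two policy numbers, so the relation is "many" in both directions.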
One-to-One
PolciesHousehold: Id, IdAddress, Age, Surface, RoomsNo
Policies: Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser
PoliciesMotor: Id, ConstructionYear, CylCap, ChassisNo, PlateNo
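A minimal sketch of a one-to-one relation with `sqlite3`. It assumes that `PoliciesMotor.Id` is both the primary key and a foreign key to `Policies.Id`, which the column lists suggest but do not state. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, Serial TEXT, Number TEXT);
CREATE TABLE PoliciesMotor (
    Id INTEGER PRIMARY KEY REFERENCES Policies(Id),  -- same Id: at most one row per policy
    ConstructionYear INTEGER,
    CylCap INTEGER,
    ChassisNo TEXT,
    PlateNo TEXT
);
INSERT INTO Policies VALUES (10, 'AA', '001'), (11, 'AA', '002');
INSERT INTO PoliciesMotor VALUES (10, 2009, 1598, 'CH123', 'B-01-XYZ');
""")

# Policy 10 has motor details; policy 11 has none, so its PlateNo is NULL.
rows = conn.execute("""
    SELECT pol.Number, m.PlateNo
    FROM Policies pol
    LEFT JOIN PoliciesMotor m ON m.Id = pol.Id
    ORDER BY pol.Id
""").fetchall()
```

Because the detail table reuses the policy's Id as its primary key, a second motor row for the same policy cannot be inserted.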
Many-to-One
Self-Referencing
_Categories: IdCategory, Name, IdParent
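A self-referencing table can be walked with a recursive query. A minimal sketch with `sqlite3`, assuming `IdParent` points at `IdCategory` in the same table and is NULL for a root; the category names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE _Categories (
    IdCategory INTEGER PRIMARY KEY,
    Name TEXT,
    IdParent INTEGER REFERENCES _Categories(IdCategory)  -- NULL for a root category
);
INSERT INTO _Categories VALUES
    (1, 'Insurance', NULL), (2, 'Motor', 1), (3, 'Household', 1), (4, 'Truck', 2);
""")


def path_to_root(id_category):
    """Names from the given category up to its root, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE chain(IdCategory, Name, IdParent, depth) AS (
            SELECT IdCategory, Name, IdParent, 0
            FROM _Categories WHERE IdCategory = ?
            UNION ALL
            SELECT c.IdCategory, c.Name, c.IdParent, chain.depth + 1
            FROM _Categories c JOIN chain ON c.IdCategory = chain.IdParent
        )
        SELECT Name FROM chain ORDER BY depth
    """, (id_category,)).fetchall()
    return [r[0] for r in rows]
```

For example, `path_to_root(4)` follows Truck to Motor to Insurance.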
Normalization
• creates granularity
• removes duplication
• is a set of cumulative rules, the Normal Forms: 1st, 2nd, 3rd Normal Form
• good for saving space, but I/O costs are cheap
• bad for performance: more tables mean more Joins
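The "removes duplication" point can be sketched in plain Python. The flat rows and the `normalize` helper are hypothetical; the idea is only that repeated person data moves into its own table and is referenced by a surrogate Id.

```python
# Flat rows: the person's name is repeated on every policy row.
flat_rows = [
    {"Number": "001", "LastName": "Doe", "FirstName": "Jane"},
    {"Number": "002", "LastName": "Doe", "FirstName": "Jane"},
    {"Number": "003", "LastName": "Roe", "FirstName": "Rick"},
]


def normalize(rows):
    """Split flat rows into a Persons table and a Policies table."""
    person_ids = {}  # (LastName, FirstName) -> surrogate Id
    policies = []
    for row in rows:
        key = (row["LastName"], row["FirstName"])
        # Reuse the Id if the person was seen before, else assign the next one.
        id_person = person_ids.setdefault(key, len(person_ids) + 1)
        policies.append({"Number": row["Number"], "IdPerson": id_person})
    persons = [
        {"Id": id_person, "LastName": last, "FirstName": first}
        for (last, first), id_person in person_ids.items()
    ]
    return persons, policies


persons, policies = normalize(flat_rows)
```

Three flat rows become two person rows and three policy rows; each name is now stored once, at the price of a join when the data is read back.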
1st Normal Form
• creates a Many-to-One relation
• removes duplication that occurs horizontally
2nd Normal Form
• creates a One-to-Many relation
• removes duplication that occurs vertically
3rd Normal Form
• resolves a Many-to-Many relation into One-to-Many relations through a linking table
4th Normal Form
• creates a One-to-One relation
• separates NULL values
Insurance Policies - Car, Home and Life
Resources
• Library of data models
• Normalization
• SqlRelationship
OLTP vs OLAP
• OLTP: On-line Transaction Processing
• OLAP: On-line Analytical Processing
Why does the Relational Model fail for reporting?
• too granular
• high concurrency (lots of users sharing small pieces at the same time)
• too many tables: Joins are too big, SQL code is too slow
OLTP
– recent data
– daily basis
– hundreds to millions of users
– high concurrency
– designed for working with a single record/entity at a time
– highly “normalized”
– getting data for a report involves many joins
OLAP
– huge amount of (historical) data
– high-speed access to huge amounts of data
– access many tables
– low concurrency: few users (top executives)
– the number of tables is reduced, reducing the number of joins
– data is de-normalized
Dimensional Modeling
• Data Warehouse
  – A gigantic storehouse of data
  – All data
  – Provides long-term storage of data
  – Aggregation of data from multiple systems
  – Reduces the load on the production system
• Facts
  – Transactional information
  – Hold numeric measures
• Dimensions
  – Hold the values that describe facts
  – Static information, or slowly changing
  – Answer questions like: who, what, when, where?
  – Look-up values
Fact table example
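A minimal sketch of a fact table with two dimensions, using `sqlite3`. All table and column names here (`FactPolicySales`, `DimPolicyType`, `DimDate`, `Premium`, `PoliciesSold`) are hypothetical and are not taken from the original slide; the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: look-up values that describe the facts (what, when)
CREATE TABLE DimPolicyType (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimDate (Id INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
-- Fact table: dimension keys plus numeric measures
CREATE TABLE FactPolicySales (
    IdPolicyType INTEGER REFERENCES DimPolicyType(Id),
    IdDate INTEGER REFERENCES DimDate(Id),
    Premium REAL,
    PoliciesSold INTEGER
);
INSERT INTO DimPolicyType VALUES (1, 'Motor'), (2, 'Household');
INSERT INTO DimDate VALUES (1, 2012, 1), (2, 2012, 2);
INSERT INTO FactPolicySales VALUES
    (1, 1, 1000.0, 4), (1, 2, 500.0, 2), (2, 1, 300.0, 3);
""")

# A typical report: sum the measures, grouped by one dimension.
totals = conn.execute("""
    SELECT t.Name, SUM(f.Premium), SUM(f.PoliciesSold)
    FROM FactPolicySales f
    JOIN DimPolicyType t ON t.Id = f.IdPolicyType
    GROUP BY t.Name
    ORDER BY t.Name
""").fetchall()
```

The report needs a single join per dimension, whatever the number of fact rows.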
Denormalization
• removes Normal Forms
• removes granularity
• uses lots of space: I/O costs
• good for performance
• reduces the number of Joins
• good for large databases
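Denormalization can be sketched in plain Python as copying the person's attributes into every policy row, so that a report reads one table without a join. The data and the `denormalize` helper are hypothetical.

```python
# Normalized data: persons are stored once and referenced by Id.
persons = {
    1: {"LastName": "Doe", "FirstName": "Jane"},
    2: {"LastName": "Roe", "FirstName": "Rick"},
}
policies = [
    {"Number": "001", "IdPerson": 1},
    {"Number": "002", "IdPerson": 1},
    {"Number": "003", "IdPerson": 2},
]


def denormalize(policies, persons):
    """Copy the person's columns into each policy row (more space, no join)."""
    return [
        {"Number": policy["Number"], **persons[policy["IdPerson"]]}
        for policy in policies
    ]


report_rows = denormalize(policies, persons)
```

The name "Doe" is now stored twice, which is the space cost the list above mentions.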
3rd Normal Form
Denormalized
Relational Model
Denormalize fact tables
Snowflake Schema
Star Schema
Resources
• http://oracledba.ezpowell.com/oracle/papers/TheVeryBasicsOfDataWarehouseDesign.htm
Object Modeling
• a layer of objects that model the business area you're working in
UML
Unified Modeling Language
The most basic of UML diagrams is the Class Diagram. It describes classes and shows the relationships among them.
Types of Relations
• Inheritance
• Association
• Aggregation
• Composition
Inheritance
• A generalizes B
• B derives from A
[Class diagram "Relations": classes A and B]
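A minimal Python sketch of inheritance with the same class names; the `describe` method is a hypothetical addition used only to show that B inherits behavior from A.

```python
class A:
    """The general class."""

    def describe(self):
        return f"I am {type(self).__name__}"


class B(A):
    """B derives from A and inherits describe() unchanged."""


b = B()
message = b.describe()  # "I am B"
```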
Association
• A uses B, as a:
  – Class field
  – Method parameter
  – Method return type
  – Local variable
[Class diagram "Relations": classes A and B]
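The four ways in which A can use B, sketched in Python. The `value` attribute and the method names are hypothetical.

```python
class B:
    def __init__(self, value=0):
        self.value = value


class A:
    def __init__(self, b: B):
        self.b = b                      # B as a class field

    def add(self, other: B) -> int:     # B as a method parameter
        return self.b.value + other.value

    def doubled(self) -> B:             # B as a method return type
        return B(self.b.value * 2)

    def plus_five(self) -> int:
        temp = B(5)                     # B as a local variable
        return self.b.value + temp.value


a = A(B(1))
```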
Aggregation
• Shared association
• A aggregates B
• B is part of A
[Class diagrams "Relations": A and B; Airport and Aircraft]
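A minimal Python sketch of aggregation with the Airport and Aircraft classes. The attributes and the `receive` / `release` methods are hypothetical; the point is that the aircraft is created outside the airport and keeps existing when it leaves.

```python
class Aircraft:
    def __init__(self, tail_number):
        self.tail_number = tail_number


class Airport:
    def __init__(self, name):
        self.name = name
        self.aircraft = []  # references to objects created elsewhere

    def receive(self, aircraft):
        self.aircraft.append(aircraft)

    def release(self, aircraft):
        self.aircraft.remove(aircraft)


plane = Aircraft("YR-ABC")          # exists independently of any airport
first, second = Airport("First"), Airport("Second")

first.receive(plane)
first.release(plane)                # the plane leaves, but is not destroyed
second.receive(plane)               # the same object is now part of another airport
```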
Composition
• Not-shared association
• A is composed of B
[Class diagrams "Relations": A and B; Person and Leg]
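A minimal Python sketch of composition with the Person and Leg classes. The attributes are hypothetical; the point is that each Person creates and owns its own Leg objects, which are never shared with another Person.

```python
class Leg:
    def __init__(self, side):
        self.side = side


class Person:
    def __init__(self, name):
        self.name = name
        # The parts are created by the whole and live only inside it.
        self._legs = (Leg("left"), Leg("right"))

    def leg_sides(self):
        return [leg.side for leg in self._legs]


jane = Person("Jane")
rick = Person("Rick")
```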
Domain Layer
– Introduced by Eric Evans in his book “Domain-Driven Design: Tackling Complexity in the Heart of Software” (2003)
– Entities
  • An object that is not defined by its attributes, but rather by its identity
– Value Objects
  • An object that contains attributes but has no conceptual identity
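The difference between the two can be sketched with Python dataclasses. The `Address` and `Person` classes and their fields are hypothetical: the value object compares by attributes, the entity compares by its identity only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Value object: immutable, equal when all attributes are equal."""
    street: str
    city: str


@dataclass(eq=False)
class Person:
    """Entity: equal when the identity (id) is equal, whatever the attributes."""
    id: int
    last_name: str
    address: Address

    def __eq__(self, other):
        return isinstance(other, Person) and self.id == other.id

    def __hash__(self):
        return hash(self.id)


home = Address("Main St 1", "Cluj")
same_home = Address("Main St 1", "Cluj")

before = Person(1, "Doe", home)
after = Person(1, "Smith", Address("Oak St 9", "Iasi"))  # renamed and moved
stranger = Person(2, "Doe", home)
```

Here `home == same_home` holds because the attributes match, `before == after` holds because the id matches, and `before == stranger` does not hold even though name and address are the same.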
Insurance – Relational Model
Persons: Id, LastName, FirstName
PolciesHousehold: Id, IdAddress, Age, Surface, RoomsNo
Policies: Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser
PoliciesMotor: Id, ConstructionYear, CylCap, ChassisNo, PlateNo
Insurance – Object Model
Resources
• http://aviadezra.blogspot.com/2009/05/uml-association-aggregation-composition.html
Data Flow between the 3 Models
[Package diagram "Models": Domain Model (Entities, ValueObjects), Relational Model (Tables), Dimensional Model (Facts, Dimensions), linked by «flow» arrows]
ORM / ETL
• ORM (Object-relational mapping) http://www.agiledata.org/essays/mappingObjects.html
• ETL (Extract, transform and load)
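The three ETL steps can be sketched in plain Python. Everything here is hypothetical: the source rows stand in for the relational Policies table, and the `warehouse` list stands in for a fact table that counts policies issued per policy type and year.

```python
from collections import Counter

# Hypothetical rows, as they might come out of the relational Policies table.
source_rows = [
    {"Id": 10, "IssuedDate": "2011-12-30", "IdPolicyType": 1},
    {"Id": 11, "IssuedDate": "2012-01-15", "IdPolicyType": 1},
    {"Id": 12, "IssuedDate": "2012-03-02", "IdPolicyType": 1},
    {"Id": 13, "IssuedDate": "2012-03-09", "IdPolicyType": 2},
]


def extract():
    """Extract: read the rows from the source system."""
    return list(source_rows)


def transform(rows):
    """Transform: aggregate to one fact per (policy type, year)."""
    counts = Counter(
        (row["IdPolicyType"], int(row["IssuedDate"][:4])) for row in rows
    )
    return [
        {"IdPolicyType": policy_type, "Year": year, "PoliciesIssued": n}
        for (policy_type, year), n in sorted(counts.items())
    ]


def load(facts, warehouse):
    """Load: append the facts to the warehouse table."""
    warehouse.extend(facts)


warehouse = []
load(transform(extract()), warehouse)
```

Four transactional rows end up as three aggregated fact rows, which is the kind of reduction that makes reporting queries cheaper.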
Summary
• Relational Modeling
  – Tables (columns, rows)
  – Types of Relations
  – Normal Forms
• Dimensional Modeling
  – Facts and Dimensions
  – De-Normalization
• Object Modeling
  – Entities and Value Objects
  – Inheritance, Aggregation, Association
Resources
• VTC – Data Modeling
• Pluralsight – Introduction to Data Warehousing