Lecture #2, October 5th, 2000
Conceptual Modeling
• Administration:
– HW1 available
– Details on projects
– Exam date
– XML comment
Building an Application with a Database System
• Requirements modeling (conceptual, pictures)
– Decide what entities should be part of the application and how they should be linked.
• Schema design and implementation
– Decide on a set of tables and attributes.
– Define the tables in the database system.
– Populate the database (insert tuples).
• Write application programs using the DBMS
– Much easier now that data management is taken care of.
Database Design
• Why do we need it?
– Agree on the structure of the database before deciding on a particular implementation.
• Consider issues such as:
– What entities to model
– How entities are related
– What constraints exist in the domain
– How to achieve good designs
Database Design Formalisms
• Object Definition Language (ODL):
– Closer in spirit to object-oriented models.
• Entity/Relationship model (E/R):
– More relational in nature.
• Both can be translated (semi-automatically) to relational schemas (with varying amounts of pain).
• ODL to OO schema: a direct transformation (for C++- or Smalltalk-based systems).
Outline
• ODL (rather briefly)
• E/R diagrams
• Some high-level design principles
• Modeling constraints
• Introduction to the relational model
• From E/R & ODL to relations
Object Definition Language
• Is part of ODMG, which also gave us OQL.
• Resembles C++ (and Smalltalk).
• Basic design paradigm in ODL:
– Model objects and their properties.
• For abstraction purposes:
– Group objects into classes.
• What qualifies as a good class?
– Objects should have common properties.
ODL Class Declarations
interface <name> {
    attribute <type> <name>;
    relationship <range type> <name>;
    <method declarations>
}
Method example:
float gpa(in Student) raises(noGrades);
An arbitrary function, written in the host language, can compute the value of gpa from the student object given as input; raises(noGrades) declares the exception noGrades that the method may raise.
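A minimal Python sketch of the same contract, for readers who want to see a method body. The names Student, grades, and NoGrades are illustrative assumptions: ODL only declares the signature, and the implementation lives in the host language.

```python
class NoGrades(Exception):
    """Raised when there is nothing to average (mirrors raises(noGrades))."""


class Student:
    def __init__(self, name: str, grades: list[float]) -> None:
        self.name = name
        self.grades = grades  # hypothetical attribute: grade points per course

    def gpa(self) -> float:
        # Any host-language code may compute the value from the object's state.
        if not self.grades:
            raise NoGrades(self.name)
        return sum(self.grades) / len(self.grades)


print(Student("Ann", [4.0, 3.0]).gpa())  # 3.5
try:
    Student("Bob", []).gpa()
except NoGrades as err:
    print("no grades for", err)  # no grades for Bob
```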
ODL Declarations
interface Product {
    attribute string name;
    attribute float price;
    attribute enum Categories {electronics, communications, sports, …} category;
}
interface Company {
    attribute string name;
    attribute float stockprice;
}
interface Person {
    attribute integer ssn;
    attribute string name;
    attribute Struct Address {string street, string city} address;
}
ODL Example Extended
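[Diagram: classes Product (name, price, category), Person (ssn, name, address), and Company (name, stockprice), connected by the relationships buys (Person to Product), worksFor (Person to Company), and madeBy (Product to Company).]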
ODL Declarations, Extended
interface Product {
    attribute string name;
    attribute float price;
    attribute enum Categories {electronics, communications, sports, …} category;
    relationship Company madeBy;
}
interface Person {
    attribute integer ssn;
    attribute string name;
    attribute Struct Address {string street, string city} address;
    relationship Set<Product> buys;
    relationship Set<Company> worksFor;
}
ODL Example, Extended Again
[Diagram: the same three classes, now also with the relationships makes (Company to Product) and employs (Company to Person).]
ODL Declarations, Extended Again
interface Company {
    attribute string name;
    attribute float stockprice;
    relationship Set<Product> makes inverse Product::madeBy;
    relationship Set<Person> employs inverse Person::worksFor;
}
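The inverse clauses say that makes and madeBy (and likewise employs and worksFor) describe the same connection from two sides, so changing one side must change the other. Below is a hypothetical Python sketch of what a system has to do to keep one such pair consistent; the names Company, Product, makes, made_by, and set_made_by are assumptions for illustration, not an ODMG API.

```python
from __future__ import annotations


class Company:
    def __init__(self, name: str) -> None:
        self.name = name
        self.makes: set[Product] = set()  # inverse of Product.made_by


class Product:
    def __init__(self, name: str) -> None:
        self.name = name
        self.made_by: Company | None = None

    def set_made_by(self, company: Company) -> None:
        # Update both directions, as "inverse Product::madeBy" requires.
        if self.made_by is not None:
            self.made_by.makes.discard(self)
        self.made_by = company
        company.makes.add(self)


works = Company("GizmoWorks")
acme = Company("Acme")  # hypothetical second company
gizmo = Product("gizmo")
gizmo.set_made_by(works)
gizmo.set_made_by(acme)  # moving the product updates both companies
print(sorted(p.name for p in acme.makes), len(works.makes))  # ['gizmo'] 0
```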
Types in ODL
Basic types:
– Atomic types (e.g., string, integer, …)
– Interface types (e.g., Person, Product, Company)
Constructors:
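– Set: (1, 5, 6), with no duplicates and no order
– Bag: (1, 1, 5, 6, 6), with duplicates allowed but no order
– List: (1, 5, 6, 1, 6), with duplicates allowed and order significant
– Array: Integer[17], a fixed-length sequence
– Struct: {string street, string city, integer zipcode}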
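The constructors have close Python analogues, shown here only as an illustration (the mapping is an assumption, not part of ODL): set for Set, collections.Counter for Bag, list for List, a fixed-size tuple for Array, and a dataclass for Struct. The comparisons show which of them care about order and duplicates.

```python
from collections import Counter
from dataclasses import dataclass

s = {1, 5, 6}                   # Set: no duplicates, no order
bag = Counter([1, 1, 5, 6, 6])  # Bag: duplicates counted, no order
lst = [1, 5, 6, 1, 6]           # List: duplicates allowed, order matters
arr = (0,) * 17                 # Array: fixed length, like Integer[17]


@dataclass
class Address:                  # Struct: named fields with fixed types
    street: str
    city: str
    zipcode: int


print(s == {6, 5, 1, 1})                # True: order and repeats ignored
print(bag == Counter([6, 1, 6, 5, 1]))  # True: order ignored, counts kept
print(bag == Counter([1, 5, 6]))        # False: multiplicities differ
print(lst == [1, 1, 5, 6, 6])           # False: order matters
```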
Allowable Types in ODL
For attributes: start with an atomic or struct type, and optionally apply a collection type.
– OK: string, set of integer, bag of Address.
– Not OK: Product, set of set of integer.
For relationships: start with an interface type, and optionally apply a collection type.
– OK: Product, set of Product, list of Person.
– Not OK: Struct {Product pname, Company cname}, set of bag of Product, integer.
Entity / Relationship Diagrams
Objects → entities
Classes → entity sets
Attributes are like in ODL.
Relationships: like in ODL, except that they are
- not associated with classes (i.e., they are first-class citizens)
- not necessarily binary.
[Diagram: example E/R fragment showing the entity set Product, the attribute address, and the relationship buys.]
Multi-way Relationships
How do we model a purchase relationship among buyers, products, and stores?
[Diagram: a three-way relationship Purchase connecting Product, Person, and Store.]
Roles in Relationships
What if we need an entity set twice in one relationship?
[Diagram: Purchase connecting Product, Store, and Person twice, with the two Person edges labeled by the roles salesperson and buyer.]
Roles in Relationships (continued)
[Diagram: the same Purchase relationship with the roles salesperson and buyer, annotated with multiplicities.]
Note the multiplicity of the relationships: the arrow notation cannot express all possibilities.
What’s Wrong?
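[Diagram: Purchase connecting Product, Person, Store, and a separate entity set Dates whose only attribute is date.]
Moral: don’t complicate life more than it already is. Here the date can simply be an attribute of Purchase.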
Do we really need 3-way relationships?
[Diagram: Purchase modeled as an entity set, connected to Person, Store, and Product by the binary relationships BuyerOf, StoreOf, and ProductOf.]
Moral: Find a nice way to say things.
Modeling Subclasses
The world is not flat!
Some objects in a class may have properties not shared by other members:
Products
Software products
Educational products
So we define subclasses (in both ODL and E/R).
Subclasses in ODL
interface SoftwareProduct : Product {
    attribute Set<string> platform;
    attribute Set<integer> requiredMemory;
}
interface EducationalProduct : Product {
    attribute Struct Interval {integer begin, integer end} ageGroup;
    attribute string topic;
}
The two classes also inherit all the properties of Product.
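A rough Python analogue of these declarations, assuming dataclasses stand in for ODL interfaces: each subclass inherits every field of Product and adds its own. Field names are converted to Python style, and the sample product "MathGame" is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    name: str
    price: float
    category: str


@dataclass
class SoftwareProduct(Product):
    platform: set[str] = field(default_factory=set)
    required_memory: set[int] = field(default_factory=set)


@dataclass
class EducationalProduct(Product):
    age_group: tuple[int, int] = (0, 0)  # Interval {begin, end}
    topic: str = ""


edu = EducationalProduct("MathGame", 29.99, "games", (6, 10), "arithmetic")
print(edu.name, edu.age_group)   # inherited field and new field
print(isinstance(edu, Product))  # True: an EducationalProduct is a Product
```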
Subclasses in E/R Diagrams
[Diagram: Product (name, category, price) linked by two isa relationships to the subclasses Educational Product (Age Group) and Software Product (platforms).]
Multiple Inheritance
[Diagram: Product has the subclasses Educational Product (ageGroup, topic) and Software Product (platforms, required memory); Educ-software Product is a subclass of both and adds Educational-method.]
How do we resolve conflicts?
[Diagram: the same hierarchy, where the two parent classes each define an attribute Rating (labeled ATA and ASA), so it is unclear which Rating Educ-software Product inherits.]
In ODL: Every object belongs to a single class
In E/R: An entity may be spread out in multiple sets.
Modeling Constraints
Extracting constraints is what modeling is all about. But how do we express them?
Examples:
Keys: social security number uniquely identifies a person.
Single-value constraints: a person can have only one father.
Referential integrity constraints: if you work for a company, it must exist in the database.
Domain constraints: people’s ages are between 0 and 150.
Why are these constraints useful in the implementation?
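One answer to the question above: constraints declared in the schema can be checked by the DBMS on every update, rather than by each application program. The sketch below assumes Python's built-in sqlite3 module and SQL DDL; the Person table layout is illustrative, modeled on the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        ssn  TEXT PRIMARY KEY,                      -- key constraint
        name TEXT NOT NULL,
        age  INTEGER CHECK (age BETWEEN 0 AND 150)  -- domain constraint
    )
""")
conn.execute("INSERT INTO Person VALUES ('123-321-99', 'Fred', 30)")

for bad in [("123-321-99", "Fred2", 40),  # duplicate key value
            ("909-438-44", "Joe", 200)]:  # age outside the domain
    try:
        conn.execute("INSERT INTO Person VALUES (?, ?, ?)", bad)
    except sqlite3.IntegrityError as err:
        print("rejected:", err)

count = conn.execute("SELECT COUNT(*) FROM Person").fetchone()[0]
print(count)  # 1: only the valid tuple was stored
```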
Keys
A key is a set of attributes that uniquely identifies an object or entity. Candidates for Person:
– social security number
– name
– name + address
– name + address + age
Perfect keys are often hard to find, so organizations usually invent something.
An object may have multiple keys:
employee number, social-security number
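A small Python sketch of what "uniquely identifies" means on actual data: a candidate key fails as soon as two tuples agree on all of its attributes. The function name violates_key and the sample rows are made up for illustration. Note that one instance can only refute a key; whether the attributes form a key is a claim about every legal instance, which is a design decision.

```python
def violates_key(rows: list[dict], attrs: tuple[str, ...]) -> bool:
    """Return True if two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return True  # two different tuples share this key value
        seen.add(value)
    return False


people = [
    {"ssn": "123-321-99", "name": "Fred", "address": "Seattle", "age": 30},
    {"ssn": "909-438-44", "name": "Fred", "address": "Boston", "age": 30},
]
print(violates_key(people, ("name",)))            # True
print(violates_key(people, ("name", "address")))  # False (in this instance)
print(violates_key(people, ("ssn",)))             # False
```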
Keys in ODL
interface Person (key ssn) { properties… }
Defining multiple keys:
(key ssn, employeeID, (name, address, age))
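If such a Person class is translated to SQL (an assumption here; translation to relations comes later in the lecture), one key typically becomes the PRIMARY KEY and each of the others becomes a UNIQUE constraint. A minimal sketch with Python's sqlite3 module; the column types and sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Person (
        ssn        TEXT PRIMARY KEY,  -- first key
        employeeID INTEGER UNIQUE,    -- second key
        name       TEXT,
        address    TEXT,
        age        INTEGER,
        UNIQUE (name, address, age)   -- third key, made of three attributes
    )
""")
conn.execute("INSERT INTO Person VALUES ('123-321-99', 1, 'Fred', 'Main St', 30)")
try:
    # New ssn but the same employeeID: violates the second key.
    conn.execute("INSERT INTO Person VALUES ('909-438-44', 1, 'Joe', 'Elm St', 40)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```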
Keys in E/R Diagrams
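[Diagram: entity sets Person (attributes ssn, name, address) and Product (attributes name, category, price), with the key attributes marked.]
There is no formal way to specify multiple keys in E/R diagrams.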
Single Value Constraints
An entity (or object) may have at most one value for a given attribute or relationship. Examples:
– Person: name, social security number
– Company: stock price
How do we do this in ODL?
In E/R, every attribute has at most one value. Arrows tell us about the multiplicity of relationships.
If we have a single-valued constraint, we can either:
1. Require that the value exist (see referential integrity shortly), or
2. Allow null values.
Referential Integrity Constraints
A relationship has one value, and the value must exist.
Example: Product madeBy Company: company must exist.
How do we enforce referential integrity constraints? (otherwise, we get dangling pointers)
- forbid deleting a referenced object, or
- delete the objects that reference an object we’re deleting.
In E/R diagrams:
[Diagram: the relationship makes between Product and Company, marked with the referential integrity constraint.]
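The two enforcement policies correspond to the SQL referential actions ON DELETE RESTRICT and ON DELETE CASCADE. A minimal sketch using Python's sqlite3 module; the helper make_db and the table layout are illustrative assumptions. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3


def make_db(policy: str) -> sqlite3.Connection:
    """Build a Company/Product schema; policy is 'RESTRICT' or 'CASCADE'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    # policy is one of two fixed keywords, so interpolating it is safe here.
    conn.executescript(f"""
        CREATE TABLE Company (name TEXT PRIMARY KEY);
        CREATE TABLE Product (
            name   TEXT PRIMARY KEY,
            madeBy TEXT NOT NULL REFERENCES Company(name) ON DELETE {policy}
        );
        INSERT INTO Company VALUES ('GizmoWorks');
        INSERT INTO Product VALUES ('gizmo', 'GizmoWorks');
    """)
    return conn


# Policy 1: forbid deleting a referenced object.
strict = make_db("RESTRICT")
try:
    strict.execute("DELETE FROM Company WHERE name = 'GizmoWorks'")
except sqlite3.IntegrityError as err:
    print("delete refused:", err)

# Policy 2: delete the objects that reference the deleted object.
cascading = make_db("CASCADE")
cascading.execute("DELETE FROM Company WHERE name = 'GizmoWorks'")
print(cascading.execute("SELECT COUNT(*) FROM Product").fetchone()[0])  # 0
```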
Weak Entity Sets
Entity sets are weak when their key attributes come from other classes to which they are related.
This happens with:
- part-of hierarchies
- splitting n-ary relationships into binary ones.
[Diagram: weak entity set Team (attributes number, sport) connected by the relationship affiliation to University (attribute name).]
The Relational Data Model
[Diagram: three levels of description.]
– Database model (ODL, E/R): ODL definitions, diagrams (E/R)
– Relational schema: tables; column names are attributes, rows are tuples
– Physical storage: complex file organization and index structures
Terminology
Attribute names: Name, Price, Category, Manufacturer. Each row is a tuple:
Name          Price     Category      Manufacturer
gizmo         $19.99    gadgets       GizmoWorks
Power gizmo   $29.99    gadgets       GizmoWorks
SingleTouch   $149.99   photography   Canon
MultiTouch    $203.99   household     Hitachi
What can’t you say in the relational model?
More Terminology
Every attribute has an atomic type.
Relation Schema: relation name + attribute names + attribute types
Relation instance: a set of tuples. Only one copy of any tuple!
Database Schema: a set of relation schemas.
Database instance: a relation instance for every relation in the schema.
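These definitions can be mirrored in Python, purely as an illustration: a NamedTuple plays the role of a relation schema, a Python set of such tuples is a relation instance (so a duplicate collapses into one copy), and a dict from relation names to instances is a database instance. All names are assumptions. (SQL tables, unlike relations in this model, may hold duplicate rows unless a key is declared.)

```python
from typing import NamedTuple


class Product(NamedTuple):  # relation schema: name + attribute names + types
    name: str
    price: float
    category: str
    manufacturer: str


# Relation instance: a set of tuples, so the duplicate is stored only once.
product_instance = {
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),
    Product("Power gizmo", 29.99, "gadgets", "GizmoWorks"),
    Product("gizmo", 19.99, "gadgets", "GizmoWorks"),  # duplicate
}
print(len(product_instance))  # 2

# Database instance: one relation instance for every relation in the schema.
database = {"Product": product_instance, "Company": set()}
```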
More on Tuples
Formally, a mapping from attribute names to (correctly typed) values:
name → gizmo, price → $19.99, category → gadgets, manufacturer → GizmoWorks
Sometimes we write a tuple by listing its values alone (note that this relies on a fixed order of attributes):
(gizmo, $19.99, gadgets, GizmoWorks) or
Product (gizmo, $19.99, gadgets, GizmoWorks).
Updates
The database maintains a current database state.
Updates to the data:
1) add a tuple
2) delete a tuple
3) modify an attribute in a tuple
Updates to the data happen very frequently.
Updates to the schema: relatively rare. Rather painful. Why?
From ODL to Relational Schema
Start simple: a class definition has only single-valued attributes.
interface Product {
    attribute float price;
    attribute string name;
    attribute enum Categories {telephony, gadgets, books} category;
}
The class becomes a relation, and every attribute becomes a relation attribute:
Product
Name    Price    Category
Gizmo   $19.99   gadgets
Adding Non-atomic Attributes
Price is a record: {string currency, float amount}, so each field becomes its own attribute:
Product
Name          Currency   Amount   Category
Gizmo         US$        19.99    gadgets
Power Gizmo   US$        29.99    gadgets
Set Attributes
One option: have a tuple for every value in the set:
Name   SSN          Phone Number
Fred   123-321-99   (201) 555-1234
Fred   123-321-99   (206) 572-4312
Joe    909-438-44   (908) 464-0028
Joe    909-438-44   (212) 555-4000
Disadvantages?
Modeling Collection Types
Questions:
– How can we model bags?
– Lists?
– Fixed-length arrays?
The problem becomes even more significant if a class has several set-valued attributes. Question: how bad is the redundancy for n set-valued attributes, each with up to m values?
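A worked answer, under the assumption that every set-valued attribute is flattened as in Set Attributes: to avoid implying that particular values belong together, every combination of values must appear, so one object with n set attributes of m values each needs up to m^n tuples (each repeating the single-valued attributes), although only n·m distinct values are involved. A Python sketch with n = 3 and m = 2; the person and the attributes emails and hobbies are hypothetical.

```python
import itertools

# Hypothetical object: n = 3 set-valued attributes, m = 2 values each.
person = {
    "name": "Fred",
    "phones": ["(201) 555-1234", "(206) 572-4312"],
    "emails": ["fred@a.example", "fred@b.example"],
    "hobbies": ["chess", "golf"],
}

# One tuple per combination of values, repeating the single-valued name.
rows = [
    (person["name"], phone, email, hobby)
    for phone, email, hobby in itertools.product(
        person["phones"], person["emails"], person["hobbies"]
    )
]
print(len(rows))  # 8, i.e. 2 ** 3, to store 3 * 2 = 6 values
```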
Modeling Relationships
interface Product {
    attribute string name;
    attribute float price;
    relationship Company madeBy;
}
interface Company {
    attribute string name;
    attribute float stockprice;
    attribute string address;
}
How do we incorporate the relationship madeBy into the schema?
Option #1: copy the company’s attributes into the Product relation:
Name    Price    made-by-name   made-by-stock-price   made-by-address
Gizmo   $19.99   gizmoWorks     $0.0001               Montezuma
What’s wrong?
Hint
interface Product {
    attribute string name;
    attribute float price;
    relationship Company madeBy;
}
interface Company {
    attribute string name;
    attribute float stockprice;
    attribute string address;
    relationship Set<Product> makes;
}
Better Solution
Product relation (assume: name is a key for Company):
Name    Price    made-by-name
Gizmo   $19.99   gizmoWorks
Company relation:
Name         Stock Price   Address
gizmoWorks   $0.00001      Montezuma
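The better solution stores only the company's key in Product, yet the wide Option #1 row can still be produced on demand with a join. A sketch in SQL via Python's sqlite3 module, using the slide's data; the column names madeBy and stockPrice are illustrative renamings of the slide's headers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Company (
        name       TEXT PRIMARY KEY,  -- assumed key, as on the slide
        stockPrice REAL,
        address    TEXT
    );
    CREATE TABLE Product (
        name   TEXT,
        price  REAL,
        madeBy TEXT REFERENCES Company(name)  -- only the company's key
    );
""")
conn.execute("INSERT INTO Company VALUES (?, ?, ?)", ("gizmoWorks", 0.00001, "Montezuma"))
conn.execute("INSERT INTO Product VALUES (?, ?, ?)", ("Gizmo", 19.99, "gizmoWorks"))

# The join rebuilds the Option #1 row without storing company data twice.
row = conn.execute("""
    SELECT p.name, p.price, c.name, c.stockPrice, c.address
    FROM Product p JOIN Company c ON p.madeBy = c.name
""").fetchone()
print(row)  # ('Gizmo', 19.99, 'gizmoWorks', 1e-05, 'Montezuma')
```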
Additional Issues
1. What if there is no key?
2. What if the relationship is multi-valued?
3. How do we represent a relationship and its inverse?
From E/R Diagrams to Relational Schema
Easier than ODL:
- relationships are already independent entities
- only atomic types exist in the E/R model.
Entity sets → relations
Relationships → relations
Special care for weak entity sets.
Entity Sets to Relations
[Diagram: entity set Product with attributes name, category, price.]
Product:
Name    Category   Price
gizmo   gadgets    $19.99
Relationships to Relations
[Diagram: relationship makes between Company (name, stock price) and Product (name, category), with the relationship attribute Start Year.]
Relation Makes (watch out for attribute name conflicts):
Product-name   Product-Category   Company-name   Starting-year
gizmo          gadgets            gizmoWorks     1963
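A sketch of the Makes relation as SQL DDL, run via Python's sqlite3 module. It assumes that (name, category) is the key of Product, as the Makes columns suggest, and that makes is many-to-many, so the key of Makes combines the keys of both entity sets; if each product had exactly one maker, the product's key alone would suffice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (
        name     TEXT,
        category TEXT,
        PRIMARY KEY (name, category)
    );
    CREATE TABLE Company (name TEXT PRIMARY KEY, stockPrice REAL);
    -- The relationship becomes its own relation: the keys of both entity
    -- sets (renamed to avoid the clash on "name") plus its own attribute.
    CREATE TABLE Makes (
        productName     TEXT,
        productCategory TEXT,
        companyName     TEXT REFERENCES Company(name),
        startYear       INTEGER,
        PRIMARY KEY (productName, productCategory, companyName),
        FOREIGN KEY (productName, productCategory)
            REFERENCES Product(name, category)
    );
    INSERT INTO Product VALUES ('gizmo', 'gadgets');
    INSERT INTO Company VALUES ('gizmoWorks', NULL);
    INSERT INTO Makes VALUES ('gizmo', 'gadgets', 'gizmoWorks', 1963);
""")
makes = conn.execute("SELECT * FROM Makes").fetchall()
print(makes)  # [('gizmo', 'gadgets', 'gizmoWorks', 1963)]
```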
Handling Weak Entity Sets
[Diagram: weak entity set Team (number, sport) connected by affiliation to University (name).]
Relation Team:
Sport           Number   University-name
mud wrestling   15       Montezuma State U.
- We need all the attributes that contribute to the key of Team.
- We don’t need a separate relation for Affiliation.
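A sketch of the Team relation in SQL via Python's sqlite3 module. The key of Team includes the borrowed University-name, so the same sport and number at a different university is a different team; "Other State U." is a made-up second university for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE University (name TEXT PRIMARY KEY);
    -- Team is weak: its key borrows University's key via affiliation,
    -- so affiliation is a column here, not a separate relation.
    CREATE TABLE Team (
        sport          TEXT,
        number         INTEGER,
        universityName TEXT NOT NULL REFERENCES University(name),
        PRIMARY KEY (sport, number, universityName)
    );
    INSERT INTO University VALUES ('Montezuma State U.'), ('Other State U.');
    INSERT INTO Team VALUES ('mud wrestling', 15, 'Montezuma State U.');
    INSERT INTO Team VALUES ('mud wrestling', 15, 'Other State U.');
""")
teams = conn.execute("SELECT COUNT(*) FROM Team").fetchone()[0]
print(teams)  # 2: same sport and number, different universities
```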
Modeling Subclass Structure
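[Diagram: the Product hierarchy from the multiple-inheritance example: Educational Product (ageGroup, topic) and Software Product (platforms, required memory) under Product, and Educ-software Product (Educational-method) under both.]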
Option #1: the ODL Approach
4 tables: each object can only belong to a single class
Product(name, price, category, manufacturer)
EducationalProduct(name, price, category, manufacturer, ageGroup, topic)
SoftwareProduct(name, price, category, manufacturer, platforms, requiredMemory)
EducationalSoftwareProduct(name, price, category, manufacturer, ageGroup, topic, platforms, requiredMemory)
Option #2: the E/R Approach
Product(name, price, category, manufacturer)
EducationalProduct(name, ageGroup, topic)
SoftwareProduct(name, platforms, requiredMemory)
No need for a relation EducationalSoftwareProduct, unless it has a specialized attribute:
EducationalSoftwareProduct(name, educational-method)
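Under the E/R approach one entity may be spread over several relations, and a join on the shared key reassembles it. A sketch via Python's sqlite3 module; the product "MathGame" and its values are hypothetical, and platforms is stored as a single string for brevity (a real set-valued attribute would need the treatment from Set Attributes).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (name TEXT PRIMARY KEY, price REAL,
                          category TEXT, manufacturer TEXT);
    CREATE TABLE EducationalProduct (name TEXT PRIMARY KEY REFERENCES Product(name),
                                     ageGroup TEXT, topic TEXT);
    CREATE TABLE SoftwareProduct (name TEXT PRIMARY KEY REFERENCES Product(name),
                                  platforms TEXT, requiredMemory INTEGER);
    -- An educational software product: one row in each table, linked by name.
    INSERT INTO Product VALUES ('MathGame', 29.99, 'software', 'GizmoWorks');
    INSERT INTO EducationalProduct VALUES ('MathGame', '6-10', 'arithmetic');
    INSERT INTO SoftwareProduct VALUES ('MathGame', 'Windows', 64);
""")

# Reassemble the entity by joining on the shared key.
whole = conn.execute("""
    SELECT p.name, e.topic, s.platforms
    FROM Product p
    JOIN EducationalProduct e ON e.name = p.name
    JOIN SoftwareProduct s ON s.name = p.name
""").fetchall()
print(whole)  # [('MathGame', 'arithmetic', 'Windows')]
```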
Option #3: The Null Value Approach
Have one table:
Product(name, price, category, manufacturer, age-group, topic, platforms, required-memory, educational-method)
Some values in the table will be NULL, meaning that the attribute does not make sense for the specific product.
How many more meanings will NULL have?
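A sketch of the single-table approach via Python's sqlite3 module, with made-up rows. It illustrates two practical points: subclass membership has to be inferred from which columns are non-NULL (which only works if, say, every software product has a platforms value; a software product with unknown platforms would look like a non-software one), and in SQL a comparison with = NULL never matches, so IS NULL must be used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (name TEXT PRIMARY KEY, price REAL, category TEXT,
                          manufacturer TEXT, ageGroup TEXT, topic TEXT,
                          platforms TEXT, requiredMemory INTEGER,
                          educationalMethod TEXT)
""")
# Columns not listed in the INSERT default to NULL.
conn.executemany(
    "INSERT INTO Product (name, topic, platforms) VALUES (?, ?, ?)",
    [("gizmo", None, None),                  # plain product
     ("MathGame", "arithmetic", "Windows"),  # educational software
     ("TypingTool", None, "Windows")],       # software, not educational
)

software = conn.execute(
    "SELECT name FROM Product WHERE platforms IS NOT NULL ORDER BY name"
).fetchall()
print(software)  # [('MathGame',), ('TypingTool',)]

# "= NULL" is never true, not even for NULL itself.
eq_null = conn.execute("SELECT COUNT(*) FROM Product WHERE topic = NULL").fetchone()[0]
print(eq_null)  # 0
```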