
Upload: russell-fletcher

Post on 26-Dec-2015


Obj-RelDBv2.ppt - RJL 020117 - # 1

Database Design & Implementation

One thing is paramount in military, commercial or industrial applications: Never lose the content of an operational database. This requires persistence.

Hybrid object-relational databases (ORDB’s) are one way to solve the problem of writing object-oriented applications with persistent data content.

– The COOL framework includes GEN, which generates C/C++ code for a hybrid ORDB, and LCP, which supports method delegation between prototype object instances by interpreting a database of function names and/or function pointers.

Understanding ORDBs requires more details about database architecture (more slides).

Obj-RelDBv2.ppt - RJL 020117 - # 2

Relational Databases

An RDB is a set of ‘tuples’; each tuple represents a simple object with scalar attributes. Tuples are stored externally as records in a file and viewed conceptually as rows of a table, or geometrically as points in a multi-typed coordinate space.

Complex structured data types (and object instances) are decomposed or ‘normalized’ into simple parts in First Normal Form (1NF): no structured attributes or repeating groups are allowed.

For maintenance and reliability reasons, the design is further normalized to Third Normal Form (3NF): there are no redundant or indirectly computable field values, and every property is stored in only one place. [Ref: Sanders, Ch. 3 and Appendix A.]

Other database types include object-oriented OODB’s and object-relational ORDB’s. (next slide)

Obj-RelDBv2.ppt - RJL 020117 - # 3

Composite pkeys in RDB’s

Every tuple must have a field (or set of fields), called its primary key (pkey), that uniquely identifies it.

A composite pkey for a child or component tuple is often built by concatenating multiple key fields from a chain of ancestors. This complicates pkey-to-fkey matching.

Example: Dept--->Course--->Section (ERD on next slide)

– CS has Dept# = 91 and OOAD has course# = 91.522.

– Almost every course has a section# 201, so 201 is a unique identifier only within the child set of sections of a particular course, just as 522 is a unique course# only within a particular Department.

– (In my syllabus I renamed this 01f522; Dept 91 is assumed. The prefix 01f adds a new ‘term = Fall 2001’ component to this identifier. I teach only one section of CS Dept courses over multiple terms.)
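The composite key in this example can be sketched in C++ as a small struct that is compared component by component. The names SectionKey, toString, SectionTable and sampleSections are illustrative assumptions, and the second row (course 523) is made up to show that the same section number can recur under a different course.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical composite primary key for a Section tuple: it concatenates the
// key fields of its ancestors (Dept, Course) with its own section number.
struct SectionKey {
    int dept;     // e.g. 91
    int course;   // e.g. 522, unique only within a Dept
    int section;  // e.g. 201, unique only within a Course
};

// Lexicographic ordering over all three components, so the key can index a map.
inline bool operator<(const SectionKey& a, const SectionKey& b) {
    return std::tie(a.dept, a.course, a.section) <
           std::tie(b.dept, b.course, b.section);
}

// Render the key in the dotted form used on the slide, e.g. "91.522.201".
inline std::string toString(const SectionKey& k) {
    return std::to_string(k.dept) + "." + std::to_string(k.course) + "." +
           std::to_string(k.section);
}

// A tiny "table" of section titles indexed by the composite key.
using SectionTable = std::map<SectionKey, std::string>;

inline SectionTable sampleSections() {
    SectionTable t;
    t[{91, 522, 201}] = "OOAD";
    t[{91, 523, 201}] = "hypothetical second course";  // made-up row
    assert(t.size() == 2);  // same section number, still two distinct keys
    return t;
}
```

Every comparison has to touch all three fields, which is the matching overhead that a single surrogate key avoids.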

Obj-RelDBv2.ppt - RJL 020117 - # 4

Composite Pkeys (Example)

Example: CS Dept View of SIS Database ERD:

The unique pkey which selects my section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.522.201.

Department pkey: 91

Course pkey: 91+522

Section pkey: 91+522+201

(This is an ‘instance diagram’, not an ERD. It shows field values in a single table row, whereas an ERD shows only entity types.)

Obj-RelDBv2.ppt - RJL 020117 - # 5

Surrogate Keys in RDB’s

The unique pkey which selects this section of OOAD in the Student Information System (SIS) Database is a composite of Dept, Course and Section number: 91.522.201.

For IBM’s RDB, E. F. Codd advocated a hidden ‘surrogate’ pkey to replace the user-defined composite keys. This improves code quality and performance (by expediting the fundamental RDB operation ‘join’: match pkeys to fkeys).

Example: Entity with old and new key name and value:

Entity     alternate (old pkey)      surrogate (name = value)
Dept       deptNo = 91               DEid = DE000001
Course     courseNo = 91+522         COid = CO000220
Section    sectNo = 91+522+201       SEid = SE002601

Obj-RelDBv2.ppt - RJL 020117 - # 6

RDB with Surrogate pkeys: A GEN Example

CS Dept View of SIS Database ERD:

Entity     alternate (old pkey)      surrogate (name = value)
Dept       deptNo = 91               DEid = DE000001
Course     courseNo = 91+522         COid = CO000220
Section    sectNo = 91+522+201       SEid = SE002601

Department (DE): pkey DE000001, deptNo 91

Course (CO): pkey CO000220, fkey DE000001, courseNo 522

Section (SE): pkey SE002601, fkey CO000220, sectNo 201

A Persistence Requirement (WHY?): Each table has a mnemonic abbreviation (DE,CO,SE) encoded into the pkey value of its objects.

(Note that the fkey only references the immediate ancestor or container of an object or tuple.)
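One plausible reading of the requirement above is that a key value should identify its table by itself, so that an fkey can be resolved without knowing which field it came from. The sketch below assumes a layout of two letters plus six decimal digits, inferred only from the sample values on the slide; makeSurrogate, tableOf and serialOf are hypothetical helper names, not GEN functions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Build a surrogate key such as "CO000220" from a two-letter table
// abbreviation and a serial number. The 2+6 layout is an assumption.
inline std::string makeSurrogate(const std::string& table, unsigned serial) {
    assert(table.size() == 2 && serial <= 999999);
    char buf[16];
    std::snprintf(buf, sizeof buf, "%s%06u", table.c_str(), serial);
    return buf;
}

// Recover the table abbreviation from a key or fkey value.
inline std::string tableOf(const std::string& key) {
    return key.substr(0, 2);
}

// Recover the numeric part of the key.
inline unsigned serialOf(const std::string& key) {
    return static_cast<unsigned>(std::stoul(key.substr(2)));
}
```

With this encoding, the fkey CO000220 stored in a Section tuple can be recognized as a reference into the Course table without consulting the schema.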

Obj-RelDBv2.ppt - RJL 020117 - # 7

Surrogate Keys in COOL/GEN

GEN uses surrogate pkeys and matching fkeys, but does not hide them. (OK for CAD/CASE tools with hi-tech users.)

Pkeys can never be re-used for new objects, as long as fkeys exist that can reference their former object (in old but still-in-use database versions).
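A minimal sketch of the no-reuse rule, assuming keys are drawn from a simple increasing counter (the slides do not say how GEN allocates them). KeyAllocator is a hypothetical class used only for illustration.

```cpp
#include <cassert>
#include <set>

// Hypothetical allocator: hands out serial numbers in increasing order and
// never returns the same number twice, even after its object is deleted.
class KeyAllocator {
public:
    unsigned allocate() {
        unsigned k = next_++;
        live_.insert(k);
        return k;
    }
    // Deleting an object retires its key; the counter is never rewound, so a
    // stale fkey in an old database version cannot match a new object.
    void release(unsigned k) {
        assert(live_.count(k) == 1);
        live_.erase(k);
    }
    bool isLive(unsigned k) const { return live_.count(k) == 1; }

private:
    unsigned next_ = 1;
    std::set<unsigned> live_;
};
```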

Obj-RelDBv2.ppt - RJL 020117 - # 8

Persistent Object Identifiers

Each C++ or Java object has an object-id (oid), typically represented by its virtual memory address. This oid corresponds at least conceptually to the pkey of an RDB tuple. This type of oid is not user-visible, and it is not persistent: it disappears when the program terminates.

One way to avoid loss of information and achieve persistence is to have the RDBMS take over or duplicate OS memory-mapping functions: moving large segments of virtual memory to/from mass storage in a fail-safe manner.

Another way to achieve persistence is to convert pkey/fkey relationships to/from object references during import/export data flows. (This is done by COOL/GEN.)

Obj-RelDBv2.ppt - RJL 020117 - # 9

Persistent Databases

Persistence means that pkeys and fkeys are preserved during export to mass storage or remote sites and re-import by the same or another DataBase Management System (DBMS).

A relational database (RDB) supports inter-object relationships by foreign key (fkey) fields. These are both user-visible and persistent: they remain in mass storage after the program terminates.

The process of mapping RDB pkey-fkey associations to and from C++ pointers is called ‘pointer swizzling’.

[Figure: Database in Main Memory and Database in Mass Storage, connected by ‘import’ and ‘export’ arrows.]
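Pointer swizzling can be sketched as two passes over the loaded tuples: one that turns fkeys into pointers on import, and one that turns pointers back into fkeys on export. This is an illustrative sketch, not the COOL/GEN code; Node, swizzle and unswizzle are hypothetical names, and the value 0 is assumed to encode a null fkey.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory form of a tuple: the persistent fkey is kept, and a
// raw parent pointer is filled in on import.
struct Node {
    int pkey;       // persistent surrogate key (never 0)
    int parentKey;  // persistent fkey; 0 stands for 'null' in this sketch
    Node* parent;   // transient pointer, valid only while loaded
};

// Import: replace key lookups by pointers ("swizzle").
// Returns false if some non-null fkey matches no pkey.
inline bool swizzle(std::vector<Node>& nodes) {
    std::unordered_map<int, Node*> byKey;
    for (Node& n : nodes) {
        assert(n.pkey != 0);  // 0 is reserved for the null fkey
        byKey[n.pkey] = &n;
    }
    for (Node& n : nodes) {
        n.parent = nullptr;
        if (n.parentKey == 0) continue;
        auto it = byKey.find(n.parentKey);
        if (it == byKey.end()) return false;
        n.parent = it->second;
    }
    return true;
}

// Export: restore fkeys from pointers and drop the pointers ("unswizzle").
inline void unswizzle(std::vector<Node>& nodes) {
    for (Node& n : nodes) {
        n.parentKey = n.parent ? n.parent->pkey : 0;
        n.parent = nullptr;
    }
}
```

The pointers are only valid while the vector is not resized, which is one reason the persistent form stores keys rather than addresses.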

Obj-RelDBv2.ppt - RJL 020117 - # 10

Referential Integrity

The principle of ‘Referential Integrity’:

– To maintain valid database content, all fkey values must match the unique primary key or object identifier of another tuple, or else have the reserved ‘null’ (unknown or undefined) value.
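The rule can be checked mechanically. The sketch below assumes integer keys, a single shared key space, and 0 as the encoding of ‘null’; Tuple, kNullKey and danglingTuples are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <vector>

const int kNullKey = 0;  // assumed encoding of the reserved 'null' fkey value

struct Tuple {
    int pkey;
    std::vector<int> fkeys;  // one entry per foreign-key field
};

// Returns the pkeys of tuples that hold at least one dangling fkey,
// i.e. a non-null fkey that matches no pkey in the database.
inline std::vector<int> danglingTuples(const std::vector<Tuple>& db) {
    std::set<int> pkeys;
    for (const Tuple& t : db) {
        bool fresh = pkeys.insert(t.pkey).second;
        assert(fresh && "primary keys must be unique");
        (void)fresh;
    }
    std::vector<int> bad;
    for (const Tuple& t : db) {
        for (int fk : t.fkeys) {
            if (fk != kNullKey && pkeys.count(fk) == 0) {
                bad.push_back(t.pkey);
                break;  // report each offending tuple once
            }
        }
    }
    return bad;
}
```

An empty result means the database satisfies referential integrity under these assumptions.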

N-ary relations (N-way associations) can be implemented by a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes).

Most relations are binary (N = 2). Note that fkeys may refer to the same or different types.

Example: see next slide

Obj-RelDBv2.ppt - RJL 020117 - # 11

N-ary Relation (ERD Styles)

N-ary relations are many-to-many associations among N object instances (of the same or different types).

N-way associations can be implemented by introducing a new associative entity, whose tuples contain exactly N fkeys (plus optional non-key attributes). (Most relations are binary: N = 2.)

The diamond indicates a ternary relation among types AA, BB and CC. [It is superfluous if N = 2, if the relation is one-to-many, or if an associative entity replaces it.]

[Figure: Example for N = 3. Entities AA, BB and CC are joined by a diamond that carries optional attributes. In the alternative style, a new entity AABBCC (3 fkeys inside) gives these attributes a home and replaces the diamond.]
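An associative entity for an N-ary relation can be sketched as a row type with exactly N fkeys and one optional attribute. The template Association, the helper rowsWith and the field name note are assumptions made for illustration; only the entity name AABBCC comes from the slide.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One row of a hypothetical associative entity for an N-ary relation:
// exactly N foreign keys plus an optional non-key attribute.
template <std::size_t N>
struct Association {
    std::array<int, N> fkeys;
    std::string note;  // attribute that belongs to the relation itself
};

// The ternary case: fkeys[0] -> AA, fkeys[1] -> BB, fkeys[2] -> CC.
using AABBCC = Association<3>;

// Select the rows in which the participant at a given position has a given key.
template <std::size_t N>
std::vector<Association<N>> rowsWith(const std::vector<Association<N>>& table,
                                     std::size_t position, int key) {
    assert(position < N);
    std::vector<Association<N>> out;
    for (const auto& row : table)
        if (row.fkeys[position] == key) out.push_back(row);
    return out;
}
```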

Obj-RelDBv2.ppt - RJL 020117 - # 12

Extended ER Diagrams

When an RDB implements an Extended ERD (EERD), a tuple’s fkeys or inter-object cross-references can identify either a super-class object or an associated parent or container object (instance of a class).

– Both types of fkeys share the same integer key value range, although they have distinct semantic meaning.

To improve readability, EERD’s should use different styles for inheritance than for instance-level associations.

[Figure: entities AA, BB and CC, with multiplicity 0..* at CC.] In this example, CC both inherits from AA and is a component of the composite entity BB. It contains two fkeys, (say) AAid and BBid.

Obj-RelDBv2.ppt - RJL 020117 - # 13

Multiple Inheritance on EERD’s

Multiple inheritance requires an fkey to each superclass object whose properties (attributes or methods) are inherited.

In a prototype implementation of multiple inheritance, superclass object[s] actually exist apart from their corresponding subclass object[s]. Each sub-object has fkeys to each of its direct ancestor objects.

For a C or C++ implementation, only one of possibly divergent inheritance hierarchies can be mapped into pre-compiled method inheritance. Avoid divergence if possible!

For an ORDB, fkeys also support dynamic mapping of method inheritance. The COOL/LCP interpreter implements such a dynamic map (from a concrete object to its generic Active Instance, from object class to generic Active Class).

Obj-RelDBv2.ppt - RJL 020117 - # 14

ORDB via Prototype Delegation

An Extended ERD (EERD) can be implemented as a relational RDB, an object-oriented OODB, or an object-relational ORDB. An OODB is supported by its own class-based data representations.

An ORDB can be class-based or prototype-based with delegation. (GEN is prototype-based.)

Prototype delegation does not rely on Class membership for method inheritance - it creates object-level relationships to support method delegation: ANY client object can ‘delegate’ any of its behavior to another server object via the oid equivalent of an fkey.

To make disciplined use of delegation requires some policy other than anarchy.
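Object-level delegation can be sketched as a method lookup that starts at the client object and follows a chain of delegate references. This is not the COOL/LCP interpreter; ProtoObject, send and kMaxHops are hypothetical names, and the hop limit is one simple example of a policy that keeps delegation from running in circles.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical prototype object: it owns a table of named methods and may
// delegate anything it lacks to another object (the oid equivalent of an fkey).
struct ProtoObject {
    std::string name;
    const ProtoObject* delegate;  // nullptr when there is no server object
    std::map<std::string, std::function<std::string(const ProtoObject&)>> methods;
};

const int kMaxHops = 64;  // assumed bound; guards against cyclic delegation

// Look the method up on the receiver first, then walk the delegation chain.
// The original receiver is passed to the method, so delegated behavior still
// acts on behalf of the client. Returns "" if no object implements the method.
inline std::string send(const ProtoObject& receiver, const std::string& method) {
    int hops = 0;
    for (const ProtoObject* o = &receiver; o != nullptr; o = o->delegate) {
        ++hops;
        assert(hops <= kMaxHops);
        auto it = o->methods.find(method);
        if (it != o->methods.end()) return it->second(receiver);
    }
    return "";
}
```

No class membership is consulted at any point: which behavior an object gets depends only on the references it holds.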

Obj-RelDBv2.ppt - RJL 020117 - # 15

GEN Database: Persistence

Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB):

Its external persistent RDB format is a union of records representing tuples of different types.

During import, fkeys are augmented or replaced by parent and first-child and next-sibling object reference pointers, which follow strict GEN naming conventions.

During export, pointers are removed but fkeys are preserved or restored for persistent storage in external RDB tuples.

Obj-RelDBv2.ppt - RJL 020117 - # 16

GEN Database: Schema Constraints

The external RDB schema (or EER Diagram) is first converted to Third Normal Form.

Other attributes that would normally comprise a user-defined (and typically composite) primary key can be removed during schema or EERD conversion to Third Normal Form.

This eliminates redundant attributes that functionally depend on some fkey instead of the pkey attribute.

Obj-RelDBv2.ppt - RJL 020117 - # 17

GEN Database: External Format

Our GEN tool imports an external RDB to a memory-resident object-relational database (ORDB):

Its external RDB format is a union of records representing tuples of different types:

Every tuple record has an integral and immutable ‘surrogate’ primary key attribute (and object id).

Different tuple types have pairwise disjoint pkey ranges.

All foreign keys (fkeys) use this surrogate pkey value to refer to their parent (container or superclass) record type.
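If key ranges are pairwise disjoint, a bare integer key is enough to tell which tuple type it refers to. The sketch below assumes closed integer ranges; KeyRange, pairwiseDisjoint and typeOfKey are hypothetical names, and any concrete range values are made up by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical closed key range [lo, hi] reserved for one tuple type.
struct KeyRange {
    std::string type;
    long lo;
    long hi;
};

// True if no two ranges share a key.
inline bool pairwiseDisjoint(const std::vector<KeyRange>& rs) {
    for (std::size_t i = 0; i < rs.size(); ++i)
        for (std::size_t j = i + 1; j < rs.size(); ++j)
            if (rs[i].lo <= rs[j].hi && rs[j].lo <= rs[i].hi) return false;
    return true;
}

// With disjoint ranges, a key value identifies its tuple type.
// Returns "" when the key lies in no range.
inline std::string typeOfKey(const std::vector<KeyRange>& rs, long key) {
    assert(pairwiseDisjoint(rs));
    for (const KeyRange& r : rs)
        if (r.lo <= key && key <= r.hi) return r.type;
    return "";
}
```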

Obj-RelDBv2.ppt - RJL 020117 - # 18

GEN Database: Internal Format

During import, fkeys are augmented or replaced by direct parent object pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions.

This results in an internal ORDB format which is a set of multiply-threaded linked lists of parent-to-children and super-to-subclass object (tuple instance) reference pointers.

Parent-pointers support direct access to parent table attributes, replacing pair-wise join queries in an RDB.

For each 1-to-many parent-child relationship, chgen provides a child_loop macro while gencpp provides a for-each iterator.
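The threaded-list layout can be sketched as follows. This is not the code that chgen or gencpp emits, and it does not reproduce the child_loop macro; Obj, threadLists and childKeys are hypothetical names, and 0 is assumed to mean "no parent".

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory tuple threaded into parent/child lists.
struct Obj {
    int pkey;
    int parentKey;     // fkey; 0 means no parent
    Obj* parent;       // direct parent pointer
    Obj* firstChild;   // head of this object's child list
    Obj* nextSibling;  // next child of the same parent
};

// Build the threads from the fkeys. Children end up in input order.
inline void threadLists(std::vector<Obj>& objs) {
    std::unordered_map<int, Obj*> byKey;
    std::unordered_map<int, Obj*> lastChild;  // tail of each parent's child list
    for (Obj& o : objs) {
        o.parent = o.firstChild = o.nextSibling = nullptr;
        byKey[o.pkey] = &o;
    }
    for (Obj& o : objs) {
        if (o.parentKey == 0) continue;
        auto it = byKey.find(o.parentKey);
        assert(it != byKey.end());  // referential integrity is a precondition
        o.parent = it->second;
        Obj*& tail = lastChild[o.parentKey];
        if (tail) tail->nextSibling = &o; else o.parent->firstChild = &o;
        tail = &o;
    }
}

// Visit the children of p: the pointer-chasing counterpart of a parent-child join.
inline std::vector<int> childKeys(const Obj& p) {
    std::vector<int> keys;
    for (const Obj* c = p.firstChild; c != nullptr; c = c->nextSibling)
        keys.push_back(c->pkey);
    return keys;
}
```

Reading a parent attribute from a child is then a single pointer dereference (c->parent) instead of a key match against the parent table.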

Obj-RelDBv2.ppt - RJL 020117 - # 19

GEN Database: Import/Export

GEN creates two schema-based import/export utilities:

– pr_load parses tuples and imports an external RDB into a memory-resident object-relational database (ORDB);

– pr_dump exports the modified ORDB back to the persistent external RDB.

During import, fkeys are augmented or replaced by direct parent pointers plus first-child and next-sibling object reference pointers. These are constructed from fkey names following strict GEN naming conventions. Super- and sub-class objects are also connected in the same way.

This results in an internal ORDB format which is a set of multiply-threaded linked lists from each parent through each of its child-sets, and which supports parent-child JOINs.

Obj-RelDBv2.ppt - RJL 020117 - # 20

Importing RDB’s to C++/Java

If the RDB is imported to an object-relational database implemented in C++ or Java, then during import the fkey fields of RDB tuple types should be converted to corresponding C++/Java object reference types.

Caveat/pre-condition: All fkeys implied by links on the RDB’s data model or EERD must conform to inheritance and type constraints of the language (C++ or Java).

Fkeys in an RDB can also support non-exhaustive or over-lapping subclasses (going beyond C++ constraints).

Fkeys and object references can also support dynamic migration (of an object among the subclasses of its class).

– Example: An object may make transitions among object lifecycle (OLC) states (states become subclasses of the object’s class).

Obj-RelDBv2.ppt - RJL 020117 - # 21

Object-Relational Databases - Prototypes and Delegation

The last few slides were inspired by Shlaer-Mellor-User Group email related to Divergent Inheritance (parallel hierarchies). This motivates the use of prototypes and delegation to explain the static information architecture that is supported by COOL’s chGEN/GENcpp code generator, and illustrates concurrent sub-state machine models for dynamic behavior.

– To: [email protected]

– Subject: Re: (SMU) Polymorphic events and other paranormal activity

– Message 10/734 From [email protected] – Sep 04, 01 08:45:33 AM

– responding to Fontana: . . .

Obj-RelDBv2.ppt - RJL 020117 - # 22

Divergent Hierarchies

responding to Fontana:

> I think Jay was driving at divergent hierarchies, not multiple inheritance, eg:
> relationship S1 - supertype Dog, subtypes BigDog and SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog and WhiteDog

[Figure: DOG CLASS with two subtype relationships, each mutex and exhaustive. Relationship S1: (BLACK xor WHITE), subtypes Black Dog and White Dog. Relationship S2: (BIG xor SMALL), subtypes Big Dog and Small Dog.]

Divergent Hierarchies Example:

> relationship S1 - supertype Dog, subtypes BigDog xor SmallDog
> relationship S2 - supertype Dog, subtypes BlackDog xor WhiteDog

Obj-RelDBv2.ppt - RJL 020117 - # 23

OLC’s with Concurrent Sub-states

> Assume each of the 4 subclasses has its own ‘object lifecycle’ (OLC):
> BigDog: Woofing <--> Sleeping
> SmallDog: Yipping <--> Skittering
> BlackDog: Panting <--> Drooling
> WhiteDog: Shedding <--> Scratching
> Now create one instance of Dog - let's say it is a big black dog, with a dogId = 13. It must be in one of the BigDog states (Woofing or Sleeping), AND in one of the BlackDog states (Panting or Drooling).

[Figure: Dog #13 (Big and Black) with two concurrent sub-state machines. Big: Woofing <--> Sleeping. Black: Panting <--> Drooling.]

Obj-RelDBv2.ppt - RJL 020117 - # 24

Merging OLC Behaviors of Concurrent Subclasses:

Each of the 4 subclasses has its own ‘object lifecycle’ (OLC); e.g. every Big&Black Dog must be in one of the BigDog states (Woofing or Sleeping), AND in one of the BlackDog states (Panting or Drooling).

Dog #13 (Big and Black) has the behavior/activity of both BigDogs and BlackDogs:

[Figure: Big Dog OLC (Woofing <--> Sleeping) and Black Dog OLC (Panting <--> Drooling), merged into four combined states: Woof&Pant, Woof&Drool, Sleep&Pant, Sleep&Drool.]
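The merged lifecycle is the product of the two component lifecycles: a merged state is a pair holding one state from each. The sketch below assumes each lifecycle simply toggles between its two states; the type and function names are illustrative, while the state labels are taken from the figure.

```cpp
#include <cassert>
#include <string>
#include <utility>

// States of the two concurrent lifecycles of a big black dog.
enum class BigState { Woofing, Sleeping };
enum class BlackState { Panting, Drooling };

// A merged OLC state is a pair: one state from each lifecycle.
using DogState = std::pair<BigState, BlackState>;

// Each lifecycle toggles between its two states, independently of the other.
inline DogState toggleBig(DogState s) {
    s.first = (s.first == BigState::Woofing) ? BigState::Sleeping : BigState::Woofing;
    return s;
}
inline DogState toggleBlack(DogState s) {
    s.second = (s.second == BlackState::Panting) ? BlackState::Drooling : BlackState::Panting;
    return s;
}

// Name of the combined state, matching the labels in the figure.
inline std::string label(const DogState& s) {
    std::string big = (s.first == BigState::Woofing) ? "Woof" : "Sleep";
    std::string black = (s.second == BlackState::Panting) ? "Pant" : "Drool";
    return big + "&" + black;
}

// Usage sketch: alternating the two toggles visits all 2 x 2 merged states
// and returns to the starting state.
inline void walkAllStates() {
    DogState s{BigState::Woofing, BlackState::Panting};
    assert(label(s) == "Woof&Pant");
    s = toggleBlack(s); assert(label(s) == "Woof&Drool");
    s = toggleBig(s);   assert(label(s) == "Sleep&Drool");
    s = toggleBlack(s); assert(label(s) == "Sleep&Pant");
    s = toggleBig(s);   assert(label(s) == "Woof&Pant");
}
```

With k concurrent lifecycles the number of merged states is the product of the component state counts, which is why the merged diagram grows quickly.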

Obj-RelDBv2.ppt - RJL 020117 - # 25

Divergent Hierarchies - revisited (1)

C++ does not support divergent class hierarchies. One alternative is prototype objects with delegation. RDB’s can support prototypes and delegation:

In our example, each dog object belongs to one subclass for color, and simultaneously to another subclass for size.

That is, a ‘real’ dog object simultaneously belongs to, and inherits from, exactly one of the subclasses in each of the two inheritance trees shown on this slide.

The next slide shows (by its messiness) that multiple inheritance is best avoided.

[Figure: DOG CLASS with two partitions, each mutex and exhaustive. Partition S1: (BLACK xor WHITE), subclasses Black Dog and White Dog. Partition S2: (BIG xor SMALL), subclasses Big Dog and Small Dog.]
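One way to picture the prototype alternative is a dog object that holds one reference into each partition, much as a tuple would hold one fkey per partition. SizeProto, ColorProto, Dog and describe are hypothetical names, and the string fields stand in for whatever behavior each prototype would supply.

```cpp
#include <cassert>
#include <string>

// Hypothetical prototype objects, one per subclass in each partition.
struct SizeProto { std::string name; std::string sound; };
struct ColorProto { std::string name; std::string habit; };

// A 'real' dog holds one reference into each partition instead of inheriting
// from two C++ base classes. Both references are mandatory (the partitions are
// exhaustive), and each can name only one prototype (mutual exclusion).
struct Dog {
    int dogId;
    const SizeProto* size;
    const ColorProto* color;
};

// Behavior is obtained by delegating to the two prototypes.
inline std::string describe(const Dog& d) {
    assert(d.size != nullptr && d.color != nullptr);
    return d.size->name + " " + d.color->name + " dog #" +
           std::to_string(d.dogId) + ": " + d.size->sound + ", " + d.color->habit;
}
```

Two prototypes per partition cover all four combinations, so no separate class is needed for each size and color pair.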

Obj-RelDBv2.ppt - RJL 020117 - # 26

Divergent Hierarchies - revisited (2)

Level 3 includes concrete ‘leaf’ objects or ‘real’ dogs, which simultaneously belong to a distinct pair of subclasses at level 2 of the inheritance tree (compositional inheritance of properties).

So there are really 4 leaf classes at level 3, below level 2.

Each leaf class instance at level 3 has exactly two paths up to level 1; both paths must end up at the same root object (Dog instance).

[Figure: Level 1: DOG CLASS, with Partition S1: (BLACK xor WHITE) and Partition S2: (BIG xor SMALL), each mutex and exhaustive. Level 2: Black Dog, White Dog, Big Dog, Small Dog. Level 3: Big Black Dog, Big White Dog, Small Black Dog, Small White Dog.]

Obj-RelDBv2.ppt - RJL 020117 - # 27

Composition or Implementation Inheritance

With compositional inheritance, dogs will inherit from two ‘component’ classes: Color and Size.

This is ‘impure’ multiple inheritance in C++ (impure because the two ancestor classes have nothing in common with animals, which may not behave well as clients of Color or Size ancestor methods).

Java does not have multiple inheritance of classes, but any class may ‘implement’ the interfaces ‘Color’ and ‘Size’ instead.

Dogs must then be eligible to inherit (C++) or implement (Java) all the methods of the Color and Size classes - an undesirable compromise. Over-riding only hides the mis-match between class Dog and Color or Size classes.

Obj-RelDBv2.ppt - RJL 020117 - # 28

References

Frank & Ulrich: “Delegation: An Important Concept for the Appropriate Design of Object Models”, JOOP, June 2000 (pp. 13-17, 44)

Eliens: Principles of OO Software Dev., 2ed, AWL 2000 (Sect. 5.4: Prototypes - delegation vs. inheritance)

Kilov/Ross: Information Models, PH 1994 (Not about delegation, but covers multiple/concurrent/overlapping subclass membership.)

Lee & Tepfenhart: UML and C++: A Practical Guide to OO Dev, 2ed, PH 2001 (pp. 206-210) (Multiple Inheritance examples Fig. 12-4, 12-5)

Sanders: Data Modeling, Boyd-Fraser/ITP 1995 (Ch. 3 and Appendix)