Systematic Approach for Information Systems Reengineering

Post on 21-Dec-2015


Shi-Ming Huang

References

T. Cheung, J. Fong, and B. Siu, “Database Reengineering and Interoperability”, Plenum, 1995, ISBN 0-306-45288-X

R.S. Arnold, “Software Reengineering”, IEEE Press 1993, ISBN 0-8186-3272-0

Fong and S. Huang, “Information Systems Reengineering”, Springer Verlag, 1997, ISBN 981-3083-15-8

Database Reengineering

[Figure: database re-engineering divides into three tasks]

• Schema translation: direct translation, indirect translation
• Data conversion: physical conversion, logical conversion, bridge program
• Program conversion: rewrite, bridge program, emulation, decompilation, co-existence

Database Reengineering: Schema Translation

Direct translation: A nonrelational schema can be translated directly into a relational schema. However, this primitive method may lose information, because it cannot recover all the semantics of the original nonrelational schema. Certain advanced semantics are lost once a conceptual schema, such as an entity-relationship model, is mapped to a logical schema, such as a hierarchical or network schema. User input is therefore needed to recover the lost semantics.

Database Reengineering: Schema Translation

Indirect translation: An indirect translation first maps the logical hierarchical or network schema into a conceptual entity-relationship schema by reverse engineering. The translated conceptual schema must carry all the semantics of the original logical schema, so users must supply the advanced semantics that the logical schema does not state. The conceptual schema can then be mapped automatically to a logical relational schema. Similarly, to translate a relational schema to an object-oriented schema, the entity-relationship model (a conceptual model for the relational schema) can be mapped to OMT (a conceptual model for the object-oriented model) in a peer-to-peer translation. The OMT model can then be mapped automatically to an object-oriented model (database).

Database Reengineering: Data Conversion

Physical conversion: The physical data of the nonrelational database is converted directly to the physical data of the relational database. This can be done with an interpreter approach or a generator approach. The former translates each data item directly into another. The latter provides a generator that produces a program to carry out the physical data conversion.

Logical conversion: The logical approach unloads the nonrelational database to sequential files in a logical sequence similar to that of the relational model. The sequential files can then be reloaded into a target relational database. This approach is concerned with the logical sequence of the data rather than the physical attributes of each data item.

Bridge program: Each nonrelational file requires a bridge program to convert it to a relational file.
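As a rough illustration of the logical conversion described above, the following Python sketch unloads nested (parent/child) records into sequential files, one per target relation, and reloads them into a relational database. The record layout, the names `account` and `meter`, and the use of CSV text and SQLite are all assumptions made for the example; they are not part of the original material.

```python
import csv
import io
import sqlite3

# Hypothetical nonrelational data: each parent record carries its child records.
accounts = [
    {"acct": "A1", "name": "Lee", "meters": ["M10", "M11"]},
    {"acct": "A2", "name": "Chan", "meters": ["M20"]},
]

def unload(records):
    """Unload nested records into two sequential files, one per target relation."""
    parent_file, child_file = io.StringIO(), io.StringIO()
    parent_writer, child_writer = csv.writer(parent_file), csv.writer(child_file)
    for rec in records:
        parent_writer.writerow([rec["acct"], rec["name"]])
        for meter in rec["meters"]:
            # The parent key is written with every child row so that it can
            # serve as a foreign key after reloading.
            child_writer.writerow([rec["acct"], meter])
    return parent_file.getvalue(), child_file.getvalue()

def reload(parent_text, child_text):
    """Reload the sequential files into a target relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acct TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE meter ("
        "acct TEXT REFERENCES account(acct), meter TEXT, "
        "PRIMARY KEY (acct, meter))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     csv.reader(io.StringIO(parent_text)))
    conn.executemany("INSERT INTO meter VALUES (?, ?)",
                     csv.reader(io.StringIO(child_text)))
    return conn

parent_text, child_text = unload(accounts)
conn = reload(parent_text, child_text)
account_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
meter_count = conn.execute("SELECT COUNT(*) FROM meter").fetchone()[0]
```

The unload step decides the logical sequence (parents first, children carrying the parent key); the reload step never needs to know how the source database stored the data physically.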

Database Reengineering: Database Program Translation

Rewrite: One can translate the nonrelational schema into a relational schema, map the nonrelational database into a relational database, and rewrite all the application programs to run on the relational database.

Bridge program: One can map the nonrelational schema into a relational schema and then add a relational interface software layer on top of the nonrelational DBMS. The relational interface layer translates the relational DML statements of a program into nonrelational DML statements that access the existing nonrelational database. To users, the interface looks like a relational DBMS, but the physical database is still nonrelational.

Database Reengineering: Database Program Translation

Emulation: This technique provides software or firmware in the target system that maps source program commands into functionally equivalent commands in the target system. Each nonrelational DML statement is substituted by relational DML statements that access the converted relational database.

Decompilation: One can translate the schema from nonrelational to relational, convert the data from nonrelational to relational, and then convert the application programs from nonrelational to relational by decompilation. Decompilation is the process of transforming a program written in a low-level language into an equivalent but more abstract version, and then implementing the new programs to meet the requirements of the new environment, database files and DBMS.

Data Model

• A data model is a general structure for data organization.
• It enables us to capture, partially, the meaning of data as related to the complete meaning of the world.
• It is the primary tool for designing a database.

The basic components of such a data model include:

1. a set of rules (i.e. schema description) to describe the structure and meaning of data in a database
2. the atomic operations (i.e. data language) that may be performed on the data in that database.

Data Model

Schema is a term used to represent the name of a class together with the properties of that class.

The schema description includes two parts:

1. the structure specification, which represents objects, attributes, and the relationships between objects;
2. the rule specification for inferences and constraints.

Data Model: Hierarchical Data Model

• There is a set of relationships connecting all record types in one data structure diagram.
• The relationships expressed in the data structure diagram form a tree with all edges pointing towards the leaves.
• Each relationship is 1:n and it is total. That is, if Ri is the parent of Rj in the hierarchy, then for every record occurrence of Rj there is exactly one Ri record connected to it.
• The linkage between record types is automatic fixed set membership.
• The database access path of a hierarchical database follows the hierarchical path from parent to child record. The default path is a hierarchical sequence of top-to-bottom, left-to-right and front-to-back.
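The top-to-bottom, left-to-right part of the default hierarchical sequence corresponds to a preorder walk of the record-type tree. The sketch below shows this on a small hypothetical tree (record types A to E are invented for the example); it works on record types only and does not model the front-to-back ordering of multiple occurrences of the same record type.

```python
from dataclasses import dataclass, field

@dataclass
class RecordType:
    """A record type in a hierarchical schema; children are kept left to right."""
    name: str
    children: list = field(default_factory=list)

def hierarchical_sequence(root):
    """Visit the parent first, then each child subtree from left to right."""
    order = [root.name]
    for child in root.children:
        order.extend(hierarchical_sequence(child))
    return order

# Hypothetical schema: A is the root, B and C are its children, D and E belong to B.
tree = RecordType("A", [
    RecordType("B", [RecordType("D"), RecordType("E")]),
    RecordType("C"),
])
sequence = hierarchical_sequence(tree)
```

For this tree the walk visits A, then B and its children D and E, and only then C.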

Data Model: Hierarchical Data Model

[Figure: example hierarchical schema with the record types Loan Contracts, Loan Drawdown, Loan Interest, Fixed Rate, Index Rate, Loan Repayment and Loan Balance, numbered in hierarchical sequence]

Hierarchical Data Manipulation Language

• Hierarchical data manipulation language (HDML) is a record-at-a-time language for manipulating hierarchical databases.
• The commands of an HDML must be embedded in a general-purpose programming language, called the host language.

Hierarchical Data Manipulation Language

The following is the syntax of the hierarchical DML of IMS (Information Management System, a hierarchical DBMS). There are four parameters in IMS DML:

• Function Code, which defines the database access function;
• Program Control Block, which defines the external subschema access path;
• I-O-Area, which is a target segment address; and
• Segment Search Argument, which defines the target segment selection criteria.

The call takes the following form:

    CALL "CBLTDLI" USING FUNCTION-CODE
                         PCB-MASK
                         I-O-AREA
                         SSA-1 ...
                         SSA-n.

Hierarchical Data Manipulation Language

Retrieval commands:
1. Get Unique (GU)
2. Get Next (GN)
3. Get Next Within Parent (GNP)

Modification commands:
1. Insert (ISRT)
2. Replace (REPL)
3. Delete (DLET)

Example:

    CALL "CBLTDLI" USING GU PCB-MASK I-O-AREA LOAN_CONTRACT# = 277988.
    BALANCE_DATE = 19960722.
    BALANCE_AMOUNT = 1000000.
    CALL "CBLTDLI" USING ISRT PCB-MASK LOAN_BALANCE.

NETWORK (CODASYL) MODEL

[Figure: example network schema rooted at SYSTEM, with the record types Department (dept#, dept-name), Instructor (inst-name, inst-addr), Course (course#, course-location), Prerequisite (prerequisite#, prerequisite-title), Section (section#), Student (student#, s-name) and Grade (grade), connected by sets]

NETWORK (CODASYL) MODEL

Data Item: It is an occurrence of the smallest unit of named data. It is represented in the database by a value. A data item may be used to build other more complicated data constructs. This corresponds to an attribute in the ER data model.

Data Aggregation: It is an occurrence of a named collection of data items within a record.

NETWORK (CODASYL) MODEL

Record: It is an occurrence of a named collection of data items or data aggregates. This collection conforms to the record type definition specified in the database schema.

Set: It is an occurrence of a named collection of records. A set occurrence conforms to the set type definition specified in the database schema. Each set type consists of one owner record type and at least one member record type.

Area: The notion of an area is used to identify a partition of record occurrences. An area is a named collection of records which need not preserve owner-member relationships. An area may contain occurrences of one or more record types, and a record type may have occurrences in more than one area.

Relational Model

A Publishing Company Relational Database Schema:

titleauthor (au_id (FK), title_id (FK); au_ord, royaltyper)
titles (title_id; title, type, pub_id (FK), price, advance, royalty, ytd_sales, notes, pubdate)
authors (au_id; au_lname, au_fname, phone, address, city, state, zip, contract)
publishers (pub_id; pub_name, city, state, country)
pub_info (pub_id (FK); logo, pr_info)
employee (emp_id; fname, minit, lname, job_id (FK), job_lvl, pub_id (FK), hire_date)
jobs (job_id; job_desc, min_lvl, max_lvl)
stores (stor_id; stor_name, stor_address, city, state, zip)
sales (stor_id (FK), ord_num; ord_date, qty, payterms, title_id (FK))
discounts (discounttype, stor_id (FK), lowqty, highqty, discount)
roysched (title_id (FK), lorange, hirange, royalty)

Notation: primary-key attributes are listed first, before the semicolon. (FK) marks a foreign key: relation B holds the foreign key and refers to relation A. The links in the diagram are labelled with the referenced keys pub_id, title_id, au_id, job_id and stor_id.

Relational Model

• The relational model is a logical schema in the form of tables (relations), each corresponding to the representation of an entity type.
• A column (attribute) of the table represents the extension of an attribute in the entity.
• A row (tuple) of the table represents an instance of the entity.
• Such a table is commonly called a record type and has a primary key: an attribute of non-null value that can uniquely identify a tuple.
• The parent-child relationship between relations is represented by a foreign key residing in the child relation and referencing the primary key of the parent relation.
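The primary-key and foreign-key roles described above can be seen in a small SQLite sketch. It borrows the relation names `publishers` and `titles` and the attribute `pub_id` from the publishing schema; the rows, the reduced column lists and the helper `insert_orphan_title` are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked

# Parent relation: pub_id is the non-null primary key that identifies a tuple.
conn.execute(
    "CREATE TABLE publishers (pub_id TEXT PRIMARY KEY NOT NULL, pub_name TEXT)"
)
# Child relation: pub_id is a foreign key referencing the parent's primary key.
conn.execute(
    "CREATE TABLE titles ("
    "title_id TEXT PRIMARY KEY NOT NULL, "
    "title TEXT, "
    "pub_id TEXT REFERENCES publishers(pub_id))"
)

conn.execute("INSERT INTO publishers VALUES ('P1', 'Example Press')")
conn.execute("INSERT INTO titles VALUES ('T1', 'Sample Title', 'P1')")

def insert_orphan_title(connection):
    """Try to insert a child row whose parent does not exist."""
    try:
        connection.execute("INSERT INTO titles VALUES ('T2', 'Orphan', 'P9')")
        return True
    except sqlite3.IntegrityError:
        return False

orphan_accepted = insert_orphan_title(conn)
joined = conn.execute(
    "SELECT t.title, p.pub_name FROM titles t "
    "JOIN publishers p ON t.pub_id = p.pub_id"
).fetchall()
```

The child row that points at a missing publisher is rejected, while the join follows the foreign key from each title to its parent publisher.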

OBJECT-ORIENTED Model

[Figure: class Instructor with attributes OID, Inst-name, Inst-addr and hired-by, and an example object (OID xxx, John Doe, 1 Main St, HK); class Department with attributes OID, Dept#, Dept-name and hire, and an example object (OID yyy, D01, Marketing); a class-defining object (OID zzz) holds the OIDs of Instructor]

OBJECT-ORIENTED Model

• An object is an instance value of a class. A collection of similar objects forms a class. A class has attributes and methods. The attributes of a class describe its properties. The methods of a class describe its operations.
• A class must support encapsulation (i.e. hiding operations from the users) such that
  object = data + program
  data = values of attributes
  program = methods that operate on the state
• Object attributes can be either simple or complex. The value of a complex attribute is a reference to an instance of another class. In other words, an object can be a nested object such that the value of an object is another object.
• Object attributes can be single-valued or multi-valued.
• Objects are uniquely identified by object identifiers (OIDs) that are assigned by the system.
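These properties can be sketched with ordinary Python classes. The class and attribute names follow the Department/Instructor example; the base class `DbObject`, the counter used to hand out OIDs and the method `hire_instructor` are assumptions made for the illustration, not a description of any particular object-oriented DBMS.

```python
import itertools

_oid_counter = itertools.count(1)  # stands in for the system's OID generator

class DbObject:
    """Base class: every object receives a system-assigned identifier."""
    def __init__(self):
        self.oid = next(_oid_counter)

class Instructor(DbObject):
    def __init__(self, inst_name, inst_addr):
        super().__init__()
        self.inst_name = inst_name   # simple attribute
        self.inst_addr = inst_addr   # simple attribute
        self.hired_by = None         # complex attribute: reference to a Department

class Department(DbObject):
    def __init__(self, dept_no, dept_name):
        super().__init__()
        self.dept_no = dept_no
        self.dept_name = dept_name
        self.hire = []               # multi-valued complex attribute: Instructor objects

    def hire_instructor(self, instructor):
        """Method (the 'program' part): keeps both references consistent."""
        self.hire.append(instructor)
        instructor.hired_by = self

dept = Department("D01", "Marketing")
inst = Instructor("John Doe", "1 Main St, HK")
dept.hire_instructor(inst)
```

The value of `inst.hired_by` is the Department object itself rather than a copied key value, which is what distinguishes a complex attribute from a foreign key.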

Direct translation from a Network Model to a Relational Model

Step 1 Derive relations

Map each network record type to a relation in a one-to-one manner.

Step 2 Derive relation keys

Map each record key of the network schema to a primary key in a relational table. However, if the existing network record key is not unique, it must be concatenated with its owner record key in order to be mapped as a primary key. The owner record key is also mapped as a foreign key in the relational table to link the parent and child records. If the set membership in the logical network schema is manual, then the record key of the member record is mapped as a candidate key in the relational table to link the parent and child records. For instance, Figure 3-1 is the network schema for a US President.

Direct translation from a Network Model to a Relational Model

[Figure: network schema rooted at SYSTEM, with record types holding the fields (Plname, Pfname, Party, Collg), (Eyear, Winvotes), (ADM#, Iny, Inm, Ind), (CNGR#, HD, HR, SD, SR) and (SNAME, CAP, Yad), connected by system-owned and ordinary sets]

Mapped relations (* marks a foreign key):

PRESIDENT (Plname, Pfname, Party, Collg, *Sname)
ADMINISTRATION (Adm#, Iny, Inm, Ind, *Plname, *Pfname)
STATE (Sname, Cap, Pln, Pfn, Adm#, Yad)
ELECTION (Eyear, Winvotes, *Plname, *Pfname)
LINK (*Plname, *Pfname, Cngr#)
CONGRESS (Cngr#, Hd, Hr, Sd, Sr)
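Steps 1 and 2 can be sketched as a small function. The representation below (a `RecordType` dataclass with a single optional owner and a `key_is_unique` flag) is an assumption made for the example, as is the record type `TERM`; manual set membership and record types with several owners are not covered. The ownership of ELECTION by PRESIDENT is inferred from the foreign keys in the relation listing.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RecordType:
    name: str
    key: tuple                       # record key fields
    fields: tuple = ()               # non-key fields
    owner: Optional["RecordType"] = None
    key_is_unique: bool = True

def to_relation(record):
    """Step 1: one relation per record type. Step 2: derive its keys."""
    primary_key = list(record.key)
    foreign_key = []
    if record.owner is not None:
        # The owner record key becomes a foreign key in the member relation.
        foreign_key = list(record.owner.key)
        if not record.key_is_unique:
            # A non-unique record key is concatenated with the owner key.
            primary_key = foreign_key + primary_key
    extra_fk = [c for c in foreign_key if c not in primary_key]
    return {
        "relation": record.name,
        "primary_key": primary_key,
        "foreign_key": foreign_key,
        "columns": primary_key + extra_fk + list(record.fields),
    }

president = RecordType("PRESIDENT", ("Plname", "Pfname"), ("Party", "Collg"))
election = RecordType("ELECTION", ("Eyear",), ("Winvotes",), owner=president)
# Hypothetical member whose own key is unique only within one owner.
term = RecordType("TERM", ("Term#",), (), owner=president, key_is_unique=False)

election_rel = to_relation(election)
term_rel = to_relation(term)
```

For ELECTION the sketch yields the same attributes as the listing above (Eyear, Winvotes, plus the foreign key Plname, Pfname), although in a different column order.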

Direct translation from a hierarchical model to a relational model

Step 1 Derive relations

Map each record type to a relation.

Step 2 Derive relation keys

The record key of a hierarchical schema is mapped as the primary key of a relation. However, if the record type of the hierarchical schema is a child record, then the primary key is derived by concatenating its record key with its parent record key. The parent record key is also mapped as a foreign key in the child relation (Quizon, 1990).

Direct translation from a hierarchical model to a relational model

[Figure: hierarchical schema with parent record GAA (acct#, name) and child records GAB (meter#) and GAC (billmo, net_charge), and the relational schema mapped from it]

Mapped relations:

GAA (acct#, name)
GAB (acct#, meter#)
GAC (acct#, billmo, net_charge)
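The key derivation for child records can be written as a short recursive function over the record-type tree. The tuple layout used for a segment and the choice of `meter#` and `billmo` as the child record keys are assumptions for this sketch; the source only lists the resulting relations.

```python
def hierarchy_to_relations(segment, parent_key=()):
    """Map a record-type tree to relations.

    segment is (name, own_key_fields, other_fields, children).
    A child's primary key is its parent's key concatenated with its own key,
    and the parent's key is also its foreign key.
    """
    name, own_key, others, children = segment
    primary_key = tuple(parent_key) + tuple(own_key)
    relations = {
        name: {
            "primary_key": primary_key,
            "foreign_key": tuple(parent_key),
            "columns": primary_key + tuple(others),
        }
    }
    for child in children:
        relations.update(hierarchy_to_relations(child, primary_key))
    return relations

# The GAA/GAB/GAC schema; the record keys of GAB and GAC are assumed.
gaa = ("GAA", ("acct#",), ("name",), [
    ("GAB", ("meter#",), (), []),
    ("GAC", ("billmo",), ("net_charge",), []),
])
relations = hierarchy_to_relations(gaa)
```

Because the parent key is passed down the recursion, a grandchild record would receive the keys of both its parent and its grandparent.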

Indirect translation from a network model to a relational model

[Figure: a hierarchical or network schema is reverse engineered, from logical model to conceptual model, into a conceptual EER model; the EER model is then forward engineered, from conceptual model to logical model, into a relational schema]

Reverse engineering from network schema to conceptual EER model

Step 1 Derive implied relationships: A duplicate key implies a relationship between two record types. The relationship is 1:n if the duplicate key is found in one record type only, or 1:1 if a duplicate key is found in the record types on both sides of the relationship. User input is sought to confirm the existence of such semantics.

Reverse engineering from network schema to conceptual EER model

Step 1 Derive implied relationships

[Figure: nonrelational record types with one duplicate key. CUSTOMER has record key Customer# and duplicate key Loan#; LOAN has record key Loan#. Implied relationship: Customer to Loan, N:1]

[Figure: nonrelational record types with two duplicate keys. CUSTOMER has record key Customer# and duplicate key Loan#; LOAN has record key Loan# and duplicate key Customer#. Implied relationship: Customer to Loan, 1:1]
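The duplicate-key rule of Step 1 can be expressed as a small check over two record types. The dictionary layout and the function name `implied_relationship` are assumptions for this sketch; in practice the result is only a candidate that the user still has to confirm.

```python
def implied_relationship(a, b):
    """Suggest a relationship between record types a and b from duplicate keys.

    Each record type is a dict with 'name', 'record_key' and 'duplicate_keys'.
    Returns (name_a, name_b, cardinality) or None when no key is duplicated.
    """
    a_holds_b_key = b["record_key"] in a["duplicate_keys"]
    b_holds_a_key = a["record_key"] in b["duplicate_keys"]
    if a_holds_b_key and b_holds_a_key:
        return (a["name"], b["name"], "1:1")
    if a_holds_b_key:
        return (a["name"], b["name"], "N:1")
    if b_holds_a_key:
        return (a["name"], b["name"], "1:N")
    return None

customer = {"name": "CUSTOMER", "record_key": "Customer#", "duplicate_keys": {"Loan#"}}
loan_plain = {"name": "LOAN", "record_key": "Loan#", "duplicate_keys": set()}
loan_back = {"name": "LOAN", "record_key": "Loan#", "duplicate_keys": {"Customer#"}}

one_duplicate = implied_relationship(customer, loan_plain)
two_duplicates = implied_relationship(customer, loan_back)
```

The two calls reproduce the two cases in the figures: a duplicate key on one side suggests N:1, and duplicate keys on both sides suggest 1:1.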

Reverse engineering from network schema to conceptual EER model

Step 2 Derive multiple (alternative) relationships

In a network schema, a set of record types that form a loop (circuit) may carry different navigational semantics. It is thus up to the user to confirm what the original database designer intended the alternative path to do. If the user confirms the existence of a navigational semantic, then the record types and sets in the alternative paths are mapped to different network subschemas (one subschema for each path) before being translated to the relational schema.

Reverse engineering from network schema to conceptual EER model

Step 2 Derive multiple (alternative) relationships

[Figure: network schema rooted at SYSTEM in which the record types CITIES (city, city-headquarter), STORES (store, store-address) and ITEMS (item, qty) are connected by sets that form alternative paths]

Reverse engineering from network schema to conceptual EER model

Step 3 Derive unary relationships

[Figure: Map unary 1:n relationship from network to EER model. Network schema: record Employee linked to a dummy record by two sets. Corresponding EER model: entity Employee with a unary 1:n relationship "manages"]

Reverse engineering from network schema to conceptual EER model

Step 4 Derive binary relationships

[Figure: Map 1:n and m:n relationship from network to EER model. First network schema: record DEPARTMENT owns a set with member record EMPLOYEE; corresponding EER model: entity DEPARTMENT "has" entity EMPLOYEE, 1:N. Second network schema: records SUPPLIER and PARTS each own a set with member record QTY; corresponding EER model: entities SUPPLIER and PARTS joined by the many-to-many relationship SUPPLY with attribute QTY]

Reverse engineering from network schema to conceptual EER model

Step 5 Derive entities of n-ary relationships

[Figure: Map n-ary relationship to EER model. Network schema: record Skill-used is the member of sets owned by records Skill, Project and Employee. Corresponding EER model: relationship Skill-used connects entities Project, Skill and Employee with many-to-many cardinalities]

Reverse engineering from network schema to conceptual EER model

Step 6 Derive aggregation, generalization and categorization

[Figure: Map set of relationships to aggregation in EER model. Network schema: records CLASS, SECTION, LECTURER and STUDENT connected by sets. Translated EER model: entities CLASS, SECTION and LECTURER are connected through the relationship SECTION, which is treated as an aggregate and related to entity STUDENT by the relationship ATTENDED BY]

Map isa relationship to overlap generalization

[Figure: Network schema: record Person owns sets with member records Employee, Alumnus and Student. Corresponding EER model: entity Person, with attributes Employee-flag, Alumnus-flag and Student-flag, is an overlap (o) generalization of Employee, Alumnus and Student]

Map isa relationships to categorization in EER model

[Figure: Network schema: records Company, Person and Bank own sets with member record Owner. Corresponding EER model: entity Owner is a category (union, u) of the entities Person, Company and Bank]

Reverse engineering from network schema to conceptual EER model

Step 7 Derive entity keys and other constraints

[Figure: Map network schema with fully internally identified records to relational. Records Customer, Loan and Collateral are connected by sets (Customer owns Loan, Loan owns Collateral) and have the keys Customer#, Loan# and Collateral#. Record identifiers: Customer (Customer#), Loan (Loan#), Collateral (Collateral#)]

Reverse engineering from network schema to conceptual EER model

Step 7 Derive entity keys and other constraints

[Figure: Map network schema with partially internally identified records to relational. Records Customer, Loan and Collateral are connected by sets and have the keys Customer#, Loan# and Collateral#. Record identifiers: Customer (Customer#), Loan (Customer#, Loan#), Collateral (Customer#, Loan#, Collateral#)]

Reverse engineering from network schema to conceptual EER model

Step 7 Derive entity keys and other constraints

[Figure: Map network schema with internally unidentified records to relational. Records Customer, Loan and Collateral are connected by sets and have the keys Customer#, Loan# and Collateral#. Record identifiers: Customer (Customer#), Loan (Customer#, Loan#), Collateral (Customer#, Loan#, Sequence#)]

Figure: network schema dependency relationship translation

[Figure: Set membership MANUAL-OPTIONAL, MANUAL-FIXED, MANUAL-MANDATORY or AUTOMATIC-OPTIONAL. Network schema: Record A (key a) owns SET AB with member Record B (key b). Corresponding EER model: entity A (a) and entity B (b, a) connected by relationship R]

[Figure: Set membership AUTOMATIC-FIXED or AUTOMATIC-MANDATORY. Network schema: Record A (key a) owns SET AB with member Record B (key b), with FD: B.b -> A.a and ID: B.a ⊆ A.a. Corresponding EER model: entity A (a) and entity B (b, a) connected by relationship R]

Reverse engineering from relational model to conceptual EER model

Step 1. Define each relation, key and field

• Primary relation. These relations describe entities.
• Primary relation - Type 1 (PR1). This is a relation whose primary key does not contain a key of another relation.
• Primary relation - Type 2 (PR2). This is a relation whose primary key does contain a key of another relation.
• Secondary relation. This is a relation whose primary key is fully or partially formed by concatenation of primary keys of other relations.

Reverse engineering from relational model to conceptual EER model

Step 1. Define each relation, key and field

• Secondary relation - Type 1 (SR1). If the key of the secondary relation is formed fully by concatenation of primary keys of primary relations, it is of Type 1 or SR1

• Secondary relation - Type 2 (SR2). Secondary relations that are not of Type 1

• Key attribute - Primary (KAP). This is an attribute in the primary key of a secondary relation that is also a key of some primary relation.

• Key attribute - General (KAG). These are all the other primary key attributes in a secondary relation that are not of the KAP type.

• Foreign key attribute (FKA). This is a non-primary key attribute of a primary relation that is a foreign key.

• Nonkey attribute (NKA). The rest of the non-primary-key attributes.
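The relation classes of Step 1 can be approximated mechanically from primary keys alone. The definitions above do not say exactly how to tell a PR2 from a secondary relation, so the sketch below adopts a heuristic that is an assumption of this example: a key that embeds one other (maximal) key is PR2, and a key that embeds two or more is secondary. The keys for Section and the relation `Enrol` are hypothetical as well.

```python
def classify_relations(primary_keys):
    """Classify relations as PR1, PR2, SR1 or SR2 from their primary keys.

    primary_keys maps a relation name to a tuple of primary-key attributes.
    Heuristic (an assumption, not taken from the source): count the other
    relations whose whole key is strictly contained in this relation's key,
    ignoring any that are themselves contained in a larger embedded key.
    """
    keys = {name: frozenset(attrs) for name, attrs in primary_keys.items()}
    result = {}
    for name, key in keys.items():
        embedded = [o for o in keys if o != name and keys[o] < key]
        maximal = [o for o in embedded
                   if not any(keys[o] < keys[p] for p in embedded)]
        if not maximal:
            result[name] = "PR1"      # key contains no other relation's key
        elif len(maximal) == 1:
            result[name] = "PR2"      # key extends one other relation's key
        else:
            covered = frozenset().union(*(keys[o] for o in maximal))
            # SR1: key fully formed by other keys; SR2: has extra attributes.
            result[name] = "SR1" if covered == key else "SR2"
    return result

classes = classify_relations({
    "Department": ("Dept#",),
    "Course": ("Course#",),
    "Student": ("Student#",),
    "Instructor": ("Dept#", "Inst_name"),
    "Section": ("Dept#", "Inst_name", "Course#", "Section#"),
    "Enrol": ("Student#", "Course#"),  # hypothetical relation
})
```

A real reverse-engineering tool would also need declared foreign keys and user confirmation; matching on attribute names alone, as done here, works only when key attributes are named consistently across relations.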

Reverse engineering from relational model to conceptual EER model

Step 2. Map each PR1 into entity.

[Figure: Map primary relations to entities. Entities Department (Dept#, Dept_name), Prerequisite (Pre#, prer_title), Student (Student#, Student_name) and Course (Course#, Course_Location)]

Step 3. Map each PR2 into weak entity.

[Figure: Map PR2 into EER model. Entity Department (Dept#, Dept_name) is connected by the relationship "hire" to weak entity Instructor (Dept#, Inst_name, Inst_addr)]

Reverse engineering from relational model to conceptual EER model

Step 4. Map SR1 into binary/n-ary relationship.

[Figure: Map SR1 into EER model. Entity Student (Student#, Student_name) and entity Section (Section#) are connected by the relationship "grade"]

Step 5. Map SR2 into binary/n-ary relationship.

[Figure: Map SR2 into EER model. Section (Dept#, Inst_name, Course#, Section#) is connected by the relationship "teach" to Instructor (Dept#, Inst_name, Inst_addr) and by the relationship "has" to Course (Course#, Course_Location)]

Reverse engineering from relational model to conceptual EER model

Step 6. Map each FKA into relationship.

[Figure: Map FKA into EER model. Entity Course (Course#, Course_Location) is connected by the relationship "pre-course" to entity Prerequisite (Prer#, Prer_title)]