University of Namur Faculté d'informatique PReCISE Research Center - Database Engineering Group www.info.fundp.ac.be/libd PReCISE - A (sort of) spatio-temporal view of DB reverse engineering - Jean-Luc Hainaut February 5, 2014 Stevens Award lecture WCRE-CSMR 2014 Data matters most but where has all the semantics gone?


University of Namur
Faculté d'informatique

PReCISE Research Center - Database Engineering Group
www.info.fundp.ac.be/libd

PReCISE

- A (sort of) spatio-temporal view of DB reverse engineering -

Jean-Luc Hainaut

February 5, 2014 Stevens Award lecture WCRE-CSMR 2014

Data matters most

but where has all the semantics gone?


• Introduction

• Understanding data semantics

• Data models

• Tracing data semantics

• Recovering hidden data semantics

• Is data semantics recovery that important, actually?

• Summary and conclusions


Introduction

Objectives of the lecture

1. To study the concept of data semantics in business applications

2. To identify and evaluate the techniques used to represent data semantics

3. To observe how these techniques have evolved in time and in different cultures.

4. To discuss the methods used to recover the semantics lost when poor representation techniques have been used.

The role of data in business applications

Axioms on databases

1. The database is a picture of the application domain

• Its schema is a model of the static structures of the domain

• Its data describe the current state (or a sequence of states) of the domain

2. The database is designed independently of the application programs

The database is designed before the application programs

3. The evolution of the database schema reflects the evolution of the functional requirements

4. The database is described by (at least) two schemas:

• the conceptual schema: abstract, platform-independent

formalism: ER model, conceptual UML class diagrams

• the logical schema: concrete, platform-dependent

formalism: SQL2, Java classes

There exists a bidirectional mapping between the two.

The role of data in business applications

Meta-axioms on axioms on databases

1. The axioms are often ignored by developers

- ignore = how interesting! I didn't know them

- ignore = I know them but they do not suit my way of working

3. The biggest violation of the axioms concerns the existence and role of the conceptual schema

Understanding data semantics

Experimental approach and first conclusions

Preliminary question

Same data, different structures

Table T

    C1    C2      C3
    C400  Darwen  London
    B512  Owens   NY
    S144  Garcia  Madrid

Table CUSTOMER

    CustID  Name    City
    C400    Darwen  London
    B512    Owens   NY
    S144    Garcia  Madrid

Table T

    C
    C400 Darwen London
    B512 Owens NY
    S144 Garcia Madrid

To what extent does each of these data sets express the semantics of the data?

Motivating example. 1. Reading data from a COBOL file (1970)

External file

    SELECT FILE1 ASSIGN TO "FILE1.DAT"
        ORGANIZATION IS INDEXED
        ACCESS MODE IS DYNAMIC
        RECORD KEY IS RKEY.

    FD FILE1.
    01 REC.
       02 RKEY  PIC X(12).
       02 RINFO PIC X(100).

File contents (record type REC)

    RKEY  RINFO
    C400  Darwen London
    B512  Owens NY
    S144  Garcia Madrid

Application code (COBOL)

    WORKING-STORAGE SECTION.
    01 CUSTOMER.
       02 CustID PIC X(12).
       02 Name   PIC X(60).
       02 City   PIC X(40).

    READ FILE1 INTO CUSTOMER.

Result: REC (RKEY, RINFO) → CUSTOMER (CustID, Name, City)

    CustID  Name   City
    B512    Owens  NY

Motivating example: 1. Reading data from a COBOL file (1970)

[Figure: record REC (RKEY, RINFO) in the file, mapped onto record CUSTOMER (CustID, Name, City) in the program]

Where has data semantics been defined?

• In file description (10%) - [unique key, key data type]

• In application code (93%).
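The split between the two layouts can be illustrated with a minimal Python sketch (not part of the lecture). It assumes space-padded, fixed-width fields, as the PIC X clauses suggest; the helper names `make_record` and `decode` are hypothetical.

```python
# Layout known to the file description: only the key is distinguished.
FILE_LAYOUT = [("RKEY", 12), ("RINFO", 100)]
# Layout known only to the application program (its WORKING-STORAGE record).
PROGRAM_LAYOUT = [("CustID", 12), ("Name", 60), ("City", 40)]


def make_record(cust_id: str, name: str, city: str) -> str:
    """Build a 112-character record, padding each field with spaces."""
    return cust_id.ljust(12) + name.ljust(60) + city.ljust(40)


def decode(record: str, layout: list) -> dict:
    """Cut a fixed-width record into named fields according to a layout."""
    fields, pos = {}, 0
    for name, width in layout:
        fields[name] = record[pos:pos + width].rstrip()
        pos += width
    return fields


record = make_record("B512", "Owens", "NY")
as_file_sees_it = decode(record, FILE_LAYOUT)        # key + opaque RINFO string
as_program_sees_it = decode(record, PROGRAM_LAYOUT)  # CustID, Name, City
```

With the file layout alone, Name and City remain buried in RINFO; only the program layout makes them visible.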

Motivating example. 2. Reading data from an RDB (1980+)

Relational DB

    create table CUSTOMER(
        CustID char(12) not null,
        Name   char(60) not null,
        City   char(40) not null,
        primary key (CustID));

    CUSTOMER
    CustID  Name    City
    C400    Darwen  London
    B512    Owens   NY
    S144    Garcia  Madrid

Application code (C)

    string v1;
    string v2;
    string v3;

    select * into v1,v2,v3 from CUSTOMER where CustID = 'B512'

Result: CUSTOMER (CustID, Name, City) → (v1, v2, v3)

    v1    v2     v3
    B512  Owens  NY

Motivating example: 2. Reading data from an RDB (1980+)

[Figure: table CUSTOMER (CustID, Name, City) in the database, mapped onto variables v1, v2, v3 in the program]

Where has data semantics been defined?

• In DB schema (100%)

• In application code (3%) - [data type].
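A minimal sketch of the same situation, using Python's standard sqlite3 module in place of the C program and DBMS of the slide (an assumption made only to keep the example runnable). The program declares no record layout: column names and the uniqueness rule come from the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table CUSTOMER(
           CustID char(12) not null,
           Name   char(60) not null,
           City   char(40) not null,
           primary key (CustID))"""
)
conn.executemany(
    "insert into CUSTOMER values (?, ?, ?)",
    [("C400", "Darwen", "London"),
     ("B512", "Owens", "NY"),
     ("S144", "Garcia", "Madrid")],
)

cur = conn.execute("select * from CUSTOMER where CustID = ?", ("B512",))
columns = [d[0] for d in cur.description]  # names supplied by the schema
v1, v2, v3 = cur.fetchone()

# The identifier is enforced by the DBMS, not by the program.
try:
    conn.execute("insert into CUSTOMER values ('B512', 'Smith', 'Paris')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```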

What does data semantics mean?

A tentative practical definition

Data semantics is the knowledge defined by all the non-technical, domain-dependent information that allows us to understand, to use and to manage the data.

Where can we find traces of data semantics?

[Figure: data, DB schema, application program]

• in the application code (reading from file)

• in the DB schema (reading from DB)

A first (trivial) observation

It is best to express data semantics in the database schema

1. Expressiveness: DDL is the most appropriate language to declare data structures and constraints

2. Language independence: DDL is independent of application programming languages

3. Uniqueness: the schema is unique and centralized

4. Integration with data: the schema is a part of the database (no risk of losing it!)

5. Program independence: the schema is independent of application programs

6. Stability: the schema must be changed only when the application domain evolves.

However, things are not always that simple (e.g., COBOL files)

Only data structures are explicit in application programs:

• record name

• field name

• field data type

Additional constraints generally are controlled by the application code:

• where?

• in which way?

• in all the modules processing the data?

Understanding data semantics by analyzing the program code can be much more complex than expected.

However, things are not always that simple (e.g., RDB)

Only standard integrity constraints can be coded through the DDL (SQL2):

• not null

• uniqueness

• referential integrity

Additional constraints must be coded through generic means:

• check predicates

• triggers

• stored procedures

Understanding data semantics by reading the database schema can be less easy than expected.
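As an illustration of such generic means, here is a small sketch using Python's sqlite3 module. Both business rules (the lower bound on Account and the maximum drop per update) are invented for the example and do not come from the lecture; the helper `rejected` is hypothetical as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table CUSTOMER(
        CustID  char(12) not null primary key,
        Name    char(60) not null,
        Account integer  not null,
        -- hypothetical business rule, expressed as a check predicate
        check (Account >= -1000)
    );
    -- hypothetical business rule, expressed as a trigger:
    -- an account may not drop by more than 500 in a single update
    create trigger CUSTOMER_ACCOUNT_DROP
    before update of Account on CUSTOMER
    when old.Account - new.Account > 500
    begin
        select raise(abort, 'account drop too large');
    end;
    insert into CUSTOMER values ('C400', 'Darwen', -124);
    """
)


def rejected(statement: str) -> bool:
    """Return True if the DBMS refuses the statement."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.DatabaseError:
        return True


too_low = rejected("update CUSTOMER set Account = -2000 where CustID = 'C400'")
too_fast = rejected("update CUSTOMER set Account = -700 where CustID = 'C400'")
accepted = not rejected("update CUSTOMER set Account = -200 where CustID = 'C400'")
```

A reader of the schema has to interpret the check predicate and the trigger body to recover the rule, which is the difficulty the slide points at.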


Data models

Data models: abstraction hierarchy

Reminder on the database design process - The standard view

User requirements → Information analysis → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code

Data semantics and data models

The way data semantics is expressed in a database depends on its data model

Conceptual models

• ER (*)
• UML class diagrams

Logical models

• Record oriented models:
  • files
  • legacy DBMS (IMS, CODASYL)
  • RDB (*)

• Key-Value models:
  • NoSQL (*)
  • CSV

• Structured object models:
  • OO
  • NoSQL
  • Json (*)
  • XML

ER conceptual model

Abstract, platform-independent information description

The world is perceived as:
- sets of entities,
- properties that characterize entities,
- relationships holding between entities

A conceptual schema can be translated into several logical, DBMS-dependent, schemas

[Figure: ER schema. Entity type CUSTOMER (CustID, Name, City; id: CustID) and entity type ORDER (OrdID, DateOrd, Account; id: OrdID), linked by relationship type place with cardinalities 0-N and 1-1]

Relational data model (schema-based, 1NF)

Examples: Oracle, DB2, SQL Server, MySQL, PostgreSQL, etc.

• Domain-dependent schema
• Schema and data are hierarchically distinct
• Values are aggregated into rows
• The semantics is explicit in the schema (part of!)
• The semantics is managed/controlled by the DBMS

    metadata  CustID  Name    City    Account
    data      C400    Darwen  London  -124
              B512    Owens   NY      5509
              S144    Garcia  Madrid  0

Key-Value data model (schema-less, triples, 1NF)

Examples: Oracle NoSQL, BerkeleyDB, Voldemort, Riak, Redis

• Domain-independent schema
• Metadata mixed with data
• Elementary Key-Value
• The semantics is explicit in the data
• The semantics is managed/controlled by application programs or middleware

Labels in the figure: meta-metadata (ENTITY, ATTRIBUTE, VALUE), metadata (attribute names), data (values)

    ENTITY  ATTRIBUTE  VALUE
    90317   CustID     C400
    90317   Name       Darwen
    90317   City       London
    90317   Account    -124
    59731   CustID     B512
    59731   Name       Owens
    59731   City       NY
    59731   Account    5509
    66830   CustID     S144
    66830   Name       Garcia
    66830   City       Madrid
    66830   Account    0
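Since the structure lives in the data, a program has to rebuild it. The following Python sketch (an illustration only; the function names `pivot` and `inferred_attributes` are hypothetical) regroups the triples of the slide into rows and lists the attribute names it observes.

```python
from collections import defaultdict

# Triples (ENTITY, ATTRIBUTE, VALUE) as shown on the slide.
TRIPLES = [
    (90317, "CustID", "C400"), (90317, "Name", "Darwen"),
    (90317, "City", "London"), (90317, "Account", "-124"),
    (59731, "CustID", "B512"), (59731, "Name", "Owens"),
    (59731, "City", "NY"), (59731, "Account", "5509"),
    (66830, "CustID", "S144"), (66830, "Name", "Garcia"),
    (66830, "City", "Madrid"), (66830, "Account", "0"),
]


def pivot(triples):
    """Group triples by entity, giving one attribute/value dictionary each."""
    rows = defaultdict(dict)
    for entity, attribute, value in triples:
        rows[entity][attribute] = value
    return dict(rows)


def inferred_attributes(rows):
    """Attribute names observed in the data, in first-seen order."""
    seen = {}
    for row in rows.values():
        for attribute in row:
            seen.setdefault(attribute, None)
    return list(seen)


rows = pivot(TRIPLES)
attributes = inferred_attributes(rows)
```

Nothing guarantees that every entity carries the same attributes; that control is left to the program or the middleware.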

Structured object data models (schema-less, NF2)

Examples: CouchDB, MongoDB (BSON), SimpleDB

• Domain-independent schema
• Metadata mixed with data
• Aggregated Key-Value into objects (here in Json)
• The semantics is explicit in the data
• The semantics is managed/controlled by application programs or middleware

Labels in the figure: meta-metadata (ENTITY, ATTRIBUTES), metadata (attribute names), data (values)

    ENTITY  ATTRIBUTES
    90317   {"CustID": "C400", "Name": "Darwen", "City": "London", "Account": -124}
    59731   {"CustID": "B512", "Name": "Owens", "City": "NY", "Account": 5509}
    66830   {"CustID": "S144", "Name": "Garcia", "City": "Madrid", "Account": 0}


Tracing data semantics

In the real world, where is semantics expressed?

We have identified two places: DB schema and application code.

Are there other places?

Architectural framework

[Figure: architecture linking data, DB schema, O/R mapping, class schema, application program and Doc]

• Documentation (text, structured, ontology)
• User interface
  - data structure
  - labels
  - help, error messages
• Application code
  - data structures
  - procedural code
• Class schema
• Object/Relational mapping
• DB logical schema
  - global schema
  - views
• Data

Semantics in the documentation

Documentation (text, structured, ontology)

Functional documentation (should include the conceptual schema)

Technical documentation (should include the logical schema)

Drawback: the documentation often is
• obsolete
• incomplete
• inconsistent
• missing

8. Semantics in the DB schema

DB logical schema
- global logical schema
- views

The logical schema is DBMS-dependent.

It is a more or less faithful implementation of the conceptual schema.

Some views can be more detailed than the logical schema.

Drawbacks
• not a conceptual schema
• additional constraints not always trivial to identify and to understand

10. Semantics in the class schema

[Figure: class schema and DB logical schema linked by transformation T]

Bidirectional relation/object transformation.

Solving the impedance mismatch problem

The class schema seen as the domain model.

It is implemented into a relational database, which ensures object persistence.

The DB schema itself is hidden and may bear little semantics.

Drawbacks
• inappropriate formalism
• poor change propagation mechanism (if any)
• semantics in the application and not in the DB
• data model not easily shared by several applications

11. Semantics in the application code

Application code
- data structures
- procedural code

Internal data structures may be more explicit than the DB schema.

Data integrity constraints checked by the application code.

Understanding data semantics from the way programs process the data.

However, program analysis is far from trivial:
• size (millions of LOC)
• architectural complexity
• algorithmic complexity
• data flow complexity
• creative data processing

Drawbacks
• redundancies (a constraint may be checked in many places)
• distributed traces (potential inconsistencies)

12. Semantics in the GUI

User interface
- data structure
- labels
- help, error messages

The UI often is a view on a part of the database.

This view is intended for users, hence user-friendly.

Provides useful hints about the constraints and meaning of data:

• data structure (data types, aggregates)

• explicit labels

• sample data

• informative help and error messages

Drawbacks
• distributed control (potential inconsistencies)
• does not cover all the database objects

13. Semantics in the data (record-oriented models)

In standard models

Data analysis: finding relationships among data

• uniqueness

• data types

• inclusion properties (foreign keys)

• etc.

Main strategy
• validating hypotheses
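Hypothesis validation on the data can be sketched as follows in Python. The CUSTOMER rows are those of the earlier slides; the ORDER rows and the two helper functions are invented for the illustration.

```python
CUSTOMER = [
    {"CustID": "C400", "Name": "Darwen", "City": "London"},
    {"CustID": "B512", "Name": "Owens", "City": "NY"},
    {"CustID": "S144", "Name": "Garcia", "City": "Madrid"},
]
# Hypothetical ORDER rows, invented for the illustration.
ORDER = [
    {"OrdID": 1, "CustID": "B512"},
    {"OrdID": 2, "CustID": "C400"},
    {"OrdID": 3, "CustID": "B512"},
]


def is_unique(rows, column):
    """Hypothesis: `column` is an identifier (no value appears twice)."""
    values = [row[column] for row in rows]
    return len(values) == len(set(values))


def is_included(child_rows, child_col, parent_rows, parent_col):
    """Hypothesis: every child value also appears in the parent column."""
    parent_values = {row[parent_col] for row in parent_rows}
    return all(row[child_col] in parent_values for row in child_rows)


custid_identifies_customer = is_unique(CUSTOMER, "CustID")
custid_identifies_order = is_unique(ORDER, "CustID")
order_references_customer = is_included(ORDER, "CustID", CUSTOMER, "CustID")
```

The data can refute a hypothesis but never prove it: a column that happens to be unique today is only a candidate identifier, to be confirmed from other sources.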

13. Semantics in the data (alternative models)

In alternative (schema-less) models

Metadata extraction

But also data analysis as in standard models

Experience
• none. Too new.
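A minimal sketch of metadata extraction from schema-less documents, in Python. The documents reuse the customer data of the earlier slides; the function name `extract_metadata` is hypothetical, and real collections would need to handle nested objects and arrays as well.

```python
import json

DOCUMENTS = [
    '{"CustID": "C400", "Name": "Darwen", "City": "London", "Account": -124}',
    '{"CustID": "B512", "Name": "Owens", "City": "NY", "Account": 5509}',
    '{"CustID": "S144", "Name": "Garcia", "City": "Madrid", "Account": 0}',
]


def extract_metadata(documents):
    """Map each attribute name to the set of value types observed for it."""
    schema = {}
    for text in documents:
        for key, value in json.loads(text).items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return schema


observed_schema = extract_metadata(DOCUMENTS)
```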

Recovering hidden data semantics: database reverse engineering


Definition

DB reverse engineering

Reverse engineering a piece of software consists, among other things, in recovering or reconstructing its functional and technical specifications, starting mainly from the source text of the programs. Recovering these specifications is generally intended to redocument, convert, refactor, maintain or extend existing applications.

Database reverse engineering is that part of Information System Engineering that addresses the problems and techniques related to the recovery of the conceptual and logical schemas of files and databases of existing systems.

DB reverse engineering

DB reverse engineering methodology

[Figure: process diagram. Processes shown: Project planning, Pilot, Full project, Source management, Physical extraction, Logical extraction, Conceptualization. Products shown: Logical (RDB) schema, Conceptual schema]

DB reverse engineering

DB reverse engineering methodology

[Figure: same process diagram, refined. Processes shown: Project planning, Pilot, Full project, Source management, Physical extraction, Logical extraction, Conceptualization. Additional processes shown: Sch. analysis, Data analysis, Prog. analysis, Class analysis, UI analysis, Others; De-optimization, Untranslation, Normalization]


Is data semantics recovery that important, actually?


Yes

Definitely!

Can you prove it? At least I can show you an example

Example: database application migration

Porting a complete existing application, or some of its components, to another, generally more modern, platform.

For a database: changing its DMS. A popular example: migrating the legacy set of files of a business application to an RDBMS.

Two main approaches:

• physical approach

• semantic approach

Physical database migration

Database migration

The physical, or one-to-one, migration strategy is the cheapest but also the worst approach since it deeply degrades the final structure.

Requires no knowledge of data semantics. Very popular.

COBOL code → Physical extraction → Physical (file) schema → Transform → Physical (DB2) schema → Coding → SQL-DDL code

Physical database migration

physical (one-to-one) migration

    SELECT CUST-FILE ASSIGN TO "CUST.DAT"
        ORGANIZATION IS INDEXED
        RECORD KEY IS CUST-ID.
    FD CUST-FILE.
    01 CUSTOMER.
       02 CUST-ID   PIC X(12).
       02 CUST-INFO PIC X(80).
       02 CUST-HIST PIC X(1000).

=

    CUSTOMER
      CUST-ID: char (12)
      CUST-INFO: char (80)
      CUST-HIST: char (1000)
      id: CUST-ID

no added value

    CUSTOMER
      CUST_ID: char (12)
      CUST_INFO: char (80)
      CUST_HIST: char (1000)
      id: CUST_ID

=

    Create table CUSTOMER(
        CUST_ID   char(12)   not null,
        CUST_INFO char(80)   not null,
        CUST_HIST char(1000) not null,
        primary key (CUST_ID));

Semantic database migration

Database migration

Semantic approach: based on an in-depth understanding of the semantics of source data.

Provides a high quality result. Strong basis for the future.

Requires a complete, up to date, knowledge of the DB

From an IDMS database:
IDMS-DDL code → Physical extraction → Physical (IDMS) schema → Logical extraction → Logical (DBTG) schema → Conceptualization → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code

From COBOL files:
COBOL code → Reverse Engineering → Conceptual schema → Logical design → Logical (RDB) schema → Physical design → Physical (DB2) schema → Coding → SQL-DDL code

Semantic database migration (1)

semantic migration (refinement)

    SELECT CUST-FILE ASSIGN TO "CUST.DAT"
        ORGANIZATION IS INDEXED
        RECORD KEY IS CUST-ID.
    FD CUST-FILE.
    01 CUSTOMER.
       02 CUST-ID   PIC X(12).
       02 CUST-INFO PIC X(80).
       02 CUST-HIST PIC X(1000).

Refined schema

    CUSTOMER
      CUST-ID: char (12)
      CUST-INFO: compound (70)
        NAME: char (20)
        ADDRESS: char (40)
        STATUS: char (10)
      CUST-HIST-PURCH[0-100] array: compound (10)
        ITEM: num (5)
        TOTAL: num (5)
      id: CUST-ID
      id(CUST-HIST-PURCH): ITEM

Schema after extraction of the array

    CUSTOMER
      CUST-ID: char (12)
      CUST-INFO: compound (70)
        NAME: char (20)
        ADDRESS: char (40)
        STATUS: char (10)
      id: CUST-ID

    CUST-HIST-PURCH
      Index: index (4)
      ITEM: num (5)
      TOTAL: num (5)
      id: record.CUSTOMER, ITEM
      id': record.CUSTOMER, Index

    record: relationship type with cardinalities 0-100 and 1-1
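To make the refinement concrete, the sketch below decodes a CUST-HIST string into its purchase entries, following the refined layout (up to 100 slots of 10 characters: ITEM num (5), TOTAL num (5)). It is an illustration in Python, not the lecture's tooling; the convention that unused slots are blank is an assumption, and `decode_history` is a hypothetical name.

```python
def decode_history(cust_hist: str) -> list:
    """Split CUST-HIST into (index, item, total) entries of 10 characters."""
    entries = []
    for index in range(100):
        slot = cust_hist[index * 10:(index + 1) * 10]
        if not slot.strip():  # assumption: unused slots are blank
            continue
        entries.append((index + 1, int(slot[:5]), int(slot[5:10])))
    return entries


# Two purchases followed by blank padding up to 1000 characters.
sample = ("00042" "00300" "00007" "01250").ljust(1000)
purchases = decode_history(sample)
```

Each resulting tuple corresponds to one CUST-HIST-PURCH record, with its Index, ITEM and TOTAL.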

Semantic database migration (2)

semantic migration (SQL translation)

Source schema

    CUSTOMER
      CUST-ID: char (12)
      CUST-INFO: compound (70)
        NAME: char (20)
        ADDRESS: char (40)
        STATUS: char (10)
      id: CUST-ID

    CUST-HIST-PURCH
      ITEM: num (5)
      Index: index (4)
      TOTAL: num (5)
      id: record.CUSTOMER, ITEM
      id': record.CUSTOMER, Index

    record: relationship type with cardinalities 0-100 and 1-1

Relational schema

    CUSTOMER
      CUST_ID
      CUS_NAME
      CUS_ADDRESS
      CUS_STATUS
      id: CUST_ID

    CUST_HIST_PURCH
      CUST_ID
      ITEM
      CINDEX
      TOTAL
      id: CUST_ID, ITEM
      id': CUST_ID, CINDEX
      ref: CUST_ID

No more than 100 CUST_HIST_PURCH per CUSTOMER

    Create table CUSTOMER(
        CUST_ID      char(12) not null,
        CUST_NAME    char(28) not null,
        CUST_ADDRESS char(60) not null,
        CUST_STATUS  char(2)  not null,
        primary key (CUST_ID));

    Create table CUST_HIST_PURCH(
        CUST_ID char(12) not null,
        ITEM    char(10) not null,
        CINDEX  smallint not null check(CINDEX <= 100),
        TOTAL   smallint not null,
        primary key (CUST_ID,ITEM),
        unique (CUST_ID,CINDEX),
        foreign key (CUST_ID) references CUSTOMER);

Normalized DB

Database migration - Synthesis

physical migration

    Create table CUSTOMER(
        CUST_ID   char(12)   not null,
        CUST_INFO char(80)   not null,
        CUST_HIST char(1000) not null,
        primary key (CUST_ID));

semantic migration

    Create table CUSTOMER(
        CUST_ID      char(12) not null,
        CUST_NAME    char(28) not null,
        CUST_ADDRESS char(60) not null,
        CUST_STATUS  char(2)  not null,
        primary key (CUST_ID));

    Create table CUST_HIST_PURCH(
        CUST_ID char(12) not null,
        ITEM    char(10) not null,
        CINDEX  smallint not null check(CINDEX <= 100),
        TOTAL   smallint not null,
        primary key (CUST_ID,ITEM),
        unique (CUST_ID,CINDEX),
        foreign key (CUST_ID) references CUSTOMER);

Evolution

new application: compute total sales per item

After physical migration

    CUSTOMER
      CUST-ID: char (12)
      CUST-INFO: char (80)
      CUST-HIST: char (1000)
      id: CUST-ID

• where is the required information?

• how to extract it from the CUSTOMER table?

• who will develop the (C, Java, VB) program?

• … and when?

After semantic migration

    CUSTOMER
      CUST_ID
      CUS_NAME
      CUS_ADDRESS
      CUS_STATUS
      id: CUST_ID

    CUST_HIST_PURCH
      CUST_ID
      ITEM
      CINDEX
      TOTAL
      id: CUST_ID, ITEM
      id': CUST_ID, CINDEX
      ref: CUST_ID

    Select ITEM, sum(TOTAL)
    from CUST_HIST_PURCH
    group by ITEM;

• clearly visible + documentation if needed

• just name the columns

• by any non expert

• immediately, 2 minutes
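The query on the normalized schema can be tried with a short Python sketch based on the standard sqlite3 module (chosen only to keep the example self-contained; the slides target DB2). The three purchase rows are invented, and an `order by` clause is added to make the result order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table CUSTOMER(
        CUST_ID      char(12) not null,
        CUST_NAME    char(28) not null,
        CUST_ADDRESS char(60) not null,
        CUST_STATUS  char(2)  not null,
        primary key (CUST_ID));

    create table CUST_HIST_PURCH(
        CUST_ID char(12) not null,
        ITEM    char(10) not null,
        CINDEX  smallint not null check(CINDEX <= 100),
        TOTAL   smallint not null,
        primary key (CUST_ID, ITEM),
        unique (CUST_ID, CINDEX),
        foreign key (CUST_ID) references CUSTOMER);

    -- hypothetical sample data
    insert into CUSTOMER values ('C400', 'Darwen', 'London', 'A');
    insert into CUSTOMER values ('B512', 'Owens', 'NY', 'A');
    insert into CUST_HIST_PURCH values ('C400', 'A1', 1, 100);
    insert into CUST_HIST_PURCH values ('C400', 'B2', 2, 50);
    insert into CUST_HIST_PURCH values ('B512', 'A1', 1, 25);
    """
)

totals = conn.execute(
    "select ITEM, sum(TOTAL) from CUST_HIST_PURCH group by ITEM order by ITEM"
).fetchall()
```

On the physically migrated table, the same result would first require decoding every CUST_HIST string in a dedicated program.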


Summary and conclusions

Some mundane observations

• Theories (e.g., textbooks) teach that the conceptual schema must be the unique expression of data semantics. In an ideal world, the conceptual schema exists, and all the other artefacts (DB schemas, UML diagrams, views, class schema, programs, UI) derive from it and each capture a part of this semantics.

• However, the real world doesn't learn from theories. Most often, the conceptual schema does not exist, so that only the other artefacts bear traces of the data semantics.

• Identifying, extracting, understanding and merging these traces to rebuild the conceptual schema are the very goals of database reverse engineering.

Cultural aspects of data semantics expression

1. Small personal application

Mainly non-professional developers. Intuitive, bottom-up, incremental development. Weak culture in DB.

Data semantics: in the UI, in application code

2. Database (record-oriented) data-intensive processing

Professional developers. Disciplined, top-down development. Strong culture in DB.

Data semantics: in the DB schema (including additional constraints).

3. OO data-intensive processing

Professional developers. OO minded. Disciplined, top-down development. Weak culture in DB.

Data semantics: in the class schema (through O/RM middleware).

4. Big data

(Semi-)Professional developers. Low complexity applications. RDB discarded as old-style (however NewSQL DBMS are lurking!)

Data semantics: simple, loose (few constraints); metadata in data

Evolution of data semantics expression

1950 - 1975: file-oriented processing
Semantics in record schema and application code

1968 - 1990: hierarchical/network database processing
Semantics in DB schema

1980 - ?: relational database processing
Semantics in DB schema

1990 - 2000: object-oriented DB processing
Semantics in DB schema and application code (methods)

2000 - ?: object-relational DB processing
Semantics in DB schema

2000 - ?: O/RM processing
Semantics in class schema

2011 - ?: NewSQL DB processing
Semantics in DB schema

2005 - ?: NoSQL DB processing
Semantics in data and in application code

[Figure: chart labeled "Quality of DS representation", with markers "prog" and "DB"]

Some conclusions

Quite often, developers see the database as a mere repository for the data used and created by programs:

• "the database offers persistence services for the business logic layer"

• "the database is an implementation of the program classes"

So, the database is directly dependent on the current state of the program architecture.

This view entails many problems when long-term maintenance and evolution are concerned. When the program changes, the database schema often must be modified accordingly, even if its semantics does not change.

This delights researchers in system evolution but leaves practitioners less enthusiastic.

The view of the database as a model of the application domain ensures a great stability of business systems.

Is the database culture still alive among today's developers?


Thanks


Abstract of the lecture

The role of databases may sometimes appear controversial since they are mere basic services for a significant part of the software engineering community (the transparent "persistence layer") while they are the central component of business applications for the database community. In this lecture, we examine the evolution of the database/program balance both in time (from the early sixties to a foreseeable future) and in space (technologies, communities) from the data semantics point of view. In particular, we analyze and compare how and where data semantics has been located and implemented in each of these contexts. Current development practices tend to migrate semantics from the database (as was usual in the eighties and nineties) to the application logic (e.g., O/RM, NoSQL DB managers), a trend that may be seen as a regression that reminds us of the infancy of business application development, when files were dedicated to one application. Finally, the lecture defines how data semantics can be recovered in these scenarios.