
Information Systems: Modelling Complexity with Categories

Four lectures given by Nick Rossiter at Universidad de Las Palmas de Gran Canaria, 15th-19th May 2000, under the Socrates-Erasmus Programme

Lectures

1. Interoperability in Information Systems

2. Introduction to Category Theory

3. Object Concepts as Categories

4. Handling Heterogeneity with Information Resource Dictionary System

Lecture 1: Interoperability in Information Systems

Nick Rossiter, Computing Science, Newcastle University, England

http://www.cs.ncl.ac.uk/people/b.n.rossiter/

Motivations

• Diversity of modelling techniques

• Distributed businesses may exercise local autonomy in platforms

• Data warehousing requires heterogeneous systems to be connected

• Data mining enables new rules to be derived from heterogeneous collections

Basic Definitions 1

• Distribution: information bases are stored on multiple computer systems interconnected by a communication medium.

• Homogeneous system: one that adheres to the same software at all sites.

• Heterogeneous system: one that does not adhere to the same software at all sites.
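The two definitions above amount to a simple test on the software in use across sites. A minimal sketch, assuming a hypothetical mapping from site name to software product (the site and product names are illustrative only):

```python
def is_homogeneous(software_by_site):
    """Return True when every site runs the same software."""
    return len(set(software_by_site.values())) <= 1


def is_heterogeneous(software_by_site):
    """A system is heterogeneous exactly when it is not homogeneous."""
    return not is_homogeneous(software_by_site)


# Hypothetical sites and products, for illustration only.
same = {"newcastle": "dbms_x", "las_palmas": "dbms_x"}
mixed = {"newcastle": "dbms_x", "las_palmas": "dbms_y"}
```

Here `is_homogeneous(same)` holds, while `mixed` counts as heterogeneous because a single differing site is enough.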

Basic Definitions 2

• Autonomy: the ability of a site to control its own activities with respect to one or more of:

– design

– communication

– execution

– association

Interoperability 1

• Interoperability: the ability to request and receive services between various systems and use their functionality.

• More than data exchange.

• Implies a close integration.

Interoperability 2

• Features:

– exchange of messages and requests

– use of each other’s functionality

– client-server abilities

– distribution

– operate multiple systems as a single unit

– communication despite incompatibilities

– extensibility and evolution

Architectures for Interoperability 1

1. Global schema integration

Produces a single new schema (C) from the schemas (A, B) of the different information systems.

[Diagram: schemas A and B are integrated into the single schema C]
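As a rough illustration of what integration involves, the sketch below merges two schemas A and B into a single schema C using a hand-written correspondence table. The attribute names and the `correspondences` mapping are hypothetical; writing that mapping is the step that needs a person who understands both schemas.

```python
def integrate(schema_a, schema_b, correspondences):
    """Build a global schema C from A and B.

    correspondences maps an attribute name in B to the name of the
    equivalent attribute in A. It is supplied by hand, not derived.
    """
    schema_c = list(schema_a)
    for attr in schema_b:
        unified = correspondences.get(attr, attr)
        if unified not in schema_c:
            schema_c.append(unified)
    return schema_c


# Hypothetical attribute lists for two information systems.
A = ["Person_ID", "Name", "Email"]
B = ["Author_Name", "Email", "Paper_ID"]
C = integrate(A, B, {"Author_Name": "Name"})
```

The result C lists each concept once, which is what lets the integrated system appear as a single information system to its users.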

Global Schema Integration

• Advantages

– Transparent to end users: appears as a single information system

• Disadvantages

– Difficult: needs human understanding to perform integration

– Local autonomy lost

– Static: does not evolve automatically

Architectures for Interoperability 2

2. Federated Database Systems

Less tightly coupled schema (than in 1)

Each service specifies its sharable objects through an export schema

Common data model

Internal command language

Decentralised control (local autonomy)

Five-level architecture for federated system

Federated Databases: Loosely-coupled

• Created by users

[Diagram: A, B are base schemas; AE, BE are export schemas; V is a view]
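The roles of the diagram's labels can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, attribute names and the `export` and `view` helpers are hypothetical, and the view is taken to be a join over the two export schemas.

```python
def export(rows, shared_attrs):
    """Export schema: expose only the attributes a site agrees to share."""
    return [{a: row[a] for a in shared_attrs} for row in rows]


def view(ae_rows, be_rows, key):
    """User-defined view V: join the two export schemas on a common key."""
    return [
        {**a, **b}
        for a in ae_rows
        for b in be_rows
        if a[key] == b[key]
    ]


# Hypothetical base data held at two autonomous sites.
A = [{"Name": "Ann", "Email": "ann@example.org", "Salary": 100}]
B = [{"Name": "Ann", "Title": "On Views", "Referee": "Bob"}]

AE = export(A, ["Name", "Email"])   # Salary stays private to site A
BE = export(B, ["Name", "Title"])   # Referee stays private to site B
V = view(AE, BE, "Name")
```

Each site decides what goes into its export schema, and the user who builds V sees only what was exported.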

Federated Databases: Tightly-coupled

• Created by administrators

• Global schema integration on all export schemas

• More formal than loosely-coupled

• Much effort to resolve semantic inconsistencies

Federated Database Systems - General Advantages

• Preserves local autonomy

• Not all data needs to be integrated

• Provides metadata structures for views (external and export schema, data dictionary)

Federated Database Systems - Disadvantages by Approach

• Tightly-coupled

– similar to global schema integration

1) complex, difficult to make changes dynamically

2) much effort in resolving semantic inconsistencies

• Loosely-coupled

– duplication by different users in building views

– updating data defined in views can be difficult

Multidatabase Language Approach

• No attempt at schema integration

• The various schemas of the services provided can be heterogeneous and inconsistent, and can duplicate information in different ways.

• Language (e.g. MSQL) is used to integrate databases at run time.

• Relational data model used as Common Data model

Multidatabase Language Approach - Diagram

[Diagram: A, B are schemas; MSQL is the run-time language used over both]

Multidatabase Language Approach - Advantages

• No preparatory work to understand semantics of schema

• Dynamic -- access latest versions

• Very skilled users can succeed in reaching their goals

• Interesting work on multidatabase dependencies

Example Multidatabase Language

• MSQL (Multidatabase SQL)

– Biased towards relational model

– Illustrates problems

• Consider 2 databases

– Each on publications of a computing society

– And query: “What is the name, email, title for each publication of an author appearing in both of the society’s databases?”

MSQL - Schema

• Schema 1 (for AIIA):

– Contacts (Person_ID, Name, Email, …)

– Conference (Name, Type, …)

– Attendees(ID, Conf_ID, Speaker, …)

– Publ_Papers(P_ID, Title, Author_ID, …)

• Schema 2 (for IFIP):

– Member_Socs(Soc_Name, …)

– Conf (Conf_ID, …)

– Publ_Papers(P_Ref, Title, Conf_Ref, …)

– Authors(Name, Email, Paper_ID, …)

Underlined attributes are primary keys; attributes in italics are foreign keys.

MSQL for Query

USE AIIA, IFIP
SELECT Name, Email, Title
FROM Authors,
     IFIP.Publ_Papers IFIP_Paper,
     Contacts,
     AIIA.Publ_Papers AIIA_Paper
WHERE Authors.Name = Contacts.Name
  AND Contacts.Person_ID = AIIA_Paper.Author_ID
  AND Authors.Paper_ID = IFIP_Paper.P_Ref;

The USE statement declares the databases in scope. Tables that share a name are prefixed with their database and aliased in the FROM clause to distinguish them.

Retrieves Name, Email and Title from both databases.
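The same run-time join can be approximated with ordinary tools. The sketch below is not MSQL: it assumes SQLite's ATTACH DATABASE as a stand-in for USE, keeps only the columns named in the schema slide, and uses made-up sample rows. Columns are qualified explicitly because SQLite rejects ambiguous column names, and the title from each database is returned separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two separate in-memory databases stand in for the societies' systems.
conn.execute("ATTACH DATABASE ':memory:' AS aiia")
conn.execute("ATTACH DATABASE ':memory:' AS ifip")

conn.executescript("""
CREATE TABLE aiia.Contacts (Person_ID INTEGER, Name TEXT, Email TEXT);
CREATE TABLE aiia.Publ_Papers (P_ID INTEGER, Title TEXT, Author_ID INTEGER);
CREATE TABLE ifip.Publ_Papers (P_Ref INTEGER, Title TEXT, Conf_Ref INTEGER);
CREATE TABLE ifip.Authors (Name TEXT, Email TEXT, Paper_ID INTEGER);

-- Made-up sample rows, for illustration only.
INSERT INTO aiia.Contacts VALUES (1, 'A. Author', 'a@example.org');
INSERT INTO aiia.Contacts VALUES (2, 'B. Writer', 'b@example.org');
INSERT INTO aiia.Publ_Papers VALUES (10, 'Paper in AIIA', 1);
INSERT INTO aiia.Publ_Papers VALUES (11, 'Other AIIA paper', 2);
INSERT INTO ifip.Publ_Papers VALUES (20, 'Paper in IFIP', 5);
INSERT INTO ifip.Authors VALUES ('A. Author', 'a@example.org', 20);
""")

# Authors who appear in both databases, with one title from each side.
rows = conn.execute("""
    SELECT c.Name, c.Email, ap.Title, ip.Title
    FROM ifip.Authors AS au
    JOIN ifip.Publ_Papers AS ip ON au.Paper_ID = ip.P_Ref
    JOIN aiia.Contacts AS c ON au.Name = c.Name
    JOIN aiia.Publ_Papers AS ap ON c.Person_ID = ap.Author_ID
""").fetchall()
```

Only 'A. Author' is returned, since 'B. Writer' has no entry in the IFIP database. The join on Name is an exact string comparison.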

Potential Problems with MSQL

• Are the domains of Name comparable?

• The LET command can create equivalences between names, but it does not solve domain mismatch.

• What if one schema is not relational? The Entity-Relationship model is often used as a neutral schema for translation and comparison of heterogeneous features.
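The domain question in the first two points can be made concrete. In the sketch below the two name formats and the `normalise` helper are hypothetical; the point is only that renaming an attribute does nothing for values stored under different conventions.

```python
def normalise(name):
    """Hypothetical normaliser: 'Surname, Given' -> 'given surname'."""
    if "," in name:
        surname, given = (part.strip() for part in name.split(",", 1))
        name = f"{given} {surname}"
    return " ".join(name.lower().split())


aiia_name = "Rossiter, Nick"   # hypothetical AIIA convention
ifip_name = "Nick Rossiter"    # hypothetical IFIP convention

# An equality join on Name compares the raw strings and misses the match.
plain_match = aiia_name == ifip_name

# Mapping both values into one domain first recovers it.
normalised_match = normalise(aiia_name) == normalise(ifip_name)
```

Someone still has to know both conventions in order to write the normaliser, which is the kind of inconsistency the user is left to resolve.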

Multidatabase Language - Disadvantages in General

• Distribution is not transparent

• Users must resolve inconsistencies themselves

• Common language may restrict scope of heterogeneity (relational bias)

• A locally autonomous system may change its schema freely, so existing queries can fail
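The last point is easy to reproduce. A minimal sketch using SQLite, with a hypothetical schema change in which a site renames the Email column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Authors (Name TEXT, Email TEXT, Paper_ID INTEGER)")

QUERY = "SELECT Name, Email FROM Authors"
conn.execute(QUERY).fetchall()   # works against the original schema

# The local site exercises its design autonomy and rebuilds the table
# with a renamed column (hypothetical change: Email -> EMail_Address).
conn.execute("DROP TABLE Authors")
conn.execute(
    "CREATE TABLE Authors (Name TEXT, EMail_Address TEXT, Paper_ID INTEGER)"
)

try:
    conn.execute(QUERY).fetchall()
    query_still_works = True
    failure = ""
except sqlite3.OperationalError as exc:
    query_still_works = False
    failure = str(exc)
```

The stored query text has not changed, yet it now fails because the column it names no longer exists.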

Comparison of Approaches

• By coupling: how tightly the interoperable system is connected to its underlying systems

• By adaptability: the ability of the interoperable system to evolve in line with the underlying schema

• By transparency: how far the end-user is freed from needing to understand the underlying schema

Comparison of Approaches

Approach                    Coupling   Adaptability   Transparency
Global Schema Integration   Tight      Low            High
Federated Databases         Medium     Medium         Medium
Multidatabase Languages     Low        High           Low

Summary

Trend:

• From Global Schema Integration, to Federated Database, to Multidatabase Language

• Towards lower coupling, higher adaptability and lower transparency.
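The trend can be checked against the ratings in the comparison table. The sketch below assumes a simple ordinal scale for the ratings (an assumption for illustration, with "Tight" treated as the high end of coupling) and lists the approaches in the order of the trend.

```python
# Ordinal scale for the ratings used in the comparison table.
# "Tight" coupling is treated as the high end of the coupling scale.
SCALE = {"Low": 0, "Medium": 1, "High": 2, "Tight": 2}

# Rows copied from the comparison table, in the order of the trend:
# (approach, coupling, adaptability, transparency)
APPROACHES = [
    ("Global Schema Integration", "Tight", "Low", "High"),
    ("Federated Databases", "Medium", "Medium", "Medium"),
    ("Multidatabase Languages", "Low", "High", "Low"),
]


def column(index):
    """Return one criterion's ratings as numbers, in trend order."""
    return [SCALE[row[index]] for row in APPROACHES]


coupling, adaptability, transparency = column(1), column(2), column(3)
```

Coupling and transparency fall along the list while adaptability rises, matching the summary.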

Further Reading

• Elmagarmid, Ahmed; Rusinkiewicz, Marek; Sheth, Amit. Management of Heterogeneous and Autonomous Database Systems. Morgan Kaufmann, 1999.