COMP30311: Database Programming: Architectures and Issues Norman Paton University of Manchester [email protected]

Upload: robert-higgins

Post on 27-Dec-2015


COMP30311: Database Programming: Architectures and Issues

Norman Paton, University of Manchester

[email protected]

Observations

Databases are mostly accessed by applications:

Transaction processing: many small query or update requests (e.g., flight booking, account management).

Analytical processing: more complex queries, but less frequent updates (e.g., management information systems).

In practice, databases are hardly ever accessed by users typing a query language at a prompt.

Client-Server Architecture

The client:

Runs the application.

Invokes requests on the database using the query/update language.

The server: Manages concurrency, caching, etc.

[Figure: Client-1 and Client-2 connect over the Network to the Server, which manages the Database.]

Client-Server Issues

The classical model has a thick client: Process flow. Business rules. Constraints. ...

The server is essentially a shared fact store.

Thin clients involve more central code: Process flow. Business rules. Constraints. ...

Database servers are able to be much more than fact stores.

Classical Relational Database

Clients encode most application functionality.

Clients are written using embedded SQL.

Calls to the database use SQL-92.

[Figure: the Client (C) sends SQL to the DBMS, which holds Tables and Views.]

Modern Relational Database

Application functionality is divided; there is no strict thin or thick client choice.

Clients are written using embedded languages.

The database has stored programs, using embedded languages or extensions to SQL.

[Figure: the Client (C) sends SQL+ and Calls to the DBMS, which holds Tables, Views, Triggers and Procedures.]

Multi-Tier Environment

[Figure: three layers. User/Application Layer (User Interface, Application, Application Library); Middleware Layer (Middleware, Application, Library); Database Server Layer (two Database Servers).]

Multi-Tier Environments

Greater flexibility, and thus potentially scalability.

Data-intensive tasks near the database.

Compute-intensive tasks in the middle layer or on the client.

Example of multi-tier platform:

A Web Browser interacts with a Web Server using CGI (Common Gateway Interface).

The Web Server runs a Java Servlet that interacts with a DBMS using JDBC.

Where Does SQL Fit in?

SQL acts as the API to the database (if relational).

Features of SQL: Standardised. Declarative. Flexible (Queries, Updates, Administration).

Problems with SQL:

Non-trivial to learn (not good for end users).

Poor for repetitive tasks (e.g., for manual data entry).

Of limited computational power (so used with other languages).

Programming Databases

Options include:

Embed query language in existing programming language (e.g., JDBC, SQLJ).

Extend query language with programming features (e.g., SQL-99, PL/SQL).

Extend programming language with database features (no current products?).

Map database constructs to programming constructs as in Object Databases and JDO (e.g., FastObjects, Objectivity)

Provide database components for programming environments (e.g., Delphi, ADO.NET).

Embedded SQL Example

    EXEC SQL BEGIN DECLARE SECTION;
    VARCHAR name[20];   // Data passed C <-> SQL
    VARCHAR type[20];   // Assumed declaration for the result; not shown on the slide
    EXEC SQL END DECLARE SECTION;

    EXEC SQL
        SELECT type INTO :type              // Single valued result
        FROM station WHERE name = :name;    // Parameter

    type.arr[type.len] = '\0';
    printf("%s\n", type.arr);               // Note String Format

JDBC Example

    Connection conn =
        DriverManager.getConnection(url, args[0], args[1]);
    Statement stmt = conn.createStatement();
    ResultSet rset = stmt.executeQuery("select T# from TRAIN");
    while (rset.next())
        System.out.println(rset.getString(1));

PL/SQL Example

    declare
        cursor c1 is
            select t# from train
            where source = 'Edinburgh';
    begin
        for ed_train in c1 loop
            insert into edinburgh values (ed_train.t#);
        end loop;
    end;

Multi-Language Environments

Where two languages are used together, a mapping is required between their type systems.

    SQL-92     C
    INTEGER    int
    VARCHAR    char*
    tuple      struct
    table      array
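The same kind of mapping is needed between SQL and Java. The following is a minimal sketch, assuming the conventional JDBC pairings of INTEGER with Integer, VARCHAR with String and DATE with java.sql.Date; the class and method names are hypothetical and only a few types are covered.

```java
import java.sql.Types;

class SqlTypeMapping {
    // Returns the Java class conventionally paired with a JDBC type code.
    static Class<?> javaTypeFor(int jdbcType) {
        switch (jdbcType) {
            case Types.INTEGER: return Integer.class;
            case Types.VARCHAR: return String.class;
            case Types.DATE:    return java.sql.Date.class;
            default:            return Object.class;  // not mapped in this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(javaTypeFor(Types.INTEGER).getName());
        System.out.println(javaTypeFor(Types.VARCHAR).getName());
    }
}
```

Tuples and tables have no such one-line mapping, which is where the mismatch discussed next begins.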

Impedance Mismatch

The problems encountered linking two independently developed languages are known as the impedance mismatch, which has two aspects:

A type system mismatch that affects programmer productivity.

An evaluation strategy mismatch that affects performance.

Type System Mismatch

Database types are not supported directly in the programming language, so, for example, relations may have to be mapped to iterators.

Programming language types are not supported directly in the database, and thus have to be mapped, for example, to relations for storage.

The programming language type checker cannot check the legality of embedded calls, leading to runtime errors.
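The last point can be illustrated without a database. In this sketch a row is modelled as a map from column names to untyped values (an assumption made for illustration; the class and helper names are hypothetical). Asking for a column under the wrong Java type compiles cleanly and only fails when run.

```java
import java.util.HashMap;
import java.util.Map;

class TypeMismatchDemo {
    // Reads a column as the type the caller asks for; checked only at runtime.
    static <T> T get(Map<String, Object> row, String column, Class<T> type) {
        return type.cast(row.get(column));  // ClassCastException on mismatch
    }

    // Returns true if reading thetime as a String fails at runtime.
    static boolean wrongTypeFailsAtRuntime() {
        Map<String, Object> row = new HashMap<>();
        row.put("name", "Edinburgh");
        row.put("thetime", 950);
        try {
            get(row, "thetime", String.class);  // compiles without complaint
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(wrongTypeFailsAtRuntime());  // prints true
    }
}
```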

Evaluation Strategy Mismatch

Database operations typically act on and return collections.

Programming language operations typically act on and return single values.

Query results may be computed in their entirety and cached before any access from the programming language.

The database may retrieve data that is never consumed by the programming language.
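The cost of this mismatch can be sketched by counting how many rows a stand-in "database" produces when the program only wants the first one. Everything here is hypothetical illustration code: an in-memory iterator plays the role of the query result.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class EvaluationStrategyDemo {
    static int rowsProduced = 0;

    // Stand-in for the database producing result rows one at a time.
    static Iterator<Integer> produceRows(final int total) {
        return new Iterator<Integer>() {
            int position = 0;
            public boolean hasNext() { return position < total; }
            public Integer next() { rowsProduced++; return position++; }
        };
    }

    // Eager: the whole result is computed and cached before first access.
    static int rowsProducedEagerly(int total) {
        rowsProduced = 0;
        List<Integer> cache = new ArrayList<>();
        produceRows(total).forEachRemaining(cache::add);
        cache.get(0);  // the program consumes only the first row
        return rowsProduced;
    }

    // Lazy: rows are produced only as the program asks for them.
    static int rowsProducedLazily(int total) {
        rowsProduced = 0;
        produceRows(total).next();  // the program consumes only the first row
        return rowsProduced;
    }

    public static void main(String[] args) {
        System.out.println(rowsProducedEagerly(1000));  // prints 1000
        System.out.println(rowsProducedLazily(1000));   // prints 1
    }
}
```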

Summary

There are many choices in database programming: Which technologies to use. Which architecture to use.

Many non-trivial decisions may significantly influence: System performance. Development and maintenance costs.

Further Reading

Oracle Database Application Developers Guide: Fundamentals [Chapter 1: Programmatic Environments].

JDBC: Programming Relational Databases from Java

Trains Database Schema

[Figure: schema diagram with entities Station, District, Visit, Train, Booking and Customer, linked by one-to-many (1 to *) relationships.]

See handout for the relational schema and example programs.

JDBC and SQLJ

There are two standard interfaces allowing relational databases to be accessed and manipulated from Java:

JDBC: A class library that allows dynamic SQL statements to be called from Java.

SQLJ: A preprocessor that allows static SQL statements to be embedded in Java.

JDBC is much more widely used.

JDBC

JDBC can be used in client applets or applications, or (in some database systems) for implementing server-side functionality.

JDBC involves no extensions to the syntax of Java.

The JDBC package is imported thus:

    import java.sql.*;

Specific database systems are accessed using vendor or third party drivers:

    DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

JDBC Database Interaction

[Figure: the Application uses the DriverManager, which manages drivers such as OracleDriver and mySQLDriver; a Connection creates Statement, PreparedStatement and CallableStatement objects, each of which yields a ResultSet.]

Connecting to a Database

Statements and transactions are associated with connections.

There are several ways of establishing a connection. An example is:

    String url = "jdbc:oracle:thin:@sr.cs.man.ac.uk:1526:teach";
    Connection conn = DriverManager.getConnection(url, username, password);

Connection URL

The URL is of the form:

    jdbc:oracle:<drivertype>:@<hoststring>

An example hoststring is:

    aardvark.cs.man.ac.uk:1526:teach

Different driver types use pure java or include native code, and use generic or custom network protocols.

In the above, 1526 is the port, and teach is the system identifier.
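To make the structure of the URL concrete, here is a small sketch that splits a URL of this shape into driver type, host, port and system identifier. The class is hypothetical and handles only the host:port:sid form used on these slides, not every form a driver may accept.

```java
class OracleUrl {
    final String driverType;
    final String host;
    final int port;
    final String sid;

    OracleUrl(String driverType, String host, int port, String sid) {
        this.driverType = driverType;
        this.host = host;
        this.port = port;
        this.sid = sid;
    }

    // Splits jdbc:oracle:<drivertype>:@<host>:<port>:<sid> into its parts.
    static OracleUrl parse(String url) {
        String prefix = "jdbc:oracle:";
        int at = url.indexOf(":@");
        if (!url.startsWith(prefix) || at < prefix.length())
            throw new IllegalArgumentException("Unexpected URL: " + url);
        String driverType = url.substring(prefix.length(), at);
        String[] parts = url.substring(at + 2).split(":");
        if (parts.length != 3)
            throw new IllegalArgumentException("Expected host:port:sid in " + url);
        return new OracleUrl(driverType, parts[0], Integer.parseInt(parts[1]), parts[2]);
    }

    public static void main(String[] args) {
        OracleUrl u = parse("jdbc:oracle:thin:@sr.cs.man.ac.uk:1526:teach");
        System.out.println(u.driverType + " " + u.host + " " + u.port + " " + u.sid);
    }
}
```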

Single Slide Example

    import java.sql.*;

    class Trains {
        public static void main(String args[]) throws SQLException {
            DriverManager.registerDriver(...);
            String url = "...";
            Connection conn =
                DriverManager.getConnection(url, args[0], args[1]);
            Statement stmt = conn.createStatement();
            ResultSet rset = stmt.executeQuery("select T# from TRAIN");
            while (rset.next())
                System.out.println(rset.getString(1));
        }
    }

Statements

Queries are run against the database through the creation and execution of statements:

    Statement stmt = conn.createStatement();
    ResultSet rset = stmt.executeQuery("select T# from TRAIN");

Note that the query is a String, which could be constructed at runtime if required.

Note the potential for runtime errors if the query is invalid.

Query Results

The result of executing a query is a ResultSet, which supports:

Iterator functionality, through boolean next(), boolean previous().

Tuple access functionality, as described on the next slide.

Update functionality, for results from simple queries, through deleteRow(), updateXXX().

Control functionality, through setFetchSize(int rows).

Accessing Result Tuples

In JDBC there is no predefined Java type for the result of a query, so attribute values are retrieved by getXXX() functions, where XXX is the result type.

The argument to the function is either the column position (starting from 1) or its name.

Note the potential for runtime errors if the result is not as anticipated.

Prepared Statements - 1

A PreparedStatement object allows an SQL statement to be run multiple times, with different parameters, without the SQL being recompiled by the database.

Simple example:

    PreparedStatement pstmt = conn.prepareStatement(
        "select t# from train where source = ?");
    pstmt.clearParameters();
    pstmt.setString(1, args[2]);
    ResultSet rset = pstmt.executeQuery();

Prepared Statements - 2

Creating a prepared statement: formal parameters are identified by "?"s:

    PreparedStatement pstmt = conn.prepareStatement(
        "insert into booking values (?,?,?)");

Parameters are bound using setXXX(pos, val) (pos starts from 1):

    pstmt.setString(1, args[2]);

The request is executed using executeQuery() or executeUpdate().

Single Slide Update

    import java.sql.*;

    class MakeBooking {
        public static void main(String args[]) throws SQLException {
            DriverManager.registerDriver(...);
            String url = "...";
            Connection conn =
                DriverManager.getConnection(url, args[0], args[1]);
            PreparedStatement pstmt = conn.prepareStatement(
                "insert into booking values (?, ?, ?)");
            pstmt.clearParameters();
            pstmt.setString(1, args[2]);
            pstmt.setString(2, args[3]);
            pstmt.setDate(3, java.sql.Date.valueOf(args[4]));
            pstmt.executeUpdate();
        }
    }

Update Results

Statement and PreparedStatement objects can be associated with queries and updates (as strings).

The result types of the outputs are different, however, so separate ResultSet executeQuery() and int executeUpdate() methods are required.

Transactions

By default, each statement executes in a distinct transaction.

To group statements, where conn is a Connection, use:

conn.setAutoCommit(false) to override the single-statement default and start a transaction.

conn.commit() and conn.rollback() to complete a transaction.
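The calls above might be combined as in the following sketch. The helper inTransaction and the Work interface are hypothetical names, and so that the example runs without a database, a recording stand-in built with java.lang.reflect.Proxy replaces a real Connection; with a real driver only inTransaction would be needed.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class TransactionSketch {
    // A unit of work whose statements should share one transaction.
    interface Work { void run(Connection conn) throws SQLException; }

    // Runs the work in a single transaction: commit on success, rollback on failure.
    static void inTransaction(Connection conn, Work work) throws SQLException {
        conn.setAutoCommit(false);
        try {
            work.run(conn);
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }

    // Hypothetical stand-in for a real connection: records the names of calls made.
    static Connection recordingConnection(List<String> calls) {
        return (Connection) Proxy.newProxyInstance(
            TransactionSketch.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, methodArgs) -> { calls.add(method.getName()); return null; });
    }

    // Runs a succeeding or failing unit of work and returns the calls observed.
    static List<String> demo(boolean fail) {
        List<String> calls = new ArrayList<>();
        try {
            inTransaction(recordingConnection(calls), conn -> {
                if (fail) throw new SQLException("simulated failure");
            });
        } catch (SQLException e) {
            // the transaction has already been rolled back
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(demo(false));  // [setAutoCommit, commit]
        System.out.println(demo(true));   // [setAutoCommit, rollback]
    }
}
```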

Closing Things Down

The close() operation is supported on lots of things: Connection. Statement. ResultSet.

In all cases, close() reclaims resources; it is good practice to close all the above as soon as possible.

Handling Errors: Important!

    Connection conn = null;
    try {
        ...
    } catch (SQLException e) {
        System.out.println("SQL Exception: " + e.getMessage());
    } finally {
        if (conn != null) {
            try {
                conn.rollback();
                conn.close();
            } catch (SQLException sqlEx) {
                // ignore
            }
        }
    }

Summary

JDBC is the most widely used means of accessing relational databases from Java.

JDBC is a class library; there are no syntactic extensions to Java.

JDBC supports dynamic SQL (i.e., queries are strings), which is flexible but allows runtime type errors.

Impedance mismatches? See tutorial sheet.

Further Reading

Oracle 10g JDBC Developers Guide and Reference, 2001 [Chapter 1: Overview; Chapter 3: Basic Features].

Sun JDBC Tutorial: http://java.sun.com/docs/books/tutorial/jdbc/

Object Relational Extensions to SQL

Data Model History

[Figure: timeline from 1970 to 2005 showing the succession of data models: IMS, Network, Relational, Object, XML.]

Object-Relational Databases

Weaknesses of vanilla relational databases:

Limited data modelling facilities.

Limited application development facilities.

Object-relational databases aim to overcome these weaknesses.

"Object-Relational" is an umbrella term for assorted extensions.

Model: Abstract data types (cartridges, blades, ...). Object type extensions.

Programming: Programming language extensions to SQL. Active rules/triggers.

Object Relational Databases

These add to the relational model: Object types. Nested tables. References. Inheritance. Methods. Abstract data types.

The SQL:2003 standard covers all of the above; in what follows, examples are from Oracle 10g.

SQL:1999 and SQL:2003

The SQL-92 standard now characterises basic relational functionality (as taught in CS231).

SQL:1999 was the successor for object-relational databases, developed throughout the ’90s.

SQL:2003 refined the many extensions in SQL:1999 and started to add XML support.

SQL:1999/SQL:2003:

Are not uniformly adopted; many vendors have their own object-relational extensions developed since the early ’90s.

Cover model extensions, type extensions, programming extensions, triggers, etc.

Object-Relational in Oracle

Model:

Type system extensions to support object types, encapsulation, references.

Primitive type extensions as cartridges to support multimedia data, spatial data, etc.

Programming:

PL/SQL adds imperative programming to SQL.

Triggers allow PL/SQL programs to be executed reactively.

Relational Model and Types

Data type completeness: each type constructor can be applied uniformly to types in the type system.

In the basic relational model:

There is only one type constructor (i.e. relation).

That type constructor cannot be applied to itself.

Incorporating data type completeness to the relational model gives nested relations.

In addition, the type relation is essentially Bag < Tuple >.

Separating out these type constructors provides further flexibility, such as tuple-valued attributes.

Object Types in Oracle

An object type is a user-defined data type, somewhat analogous to a class in object-oriented programming.

Types can be arranged in hierarchies, and instances of types can be referenced.

    create type visit_type as object (
        name    varchar(20),   /* the station */
        thetime number);

Nested Relations

Nested relations involve the storage of one relation as an attribute of another.

    create type visit_tab_type as table of visit_type;

    create table train (
        t#     varchar(10) not null,
        type   char(1) not null,
        visits visit_tab_type,
        primary key (t#))
    nested table visits store as visits_tab;

Populating Nested Tables

The name of the type can be used as a constructor for values of the type.

    update train
    set visits = visit_tab_type(
        visit_type('Edinburgh', 950),
        visit_type('Aberdeen', 720))
    where t# = '22403101';

Querying Nested Tables

Query operations such as unnesting allow access to the contents of a nested table.

The following query retrieves details of the trains that visit Inverness.

    select *
    from train t, table(t.visits) v
    where v.name = 'Inverness';
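What unnesting computes can be sketched in Java, where a nested relation corresponds to a list held inside each outer object. The classes below are hypothetical stand-ins for the train and visit types; the first sample row reuses the values from the update above, and the second row is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnnestSketch {
    static class Visit {
        final String name;
        final int thetime;
        Visit(String name, int thetime) { this.name = name; this.thetime = thetime; }
    }

    static class Train {
        final String tno;
        final List<Visit> visits;  // the nested relation
        Train(String tno, Visit... visits) { this.tno = tno; this.visits = Arrays.asList(visits); }
    }

    // Pairs each train with each of its visits (the unnest), then filters on station.
    static List<String> trainsVisiting(List<Train> trains, String station) {
        List<String> result = new ArrayList<>();
        for (Train t : trains)
            for (Visit v : t.visits)
                if (v.name.equals(station))
                    result.add(t.tno);
        return result;
    }

    static List<Train> sample() {
        return Arrays.asList(
            new Train("22403101", new Visit("Edinburgh", 950), new Visit("Aberdeen", 720)),
            new Train("T2", new Visit("Inverness", 1015)));  // hypothetical row
    }

    public static void main(String[] args) {
        System.out.println(trainsVisiting(sample(), "Inverness"));  // [T2]
    }
}
```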

Abstract Data Types

Abstract data types allow new primitive types to be added to a DBMS (a.k.a. data blades, cartridges).

These primitive types can be defined by (skilled) users or vendors.

Oracle-supplied cartridges include: Time. Text. Image. Spatial. Video.

Oracle Spatial Cartridge

The spatial cartridge provides a collection of new primitive types.

Supporting the Spatial Types

Operations:

Geometric (area, difference, …).

Topological:

Implementation:

The cartridge uses specialised index structures such as R-trees.

The optimiser knows the properties of the R-tree, and how it can be used to make queries faster.


Programming in SQL

The following PL/SQL program iterates through a query result.

    declare
        cursor c1 is
            select t# from train
            where source = 'Edinburgh' or dest = 'Edinburgh';
    begin
        for ed_train in c1 loop
            insert into edinburgh values (ed_train.t#);
        end loop;
    end;

Example Program: Comments-1

PL/SQL is a block structured language, with structure:

    [declare declarations]
    begin
        statements
        [exception handlers]
    end

There is no relation type in PL/SQL, so cursors iterate over query results.

Example Program: Comments-2

The for loop iterates over the result of the query associated with the cursor, fetching results one at a time.

Each tuple retrieved from the cursor has type:

    record(t# varchar(10))

The type of the variable ed_train is inferred.

PL/SQL: More Cursors/Loops

    declare
        cursor c1 is <as before>;
        ed_tno train.t#%type;
    begin
        open c1;
        loop
            fetch c1 into ed_tno;
            exit when c1%notfound;
            insert into edinburgh values (ed_tno);
        end loop;
        close c1;
    end;

Loop Example: Comments

The declare section can introduce new cursors, types or variables.

Variables and cursors have attributes, such as %type, %rowtype and %notfound for accessing properties.

The cursor is explicitly opened, closed and fetched from (in contrast with the previous example).

The loop construct can mimic classical while-do and repeat-until loops.

Declaring Types

Types can be declared explicitly:

As a choice, even if in the database.

If there is no direct analogue in the database.

Other than records, there are object types and lookup tables.

    declare
        type ed_train_type is
            record (t# varchar(10), thetime number);
        ed_train ed_train_type;

Collection Types

Collections tend to be important in databases:

Persistent data types tend to be bulk data types (e.g. relations).

Operations on bulk data types tend to act on complete collections (e.g. there is no operation to update a tuple in SQL-92).

There are normally few built-in collection types in programming languages (e.g. array).

Collections are often provided in class libraries (e.g. java.util.Collection).

PL/SQL Collection Types

Declarations:

type name is table of type-name.

type name is varray (size-limit) of type-name.

type name is table of type-name index by binary_integer.

Unlike tables, varrays: Have a maximum size. Are dense, so elements cannot be deleted.

Oracle can store varrays and (non-indexed) tables in the database.

Stored varrays cannot be manipulated directly by SQL – they must be retrieved first.

Lots of curious rules...

Collection Type Example

    declare
        type ed_train_type is table of train.t#%type
            index by binary_integer;
        ed_table ed_train_type;
        i binary_integer := 0;
    begin
        for ed_train in c1 loop
            i := i + 1;
            ed_table(i) := ed_train.t#;
        end loop;
        ...
    end;

Stored Procedures/Functions

Oracle supports stored procedures, functions and packages.

Stored procedures can be called from each other, from triggers, from Java, from Web Services, ...

[Figure: the Client (C) sends SQL+ and Calls to the DBMS, which holds Tables, Views, Triggers and Procedures.]

Example Header

A procedure has no result type, whereas a function returns a result.

Function header from tutorial:

    function FastestTrain (src varchar, dst varchar)
    return varchar

The body of a function is a PL/SQL block.

Results are returned using return.

Calling PL/SQL from JDBC

    Connection conn = ...
    // Create a CallableStatement
    CallableStatement cstmt =
        conn.prepareCall("{? = call FastestTrain(?,?)}");
    // Set its two parameters and register its result
    cstmt.setString(2, args[2]);
    cstmt.setString(3, args[3]);
    cstmt.registerOutParameter(1, Types.VARCHAR);

    // Execute the statement and print its result
    cstmt.execute();
    System.out.println("Fastest = " + cstmt.getString(1));

Summary

Claims for programming language extensions:

Reduces impedance mismatches.

Improves [Portfolio 01]: Performance. Programmer productivity. Portability. Security.

The reality:

SQL extensions are often not elegant.

They are widely used.

They are not portable across products.

Performance always has many facets.

Further Reading

Oracle 10g PL/SQL User Guide and Reference [Chapter 1: Overview].

Oracle 9i PL/SQL User Guide and Reference [Appendix 1: Example Programs].

M. Piattini, O Diaz (eds), Advanced Database Technology and Design, Artech Press, 2000 [Chapter 6: Object-Relational Database Systems].

M. Stonebraker, P. Brown, Object-Relational DBMSs, 2nd Edition, Morgan-Kaufmann, 1999.

Programming Language Extensions to SQL: Triggers

Triggers

An active database is one that can respond automatically to events.

The events to which a database may want to react are mostly within the database, but could in principle be outside.

Most relational products support active behaviour, and it is in SQL:2003.

Active behaviour is expressed using rules containing an event, an (optional) condition, and an action, a.k.a. ECA-rules.

These active rules are known as triggers in relational products and SQL:2003.
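The event-condition-action structure can be sketched in a few lines of Java. This is an illustrative model only, not how any database implements triggers: the Rule class, the string event names and the integer payload are all assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class EcaSketch {
    // One rule: run 'action' when 'event' is signalled and 'condition' holds.
    static class Rule {
        final String event;
        final Predicate<Integer> condition;
        final Consumer<Integer> action;
        Rule(String event, Predicate<Integer> condition, Consumer<Integer> action) {
            this.event = event;
            this.condition = condition;
            this.action = action;
        }
    }

    final List<Rule> rules = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    // Signals an event; every matching rule whose condition holds is fired.
    void signal(String event, int value) {
        for (Rule r : rules)
            if (r.event.equals(event) && r.condition.test(value))
                r.action.accept(value);
    }

    static List<String> demo() {
        EcaSketch db = new EcaSketch();
        db.rules.add(new Rule("insert", age -> age < 21,
                              age -> db.log.add("alert: age " + age)));
        db.signal("insert", 18);  // event matches, condition holds: action runs
        db.signal("insert", 30);  // event matches, condition fails
        db.signal("delete", 18);  // event does not match
        return db.log;
    }

    public static void main(String[] args) {
        System.out.println(demo());  // [alert: age 18]
    }
}
```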

Applications of Triggers

Extending built-in behaviours: integrity constraints. auditing. authorisation. statistics. data derivation.

Triggers are thus generic mechanisms, powerful, but often harder to use than the built-in behaviour.

Supporting application functionality:

Alerters – the user is informed when something significant happens.

Business rules – an organisational behaviour is enforced or carried out as a reaction to database changes.

Business Rules

Recovering business rules indicate how the organisation recovers from a problem.

Example: too many people have enrolled on a seminar for the space allocated.

Reaction: book larger room, run two seminars in parallel, ...

Causal business rules bring about a behaviour when a condition is satisfied.

Example: enough people enrol for a seminar to make it viable.

Reaction: book a room, inform potential attendees, inform tutor.

Trigger Structure

Oracle trigger syntax:

    create or replace trigger name
    event
    [for each row]
    [when condition]
    action

In Oracle:

The condition is a boolean expression (that does not access the database).

The action is a PL/SQL block.

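Rule Triggering

Rulebase:

    R1: on U1 when C1 do U2, U4
    R2: on U2 when C2 do U3

[Figure: a transaction performs U0, U1, ...; U1 triggers R1, which (if C1 holds) performs U2 and U4; U2 in turn triggers R2, which (if C2 holds) performs U3.]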

Trigger Concepts - 1

Transition granularity. A rule may trigger:

once per tuple change (row transition granularity).

once per update statement (statement transition granularity).

Coupling mode. A rule condition may evaluate:

as soon as the event has taken place (immediate coupling mode).

at some time after the event took place (deferred coupling mode).
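The difference between the two coupling modes shows up in the order of processing. The sketch below is hypothetical illustration code; it assumes that deferred rules are processed at commit time, which is one common choice for "some time after the event".

```java
import java.util.ArrayList;
import java.util.List;

class CouplingModeSketch {
    final List<String> log = new ArrayList<>();
    final List<Runnable> deferred = new ArrayList<>();

    // Immediate coupling: the rule is processed as part of the update itself.
    void updateImmediate(String row) {
        log.add("update " + row);
        log.add("rule for " + row);
    }

    // Deferred coupling: the rule is queued and processed later (here, at commit).
    void updateDeferred(String row) {
        log.add("update " + row);
        deferred.add(() -> log.add("rule for " + row));
    }

    void commit() {
        deferred.forEach(Runnable::run);
        deferred.clear();
        log.add("commit");
    }

    static List<String> demo(boolean immediate) {
        CouplingModeSketch tx = new CouplingModeSketch();
        for (String row : new String[] { "r1", "r2" }) {
            if (immediate) tx.updateImmediate(row); else tx.updateDeferred(row);
        }
        tx.commit();
        return tx.log;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));   // rules interleaved with the updates
        System.out.println(demo(false));  // rules run together just before commit
    }
}
```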

Trigger Concepts - 2

Priorities:

A single event may trigger multiple rules.

A collection of deferred rules may be triggered at the same time by different events.

Priorities may be: unspecified, relative, absolute, by creation date.

Event types:

A primitive event type is considered an atomic happening (e.g., the update to a tuple, a time of day).

A composite event type is based on an algebra over primitive events (e.g., E1 OR E2, E1 AND E2, ...).

Oracle Triggers

Transition granularity: row triggers (FOR EACH ROW); statement triggers (no FOR EACH ROW).

Coupling mode: immediate.

Priorities: unspecified.

Event types: primitive (DML, DDL and system (e.g. startup/shutdown)); composite (but only OR).

DML Events

Follow database updates:

    [BEFORE|AFTER] INSERT ON table
    [BEFORE|AFTER] DELETE ON table
    [BEFORE|AFTER] UPDATE ON table
    [BEFORE|AFTER] UPDATE OF column ON table

Plus disjunction, e.g.:

    BEFORE INSERT OR UPDATE ON visit

Condition

Row triggers can have conditions that guard the action.

The condition is a boolean expression (AND, OR, NOT, >, <, ...).

The condition refers to literals and to event properties through correlation variables, e.g.:

WHEN new.age < 21.

Correlation variables available depend on event types:

    Event     new   old
    INSERT    Y     N
    DELETE    N     Y
    UPDATE    Y     Y

Action

An action is a PL/SQL block. An action:

can refer to correlation variables, as :new, :old (row triggers only).

can test the type of event being reacted to using inserting, updating, deleting.

cannot use transaction control commands directly (but can raise exceptions).

Trigger Design Issues

Termination: Triggers can trigger each other recursively, which may lead to cycles that never terminate (or that are cut off at a threshold, as in Oracle).

Confluence: The (arbitrary) order of selection for multiple triggered rules may lead to unanticipated behaviour.

Mutating tables: in Oracle, a row trigger cannot modify a table in mid-update.

    create or replace trigger t9
    before insert on visit
    for each row
    begin
        delete from visit where t# = :new.t#;
    end;
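Returning to the termination issue, a threshold on cascade depth can be sketched as follows. Two hypothetical rules, A and B, each raise the event that fires the other; the limit of 32 is an assumption chosen for illustration, not a documented figure.

```java
class CascadeLimitSketch {
    static final int MAX_DEPTH = 32;  // hypothetical threshold, for illustration only
    static int firings = 0;

    // Rule A's action triggers rule B and vice versa, so the cascade is a cycle.
    static void fire(String rule, int depth) {
        if (depth > MAX_DEPTH)
            throw new IllegalStateException("cascade too deep at rule " + rule);
        firings++;
        fire(rule.equals("A") ? "B" : "A", depth + 1);
    }

    // Returns true if the cycle is cut off after exactly MAX_DEPTH firings.
    static boolean cycleIsCutOff() {
        firings = 0;
        try {
            fire("A", 1);
            return false;  // unreachable: the cycle never ends by itself
        } catch (IllegalStateException e) {
            return firings == MAX_DEPTH;
        }
    }

    public static void main(String[] args) {
        System.out.println(cycleIsCutOff());  // prints true
    }
}
```

A threshold turns non-termination into an error, but it does not tell the developer which rule in the cycle is at fault.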

Example Triggers

Requirement: maintain a table numBookings of the numbers of bookings of each train on each date.

Events to monitor: insert on booking. delete on booking. update of t# on booking. update of date on booking.

    create table numBookings (
        t#      varchar(10) references train(t#),
        thedate date,
        num     number,
        primary key (t#, thedate));

Insert Case

    create or replace trigger numBookings2
    after insert on booking
    for each row
    declare
        numPresent integer;
    begin
        select count(*) into numPresent
        from numBookings
        where t# = :new.t# and thedate = :new.thedate;
        if (numPresent = 0) then
            insert into numBookings
            values (:new.t#, :new.thedate, 1);
        else
            update numBookings
            set num = num + 1
            where t# = :new.t# and thedate = :new.thedate;
        end if;
    end;

Comments on Insert Case

An AFTER event is used, as numBookings should only be updated if booking actually changed.

No condition is used, as the action must be carried out for every insert on booking.

Creates a numBookings tuple if none was present before (corresponding delete action should remove if no bookings remain).

Delete Case

    create or replace trigger numBookings1
    after delete on booking
    for each row
    declare
        currentNumber integer;
    begin
        select num into currentNumber
        from numBookings
        where t# = :old.t# and thedate = :old.thedate;
        if (currentNumber = 1) then
            delete from numBookings
            where t# = :old.t# and thedate = :old.thedate;
        else
            update numBookings
            set num = num - 1
            where t# = :old.t# and thedate = :old.thedate;
        end if;
    end;

Comments on Delete Case

Broadly the inverse of the insert case.

Many references to the :old correlation variable (cf. :new for the insert case).

The update case is broadly a delete then insert; see tutorial.

This problem can also be addressed using statement triggers; see tutorial.
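The bookkeeping performed by the two row triggers can be mirrored in plain Java, which may help when checking the logic by hand. This is a sketch under stated assumptions: a map stands in for the numBookings table, the key format and method names are hypothetical, and the update case is modelled simply as a delete followed by an insert.

```java
import java.util.HashMap;
import java.util.Map;

class NumBookingsSketch {
    // Key "t#|date" -> number of bookings; stands in for the numBookings table.
    final Map<String, Integer> numBookings = new HashMap<>();

    static String key(String tno, String date) { return tno + "|" + date; }

    // Mirrors the insert trigger: create the row with 1, otherwise increment.
    void afterInsert(String tno, String date) {
        numBookings.merge(key(tno, date), 1, Integer::sum);
    }

    // Mirrors the delete trigger: remove the row at 1, otherwise decrement.
    void afterDelete(String tno, String date) {
        numBookings.computeIfPresent(key(tno, date), (k, n) -> n == 1 ? null : n - 1);
    }

    // Update of t# or date: delete the old booking's count, insert the new one's.
    void afterUpdate(String oldTno, String oldDate, String newTno, String newDate) {
        afterDelete(oldTno, oldDate);
        afterInsert(newTno, newDate);
    }

    public static void main(String[] args) {
        NumBookingsSketch s = new NumBookingsSketch();
        s.afterInsert("22403101", "2005-01-10");
        s.afterInsert("22403101", "2005-01-10");
        s.afterDelete("22403101", "2005-01-10");
        System.out.println(s.numBookings);  // one row with count 1
    }
}
```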

Identifying Events

A single application functionality may need to monitor many events.

Example:

Tables: emp(ename,bname,sal), boss(bname,sal).

Constraint: no employee is paid more than his/her boss.

Quiz: what events may invalidate the constraint?

NEW on ??? UPDATE on ??? UPDATE on ??? UPDATE on ???.

Choosing Reactions

Many reactions may be plausible, for example, to restore a constraint.

Different policies may be used in responding to different events.

Different policies: For example, may change boss if boss.sal reduced, but raise salary of boss to match increase in employee’s salary.

Quiz: what reactions could be used to resatisfy the constraint?

Possible reactions: Decrease ??? Increase ??? Change ??? Delete ??? Delete ???

Selecting Transition Granularity

Tuple:

Access available to correlation variables.

Precise response to specific changes possible.

Often need many triggers to handle fine grained reactions.

Statement:

No access to correlation variables.

No possibility of precise response to changes.

Often need fewer triggers as generic reaction not very fine grained.

Summary on Triggers

Triggers:

Extend the ways in which programming functionality can be stored in the database.

Extend built-in facilities for integrity, security, etc.

Are powerful ... but not always easy to develop or maintain.

Further Reading

Oracle 10g Database Concepts [Chapter 22: Triggers].

Oracle 10g Application Developers Guide [Chapter 9: Using Triggers].

M. Piattini, O Diaz (eds), Advanced Database Technology and Design, Artech Press, 2000 [Chapter 3: Active Databases].