cs240a: databases and knowledge bases temporal applications and sql carlo zaniolo department of...

25
CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Upload: janice-gaines

Post on 18-Jan-2018

222 views

Category:

Documents


0 download

DESCRIPTION

Applications Abound: Examples  Academic: Transcripts record courses taken in previous and the current semester or term and grades for previous courses  Accounting: What bills were sent out and when, what payments were received and when?  Delinquent accounts, cash flow over time  Money­management software such as Quickencan show e.g., account balance over time.  Budgets: Previous and projected budgets, multi­ quarter or multi­year budgets

TRANSCRIPT

Page 1: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

CS240A: Databases and Knowledge Bases

Temporal Applications and SQL

Carlo ZanioloDepartment of Computer ScienceUniversity of California, Los Angeles

Page 2: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal Databases

The problem is harder than what you think A time ontology—Temporal data type in SQL Temporal Applications and SQL

Plenty of temporally-oriented DB applicationsNot supported well by SQLMany research approaches proposed to solve the

problem TSQL2 The physical level: efficient storage and indexing

techniques.

Page 3: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Applications Abound: Examples

Academic: Transcripts record courses taken in previous and the current semester or term and grades for previous courses

Accounting: What bills were sent out and when, what payments were received and when?Delinquent accounts, cash flow over timeMoney management software such as Quickencan show

e.g., account balance over time. Budgets: Previous and projected budgets, multi

quarter or multi year budgets

Page 4: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal DB Applications (cont.)

Data Warehousing: Historical trend analysis for decision support

Financial: Stock market data Audit: why were financial decisions made, and

with what information available? GIS: Geographic Information Systems ()

Land use over time: boundary of parcels changeover time, as parcels get partitioned and merged.

Title searches Insurance: Which policy was in effect at each

point in time, and what time periods did that policy cover?

Page 5: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal DB Applications (cont.)

Medical records: Patient records, drug regimes, lab tests.Tracking course of disease

Payroll: Past employees, employee salary history, salaries for future months, records of withholdingrequested by employees

Capacity planning for roads and utilities. Configuring new routes, ensuring high utilization

Project scheduling: Milestones, task assignments Reservation systems: airlines, hotels, trains. Scientific: Timestamping satellite images. Dating

archeological finds

Page 6: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal DBs Applications: Conclusion

It is difficult to identify applications that do not involve the management of temporal data.

These applications would benefit from built in temporal support in the DBMS. Main benefits:More efficient application developmentPotential increase in performance

Page 7: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

A Case Study on using SQL for temporal Apps

University of Arizona's Office of Appointed Personnel has some information in a database.

Employee(Name, Salary, Title) The OAP wishes to add the date of birth

Employee(Name, Salary, Title, DateofBirth DATE)

SELECT Salary, DateofBirth FROM Employee

WHERE Name = 'Bob‘ Finding an employee's DoB is as easy as finding his/her

salary. Managing Dates and instants in time are not a problem---

the hard problems come with periods (a.k.a. intervals)

Page 8: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Converting to a Temporal Database

Now the OAP wishes to computerize the employment history.

Adding validity periods to tuples:

Employee (Name, Salary, Title, DateofBirth, Start DATE, Stop DATE)

Page 9: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

A Temporal Database using Periods

Employee (Name, Salary, Title, DateofBirth,Start DATE, Stop DATE)

Name Salary Title DateofBirth Start Stop Bob 60000

AssistantProvost

1945 04 19 1993 01 01 1993 06 01 Bob 70000 AssistantProvost 1945 04 19 1993 06 01 1993 10 01 Bob 70000 Provost 1945 04 19 1993 10 01 1994 02 01 Bob 70000 Professor 1945 04 19 1994 02 01 1995 01 01

Here we use closed intervals—intervals open to the right are also used often: if the new period begin at 1993-10-01 the old one must end at 1993-09-30.

Page 10: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal Representations

Temporal Intervals (i.e., periods) are the most popular representations for time. For a short overview of the topic see: www.ise.bgu.ac.il/courses/trp/TR-in-CS-and-AI.revised.ppt

Operators and predicates associated with periods include: Overlap, Contains, Meets, Precedes, and Follows

Several other representations have been used including: Sets of Periods, Point-Based Representations TSQL2: the model is set of periods where each period is a

set of cronons--but there is no direct access to those: there manipulation follows implicitly from special constructs.

Page 11: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Allen’s 13 Temporal Predicates on PeriodsA

B

A

B

A

B

A

B

A

B

A

B

AB

A FINISHES B

B is FINISHED by A

A is BEFORE B

B is AFTER A

A MEETS B

B is MET by A

A OVERLAPS B

B is OVERLAPPED by A

A STARTS B

B is STARTED by A

A is EQUAL to B

B is EQUAL to A

A DURING B

B CONTAINS A

Page 12: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Period-based temporal representations are simple in SQL

But temporal queries are quite complex.. E.g., Find the employee's salary at a given time: e.g. the

current one:

SELECT SalaryFROM EmployeeWHERE Name = 'Bob‘AND Start <= CURRENT_TIMESTAMPAND CURRENT_TIMESTAMP <= Stop

Instead of CURRENT_TIMESTAMP we could have given any timestamp or date

Page 13: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Distributing the Salary History

OAP wants to distribute to all employees their salary history. For Bob 3 consecutive periods at 70000.

In general: employee could have arbitrarily many title changes between salary changes

Canonical representation: maximal intervals at each salary

Name Salary Start StopBob 60000 1993 01 01 1993 06 01Bob 70000 1993 06 01 1995 01 01

A complex operation called coalescing is needed to compute the maximal intervals—this must be computed after each projection.

Page 14: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Coalescing Using Embedded SQL

The coalescing operations that are needed after each projection operation are difficult to express in pure SQL.Coalescing Using Embedded SQL: Use SQL only to open a cursor on the table and perform the actual coalescing in a programming language.Coalescing Using a 4GL: solution given in the textbook (1995) is obsolete: use recursive queries in SQL1999/2003

Page 15: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

A Better Alternative Recursion

Use recursion. It two periods overlap merge them into a new

period.Overlap:Not(E1<S2 or E2 <S1)= E1>=S2 and E2>=S1.

Page 16: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Surprise: Non-Recursive SQL

CREATE TABLE Temp(Salary, Start, Stop) AS SELECT Salary, Start, Stop FROM Employee WHERE Name = 'Bob';

SELECT DISTINCT F.Salary, F.Start, L.StopFROM Temp AS F, Temp AS LWHERE F.Start < L.StopAND F.Salary = L.SalaryAND NOT EXISTS (SELECT * FROM Temp AS MWHERE M.Salary = F.Salary AND F.Start < M.StartAND M.Start < L.StopAND NOT EXISTS (SELECT *FROM Temp AS T1WHERE T1.Salary = F.Salary AND T1.Start < M.StartAND M.Start <= T1.Stop))AND NOT EXISTS (SELECT *FROM Temp AS T2WHERE T2.Salary = F.Salary AND ( (T2.Start < F.Start AND F.Start <= T2.Stop) OR (T2.Start < L.Stop AND L.Stop < T2.Stop)))

Page 17: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

The Curse of Coalescing

Maximal periods is the standard state-based representation

Projection is very common in queries and was no-op in SQL: but now requires coalescing

Expressing coalescing in SQL is possible but complex

A better solution could be to introduce a special operator—which is basically an aggregate, and aggregates in SQL require a particular structure.

Question: Can we find representations that eliminate or minimize the need for coalescing?

Page 18: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Minimize the need for coalescing by Reorganizing the schema

Separate Salary, Title, and DateofBirth information:Employee1 (Name, Salary, Start DATE, Stop DATE)

Employee2 (Name, Title, Start DATE, S top DATE)

Getting the salary information is now easy: SELECT Salary, Start, Stop

FROM Employee1WHERE Name = 'Bob‘

But what if we want a table with both salary and title?

Page 19: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal Joins

Name Salary Start StopBob 60000 1993 01 01 1993 06 01Bob 70000 1993 06 01 1995 01 01

Name Title Start StopBob AssistantProvost 1993 01 01 1993 10 01Bob Provost 1993 10 01 1994 02 01Bob FullProfessor 1994 02 01 1995 01 01

Name Salary Title Start StopBob 60000 AssistantProvost 1993 01 01 1993 06 01Bob 70000 AssistantProvost 1993 06 01 1993 10 01Bob 70000 Provost 1993 10 01 1994 02 01Bob 70000 FullProfessor 1994 02 01 1995 01 01

Their Temporal Join:

Employee1:

Employee2:

Page 20: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Temporal Join in SQLSELECT E1.Name, Salary, Title,

E1.Start, E1.StopFROM Employee1 AS E1,

Employee2 AS E2WHERE E1.Name=E2.Name AND

E2.Start <= E1.Start AND E1.Stop <= E2.Stop

UNION ALLSELECT E1.Name, Salary, Title,

E1.Start, E2.StopFROM Employee1 AS E1,

Employee2 AS E2WHERE E1.Name = E2.Name AND

E1.Start > E2.Start AND E2.Stop< E1.Stop AND E1.Start < E2.Stop

UNION ALLSELECT E1.Name, Salary, TitleE2.Start, E1.Stop

FROM Employee1 AS E1, Employee2 AS E2

WHERE E1.Name = E2.Name AND E2.Start > E1.Start AND E1.Stop <= E2.Stop AND E2.Start < E1.Stop

UNION ALL SELECT E1.Name, Salary, Title

E2.Start, E2.Stop FROM Employee1 AS E1, Employee2 AS E2

WHERE E1.Name = E2 Name AND E2.Start => E1.Start AND E2.Stop <= E1.Stop AND NOT (E1.Start = E2.Start AND E1.Stop = E2.Stop)

Page 21: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Simpler Temporal Join in SQL

Overlap: E1>=S2 and E2>=S1What is the actual intersection? larger(X, Y, X) <- X>=Y. larger(X, Y, Y) <- X < Y.Symmetrically for smaller. You can also use if… then else in Deals, or CASE in SQLNow:

intersection(B, E)<- period(S1, E1), period(S2, E2), larger(B1, B2, B), smaller(E1, E2, E), B<E.

Page 22: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Summary

Coalescing and temporal joins are very difficult to express in SQL.

Solutions proposed …Time stamp attributes rather than tuples—but then

many temporal joins must be used Point-Based RepresentationOthers, including combinations of above (more than 40

counted that were using SQL)We will discuss TSQL2 and XML later.

Page 23: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Reviewing the Situation

The importance of temporal applications has motivated much research on temporal DBs: but no satisfactory solution has been found yet: SQL does not support temporal queries well Temporal DBs remain an open research problem.

The problem is much more difficult than it appears at first: we have become so familiar with the time domain that we tend to overlook its intrinsic complexity.

Other issues that we have not discussed yet include: Support for bitemporal models temporal clustering, indexing and implementation-oriented

issues

Page 24: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

This was proposed by temporal DB people Point-Based Model

Employee1 (Name, Sal, Day ) Bob 60000 1993 01 01

… Bob 60000 1993 05 31 Bob 70000 1993 05 31

… Bob 70000 1994 12 31

To project out Sal you only need to eliminate that column.

Internally we need to use more concise representations—e.g., the period-based representations:

Name Salary Start StopBob 60000 1993 01 01 1993 06 01Bob 70000 1993 06 01 1995 01 01

Page 25: CS240A: Databases and Knowledge Bases Temporal Applications and SQL Carlo Zaniolo Department of Computer Science University of California, Los Angeles

Queries in Point-Based

No coalescing needed in the query: e.g., project out salary:SELECT E1.Name, E1.DayFROM Employee1 AS E1

Temporal Joins are simple: SELECT E1.Name, Sal, Title

FROM Employee1 AS E1, Employee2 AS E2WHERE E1.Name = E2.Name AND E1.Day=E2.Day

The point –based representation can only be used as a logical view. Since a very different internal representation must be used, support for queries can be a challenge. No serious takers so far.