Algorithms Using More Than Two Passes

Posted on 21-Dec-2015



Introduction

Relations of arbitrary size can be processed by using as many passes as necessary. Two families of algorithms do this:
• Multipass Sort-Based Algorithms
• Multipass Hash-Based Algorithms


Multipass Sort-Based Algorithms

Multipass sorting allows us to sort a relation, however large it may be.

Suppose M main-memory buffers are available to sort a relation R. Then:

Basis: If R fits in M blocks (i.e., B(R) ≤ M):
• Read R into main memory.
• Sort it with a main-memory sorting algorithm.
• Write the sorted relation to disk.


Induction: If R does not fit into main memory:
• Partition R into M groups R1, R2, …, RM.
• Recursively sort Ri for each i = 1, 2, …, M.
• Merge the M sorted sublists.

When merging the M sorted sublists, what is output depends on the operation:
• Duplicate elimination: output one copy of each distinct tuple and skip over the copies.
• Grouping: sort on the grouping attributes only, and combine the tuples with a given value of these grouping attributes in the manner appropriate to the aggregation.
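The basis/induction scheme above can be illustrated with a minimal Python sketch. It assumes a relation is an in-memory list of blocks (each block a list of tuples, here plain integers) and treats "fits in M buffers" as "has at most M blocks"; no real disk I/O happens. The names `multipass_sort` and `distinct` are hypothetical; the flag selects the duplicate-elimination variant of the merge.

```python
import heapq
from typing import List

Block = List[int]  # assumption: a block is a list of sortable tuples (ints for brevity)


def _to_blocks(tuples: List[int], block_size: int) -> List[Block]:
    """Pack a flat list of tuples back into blocks of at most block_size tuples."""
    return [tuples[i:i + block_size] for i in range(0, len(tuples), block_size)]


def multipass_sort(blocks: List[Block], M: int, block_size: int,
                   distinct: bool = False) -> List[Block]:
    """Recursive multipass sort of a relation stored as blocks, simulating M buffers.

    With distinct=True only one copy of each tuple is output (duplicate elimination).
    """
    if M < 2:
        raise ValueError("need at least 2 buffers to make progress")
    if len(blocks) <= M:
        # Basis: B(R) <= M, so read R into memory and sort it there
        tuples = sorted(t for b in blocks for t in b)
        if distinct:
            tuples = list(dict.fromkeys(tuples))  # keep one copy of each tuple, order preserved
        return _to_blocks(tuples, block_size)
    # Induction: partition R into at most M groups of (nearly) equal size
    group_size = -(-len(blocks) // M)  # ceiling division
    groups = [blocks[i:i + group_size] for i in range(0, len(blocks), group_size)]
    runs = [[t for b in multipass_sort(g, M, block_size, distinct) for t in b]
            for g in groups]
    # Merge the sorted sublists, conceptually one buffer per sublist
    out: List[int] = []
    for t in heapq.merge(*runs):
        if distinct and out and out[-1] == t:
            continue  # skip over copies of a tuple already output
        out.append(t)
    return _to_blocks(out, block_size)
```

Grouping would follow the same shape, with the merge step combining adjacent tuples that share grouping-attribute values instead of dropping copies.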


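Performance of Multipass Sort-Based Algorithms

Notation:
• $s$: size of the relation operated upon
• $M$: number of main-memory buffers
• $k$: number of passes
• $s(M,k)$: maximum size of a relation that can be sorted using $M$ buffers in $k$ passes

Computing $s(M,k)$:

Basis: If $k = 1$, only one pass is allowed, so the relation must fit in main memory: $s(M,1) = M$.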


Induction: If $k > 1$, partition R into M pieces, each of which must be sortable in $k-1$ passes. If $B(R) = s(M,k)$, then each of the M pieces of R has size $s(M,k)/M$, which cannot exceed $s(M,k-1)$; that is, $s(M,k) = M\,s(M,k-1)$. Unrolling the recursion:

$$s(M,k) = M\,s(M,k-1) = M^2\,s(M,k-2) = \cdots = M^{k-1}\,s(M,1)$$

Since $s(M,1) = M$, we get $s(M,k) = M^k$.

Minimum number of buffers: Using k passes, we can sort a relation R if $B(R) \le M^k$. Thus, if we want to sort R in k passes, the minimum number of buffers we can use is $M = B(R)^{1/k}$.

Disk I/Os: Each pass of a sorting algorithm reads the data from disk and writes it out again, so a k-pass sorting algorithm requires $2k\,B(R)$ disk I/Os. For a binary operation on R and S (such as a join), the last pass only reads the sorted sublists in order to merge them and does not write its output, so the result is a total of $(2k-1)\big(B(R) + B(S)\big)$ disk I/Os.
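The recurrence and the cost formulas can be checked numerically. This Python sketch (all function names are hypothetical) computes s(M,k) by the recursion, the smallest integer M with M^k ≥ B(R), and the two I/O counts quoted above.

```python
import math


def max_sortable_blocks(M: int, k: int) -> int:
    """s(M,k): the largest relation, in blocks, that M buffers can sort in k passes."""
    if k == 1:
        return M                                # basis: R must fit in main memory
    return M * max_sortable_blocks(M, k - 1)    # induction: M pieces, each sortable in k-1 passes


def min_buffers(B: int, k: int) -> int:
    """Smallest integer M with M**k >= B, i.e. B(R)**(1/k) rounded up."""
    M = max(1, math.ceil(B ** (1 / k)))
    # Floating-point roots can be slightly off, so correct M with exact integer arithmetic
    while M > 1 and (M - 1) ** k >= B:
        M -= 1
    while M ** k < B:
        M += 1
    return M


def sort_io(B_R: int, k: int) -> int:
    """Disk I/Os of a k-pass sort of R: every pass reads and writes all of R."""
    return 2 * k * B_R


def sort_binary_io(B_R: int, B_S: int, k: int) -> int:
    """Disk I/Os of a k-pass sort-based binary operation: the last pass only reads."""
    return (2 * k - 1) * (B_R + B_S)
```

For example, with M = 100 buffers, three passes suffice for relations of up to 10^6 blocks.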


Multipass Hash-Based Algorithms

A recursive approach to using hashing for operations on large relations:
• Hash the relation into M-1 buckets, where M is the number of available memory buffers.
• Unary operation: apply the operation to each bucket individually.
• Binary operation: apply the operation to each pair of corresponding buckets.

Recursive approach:

Basis:
Case 1: Unary operation. If the relation fits in M buffers, read it into memory and perform the operation.
Case 2: Binary operation. If either relation fits in M-1 buffers, read it into main memory, then read the second relation one block at a time into buffer number M and perform the operation.


Induction: If no relation fits in main memory:
• Hash each relation into M-1 buckets.
• Recursively perform the operation on each bucket or corresponding pair of buckets.
• Accumulate the output from each bucket or pair.

For the common relational operations we have considered (duplicate elimination, grouping, union, intersection, difference, natural join, and equijoin), the result of the operation on the entire relation is the union of the results on the individual buckets or pairs of buckets.
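A minimal Python sketch of the recursive hashing scheme for one unary operation (duplicate elimination) and one binary operation (equijoin). It assumes relations are in-memory lists of hashable tuples, measures size in blocks of `block_size` tuples, and salts Python's built-in `hash` with the recursion depth so that each pass splits tuples differently. The `max_depth` cutoff is an added assumption, not part of the scheme above: it stops the recursion when skew (for example, many tuples with the same key) keeps a bucket from shrinking.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def _n_blocks(rel: Sequence, block_size: int) -> int:
    return -(-len(rel) // block_size)  # ceiling division


def _bucket(value: Hashable, depth: int, n: int) -> int:
    # Salt the hash with the recursion depth so each pass splits tuples differently
    return hash((depth, value)) % n


def hash_distinct(R: Sequence[Hashable], M: int, block_size: int,
                  depth: int = 0, max_depth: int = 16) -> List[Hashable]:
    """Multipass hash-based duplicate elimination with M buffers."""
    if M < 3:
        raise ValueError("need M >= 3 so that there are at least 2 buckets")
    if _n_blocks(R, block_size) <= M or depth >= max_depth:
        # Basis: R fits in M buffers (or the skew guard fired): deduplicate in memory
        return list(dict.fromkeys(R))
    buckets: List[List[Hashable]] = [[] for _ in range(M - 1)]
    for t in R:
        buckets[_bucket(t, depth, M - 1)].append(t)
    # All copies of a tuple share a bucket, so the union of per-bucket results is the answer
    out: List[Hashable] = []
    for b in buckets:
        out.extend(hash_distinct(b, M, block_size, depth + 1, max_depth))
    return out


def hash_join(R: Sequence, S: Sequence, M: int, block_size: int,
              key_r: Callable, key_s: Callable,
              depth: int = 0, max_depth: int = 16) -> List[Tuple]:
    """Multipass hash-based equijoin of R and S on key_r(r) == key_s(s)."""
    if M < 3:
        raise ValueError("need M >= 3 so that there are at least 2 buckets")
    if min(_n_blocks(R, block_size), _n_blocks(S, block_size)) <= M - 1 or depth >= max_depth:
        # Basis: one relation fits in M-1 buffers; build a table on R and probe it with S
        # (for simplicity this always builds on R, not necessarily the smaller relation)
        table: dict = {}
        for r in R:
            table.setdefault(key_r(r), []).append(r)
        return [(r, s) for s in S for r in table.get(key_s(s), [])]
    n = M - 1
    R_buckets: List[List] = [[] for _ in range(n)]
    S_buckets: List[List] = [[] for _ in range(n)]
    for r in R:
        R_buckets[_bucket(key_r(r), depth, n)].append(r)
    for s in S:
        S_buckets[_bucket(key_s(s), depth, n)].append(s)
    # Join each pair of corresponding buckets and accumulate the output
    out: List[Tuple] = []
    for Rb, Sb in zip(R_buckets, S_buckets):
        out.extend(hash_join(Rb, Sb, M, block_size, key_r, key_s, depth + 1, max_depth))
    return out
```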


Performance of Multipass Hash-Based Algorithms

Assumption: The tuples divide as evenly as possible among the buckets. In practice, there will be some unevenness in the tuple distribution.

Case 1: Unary operation. Let $u(M,k)$ be the number of blocks in the largest relation that a k-pass hashing algorithm can handle.

Basis: If R fits into M buffers, $u(M,1) = M$.

Induction: If R does not fit into M buffers, the first pass divides R into $M-1$ buckets of equal size. The buckets for the next pass must be small enough that they can be handled in $k-1$ passes, i.e., each bucket has size at most $u(M,k-1)$. Since R is divided into $M-1$ buckets, we must have $u(M,k) = (M-1)\,u(M,k-1)$. Unrolling gives $u(M,k) = M(M-1)^{k-1}$, or approximately $M^k$ when M is large.


Case 2: Binary operation. Let $j(M,k)$ be an upper bound on the size of the smaller of the two relations R and S that a k-pass hashing algorithm can handle.

Basis: $j(M,1) = M-1$; i.e., if we use the one-pass algorithm to join, then either R or S must fit in $M-1$ blocks.

Induction: The first pass divides each relation into $M-1$ buckets, so each bucket is $1/(M-1)$ of the size of its relation. We must be able to join each pair of corresponding buckets in $k-1$ passes, so $j(M,k) = (M-1)\,j(M,k-1)$. Hence $j(M,k) = (M-1)^k$, or approximately $M^k$ when M is large.
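The two recurrences can be evaluated directly and compared against their closed forms; this Python sketch names the functions `u` and `j` after the notation above.

```python
def u(M: int, k: int) -> int:
    """u(M,k): largest relation (in blocks) a k-pass hash-based unary operation can handle."""
    if k == 1:
        return M                      # basis: R must fit in M buffers
    return (M - 1) * u(M, k - 1)      # induction: M-1 buckets, each handled in k-1 passes


def j(M: int, k: int) -> int:
    """j(M,k): largest size (in blocks) of the smaller relation for a k-pass binary operation."""
    if k == 1:
        return M - 1                  # basis: one relation must fit in M-1 buffers
    return (M - 1) * j(M, k - 1)      # induction: corresponding bucket pairs joined in k-1 passes
```

For M = 101 and k = 3, `u` gives 1,010,000 blocks and `j` gives 1,000,000 blocks, both close to M^k = 1,030,301.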


Wrappers in Mediator-Based Systems


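Introduction

[Figure: mediator architecture. A Query goes to the Mediator and a Result comes back; the Mediator reaches Source 1 and Source 2, each through its own Wrapper.]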


Wrapper

The wrapper (extractor) consists of:
• One or more predefined queries, based on the source (e.g., SQL queries or Web pages).
• A suitable communication mechanism for sending and receiving information to/from the source and the mediator, in order to:
(a) pass ad-hoc queries to the source,
(b) receive responses from the source, and
(c) pass information to the warehouse.


Templates for Query Patterns

A systematic way to design a wrapper that connects a mediator to a source is to classify the possible queries that the mediator can ask into templates, which are queries with parameters that represent constants. Designing a wrapper thus means building templates for all possible queries that the mediator can ask.

Mediator schema: AutosMed(serialNo, model, color, autoTrans, dealer)
Source schema: Cars(serialNo, model, color, autoTrans, navi, …)

Wrapper template describing queries for cars of a given color ($c), mapping the mediator's query to a source query:

SELECT * FROM AutosMed WHERE color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE color = '$c';

Templates needed: 2^n for n attributes, if the mediator may specify a constant for any subset of the attributes, so that all possible queries from the mediator are covered.
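To see where the 2^n count comes from, this Python sketch generates one source-query template per subset of constrained attributes. The attribute list, the `build_templates` name, and the placeholder style `$attr` (instead of `$c`/`$m`) are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

# Hypothetical: mediator attributes a query may fix to constants, and the fixed source SELECT list
ATTRS = ("model", "color", "autoTrans")
SOURCE_SELECT = "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars"


def build_templates(attrs: Sequence[str] = ATTRS) -> Dict[FrozenSet[str], str]:
    """One parameterized source query per subset of constrained attributes: 2**n in total."""
    templates: Dict[FrozenSet[str], str] = {}
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            where = " AND ".join(f"{a} = '${a}'" for a in subset)  # e.g. color = '$color'
            templates[frozenset(subset)] = SOURCE_SELECT + (f" WHERE {where}" if where else "") + ";"
    return templates
```

With three attributes it yields 2^3 = 8 templates, including the unconstrained query with no WHERE clause.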


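Wrapper Generators

The software that creates the wrapper is called a wrapper generator.

[Figure: Templates are input to the Wrapper Generator, which produces the Table used by the Driver inside the Wrapper; the Driver sends Queries to the Source and receives Results.]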


Wrapper Generators

Wrapper generator: creates a table that holds the various query patterns contained in the templates, together with the source queries associated with each of them.

The driver:
• Accepts a query from the mediator. The communication mechanism may be mediator-specific and is given to the driver as a "plug-in," so the same driver can be used in systems that communicate differently.
• Searches the table for a template that matches the query. If one is found, then the parameter values from the query are used to instantiate a source query. If there is no matching template, the wrapper responds negatively to the mediator.
• Sends the query to the source, again using a "plug-in" communication mechanism. The response is collected by the wrapper.
• Returns the response to the mediator.
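The driver's steps can be sketched in Python using the standard library's `string.Template`, whose `$name` placeholders resemble the `$c` parameters above. The table contents, the `drive` function, and the pattern representation (the set of attributes the query fixes) are hypothetical; a `None` return stands for the negative response to the mediator, and the `send_to_source` argument plays the role of the plug-in communication mechanism.

```python
from string import Template
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical table built by a wrapper generator. The key (query pattern) is the set of
# mediator attributes that a query fixes to constants; the value is the source-query template.
TABLE: Dict[FrozenSet[str], Template] = {
    frozenset({"color"}): Template(
        "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars "
        "WHERE color = '$color';"),
    frozenset({"model", "color"}): Template(
        "SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars "
        "WHERE model = '$model' AND color = '$color';"),
}

# A "plug-in" communication mechanism: any function that sends a source query and returns tuples
SendToSource = Callable[[str], List[tuple]]


def drive(query: Dict[str, str], send_to_source: SendToSource) -> Optional[List[tuple]]:
    """Handle one mediator query; None stands for a negative response to the mediator."""
    template = TABLE.get(frozenset(query))       # search the table for a matching template
    if template is None:
        return None                              # no match: respond negatively
    source_query = template.substitute(query)    # instantiate it with the query's constants
    return send_to_source(source_query)          # send it and collect the response
```

Splicing constants into SQL text is done here only for illustration; a real wrapper would need to escape them or use parameterized queries.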


Filters

Consider the car dealer's database. The wrapper template to get the cars of a given model and color is:

SELECT * FROM AutosMed WHERE model = '$m' AND color = '$c';
=>
SELECT serialNo, model, color, autoTrans, 'dealer1' FROM Cars WHERE model = '$m' AND color = '$c';

Another approach is to use a filter:
• The wrapper has a template that returns a superset of what the query wants.
• The returned tuples are filtered, and only the desired tuples are passed on.

Position of the filter component:
• At the wrapper
• At the mediator


Filters

To find the blue cars of model Ford:
• Use the template to extract the blue cars.
• Return the tuples to the mediator.
• Filter them at the mediator to get the Ford model cars.

Store the returned tuples in a temporary relation:
TempAutos(serialNo, model, color, autoTrans, dealer)

Filter by executing a local query:

SELECT * FROM TempAutos WHERE model = 'FORD';
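The mediator-side filter can be tried out with Python's built-in `sqlite3` module: load the blue cars returned by the wrapper into a temporary `TempAutos` table and run the local selection. The `filter_at_mediator` name and the in-memory database are assumptions for illustration.

```python
import sqlite3
from typing import List, Tuple

Car = Tuple[str, str, str, str, str]  # (serialNo, model, color, autoTrans, dealer)


def filter_at_mediator(blue_cars: List[Car], model: str) -> List[Car]:
    """Store the wrapper's answer in TempAutos, then filter it with a local query."""
    con = sqlite3.connect(":memory:")  # stands in for the mediator's local storage
    con.execute("CREATE TABLE TempAutos "
                "(serialNo TEXT, model TEXT, color TEXT, autoTrans TEXT, dealer TEXT)")
    con.executemany("INSERT INTO TempAutos VALUES (?, ?, ?, ?, ?)", blue_cars)
    rows = con.execute("SELECT * FROM TempAutos WHERE model = ?", (model,)).fetchall()
    con.close()
    return rows
```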


Other Operations at the Wrapper

It is possible to perform joins at the wrapper and transmit the result to the mediator.

Suppose the mediator is asked to find dealers and models such that the dealer has two red cars of the same model, one with and one without automatic transmission:

SELECT A1.model, A1.dealer
FROM AutosMed A1, AutosMed A2
WHERE A1.model = A2.model AND A1.color = 'red' AND A2.color = 'red'
  AND A1.autoTrans = 'no' AND A2.autoTrans = 'yes';

The wrapper can first obtain all the red cars:

SELECT * FROM AutosMed WHERE color = 'red';

and store them in a temporary relation:

RedAutos(serialNo, model, color, autoTrans, dealer)


Other Operations at the Wrapper

The wrapper then performs a join and the necessary selection:

SELECT DISTINCT A1.model, A1.dealer
FROM RedAutos A1, RedAutos A2
WHERE A1.model = A2.model AND A1.autoTrans = 'no' AND A2.autoTrans = 'yes';
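Similarly, a sketch of the wrapper-side join using `sqlite3`, assuming the red cars have already been fetched into a `RedAutos` table; `red_pairs` is a hypothetical name.

```python
import sqlite3
from typing import List, Tuple


def red_pairs(red_autos: List[Tuple[str, str, str, str, str]]) -> List[Tuple[str, str]]:
    """Self-join RedAutos to find (model, dealer) with red cars both without and with autoTrans."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE RedAutos "
                "(serialNo TEXT, model TEXT, color TEXT, autoTrans TEXT, dealer TEXT)")
    con.executemany("INSERT INTO RedAutos VALUES (?, ?, ?, ?, ?)", red_autos)
    rows = con.execute(
        "SELECT DISTINCT A1.model, A1.dealer "
        "FROM RedAutos A1, RedAutos A2 "
        "WHERE A1.model = A2.model AND A1.autoTrans = 'no' AND A2.autoTrans = 'yes'"
    ).fetchall()
    con.close()
    return sorted(rows)
```

DISTINCT matters here: without it, a dealer with several qualifying pairs of the same model would be reported more than once.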


Three-Phase Locking (3PL)

by Jasim Qazi [121]


Problems with 2PL

• In database and transaction processing, two-phase locking (2PL) is a concurrency control locking protocol, or mechanism, which guarantees serializability. It is also the name of the resulting class (set) of transaction schedules. Because it uses locks that block processes, 2PL may be subject to deadlocks that result from the mutual blocking of two or more transactions.

• 2PL has performance issues when dealing with frequently changing data in multi-user environments:
– Deadlocks: with multiple users, concurrency is a major issue, and as the number of data accesses grows, deadlocks occur. A deadlock occurs when several transactions are each forced by the scheduler to wait for a lock held by another transaction in the group, so that none of them can proceed.
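The defining property of 2PL (every lock request precedes the transaction's first unlock) can be checked mechanically. This Python sketch, with a hypothetical `obeys_2pl` function, takes a transaction as a sequence of (operation, item) actions in the Lock(A)/Read(A,a)/Unlock(A) notation used for the schedules in this section.

```python
from typing import List, Tuple

Action = Tuple[str, str]  # (operation, item), e.g. ("Lock", "A")


def obeys_2pl(actions: List[Action]) -> bool:
    """True if no lock is requested after the transaction's first unlock."""
    unlocked = False
    for op, _item in actions:
        if op == "Unlock":
            unlocked = True   # the shrinking phase has begun
        elif op == "Lock" and unlocked:
            return False      # a lock in the shrinking phase violates 2PL
    return True
```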


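Problems with 2PL (UPDATE)

Cascading rollbacks: in the schedule below, T1 writes A and releases its lock on A before committing, and T2 and then T3 read that value of A. If an error occurs in T1 after this point, T1 has to roll back and start over. Because T2 and T3 have read the value of A written by T1, they must also be rolled back until T1 executes correctly.

Schedule (each row lists one transaction's actions in order; T2 locks A after T1 unlocks it, and T3 after T2 does):
T1: Lock(A) Read(A,a) Lock(B) Read(B,b) Write(A,a) Unlock(A)
T2: Lock(A) Read(A,a) Write(A,a) Unlock(A)
T3: Lock(A) Read(A,a)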


3PL

• Not a necessity, and not used a lot.
• Procedures are available to solve 2PL's problems.
• Dependent on the nature of the data and the environment: changing data, many users.
• Types of locks used: Read, Write, Write Intent.


Scenario

A user signs in and requests data from the database.

• A Read lock is applied to the data.
• Once the data is read, the Read lock is dropped.
• There can be multiple Read locks by multiple users on the same data/tuple.


Scenario

When the user indicates that he or she wishes to edit the data, a WRITE-INTENT lock is taken out.

UPDATE: The Write Intent lock is also known as a Change lock or Protect lock.

• Other users can still obtain a Read lock on the data.
• A Write lock is allowed only to the user who holds the Write Intent lock.
• If data is locked with a Write Intent lock, no further Write Intent locks can be applied to it.


Scenario

• When the user finishes editing the data and submits the changes, a Write lock is immediately applied to the data.
• The Write Intent lock is released.
• The transaction is committed and the Write lock is released.
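The scenario's lock rules can be summarized as a small per-item lock table. This Python sketch (`ItemLocks` and its methods are hypothetical names) encodes only the rules stated on these slides; it omits timeouts, and it adds one labeled assumption: a Read lock is refused while another user holds the Write lock.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ItemLocks:
    """Lock state of one data item in the 3PL scenario (sketch; timeouts omitted)."""
    readers: Set[str] = field(default_factory=set)
    write_intent: Optional[str] = None
    writer: Optional[str] = None

    def read(self, user: str) -> bool:
        # Many Read locks may coexist, even alongside a Write Intent lock.
        # Assumption (not stated in the slides): reads are refused while another user writes.
        if self.writer is not None and self.writer != user:
            return False
        self.readers.add(user)
        return True

    def release_read(self, user: str) -> None:
        self.readers.discard(user)        # a Read lock is dropped once the data is read

    def intend_write(self, user: str) -> bool:
        # Only one Write Intent lock may be held on the item at a time
        if self.write_intent is not None and self.write_intent != user:
            return False
        self.write_intent = user
        return True

    def write(self, user: str) -> bool:
        # A Write lock is granted only to the holder of the Write Intent lock
        if self.write_intent != user:
            return False
        self.writer = user
        self.write_intent = None          # the Write Intent lock is released after the Write lock
        return True

    def commit(self, user: str) -> None:
        if self.writer == user:
            self.writer = None            # committing releases the Write lock
```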


3PL

All locks are subject to timeouts, with appropriate actions (unlocks, error/warning messages to the user, etc.) taken in the event of a lock failure. This prevents deadlocks from blocking transactions indefinitely.

All access to the data in question should use the same locking protocol. No other protocol should be applied, as it may disrupt the flow of operations.

UPDATE: Only WRITE-INTENT locks can be held for any length of time, typically because the record of interest is at the mercy of the user in edit mode.


Final words…

• This is actually not dissimilar to the way many "modern" relational databases handle locking.

• The average RDBMS can't support the degree of control involved here

• 3PL developed out of need.


Secondary Storage Management


Submitted by: Sathya Anandan (ID: 123)


Topics Covered in this Presentation:

The Memory Hierarchy
1. The Memory Hierarchy
2. Transfer of Data Between Levels
3. Volatile and Nonvolatile Storage
4. Virtual Memory

Disks
1. Mechanics of Disks
2. The Disk Controller
3. Disk Access Characteristics


Secondary Storage Management:

• Database systems always involve secondary storage, such as disks and other devices that store large amounts of data that persist over time.


The Memory Hierarchy:

• A typical computer system has several different components in which data may be stored.

• These components have data capacities ranging over at least seven orders of magnitude and also have access speeds ranging over seven or more orders of magnitude.


[Figure: The memory hierarchy, from the textbook]


Cache:

• The lowest level of the hierarchy is a cache. Cache is found on the same chip as the microprocessor itself, and additional level-2 cache is found on another chip.

• Data and instructions are moved to cache from main memory when they are needed by the processor.

• Cache data can be accessed by the processor in a few nanoseconds.


Main Memory:

• At the center of the action is the computer's main memory. We may think of everything that happens in the computer (instruction executions and data manipulations) as working on information that is resident in main memory.

• Typical times to move data from main memory to the processor or cache are in the 10-100 nanosecond range.


Secondary Storage:

• Essentially every computer has some sort of secondary storage, which is a form of storage that is both significantly slower and significantly more capacious than main memory.

• The time to transfer a single byte between disk and main memory is around 10 milliseconds.


Tertiary Storage:

• As capacious as a collection of disk units can be, there are databases much larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines.

• Tertiary storage devices have been developed to hold data volumes measured in terabytes.


Tertiary storage is characterized by significantly higher read/write times than secondary storage, but also by much larger capacities and smaller cost per byte than is available from magnetic disks.


Transfer of Data Between Levels:

Normally, data moves between adjacent levels of the hierarchy.

At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a great deal of time, so each level is organized to transfer large amounts of data to or from the level below whenever any data at all is needed.


The disk is organized into disk blocks, and entire blocks are moved to or from a contiguous section of main memory called a buffer.


Volatile and Nonvolatile Storage:

• A volatile device "forgets" what is stored in it when the power goes off.

• A nonvolatile device, on the other hand, is expected to keep its contents intact even for long periods when the device is turned off or there is a power failure.


Magnetic and optical materials hold their data in the absence of power.

Thus, essentially all secondary and tertiary storage devices are nonvolatile.

On the other hand, main memory is generally volatile.


Virtual Memory:

• When we write programs, the data we use (variables of the program, files read, and so on) occupies a virtual memory address space.

• Many machines use a 32-bit address space; that is, there are 2^32 bytes, or 4 gigabytes.


The operating system manages virtual memory, keeping some of it in main memory and the rest on disk.

Transfer between memory and disk is in units of disk blocks.


Disks:

• The use of secondary storage is one of the important characteristics of a DBMS, and secondary storage is almost exclusively based on magnetic disks.


Mechanics of Disks:

• The two principal moving pieces of a disk drive are a disk assembly and a head assembly.

• The disk assembly consists of one or more circular platters that rotate around a central spindle.

• The upper and lower surfaces of the platters are covered with a thin layer of magnetic material, on which bits are stored.


[Figure: A typical disk, from the textbook]


0’s and 1’s are represented by different patterns in the magnetic material.

A 0 is represented by orienting the magnetism of a small area in one direction, and a 1 by orienting the magnetism in the opposite direction.

A common diameter for the disk platters is 3.5 inches.

The disk is organized into tracks, which are concentric circles on a single platter.

The tracks that are at a fixed radius from the center, among all the surfaces, form one cylinder.


[Figure: Top view of a disk surface, from the textbook]


Tracks are organized into sectors, which are segments of the circle separated by gaps that are not magnetized to represent either 0's or 1's.

The second movable piece, the head assembly, holds the disk heads.

A track consists of many points, each of which represents a single bit by the direction of its magnetism.


The Disk Controller:

• One or more disk drives are controlled by a disk controller, which is a small processor capable of:

• Controlling the mechanical actuator that moves the head assembly to position the heads at a particular radius.

• Transferring bits between the desired sector and the main memory.


• Selecting a surface from which to read or write, and selecting a sector from the track on that surface that is under the head.

An example of a simple single-processor computer system is shown next.


[Figure: A simple computer system, from the textbook]


Disk Access Characteristics:

• Seek Time: The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time to do so is the seek time.

• Rotational Latency: The disk controller waits while the first sector of the block moves under the head. This time is called the rotational latency.


• Transfer Time: All the sectors and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This delay is called the transfer time.

The sum of the seek time, rotational latency, and transfer time is the latency of the disk.

If a disk has 250,000 bytes per track and rotates once in 10 milliseconds, we can read from the disk at 25 megabytes per second. The transfer time for a 16,384-byte block is therefore around two-thirds of a millisecond (16,384 / 25,000,000 s ≈ 0.66 ms).
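The arithmetic above, as a small Python sketch with hypothetical function names; seek time and rotational latency are left as inputs, since the slides give no figures for them.

```python
def transfer_rate_bytes_per_s(bytes_per_track: int, rotation_ms: float) -> float:
    """Bytes read per second when a whole track passes under the head once per rotation."""
    return bytes_per_track / (rotation_ms / 1000.0)


def transfer_time_ms(block_bytes: int, bytes_per_track: int, rotation_ms: float) -> float:
    """Time for one block to pass under the head, ignoring gaps between sectors."""
    return block_bytes / transfer_rate_bytes_per_s(bytes_per_track, rotation_ms) * 1000.0


def access_latency_ms(seek_ms: float, rotational_ms: float, transfer_ms: float) -> float:
    """Latency of the disk: seek time + rotational latency + transfer time."""
    return seek_ms + rotational_ms + transfer_ms


if __name__ == "__main__":
    print(transfer_rate_bytes_per_s(250_000, 10))   # about 25 MB/s
    print(transfer_time_ms(16_384, 250_000, 10))    # about 0.66 ms
```

Since a full rotation takes 10 ms here, half a rotation (5 ms) is the usual estimate for the average rotational latency.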


DATABASE SYSTEM PRINCIPLES

THE QUERY COMPILER
DATABASE SYSTEMS – The Complete Book
(Diagram 16.16 & Diagram 16.18)

BY POOJA SABNIS
UNDER THE SUPERVISION OF DR. T. Y. LIN
CS 257-01
ROLL NUMBER: 126


Problem Statement

• To verify the claim given in Figure 16.18 using Figure 16.16.
– The figures are from the 2nd Edition of Database Systems – The Complete Book.
– Figure 16.16: page 815; Figure 16.18: page 816.


About Figure 16.16

The tree in the figure depicts the following query: Find movies with stars born in 1960.

Database tables with attributes:
StarsIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)

SQL query:

SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);


About Figure 16.16 (continued)

• Figure 16.16 is obtained from Figure 16.14 (Page 813)

• This is done by applying the rule that handles a two-argument selection with a condition involving IN.

• In this query, the subquery is uncorrelated.
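One way to check the IN rule on concrete data is to compare the original query with a join against the de-duplicated subquery result, using Python's `sqlite3`. This is a sketch: the sample rows are hypothetical, and storing birthdate as text such as '04/12/1960' is an assumption that makes the LIKE '%1960' test meaningful.

```python
import sqlite3
from typing import List, Tuple

# Original query with an uncorrelated IN subquery
IN_QUERY = """
SELECT movieTitle FROM StarsIn
WHERE starName IN (SELECT name FROM MovieStar WHERE birthdate LIKE '%1960')
"""

# Rewritten: selection starName = name over the product of StarsIn and the de-duplicated subquery
JOIN_QUERY = """
SELECT movieTitle
FROM StarsIn, (SELECT DISTINCT name FROM MovieStar WHERE birthdate LIKE '%1960') AS S
WHERE starName = S.name
"""


def run(query: str, stars_in: List[Tuple], movie_stars: List[Tuple]) -> List[Tuple]:
    """Load sample rows into an in-memory database and return the sorted answer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
    con.execute("CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate TEXT)")
    con.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", stars_in)
    con.executemany("INSERT INTO MovieStar VALUES (?, ?, ?, ?)", movie_stars)
    rows = sorted(con.execute(query).fetchall())
    con.close()
    return rows
```

The DISTINCT in the rewritten query plays the role of duplicate elimination on the subquery result: without it, a star listed twice in MovieStar would make the join return that movie twice, while the IN query returns it once.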


Conversion of Figure 16.14 to Figure 16.16


About Figure 16.18

The tree in the figure depicts the following query: Find the movies where the average age of the stars was at most 40 when the movie was made.

SQL query:

SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
    SELECT AVG(birthdate)
    FROM StarsIn m2, MovieStar s
    WHERE m2.starName = s.name
      AND m1.movieTitle = m2.movieTitle
      AND m1.movieYear = m2.movieYear
);

This query is an example of a correlated subquery.


Rules for Translating a Correlated Subquery to Relational Algebra

Correlated subqueries involve unknown values defined outside themselves. Thus they cannot be translated in isolation.

Such subqueries need to be translated so that they produce a relation in which certain extra attributes appear. These attributes must later be compared with the externally defined attributes.

Conditions that relate attributes from the subquery to attributes outside are then applied to this relation, and the extra attributes (which are no longer necessary) can be projected out.

In this strategy, care should be taken not to form duplicate tuples at the end.
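The translation strategy can be illustrated in SQL: compute the subquery once per value of the extra attributes (movieTitle, movieYear), join on them, apply the condition, and project them away with DISTINCT. This Python/`sqlite3` sketch compares that rewrite with the correlated query on a few hypothetical rows, assuming birthdate is stored as an integer year so that AVG is meaningful.

```python
import sqlite3
from typing import List, Tuple

CORRELATED = """
SELECT DISTINCT m1.movieTitle, m1.movieYear FROM StarsIn m1
WHERE m1.movieYear - 40 <= (SELECT AVG(birthdate) FROM StarsIn m2, MovieStar s
                            WHERE m2.starName = s.name AND m1.movieTitle = m2.movieTitle
                              AND m1.movieYear = m2.movieYear)
"""

# Decorrelated: the subquery now produces the extra attributes (t, y) alongside its result,
# the outer condition is applied after joining on them, and DISTINCT removes duplicates.
DECORRELATED = """
SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
JOIN (SELECT m2.movieTitle AS t, m2.movieYear AS y, AVG(s.birthdate) AS avgBirth
      FROM StarsIn m2 JOIN MovieStar s ON m2.starName = s.name
      GROUP BY m2.movieTitle, m2.movieYear) a
  ON m1.movieTitle = a.t AND m1.movieYear = a.y
WHERE m1.movieYear - 40 <= a.avgBirth
"""


def run(query: str, stars_in: List[Tuple], movie_stars: List[Tuple]) -> List[Tuple]:
    """Load sample rows into an in-memory database and return the sorted answer."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE StarsIn (movieTitle TEXT, movieYear INTEGER, starName TEXT)")
    con.execute("CREATE TABLE MovieStar (name TEXT, address TEXT, gender TEXT, birthdate INTEGER)")
    con.executemany("INSERT INTO StarsIn VALUES (?, ?, ?)", stars_in)
    con.executemany("INSERT INTO MovieStar VALUES (?, ?, ?, ?)", movie_stars)
    rows = sorted(con.execute(query).fetchall())
    con.close()
    return rows
```

Movie A below has two stars, so without the final DISTINCT the decorrelated query would list it twice.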


Tree in Figure 16.18


Steps of Formation of Tree in Figure 16.18

The tree in Figure 16.18 is formed by parsing the query and partially translating it to relational algebra.

The WHERE-clause of the subquery is split into two; one part is used to convert the product of relations to an equijoin.

The aliases m1, m2 and s are made the nodes of the tree.


Query Compiler [Figure 16.16 & Figure 16.18]

CS 257 [section 1]

Aakanksha Pendse, Roll number: 127


Problem Statement

To verify the claim given in Figure 16.18 using Figure 16.16.


What is Figure 16.16? The tree in this diagram represents the following query:
◦ “Find movies with movie stars born in 1960.”

The database tables with their attributes are:
◦ StarsIn(movieTitle, movieYear, starName)
◦ MovieStar(name, address, gender, birthdate)

SQL query:

SELECT movieTitle
FROM StarsIn
WHERE starName IN (
    SELECT name
    FROM MovieStar
    WHERE birthdate LIKE '%1960'
);


This diagram (Figure 16.16) is derived from Figure 16.14.

This is done by applying the rule that handles a two-argument selection with a condition involving IN.

In this query, the subquery is uncorrelated.


Figure 16.14


Figure 16.16: applying the rule for the IN condition


Rule


What is Figure 16.18? The tree in this diagram represents the following query:
◦ “Find the movies where the average age of the stars was at most 40 when the movie was made.”

SQL query:

SELECT DISTINCT m1.movieTitle, m1.movieYear
FROM StarsIn m1
WHERE m1.movieYear - 40 <= (
    SELECT AVG(birthdate)
    FROM StarsIn m2, MovieStar s
    WHERE m2.starName = s.name
      AND m1.movieTitle = m2.movieTitle
      AND m1.movieYear = m2.movieYear
);

This is a correlated subquery.


Rules for translating a correlated subquery to relational algebra:

Correlated subqueries contain unknown values defined outside themselves.

For this reason, correlated subqueries cannot be translated in isolation.

These subqueries need to be translated so that they produce a relation in which certain extra attributes appear. These attributes must later be compared with the externally defined attributes.

Conditions that relate attributes from the subquery to attributes outside are then applied to this relation. The extra attributes, which are no longer necessary, can be projected out.

In this strategy, care should be taken not to form duplicate tuples at the end.


Steps of formation of the tree in Figure 16.18

The tree in Figure 16.18 is formed by parsing the query and partially translating it to relational algebra.

The WHERE-clause of the subquery is split into two; one part is used to convert the product of relations to an equijoin.

The aliases m1, m2 and s are made the nodes of the tree.


Figure 16.18 : partially transformed parse tree