executing joins dynamically in ddbs query optimizer


EXECUTING JOINS DYNAMICALLY IN DDBS QUERY OPTIMIZER

Er. Shiva K. Shrestha(15957)

ME COMPUTER

December 26, 2016

Paper Presentation


Paper Abstract

■ Joining data stored at multiple sites requires data transmission
■ Factors:
– Communication cost
– Amount of data transmitted
■ To minimize these factors, the join operation is used
■ Two cases are considered in this paper:
– Query processing using join
– Query processing using semi join
■ The amount of data transferred with a join is greater than with a semi join
■ Sub-operations are executed dynamically to reduce the communication cost


Basic Introduction

■ Distributed processing aims to
– increase reliability,
– improve availability and localization, and
– reduce communication costs
■ Parameters
– query response time,
– sites accessed by queries
■ The performance of a database system depends on how effectively the join operator is executed
■ Cost of distributed query = processing cost + transmission cost
■ The optimizer must choose an efficient order in which to join the tables so that communication overhead is cut down


Query Processing & Query Optimization

■ Distributed query processing phases
– Local processing phase
– Reduction phase
– Final processing phase
■ Total distributed execution cost = total processing cost (local processing cost at all sites involved) + communication cost
– Local processing cost = CPU cycles + disk I/O
– Communication cost factors:
  ■ data exchanged,
  ■ number of messages transferred,
  ■ site chosen for query execution, and
  ■ communication network
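The cost breakdown above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's model: the class `SiteWork`, the function names, and the linear communication formula (a fixed cost per message plus a cost per byte) are all assumptions introduced here, and the choice of execution site and the network itself are not modelled.

```python
from dataclasses import dataclass


@dataclass
class SiteWork:
    """Local work done at one site, in hypothetical cost units."""
    cpu_cost: float
    disk_io_cost: float


def local_processing_cost(sites):
    # Local processing cost = CPU cycles + disk I/O, summed over all sites involved.
    return sum(site.cpu_cost + site.disk_io_cost for site in sites)


def communication_cost(messages, bytes_exchanged, cost_per_message, cost_per_byte):
    # Assumed linear model: fixed cost per message plus a cost per byte sent.
    return messages * cost_per_message + bytes_exchanged * cost_per_byte


def total_execution_cost(sites, messages, bytes_exchanged,
                         cost_per_message, cost_per_byte):
    # Total distributed execution cost = processing cost + communication cost.
    return (local_processing_cost(sites)
            + communication_cost(messages, bytes_exchanged,
                                 cost_per_message, cost_per_byte))


if __name__ == "__main__":
    sites = [SiteWork(cpu_cost=2.0, disk_io_cost=3.0),
             SiteWork(cpu_cost=1.0, disk_io_cost=4.0)]
    print(total_execution_cost(sites, messages=2, bytes_exchanged=1000,
                               cost_per_message=1.0, cost_per_byte=0.5))
```

The unit costs are placeholders; in practice they would have to be measured or estimated for the actual network.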


Objectives of Joins in Distributed DBs

■ To transfer the data as fast as possible in order to improve join query performance
■ Two basic join query execution methods
– Transfer the smaller of the two tables participating in the join query
– Transfer the two tables in parallel


Related Works

■ Semi-join is beneficial when transmission cost dominates; otherwise join is preferred
■ A heuristic approach is needed for solving the query optimization problem
■ Multi-relation semi-join reduces data volume and network communication cost
■ Query methods can directly affect the execution speed of the system
■ Heuristic-based query optimization is considered the better approach


Experimental Analysis

■ According to Fig. 1,
– the size of the EMPLOYEE relation is 100 * 10,000 = 10,00,000 bytes, and
– the size of the DEPARTMENT relation is 35 * 100 = 3,500 bytes
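The two sizes can be checked with a short Python sketch. It assumes that the first factor in each product is the record length in bytes and the second is the number of records; the slide gives only the arithmetic, so that reading and the helper name `relation_size` are assumptions.

```python
def relation_size(record_bytes, record_count):
    """Size of a relation in bytes: record length times number of records."""
    return record_bytes * record_count


# Assumed reading of the slide: 10,000 EMPLOYEE records of 100 bytes each,
# and 100 DEPARTMENT records of 35 bytes each.
EMPLOYEE_BYTES = relation_size(100, 10_000)
DEPARTMENT_BYTES = relation_size(35, 100)

print(f"EMPLOYEE:   {EMPLOYEE_BYTES:,} bytes")
print(f"DEPARTMENT: {DEPARTMENT_BYTES:,} bytes")
```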


Distributed Query Processing using Join


Distributed Query Processing using Join (contd...)

■ Consider the query “for each employee, retrieve the employee name and the name of the department for which the employee works”
■ The result of the query contains 10,000 records, assuming every employee is related to a department and each record in the query result is 40 bytes long
■ The query is submitted at site 3, which is the result site because the query result is required there. The original query, which extracts data from tables EMP and DEP, can be executed in three different ways


Distributed Query Processing using Join (contd...)

■ CASE I
– 10,00,000 + 3,500 = 10,03,500 bytes
■ CASE II
– 4,00,000 + 10,00,000 = 14,00,000 bytes
– 4,00,000 + 3,500 = 4,03,500 bytes
■ CASE III
– 4,00,000 + 3,500 = 4,03,500 bytes
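The slide lists only the byte counts, so the Python sketch below relies on an assumed mapping of the cases to strategies: CASE I ships both relations to site 3, CASE II ships EMPLOYEE to site 2 and the join result to site 3, and CASE III ships DEPARTMENT to site 1 and the join result to site 3. The strategy labels and the function name are illustrative; the sizes come from the slides (EMPLOYEE at site 1, DEPARTMENT at site 2, 40-byte result records).

```python
EMPLOYEE_BYTES = 100 * 10_000    # EMPLOYEE relation, stored at site 1
DEPARTMENT_BYTES = 35 * 100      # DEPARTMENT relation, stored at site 2
RESULT_BYTES = 40 * 10_000       # query result, required at site 3


def join_transfer_costs():
    """Bytes moved over the network under each assumed join strategy."""
    return {
        "CASE I: ship both relations to site 3":
            EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        "CASE II: ship EMPLOYEE to site 2, result to site 3":
            RESULT_BYTES + EMPLOYEE_BYTES,
        "CASE III: ship DEPARTMENT to site 1, result to site 3":
            RESULT_BYTES + DEPARTMENT_BYTES,
    }


for name, cost in join_transfer_costs().items():
    print(f"{name}: {cost:,} bytes")
```

Under this reading, the strategy that ships only the small DEPARTMENT relation and the result moves the least data.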


Distributed Query Processing using Semi join

■ The idea behind distributed query processing using the semi join operation is to reduce the number of tuples in a relation before transferring it to another site
– Project the join attributes of DEP at site 2 and transfer them to site 1
– For Q, there is a transfer of F = πDNUMBER(DEPARTMENT), whose size = 4 * 100 = 400 bytes
– Join the transferred file with the EMP relation at site 1 and transfer the required attributes from the resulting file to site 2
– For Q, there is a transfer of R, whose size = 34 * 10,000 = 3,40,000 bytes
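A small Python sketch may make the reduction step concrete. The toy rows, the attribute name `DNO`, and both function names are hypothetical and used only for illustration; the byte figures passed to `semi_join_transfer_bytes` are the ones on the slide, and the function simply adds the two transfers listed above.

```python
def semi_join(rows, attr, keys):
    """Keep only the rows whose join attribute appears in the shipped keys."""
    key_set = set(keys)
    return [row for row in rows if row[attr] in key_set]


def semi_join_transfer_bytes(key_bytes, key_count,
                             reduced_record_bytes, reduced_count):
    """Bytes shipped: the projected join column plus the reduced relation."""
    projection_bytes = key_bytes * key_count
    reduced_bytes = reduced_record_bytes * reduced_count
    return projection_bytes + reduced_bytes


# Toy data (hypothetical): department numbers shipped from site 2 to site 1.
department_numbers = [1, 2, 3]
employees = [
    {"NAME": "A", "DNO": 1},
    {"NAME": "B", "DNO": 2},
    {"NAME": "C", "DNO": 9},   # no matching department, so it is filtered out
]

print(semi_join(employees, "DNO", department_numbers))

# Slide figures: 4-byte keys for 100 departments, 34-byte reduced records
# for 10,000 employees.
print(semi_join_transfer_bytes(4, 100, 34, 10_000))
```

Only the employee rows that can participate in the join are shipped back, which is where the saving in transmitted data comes from.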


Conclusion & Results


■ Data is physically distributed across geographically separate locations; when data at different sites must be joined, it has to be transmitted from one site to another

■ Sub-operations (join and semi-join) are used to determine the data volume

■ Sub-operations are chosen dynamically in the distributed optimizer so that the cost is reduced as much as possible
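One way to picture the dynamic choice is a selector that compares the estimated transfer volume of each candidate sub-operation and picks the cheapest. The sketch below is an assumption about how such a decision could look, not the paper's optimizer: the function name is hypothetical, bytes transferred is used as the only cost, and the sample figures reuse the join and semi join byte counts from the earlier slides on the assumption that they are directly comparable.

```python
def choose_sub_operation(estimated_bytes):
    """Return the (name, bytes) pair with the smallest estimated transfer."""
    if not estimated_bytes:
        raise ValueError("at least one candidate plan is required")
    return min(estimated_bytes.items(), key=lambda item: item[1])


# Estimates taken from the slide examples (bytes transferred).
candidates = {
    "join": 400_000 + 3_500,        # cheapest join case
    "semi join": 400 + 340_000,     # projection plus reduced relation
}

name, cost = choose_sub_operation(candidates)
print(f"chosen sub-operation: {name} ({cost:,} bytes)")
```

A fuller optimizer would also weigh local processing cost and the number of messages, which this sketch leaves out.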

Analysis of Data Transmission


Thank You !

■ Q/A ?
