EXECUTING JOINS DYNAMICALLY IN DDBS QUERY OPTIMIZER
Er. Shiva K. Shrestha (15957)
ME COMPUTER
December 26, 2016
Paper Presentation
Paper Abstract
■ Data transmission is required to use a join across multiple sites
■ Factors:
  – Communication cost
  – Amount of data transmitted
■ To minimize these factors, the join operation is used
■ Two cases are considered in this paper:
  – Query processing using join
  – Query processing using semi-join
■ The amount of data transferred with a join is greater than with a semi-join
■ Sub-operations are executed dynamically to reduce the communication cost
Basic Introduction
■ Distributed processing aims to:
  – Increase reliability
  – Improve availability and localization
  – Reduce communication costs
■ Parameters:
  – High query response time
  – Sites accessed by queries
■ Database system performance depends on how effectively the join operator is executed
■ Cost of a distributed query = processing cost + transmission cost
■ The optimizer must choose an efficient order in which to join the tables so that communication overhead is cut down
Query Processing & Query Optimization
■ Distributed query processing phases:
  – Local processing phase
  – Reduction phase
  – Final processing phase
■ Total distributed execution cost = total processing cost (local processing cost at all sites involved) + communication cost
  – Local processing cost = CPU cycles + disk I/O
  – Communication cost factors:
    ■ Data exchanged
    ■ Number of messages transferred
    ■ Site chosen for query execution
    ■ Communication network
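The cost breakdown on this slide can be written as a small function. This is only a sketch: the slides name the cost components but give no formula for communication cost, so the linear model below (a per-byte charge plus a per-message charge) is an assumption, and the names `SiteWork`, `communication_cost`, `total_execution_cost`, `cost_per_byte` and `cost_per_message` are hypothetical. The choice of execution site and the network are assumed to be folded into the two coefficients.

```python
from dataclasses import dataclass


@dataclass
class SiteWork:
    """Local processing done at one site (units are arbitrary cost units)."""
    cpu_cost: float  # cost attributed to CPU cycles
    io_cost: float   # cost attributed to disk I/O


def communication_cost(bytes_sent, messages, cost_per_byte, cost_per_message):
    # Assumed linear model: pay for the data volume and for each message sent.
    return bytes_sent * cost_per_byte + messages * cost_per_message


def total_execution_cost(sites, bytes_sent, messages,
                         cost_per_byte=1.0, cost_per_message=0.0):
    # Total processing cost = sum of local (CPU + I/O) cost over all sites.
    local = sum(s.cpu_cost + s.io_cost for s in sites)
    return local + communication_cost(
        bytes_sent, messages, cost_per_byte, cost_per_message
    )


if __name__ == "__main__":
    sites = [SiteWork(cpu_cost=10, io_cost=20), SiteWork(cpu_cost=5, io_cost=5)]
    print(total_execution_cost(sites, bytes_sent=1000, messages=2,
                               cost_per_byte=0.5, cost_per_message=3.0))
```

With the default coefficients the function counts one cost unit per byte and ignores message count, which matches the byte-only comparison used in the rest of the presentation.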
Objectives of Joins in Distributed DBs
■ Transfer the data as fast as possible in order to improve join query performance
■ Two basic join query execution methods:
  – Transfer the smaller of the two tables participating in the join
  – Transfer the two tables in parallel
Related Works
■ Semi-join is beneficial if transmission cost is the main concern; otherwise join is preferred
■ A heuristic approach needs to be developed for solving the query optimization problem
■ Multi-relation semi-join reduces data volume and network communication cost
■ Query methods can directly affect the execution speed of the system
■ Heuristic-based query optimization is a better approach
Experimental Analysis
■ According to Fig. 1,
  – the size of the EMPLOYEE relation is 100 * 10,000 = 10,00,000 bytes,
  – and the size of the DEPARTMENT relation is 35 * 100 = 3,500 bytes
Distributed Query Processing using Join
Distributed Query Processing using Join (contd...)
■ Consider the query “for each employee, retrieve the employee name and the name of the department for which the employee works”
■ The result of the query includes 10,000 records, assuming every employee is related to a department and each record in the query result is 40 bytes long
■ The query is submitted at site 3, which is the result site because the query result is required there. The original query, which extracts data from tables EMP and DEP, can be executed in three different ways
Distributed Query Processing using Join (contd...)
■ CASE I
  – 10,00,000 + 3,500 = 10,03,500 bytes
■ CASE II
  – 4,00,000 + 10,00,000 = 14,00,000 bytes
  – 4,00,000 + 3,500 = 4,03,500 bytes
■ CASE III
  – 4,00,000 + 3,500 = 4,03,500 bytes
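The arithmetic for the three cases can be reproduced with a short script. The slides give only the sums, so the strategy behind each case is an assumption inferred from the operands: Case I ships both relations to the result site, Case II ships EMP and then the join result, and Case III ships DEP and then the join result. The function name `transfer_costs` and the dictionary keys are hypothetical; the sizes are the ones stated in the presentation (10,00,000 bytes is 1,000,000).

```python
# Sizes taken from the slides, in bytes.
EMP_SIZE = 100 * 10_000     # EMPLOYEE relation: 1,000,000 bytes
DEP_SIZE = 35 * 100         # DEPARTMENT relation: 3,500 bytes
RESULT_SIZE = 40 * 10_000   # query result: 10,000 records of 40 bytes each


def transfer_costs():
    """Return the bytes transferred under each assumed join strategy."""
    return {
        # Assumed: ship EMP and DEP to the result site and join there.
        "case_1_ship_both": EMP_SIZE + DEP_SIZE,
        # Assumed: ship EMP to the DEP site, join, ship the result onward.
        "case_2_ship_emp_then_result": RESULT_SIZE + EMP_SIZE,
        # Assumed: ship DEP to the EMP site, join, ship the result onward.
        "case_3_ship_dep_then_result": RESULT_SIZE + DEP_SIZE,
    }


if __name__ == "__main__":
    for name, size in transfer_costs().items():
        print(f"{name}: {size:,} bytes")
```

Under these assumptions the cheapest option is the one that moves the small DEPARTMENT relation rather than the large EMPLOYEE relation.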
Distributed Query Processing using Semi-join
■ The idea behind distributed query processing using the semi-join operation is to reduce the number of tuples in a relation before transferring it to another site
  – Project the join attributes of DEP at site 2 and transfer them to site 1
  – For Q, there is a transfer of F = πDNUMBER(DEPARTMENT), whose size = 4 * 100 = 400 bytes
  – Join the transferred file with the EMP relation at site 1 and transfer the required attributes from the resulting file to site 2
  – For Q, there is a transfer of R, whose size = 34 * 10,000 = 3,40,000 bytes
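The two semi-join steps can be sketched on in-memory rows. This is an illustration only, not the paper's implementation: the function names `semi_join_reduce` and `semi_join_bytes`, the column names `dnumber`, `dno` and `name`, and the toy rows are all hypothetical. The byte counts in the usage lines use the figures from the slide (100 keys of 4 bytes, then 10,000 projected tuples of 34 bytes).

```python
def semi_join_reduce(emp_rows, dep_rows):
    """Simulate the semi-join steps on lists of dicts.

    Returns (f, r): f is the projected join column shipped from the DEP site,
    r is the reduced, projected EMP data shipped back.
    """
    # Step 1 (DEP site): project only the join attribute.
    f = {row["dnumber"] for row in dep_rows}
    # Step 2 (EMP site): keep EMP rows whose key appears in f, and project
    # only the attributes the query needs before shipping them back.
    r = [
        {"dno": row["dno"], "name": row["name"]}
        for row in emp_rows
        if row["dno"] in f
    ]
    return f, r


def semi_join_bytes(num_keys, key_bytes, num_tuples, tuple_bytes):
    """Bytes moved by the two transfers: the keys, then the reduced tuples."""
    return num_keys * key_bytes + num_tuples * tuple_bytes


if __name__ == "__main__":
    dep = [{"dnumber": 1, "dname": "Research"}, {"dnumber": 2, "dname": "Sales"}]
    emp = [
        {"name": "A", "dno": 1, "salary": 100},
        {"name": "B", "dno": 2, "salary": 200},
        {"name": "C", "dno": 9, "salary": 300},  # no matching department
    ]
    f, r = semi_join_reduce(emp, dep)
    print(sorted(f), r)
    print(semi_join_bytes(100, 4, 10_000, 34), "bytes for the slide's figures")
```

The saving comes from shipping only the join column in one direction and only the matching, projected tuples in the other, instead of a whole relation.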
Conclusion & Results
■ Data is physically distributed among geographically separate locations; when data must be joined across sites, it has to be transmitted from one site to another
■ Sub-operations (join and semi-join) are used to determine data volume
■ Sub-operations are decided dynamically in the distributed optimizer so that cost is reduced as much as possible
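One simple way to picture a dynamic decision is to estimate the transfer volume of every candidate sub-operation and pick the smallest. The slides do not describe how the paper's optimizer actually makes this choice, so the sketch below is an assumption, and `choose_strategy` is a hypothetical name. The sample figures are the byte counts given earlier for the three join cases (taking the 14,00,000-byte figure for Case II).

```python
def choose_strategy(estimates):
    """Return (name, bytes) for the candidate with the smallest estimate.

    `estimates` maps a strategy name to its estimated bytes transferred.
    """
    if not estimates:
        raise ValueError("no candidate strategies")
    name = min(estimates, key=estimates.get)
    return name, estimates[name]


if __name__ == "__main__":
    candidates = {
        "CASE I": 1_003_500,
        "CASE II": 1_400_000,
        "CASE III": 403_500,
    }
    print(choose_strategy(candidates))
```

A semi-join plan can be added as another entry in the dictionary, provided its estimate covers the same end-to-end transfers as the join plans it is compared against.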
Analysis of Data Transmission
Thank You!
■ Q/A?