correlated update vs merge

7
Last week I was frustrated by the performance of Update in the Reporting SQL. It was a Correlated Update on a table (let’s say Target table) with around 21K records. The source table is equal to that target table and without index on it. The update SQL is like below: UPDATE TMP_ACT1_20130310 sub SET NAME = (SELECT DISTINCT NAME FROM TMP_ADDR1_20130310 tmp2 WHERE tmp2.subscriber_no = sub.subscriber_no AND NAME IS NOT NULL) The sub-query will be executed as much as the number of row in the outer/ driving table. With 21K records, Oracle will do FTS on the TMP_ADDR1_20130310 21K times. That is a HUGE number, and for sure a HUGE I/O. For this case, we can simply create index on the inner table to reduce the I/O and speed up the execution time. My headache was gone after creating index on TMP_ ADDR1_20130310. Mission accomplished!!! Further investigation, actually there are few other different ways when we are dealing with an Update statement. In this exercise, I created a small comparison between below 3 update methods (so far that I know) with Full Table Scan and also Index Scan: - Correlated Update - Merge - Update from Select Please go through the Excel sheet for the number Some highlighted items: 1. “Correlated Update” is not good at all if we don’t have index access on the inner table. It will produces huge I/O since we have to FTS the inner table over and over again (as much as the number of rows in the outer table)

Upload: heribertus-bramundito

Post on 17-Jun-2015

144 views

Category:

Technology


0 download

DESCRIPTION

comparison between performance of correlated update and merge

TRANSCRIPT

Page 1: Correlated update vs merge

Last week I was frustrated by the performance of Update in the Reporting SQL. It was a Correlated Update on a table (let’s say Target table) with around 21K records. The source table is equal to that target table and without index on it.

The update SQL is like below:

UPDATE TMP_ACT1_20130310 sub SET NAME =

(SELECT DISTINCT NAME FROM TMP_ADDR1_20130310 tmp2 WHERE tmp2.subscriber_no = sub.subscriber_no AND NAME IS NOT NULL)

The sub-query will be executed as much as the number of row in the outer/ driving table. With 21K records, Oracle will do FTS on the TMP_ADDR1_20130310 21K times. That is a HUGE number, and for sure a HUGE I/O.

For this case, we can simply create index on the inner table to reduce the I/O and speed up the execution time. My headache was gone after creating index on TMP_ ADDR1_20130310. Mission accomplished!!!

Further investigation, actually there are few other different ways when we are dealing with an Update statement. In this exercise, I created a small comparison between below 3 update methods (so far that I know) with Full Table Scan and also Index Scan:

- Correlated Update- Merge- Update from Select

Please go through the Excel sheet for the number

Some highlighted items:

1. “Correlated Update” is not good at all if we don’t have index access on the inner table. It will produces huge I/O since we have to FTS the inner table over and over again (as much as the number of rows in the outer table)

2. If the join condition is not matched, “Correlated Update” will update the row with empty string. We need to pay attention on this (please check “Data Changes” section below)It will produce more Undo for those additional update

3. If the join condition is not matched, both “Merge” and “Update from Select” method will leave the row as is. To me it is more reasonable

4. With “Update from Select” method, we need to create unique index on the inner table, otherwise we will ended up with ORA- error

5. “Update from Select” method is little bit superior compare to the other 2 methods when we update small percentage of outer table and there is index on the outer table, so that we can get benefit from Index Range Scan

Page 2: Correlated update vs merge

Sample Data

SQL> create table ttarget 2 pctfree 99 3 as 4 select rownum id, lpad(chr(rownum), 70) data 5 from dual 6 connect by rownum <= 1000;

Table created.

SQL> create table tsource 2 pctfree 99 3 as 4 select rownum id, lpad(chr(rownum)||'_CHG', 70) data 5 from dual 6 connect by rownum <= 500;

Table created.

SQL> exec dbms_stats.gather_table_stats(USER, 'TTARGET');

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(USER, 'TSOURCE');

PL/SQL procedure successfully completed.

SQL> select table_name, blocks, empty_blocks, num_rows 2 from user_tables 3 where table_name in ('TTARGET', 'TSOURCE');

TABLE_NAME BLOCKS EMPTY_BLOCKS NUM_ROWS------------------------------ ---------- ------------ ----------TSOURCE 518 0 500TTARGET 1024 0 1000

SQL> create unique index idx_tsource on tsource(id);

Index created.

SQL> select leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, uniqueness from user_indexes where index_name = 'IDX_TSOURCE';

LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR UNIQUENES----------- ------------- ----------------------- ----------------------- ----------------- --------- 1 500 1 1 500 UNIQUE

SQL> create index idx_ttarget on ttarget (id);

Index created.

Page 3: Correlated update vs merge
Page 4: Correlated update vs merge

Data Changes

Correlated UpdateFor un-matched rows, we can see DATA is CHANGED

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 ) 502 ÷

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 0

SQL> update ttarget a set data = (select data from tsource b where a.id=b.id);

1000 rows updated.

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 500

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 502

MergeFor un-matched rows, we can see data is not changed

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 ) 502 ÷

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 0

SQL> merge into ttarget a 2 using tsource b 3 on (a.id=b.id) 4 when matched then 5 update set a.data=b.data;

500 rows merged.

Page 5: Correlated update vs merge

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 500

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 ) 502 ÷

Update from SelectFor un-matched rows, we can see data is not changed

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 ) 502 ÷

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 0

SQL> update 2 (select a.data dtarget, b.data dsource 3 from ttarget a, tsource b 4 where a.id=b.id) c 5 set c.dtarget=c.dsource;

500 rows merged.

SQL> select count(*) from ttarget where data like '%CHG%';

COUNT(*)---------- 500

SQL> select id, trim(data) from ttarget where id > 500 and rownum < 3;

ID TRIM(DATA)---------- -------------------- 501 ) 502 ÷