1 what’s up with dbms_stats? terry sutton database specialists, inc

87
1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc. www.dbspecialists.com

Upload: augustine-goodman

Post on 01-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

1

What’s Up with dbms_stats?

Terry SuttonDatabase Specialists,

Inc.www.dbspecialists.com

Page 2: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

2

Session Objectives Examine some of the statistics-gathering

options and their impact.

Focus on actual experience.

Learn why to choose various options when gathering statistics.

Page 3: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

3

Statistics are Important!

As the optimizer gets more sophisticated in each version of Oracle, the importance of accurate statistics increases.

Page 4: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

4

DBMS_STATS Procedures Package contains over 40 procedures,

including:– Deleting existing statistics for table, schema, or

database– Setting statistics to desired values– Exporting and importing statistics– Gathering statistics for a schema or entire

database– Monitoring tables for changes

Page 5: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

5

DBMS_STATS Procedures We will focus on:DBMS_STATS.GATHER_TABLE_STATS DBMS_STATS.GATHER_INDEX_STATS

The starting points for getting statistics so the optimizer can make informed decisions

Page 6: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

6

DBMS_STATS.GATHER_TABLE_STATS

DBMS_STATS.GATHER_TABLE_STATS ( ownname VARCHAR2, tabname VARCHAR2, partname VARCHAR2 DEFAULT NULL, estimate_percent NUMBER DEFAULT NULL, block_sample BOOLEAN DEFAULT FALSE, method_opt VARCHAR2 DEFAULT 'FOR ALL COLUMNS SIZE 1', degree NUMBER DEFAULT NULL, granularity VARCHAR2 DEFAULT 'DEFAULT', cascade BOOLEAN DEFAULT FALSE, stattab VARCHAR2 DEFAULT NULL, statid VARCHAR2 DEFAULT NULL, statown VARCHAR2 DEFAULT NULL, no_invalidate BOOLEAN DEFAULT FALSE);

Page 7: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

7

DBMS_STATS.GATHER_INDEX_STATS

DBMS_STATS.GATHER_INDEX_STATS ( ownname VARCHAR2, indname VARCHAR2, partname VARCHAR2 DEFAULT NULL, estimate_percent NUMBER DEFAULT NULL, stattab VARCHAR2 DEFAULT NULL, statid VARCHAR2 DEFAULT NULL, statown VARCHAR2 DEFAULT NULL, degree NUMBER DEFAULT NULL, granularity VARCHAR2 DEFAULT 'DEFAULT', no_invalidate BOOLEAN DEFAULT FALSE);

Page 8: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

8

We’re Going to Test: estimate_percent block_sample method_opt cascade

These are the parameters which address statistics accuracy and performance.

Page 9: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

9

estimate_percent The percentage of rows to estimate Null means compute Use DBMS_STATS.AUTO_SAMPLE_SIZE to

have Oracle determine the best sample size for good statistics

Page 10: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

10

block_sample Determines whether or not to use random

block sampling instead of random row sampling.

"Random block sampling is more efficient, but if the data is not randomly distributed on disk, then the sample values may be somewhat correlated."

Page 11: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

11

method_opt Determines whether to collect histograms to

help in dealing with skewed data. FOR ALL COLUMNS, FOR ALL INDEXED

COLUMNS, or a COLUMN list determine which columns, and SIZE determines how many histogram buckets.

Page 12: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

12

method_opt Use SKEWONLY for SIZE, to have Oracle

determine the columns on which to collect histograms based on their data distribution.

Use AUTO for SIZE, to have Oracle determine the columns on which to collect histograms based on data distribution and workload.

Page 13: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

13

cascade Determines whether to gather statistics on

indexes as well.

Page 14: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

14

Our Approach Try different values for these parameters and

look at the impact on accuracy of statistics and performance of the statistics gathering job itself.

Page 15: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

15

Our Data (Table 1)FILE_HISTORY [1,951,673 rows 28507 blocks, 223MB ] Column Name Null? Type Distinct Values ------------------------ -------- ------------- --------------- FILE_ID NOT NULL NUMBER 1951673 FNAME NOT NULL VARCHAR2(240) 1951673 STATE_NO NUMBER 6 FILE_TYPE NOT NULL NUMBER 7 PREF VARCHAR2(100) 65345 CREATE_DATE NOT NULL DATE TRACK_ID NOT NULL NUMBER SECTOR_ID NOT NULL NUMBER TEAMS NUMBER BYTE_SIZE NUMBER START_DATE DATE END_DATE DATE LAST_UPDATE DATE CONTAINERS NUMBER

Page 16: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

16

Our Data (Table 1)– Indexes

Table Unique? Index Name Column Name------------ ---------- --------------------------------------- ------------FILE_HISTORY NONUNIQUE TSUTTON.FILEH_FNAME FNAME

NONUNIQUE TSUTTON.FILEH_FTYPE_STATE FILE_TYPE STATE_NO

NONUNIQUE TSUTTON.FILEH_PREFIX_STATE PREF STATE_NO

UNIQUE TSUTTON.PK_FILE_HISTORY FILE_ID

Page 17: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

17

Our Data (Table 2)

PROP_CAT [11,486,321 rows 117705 blocks, 920MB ] Column Name Null? Type Distinct Values ------------------------ -------- ------------- --------------- LINENUM NOT NULL NUMBER(38) 11486321 LOOKUPID VARCHAR2(64) 40903 EXTID VARCHAR2(20) 11486321 SOLD NOT NULL NUMBER(38) 1 CATEGORY VARCHAR2(6) NOTES VARCHAR2(255) DETAILS VARCHAR2(255) PROPSTYLE VARCHAR2(20) 48936

Page 18: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

18

Our Data (Table 2)– Indexes

Table Unique? Index Name Column Name------------ ---------- ---------------------------------------- ---------------

PROP_CAT NONUNIQUE TSUTTON.PK_PROP_CAT EXTID SOLD

NONUNIQUE TSUTTON.PROPC_LOOKUPID LOOKUPID

NONUNIQUE TSUTTON.PROPC_PROPSTYLE PROPSTYLE

Page 19: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

19

To Find What Values the Statistics Gathering

Obtained:index.sql:

select ind.table_name, ind.uniqueness, col.index_name, col.column_name, ind.distinct_keys, ind.sample_sizefrom dba_ind_columns col, dba_indexes indwhere ind.table_owner = 'TSUTTON' and ind.table_name in ('FILE_HISTORY','PROP_CAT') and col.index_owner = ind.owner and col.index_name = ind.index_name and col.table_owner = ind.table_owner and col.table_name = ind.table_nameorder by col.table_name, col.index_name, col.column_position;

Page 20: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

20

tabcol.sql:select table_name, column_name, data_type, num_distinct, sample_size, to_char(last_analyzed, ' HH24:MI:SS') last_analyzed, num_buckets buckets from dba_tab_columns where table_name in ('FILE_HISTORY','PROP_CAT')order by table_name, column_id;

Page 21: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

21

The Old Days—ANALYZE A “Quick and Dirty” attempt:

SQL> analyze table file_history estimate statistics;

Table analyzed.

Elapsed: 00:00:08.26

SQL> analyze table prop_cat estimate statistics;

Table analyzed.

Elapsed: 00:00:14.76

Page 22: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

22

The Old Days—ANALYZE But someone complains that their query is

taking too long:

SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID FROM file_history WHERE FNAME = 'SOMETHING';

no rows selectedElapsed: 00:00:08.66

Page 23: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

23

The Old Days—ANALYZE So we try it with autotrace on:Execution Plan---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=2743 Card=10944

Bytes=678528) 1 0 TABLE ACCESS (FULL) OF 'FILE_HISTORY' (Cost=2743 Card=10944

Bytes=678528)Statistics---------------------------------------------------------- 0 recursive calls 0 db block gets 28519 consistent gets 28507 physical reads 0 redo size 465 bytes sent via SQL*Net to client 460 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed

Page 24: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

24

The Old Days—ANALYZE “It’s not using our index!” Let’s look at the index statistics:

Table Uniqueness Index Name Column Name Distinct Keys Sample Size------------ ---------- -------------------- ------------ ------------- ------------FILE_HISTORY NONUNIQUE FILEH_FNAME FNAME 1,937,490 1106

NONUNIQUE FILEH_FTYPE_STATE FILE_TYPE 12 1266 STATE_NO 12 1266

NONUNIQUE FILEH_PREFIX_STATE PREF 65,638 1053 STATE_NO 65,638 1053

UNIQUE PK_FILE_HISTORY FILE_ID 1,952,701 1347

DBA_INDEXES tells us we have 1,937,490 distinct keys for the index on FNAME, pretty close to the actual value of 1,951,673.

Page 25: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

25

The Old Days—ANALYZE But when we look at DBA_TAB_COLUMNSTABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1948094 1034 FNAME VARCHAR2 178 1034 STATE_NO NUMBER 1 1034 FILE_TYPE NUMBER 7 1034 PREF VARCHAR2 478 1034

Only 178 distinct values for FNAME. The optimizer concludes that a full table scan will be more efficient. We’re also told there are only 478 distinct values for PREF, when we know there are 65,345.

Page 26: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

26

The Old Days—ANALYZE Let’s try a larger sample. The first test only sampled about 0.05% of the

rows. Let’s try 5% of the rows.

SQL> analyze table file_history estimate statistics sample 5 percent;

Table analyzed.

Elapsed: 00:00:36.21

SQL> analyze table prop_cat estimate statistics sample 5 percent;

Table analyzed.

Elapsed: 00:02:35.11

Page 27: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

27

The Old Days—ANALYZE We try our query:SQL> SELECT FILE_ID, FNAME, TRACK_ID, SECTOR_ID FROM file_history WHERE FNAME =

'SOMETHING';

no rows selected

Elapsed: 00:00:00.54

Execution Plan---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=54 Card=110 Bytes=6820) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'FILE_HISTORY' (Cost=54 Card=110 Bytes=6820) 2 1 INDEX (RANGE SCAN) OF 'FILEH_FNAME' (NON-UNIQUE) (Cost=3 Card=110)

Page 28: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

28

The Old Days—ANALYZE Our query, continued:

Statistics---------------------------------------------------------- 0 recursive calls 0 db block gets 3 consistent gets 0 physical reads 0 redo size 465 bytes sent via SQL*Net to client 460 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 0 rows processed

Much better! Let’s look at the stats.

Page 29: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

29

The Old Days—ANALYZE

Table Uniqueness Index Name Column Name Distinct Keys Sample Size------------ ---------- -------------------- ------------ ------------- ------------FILE_HISTORY NONUNIQUE FILEH_FNAME FNAME 1,926,580 101179

NONUNIQUE FILEH_FTYPE_STATE FILE_TYPE 8 102128 STATE_NO 8 102128

NONUNIQUE FILEH_PREFIX_STATE PREF 74,935 98709 STATE_NO 74,935 98709

UNIQUE PK_FILE_HISTORY FILE_ID 1,952,701 101025

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1951673 93010 FNAME VARCHAR2 11133 93010 STATE_NO NUMBER 5 93010 FILE_TYPE NUMBER 7 93010 PREF VARCHAR2 23744 93010

Page 30: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

30

The Old Days—ANALYZE 11,133 distinct values for FNAME, much

better than 178. 23,744 distinct values for PREF, much better

than 478. And our query is efficient!

But can we do better?

Page 31: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

31

The Old Days—ANALYZE How about a full “compute statistics”?

SQL> analyze table file_history compute statistics;

Table analyzed.

Elapsed: 00:07:38.32

SQL> analyze table prop_cat compute statistics;

Table analyzed.

Elapsed: 00:29:15.29

Page 32: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

32

The Old Days—ANALYZE And the statistics:

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1951673 1951673 FNAME VARCHAR2 78692 1951673 STATE_NO NUMBER 6 1951673 FILE_TYPE NUMBER 7 1951673 PREF VARCHAR2 65345 1951673

78692 distinct values for FNAME! After examining every row in the table!

Page 33: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

33

Is DBMS_STATS Better? Let’s start with a quick run, a 1% estimate and

cascade=true, so the indexes are analyzed also.

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>1,cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:01:41.70

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>1,cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:01:44.29

Page 34: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

34

And the Statistics:

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1937700 19377 FNAME VARCHAR2 1937700 19377 STATE_NO NUMBER 3 19377 FILE_TYPE NUMBER 7 19377 PREF VARCHAR2 6522 14342

Page 35: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

35

1% estimate, cascade=true Gathering statistics took 3:26. “Analyze estimate statistics” took 23 seconds. “Analyze estimate 5%” took 3:11. Stats are very accurate for FNAME. Stats are off for PREF (6522 vs. 65,345

actual)

Page 36: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

36

5% estimate, cascade=trueLet’s try 5%:

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>5,cascade=>true) PL/SQL procedure successfully completed.

Elapsed: 00:01:23.52

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>5,cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:03:08.75

Page 37: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

37

5% estimate, cascade=true

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1955420 97771 FNAME VARCHAR2 1955420 97771 PREF VARCHAR2 24222 72377

Page 38: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

38

5% estimate, cascade=true A 5% estimate took 4:32. PREF now shows 24,222 distinct values

(actual is 65,345). A 5% estimate using analyze took 3:11, but

only showed 11,133 distinct values for FNAME

Page 39: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

39

Full ComputeSQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'FILE_HISTORY',estimate_percent=>null,cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:09:35.13

SQL> EXECUTE dbms_stats.gather_table_stats (ownname=>'TSUTTON', tabname=>'PROP_CAT',estimate_percent=>null,cascade=>true)

PL/SQL procedure successfully completed.

Elapsed: 00:29:09.46

Page 40: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

40

Full ComputeTABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE LAST_ANALYZED--------------- --------------- ---------- ------------ ----------- -------------FILE_HISTORY FILE_ID NUMBER 1951673 1951673 14:19:30 FNAME VARCHAR2 1951673 1951673 14:19:30 PREF VARCHAR2 65345 1448208 14:19:30

PROP_CAT LOOKUPID VARCHAR2 40903 11486321 14:50:19 EXTID VARCHAR2 11486321 11486321 14:50:19 PROPSTYLE VARCHAR2 48936 11486321 14:50:19

Page 41: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

41

Full Compute Total time taken for a “DBMS_STATS

compute” was slightly longer than for an “ANALYZE compute” (38:45 vs. 36:54).

The stats are right on! There’s another interesting behavior…

Page 42: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

42

Full Compute – IndexesTable Uniqueness Index Name Column Name Distinct Keys Sample Size------------ ---------- -------------------- ------------ ------------- ------------FILE_HISTORY NONUNIQUE FILEH_FNAME FNAME 2,019,679 131585

NONUNIQUE FILEH_FTYPE_STATE FILE_TYPE 14 456182 STATE_NO 14 456182

NONUNIQUE FILEH_PREFIX_STATE PREF 16,365 428990 STATE_NO 16,365 428990

UNIQUE PK_FILE_HISTORY FILE_ID 1,951,673 1951673

PROP_CAT NONUNIQUE PK_PROP_CAT EXTID 10,995,615 373959 SOLD 10,995,615 373959

NONUNIQUE PROPC_LOOKUPID LOOKUPID 2,678 469772

NONUNIQUE PROPC_PROPSTYLE PROPSTYLE 3,434 504580

Page 43: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

43

Full Compute – IndexesWe’re doing a full compute, cascade.Sample size for the indexes ranges from 3% to 100%!

Page 44: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

44

block_sample = true

If >20 rows per block, visiting 5% of rows could visit every block.

So, is block_sample=true faster?

Page 45: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

45

block_sample = true, estimate = 5%, cascade =

true Results shown in summary table. Took 4:11 (block_sample=false took 4:32). Doesn’t seem “more efficient”. Less accurate.

Page 46: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

46

cascade = false People have suggested a small estimate for

table, compute for indexes. Let’s test 1% estimate for tables, compute for

indexes.

Page 47: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

47

Estimate = 1%, cascade = false

estimate = 1% for table, compute for indexes Took 2:59 (vs. 3:26 for cascade = true) As accurate as cascade = true Be careful– if your indexes change, you’ll

need to change your DBMS_STATS jobs.

Page 48: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

48

cascade = falseBUT— the sample sizes are still not 100%

Table Uniqueness Index Name Column Name Distinct Keys Sample Size------------ ---------- -------------------- ------------ ------------- ------------FILE_HISTORY NONUNIQUE FILEH_FNAME FNAME 1,961,200 127775

NONUNIQUE FILEH_FTYPE_STATE FILE_TYPE 13 467998 STATE_NO 13 467998

NONUNIQUE FILEH_PREFIX_STATE PREF 15,076 417975 STATE_NO 15,076 417975

UNIQUE PK_FILE_HISTORY FILE_ID 1,951,673 1951673

PROP_CAT NONUNIQUE PK_PROP_CAT EXTID 11,224,990 381760 SOLD 11,224,990 381760

NONUNIQUE PROPC_LOOKUPID LOOKUPID 2,666 459455

NONUNIQUE PROPC_PROPSTYLE PROPSTYLE 3,063 486558

Page 49: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

49

Summary of Tests So Farestimate_percent Block_sample Elapsed Time # Distinct

FNAME (1951673)

# Distinct PREF (65345)

# Distinct PROPSTYLE

(48936)

1%, cascade False 3:26 1937700 6522 16984

1% table, 20% indexes

False 2:44 1950100 6835 16932

1% table, compute indexes

False 2:59 1941600 6797 16917

5%, cascade False 4:32 1955420 24222 20095

5%, cascade True 4:11 2024540 15927 10092

5% table, 20% indexes

False 4:00 1956180 24188 20084

10%, cascade False 6:04 1962840 36172 23547

20%, cascade False 9:21 1950750 48271 29108

20%, cascade True 9:13 1885260 40653 21897

50%, cascade False 20:24 1950472 60804 39708

null (compute) , cascade

False 38:45 1951673 65345 48936

Page 50: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

50

Other Options

“What about all that ‘Auto’ stuff?” Two commonly referenced “auto” options:

– AUTO_SAMPLE_SIZE for estimate_percent.– method_opt=>'FOR ALL COLUMNS SIZE AUTO'

Page 51: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

51

DBMS_STATS.AUTO_SAMPLE_SIZE

This option tells Oracle to choose the proper sample size for estimate_percent.

Let’s test it.

Page 52: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

52

DBMS_STATS.AUTO_SAMPLE_SIZE

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);end;/

Elapsed: 00:08:19.19

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);end;/

Elapsed: 00:22:55.99

Page 53: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

53

DBMS_STATS.AUTO_SAMPLE_SIZE

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1969341 5841 FNAME VARCHAR2 1951673 1951673 STATE_NO NUMBER 6 1951673 FILE_TYPE NUMBER 7 1951673 PREF VARCHAR2 65345 1448208

Took 31:15. 7 ½ minutes less than “compute, cascade”. 10 minutes longer than “50%, cascade”, but not

much more accurate. Looks like it sampled nearly every row in the table

anyway.

Page 54: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

54

method_opt We tested different histogram options. First, FOR ALL INDEXED COLUMNS, which

seems to be the most commonly used choice. Note that:

– You’re unlikely to need histograms on all your indexed columns.

– You may well want histograms on some non-indexed columns.

– But it’s a good start for a test.

Page 55: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

55

method_opt- 'for all indexed columns'

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all indexed columns

size 30', cascade=>true);end;/

Elapsed: 00:01:35.63

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10, method_opt=>'for all indexed columns

size 30', cascade=>true);end;/

Elapsed: 00:06:01.14

Page 56: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

56

method_opt- 'for all indexed columns'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS--------------- --------------- ---------- ------------ ----------- -------FILE_HISTORY FILE_ID NUMBER 1950360 195036 30 FNAME VARCHAR2 18725 195036 10 STATE_NO NUMBER 5 195036 4 FILE_TYPE NUMBER 7 195036 6 PREF VARCHAR2 36039 144909 14 CREATE_DATE DATE TRACK_ID NUMBER SECTOR_ID NUMBER TEAMS NUMBER BYTE_SIZE NUMBER START_DATE DATE END_DATE DATE LAST_UPDATE DATE CONTAINERS NUMBER

Page 57: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

57

method_opt- 'for all indexed columns'

Took 7:37 (compared to 6:04 for same sample size without histograms)

Statistics aren’t gathered for non-indexed columns.

Page 58: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

58

method_opt- 'for all columns size skewonly'

Rather than deciding the maximum number of buckets to use, let’s let Oracle decide.

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all columns size skewonly', cascade=>true);end;/

Elapsed: 00:02:27.02

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10, method_opt=>'for all columns size skewonly', cascade=>true);end;/

Elapsed: 00:12:57.15

Page 59: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

59

method_opt- 'for all columns size skewonly'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS--------------- --------------- ---------- ------------ ----------- -------FILE_HISTORY FILE_ID NUMBER 1951550 195155 200 FNAME VARCHAR2 18908 195155 39 STATE_NO NUMBER 5 195155 4 FILE_TYPE NUMBER 7 195155 6 PREF VARCHAR2 35913 144624 93 CREATE_DATE DATE 489501 195155 200 TRACK_ID NUMBER 9 195155 8 SECTOR_ID NUMBER 6 195155 5 TEAMS NUMBER 971 160436 46 BYTE_SIZE NUMBER 0 1 START_DATE DATE 0 1 END_DATE DATE 0 1 LAST_UPDATE DATE 525993 195155 200 CONTAINERS NUMBER 971 160483 46

Page 60: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

60

method_opt- 'for all columns size skewonly'

We get statistics on all columns It took 15:24 (twice as long as ‘for all indexed

columns’ 200 buckets for the primary key???

Page 61: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

61

Bug #3929552 Metalink Note 284917.1: “DBMS_STATS

WITH SKEWONLY GENERATES HISTOGRAMS FOR UNIQUE KEY COLUMN.”

The subject of the note says it all. So SKEWONLY isn’t very useful for the

affected versions.

Page 62: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

62

method_opt- 'for all columns size auto‘

Let’s see if the AUTO option for size does any better.

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all columns size auto', cascade=>true);end;/

Elapsed: 00:01:40.49

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10, method_opt=>'for all columns size auto', cascade=>true);end;/

Elapsed: 00:03:53.78

Page 63: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

63

method_opt- 'for all columns size auto'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS

------------ --------------- ---------- ------------ ----------- -------

FILE_HISTORY FILE_ID NUMBER 1946580 194658 1

FNAME VARCHAR2 18905 194658 38

STATE_NO NUMBER 4 194658 1

FILE_TYPE NUMBER 7 194658 1

PREF VARCHAR2 36144 144197 1

Page 64: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

64

method_opt- 'for all columns size auto'

Took only 5:34 to gather statistics (best yet for a 10% estimate)

Statistics leave something to be desired. 18,905 values for FNAME 38 buckets for FNAME (there is NO skew in

this column)

Page 65: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

65

Collecting Histograms with DBMS_STATS

Not as efficient as advertised in Oracle 9i.

Page 66: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

66

Is 10g Any Better? Performance and accuracy of statistics is

reasonable in 9i for “basic” choices. Serious deficiencies with

AUTO_SAMPLE_SIZE and histogram collection.

Are these issues resolved in 10g Release 1?

Page 67: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

67

10g- 'for all columns size skewonly'

First we’ll try the SKEWONLY option for SIZE in histogram collection.

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all columns size skewonly', cascade=>true);end;/Elapsed: 00:02:45.05

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10, method_opt=>'for all columns size skewonly', cascade=>true);end;/Elapsed: 00:12:28.17

Page 68: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

68

10g- 'for all columns size skewonly'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS--------------- --------------- ---------- ------------ ----------- -------FILE_HISTORY FILE_ID NUMBER 1947030 194703 200 FNAME VARCHAR2 26167 194703 200 STATE_NO NUMBER 6 194703 6 FILE_TYPE NUMBER 7 194703 7 PREF VARCHAR2 45092 144582 200

Page 69: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

69

10g- 'for all columns size skewonly'

Collection took 15:13 (about the same as in 9i).

Results are just as bad.– Still collecting histograms on primary key and

FNAME.– 26,167 distinct values of FNAME.

Page 70: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

70

10g- 'for all columns size auto'

Let’s try the auto option for size (and start our demo).

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all columns size

auto', cascade=>true);end;/Elapsed: 00:03:22.81

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>10, method_opt=>'for all columns size

auto', cascade=>true);end;/Elapsed: 00:08:15.00

Page 71: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

71

10g- 'for all columns size auto'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS

------------ --------------- ---------- ------------ ----------- -------

FILE_HISTORY FILE_ID NUMBER 1947700 194770 1

FNAME VARCHAR2 1947700 194770 1

STATE_NO NUMBER 4 194770 1

FILE_TYPE NUMBER 7 194770 1

PREF VARCHAR2 36073 144920 1

Page 72: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

72

10g- 'for all columns size auto'

Statistics gathering took 11:38 (twice as long as in 9i).

Statistics are reasonable. No histograms on unique columns. So the AUTO option for SIZE works in Oracle

10g… sort of.

Page 73: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

73

10g- 'for all columns size auto'

SQL> select STATE_NO, COUNT(*) from FILE_HISTORY group by STATE_NO; 

  STATE_NO   COUNT(*)---------- ----------         0         95        20        569        30    1950957        40         39       999          4      9999          9

  6 rows selected.

Seems like a candidate for a histogram.

Page 74: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

74

10g- 'for all columns size auto'

SQL> select file_type, count(*) from file_history group by file_type; FILE_TYPE COUNT(*)---------- ---------- 1 670950 2 83799 3 58925 4 48241 5 777258 6 62681 8 249819

7 rows selected.

Also a candidate for a histogram?

Page 75: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

75

10g- 'for all columns size auto'

Let’s try some queries:

select count(*) from file_history where file_type = 5;select count(*) from file_history where file_type = 4;select count(*) from file_history where fname = 'SOMETHING';

Page 76: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

76

10g- 'for all columns size auto'

And gather stats again

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>10, method_opt=>'for all

columns size auto', cascade=>true);end;/

Page 77: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

77

10g- 'for all columns size auto'

TABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE BUCKETS------------ --------------- ---------- ------------ ----------- ---FILE_HISTORY FILE_ID NUMBER 1952650 195265 1 FNAME VARCHAR2 26179 195265 254 STATE_NO NUMBER 5 195265 1 FILE_TYPE NUMBER 7 195265 7 PREF VARCHAR2 36154 144848 1

dbms_stats has taken the workload into account and created histograms on the columns we queried (though it’s again creating a histogram on FNAME).

Page 78: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

78

10g- AUTO_SAMPLE_SIZE Let’s see how 10g does with using

DBMS_STATS.AUTO_SAMPLE_SIZE for estimate_percent.

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'FILE_HISTORY', estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);end;/

Elapsed: 00:02:15.95

begin dbms_stats.gather_table_stats( ownname=>'TSUTTON', tabname=>'PROP_CAT', estimate_percent=>dbms_stats.auto_sample_size, cascade=>true);end;/

Elapsed: 00:04:01.00

Page 79: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

79

10g- AUTO_SAMPLE_SIZETABLE_NAME COLUMN_NAME DATA_TYPE NUM_DISTINCT SAMPLE_SIZE--------------- --------------- ---------- ------------ -----------FILE_HISTORY FILE_ID NUMBER 1951112 5764 FNAME VARCHAR2 1951112 57166 STATE_NO NUMBER 2 5764 FILE_TYPE NUMBER 7 5764 PREF VARCHAR2 2200 4246

Page 80: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

80

10g- AUTO_SAMPLE_SIZE Statistics gathering took 6:17 (an enormous

improvement over the 31:15 in 9i). Accuracy may be lacking.

– PREF shows 2200 distinct values (actual is 65,345)

– Results aren’t as accurate as a 5% sample in 9i (which took 1/3 less time)

Page 81: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

81

SummaryFrom our testing, it appears that: The sweet spot for balancing performance of the

gathering process and accuracy of statistics lies between 5 and 20%.

Gathering statistics separately for indexes is faster than the cascade=true option while gathering table statistics.

Block_sample=true doesn't appear to appreciably speed up statistics gathering, while at the same time delivering somewhat less accurate statistics.

Page 82: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

82

Summary Using AUTO_SAMPLE_SIZE takes nearly as

long as COMPUTE in Oracle 9i. It is much faster in 10gR1, but the accuracy of the statistics is not as good as with an estimate_percent of 5, which gathers the statistics faster.

Page 83: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

83

Summary Using the SKEWONLY option to size

histograms is inadvisable in both Oracle 9i and 10gRelease1.

Using the AUTO option to choose and size histograms is inadvisable in Oracle 9i, but seems to work better in 10gR1 (though it still has issues).

Page 84: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

84

In Conclusion These results should help you decide what

options you want to use in gathering statistics on your databases. Hopefully you’ll test different options on your data. It is clear that DBMS_STATS provides greater power, accuracy, and flexibility than ANALYZE, and it is improving with each version.

Page 85: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

85

The White Paper A companion white paper to this presentation

is available for free download from our company’s website at:www.dbspecialists.com/presentations.html

Page 86: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

86

Resources from Database Specialists

The Specialist newsletter www.dbspecialists.com/specialist.html Database Rx®

dbrx.dbspecialists.com/guestProvides secure, automated monitoring, alert notification, and analysis of your Oracle databases

Page 87: 1 What’s Up with dbms_stats? Terry Sutton Database Specialists, Inc

87

Contact InformationTerry SuttonDatabase Specialists, Inc.388 Market Street, Suite 400San Francisco, CA 94111

Tel: 415/344-0500Email: [email protected]: www.dbspecialists.com