Statistics Threshold Functionality 101

Blog entry by carrie on 06 Feb 2014

Tags: collect statistics statistics threshold teradata database 14.10

An earlier blog post focused on simple steps to get started using the Teradata 14.10 Automated Statistics Management (AutoStats) feature. One of the new capabilities that AutoStats relies on when it streamlines statistics collection is the new “Threshold” option. Threshold applies some intelligence about when statistics actually need to be re-collected, allowing the optimizer to skip some recollections.

Although you will probably want to begin relying on AutoStats when you get to 14.10, you don’t have to be using AutoStats to take advantage of threshold, as the two features are independent of one another. This post will give you a simple explanation of what the threshold feature is, what default threshold activity you can expect when you get on 14.10, and what the options having to do with threshold do for you. And you’ll get some suggestions on how you can get acquainted with threshold a step at a time.

For more thorough information about statistics improvements in 14.10, including the threshold functionality, see the orange book Teradata Database 14.10 Statistics Enhancements by Rama Korlapati.

What Does the Threshold Option Do?

When you submit a COLLECT STATISTICS statement in 14.10, it may or may not execute. A decision is made whether or not there is value in recollecting these particular statistics at the time they are submitted. That decision is only made if threshold options are being used.

Threshold options can exist at three different levels, each of which will be discussed more fully in its own section below. This is a very general description of the three levels:

1. System threshold: This is the default approach for applying thresholds for all 14.10 platforms. The system threshold default is not a single threshold value. Rather, this default approach determines the appropriate threshold for each statistic and considers how much the underlying table has changed since the last collection.

2. DBA-defined global thresholds: These optional global thresholds override the system default, and rely on DBA-defined fixed percentages as thresholds. Once set, all statistics collection statements will use these global threshold values, unless overridden by the third level of threshold options at the statement level.

3. Thresholds on individual statements: Optional USING clauses that are attached to COLLECT STATISTICS statements can override the system default or any global DBA-defined thresholds when there is a need for customization at the individual statistic level.

Whichever threshold level is being used, if the optimizer determines that the threshold has not been met, no statistics will be collected, even though they have been requested. When a collection has been asked for but has not been executed, the StatsSkipCount column in the DBC.StatsTbl row that represents this statistic will be incremented.

StatsSkipCount appears as an explicit column in the view, but in the base DBC.StatsTbl, StatsSkipCount is carried in the Reserved1 field. When StatsSkipCount is zero, it means that the most recent COLLECT STATISTICS request was executed.
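To make the precedence among the three levels concrete, here is a minimal Python sketch of how one might model it. This is only an illustration of the rules described in this post, not Teradata's implementation: the function and class names are hypothetical, and the only SysChangeThresholdOption values it handles are the two this post mentions (0 for the default behavior, 3 for off).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    level: str                              # "statement", "global", "system" or "none"
    change_percent: Optional[float] = None  # None: not set, or left to the optimizer
    days: Optional[int] = None              # None: no time-based threshold


def effective_threshold(
    stmt_change_percent: Optional[float] = None,
    stmt_days: Optional[int] = None,
    default_user_change_threshold: float = 0,
    default_time_threshold: int = 0,
    sys_change_threshold_option: int = 0,
) -> Threshold:
    """Pick the threshold level that governs one COLLECT STATISTICS request."""
    # Level 3: a USING THRESHOLD clause on the statement overrides everything else.
    if stmt_change_percent is not None or stmt_days is not None:
        return Threshold("statement", stmt_change_percent, stmt_days)
    # Level 2: a non-zero DBA-defined global setting replaces the system default.
    if default_user_change_threshold > 0 or default_time_threshold > 0:
        return Threshold(
            "global",
            default_user_change_threshold or None,
            default_time_threshold or None,
        )
    # Level 1: the system default. Only the values named in this post are modeled.
    if sys_change_threshold_option == 0:
        return Threshold("system")
    if sys_change_threshold_option == 3:
        return Threshold("none")
    raise ValueError("SysChangeThresholdOption value not covered by this sketch")


print(effective_threshold().level)
print(effective_threshold(default_time_threshold=7).level)
print(effective_threshold(stmt_change_percent=10, default_time_threshold=7).level)
```

The three calls at the end walk from an untouched 14.10 system, to a site that has set a global time threshold, to a single statement that carries its own USING THRESHOLD clause.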

Ways That a Threshold Can Be Expressed

The system setting (level 1) for threshold logic is not one threshold value applied to all statistics collections. Rather, when enabled, the setting tells the optimizer to hold back the execution of a collection submission based on whatever it deems an appropriate threshold for this statistic at this point in time. This high-level setting uses a “percent of change” type of threshold only.

Thresholds are explicitly specified when DBA-defined global settings or individual statement thresholds are used. These explicit thresholds can be expressed as a percent of change to the rows of the table upon which statistics are being collected, or as time (some number of days) since the last collection.

The most reliable way to express thresholds is by means of a percent of table change. That is why the highest level system setting, the one that is on by default, only supports percent of change thresholds. Time as a threshold must be explicitly specified in order to be used.

Importance of DBQL USECOUNT Logging

The recommended percent of change thresholds rely on having DBQL USECOUNT logging turned on. See my earlier blog on AutoStats for an explanation of USECOUNT DBQL logging. USECOUNT logging is a special type of DBQL logging that is enabled at the database level. Among other things, USECOUNT tracks inserts, deletes and updates made to tables within a database, and as a result, can provide highly accurate information to the optimizer about how the table has changed since the last statistics collection.

The default system threshold functionality can be applied to a statistics collection only if USECOUNT logging has been enabled for the database that the statistics collection table belongs to. In the absence of USECOUNT data, the default threshold behavior will be ignored. However, both DBA-defined global thresholds and statement-based thresholds are able to use percent of change thresholds even without USECOUNT logging, but with the risk of less accuracy.

In the cases where USECOUNT logging is not enabled, percent of change values are less reliable because the optimizer must rely on random AMP sample comparisons. Such comparisons consider how estimated table row counts (the size of the table) have changed since the last statistics collection. This can mask some conditions, like deletes and inserts happening in the same timeframe. Comparisons based strictly on table row counts are not able to detect row updates, which could change column demographics. For that reason, it is recommended that USECOUNT logging be turned on for all databases undergoing change once you get to 14.10.
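A small worked example shows why row counts alone can hide change. The Python sketch below is purely illustrative: both formulas are simplifying assumptions of mine (the optimizer's actual calculation is not described in this post), and all of the numbers are made up.

```python
def change_percent_from_usecount(
    inserts: int, deletes: int, updates: int, rows_at_last_collect: int
) -> float:
    """Percent of change when every insert, delete and update is counted (assumed formula)."""
    return 100.0 * (inserts + deletes + updates) / rows_at_last_collect


def change_percent_from_row_counts(
    rows_at_last_collect: int, estimated_rows_now: int
) -> float:
    """Percent of change visible from table size alone, as with a row-count comparison."""
    return 100.0 * abs(estimated_rows_now - rows_at_last_collect) / rows_at_last_collect


# Hypothetical table: deletes and inserts cancel out, and some rows are updated in place.
rows_before = 1_000_000
inserts, deletes, updates = 80_000, 80_000, 50_000
rows_now = rows_before + inserts - deletes  # table size is unchanged

by_usecount = change_percent_from_usecount(inserts, deletes, updates, rows_before)
by_row_count = change_percent_from_row_counts(rows_before, rows_now)

print(f"USECOUNT view:  {by_usecount:.1f}% changed")
print(f"Row-count view: {by_row_count:.1f}% changed")
```

With these made-up numbers, a 10% change threshold would trigger a recollection from the USECOUNT view but not from the row-count view, even though more than a fifth of the table's rows were touched.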

Percent of change is the recommended way to express thresholds when you begin to use the threshold feature in 14.10. Time-based thresholds are offered as options primarily for sites that have evolved their own in-house statistics management applications at a time when percent of change was unavailable, and wish to continue to use time.

The next three sections discuss the three different levels of threshold settings.

More about the System Threshold Option

All 14.10 systems have the system threshold functionality turned on by default. But by itself, that is not enough. USECOUNT logging for the database must also be enabled. If USECOUNT DBQL logging is turned on, then each COLLECT STATISTICS statement will be evaluated to see if it will run or be skipped.

During this evaluation, an appropriate change threshold for the statistic is established by the optimizer. The degree of change to the table since the last collection is compared against the current state of the table, based on USECOUNT tracking of inserts, deletes and updates performed. If the change threshold has not been reached, and enough history has been collected for this statistic (usually four or five full collections) so that the optimizer can perceive a pattern in the data such that extrapolations can be confidently performed, then this statistics collection will be skipped.

Even if the percent of change threshold has not been reached (indicating that statistics can be skipped), if there are insufficient history records, the statistics will be recollected. And even with 10 or 20 history records, if there is no regular pattern of change that the optimizer can rely on to make reasonable extrapolations, statistics will be recollected.

There is a DBS Control record parameter called SysChangeThresholdOption which controls the behavior of the system threshold functionality. This parameter is set to zero by default. Zero means that as long as USECOUNT logging in DBQL is enabled for the database that the table belongs to, all statistics collection statements will undergo a percent of change threshold evaluation, as described above.

If you want to maintain the legacy behavior, threshold logic can be turned off completely at the system level by disabling the SysChangeThresholdOption setting in DBS Control (set it to 3). This field, along with the fields that set DBA-defined global thresholds, can be found in the new Optimizer Statistics Fields in DBS Control.

It is important to re-emphasize that DBQL USECOUNT logging must be enabled for all databases that you want to take advantage of the system threshold functionality. In addition, all other lower-level threshold settings must remain off (as they are by default) in order for the system threshold to be in effect.
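The run-or-skip conditions in this section can be summarized in a few lines of Python. Treat this as a reading aid under stated assumptions rather than the optimizer's real logic: the function name and parameters are hypothetical, the threshold the optimizer picks is passed in as a plain number, and the minimum history of five collections is my stand-in for "usually four or five".

```python
def system_threshold_skips(
    usecount_enabled: bool,
    change_percent: float,
    optimizer_threshold_percent: float,
    full_collections_in_history: int,
    change_pattern_is_regular: bool,
    min_history: int = 5,  # assumption: the post says "usually four or five"
) -> bool:
    """Return True if a recollection would be skipped under the system default."""
    if not usecount_enabled:
        # Without USECOUNT data the default threshold behavior is ignored.
        return False
    if change_percent >= optimizer_threshold_percent:
        # Enough change has accumulated: recollect.
        return False
    if full_collections_in_history < min_history:
        # Too little history to extrapolate from: recollect.
        return False
    if not change_pattern_is_regular:
        # No dependable pattern of change, however long the history: recollect.
        return False
    return True


# Small change, plenty of history, regular pattern: the collection is skipped.
print(system_threshold_skips(True, 2.0, 10.0, 8, True))
# Same table, but only two prior collections: the statistics are recollected.
print(system_threshold_skips(True, 2.0, 10.0, 2, True))
```

Every condition has to line up for a skip, which is why a statistic with little history keeps getting recollected even when the table has barely changed.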

More about DBA-Defined Global Thresholds


While it is recommended that the system threshold setting be embraced as the universal approach, there are some sites that have established their own statistics management processes prior to 14.10. Some of these involve logic that checks on the number of days that have passed since the last collection as an indicator of when to recollect.

In order to allow those statistics applications to continue to function as they have in the past within the new set of threshold options in 14.10, global threshold parameters have been made available. These options are one step down from the system threshold and, once set, will cancel out use of the system threshold.

There are two parameters in the same Optimizer Statistics Fields section of DBS Control that allow you to set DBA-defined thresholds:

DefaultUserChangeThreshold: If this global threshold is modified with a percent of change value (some number > 0), then the system default threshold will be disabled, and the percent defined here will be used to determine whether or not to skip statistics collections globally.

Unlike the system default, if DBQL USECOUNT logging has not been enabled, random AMP samples will be used instead when this global setting has been enabled. The approach of using random AMP samples is somewhat less reliable, particularly in cases where there are updates, or deletes accompanied by inserts, rather than just inserts.

DefaultTimeThreshold: This global setting provides backward compatibility with home-grown statistics management applications that rely on the passage of time. Using a time-based threshold offers a less precise way of determining when a given statistic requires recollection. Some tables may undergo large changes in a 7-day period, while others may not change at all during that same interval. This one-size-fits-all approach lacks specificity and may result in unneeded resource usage.
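The difference between the two global settings is easy to see with two tables and a 7-day window. The following Python sketch is only an illustration: the table names, dates and percentages are invented, and the use of "greater than or equal" for both comparisons is an assumption, not documented behavior.

```python
from datetime import date


def time_threshold_recollects(last_collected: date, today: date, days_threshold: int) -> bool:
    """Time-based check: recollect once enough days have passed (assumed comparison)."""
    return (today - last_collected).days >= days_threshold


def change_threshold_recollects(change_percent: float, percent_threshold: float) -> bool:
    """Change-based check: recollect once enough of the table has changed (assumed comparison)."""
    return change_percent >= percent_threshold


# Hypothetical tables and their percent of change over the same 7 days.
tables = {"busy_table": 40.0, "static_table": 0.0}
last_collected, today = date(2014, 2, 1), date(2014, 2, 8)

decisions = {
    name: (
        time_threshold_recollects(last_collected, today, days_threshold=7),
        change_threshold_recollects(pct, percent_threshold=10.0),
    )
    for name, pct in tables.items()
}

for name, (by_time, by_change) in decisions.items():
    print(f"{name}: time-based recollect={by_time}, change-based recollect={by_change}")
```

The time-based rule recollects both tables, including the one that has not changed at all, while the change-based rule spends resources only on the table that actually moved.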

More about Statement-Level Thresholds

USING THRESHOLD syntax can be added manually to any COLLECT STATISTICS statement.

When you use USING THRESHOLD, it will override any default or global threshold settings that are in place. See the Teradata Database 14.10 Statistics Enhancements orange book for detailed information about the variations of statement-level options you can use for this purpose.

For statement-based percent of change thresholds, the optimizer does not require that there be a history of past collections. If data change is detected over the specified threshold, statistics will be collected; otherwise they will be skipped.

Statement-level thresholds are for special cases where a particular statistic needs to be treated differently than the higher-level default parameters dictate. They can also be useful when you are getting started with threshold, and you want to limit the scope to just a few statistics. For example:

COLLECT STATISTICS USING THRESHOLD 10 PERCENT AND THRESHOLD 15 DAYS COLUMN TestID ON SandBoxT1;

Getting Started Using Threshold

Here are some suggestions for sites that have just moved to 14.10 and want to experience how the threshold logic works on a small scale before relying on the system and/or global options:

1. Pick a small, non-critical database.

2. Enable DBQL USECOUNT logging on that database:

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;

3. Disable the system threshold parameter with this DBS Control setting:

SysChangeThresholdOption = 3

4. Leave the global parameters disabled as they are by default:

DefaultUserChangeThreshold = 0

DefaultTimeThreshold = 0

5. Add USING THRESHOLD 10 PERCENT to the statistics collection statements just for the tables within your selected database:

COLLECT STATISTICS USING THRESHOLD 10 PERCENT COLUMN TestID ON SandBoxT1;

6. Insert a few rows into the table (less than 10% of the table size) and run an Explain of the statistics collection statement itself, and it will tell you whether or not skipping is taking place:

EXPLAIN COLLECT STATISTICS COLUMN TestID ON SandBoxT1;

See page 39 of the Teradata Database 14.10 Statistics Enhancements orange book for some examples.

Summary of Recommendations for Threshold Use

The following recommendations apply when you are ready to use the threshold functionality fully:

1. If your statistics are not under the control of AutoStats, make no changes and rely on the system threshold functionality to appropriately run or skip your statistics collection statements.

2. Always turn on USECOUNT logging in DBQL for all databases for which statistics are being collected and where you are relying on system threshold.

3. If you have your own statistics management routines that rely on executing statistics only after a specific number of days have passed, set the DefaultTimeThreshold to meet your threshold criteria. You should experience behavior similar to what you had prior to 14.10. Over time, consider switching to a change-based threshold and re-establishing the system threshold, as it will be more accurate for you.

4. Don’t lead with either of the DBA-defined global parameters DefaultUserChangeThreshold or DefaultTimeThreshold unless there is a specific reason to do so.

5. Use the statement-level threshold only for special statistics that need to be handled differently from the system or the DBA-defined global defaults.

6. Favor percent of change over number of days as your first choice for the type of threshold to use.


7. But if USECOUNT is not being logged, then rely on time-based thresholds and set DefaultTimeThreshold at your preferred number of days.

DISCUSSION

3 days ago

Hi Carrie, There are 980 databases in my system. If I enable USECOUNT logging for each database, is there any impact? The DBQLRULES table will have 1000 rows, so is that fine? Is it good to make different Collect jobs, or is one enough to collect the auto stats?

Thanks, -Sandeep.

1 day ago

Sandeep, I have not heard of any issues with UseCount logging overhead being problematic. UseCount logging uses the same techniques as other DBQL collections. There is a UseCount DBQL cache that is flushed at pre-set intervals and the cached entries are inserted into the database in a very efficient way. However, every site is different, so I cannot tell you for sure that you will not experience some overhead. But based on feedback from other customers, it is not likely.

StatsUsage and XMLPlan logging, on the other hand, do come with some overhead. For that reason, we recommend you only have that type of logging on for limited periods of time to capture representative queries when analysis is going to be performed. StatsUsage has less impact as it is at the request level, while XMLPlan is at the step level and will have more overhead.

I cannot think of any reason why 1000 rows in the DBQLRules table would be an issue. These new options will cause the number of rows to increase, and I think that is expected.

In terms of AutoStats, a Collect job currently is limited to a single session. However, the collect lists generated by a Collect job group all the stats on a table into a single statement, so there will be parallelism and overlap achieved there.

If you have more available cycles you want to put into stats collection, there is no reason you cannot create multiple Collect jobs with non-overlapping object scopes and schedule them to run at the same time. Try to stay away from too many Collect jobs running at the same time, as that could become difficult to manage and monitor.

The AutoStats orange book (available in the orange book repository) has a lot of helpful tips on using AutoStats if you need more information in that area.

Thanks, -Carrie

16 hours ago

Thanks Carrie :)

Thanks, -Sandeep.
