Lables

Friday 18 October 2013

Purpose/Use of collect statistics

The following list is a very high-level description of the most important purposes for column 
and index statistics.

• The Optimizer uses statistics to decide whether it should generate a query plan that use a 
secondary, hash, or join index instead of performing a full-table scan. 

• The Optimizer uses statistics to estimate the cardinalities of intermediate spool files based 
on the qualifying conditions specified by a query. 
The estimated cardinality of intermediate results is critical for the determination of both 
optimal join orders for tables and the kind of join method that should be used to make 
those joins. 
For example, should 2 tables or spool files be redistributed and then merge joined, or 
should one of the tables or spool files be duplicated and then product joined with the 
other. Depending on how accurate the statistics are, the generated join plan can vary so 
greatly that the same query can take only seconds to complete using one join plan, but take 
hours to complete using another. 

• For PPI tables, statistics collected on the PARTITION system-derived column permit the 
Optimizer to better estimate costs. 
Teradata Database also uses PARTITION statistics for estimates when predicates are based 
on the PARTITION column, for example: WHERE PARTITION IN (3,4). 

• The Teradata Statistics Wizard uses existing statistics, which it assumes are current by 
default, as a basis for recommending other statistics to be collected for a query or query 
workload (see “Special Considerations for Sampled and Stale Statistics” on page 668).
This is yet another reason you should keep your statistics as current as you possibly can.

No comments:

Post a Comment