Data Warehouse & Business Intelligence: Purpose/Use of collect statistics

The following list is a very high-level description of the most important purposes for column
and index statistics.

• The Optimizer uses statistics to decide whether it should generate a query plan that use a
secondary, hash, or join index instead of performing a full-table scan.

• The Optimizer uses statistics to estimate the cardinalities of intermediate spool files based
on the qualifying conditions specified by a query.
The estimated cardinality of intermediate results is critical for the determination of both
optimal join orders for tables and the kind of join method that should be used to make
those joins.
For example, should 2 tables or spool files be redistributed and then merge joined, or
should one of the tables or spool files be duplicated and then product joined with the
other. Depending on how accurate the statistics are, the generated join plan can vary so
greatly that the same query can take only seconds to complete using one join plan, but take
hours to complete using another.

• For PPI tables, statistics collected on the PARTITION system-derived column permit the
Optimizer to better estimate costs.
Teradata Database also uses PARTITION statistics for estimates when predicates are based
on the PARTITION column, for example: WHERE PARTITION IN (3,4).

• The Teradata Statistics Wizard uses existing statistics, which it assumes are current by
default, as a basis for recommending other statistics to be collected for a query or query
workload (see “Special Considerations for Sampled and Stale Statistics” on page 668).
This is yet another reason you should keep your statistics as current as you possibly can.

Data Warehouse & Business Intelligence

Lables

Friday, 18 October 2013

Purpose/Use of collect statistics

No comments:

Post a Comment