Data Warehouse & Business Intelligence: Explain about Skew Factor?

Wednesday, 11 September 2013

Explain about Skew Factor?

Skew factor describes how unevenly a table's rows are distributed among the AMPs.

Generally, a Non-Unique PI allows duplicate values. The more duplicate values a table has, the more rows share the same row hash, and rows with the same row hash always go to the same AMP. This makes the data distribution uneven: one AMP stores more data and the other AMPs store less. When we access the full table, the AMP holding the most data takes the longest and keeps the other AMPs waiting, which wastes processing capacity.
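The effect of duplicate PI values can be sketched in a few lines of Python. This is only an illustration: MD5 stands in for Teradata's own row-hash function, and the names amp_for, rows_per_amp and NUM_AMPS, the AMP count of 4, and the sample values are all assumptions made for the example.

```python
import hashlib
from collections import Counter

NUM_AMPS = 4  # hypothetical system size, chosen only for the example


def amp_for(pi_value, num_amps=NUM_AMPS):
    """Map a primary index value to an AMP number.

    MD5 is a stand-in here; Teradata uses its own hashing algorithm.
    The key property is the same: equal values always give equal hashes.
    """
    digest = hashlib.md5(str(pi_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_amps


def rows_per_amp(pi_values, num_amps=NUM_AMPS):
    """Count how many rows land on each AMP for the given PI values."""
    counts = Counter(amp_for(value, num_amps) for value in pi_values)
    return [counts.get(amp, 0) for amp in range(num_amps)]


# 10,000 rows with distinct PI values: rows spread across all AMPs.
unique_pi = list(range(10_000))

# 10,000 rows where 9,000 share one PI value: those 9,000 rows
# all hash to the same AMP, so that AMP is overloaded.
skewed_pi = ["IN"] * 9_000 + list(range(1_000))

print("distinct PI values :", rows_per_amp(unique_pi))
print("duplicate PI values:", rows_per_amp(skewed_pi))
```

With distinct values the counts come out close to each other, while in the second case one AMP receives at least 9,000 of the 10,000 rows.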

In this situation (unequal distribution of data) the skew factor is high.

For this type of table we should avoid full table scans.

Ex:

AMP0              AMP1

1000000 (10%)     9000000 (90%)

In this situation the skew is very high: AMP1 holds 90% of the rows.
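To put a number on the example, here is a small sketch. The post does not give a formula, so the one used here, 100 * (1 - average / maximum) over the per-AMP sizes, is an assumption based on a calculation commonly used by Teradata practitioners. The function names and the row counts (a two-AMP table split 10% / 90%) are hypothetical.

```python
def skew_factor(rows_per_amp):
    """Return 100 * (1 - average / maximum) for per-AMP row counts.

    0 means perfectly even distribution; values near 100 mean
    almost all rows sit on a single AMP of a large system.
    """
    if not rows_per_amp:
        raise ValueError("need at least one AMP")
    largest = max(rows_per_amp)
    if largest == 0:
        return 0.0  # empty table: nothing to skew
    average = sum(rows_per_amp) / len(rows_per_amp)
    return 100.0 * (1.0 - average / largest)


def share_on_busiest_amp(rows_per_amp):
    """Return the percentage of all rows stored on the fullest AMP."""
    total = sum(rows_per_amp)
    if total == 0:
        return 0.0
    return 100.0 * max(rows_per_amp) / total


# Hypothetical two-AMP table split 10% / 90%.
example = [1_000_000, 9_000_000]

print(f"share on busiest AMP: {share_on_busiest_amp(example):.1f}%")
print(f"skew factor         : {skew_factor(example):.1f}%")
```

Under this formula the 90% figure is the share of rows on the busiest AMP, while the skew factor itself works out to about 44% for two AMPs. Either number signals that one AMP carries far more than its fair share.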
