Title of Presentation - Welcome to The TeradataForum

Download Report

Transcript Title of Presentation - Welcome to The TeradataForum

Teradata Architectural Summit 2007
Yasir Hassan Maken
Professional Services EMEA (GCC – Pakistan)
What Data Skew can cause?
Why I cannot insert 50 GB
evenly distributed data in a
database having 500 GB unused
space?
Why we are not able to move only
10% of the unused space from a
database?
Why most of my requests
are Spooling out?
Why most of the queries running
on this database usually have high CPU
SKEW?
2 > 7/17/2015
Teradata Confidential
Because
• Most of the database space is being wasted
• Almost all allocated space for a database on at least one of the AMPs
has been occupied
• The chunk of upcoming data that needs to go on the highly congested
AMP, after distribution, do not finds a space for accommodation
• If space needs to be moved, the chunk of space that is supposed to
be fetched from the congested AMP would not be allowed. As, equal
space will be fetched from each of the AMPs and the activity will fail if
one of the AMPs does not have FREE SPACE=“Total Space to be
moved/# of AMPs”
• During processing of any request on the database, there would be
high chances of spooling out on the congested AMP as “MaxPerm –
CurrPerm” will be the available Spool which will be less due to high
Skew on the AMP
• In some cases, the CPU Skew of the sql request on the skewed tables
of the database will be more due to uneven distribution of data and
different CPU required on different AMPs
3 > 7/17/2015
Teradata Confidential
How to make things work normally?
Identify the Skewed tables
Identify the Skewed tables and plan
Primary Index Revision process for the
tables having high skew and huge Skew
savings
Analyze the DBQL to find the most
frequently used columns in the sql
executed on highly skewed tables
(Choose the columns in WHERE
clause and JOIN conditions)
Check the row distribution per AMP for the
most frequently used column or group of
columns (Composite) being used in the
WHERE cause and JOIN conditions. Also
check the new columns for Hash collisions
Check the row distribution per amp and
find the Skew with the suggested Index
SkewFactor = 100 - (AVG(CurrentPerm)/MAX(CurrentPerm)*100)
Suggest the Primary Index on the basis
of best distribution and high frequency of
use in order to make sure that data gets
evenly distributed and almost equal
amount of space is free/utilized across
all AMPs
4 > 7/17/2015
Check for Hash collisions
Teradata Confidential
Improvements
Now I can easily insert enough
data in the database
Now the Spool out issue has
improved a lot
Now I can easily move space from the
database if there is any available space
The CPU Skew has decreased a lot
BEFORE
xx GB
xx GB
xx GB
AFTER
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
xx GB
10 0
10 0
90
90
80
80
70
70
60
60
50
50
U sed S p ace
U sed S p ace
40
40
30
30
20
20
10
10
0
0
A M P1
5 > 7/17/2015
A M P3
A M P1
A M P5
Teradata Confidential
A M P3
A M P5