123 Scatter!

Download Report

Transcript 123 Scatter!

1, 2, 3…
Scatter!
Getting Your Humpty-Dumpty
Database in Order.
Tom Bascom, White Star Software
[email protected]
A Few Words about the Speaker
• Tom Bascom; Progress user & roaming DBA
since 1987
• VP, White Star Software, LLC
– Expert consulting services related to all aspects of
Progress and OpenEdge.
– [email protected]
• President, DBAppraise, LLC
– Remote database management service for OpenEdge.
– Simplifying the job of managing and monitoring the
world’s best business applications.
– [email protected]
2
“Fragmentation”
vs
“Scatter”
3
Fragmentation
• “Fragmentation” is splitting records into
multiple pieces.
$ proutil dbname –C dbanalys > dbname.dba
…
RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95
------------------------------------------------------Record Size (B) -FragmentsScatter
Table
Records
Size Min Max Mean
Count Factor Factor
PUB.APP_FLAGS 1676180 47.9M
28
58
29 1676190
1.0
1.9
…
4
Fragmentation
• “Fragmentation” is splitting records into
multiple pieces.
$ proutil dbname –C dbanalys > dbname.dba
…
RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95
------------------------------------------------------Record Size (B) -FragmentsScatter
Table
Records
Size Min Max Mean
Count Factor Factor
PUB.APP_FLAGS 1676180 47.9M
28
58
29 1676190
1.0
1.9
…
10 additional fragments
Beware!
5
Fragmentation Occurs When
• Record data is too big for the block: i.e. 16k
of data going into a 4k block.
• Updated data needs more room to expand
than is available.
• The “create limit” and the “toss limit” can be
used to reserve more free space in blocks and
control fragmentation.
• Progress will automatically “de-frag” when
possible (10.1+).
6
Scatter
• “Scatter” is a measure of the “sequentialness”
of records.
$ proutil dbname –C dbanalys > dbname.dba
…
RECORD BLOCK SUMMARY FOR AREA "APP_FLAGS_Dat" : 95
------------------------------------------------------Record Size (B) -FragmentsScatter
Table
Records
Size Min Max Mean
Count Factor Factor
PUB.APP_FLAGS 1676180 47.9M
28
58
29 1676190
1.0
1.9
…
7
What is “Scatter Factor”?
• The “factor” does not care about any ordering –
not even primary key.
• For type 1 areas it is a measure of how well the
data fits into the minimum # of blocks that would
be required to hold it with “distance” between
blocks taken into account.
• For type 2 areas there is no distance penalty –
but free space in a cluster can increase scatter.
• “Logically adjacent” isn’t really reported by
dbanalys.
8
Fragmentation and Scatter
• “Fragmentation” is splitting records into
multiple pieces.
• “Scatter” is a measure of the “sequentialness”
of records.
• The “scatter factor” that proutil reports might
be better described as “density”.
• If you have two or more indexes at least one
of them is probably scattered.
9
Why is This
Important?
10
Locality of Reference
• When data is referenced there is a high
probability that it will be referenced again
soon.
• If data is referenced there is a high probability
that “nearby” data will be referenced soon.
• Locality of reference is why caching exists at
all levels of computing.
11
How Does Progress Help?
• Temporal Locality (Will be reused “soon”)
• -B (LRU Chain)
• -mmax
• -Bt
• Spatial Locality (“nearby” data will be
accessed)
• Type 2 storage areas
• -B2 (no LRU)
• Memory mapped prolib
12
Cache Effectiveness
Layer
Time
Progress 4GL to –B
# of
Recs
# of Ops
Cost per
Op
Relative
0.96
100,000 203,473
0.000005
1
-B to FS Cache
10.24
100,000
26,711
0.000383
75
FS Cache to SAN
5.93
100,000
26,711
0.000222
45
-B to SAN Cache
11.17
100,000
26,711
0.000605
120
SAN Cache to Disk
200.35 100,000
26,711
0.007500
1500
-B to Disk
211.52 100,000
26,711
0.007919
1585
13
Logical
Scatter
14
Definition
• Logical Scatter is the probability that records
in a given logical order are also in “physical”
order (in the same block).
• Each index has its own ordering and thus its
own logical scatter.
• It is very unlikely that more than one index
will be well ordered.
• It is quite possible that all indexes might be
scattered.
15
Type 1 Storage Area
Block 3
Block 1
Burlington
14
Cologne Germany
Germany
9/28
Standard Mail
2
Upton Frisbee
Oslo
54
4.86
Shipped
1
Koberlein
Kelly
55
23.85
Shipped
1
53
1
Lift Tours
3
66
9/23
1
1
1
2
Block 2
1/26
1/31
FlyByNight
Block 4
1
3
53
8.77
Shipped
BBB
Brawn, Bubba B.
1,600
2
1
19
2.75
Shipped
DKP
Pitt, Dirk K.
1,800
2
2
49
6.78
Shipped
4
Go Fishing Ltd
Harrow
2
3
13
10.99
Shipped
16
Thundering Surf Inc.
Coffee City
16
Type 2 Storage Area
Block 3
Block 1
1
Lift Tours
Burlington
9
Pihtiputaan Pyora
Pihtipudas
2
Upton Frisbee
Oslo
10
Just Joggers Limited
Ramsbottom
3
Hoops
Atlanta
11
Keilailu ja Biljardi
Helsinki
4
Go Fishing Ltd
Harrow
12
Surf Lautaveikkoset
Salo
Block 2
Block 4
5
Match Point Tennis
Boston
13
Biljardi ja tennis
Mantsala
6
Fanatical Athletes
Montgomery
14
Paris St Germain
Paris
7
Aerobics
Tikkurila
15
Hoopla Basketball
Egg Harbor
8
Game Set Match
Deatsville
16
Thundering Surf Inc.
Coffee City
17
Tangent…
• The preceding slides should be all you need to
see in order to be convinced that type 1 areas
are a bad place to be putting data.
• The schema area is always a type 1 area.
Should it have data in it?
18
How to Determine “Logical Scatter”?
• You could read the whole database…
• Multiple times…
• (Because every index must be considered)
-or• For each table randomly choose a record.
• For each index of that table find the NEXT record.
• Is it in the same block?
• Lather, Rinse and Repeat.
19
Type 2 Storage Area
Block 3
Block 1
1
Lift Tours
Burlington
9
Pihtiputaan Pyora
Pihtipudas
2
Upton Frisbee
Oslo
10
Just Joggers Limited
Ramsbottom
3
Hoops
Atlanta
11
Keilailu ja Biljardi
Helsinki
4
Go Fishing Ltd
Harrow
12
Surf Lautaveikkoset
Salo
Block 2
Block 4
5
Match Point Tennis
Boston
13
Biljardi ja tennis
Mantsala
6
Fanatical Athletes
Montgomery
14
Paris St Germain
Paris
7
Aerobics
Tikkurila
15
Hoopla Basketball
Egg Harbor
8
Game Set Match
Deatsville
16
Thundering Surf Inc.
Coffee City
Id = 100%
Name = 25%
City 19%
20
Which
Index?
21
4GL Index Selection
compile cust.p xref “cust.xrf”
/* cust.p */
for each customer no-lock:
display custNum.
end.
cust.p
cust.p
cust.p
cust.p
cust.p
cust.p
cust.p
cust.p
cust.p
cust.p
1
1
1
1
1
2
2
3
3
3
COMPILE cust.p
CPINTERNAL ISO8859-1
CPSTREAM ISO8859-1
STRING "Customer" 8 NONE UNTRANSLATABLE
SEARCH sports2000.Customer CustNum WHOLE-INDEX
ACCESS sports2000.Customer CustNum
STRING ">>>>9" 5 NONE TRANSLATABLE FORMAT
STRING "Cust Num" 8 LEFT TRANSLATABLE
STRING "CustNum" 7 NONE UNTRANSLATABLE
STRING "--------" 8 NONE UNTRANSLATABLE
22
Amdahl’s Law
The performance enhancement
possible with a given improvement
is limited by the fraction of the
execution time that the improved
feature is used.
23
Compile Time is Not Enough
•
•
•
•
•
Dynamic Queries
SQL-92 Cost Based Optimizer
Unreached Code
Rarely Run Code
Widespread, but Low Impact Code
24
Execution Time Index Usage
for each _indexStat no-lock:
find _index no-lock where
_index._idx-num = _indexStat._indexStat-id no-error.
if available( _index ) then
do:
find _file no-lock where
recid( _file ) = _index._file-recid no-error.
display
_indexStat._indexStat-id
"*" when ( _file._prime-index = recid( _index ))
"u" when ( _index._unique = true )
_file._file-name when available _file
_index._index-name when available _index
_indexStat._indexStat-read
.
end.
end.
25
Execution Time Index Usage
Id
1
*
2
*
File-Name
Index-Name
_File
_File-Name
Read
39,550,882
U _Field
_File/Field
29
3
U _Field
_Field-Name
2,457,451
4
U _Field
_Field-Position
1,999,506
5
*
U _Index
_File/Index
4,791,744
6
*
U _Index-Field
_Index/Number
7,344,550
7
8
_Index-Field
*
_Field
0
U table1
t1_idx1
6,668,593
U table1
t1_idx2
224,913
U table2
t2_idx1
42,078,065
11
table2
t2_idx2
19,772,967,351
12
table2
t2_idx3
0
9
10
*
26
VST Note
• The default db settings only collect statistics
for the first 50 tables and indexes.
• To fix this:
define variable t as integer
define variable i as integer
for each _file no-lock where
for each _index no-lock: i =
display t i.
no-undo label “Tables”.
no-undo label “Indexes”.
_hidden = no: t = t + 1. end.
i + 1. end.
-tablerangesize 1000
-indexrangesize 3000
27
Case
Study
28
Logical Scatter Case Study
• A process reading approximately 1,000,000
records.
• An initial run time of 2 hours.
– 139 records/sec.
• Un-optimized database.
29
Baseline
4k DB Block
Type 1 Area
Table
Index
Table1
t1_idx1
0%
100%
0.09
t1_idx2
0%
0%
0.09
t2_idx1
69%
99%
0.51
t2_idx2
98%
1%
0.51
t2_idx3
74%
0%
0.51
Table2
•
•
•
•
%Sequential %Idx Used
Density
-B 25,000
Hit Ratio 95%
19,208 IO ops
Run time 2 hours
30
Round 1 – Increase Big B
4k DB Block
Type 1 Area
Table
Index
Table1
t1_idx1
0%
100%
0.09
t1_idx2
0%
0%
0.09
t2_idx1
69%
99%
0.51
t2_idx2
98%
1%
0.51
t2_idx3
74%
0%
0.51
Table2
•
•
•
•
%Sequential %Idx Used
Density
-B 100,000
Hit Ratio 98%
9,816 IO ops
Run time 60 minutes
31
Round 2 – Increase Some More
4k DB Block
Type 1 Area
Table
Index
Table1
t1_idx1
0%
100%
0.09
t1_idx2
0%
0%
0.09
t2_idx1
69%
99%
0.51
t2_idx2
98%
1%
0.51
t2_idx3
74%
0%
0.51
Table2
•
•
•
•
%Sequential %Idx Used
Density
-B 200,000
Hit Ratio 99%
6,416 IO ops
Run time 40 minutes
32
Restructure DB
• Dump & Load
• Convert to 8KB DB Blocks
• Convert to Type 2 Storage Areas
33
Round 3 – Back to Baseline –B
8k DB Block Table
Type 2 Areas Table1
Table2
•
•
•
•
Index
%Sequential %Idx Used
Density
t1_idx1
71% (0)
100%
0.10
t1_idx2
63% (0)
0%
0.10
t2_idx1
85% (69)
99%
1.00
t2_idx2
100% (98)
1%
1.00
t2_idx3
83% (74)
0%
0.99
-B 12,500
Hit Ratio 95%
9,417 IO ops
Run time 55 minutes
34
Round 4 – Bump Big B
8k DB Block Table
Type 2 Areas Table1
Table2
•
•
•
•
Index
%Sequential %Idx Used
Density
t1_idx1
71%
100%
0.10
t1_idx2
63%
0%
0.10
t2_idx1
85%
99%
1.00
t2_idx2
100%
1%
1.00
t2_idx3
83%
0%
0.99
-B 50,000
Hit Ratio 98%
4,746 IO ops
Run time 30 minutes
35
Round 5 – Big B … Again
8k DB Block Table
Type 2 Areas Table1
Table2
•
•
•
•
Index
%Sequential %Idx Used
Density
t1_idx1
71%
100%
0.10
t1_idx2
63%
0%
0.10
t2_idx1
85%
99%
1.00
t2_idx2
100%
1%
1.00
t2_idx3
83%
0%
0.99
-B 100,000
Hit Ratio 99%
3,192 IO ops
Run time 20 minutes
36
Are We Done?
8k DB Block Table
Type 2 Areas Table1
Table2
Index
%Sequential %Idx Used
Density
t1_idx1
71%
100%
0.10
t1_idx2
63%
0%
0.10
t2_idx1
85%
99%
1.00
t2_idx2
100%
1%
1.00
t2_idx3
83%
0%
0.99
• The most used index is not the most
sequential index!
37
Restructure DB
• Dump & Load
• Dump Table 2 using the most used index:
• t2_idx1
• Load Normally
38
Why Not?
• Return on Investment:
– Pickups from improving %SEQ are less than those
from improving Hit Ratio.
– That last 15% is a drop in the bucket compared to
the 6x improvement already gained.
– Expected improvement would be about 2% -- of
20 minutes. Or around 24 seconds.
39
However…
• At low buffer hit ratios (95% or lower):
– Restructuring to favor the most used index results in a
60% improvement in time.
– And the hit ratio improves to 99.75%.
– By eliminating 95% of the disk IO ops (112,247 ->
5196).
• On the other hand… the system in question has
grown again and it may now be worth revisiting.
40
Conclusion
• Type 2 Storage Areas improve “logical scatter”.
• Addressing “logical scatter” can be a powerful
performance improvement technique.
• Addressing “logical scatter” can be an
alternative to increasing –B in environments
where shared memory is constrained.
41
Questions?
42
Questions?
• Should I use USE-INDEX to force a “well
ordered” index?
• Why might scatter grow over time?
• I have two (or more) conflicting, but very
important, needs. What can I do?
43
Thank You!
44
45