Database Monitoring With ProTop!

Download Report

Transcript Database Monitoring With ProTop!

Database Monitoring With
Tom Bascom
President, Greenfield
Technologies
Agenda







2
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
Why Do You Need A Monitor?





3
Baselining
Benchmarking
Interactive troubleshooting
Capacity management
Resource Optimization
Agenda







4
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
Monitoring Alternatives







5
SAR, vmstat, iostat
Glance, TOPAS, Navisphere, Measureware,
PerfMon …
TOP, NMON
PROMON
Fathom
ProMonitor
ProTop!








6
Progress Focused
Interactive, Real-Time
Sample Oriented
Multi-platform
VST Based
4GL Code
Open Source
Free!
Agenda







7
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
What Are VSTs?
 Virtual System





8
Tables
A 4GL View of Progress Data Structures (the
same as those shown in PROMON.)
No Performance Impact (unless you do some
really dumb things!)
Primarily Read-Only
Not Terribly “User Friendly”
Quirky at times…
Some VST Quirks

Updateable:




Table & Index Ranges:




-tablebase, -tablerangesize
-indexbase, -indexrangesize
Table & Index Window can be reset!
Quirky Keys:


9
_startup._spin
Private buffers
APW settings
_myconnection…
_tablestat & _indexstat
User Number/Id VST Confusion…
find _myconnection no-lock.
find _connect no-lock where _connect-usr = _myconn-userid.
display _connect-usr _connect-id _myconn-userid.
find _userio no-lock where _userio-usr = _connect-usr.
display _userio-id _userio-usr.
User-Id _Connect-Id MyConn-UserId
_UserIO-Id
Usr
======= =========== ============= =========== ===========
253
10
254
253
254
253
Table Stats
/** This does NOT work if –tablebase <> 1!!!
find _File no-lock where _File._File-num = p_tbl.
find _TableStat no-lock where _TableStat-id = p_tbl.
display p_tbl _file-num _TableStat-id.
**/
/*** instead, use the following: ***/
find _TableStat no-lock where _TableStat-id = p_tbl.
find _File no-lock where _File._File-num = _TableStat-id.
display p_tbl _file-num _TableStat-id.
11
Index Name
find _IndexStat no-lock where _IndexStat-id = p_idx.
find _Index no-lock where _Index._Idx-num = _IndexStat-id.
find _File where recid( _File ) = _Index._File-recid.
tt_index.idxnote
= _File._File-name + “.” +
_Index._Index-name +
( if _file._prime-index = recid(_index)
then “ P" else “ " ) +
( if _index._unique then "U" else "" )
12
Agenda







13
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
A Monitoring Architecture


VST Based
Multi-Platform






14
UNIX Character
HTML
Windows GUI
Using Publish & Subscribe
More than just a VST Browser!
Customizable!
A Monitoring Architecture
Support Library
Monitored
Database
/* ProTop.p */
Loadable
Module
Loadable
Module
Loadable
Module
Temp Tables
Define
….
Initialize
…
Loop:
Publish...
Wait-For…
End.
15
Logging
Database
(optional)
Output
Module
Output
Module
Agenda







16
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
Customizing And Extending
The Code






17
Events That A Module Handles
Structure Of A Module
Defining the Display
Maintaining State
Adding Help
Making A Module Available
Events That A Module Handles

Mon-Restart



Mon-Init



Empty Temp-Table
Define Display Data Elements
Mon-Update



18
Empty Temp-Table
Remove self from memory
Refresh Data
Calculate intervals, rates and so forth
Update UI Temp-Table with results
Structure
Of
A
Module
{lib/protop.i}
def var support as character no-undo initial “Resources”.
{lib/tt_xstat.i}
procedure mon-restart:
empty temp-table tt_xstat.
delete procedure this-procedure.
end.
procedure mon-init:
empty temp-table tt_xstat.
/* define labels */
end.
procedure mon-update:
/* the real work */
end.
subscribe to “mon-restart” anywhere run-procedure “mon-restart”.
subscribe to “mon-init”
anywhere run-procedure “mon-init”.
subscribe to “mon-update” anywhere run-procedure “mon-update”.
publish “register-disp-type” ( input support ).
19
Defining the Display
ui-define-label(
ui-define-label(
ui-define-label(
ui-define-label(
ui-define-label(
ui-define-label(
support,
1,
8,
"stat-ratio",
" Lock%“
).
20
support,
support,
support,
support,
support,
/*
/*
/*
/*
/*
1,
1,
1,
1,
1,
1,
2,
5,
6,
8,
"xid",
" Id" ).
"xname", "Resource
" ).
"stat1", "
Locks" ).
"stat2", "
Waits" ).
"stat-ratio", " Lock%" ).
display type
variant
order
data element name
label value
*/
*/
*/
*/
*/
Maintaining State
define temp-table tt_xstat no-undo
field xid
as integer
field xvalid
as logical
field xname
as character
field misc1
as character
field misc2
as character
field stat1
as integer extent 5
field stat2
as integer extent 5
field stat3
as integer extent 5
field stat-ratio as decimal
index xid-idx is unique primary xid.
21
Sample, Summary, Rate & Raw
Data
•BaseValue
•LastValue
•ThisValue
•SampleTime
•SummaryTime
SampleRate
= (ThisValue – LastValue) / SampleTime.
SummaryRate = (ThisValue – BaseValue) / SummaryTime.
22
SampleRaw
= (ThisValue – LastValue) / 1.
SummaryRaw
= (ThisValue – BaseValue) / 1.
Updating Data
for each dictdb._Resrc no-lock:
run update_xstat (
input _Resrc-Id,
input _Resrc-name,
input "", input "",
input _Resrc-lock,
input _Resrc-wait,
input 0 ).
end.
ui-det(support, 1, i, 1, "xid",
string(tt_xstat.xid,
">>9")).
ui-det(support, 1, i, 2, "xname",
string(tt_xstat.xname,
"x(20)")).
ui-det(support, 1, i, 5, "stat1",
string((tt_xstat.stat1[x]/z), ">>>>>>>>>9")).
ui-det(support, 1, i, 6, "stat2",
string((tt_xstat.stat2[x]/z), ">>>>>>>>>9")).
ui-det(support, 1, i, 8, "stat-ratio",
string(tt_xstat.stat-ratio, ">>9.99%")).
23
Adding Help






24
Help files are in the “hlp” directory.
File name is value(“hlp/” + support + “.hlp”)
Title the screen.
Provide an overview of the screen. Try to
explain why the metrics are important and how
they are related to other metrics.
Define each label and give some insight into its
meaning.
Provide explanations of any codes that might
appear under a label.
FileIO.hlp
IO Operations to Database Extents
Id:
The extent id number.
Extent Name:
The file name of the extent.
Mode:
The "mode" in which the file is opened. Possible
values are:
BUFIO
The extent is opened for buffered IO.
UNBUFIO The extent is opened for un-buffered IO.
BOTHIO
The extent is opened for both buffered and un-buffered
IO. Variable extents are opened with BOTHIO (there are
two file descriptors unless you're using -directio.)
BlkSz:
The Block size for the extent. This potentially
varies between data, before-image and after-image extents. Values
are expressed in bytes.
25
Making A Module Available

Drop it into the “mon/” directory.


If it is OS specific use the “os/” directory



26
“mon/mymetric.p”
“os/AIX/df.p”
“os/Linux/netstat.p”
Send me a copy so that I can include it in
the base distribution!
Agenda







27
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
28
Basic Capabilities









29
Summary Data
Blocked Clients & Open Transactions
Table & Index Activity
User Activity
Estimating Big B
Latches & Resources
Storage Area Capacity
Balancing IO
Clients & Servers
Summary Data
11:32:52
ProTop xvi -- Progress Database Monitor
Sample
Hit Ratio:
Miss% :
sports [/db/sports]
182:1
195:1
Commits:
149
0.549% 0.512%
Latch Waits:
13
Hit% : 99.45%
99.48%
Log Reads:
76342
80927
OS Reads:
419
Rec Reads:
23789
Tot/Mod Bufs: 60002
195 Sessions: 2057
16
Local:
953
3167
Remote:
956
1
414
Lock Table: 1516
3
Server:
97
23619
LkHWM|OldTrx: 1392
00:00
Other:
51
54
TRX:
26
Blocked:
0
3.2091 3.4264
Area Full:
1 98.60%
Evict Bufs:
Rate
2
Log/Rec:
30
07/05/05
Old/Curr BI:
54
After Image: Disabled
Batch: 1045
BI Clusters
for each _Trans no-lock where _Trans-usrnum <> ?:
if _Trans-counter <> ? and _Trans-counter > 0 then
do:
if oldbi = 0 or _Trans-counter < oldbi then
oldbi = _Trans-counter.
currbi = max( currbi, _Trans-counter ).
end.
end.
find _BuffStatus no-lock.
currbi = _BfStatus-LastCkpNum.
if oldbi = 0 then oldbi = currbi. /* if no TRX is active… */
31
Blocked Sessions
Blocked Sessions
Usr Name
Waiting
Note
--- -------- -------- ---------------------------------24 tom
00:00:32 REC XQH 102 [Order] julia, peter
22 tucker
00:00:02 REC XQH 201 [Cust] astro, tiger
321 julia
32
00:00:00 BKSH:83524928:
Locked Records
for each _Lock no-lock while _Lock-usr <> ?:
if _Lock-recid = _Connect-wait1 then
do:
find _file where _file._file-num = _Lock-table.
bxtbl = _file._file-name.
end.
if _Lock-usr = _Connect-usr then
bxwait = bxwait + “ “ + _Lock-flags.
else
bxque = bxque + " " + _Lock-name.
end.
bxnote = bxtbl + bxwait + bxque.
33
Open Transactions
Open Transactions
Usr Name
TRX Num BI Clstr Start
Trx Stat Duration Wait
---- ----- -------- -------- -------- -------- -------- ----------
34
9 tom
2432897
20 jami
2432896
1024 15:39:05 ACTIVE
-
00:00:01
--
29440
ALLOCATE 00:00:00
--
20115
5 emily
2432898
1024 15:39:06 ACTIVE
00:00:00
--
21952
7 peter
2432899
1024 15:39:06 ACTIVE
00:00:00
--
19040
23 julia
2418661
-
ALLOCATE 00:00:00
--
0
22 astro
2417938
-
ALLOCATE 00:00:00
--
0
Table Activity
Table Statistics
Tbl# Table Name
Create
Read
Update
Delete
---- ---------------- --------- --------- --------- --------4 OrderLine
1
28715
11
1
18 Order
0
2384
1
0
24 POLine
0
848
1
0
23 PurchaseOrder
0
627
40
0
21 Bin
0
216
0
0
18
175
20
20
1
148
3
0
2 Customer
1 Invoice
35
Index Activity
Index Statistics
Idx# Index Name
Create
Read
Split Delete BlkDel
---- --------------- -- ------ ------ ------ ------ -----904 usage
78 journal
14
31597
0
13
0
0
21011
0
0
0
0
7376
0
0
0
0
1995
0
0
0
0
1991
0
0
0
U
0
945
0
0
0
PU
1
783
0
1
0
P
435 keyindex
388 icest
PU
1251 keyindex
1247 warehs
900 stuff
36
User IO Activity
UIO
Usr Name
Flags PID
DB Access OS Rd OS Wr
Hit%
---- ------- ----- ------ --------- ----- ----- ------13 tom
SB
13590
2266
200
1
91.13%
10 jami
SB
13584
190
6
1
97.10%
16 julia
SB
13596
185
6
1
97.03%
17 peter
SB
13598
181
5
1
97.07%
15 emily
SB
13594
177
5
1
97.12%
11 tiger
SB*
13586
166
4
0
97.58%
14 tucker
SB
13592
159
5
1
97.10%
19 granite SB
13602
146
1
0
99.25%
13578
145
4
1
97.16%
7 astro
37
SB
Estimating Big B
Big B GuessTimator
Pct
38
Big B % db Size Hit:1
Miss%
Hit% OS Rd
----- --------- --------- -----
------
------- -----
10%
6000
0.124%
30
3.306%
96.694%
1343
25%
15001
0.311%
48
2.091%
97.909%
849
50%
30001
0.622%
68
1.479%
98.521%
601
100%
60002
1.243%
96
1.046%
98.954%
425 <=
150%
90003
1.865%
117
0.854%
99.146%
347
200%
120004
2.486%
135
0.739%
99.261%
300
400%
240008
4.973%
191
0.523%
99.477%
213
Big B
http://www.peg.com/lists/dba/history/200301/msg00509.html
MissPct = 100 * ( 1 – ( LogRd – OSRd ) / LogRd )).
HitPct
= 100 – MissPct.
OSRd
= LogRd * ( MissPct / 100 ).
m2 = m1 * exp(( b1 / b2 ), 0.5 ).
39
Resource Waits
Resource Waits
Id Resource
Locks
Waits
Lock%
--- -------------------- ---------- ---------- ------10 DB Buf S Lock
2661
0 100.00%
658
0 100.00%
7 DB Buf Read
40
0 100.00%
2 Record Lock
21
0 100.00%
11 DB Buf X Lock
11
0 100.00%
19 TXE Share Lock
11
0 100.00%
3
0 100.00%
2
0 100.00%
1 Shared Memory
0
0
0.00%
3 Schema Lock
0
0
0.00%
6 Record Get
8 DB Buf Write
21 TXE Commit Lock
40
Latch Waits
Latch Waits
Id Latch
Requests
Waits
Lock%
--- -------------------- ---------- ---------- -------
41
28 MTL_BF4
5540
33
99.40%
17 MTL_BHT
4205
106
97.49%
21 MTL_LRU
4154
55
98.68%
10 MTL_LHT
1800
24
98.65%
15 MTL_LKF
1798
0 100.00%
26 MTL_BF2
1218
6
99.48%
27 MTL_BF3
1184
10
99.13%
25 MTL_BF1
1150
10
99.16%
4 MTL_OM
913
4
99.60%
Storage Area Capacity
Area Statistics
A# Area Name
-- -----------68 order_idx
67 order
6 Schema Area
3 BI Area
13 customer
92 After Image
49 order-line
61 inventory
55 discount
57 employee
42
Alloc
------16
256
256
32000
512000
0
32000
128000
1024000
2048000
Var Hi Water
Free %Used
Note
----- -------- ------ ------- ------1998
1927
87 12044% i(3)
14670
14860
66
5805% t(1)
1454
1391
319
543% i(25) *
13070
45056
14
141%
55565
567515
50
111% t(15)
5199
5191
8
100% Busy
2
25164
6838
79% t(1)
2
94897 33105
74% t(1)
0
755885 268114
74% t(1)
0 1442919 605076
70% t(1)
Storage Area Capacity
for each _AreaStatus no-lock, _Area no-lock where
_Area._Area-num = _AreaStatus._AreaStatus-Areanum:
bfree = _AreaStatus-Totblocks - _AreaStatus-Hiwater.
if ( _AreaStatus-Freenum <> ? ) then
bfree = bfree + _AreaStatus-Freenum.
if bfree = ? then bfree = _AreaStatus-totblocks.
used = (( _AreaStatus-totblocks - bfree) /
_AreaStatus-totblocks ) * 100.
end.
43
Storage Area Contents
for each _storageobject no-lock where
_storageobject._area-number = xid and
_storageobject._object-num > 0 and
_storageobject._object-associate > 0:
if _storageobject._object-type = 1 then
so_tbl = so_tbl + 1.
else if _storageobject._object-type = 2 then
so_idx = so_idx + 1.
end.
/* ianum = initial area number…
44
*/
Balancing IO
Database File IO
Id Ext Name
Mode
Blksz
Size Read Wrt Ext
---- ---------- ------- ----- ------- ----- --- --63 s2k_29.d1 F UNBUF 8192 2048000 11828
0
0
64 s2k_29.d2 F UNBUF 8192 2048000 7790
0
0
124 s2k_55.d2 F UNBUF 8192 2048000
432
0
0
125 s2k_55.d3 F UNBUF 8192 2048000
367
8
0
123 s2k_55.d1 F UNBUF 8192 2048000
220
0
0
67 s2k_30.d1 F UNBUF 8192 2048000
106
0
0
57 s2k_26.d1 F UNBUF 8192 1024000
26
2
0
128 s2k_56.d1 F UNBUF 8192 2048000
19
1
0
135 s2k_57.d6 F UNBUF 8192 2048000
12
0
0
140 s2k_58.d2 F UNBUF 8192 1024000
11
1
0
121 s2k_54.d1 F UNBUF 8192 256000
7
0
0
139 s2k_58.d1 F UNBUF 8192 1024000
6
0
0
134 s2k_57.d5 F UNBUF 8192 2048000
5
0
0
69 s2k_31.d1 F UNBUF 8192 128000
4
0
0
73 s2k_33.d1 F UNBUF 8192 128000
3
0
0
3 s2k.b2
V UNBUF 16384
0
0
0
0
45
Servers and Clients
Servers
Srv Type
Port Con Max MRecv MSent RRecv RSent QSent Slice
--- ----- ----- --- --- ----- ----- ----- ----- ----- ----1 Login
7150
0
1
0
0
0
0
0
0
2 Auto
1026
10
55
0
0
0
0
0
0
3 Auto
1027
10
55
23
13
0
6
10
86
Server IO
Srv Type
Port Con Max DB Access OS Rd
OS Wr
Hit%
--- ----- ---- --- --- --------- ----- ------ -------
46
19 Auto
1043
10
55
5041
2
0
99.96%
20 Auto
1044
10
55
1348
1
0
99.96%
18 Auto
1042
10
55
157
1
0
99.51%
16 Auto
1040
10
55
42
1
0
98.70%
Agenda







47
Why do you need a monitor?
Monitoring Alternatives
What Are VSTs?
A Monitoring Architecture
Customizing And Extending The Code
Basic Capabilities
Advanced Features
Drill Down
User Details
Usr#:
23
Name: tom
Transaction:
PID:
18570
Device:
/dev/pts/3
Jul 7 15:20:36 2005 ACTIVE 00:00:45 REC 5892
Blocked On:
REC
XQH 5892 [Customer] peter
User 23's Other Sessions
Usr Name
Flags PID
DB Access
OS Rd
OS Wr
Hit%
---- ------- ----- ------ ---------- ------- ------ ------23 tom
48
S *
18570
9
2
0
81.61%
0 tom
O
18017
0
0
0
0.00%
22 tom
S
18542
8534
134
15
98.43%
24 tom
S
18576
3964
64
31
98.38%
ProTop Alerts
49
Alerts & Alarms
# $PROTOP/etc/alert.cfg
#
# Metric
Type
? Target Message
Action
# ========= ==== == ====== =========== ====================
50
LogRd
num
> 100000
"&1 &2 &3" alert-log
OSRd
num
>
500
"&1 &2 &3" alert-log
BufFlsh
num
>
0
"&1 &2 &3" alert-log,alert-mail
Trx
num
>
200
"&1 &2 &3" alert-log,alert-mail
LatchTMO
num
>
200
"&1 &2 &3" alert-log,alert-mail
ResrcWt
num
>
200
"&1 &2 &3" alert-log,alert-mail
Summary








51
Reasons to monitor.
Some tools that are available for monitoring.
How Progress VSTs work.
An architecture for monitoring.
How to modify and extend ProTop.
What ProTop can do for you “out of the box”.
What is “under the covers” of ProTop.
How to use VSTs more effectively.
Questions
52
Thank you for
your time!
[email protected]
http://www.greenfieldtech.com
53