332_debghb_v2 - PUG Challenge Americas



PUG Challenge Americas 2013 – Westford, MA

And how it may help in troubleshooting certain DB problems

Presented by: Gus Bjorklund & Dan Foreman


Gus Bjorklund

• Progress Wizard

Dan Foreman

• Progress User since 1984
• Author of several Progress-related publications
• News Flash
  – A new publication titled promon debghb, which is a superset of this presentation, will be available shortly
• Author of several cool and useful Progress DBA tools
  – ProMonitor & ProCheck & LockMon
  – Pro Dump&Load
  – Balanced Benchmark
• Basketball Fanatic…which sometimes leads to unexpected trips to the ER


Brief History of the debghb option in promon

• Added to promon in V6.3 by Gus
• Purpose: the shared memory architecture introduced in V6.3 was quite a bit different, and a way to monitor shared memory activity at a detailed level was needed
• The debghb option was not a formally endorsed enhancement; Gus wrote it in his spare time
• deb = DEBug
• ghb = Gus's initials…the middle initial is a Top Secret fanatically guarded by the Finnish government

Warnings

• The debghb option is not documented by Progress
• DO NOT call or email Gus or PTS for help in using this option
• Many of the screens and/or metrics have no value to a DBA
• The view of the data is not transactionally consistent, sometimes even on the same screen; example to follow
• Some of the data is not accurate (data overflow, rounding errors, etc.)
• Some of the screens are broken (they don't display any data)
• The debghb option can be altered, removed, or hidden by Progress at any time

Warnings #2

• DB activities in shared memory can be slowed down if certain options are enabled

05/08/13        OpenEdge Release 10 Monitor (R&D)        19:10:44
                      Adjust Latch Options

1. Spins before timeout: 24000
2. Enable latch activity data collection
3. Enable latch timing data collection
4. Initial latch sleep time: 10 milliseconds
5. Maximum latch sleep time: 5000 milliseconds
6. Record Free Chain Search Depth Factor: 5

How do I access debghb?

• Start promon
• Enter: R&D (also works in lower case starting in V10)
• Enter: debghb
• You have now entered the debghb "zone"
• Two main differences in the world of debghb:
  – Extensions to some existing R&D screens
  – Enables access to the "This menu is not here Menu"
    • Enter: "6" even though there is no visible option 6
    • See next slide

This menu is not here Menu

06/06/13        OpenEdge Release 10 Monitor (R&D)        15:23:01
                    This menu is not here Menu

 1. Cache Entries
 2. Hash Chain
 3. Page Writer Queue
 4. Lru Chains
 5. Locked Buffers
 6. Buffer Locks
 7. Buffer Use Counts
 8. Resource Queues
 9. TXE Lock Activity
10. Adjust TXE Options
11. Latch Counts
12. Latch Times
13. I/O Wait Time by Type
14. I/O by File
15. Buffer Lock Queue
16. Semaphores
17. Shutdown
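The access steps and the hidden menu can be captured as the list of entries typed into promon. This is a minimal sketch with hypothetical names (`NHM_OPTIONS`, `nhm_keystrokes`, `as_input_text`); it assumes each entry is typed followed by Enter, and it does not cover leaving promon or whether a given promon version accepts redirected input.

```python
# Hypothetical helper: the entries that reach a "This menu is not here Menu"
# (NHM) screen, following the steps above. Assumes each entry is followed by
# Enter; how to exit promon is deliberately not covered.
NHM_OPTIONS = {
    1: "Cache Entries", 2: "Hash Chain", 3: "Page Writer Queue",
    4: "Lru Chains", 5: "Locked Buffers", 6: "Buffer Locks",
    7: "Buffer Use Counts", 8: "Resource Queues", 9: "TXE Lock Activity",
    10: "Adjust TXE Options", 11: "Latch Counts", 12: "Latch Times",
    13: "I/O Wait Time by Type", 14: "I/O by File", 15: "Buffer Lock Queue",
    16: "Semaphores", 17: "Shutdown",
}


def nhm_keystrokes(option: int) -> list[str]:
    """Return the entries to type, after starting promon, to reach an NHM screen."""
    if option not in NHM_OPTIONS:
        raise ValueError(f"unknown NHM option: {option}")
    # R&D menu -> debghb -> invisible option 6 -> requested NHM screen
    return ["R&D", "debghb", "6", str(option)]


def as_input_text(option: int) -> str:
    """Join the entries with newlines, e.g. to save them in an input file."""
    return "\n".join(nhm_keystrokes(option)) + "\n"


if __name__ == "__main__":
    print(as_input_text(8), end="")  # Resource Queues
```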


Operating Hints

• Allow at least 40-45 lines of screen data
• Allow at least 120-140 columns of screen width
• Zero out the stats ("z") to get a clean starting place
  – This 'zeroing' does not wipe out the actual shared memory counters; it only affects the current promon session
• Update the stats periodically ("u") to get snapshots
• All the above can be scripted
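A toy model may make the zeroing hint concrete: "z" behaves as if the promon session stores a private baseline and subtracts it from the shared counters, which themselves never change. The class `SessionView` is hypothetical and only illustrates that idea; it is not how promon is implemented.

```python
class SessionView:
    """Toy model of promon's 'z' (zero) and 'u' (update) keys.

    `shared` stands in for the shared memory counters. Zeroing only changes
    this session's private baseline; the shared counters are never modified.
    """

    def __init__(self, shared: dict[str, int]):
        self.shared = shared
        self.baseline: dict[str, int] = {}  # in this model: no baseline until 'z'

    def zero(self) -> None:
        """'z': remember the current counter values as this session's baseline."""
        self.baseline = dict(self.shared)

    def update(self) -> dict[str, int]:
        """'u': return the activity since the last zero (one snapshot)."""
        return {name: value - self.baseline.get(name, 0)
                for name, value in self.shared.items()}


if __name__ == "__main__":
    counters = {"DB Buf Read": 1000}
    view = SessionView(counters)
    view.zero()
    counters["DB Buf Read"] += 250  # activity by other processes
    print(view.update())            # {'DB Buf Read': 250}
    print(counters)                 # {'DB Buf Read': 1250}
```

A second promon session that never pressed "z" would still see the full counter values, which is why zeroing in one session does not disturb anyone else.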


Operating Hints - User# -1

• Usecnt (the Usect column) = # of concurrent processes accessing the block
• When the BLQ was first examined, there were 5 Clients accessing the same DBKEY
• But before all 5 could be displayed:
  – One Client dropped off, i.e. released the Buffer Lock, before it could be displayed
  – Another one of the 5 is only partially displayed, i.e. the -1 User#

02/06/13        Status: Buffer Lock Queue        00:37:21

User     DBKEY  Area  T  Status  Type   Usect
  -1  70658368    34  I  LOCKED  SHARE      5
 746  70658368    34  I  LOCKED  SHARE      3
 762  70658368    34  I  LOCKED  SHARE      3
 826  70658368    34  I  LOCKED  SHARE      3
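Because BLQ rows are read while other processes keep changing them, a script that parses saved snapshots may want to flag DBKEYs whose rows cannot all be true at the same moment. A minimal sketch, assuming the rows were already parsed into `(user, dbkey, usect)` tuples (the function name and input shape are hypothetical), using the rows from the screen above:

```python
from collections import defaultdict


def inconsistent_dbkeys(rows):
    """Return the DBKEYs whose BLQ rows look torn (not transactionally consistent).

    rows: iterable of (user, dbkey, usect) tuples from one screen refresh.
    A DBKEY is flagged when a row shows User# -1 (partially displayed entry),
    when its rows disagree on Usect, or when Usect differs from the row count.
    """
    by_key = defaultdict(list)
    for user, dbkey, usect in rows:
        by_key[dbkey].append((user, usect))

    flagged = set()
    for dbkey, entries in by_key.items():
        usects = {usect for _, usect in entries}
        partial = any(user == -1 for user, _ in entries)
        if partial or len(usects) > 1 or next(iter(usects)) != len(entries):
            flagged.add(dbkey)
    return flagged


if __name__ == "__main__":
    snapshot = [(-1, 70658368, 5), (746, 70658368, 3),
                (762, 70658368, 3), (826, 70658368, 3)]
    print(inconsistent_dbkeys(snapshot))  # {70658368}
```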


Useful Screens - Checkpoints

• Extensions to the ‘normal’ Checkpoint screen
• Columns of interest:
  – Duration: the amount of time required to complete a Checkpoint; the entire Database is transactionally frozen during this time
    • _CheckPoint._CheckPoint-Duration (V10.2B SP5)
  – Sync Time: a subset of the ‘Duration’ column; the amount of time required to execute the fdatasync() system call
    • _CheckPoint._CheckPoint.Synctime (V10.2B SP5)
• See http://www.makelinux.net/alp/060 for an excellent description of fdatasync (don't confuse it with fsync)
• Sample data on the next slide
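To get a feel for what the Sync Time column measures, one can time a single sync call on an ordinary file. The sketch below uses Python's `os.fdatasync`, which wraps the system call of the same name: it flushes the file's data plus only the metadata needed to read that data back, whereas `os.fsync` also flushes metadata such as timestamps. It falls back to `os.fsync` where `fdatasync` is unavailable. This only illustrates the system call; it is not how the database measures the `_CheckPoint` values, and the 1 MiB size is an arbitrary assumption.

```python
import os
import tempfile
import time


def time_sync(nbytes: int = 1 << 20) -> float:
    """Write nbytes of dirty data to a temp file and time only the sync call."""
    sync = getattr(os, "fdatasync", os.fsync)  # fdatasync is not on every platform
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * nbytes)  # data now sits in the OS page cache
        start = time.perf_counter()
        sync(fd)                      # block until the data reaches stable storage
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    print(f"sync of 1 MiB took {time_sync():.4f} s")
```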

Useful Screens - Checkpoints

• The ‘Duration’ of the Checkpoints (i.e. the total freeze time) is very high for most of the CPs displayed
• A ‘Duration’ of less than 1 second is a good goal
• The 10 sec ‘Duration’ is approximately 1/3 of the CP ‘Len’
• In other words, a CP is occurring approximately every 30 seconds, and for 10 seconds of that period NO transaction activity can take place

                          -------- Database Writes --------
Ckpt No.  Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes  Duration  Sync Time
    2499  00:56:04  246   265  14097  14586  3003     62        0      2.12       0.42
    2498  00:52:16  212   228  33792  35191  2915    199        0      6.49       4.85
    2497  00:51:40   33    36  34536  37114   229     85        0      7.15       5.96
    2496  00:51:09   29    31  34385  36950   164    151        0      7.22       5.89
    2495  00:50:34   34    35  37992  40285   408    341        0      8.33       7.15
    2494  00:49:59   34    35  40933  43429   132    363        0     10.20       9.05
    2493  00:49:29   30    30  41377  43427   690    281        0      5.39       3.84
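The arithmetic behind these bullets can be checked against the sample rows. A small sketch (names hypothetical; values copied from the sample above, and the 1-second goal comes from the slide) that computes, for each checkpoint, Duration as a fraction of Len, Sync Time as a share of Duration, and whether the goal was missed:

```python
# (ckpt_no, len_secs, duration_secs, sync_secs) from the sample screen above
SAMPLE = [
    (2499, 246, 2.12, 0.42),
    (2498, 212, 6.49, 4.85),
    (2497, 33, 7.15, 5.96),
    (2496, 29, 7.22, 5.89),
    (2495, 34, 8.33, 7.15),
    (2494, 34, 10.20, 9.05),
    (2493, 30, 5.39, 3.84),
]
GOAL_SECS = 1.0  # "A 'Duration' of less than 1 second is a good goal"


def review(rows, goal=GOAL_SECS):
    """Yield (ckpt_no, frozen_fraction, sync_share, over_goal) per checkpoint."""
    for ckpt, length, duration, sync in rows:
        yield (
            ckpt,
            round(duration / length, 2),  # share of the CP Len spent frozen
            round(sync / duration, 2),    # share of the freeze spent in fdatasync()
            duration > goal,              # True when the 1-second goal is missed
        )


if __name__ == "__main__":
    for row in review(SAMPLE):
        print(row)
```

For checkpoint 2494 this gives a frozen fraction of 0.3 (10.20 s out of 34 s), and most of each freeze is spent in the sync call.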


Useful Screens – Resource Queues

• NHM (Not Here Menu) - Option #8
• Do not confuse Resources with Latches
• Link to Banville's presentation
• In general the busiest locks will be:
  – DB Buf S Lock
  – DB Buf X Lock
  – Record Lock
• Waits that can be problematic:
  – DB Buf I Lock (I = Intent, but these are for Index blocks)
• Sample on the next slide

Useful Screens – Resource Queues

01/31/13        Activity: Resource Queues        00:31:57
01/31/13 00:26 to 01/31/13 00:31 (5 min)

                      --- Requests ---   ------- Waits -------
Queue                    Total    /Sec    Total   /Sec    Pct
Record Lock            1007903    3360        8      0   0.00
Trans Commit              1631       5        0      0   0.00
DB Buf I Lock          1006724    3356   139476    465   0.14
Record Get              724869    2416        0      0   0.00
DB Buf Read             305596    1019        0      0   0.00
DB Buf Write             62727     209        0      0   0.00
DB Buf S Lock         33370848  111236   159591    532   0.00
DB Buf X Lock          1092894    3643   157022    523   0.14
DB Buf S Lock LRU2    20934886   69783        0      0   0.00
DB Buf X Lock LRU2       11088      37        0      0   0.00
DB Buf Write LRU2         3367      11        0      0   0.00
BI Buf Read               4788      16        0      0   0.00
BI Buf Write             16075      54     1096      4   0.07
TXE Share Lock         1148821    3829        0      0   0.00
TXE Update Lock          10347      34      282      1   0.03
TXE Commit Lock          63540     212     1927      6   0.03
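In this sample the /Sec figures match Total / 300 (the 5-minute interval) rounded to the nearest integer, and the Pct column appears to be Waits / Requests expressed as a fraction rather than a percentage: for DB Buf I Lock, 139476 / 1006724 ≈ 0.14, i.e. roughly 14% of requests had to wait. A minimal sketch under that reading (names hypothetical; a subset of rows copied from the sample):

```python
INTERVAL_SECS = 300  # "(5 min)" sample interval from the screen header

# queue name -> (requests_total, waits_total), copied from the sample above
SAMPLE = {
    "Record Lock": (1007903, 8),
    "DB Buf I Lock": (1006724, 139476),
    "DB Buf S Lock": (33370848, 159591),
    "DB Buf X Lock": (1092894, 157022),
    "BI Buf Write": (16075, 1096),
    "TXE Commit Lock": (63540, 1927),
}


def queue_stats(requests, waits, interval=INTERVAL_SECS):
    """Return (requests/sec, waits/sec, waits per request) for one queue."""
    ratio = round(waits / requests, 2) if requests else 0.0
    return round(requests / interval), round(waits / interval), ratio


def worst_queues(sample, top=3):
    """Rank queues by the fraction of requests that had to wait (assumes requests > 0)."""
    return sorted(sample, key=lambda q: sample[q][1] / sample[q][0], reverse=True)[:top]


if __name__ == "__main__":
    for name in worst_queues(SAMPLE):
        print(name, queue_stats(*SAMPLE[name]))
```

Ranking by the wait fraction, rather than by raw wait counts, puts DB Buf X Lock and DB Buf I Lock ahead of DB Buf S Lock even though S Lock has the most waits in absolute terms.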


Useful Screens – Latch Counts

• NHM (Not Here Menu) - Option #11
• The R&D Blocked Clients screen doesn't show Latch contention, so debghb is the only place in promon where detailed Latch activity is visible
• Definition of Naps: when -spin is ‘used up’ by a Progress Client, the process Naps (i.e. does no useful work) for a while and then tries again
• General Principle: Napping is bad
• Samples on the next few slides
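The spin-then-nap behaviour described above can be modelled in a few lines. This is a conceptual sketch, not Progress's latch code: a Python `threading.Lock` stands in for a latch, each failed non-blocking `acquire` counts as one spin, and the nap starts at 10 ms and is capped at 5000 ms (the "Initial latch sleep time" and "Maximum latch sleep time" values from the Adjust Latch Options screen). Doubling the nap between attempts is an assumption made for illustration.

```python
import threading
import time

INITIAL_NAP_SECS = 0.010  # "Initial latch sleep time: 10 milliseconds"
MAX_NAP_SECS = 5.0        # "Maximum latch sleep time: 5000 milliseconds"


def acquire_spin_then_nap(lock: threading.Lock, spin: int) -> int:
    """Acquire `lock` by spinning, napping whenever the spin count is used up.

    Conceptual model only. Returns the number of naps taken.
    """
    naps = 0
    nap = INITIAL_NAP_SECS
    while True:
        # Spin phase: retry a non-blocking acquire up to `spin` times.
        for _ in range(spin):
            if lock.acquire(blocking=False):
                return naps
        # Spin count used up: nap (no useful work), then try again.
        naps += 1
        time.sleep(nap)
        nap = min(nap * 2, MAX_NAP_SECS)  # assumed doubling back-off


if __name__ == "__main__":
    latch = threading.Lock()
    latch.acquire()                               # another "process" holds the latch
    threading.Timer(0.05, latch.release).start()  # ...and releases it after 50 ms
    print("naps:", acquire_spin_then_nap(latch, spin=100))
    latch.release()
```

An uncontended latch is obtained on the first spin with zero naps; a latch held for 50 ms by someone else forces at least one nap, which is exactly the "no useful work" time the slide warns about.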


Latch Counts – OM Latch

• OM (Object Cache) Latch activity can be totally eliminated by setting the -omsize parameter equal to or greater than the number of _StorageObject records

04/24/13        Activity: Latch Counts        00:59:28
04/24/13 00:54 to 04/24/13 00:59 (5 min 1 sec)

           ------ Locks ------  --- Busy ---  Naps  ------- Spins -------  - Nap Max -
    Owner     Total      /Sec   /Sec   Pct   /Sec     /Sec  /Lock   /Busy  Total   HWM
MTX  -      1563935      5195     86   1.6     45  6461297   1243   74724    259   300
USR  -      1178290      3914      3   0.0      0        0      0       0      0     0
OM   -     21523860     71507   7322  10.2    139  9144588    127    1248      3    80
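The derived columns can be recomputed from the raw figures, which is a handy sanity check when post-processing saved screens. In this sample the /Sec, Pct, and /Lock columns appear to be truncated rather than rounded (MTX: 1563935 / 301 s = 5195.8, shown as 5195). A minimal sketch under that assumption (function names hypothetical); note that the Busy /Sec input is itself already truncated, so a recomputed Pct can be off by one digit:

```python
def latch_rates(locks_total, busy_per_sec, spins_per_sec, interval_secs):
    """Recompute Locks /Sec, Busy Pct, and Spins /Lock, truncating like the sample."""
    locks_per_sec = locks_total // interval_secs
    if locks_per_sec == 0:
        return 0, 0.0, 0
    busy_pct = (busy_per_sec * 1000 // locks_per_sec) / 10  # percent, one decimal, truncated
    spins_per_lock = spins_per_sec // locks_per_sec
    return locks_per_sec, busy_pct, spins_per_lock


def omsize_covers(omsize, storage_object_count):
    """Per the bullet above: no OM latch activity once -omsize >= #_StorageObject records."""
    return omsize >= storage_object_count


if __name__ == "__main__":
    # OM row of the 04/24/13 sample, interval "5 min 1 sec" = 301 s
    print(latch_rates(21523860, 7322, 9144588, 301))  # (71507, 10.2, 127)
```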


Latch Counts – USR Latch

• The small contention on the USR (DB Connection Table) Latch is because Statement Caching is enabled

04/25/13        Activity: Latch Counts        00:33:17
04/25/13 00:28 to 04/25/13 00:33 (5 min 0 sec)

           ------ Locks ------  --- Busy ---  Naps  ------- Spins -------  - Nap Max -
    Owner     Total      /Sec   /Sec   Pct   /Sec     /Sec  /Lock   /Busy  Total   HWM
MTX  -      2402181      8007     17   0.2     45  2059022    257  120481    133   300
USR  -      1517252      5057      8   0.1      0        0      0       0      0     0
OM   4343  27667792     92225   1962   2.1     32  5781905     62    2946      0    80
BIB  -      2170630      7235      0   0.0      1    49241      6  165982     11   300
SCH  -          447         1      0   0.0      0        0      0       0      0     0
LKP  -        90680       302      0   0.0      0        6      0     923      0    10
GST  -          195         0      0   0.0      0      200    307       0      1    10
TXT  -       703633      2345      0   0.0      0      811      0   10149      0    10
SEQ  -       505781      1685      0   0.0      0    18283     10   66889      4    10
AIB  -      1947241      6490      0   0.0      0     4957      0   43745      1    20
TXQ  -      3304146     11013      5   0.0      1   116828     10   20788      0    20
EC   -            0         0      0   0.0      0        0      0       0      0     0
LKF  -      1834458      6114      1   0.0      0    16091      2   15673      7    10
BFP  -            0         0      0   0.0      0        0      0       0      0     0
BHT  -     63583335    211944     40   0.0      9  1511922      7   37429     13   300
PWQ  -          491         1      0   0.2      0       23     14    7184      0     0
CPQ  -       535397      1784      0   0.0      0    23253     13   47135      0   160
LRU  -       812009      2706      0   0.0      2   129408     47  160424      0    80
LRU  -            0         0      0   0.0      0        0      0       0      0     0
BUF  -     39136969    130456     13   0.0      1   191806      1   13868      3    40

Latch Counts – LRU Chain

• The total number of Locks for LRU is the second highest of all the Latches shown (BHT, the Buffer Hash Table, is #1)
• The # of Naps per Second is the highest of all latches (Zero is the goal)

01/31/13        Activity: Latch Counts        00:05:38
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

           ------ Locks ------  --- Busy ---   Naps
    Owner     Total      /Sec   /Sec   Pct    /Sec
MTX  -      1654034      5513      0   0.0      15
OM   -      8216844     27389      0   0.0       1
BHT  -     62371320    207904      0   0.0       3
CPQ  -       197126       657      0   0.0       0
LRU  830   40395944    134653      0   0.0    1402
LRU  -           36         0      0   0.0       0
BUF  -     32676880    108922      0   0.0       0
BUF  -     39818994    132729      0   0.0     529
BUF  -     31278094    104260      0   0.0       8
BUF  -     33342130    111140      0   0.0       3
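Both bullets can be checked mechanically once the rows are captured. A small sketch (names hypothetical) that ranks the latches in the sample above by total Locks and picks the one with the most Naps per second:

```python
# (latch, owner, locks_total, naps_per_sec) from the 01/31/13 sample above;
# None stands for an Owner shown as "-".
ROWS = [
    ("MTX", None, 1654034, 15),
    ("OM", None, 8216844, 1),
    ("BHT", None, 62371320, 3),
    ("CPQ", None, 197126, 0),
    ("LRU", 830, 40395944, 1402),
    ("LRU", None, 36, 0),
    ("BUF", None, 32676880, 0),
    ("BUF", None, 39818994, 529),
    ("BUF", None, 31278094, 8),
    ("BUF", None, 33342130, 3),
]


def rank_by_locks(rows):
    """Latch names ordered by total lock requests, busiest first."""
    return [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]


def most_naps(rows):
    """Name of the latch with the highest Naps /Sec (zero is the goal)."""
    return max(rows, key=lambda r: r[3])[0]


if __name__ == "__main__":
    print(rank_by_locks(ROWS)[:2])  # ['BHT', 'LRU']
    print(most_naps(ROWS))          # LRU
```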


Latch Counts – LRU Chain

• The # of locks on the second LRU (Alternate Buffer Cache) is nil because all the ABC Objects fit completely in the amount of -B2 memory allocated

01/31/13        Activity: Latch Counts        00:05:38
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

           ------ Locks ------  --- Busy ---   Naps
    Owner     Total      /Sec   /Sec   Pct    /Sec
BHT  -     62371320    207904      0   0.0       3
CPQ  -       197126       657      0   0.0       0
LRU  830   40395944    134653      0   0.0    1402
LRU  -           36         0      0   0.0       0
BUF  -     32676880    108922      0   0.0       0
BUF  -     39818994    132729      0   0.0     529

Latch Counts – LRU Chain

• ‘Owner’ column: if the User# doesn't change (in value or frequency), that can be a problem indicator, because Latches should be held for only a fraction of a second

01/31/13        Activity: Latch Counts        00:05:38
01/31/13 00:00 to 01/31/13 00:05 (5 min 0 sec)

           ------ Locks ------  --- Busy ---   Naps
    Owner     Total      /Sec   /Sec   Pct    /Sec
BHT  -     62371320    207904      0   0.0       3
CPQ  -       197126       657      0   0.0       0
LRU  830   40395944    134653      0   0.0    1402
LRU  -           36         0      0   0.0       0
BUF  -     32676880    108922      0   0.0       0
BUF  -     39818994    132729      0   0.0     529
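A single screen cannot show whether an Owner stays put, so this check needs several "u" refreshes. A hypothetical sketch, assuming each refresh has been parsed into a dict mapping a latch label to its Owner (None for "-"); labels such as `LRU#1` are invented here to tell the two LRU rows apart, and only the "same value" half of the slide's rule is covered, not frequency:

```python
def sticky_owners(snapshots, min_snapshots=3):
    """Return latch labels held by the same user in every snapshot.

    snapshots: list of {latch_label: owner_user_or_None}, one per refresh.
    Latches should be held for only a fraction of a second, so the same owner
    on every refresh is worth a closer look (it can still be coincidence).
    """
    if len(snapshots) < min_snapshots:
        return set()  # too few refreshes to say anything
    first = snapshots[0]
    return {
        label
        for label, owner in first.items()
        if owner is not None and all(s.get(label) == owner for s in snapshots[1:])
    }


if __name__ == "__main__":
    refreshes = [
        {"LRU#1": 830, "BHT": None},
        {"LRU#1": 830, "BHT": 12},
        {"LRU#1": 830, "BHT": None},
    ]
    print(sticky_owners(refreshes))  # {'LRU#1'}
```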


Using Latch Counts to set -spin

• Short answer: Forget It!
• If it were that easy, Progress would have done it already
• Past attempts have not been successful
• Also, the optimal value of -spin is not going to be the same for each Latch
• General guidelines:
  – Greater than 1000
  – Less than 100000
  – Current Default: 6000 * (# of CPU Cores)
    • Default not advised if you have more than 16 Cores
  – Dan's (Patent Pending) Formula: (DBA-Birthday-Year *  )
  – Gus's formula: 5000
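Putting the default and the guidelines side by side: 6000 × 16 = 96,000 is still under 100,000, while 6000 × 17 = 102,000 is not, which may be one way to read the "16 Cores" caveat. A quick sketch (names hypothetical; the constants come from the bullets above). It only compares numbers and does not tune anything; the short answer above still stands.

```python
DEFAULT_SPIN_PER_CORE = 6000  # "Current Default: 6000 * (# of CPU Cores)"
GUIDELINE_MIN = 1000          # "Greater than 1000"
GUIDELINE_MAX = 100000        # "Less than 100000"


def default_spin(cores: int) -> int:
    """Default -spin for a machine with `cores` CPU cores."""
    return DEFAULT_SPIN_PER_CORE * cores


def within_guidelines(spin: int) -> bool:
    """True when a -spin value falls inside the general guidelines."""
    return GUIDELINE_MIN < spin < GUIDELINE_MAX


def max_cores_for_default() -> int:
    """Largest core count whose default -spin still meets the guidelines."""
    cores = 1
    while within_guidelines(default_spin(cores + 1)):
        cores += 1
    return cores


if __name__ == "__main__":
    print(default_spin(16), within_guidelines(default_spin(16)))  # 96000 True
    print(default_spin(17), within_guidelines(default_spin(17)))  # 102000 False
```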


Useful Screens – Buffer Lock Queue

• NHM (Not Here Menu) - Option #15
• The ‘normal’ R&D Blocked Clients screen does not show the Area that the DBKEY belongs to
• The Buffer Lock Queue (BLQ) screen shows the Area as well as the Block Type
• Examples on the next two slides

R&D Blocked Clients

• The R&D Blocked Clients screen doesn't show enough information to identify the Object involved in this contention storm for DBKEY 65987456
• There were 29 Clients all blocked on the same DBKEY

01/31/13        Status: Blocked Clients        00:26:41

Usr  Name     Type      Wait  Wait Info  Trans id   Login time
730  _AUTO-B  SELF/ABL  BKSH  65987456   601383708  01/30/13 23:22
735  _AUTO-B  SELF/ABL  BKSH  65987456   601383773  01/30/13 23:23
743  _AUTO-B  SELF/ABL  BKSH  65987456   601383921  01/30/13 23:22
747  _AUTO-B  SELF/ABL  BKSH  65987456   601384104  01/30/13 23:22
749  _AUTO-B  SELF/ABL  BKSH  65987456   601383895  01/30/13 23:23
755  _AUTO-B  SELF/ABL  BKSH  65987456   601384175  01/30/13 23:23
769  _AUTO-B  SELF/ABL  BKSH  65987456   601384161  01/30/13 23:22

Buffer Lock Queue

• IF there is a matching DBKEY on the BLQ screen, we can get the Area# and the Block Type (I = Index)
• There were 29 processes on the Blocked Clients screen with this DBKEY and only 4 on the BLQ screen with the same DBKEY

01/31/13        Status: Buffer Lock Queue        00:26:41

User     DBKEY  Area  T  Status  Type   Usect
  -1  65987456    34  I  LOCKED  SHARE      4
 722  65987456    34  I  LOCKED  SHARE      4
 772  65987456    34  I  LOCKED  SHARE      4
 856  65987456    34  I  LOCKED  SHARE      4
 859  65987456    34  I  LOCKED  SHARE      4
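The lookup on these two slides (find the blocked DBKEY on the BLQ screen to learn its Area and Block Type) can be scripted once both screens have been captured. A minimal sketch with hypothetical names, using the rows shown above (only 7 of the 29 blocked clients appear in the capture). It keys on DBKEY alone, so if the same DBKEY showed up in more than one Area on the BLQ screen, the last row would win and the case would need a manual look.

```python
# (usr, dbkey) from the Blocked Clients sample (Wait Info column)
BLOCKED = [(730, 65987456), (735, 65987456), (743, 65987456), (747, 65987456),
           (749, 65987456), (755, 65987456), (769, 65987456)]

# (user, dbkey, area, block_type) from the Buffer Lock Queue sample
BLQ = [(-1, 65987456, 34, "I"), (722, 65987456, 34, "I"), (772, 65987456, 34, "I"),
       (856, 65987456, 34, "I"), (859, 65987456, 34, "I")]


def annotate_blocked(blocked, blq):
    """Attach (area, block_type) from the BLQ screen to each blocked client.

    Clients whose DBKEY is not on the BLQ screen get (None, None).
    """
    where = {dbkey: (area, btype) for _, dbkey, area, btype in blq}
    return [(usr, dbkey, *where.get(dbkey, (None, None))) for usr, dbkey in blocked]


if __name__ == "__main__":
    for row in annotate_blocked(BLOCKED, BLQ):
        print(row)  # e.g. (730, 65987456, 34, 'I')
```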


Thank You!

Questions?

• Gus: [email protected]
• Dan: [email protected]
