Caché system-wide metrics - InterSystems Symposium 2013
Caché Performance Troubleshooting, Part II: The System
Vik Nagjee
Product Manager, Kernel Technologies
System Performance: Limiting Factors
System-level metrics
Caché system-wide metrics
Significance: Caché system-wide metrics
What are your users experiencing?
How busy is your database?
How well is your application using database cache?
How well is your disk system responding?
Collecting: system-level metrics
• OpenVMS: PERFDAT, T4, MONITOR
• UNIX/Linux: sar | glance | nmon, iostat | vmstat, top | topas
• Windows: Resource and Performance Monitor, logman, Process Explorer
Collecting: system-wide Caché metrics
Collecting Caché metrics: GLOSTAT
• %SYS>DO ^GLOSTAT
Collecting Caché metrics: ^pButtons
• %SYS>DO ^pButtons
• Installed in %SYS since 2008.2, but the latest version
(currently 1.15c) is available at
ftp://ftp.intersystems.com/pub/performance/
• Can be automated via TASKMGR
• Low overhead – logging data that’s already
available.
• Documented in the Caché Monitoring Guide
The performance “button” report (^pButtons)
Notes on using ^pButtons
• Profiles are configurable:
• Create custom duration and interval combinations
• Add to or delete from the OS-level metric collection
• Collect the logs into one easy-to-use .html file:
%SYS>DO Collect^pButtons
• Preview a currently running profile’s data:
%SYS>DO Preview^pButtons(runid)
• Available at any point while the profile is running.
• May result in some truncated data.
Collecting Caché metrics: Monitors
• Caché History Monitor – SYS.History
• Collect Caché metrics and User-defined metrics over time
• Stored in your Caché database
• Query or export the data using a variety of methods
• Caché System Monitor – %Monitor.Health
• Monitor the system health of your database
• Alerts on abnormal metrics based on configurable criteria
• Alerts from the System Monitor in cconsole.log:
04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage Warning: CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)
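A minimal sketch of pulling System Monitor alerts out of cconsole.log so they can be trended, assuming each alert occupies a single line in the log file and follows the layout of the sample above. The regular expression and the `parse_alert` helper are hypothetical, inferred from that one example; they are not an InterSystems API or a documented log format.

```python
import re
from typing import Optional

# Pattern inferred from the single sample alert (an assumption, not a documented format).
ALERT_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+"                       # 04/01/13-13:55:55:847
    r"\((?P<pid>\d+)\)\s+"                          # (13897)
    r"(?P<severity>\d+)\s+"                         # 1
    r"\[SYSTEM MONITOR\]\s+"
    r"(?P<metric>\w+)\s+"                           # CPUusage
    r"(?P<level>\w+):\s+"                           # Warning:
    r"\w+\s*=\s*(?P<value>[\d.]+)\s*"               # CPUusage = 82
    r"\(\s*\w+\s+is\s+(?P<threshold>[\d.]+)\s*\)"   # ( Warnvalue is 75)
)


def parse_alert(line: str) -> Optional[dict]:
    """Return the alert fields as a dict, or None if the line is not an alert."""
    match = ALERT_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["severity"] = int(fields["severity"])
    fields["value"] = float(fields["value"])
    fields["threshold"] = float(fields["threshold"])
    return fields


sample = (
    "04/01/13-13:55:55:847 (13897) 1 [SYSTEM MONITOR] CPUusage "
    "Warning: CPUusage = 82 ( Warnvalue is 75)....(repeated 1 times)"
)
alert = parse_alert(sample)
print(alert)
```

Lines that do not match return None, so the same helper can be run over the whole log and the non-alert entries simply skipped.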
Collecting Caché metrics: SNMP/WMI
• SNMP, WMI, WSMON
• Documented in the Caché Monitoring Guide
• Caché metrics are exposed via SNMP, WMI, or Web
services
• NOTE: Future focus is on SNMP
• Add CUSTOM application-specific metrics to be exposed
• Use your EXISTING network management infrastructure
to collect and alert on Caché metrics, your application
metrics and operating system metrics
System-level clues to performance issues
• CPU
• Lack of processing cycles (0% CPU idle)
• Blocked processes (run queue or device queuing)
• Disk
• Abnormal disk IO rate
• Queuing on devices
• Higher than normal latency on busy disk
• Memory
• Lack of free memory
• Hard page faults
Caché-level clues to performance issues
• GloRefs and/or RouCmds
• Higher than normal?
• Your app will be using more CPU…
• Are there extraneous processes or more users?
• Lower than normal?
• Your app may be struggling with another problem (slow disk)
• Concurrency issues
• Blocked users upstream on the network
Caché-level clues to performance issues
• PhysBlkRds
• Higher than normal?
• Cache size doesn’t match current load
• Use of CACHETEMP is forcing more disk reads for other data
• Lower than normal?
• Maybe that’s ok
• App is struggling elsewhere such as lack of CPU cycles
• If coupled with abnormally low GloRefs, it may be a disk latency issue
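The reasoning on the two "Caché-level clues" slides can be sketched as a small rule table. This is a hypothetical illustration only: the `classify` and `cache_hints` helpers, the 25% tolerance, and the sample numbers are assumptions made for the example, and "normal" has to come from your own captured history.

```python
def classify(current: float, baseline: float, tolerance: float = 0.25) -> str:
    """Return 'high', 'low', or 'normal' for a metric relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    ratio = current / baseline
    if ratio > 1 + tolerance:
        return "high"
    if ratio < 1 - tolerance:
        return "low"
    return "normal"


def cache_hints(glorefs, glorefs_base, physblkrds, physblkrds_base):
    """Map GloRefs and PhysBlkRds readings to the clues listed on the slides."""
    g = classify(glorefs, glorefs_base)
    p = classify(physblkrds, physblkrds_base)
    hints = []
    if g == "high":
        hints.append("GloRefs high: expect more CPU use; "
                     "check for extraneous processes or more users")
    elif g == "low":
        hints.append("GloRefs low: app may be struggling elsewhere "
                     "(slow disk, concurrency, blocked users upstream)")
    if p == "high":
        hints.append("PhysBlkRds high: cache size may not match the load, "
                     "or CACHETEMP use is forcing extra disk reads")
    elif p == "low":
        if g == "low":
            hints.append("PhysBlkRds low with low GloRefs: "
                         "possible disk latency issue")
        else:
            hints.append("PhysBlkRds low: may be fine, or the app is "
                         "struggling elsewhere (e.g. lack of CPU cycles)")
    return hints


# Hypothetical readings: both metrics well below their baselines.
hints = cache_hints(glorefs=40_000, glorefs_base=100_000,
                    physblkrds=300, physblkrds_base=1_000)
for hint in hints:
    print(hint)
```

The output is a list of prompts for investigation, not a diagnosis; each hint still needs to be checked against system-level and application-level clues.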
Application clues!
• All the above coupled with application-level
clues lead to solutions:
• Are users complaining?
• Is the rate of application activity the same?
• Are batch-jobs/print jobs/screen refreshes completing
in a timely manner?
• Are your interfaces queuing?
Comparing metrics – Load measure
[Chart: Users (0 to 500) over time, 10:00:00 to 10:20:00 in 40-second steps]
Comparing metrics – add App Metric
[Chart: Users and Accts Logged over time, 10:00:00 to 10:20:00 in 40-second steps; annotated rates: 0.7/min/user, 0.8/min/user, 0.8/min/user, 0.9/min/user, 0.8/min/user]
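Per-user rates such as 0.8/min/user come from dividing an application metric by the load measure. A minimal sketch, assuming a hypothetical `per_user_rate` helper and hypothetical sample values; none of these numbers are read from the chart.

```python
def per_user_rate(events: int, users: int, interval_seconds: float) -> float:
    """Events per minute per user over one sampling interval."""
    if users <= 0 or interval_seconds <= 0:
        raise ValueError("users and interval_seconds must be positive")
    events_per_minute = events / (interval_seconds / 60.0)
    return events_per_minute / users


# Hypothetical samples: (time, users, accounts logged in the 40-second interval)
samples = [
    ("10:00:00", 100, 47),
    ("10:06:40", 300, 160),
    ("10:13:20", 500, 233),
]

for time, users, accts in samples:
    rate = per_user_rate(accts, users, interval_seconds=40)
    print(f"{time}  users={users:3d}  accts={accts:3d}  rate={rate:.2f}/min/user")
```

Normalizing by users makes samples taken at different load levels comparable, which is one way to answer "Is the rate of application activity the same?" from captured data.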
Comparing metrics – add Caché metric
[Chart: Users, Accts Logged, and GloRefs over time, 10:00:00 to 10:20:00 in 40-second steps; two vertical axes, 0 to 500 and 0 to 1200]
Key points
• Many important metrics available for capture
• Capture the metrics at all times
• Many tools/methods for capturing metrics
• Include application-level metrics in your
capture
• Analysis for capacity or troubleshooting
begins with understanding your application’s
effects on the system.
You can reach me at: [email protected]
Thanks for attending!
Q&A