IBMAdmin2013_Pedisich_Advancedservermonitoring

Advanced Server Monitoring and Alert Notifications
Andy Pedisich
Technotics
© 2013 Wellesley Information Services. All rights reserved.
Your Presenter
• One half of a pair of two hard-working IBM® Notes® Administrators/Developers who have worked with IBM® Notes® and IBM Domino® since version 2.1
  - From Technotics, Inc. in Philadelphia, Pennsylvania – USA
• Andy Pedisich
  - 28 years in IT
  - 19 years with Lotus Notes
• Rob Axelrod
  - 23 years in IT
  - 19 years with Lotus Notes
What We’ll Cover …
• Setting up the foundation for guarding your domain
• Working with event generators and event handlers
• Selecting a notification method
• Customizing recommended actions in Domino Domain Monitoring
• Tracking problem servers
• Finding and tracking events that show on the console, but not in the log
• Using LotusScript to access server statistics
• Wrap-up
Requirements for Efficient and Accurate Statistics Collection
• Two things are required for statistics collection:
  - The Collect task must be running on any server that is designated to collect the statistics
    - Not all servers should run the Collect task
    - Only servers designated as collecting servers
  - The EVENTS4 Monitoring Configuration database must have at least one Statistics Collection document
    - Minimum collection time should be an hour
There Is a Special Replica ID for Your EVENTS4.NSF
• The replica ID of system databases, such as EVENTS4, is derived from the replica ID of the Domino directory

Database | Replica ID
NAMES.NSF | 852564AC:004EBCCF
CATALOG.NSF | 852564AC:014EBCCF
EVENTS4.NSF | 852564AC:024EBCCF
ADMIN4.NSF | 852564AC:034EBCCF

• Notice that the first two digits after the colon for the EVENTS4.NSF replica are 02
  - Make sure EVENTS4.NSF has the same replica ID on every server
    - Check by opening a copy from every server and putting it on your desktop
    - There’s some code on the next slide to help you do that
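The pattern in the replica ID table can be checked offline. The following is a minimal Python sketch, not part of the original deck: the function names and the offsets dictionary are hypothetical, and the offsets are simply read off the table above (00 for NAMES.NSF, 01 for CATALOG.NSF, 02 for EVENTS4.NSF, 03 for ADMIN4.NSF). It assumes you have already collected the replica IDs by hand, for example from the database properties on each server.

```python
# Sketch only: predicts system database replica IDs from the Domino
# directory's replica ID, using the pattern shown in the table above.
# The offsets are an assumption read off that table.
SYSTEM_DB_OFFSETS = {
    "NAMES.NSF": 0x00,
    "CATALOG.NSF": 0x01,
    "EVENTS4.NSF": 0x02,
    "ADMIN4.NSF": 0x03,
}


def expected_replica_id(directory_replica_id, database):
    """Return the replica ID the table's pattern predicts for `database`."""
    left, right = directory_replica_id.upper().split(":")
    offset = SYSTEM_DB_OFFSETS[database.upper()]
    # Replace the first two hex digits after the colon with the offset.
    return f"{left}:{offset:02X}{right[2:]}"


def mismatched_servers(directory_replica_id, found_ids):
    """List servers whose EVENTS4.NSF replica ID differs from the expected one."""
    expected = expected_replica_id(directory_replica_id, "EVENTS4.NSF")
    return sorted(
        server for server, rid in found_ids.items() if rid.upper() != expected
    )
```

A server that shows up in `mismatched_servers` most likely holds a non-replica copy of EVENTS4.NSF that needs to be replaced with a proper replica.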
Add a Button to Your Toolbar
• Add this code to a button on your toolbar
  - This is courtesy of Thomas Bahn
    - He’s a smart guy, nice guy, and sometimes brings chocolates to his friends from Europe
    - www.assono.de/blog

REM {Use names.nsf on the server that holds your mail file};
_names := @Subset(@MailDbName; 1) : "names.nsf";
_servers := @PickList([Custom]; _names; "Servers"; "Select servers"; "Select servers to add database from"; 3);
_db := @Prompt([OkCancelEdit]; "Enter database"; "Enter the file name and path of the database to add."; "log.nsf");
REM {Add the database from each selected server to the workspace};
@For(n := 1; n <= @Elements(_servers); n := n + 1;
    @Command([AddDatabase]; _servers[n] : _db))
Add a Database Icon from All Servers to the Desktop
• This code will prompt you to pick the servers that have the database you want on your desktop
  - Then it will prompt for the name of the database
  - And open it on all the servers you’ve selected
• Use it to make sure every EVENTS4.NSF in your domain is the same replica
Event Monitoring Details
• Enough setting up already!
• Event monitors of all types are set in the EVENTS4 database
• Two broad categories of events:
  - Event handlers
    - Specify the action that Domino takes when a specific event occurs
  - Event generators
    - Each type of event generator has a view that provides a list of all event generators, plus additional configuration information
Event Generators
• Event generators deal with specific Notes/Domino issues
• There are six types of event generators:
  - Database Event Generator
  - Domino Server Response Event Generator
  - Mail-Routing Event Generator
  - Statistic Event Generator
  - Task Status Event Generator
  - TCP Server Event Generator
  - Some are used more than others
• We’ll stick to the more popular ones that every administrator should use, for starters
Here’s One That Everyone Should Use
• The ACL of Names.nsf should rarely change, so monitor it!
  - Alarms should go off if it changes
• Select Names.nsf
  - Choose either a single server or all servers in the domain
• I like to pick all servers in the domain
  - Admins won’t get away with anything!
  - But I do get a storm of messages when an ACL change occurs
    - Every server tells me about the change
Unused Space Event Generator
• This is an example of the Events system actually doing something automatically when a certain condition exists
  - It’s questionable – it is going to execute the Compact task immediately when the free space threshold is exceeded
  - I could see this event being used on archive servers
  - And I wish there was a way to run it during specific hours
Domino Server Response Generator
• One server checks others by sending a probe
  - It’s a good idea to try opening Names.nsf
  - If you can’t open Names.nsf, then something is wrong!
• Default is every three minutes
• Default response time tolerance is 1,000 milliseconds (one second)
  - Your settings will depend on your own environment
More About Probes
• The default response time is a bit on the harsh side
  - If you leave it at 1,000 milliseconds (one second), you will receive a lot of notifications
  - You should make it ten seconds, or whatever the metrics in your Service Level Agreement (SLA) require
• Also, be careful which servers you choose to probe other servers
  - Try to pick probing servers that are in the same LAN as the probed servers
  - Otherwise, your probing will actually be testing network latency, rather than the servers themselves
  - I have used these probes as a method of testing exactly that
    - Network latency
Statistic Event Generators
• Statistic Event Generators monitor a specific Domino or platform statistic
  - They can let you know when a stat goes over a particular threshold
  - These stat event generators are extremely valuable
  - Smart administrators use them every day!
Complete Listing of All Statistics Is in EVENTS4.NSF
• The Monitoring Configuration (EVENTS4.NSF) supplies documents detailing thresholds for each statistic
  - 1,193 statistic documents available
  - The complete listing is in the view Statistics by Name
• But only 166 of them are considered useful for setting thresholds and are found in the default statistics view
  - The default statistics thresholds view only shows documents where the field “useful” is equal to the word “Yes”
Finding the “Not Useful” Stats
• You might find that a statistic you need has been marked as not useful
• To see which are marked as not useful, full-text index EVENTS4.NSF
• Create an advanced query checking the field useful = “No”
  - You might discover a statistic whose threshold would be just right to use
Why Are Most Stats Considered “Not Useful” for Thresholds?
• One setting, the “useful” field that the advanced query checks, controls whether a statistic will appear in the drop-down list when you’re setting up an event generator
  - Note that there are no Agent statistics in this list
Why No Agent Stats?
• It’s not that the Agent stats aren’t useful
  - They might not be valuable for threshold tracking
• In some releases, Agent.Hourly.UsedRunTime has a data type of text
  - We can’t set a threshold with text values
We Do Have a Nice Way of Seeing That Stat, Though
• Technotics has created a super-customized version of the Monitoring Results database, STATREP.NSF
  - Technotics R8.5.3 statrep
  - It’s the stock statrep with added views
• One of these valuable views is the Agent Stats view
• You can download this from:
  - www.andypedisich.com
  - Look for the Admin2013 link
Show Me the Stats
• When you issue a SHOW STAT command at the console, Domino dumps every statistic it is tracking
• Every one of these statistics is in every single one of the documents in the STATREP.NSF database
  - All you need is a view to see them
Static Statistics Are Not Useful for Thresholds
• Statistics that don’t change usually represent the operating environment of the server
  - Server.Version.Notes = Release 8.5.3
  - Server.Version.OS = Windows NT 5.0
  - Server.CPU.Type = Intel Pentium
  - Disk.D.Size = 71,847,784,448
  - Mem.PhysicalRAM = 527,433,728
  - Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter
• Think these stats aren’t helpful? They are!
• You can take a pretty detailed worldwide server inventory
  - Just by looking at the fields in STATREP.NSF
Wizard Lets You Choose the Method of Handling the Event
• There are lots of methods of event handling
  - Which one you choose depends a lot on your infrastructure
  - We’re going to talk more about the notification methods in the next section of the presentation
• For now, just remember that an event generator is fairly worthless by itself
  - Unless you have an effective event handler that tells you, in its own way, what is going on with your servers
Event Handlers Are an Exquisite Gift
• They can give you a heads-up about issues raised by event generators
• They also give you a free-form way of being alerted to anything that happens in the Domino server log and most of what happens on the Domino server console
• You can use event handlers to respond to generators and certain add-in tasks
  - They are most valuable for picking out text on the console that will mean trouble if ignored
• We’re going to focus on this type of event handling, since it is less intuitive than responding to generators or add-ins
Basics of the Event Handler Configuration
• Three screens to deal with
• Decide whether you want to track an event on just a few servers or all servers
  - You might want to track a particular event on mail servers only
• Decide what triggers a notification
  - We’re going for free-form, so we will select “any event that matches a criteria”
Second Set of Choices for Event Handling
• When working with console events, select:
  - “Events can be of any type”
  - “Events can be of any severity”
• Then look for a particular string of text in the event message
  - This can be absolutely any text that appears on the console
  - We will explain why we are picking the text “full administrator access” in a moment
Final Set-Up Tab for Event Handling
• Define action to occur when the text appears
• We’ve selected email notification
  - But there are over a dozen others that we will discuss in a few moments
• Note: You can control the time of day the event handler is on the job
  - I wish they did that for event generators
Why Did We Monitor the Text Full Access Administrator?
• It is the highest level of administrative access to the server
  - Manager access with all access privileges enabled to all databases on the server, regardless of the ACL settings or readername settings
  - Access to any unencrypted data on the server
• Your security model should make Full Access Administrator (FAA) almost unnecessary
  - When FAA is turned on, you want to know about it to prevent some hooligan from doing shenanigans
Other Words You Should Track with Event Handlers
• “Deleted by”
  - This generally means someone has deleted a database
  - Usually their mail file if they have manager access
  - You’ll be getting out the back-up tapes in a minute

01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab
Other Bad Words to Watch for
• Here are some other words and expressions to watch for:

Expression | Issue
An exception occurred while writing data into database | Bad news all around. You’re going to have to get to the database and run some maintenance.
Replication cannot proceed | Replication cannot proceed because it cannot maintain uniform access control list on replicas. This is a result of “Enforce Consistent ACL.”
RRV bucket is corrupt | RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID, and it is bad if it’s corrupted. You can try a fixup, but it might be borked and need a new replica.
Truncated | Try fixup. Maybe. Maybe not.
Device error | Uh oh
Database is corrupt; cannot allocate space | This one is bad, too
B-tree structure is invalid | You never want to see a b-tree error. It usually means you have to replace the database.
Extremely inefficient | Agent Manager: Full text operations on database “xyz.nsf” which is not full-text indexed
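Before building an event handler for each phrase, it can help to see how often the text actually shows up in logs you already have. The following is a minimal Python sketch for that kind of offline check, not the Domino event handler itself: the function name is hypothetical, the phrases are the ones named on these slides, and the case-insensitive substring match is an assumption about how you would want to search.

```python
# Sketch only: searches lines copied from a log or console for the
# phrases named on the slides. Case-insensitive substring matching is
# an assumption; it is not a statement about how Domino matches text.
WATCH_PHRASES = [
    "full administrator access",
    "deleted by",
    "An exception occurred while writing data into database",
    "Replication cannot proceed",
    "RRV bucket is corrupt",
    "Truncated",
    "Device error",
    "Database is corrupt; cannot allocate space",
    "B-tree structure is invalid",
    "Extremely inefficient",
]


def find_alerts(lines, phrases=WATCH_PHRASES):
    """Return (line_number, phrase, line) for every watched phrase found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                hits.append((number, phrase, line.rstrip()))
    return hits


# The two log lines from the "Deleted by" slide.
sample = [
    "01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab",
    "01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab",
]
hits = find_alerts(sample)
for number, phrase, line in hits:
    print(f"line {number}: matched '{phrase}'")
```

A phrase that matches hundreds of lines a day is a candidate for a narrower string before you attach a notification to it.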
We’re Circling Back to Notification Methods
• Here is the panoply of notification methods
• The most widely-used notification method is to send an email to an admin group when a problem occurs
  - And yet, that is also very risky, since the email system itself might be the problem
Paging Dr. Howard, Dr. Fine, Dr. Howard …
• 14 ways to be notified – these 2 are the most widely used
  - But not necessarily the best to use
• Paging notification is a good choice, but not if you are paging through a third-party phone system, like Verizon or AT&T
  - They generally require an email to be sent
  - They have no Service Level Agreement – NONE!
• Sadly, due to budget and resource constraints, we generally see these two methods, mail and paging, used the most in production environments

Method | Result | Comments
Mail | Mails the event to a person or to a mail-in database | Good for most events in multi-protocol environments, but as mentioned, it’s bad if the mail system goes down
Pager | Uses the mail address of an alphanumeric pager | OK, but limited value because it uses the mail system; if mail itself is down, there are issues
The Most Important Notification Options
• These two are the best, and there’s one more that’s not listed

Method | Result | Comments
SNMP Trap | Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent. | This is truly an ideal notification method because it does not depend on Notes protocols actually working
Forward event to Tivoli Event Console | Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document. | Check with the Tivoli team to see if it’s possible to use this in your environment
Customized Tivoli Package
• In one case, I developed a custom monitoring solution that fed trouble tickets into a version of the Tivoli Event Console that was not supported by the Domino Tivoli event handler system
  - When you have to deal with extreme monitoring capability with high reliability, you sometimes need to get in deep
  - This is very effective because it uses the postemsg.exe executable on the OS level to send the message to the TEC
  - Note that the message is carefully crafted to form a large command string which sends the ticket to Tivoli
  - Check with your Tivoli team to see if you can take advantage of this method
Customized Tivoli Package (cont.)
• As someone who creates a lot of Domino monitoring solutions, I often have to bend the rules and do some development (Ugh!)
  - An executable called postemsg.exe was placed on the c: drive of a Windows server that was the central Domino monitoring hub
• This is very effective because it uses that postemsg.exe executable on the OS level to send the message to the TEC
  - With some knowledge of LotusScript, I crafted a system to monitor servers and send results back to the Tivoli event console

' Build the postemsg.exe command line in pieces
vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
vMess2 = {hostname="} + vReportServerName + {" }
vMess3 = {sub_source="MESSAGINGLOTUS" Mynotify_supportfilter="1" MyNotify_severity="2" }
vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
' Run the assembled command string at the OS level
result = Shell(vMess, 6)
DDM Is an Advanced Topic and Is Best Used by New Admins
• Domino Domain Monitoring (DDM) is a powerful yet complex tool that is often overlooked by administrators
• If you are using Domino 7 or 8, you are already a proud owner of the Domino Domain Monitoring database, and could already be using its powerful functionality
• If you’re not using DDM, you see this with each server start

01/22/2013 11:49:08 AM Warning: All Domino Domain Monitoring probes are disabled resulting in the loss of valuable diagnostic information. Please configure DDM probes in events4.nsf. Assess DDM reports in ddm.nsf.
DDM Backs Up Its Discoveries with Explanations
• DDM explains the probable cause, possible solution, and sometimes corrective actions
  - That’s right; actions that will actually correct the problem you’re experiencing
• These are stored in the EVENTS4.NSF and are configurable by you
  - Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”
Looking in the View, “Event Messages by Text”
• We can find that error message in the EVENTS4.NSF
  - And discover how we might change the report DDM produces
The Cause, Solution, and Corrective Action Are Listed
• This document has all the probable cause, possible solution, and corrective action text
  - These are supplied by Lotus and include the code in the corrective action
Click the Link to the Modular Corrective Action
• Clicking the link will take you to the code
  - This could be in formula language or LotusScript
The Modular Corrective Action Is Re-Usable
• At the bottom of the modular action, there is a list of other error text messages that also use this action
  - That same action that was written only a single time can be used as a corrective action multiple times
Modular Documents – Cause, Solution, and Corrective Actions
• Domino 8 comes with over 1,000 modular documents
  - Chances are your solutions are already there for most issues
  - You can use any of the same solutions provided by IBM for your custom solution
  - Or you can add brand new ones
Modular Documents Let You Describe Issues
• Modular documents let you add your own probable cause and possible solution text
  - And create corrective actions built with formula code and LotusScript agents
You Can Add to the Solutions That Will Display with the Error
• Select the custom entries tab and add the description
• A custom solution of composing an email to the target user can be inserted
Changes the DDM Report
• The modular document now has the “compose an email” choice
It Starts the Email for You
• The code plugs in the user’s name and the database that was being accessed
  - And it’s all done with modular documents in EVENTS4.NSF
Role in DDM ACL That Will Restrict Who Can Use Actions
• Many events have corrective actions associated with them
  - Only users with the Execute CA role in the DDM ACL are able to access the command actions and the corrective action text and links
  - This ensures that only qualified team members will be able to make the changes
Dealing with Problematic Servers
• Sometimes there are servers with issues that crop up
  - We would like to collect statistics for analysis from these systems more frequently than the standard statistics collection interval allows
  - If you try to add a second collection interval on a server, you’ll get this:
Each Server Is Allowed to Collect Stats with Only One Interval
• A server can only have one collection interval
  - You must create a second collection document for another server
  - Don’t forget to add the “collect” task to servertasks= in NOTES.INI
• Let’s look at a server that has CPU spikes
• First, we create a statistics collection document for a second server to take statistics from our problem server
Set the Collection Interval for Five Minutes
• Set collection interval for 5 minutes
  - Do not check any filters!!!
  - They tell the collector to ignore the statistics you checked
• Note that stats are being logged to a database called ProblemServer.NSF
  - Used exclusively to track CPU utilization of the Traveler task
  - Note that the data in this example has been fictionalized for effect
Create a Special View That Tracks CPU Utilization for Traveler
• In this case, it’s the Traveler CPU we want to track
• We create a custom view for the collecting database that only has the server name, the time of collection, and the statistic called Platform.Process.Traveler.1.PctCpuUtil
  - This will be used to easily create a graph of the CPU activity
Collect the Data, Copy It as a Table from the Custom View
• After collecting a week’s worth of data, we examine the CPU utilization
• All the data in the view is selected using Ctrl-A
  - It is copied as a table
  - Copying views as a table is my favorite feature in Notes
• A Monitoring Results template is posted on my Web site
  - A URL to this template is included at the end of the presentation
Data Has Been Copied to a Spreadsheet
• A simple paste of the data puts it into a spreadsheet where we are ready to turn it into a chart
Use the Tools in Your Spreadsheet to Create a Graph
• Select the columns Collection Time and Traveler CPU
• Create a graph from the data
  - In this example, a scatter chart type with smooth lines is being used
The Resulting Graph
• This produces an excellent graph of the CPU utilization over a ten-day period with samples being taken at intervals of 5 minutes
  - And it took less than 5 minutes to make this chart
  - One adjustment was made to the x-axis formatting and the legend was removed
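The same pasted table can also be summarized without a spreadsheet. The following is a minimal Python sketch, assuming the view was copied as tab-separated text with three columns in the order described above (server name, collection time, Traveler CPU percentage). The function names, the 80 percent spike threshold, and the sample rows are all made up for illustration.

```python
import csv
import io


def load_samples(pasted_text):
    """Parse tab-separated rows of (server, collection time, CPU percent)."""
    reader = csv.reader(io.StringIO(pasted_text), delimiter="\t")
    samples = []
    for row in reader:
        if len(row) < 3:
            continue
        try:
            cpu = float(row[2])
        except ValueError:
            continue  # header row or a non-numeric cell
        samples.append((row[0], row[1], cpu))
    return samples


def spikes(samples, threshold=80.0):
    """Return the samples at or above the (assumed) spike threshold."""
    return [s for s in samples if s[2] >= threshold]


# Made-up sample rows in the assumed column order.
pasted = (
    "Server\tCollection Time\tTraveler CPU\n"
    "Mail1/domlab\t01/04/2013 11:00 PM\t12.5\n"
    "Mail1/domlab\t01/04/2013 11:05 PM\t91.0\n"
    "Mail1/domlab\t01/04/2013 11:10 PM\t15.0\n"
)
samples = load_samples(pasted)
peak = max(samples, key=lambda s: s[2])
print(f"{len(samples)} samples, peak {peak[2]}% at {peak[1]}")
```

The chart is still the better way to see the shape of the problem; a script like this is only handy when you want the peak times as text for a ticket.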
Demonstration
• Creating a graph of results from a custom view of collected data
Some Events Occur on the Console, but Not in the Log
• Note: In this example, the server stops reporting at 11:04 pm
• Then, at 11:27 pm, it is back on line
• What happened in the interim?

Name: Mail1/domlab
Time: 01/04 11:02:05 PM
Miscellaneous Events:
01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)

Name: Mail1/domlab
Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM
Miscellaneous Events:
01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)
01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
01/04/2013 11:27:12 PM Event Monitor started
01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res
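A silent stretch like the one between 11:04 pm and 11:27 pm is easy to miss by eye in a long log. The following is a minimal Python sketch for spotting such gaps in text exported from the log. The function name and the ten-minute limit are hypothetical, it assumes every event line starts with a US-style MM/DD/YYYY hh:mm:ss AM/PM timestamp as in the excerpt above, and the sample lines are shortened copies of that excerpt.

```python
import re
from datetime import datetime, timedelta

# Matches the timestamp at the start of a log line, e.g. 01/04/2013 11:04:51 PM
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)\b")


def find_gaps(lines, max_silence=timedelta(minutes=10)):
    """Return (previous, current) timestamp pairs more than max_silence apart."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # continuation lines and headers carry no timestamp
        current = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        if previous is not None and current - previous > max_silence:
            gaps.append((previous, current))
        previous = current
    return gaps


# Shortened lines from the log excerpt above.
log = [
    "01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM",
    "01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)",
    "01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete.",
    "01/04/2013 11:27:12 PM Event Monitor started",
]
gaps = find_gaps(log)
for start, end in gaps:
    print(f"silent from {start:%I:%M:%S %p} to {end:%I:%M:%S %p}")
```

On a quiet server a ten-minute silence may be normal, so the limit needs tuning to the server's usual activity.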
There Is Action in the CONSOLE.LOG
• CONSOLE.LOG and other logs are in the folder called IBM_TECHNICAL_SUPPORT under the data folder
• The CONSOLE.LOG on a server often contains data that has been seen on the Domino server console, but not in the server log
  - It shows there was a Long Held Lock Dump and then a panic!

Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf)) Waiters
countNonIntentLocks = 1 countIntentLocks = 1, queuLength = 95
[Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1
Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60])
rm_lkmgr_cpp:2070
rm_lkmgr_cpp:1306
nsfsem1_c:169
nsfsem1_c:1020
nsfsem6_c:503
Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1
Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
LkMgr END Long Held Lock Dump -----------------
01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
The server process terminated abnormally with the exit status = 1. Please send this information and the collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.
Why Did This Happen?
• In this case, there was a large number of email messages with big attachments waiting to be processed in the MAIL.BOXES
• The server was relatively underpowered
• Plus, I think the messages were part of an emailing made by a CEO
  - And we all know, the most visible executives have the worst time with any piece of messaging software
Here’s Another Example of Helpful Console Logging
• I entered the following into the Domino server console
  - Tell traveler stat show
• That command generates hundreds of lines of statistics and other information
  - It shows clearly on the server console
Here’s Another Reason for Console Logging
• Here’s the Domino server log showing me making several furious requests to the Traveler task with tell traveler stat show
• I get nothing

> tell traveler stat show
01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
01/06/2013 12:24:55 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
Check the IBM_TECHNICAL_SUPPORT Folder
• CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server
• Whenever there are server issues, don’t forget to check the console.log for evidence

01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
tell traveler stat show
CPU.Pct.000-010 = 7
ClusterCache.Access = 1
Constrained.count = 0
Constrained.state = false
DB.Connections = 1
DB.Connections.Idle = 1
DB.Connections.Max = 7000
DCA.C.CheckAccessRights = 2
DCA.C.Count.NSFDbClose = 3
DCA.C.Count.NSFDbOpen = 3
DCA.C.Count.NSFNoteClose = 2
DCA.C.Count.NSFNoteOpen = 2
DCA.C.HTMLCreateConverter = 1
DCA.C.HTMLDestroyConverter = 1
DCA.C.ModDoc.RunCount = 1
DCA.C.ModDoc.SyncableDocs = 1
Console Logging Configuration
• To start a console log permanently on your servers, add this to the NOTES.INI
  - Console_Log_Enabled = 1
• Use the following values
  - 0 – Disable Console Log file logging
  - 1 – Enable Console Log file logging
• You can also toggle logging to the Console Log file from the server console
  - Use the start consolelog and stop consolelog commands
• Obviously, this is an important feature and you’d want it to be enabled all the time
• Set a maximum size of almost 100MB for the console log using the following parameter
  - Console_Log_Max_Kbytes = 100000
Console Mirroring
• You can also use Console Mirroring, which is slightly different from normal console logging
• Console log mirroring causes a new server thread to be created
  - It monitors all messages written to the Console Log file and duplicates these messages into another file
  - When this file is filled, the thread closes the mirrored file and creates a new file into which subsequent messages are written
• Console log mirroring has three related NOTES.INI settings:
  - Console_Log_Mirror=1 – Enables the mirroring feature
  - Retain_Mirror_Logs=1 – Prevents deletion of previous mirrors when Domino starts
  - Console_Log_Max_Kbytes= – Sets the max size of the Console Log/mirror files
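With many servers, it is worth confirming that every NOTES.INI actually carries these settings. The following is a minimal Python sketch that checks a copy of a NOTES.INI file for the settings named in this presentation (Console_Log_Enabled, Console_Log_Max_Kbytes, Console_Log_Mirror, and the collect task in ServerTasks). The function names and the sample file contents are made up, and treating setting names as case-insensitive is an assumption.

```python
def parse_notes_ini(text):
    """Return a dict mapping lower-cased setting names to their values."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("[") or "=" not in line:
            continue  # skip blanks, the [Notes] header, and malformed lines
        name, value = line.split("=", 1)
        settings[name.strip().lower()] = value.strip()
    return settings


def logging_report(settings):
    """Summarize the console logging and collection settings that were found."""
    tasks = [t.strip().lower() for t in settings.get("servertasks", "").split(",")]
    return {
        "console_log": settings.get("console_log_enabled") == "1",
        "max_kbytes": settings.get("console_log_max_kbytes"),
        "mirror": settings.get("console_log_mirror") == "1",
        "collect_task": "collect" in tasks,
    }


# Made-up NOTES.INI fragment for illustration.
sample_ini = """[Notes]
ServerTasks=Update,Replica,Router,AMgr,Collect
Console_Log_Enabled=1
Console_Log_Max_Kbytes=100000
"""
report = logging_report(parse_notes_ini(sample_ini))
print(report)
```

Settings pushed through a Configuration Settings document may not appear in a stale copy of the file, so treat a missing value as a prompt to check the live server rather than as proof.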
Can You Be an Admin/Dev Person?
• When you’re an admin, there are a lot of reasons to learn LotusScript
  - Write your own agents that gather statistics and monitor servers
• LotusScript lets you ask for a statistic on all of your servers, one by one, then store it in a database and produce alerts and notifications
  - These can be more sophisticated than native Notes monitoring
• The following are two examples of coding that you might find helpful
  - If you have buddies in the Dev side of the house, they might find this interesting
  - Generally, Dev people don’t do applications that help administrators
    - Their focus is on user applications
• These two snippets can give you an idea of the potential you have when dealing with statistics and LotusScript
Gathering Script Using LotusScript Is Easy
•
•
Here’s an agent that simply issues a Domino server console
command
 Then shows you the value in a MessageBox
It’s pretty cool for 10 lines of code
Sub Initialize
    Dim session As New NotesSession
    Dim vServerName As String
    Dim vConsoleCommand As String
    Dim vConsoleReturn As String
    ' Console command to run and the server to run it on
    vConsoleCommand = "show stat server.trans.total"
    vServerName = "admin/domlab"
    ' Send the command and display whatever the console returns
    vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommand)
    MessageBox vConsoleReturn
End Sub
The Mail.TotalPending Statistic
• This stat was introduced in Release 5, and I use it all the time to monitor servers for mail backing up
• From SPR# BSAW4HFMPY
 www-304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
 Added a new Mail.TotalPending statistic that shows the count of messages pending in mail.box
• This statistic is updated once every 5 minutes by the Server task
 Does not depend on the Router task for updates
• Provides information about the total backlog of mail in the event that the router is hung or not started
 A high value indicates that a mail routing problem needs further investigation
Here’s a Similar Code Snippet That Gets Total Pending Mail
• This is from a much larger agent that runs every 5 minutes on 70 servers
• Remember, LotusScript lets you issue console commands
 Then take the results of the command and take other actions
• Our job is to parse out the number 130 from the show stat command
 Show stat mail.totalpending
• We’re grabbing the stat Mail.TotalPending, which looks like this on the console
    Mail.TotalPending = 130
    1 statistics found
Here’s the Meat and Potatoes
• The console returns:
    Mail.TotalPending = 130
    1 statistics found
• Then it’s parsed so that only the number is grabbed
 vLocStart = InStr(1,vConsoleReturn,"=",5 )+2
 Gives the location 2 characters past the = sign, where the number starts
 vLocEnd = InStr(1,vConsoleReturn,Chr(13),5 ) - vLocStart
 Despite its name, this is the length of the number: the position of the carriage return, Chr(13), minus vLocStart
 vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
 That’s the number as a string, which is then converted to a number
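The same parsing can be wrapped in a small helper that fails safely when the reply is not in the expected shape. This is a minimal sketch, not the agent's actual code: ParseStatValue is a hypothetical name, -1 is an arbitrary "no value" marker, and the reply is assumed to hold a single "Name = value" line. It looks for the line break only after the = sign, so a stray line break earlier in the reply cannot produce a negative length.

```lotusscript
' Hypothetical helper: returns the number from a one-line "show stat" reply,
' or -1 when no usable value is found.
Function ParseStatValue(consoleReturn As String) As Double
    Dim eqPos As Long
    Dim endPos As Long
    Dim statStr As String

    ParseStatValue = -1
    eqPos = InStr(1, consoleReturn, "=")
    If eqPos = 0 Then Exit Function   ' no "=" means the stat was not returned

    ' Find the end of the line, searching only after the = sign
    endPos = InStr(eqPos, consoleReturn, Chr(13))
    If endPos = 0 Then endPos = InStr(eqPos, consoleReturn, Chr(10))
    If endPos = 0 Then endPos = Len(consoleReturn) + 1

    ' Everything between "=" and the line break, without surrounding spaces
    statStr = Trim(Mid(consoleReturn, eqPos + 1, endPos - eqPos - 1))
    If statStr <> "" And IsNumeric(statStr) Then ParseStatValue = CDbl(statStr)
End Function
```

Called as ParseStatValue(vConsoleReturn), it would stand in for the three parsing lines. Values that the console prints with thousands separators would need extra handling before conversion.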
Here’s the Meat and Potatoes (cont.)
• The console returns:
    Mail.TotalPending = 130
    1 statistics found
• Here’s a snippet of code that gets you the Mail.TotalPending statistic
    vConsoleCommandPending = "sh stat mail.totalpending"
    ' Let's ask the console how many messages are pending
    vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)
    ' The number starts 2 characters past the = sign
    vLocStart = InStr(1, vConsoleReturn, "=", 5) + 2
    ' Length of the number: position of the carriage return minus the start
    vLocEnd = InStr(1, vConsoleReturn, Chr(13), 5) - vLocStart
    vStatStr = Mid(vConsoleReturn, vLocStart, vLocEnd)
    vMailPending = Val(vStatStr)
    'Print "Pending: " + Str(vMailPending) + " Raw: " + vStatStr
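To show how a snippet like this might grow into a multi-server check, here is a hedged sketch of an agent that polls a list of servers and reports any whose backlog passes a limit. The server names, the limit of 500, and the Print-based alert are all placeholders chosen for illustration; the larger agent mentioned earlier is not shown in this deck and may work differently.

```lotusscript
Sub Initialize
    ' Assumptions: server names and THRESHOLD are illustrative placeholders
    Const THRESHOLD = 500
    Dim session As New NotesSession
    Dim servers(1) As String
    Dim reply As String
    Dim eqPos As Long
    Dim endPos As Long
    Dim pending As Double
    Dim i As Integer

    servers(0) = "mail1/domlab"
    servers(1) = "mail2/domlab"

    On Error GoTo ServerFailed
    For i = 0 To UBound(servers)
        reply = session.SendConsoleCommand(servers(i), "show stat mail.totalpending")
        eqPos = InStr(1, reply, "=")
        If eqPos > 0 Then
            ' Cut the value off at the line break before converting it
            endPos = InStr(eqPos, reply, Chr(13))
            If endPos = 0 Then endPos = Len(reply) + 1
            pending = Val(Mid(reply, eqPos + 1, endPos - eqPos - 1))
            If pending > THRESHOLD Then
                Print servers(i) + ": " + CStr(pending) + " messages pending"
            End If
        End If
NextServer:
    Next
    Exit Sub

ServerFailed:
    ' One unreachable server should not stop the rest of the sweep
    Print "Could not query " + servers(i) + ": " + Error$
    Resume NextServer
End Sub
```

The value is trimmed to its own line before Val is called because Val ignores spaces and line breaks, so digits from the following "1 statistics found" line could otherwise be read as part of the number.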
LotusScript and Monitoring/Alerting – A Great Pair of Tools
• You get the advantage of automation with the power of monitoring and alerting
• Stop issues before they become problems
• Don’t forget to download the custom statrep, Technotics Statrep 8.5.3, from:
 www.andypedisich.com
Where to Find More Information
• www-01.ibm.com/support/docview.wss?uid=swg27008849
 Notes/Domino Best Practices: Performance (IBM, 2010).
• www-10.lotus.com/ldd/__00256C3E0030650D.nsf/0/1F2EBFCA1F35CA71852571DB00618159?Open
 Harry Peebles, “Domino Domain Monitoring (DDM) Educational Resources” (IBM, 2006).
• www-01.ibm.com/support/docview.wss?uid=swg21293213
 How Does the notes.ini File Parameter ‘server_session_timeout’ Affect Server Performance? (IBM, 2010).
• www.ibm.com/developerworks/lotus/library/domino-servercrashes/
 Kiran Bellari, “Troubleshooting Lotus Domino Hangs and Crashes” (developerWorks, 2006).
7 Key Points to Take Home
• Write your own program in LotusScript or formula language and add it to DDM’s corrective actions
• Collect statistics from problem servers by creating a second collecting server in your domain
• Console logs collect everything that happens on the console, including messages from tasks and from NOTES.INI debug parameters
• Check the replica ID for the Events4.NSF in your domain to ensure it is the same on all servers
7 Key Points to Take Home (cont.)
• Full Administrator Access is a powerful tool that should be monitored for proper usage
• Event handlers can notify you about any message that appears on the console
• Email is the most widely used notification system, but it is also the riskiest
Thank You for Attending Our Session!
• Please don’t forget to fill out your evaluations. We read them all!
• Please feel free to stop us and ask questions or just have pleasant conversations
Contact us!
[email protected]
www.technotics.com
www.andypedisich.com