JMP404 Master Class: Advanced
Techniques for Domino Server
Monitoring and Alerting
Andy Pedisich | President – Technotics, Inc.
Rob Axelrod | Vice-president – Technotics, Inc.
© 2013 IBM Corporation
Your presenters
 They are two hard working IBM® Notes® Administrators/Developers who have
worked with IBM® Notes® and IBM Domino® since version 2.1
– From Technotics, Inc. in Philadelphia, Pennsylvania - USA
 Andy Pedisich
– 28 years in IT
– 19 years with Lotus Notes
 Rob Axelrod
– 23 years in IT
– 19 years with Lotus Notes
About Technotics, Inc.
Technotics was founded in 1998 as a consultancy to focus on collaboration in the enterprise. Since that time we have provided strategic advice, project management and technical support to organizations worldwide, focusing on high levels of customer engagement and long-term relationships.
Our services include environmental audits, premium support, executive briefings on cloud-based collaboration and migrations between messaging and collaboration systems.
Contact Andy at [email protected] or [email protected].
What we are working with during this session
 Our host laptop is a Dell Studio™ 1555
– Intel® Core™ 2Duo CPU T9600 2.80 GHz
– 8 GB memory
 The operating system is Microsoft Windows 7 Ultimate – 64 bit
– Copyright © 2009 Microsoft Corporation. All rights reserved
– Microsoft and Windows are trademarks of Microsoft Corporation in the United States,
other countries, or both.
 IBM® Domino® Server Release 8.5.3 FP3 64 bit
– Running on host system, using IP address of the laptop's network interface card
– Running IBM Traveler
 IBM Notes® Release 8.5.3 FP3
Agenda
 Setting up the foundation for guarding your domain
 Working with event generators and event handlers
 Selecting a notification method
 Customizing recommended actions in Domino Domain Monitoring
 Tracking problem servers
 Finding and tracking events that show on the console but not in the log
 Using LotusScript to access server statistics
The humble yet mighty Monitoring Configuration Application
 This is also known as EVENTS4.NSF
– This controls all the monitoring and notification in your domain
• Unless you use third-party software
• Unless someone like me gets involved because I custom develop Domino domain monitoring applications
Yes, I am both a full-time administrator and a developer when required
Requirements for efficient and accurate statistics collection
 Two things are required for statistics collection:
– The Collect task must be running on any server that is designated to collect the statistics
• Not all servers should run the Collect task
• Only servers designated as collecting servers should run it
– The EVENTS4 database must have at least one Statistics Collection document
A few other important items are needed
 Statistics should be collected centrally on one or two servers so that the data is
easy to get to
– If you have offices in Europe and in Australia, you should probably have at least two
servers collecting stats, one in each location
• EUStats.nsf
• AUStats.nsf
– Replicate them to a central location
– Stats should be collected at least every hour to be effective
 EVENTS4 should be the same replica on all servers in the domain
– That’s right! You should be able to stack the EVENTS4.NSF icons from every server together on your desktop
– If you can’t, maybe you have around 200 servers and stacking that many is impossible
• OR maybe your EVENTS4.NSF databases do not all share the same replica ID
There is a special replica ID for your EVENTS4.NSF
 The replica ID of system databases, such as EVENTS4, is derived from the
replica ID of the Domino directory
Database | Replica ID
NAMES.NSF | 852564AC:004EBCCF
CATALOG.NSF | 852564AC:014EBCCF
EVENTS4.NSF | 852564AC:024EBCCF
ADMIN4.NSF | 852564AC:034EBCCF
– Notice that the first two digits after the colon for the EVENTS4.NSF replica ID are 02
– Make sure that EVENTS4.NSF has the same replica ID throughout the domain by opening a copy from every server and putting it on your desktop
• Here’s some code to help you do that
Add a button to your toolbar
 Add this code to a button on your toolbar
– This is courtesy of Thomas Bahn
– He’s a smart guy, nice guy, and sometimes brings chocolates to his friends from Europe
• http://www.assono.de/blog
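REM {Look up server names in the Domino directory on the mail server};
_names := @Subset(@MailDbName; 1) : "names.nsf";
_servers := @PickList([Custom]; _names; "Servers"; "Select servers";
    "Select servers to add database from"; 3);
REM {Ask which database to add; log.nsf is the default};
_db := @Prompt([OkCancelEdit]; "Enter database";
    "Enter the file name and path of the database to add."; "log.nsf");
REM {Add the database icon from each selected server to the workspace};
@For(n := 1; n <= @Elements(_servers); n := n + 1;
    @Command([AddDatabase]; _servers[n] : _db))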
Add a database icon from all servers to the desktop
 This code will prompt you to pick the servers that have the database you want on
your desktop
– Then it will prompt for the name of the database
• And open it on all the servers you’ve selected
 Use it to make sure all the EVENTS4.NSF are the same replica in your domain
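If you would rather compare replica IDs as text than by stacking icons, the check can be sketched outside of Notes. This is a minimal Python sketch, not part of the original session. It assumes the pattern shown in the replica ID table holds in your domain (only the first two hex digits after the colon differ from NAMES.NSF) and that you have already noted each server's EVENTS4.NSF replica ID by hand. The function names, the `Hub1/domlab` server and its replica ID are invented for illustration.

```python
# Offsets taken from the replica ID table: 00, 01, 02, 03.
SYSTEM_DB_OFFSETS = {
    "NAMES.NSF": 0x00,
    "CATALOG.NSF": 0x01,
    "EVENTS4.NSF": 0x02,
    "ADMIN4.NSF": 0x03,
}


def expected_replica_id(names_replica_id, database):
    """Derive a system database's replica ID from the directory's replica ID."""
    first, second = names_replica_id.upper().split(":")
    # Assumption: only the first two hex digits after the colon change.
    offset = SYSTEM_DB_OFFSETS[database.upper()]
    return "{}:{:02X}{}".format(first, offset, second[2:])


def servers_with_wrong_replica(names_replica_id, observed, database="EVENTS4.NSF"):
    """Return the servers whose copy does not carry the expected replica ID."""
    expected = expected_replica_id(names_replica_id, database)
    return sorted(
        server for server, replica_id in observed.items()
        if replica_id.upper() != expected
    )


# Replica IDs as you might read them from each server's database properties.
observed = {
    "Mail1/domlab": "852564AC:024EBCCF",
    "Mail2/domlab": "852564AC:024EBCCF",
    "Hub1/domlab": "C1257B0A:0034D2F1",  # invented non-replica copy
}
print(servers_with_wrong_replica("852564AC:004EBCCF", observed))
```

Any server named in the result is holding a copy of EVENTS4.NSF that will not replicate with the rest of the domain.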
A single collection document looks like many in the view
 A single document will look like it is multiple documents in the EVENTS4 database
– It’s one document with a multi-value field containing all the server names
– Make sure administrators know this, or they might delete everything by mistake
• Guess how I know this?
Event Monitoring Details
 Enough setting up already!
 Event monitors of all types are set in the EVENTS4
database
 Two broad categories of event monitors:
– Event handlers
• Specify the action that Domino takes when a specific
event occurs
– Event generators
• Each type of event generator has a view that
provides a list of all event generators, plus additional
configuration information
Event Generators
 Event generators deal with specific Notes/Domino issues
 There are six types of event generators:
– Database Event Generator
– Domino Server Response Event Generator
– Mail Routing Event Generator
– Statistic Event Generator
– Task Status Event Generator
– TCP Server Event Generator
• Some are used more than others
 We’ll stick to the more popular ones that every administrator should use for
starters
Database Event Generator

 Use Database Event Generators to monitor:
– Database activity
– Free space
– Frequency and success of database replication
– ACLs
• And get reports on ACL changes
• Including those made by replication or an API program
 Monitor specific servers or every server in the domain
Here’s one that everyone should use
 The ACL of Names.nsf should be monitored for
changes in every Notes domain
– Once properly set, the ACL of Names.nsf should
rarely change!
• Alarms should go off when it does change
 Select Names.nsf
– You can choose either a single server, such as the
administration server for the address book, OR
– All servers in the domain
 I like to pick all servers in the domain
– Admins won’t get away with anything!
– But I do get a storm of messages when an ACL
change occurs
• Every server tells me about
the change
Unused Space event generator
 This one is an interesting example of the Events system actually doing something
automatically when a certain condition exists
– It’s questionable in that it is going to execute the Compact task immediately upon
detection of the free space threshold being exceeded
• I could see this event being used on archive servers
• And I wish there was a way to run it during specific hours
Server Response Generator
 Domino Server Response Event Generator
– Checks connectivity/port status of server’s network
 One server checks others by sending a probe
– It’s a good idea to try opening Names.nsf
• If you can’t open Names.nsf, then something is wrong!
What’s Your Response Tolerance?
 You set the interval for checking Names.nsf
– Default is every three minutes
 And your response time tolerance
– Default is 1,000 Msecs (one second)
• These will both depend on your own environment
More About Probes
 The response time is a bit on the harsh side
– If you leave it at 1,000 Msecs (one second) you will receive a lot of notifications
• You should make it ten seconds or whatever the metrics in your Service Level Agreement (SLA) require
 Also, be careful which servers you choose to probe other servers
– Try to pick probing servers that are in the same LAN as the probed servers
• Otherwise, your probing will actually be testing network latency rather than the servers themselves
 However, we have used these probes as a method of testing exactly that
– Network latency
Statistic Event Generators
 Statistic Event Generators monitor a specific Domino or platform statistic
– They can let you know when a stat goes over a particular threshold
• These stat event generators are extremely valuable
Smart administrators use them every day!
A complete listing of all statistics is in EVENTS4.NSF
 The complete listing is in the view Statistics by Name
 The default statistics thresholds view only shows documents where the field
“useful” is equal to the word “Yes”
EVENTS4.NSF database has all the stats and thresholds
 The Monitoring Configuration application (EVENTS4.NSF) supplies documents detailing thresholds for each statistic
– There are 1,193 statistic documents available
• But only 166 of them are considered useful
for setting thresholds and are found in the
 Each document contains other
information about what kind of stat it is
 Plus info about the default threshold
– And yes, you can change these settings
Why are most stats considered “not useful” for thresholds?
 There is one setting on the Advanced tab that controls whether a statistic will appear in the dropdown list when you’re setting up an event generator
– Note that there are no Agent statistics in this list
Why no agent stats?
 It’s not that the Agent stats aren’t useful
– They might not be valuable for threshold tracking
 In some releases, Agent.Hourly.UsedRunTime has a data type of text
– We can’t set a threshold with text values
We do have a nice way of seeing that stat though
 Technotics has created a super customized version of the Monitoring Results
database, STATREP.NSF
 It’s called the Technotics R8.5.3 statrep and it is the stock IBM statrep with added
views
 One of these valuable views is the Agent Stats view
 You can download this from http://www.andypedisich.com
– Look for the Connect2013 link
Static statistics are not useful for thresholds
 Statistics that don’t change usually represent the operating environment of the
server
– Server.Version.Notes = Release 8.5.3
– Server.Version.OS = Windows NT 5.0
– Server.CPU.Type = Intel Pentium
– Disk.D.Size = 71,847,784,448
– Mem.PhysicalRAM = 527,433,728
– Platform.Network.1.AdapterName = Intel[R] PRO_1000 MT Server Adapter
Show Me the Stats
 When you issue a SHOW STAT command at the console, Domino dumps every
statistic it is tracking
 Every one of these statistics is in every single one of the documents in the
STATREP.NSF database
– All you need is a view to see them
What Good Are These?
 Think these stats aren’t helpful? They are!
 If you are collecting stats correctly from all your servers, you can take a pretty
detailed server inventory
– Without leaving your desk
• From servers all around the world, just by looking at the data collected in the
Monitoring Results database
This database is also known by its filename: Statrep.nsf
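To make the inventory idea concrete, here is a minimal Python sketch that turns captured SHOW STAT output into a small inventory record. It is an illustration rather than anything shipped with Domino. It assumes each statistic appears on its own line as `Name = Value`, as in the static statistics listed earlier, and that you have saved the console output as text. The function names and the choice of inventory fields are hypothetical.

```python
def parse_show_stat(text):
    """Turn 'Name = Value' lines into a dictionary of statistics."""
    stats = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator and name.strip():
            stats[name.strip()] = value.strip()
    return stats


def inventory(stats, wanted):
    """Pick out the statistics that describe the server itself."""
    return {name: stats.get(name, "(not reported)") for name in wanted}


# Sample values copied from the static statistics slide.
captured = """\
  Server.Version.Notes = Release 8.5.3
  Server.Version.OS = Windows NT 5.0
  Server.CPU.Type = Intel Pentium
  Disk.D.Size = 71,847,784,448
  Mem.PhysicalRAM = 527,433,728
"""

wanted = ["Server.Version.Notes", "Server.Version.OS", "Mem.PhysicalRAM", "Disk.C.Size"]
for name, value in inventory(parse_show_stat(captured), wanted).items():
    print(name, "->", value)
```

Run against the output from each server, this gives one row per server for a spreadsheet. A statistic the server did not report shows up as "(not reported)" instead of raising an error.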
Finding the “not useful” stats
 You might find that a statistic you need has been marked as not useful
 To see which are marked as not useful, full text index EVENTS4.NSF
 Create an advanced query checking the field useful = “No” to find them
– You might discover a statistic whose threshold would be worth using
 As you complete the form for an event generator, you’ll see the button to create a
new event handler
 When you click it, you are walked through the process of creating an event
handler
Wizard lets you choose the method of handling the event
 There are lots of methods of event handling
– Which one you choose depends a lot on your infrastructure
– We’re going to talk more about the notification methods in the next section of the presentation
 For now just remember that an event generator is fairly worthless by itself
– Unless you have an effective event handler that tells you, in its own way, what is going on with your servers
Event handlers are an exquisite gift
 They provide a way to give you a heads-up about issues provided by event
generators
 They also give you a free-form way of being alerted to anything that happens in the Domino server log and most of what happens on the Domino server console
 You can use event handlers to respond to generators and certain add-in tasks
– They are most valuable for picking out text on the console that will mean trouble if
ignored
 We’re going to focus on this type of event handling since it is less intuitive than
responding to generators or add-ins
Basics of the event handler configuration
 There are three screens to deal with
 Decide whether you want to track an event on just
a few servers or all servers
– You might want to track a particular event on mail
servers only
 Decide what triggers a notification
– We’re going for free-form, so we will select “any
event that matches a criteria”
Second set of choices for event handling
 When working with console events, select “events can be of any type”
 And “events can be of any severity”
 Then look for a particular string of text in the event message
– This can be absolutely any text that appears on the console
• We will explain why we are picking the text “full administrator access” in a moment
Final setup tab for event handling
 Lastly, we define what action will take place when the text appears
 We’ve selected email notification as the method
we will use
– But there are over a dozen others that we will
discuss in a few moments
 Note that you can control the time of day the
event handler is on the job
– I wish they did that for event generators
Why did we choose the text Full Access Administrator?
 Full access administrator is the highest level of administrative access to the server
– Here are just some of the rights available:
• Manager access, with all access privileges enabled, to all databases on the server,
regardless of the ACL settings
• Access to all documents in all databases, regardless of Reader names fields
• The ability to create agents that run in unrestricted mode with full administration
rights
• Access to any unencrypted data on the server
 The act of turning on full access administrator should not be taken lightly
– Your security model should make it almost unnecessary to ever turn it on
 Therefore, when a privileged user activates full access administration, you want to know about it to prevent some hooligan from doing shenanigans
When the privilege is turned on, it’s logged
 When an admin turns on Full Administrator
Access (FAA) it appears on the server
console
– It is grabbed by the event handler and I get
an email
 Each time the admin moves to another
server I get another email
– Until eventually I call the admin and ask why they need FAA power!
– Usually the admin has forgotten they turned it on and stops using it.
Other words you should track with event handlers
 “deleted by” – This generally means someone has deleted a database
– Usually their mail file if they have manager access
– You’ll be getting out the backup tapes in a minute
01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab
01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab
Other bad words to watch for
 Here are some other words and expressions to watch for:
Expression | Issue
An exception occurred while writing data into database | Bad news all round. You’re going to have to get to the db and run some maintenance.
Replication cannot proceed | Replication cannot proceed because cannot maintain uniform access control list on replicas. This is a result of “Enforce Consistent ACL”
RRV bucket is corrupt | RRV stands for Record Relocation Vector. It is a pointer that tells Notes where to find a specific NoteID and it is bad if it’s corrupted. You can try a fixup, but it might be borked and need a new replica.
truncated | Try fixup. Maybe. Maybe not.
Device error | Uh oh.
Database is corrupt; cannot allocate space | This one is bad too.
B-tree structure is invalid | You never want to see a b-tree error. It usually means you have to replace the database.
extremely inefficient | Agent Manager: Full text operations on database 'xyz.nsf' which is not full text indexed.
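Before you commit a phrase to an event handler, it can help to see how often it already shows up in your logs. The following Python sketch scans log lines for the expressions listed above. It is a rough offline check, not a replacement for event handlers, and it assumes you have exported the log or console text to plain lines. The matching is a case-insensitive substring test, which is an assumption about how you want to search, not a statement of how Domino matches event text. The names `WATCH_PHRASES` and `find_alerts` are invented for this example.

```python
# Phrases taken from the slides: "deleted by" plus the expressions table.
WATCH_PHRASES = [
    "deleted by",
    "An exception occurred while writing data into database",
    "Replication cannot proceed",
    "RRV bucket is corrupt",
    "truncated",
    "Device error",
    "Database is corrupt; cannot allocate space",
    "B-tree structure is invalid",
    "extremely inefficient",
]


def find_alerts(lines, phrases=WATCH_PHRASES):
    """Return (phrase, line) pairs for every watched phrase found in a line."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                hits.append((phrase, line.rstrip()))
    return hits


# The two log lines shown on the "deleted by" slide.
sample_log = [
    "01/05/2013 04:02:17 PM Opened live remote console session for Andrew M Pedisich/DomLab",
    "01/05/2013 04:04:50 PM Database ArchiveOfIncriminatingPhotos.nsf deleted by Andrew M Pedisich/DomLab",
]

for phrase, line in find_alerts(sample_log):
    print("[{}] {}".format(phrase, line))
```

A short phrase such as "truncated" will match more lines than you expect, so a dry run like this is a cheap way to gauge how noisy a handler would be.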
We’re circling back to notification methods
 Here is the panoply of notification methods
 The most widely used notification method is to send an email to an admin group
when a problem occurs
– And yet that is also very risky, since the email system itself might be the problem
The Most Important Notification Options
 There are 14 ways to be notified – these are the best
Method | Result | Comments
Log to Database | Logs the event to a database, typically STATREP.NSF, on a local server | Always record any event in STATREP.NSF for historical purposes regardless of what else you do
Mail | Mails the event to a person or to a mail-in database | Good for most events in multi-protocol environments, but as mentioned, it’s bad if the mail system goes down
Pager | Uses the mail address of an alphanumeric pager | OK, but limited value because it uses the mail system; if mail itself is down, there are issues
Paging Dr. Howard, Dr. Fine, Dr. Howard …
 Paging notification is a good choice
– But not if you are paging through a third-party phone system like Verizon or AT&T
• They generally require an email to be sent
• They have no Service Level Agreement – NONE!
 Sadly, due to budget and resource constraints, we generally see these two methods, mail and paging, used the most in production environments
The Most Important Notification Options (cont.)
 These two are the best, and there’s one more that’s not listed
Method | Result | Comments
SNMP Trap | Sends the event as an SNMP trap. Select this method only if the specified server is running the Event Interceptor task and the Domino SNMP Agent. | This is truly an ideal notification method because it does not depend on Notes protocols actually working
Forward event to Tivoli Event Console | Allows the Tivoli Enterprise Console (TEC) to receive IBM Domino events and reformat them as TEC events. The reformatted TEC event is then sent to the TEC server that you specify in the Configuration Settings document. | Check with the Tivoli team to see if it’s possible to use this in your environment
Customized Tivoli package
 As someone who develops a lot of monitoring solutions, I often have to bend the
rules and do some development (Ugh!)
– I was given an executable called postemsg.exe which was placed on the c: drive of a
Windows based server that was the central hub for monitoring servers
 With some knowledge of LotusScript I was able to craft a system to monitor servers and send the results back to the Tivoli event console
' Build the postemsg command line piece by piece
vMess1 = {C:\Windows\System32\postemsg.exe -f F:\TECAlerts\tecserver.cfg -r CRITICAL -m "} + vLongMessage + {" }
vMess2 = {hostname="} + vReportServerName + {" }
vMess3 = {sub_source="MESSAGINGLOTUS" MyNotify_supportfilter="1" MyNotify_severity="2" }
vMess4 = {MyNotify_tin="0066" MyNotify_atin="0066" MyNotify_msg="Domino mail server outage" }
vMess5 = {MyNotify_srcplatform="W" MyNotify_processreturncode="0" MyNotify_correlation="0" }
vMess6 = {MyNotify_app="DominoMail" MyNotify_env="Production" MESSAGING_LOTUS MESSAGING}
vMess = vMess1 + vMess2 + vMess3 + vMess4 + vMess5 + vMess6
' Run the command in a minimized window (window style 6)
result = Shell( vMess , 6 )
Customized Tivoli package
 In this case I developed a custom monitoring solution that fed trouble tickets into a
version of the Tivoli Event Console that was not supported by the Tivoli event
handler system
– When you have to deal with extreme monitoring capability with high reliability you
sometimes need to get in deep
– This is very effective because it uses that postemsg.exe executable on the OS level to
send the message to the TEC
– Note that the message is carefully crafted to form a large command string which sends
the ticket to Tivoli
• Check with your Tivoli team to see if you can take advantage of this method
If you’re not using DDM, you see this with each server start
01/22/2013 11:49:08 AM Warning: All Domino Domain
Monitoring probes are disabled resulting in the loss of
valuable diagnostic information. Please configure DDM
probes in events4.nsf. Assess DDM reports in ddm.nsf.
DDM is an advanced topic and is best used by new admins
 Domino Domain Monitoring (DDM) is a powerful, yet complex tool that is often
overlooked by administrators
 If you are using Domino 7 or later, you are already the proud owner of a Domino Domain Monitoring database and could already be using its powerful functionality
DDM backs up its discoveries with explanations
 DDM explains the probable cause, possible solution, and sometimes corrective
actions
– That’s right, actions that will actually correct the problem you’re experiencing
 These are stored in the EVENTS4.NSF and are configurable by you
– Let’s look for the error “ATTEMPT TO ACCESS DATABASE BY”
Looking in the view, “Event Messages by Text”
 We can find that error message in the EVENTS4.NSF
– And discover how we might change the report DDM produces
The cause, solution, and corrective action are listed
 This document has all the probable cause, possible solution and corrective action
– These are supplied by Lotus and include the code in the corrective action
Click the link to the modular corrective action
 Clicking the link will take you to the code
– This could be in formula language or LotusScript
The modular corrective action is re-usable
 At the bottom of the modular action there is a list of other error text messages that
also use this action
– That same action that was written only a single time can be used as a corrective action
multiple times
Modular Documents for cause, solution and corrective actions
 Domino 8 comes with over 1,000 modular documents
– Chances are your solutions are already there for most issues
– You can add new ones
Modular Documents were new in Domino 8
 Modular documents are a welcome addition to DDM
– But to appreciate why they are so cool, we must first go back in time to see how similar
functionality was accomplished in Release 7
 In R7 DDM, some events could have automated solutions
– These automated solutions were hard-coded into the Events documents in the
Monitoring Configuration application EVENTS4.NSF
Modular documents let you describe issues
 Modular documents let you add your own probable cause and possible solution text
– And build corrective actions with formula code, LotusScript, and agents
The re-usable modular solution saves you time and work
 You can use any of the same solutions provided by IBM for your custom solution
You can add to the solutions that will display with the error
 A custom solution of composing an email to the target user can be inserted
Changes the DDM report
 The modular document now has the “compose an e-mail” choice
It starts the email for you
 The code plugs in the user’s name and the database that was being accessed
– And it’s all done with modular documents in EVENTS4.NSF
Remember, actions are matched with events
 Match up the modular document with the event in the Monitoring Configuration
application
Changes Might Take Time
 Events and modular documents are cached
– You might find that updates to events and modular documents are not reflected in
DDM.NSF right away
– Be patient!
• If you’re not a patient person, restart the Event task to ensure
updates to Events and modular documents are reflected
in DDM.NSF immediately
Don’t touch the IBM entries
 Event documents have three categories of Probable Cause/Possible Solution and
Corrective Actions
 The first tab contains the IBM Entries
– These are, of course, provided by Lotus
• Do not modify or delete these entries
• If you want to disassociate the entry from the event, simply edit the document and uncheck the Enabled box
Your Custom Entries
 Add your own references to PC, PS, and CA on the Custom Entries tab
– The interface looks similar to the Lotus Entries tab, but you can only add up to two
Probable Cause/Possible Solution actions
• And there is no Enabled setting as with the Lotus entries
 These settings will be retained as you move forward upgrading from Domino 8.x
Role in DDM ACL that will restrict who can use actions
 Many events have corrective actions associated
with them
– Only users with the Execute CA role in the DDM ACL are able
to access the command actions and the corrective action
text and links
• This ensures that only qualified team members will be able to make the changes
Dealing with problematic servers
 Sometimes there are servers with issues that crop up
– We would like to collect statistics for analysis from these systems more frequently than the standard statistics collection interval allows
• If you try to add a second collection interval on a server you’ll get this:
Each server is allowed to collect stats with only one interval
 It makes sense that a server can only have
one collection interval
– You must create a second collection document
for another server
– And don’t forget to add the “collect” task to the
servertasks= parameter in the server’s
NOTES.INI
 Let’s look at a server that has CPU spikes
– We want to determine exactly when they are
happening by creating a chart
 First we create a statistics collection
document for a second server to take
statistics from our problem server
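The reminder above about adding the Collect task to the ServerTasks= line is easy to overlook, so here is a small Python sketch for checking a copy of a server's NOTES.INI. It is only an illustration. It assumes you have the file contents as text and that ServerTasks is a single comma-separated line. The function names and the sample task list are hypothetical and will differ on your servers.

```python
def server_tasks(notes_ini_text):
    """Return the lower-cased task names from the ServerTasks= line, if any."""
    for line in notes_ini_text.splitlines():
        key, separator, value = line.partition("=")
        if separator and key.strip().lower() == "servertasks":
            return [task.strip().lower() for task in value.split(",") if task.strip()]
    return []


def runs_collect(notes_ini_text):
    """True when the Collect task is listed in ServerTasks."""
    return "collect" in server_tasks(notes_ini_text)


# Invented NOTES.INI fragments for a collecting and a non-collecting server.
collector_ini = "ServerName=Mail2/domlab\nServerTasks=Update,Replica,Router,AMgr,AdminP,Collect,HTTP\n"
ordinary_ini = "ServerName=Mail1/domlab\nServerTasks=Update,Replica,Router,AMgr,AdminP,HTTP\n"

print("Mail2 collects:", runs_collect(collector_ini))
print("Mail1 collects:", runs_collect(ordinary_ini))
```

Only the server named in the new statistics collection document needs Collect in its task list. The problem server itself does not.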
Set the collection interval for five minutes
 In the statistics collection document set
the collection interval for 5 minutes
 By the way, do not check any filters
– They tell the collector to ignore the
statistics you have checked
 Note that statistics are being logged to a
database called ProblemServer.NSF
– This database will be used exclusively to
track CPU utilization of the Traveler task
 Please note that the data in this example
has been fictionalized for effect
– This is not actual data from a real server
– It is being used as an example of
capturing and analyzing data on a
problematic server
Create a special view that tracks CPU utilization for Traveler
 In this case it’s the Traveler CPU we want to track
 We create a custom view for the collecting database that only has the server
name, the time of collection, and the statistic called
Platform.Process.Traveler.1.PctCpuUtil
– This will be used to easily create a graph of the CPU activity
Collect the data, then copy it as a table from the custom view
 After collecting a week’s worth of data, we examine the CPU utilization
 All the data in the view is selected using Ctrl-A
– It is copied as a table
• Copying views as a table is one of my favorite features in Notes
 A Monitoring Results template is posted on my web site
– A URL to this template is included at the end of the presentation
Data has been copied to a spreadsheet
 A simple paste of the data puts it into a spreadsheet where we are ready to turn it
into a chart
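If you want the spike times as a list rather than a picture, the same pasted data can be filtered with a few lines of Python. This is a hedged sketch, not part of the demonstration. It assumes you saved the sheet as tab-separated text with the column headings "Collection Time" and "Traveler CPU". The sample rows and the 80 percent threshold are invented, in the same spirit as the fictionalized data in this example.

```python
import csv
import io


def cpu_spikes(pasted_text, threshold=80.0):
    """Return (collection time, CPU percent) for samples at or above the threshold."""
    reader = csv.DictReader(io.StringIO(pasted_text), delimiter="\t")
    spikes = []
    for row in reader:
        cpu = float(row["Traveler CPU"])
        if cpu >= threshold:
            spikes.append((row["Collection Time"], cpu))
    return spikes


# Invented sample: one row per five-minute collection interval.
pasted = (
    "Server\tCollection Time\tTraveler CPU\n"
    "Mail1/domlab\t01/04/2013 10:50:00 PM\t12.5\n"
    "Mail1/domlab\t01/04/2013 10:55:00 PM\t91.0\n"
    "Mail1/domlab\t01/04/2013 11:00:00 PM\t88.2\n"
    "Mail1/domlab\t01/04/2013 11:05:00 PM\t9.7\n"
)

for when, cpu in cpu_spikes(pasted):
    print(when, cpu)
```

The list of timestamps is handy for lining the spikes up against the server log, which a chart alone does not make easy.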
Use the tools in your spreadsheet to create a graph
 Select the columns Collection Time and
Traveler CPU
 Create a graph from the data
– In this example, a scatter chart type with
smooth lines is being used
The resulting graph
 This produces an excellent graph of the CPU utilization over a ten-day period with samples being taken at intervals of 5 minutes
– And it took less than 5 minutes to make this chart
• One adjustment was made to the x-axis formatting and the legend was removed
Demonstration
 Creating a graph of results from a custom view of collected data
Some events occur on the console, but not in the log
 Note in this example the server stops reporting at 11:04 PM
 Then at 11:27 PM it is back on line
 What happened in the interim?
Name: Mail1/domlab
Time: 01/04 11:02:05 PM
Miscellaneous Events:
01/04/2013 11:04:17 PM Pulling icl.ntf from Maill2/domlab icl.ntf
01/04/2013 11:04:31 PM Access control is set in catalog.nsf to not allow replication from BES02/domlab catalog.nsf
01/04/2013 11:04:31 PM Access control is set in mail2/domlab catalog.nsf to not allow replication from catalog.nsf
01/04/2013 11:04:33 PM Pulling ddm.nsf from Mail2/domlab ddm.nsf
01/04/2013 11:04:35 PM Pushing ddm.nsf to Mail2/domlab ddm.nsf
01/04/2013 11:04:38 PM Finished replication with server Mail2/domlab
01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP
01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)
Name: Mail1/domlab
Time: 01/04 11:27:11 PM - 01/04 11:27:47 PM
Miscellaneous Events:
01/04/2013 11:27:11 PM Recovery Manager: Restart Recovery complete. (196/1686 databases needed full/partial recovery)
01/04/2013 11:27:11 PM Informational - The DAOS catalog is not synchronized. Deletions will be postponed. Please run 'tell daosmgr resync' at the next convenient opportunity to re-synchronize.
01/04/2013 11:27:12 PM Event Monitor started
01/04/2013 11:27:12 PM Warning: All Domino Domain Monitoring probes are disabled res
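A silent stretch like the one between 11:04 PM and 11:27 PM is easy to miss when you are scrolling through a busy log. The Python sketch below looks for such gaps in exported log text. It is an illustration only. It assumes every event line starts with a timestamp in the `MM/DD/YYYY HH:MM:SS AM/PM` form shown above, and the ten-minute threshold is an arbitrary choice you would tune for a quiet or busy server. The function name `find_gaps` is invented.

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp of a log line, e.g. 01/04/2013 11:04:51 PM
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} [AP]M)")


def find_gaps(lines, min_gap=timedelta(minutes=10)):
    """Return (last event before, first event after) for each silent stretch."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # continuation lines carry no timestamp
        current = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps


# Three lines taken from the log excerpt above.
log_lines = [
    "01/04/2013 11:04:43 PM Router: Transferred 1 messages to MAIL2.domlab.COM (host MAIL02.domlabUSA.COM) via SMTP",
    "01/04/2013 11:04:51 PM Opened session for Mail2/domlab (Release 8.5.2FP1)",
    "01/04/2013 11:27:11 PM Event Monitor started",
]

for before, after in find_gaps(log_lines):
    print("No log entries between", before, "and", after)
```

Each gap it reports is a window worth checking in CONSOLE.LOG, since the console may have recorded what the server log did not.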
80
There is action in the CONSOLE.LOG
 CONSOLE.LOG and other logs are in the folder called
IBM_TECHNICAL_SUPPORT under the data folder on servers and on clients
 The CONSOLE.LOG on a server often contains data that has been seen on the
Domino server console, but not in the Domino server log
– It shows there was a Long Held Lock Dump and then a panic!
Lock(Mode=SIX* LockID(DB DB=G:\Lotus\Domino\Data\mail\web\Complaints.nsf)) Waiters countNonIntentLocks =
1 countIntentLocks = 1, queuLength = 95
[Req(Status=Granted Mode=IS Class=Manual Nest=0 Cnt=1
Tran=0 Func=N/A m\lkmgr.cpp:159 [0D64:0002-0D60])
rm_lkmgr_cpp:2070
rm_lkmgr_cpp:1306
nsfsem1_c:169
nsfsem1_c:1020
nsfsem6_c:503
Req(Status=Granted Mode=SIX Class=Manual Nest=0 Cnt=1
Tran=0 Func=N/A inplace.c:153 [099C:0165-12FC])
LkMgr END Long Held Lock Dump -----------------
01/04/2013 11:04:51 PM Opened session for Terry Mallory/domlab (Release 8.5.2FP2)
01/04/2013 11:04:51 PM Closed session for Terry Mallory/domlab Databases accessed: 1 Documents read: 0 Documents written: 0
The server process terminated abnormally with the exit status = 1. Please send this information and the
collected nsd log to IBM Support. This process will now Panic in order to start fault recovery operations.
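As a rough sketch of how you might hunt for this kind of evidence automatically, the LotusScript below reads a copy of CONSOLE.LOG line by line and prints any line containing one of the phrases seen in the excerpt above. The file path, the helper name IsTroubleLine and the choice of marker phrases are assumptions made for illustration; a live console.log may be locked by the server, so this assumes you are reading a copy.

```lotusscript
' Hypothetical helper: True when a console line contains one of the
' marker phrases taken from the excerpt above.
Function IsTroubleLine(logLine As String) As Boolean
	Dim markers(2) As String
	Dim i As Integer
	markers(0) = "Long Held Lock Dump"
	markers(1) = "terminated abnormally"
	markers(2) = "Panic"
	IsTroubleLine = False
	For i = 0 To 2
		' Compare method 5 = case-insensitive, pitch-insensitive
		If InStr(1, logLine, markers(i), 5) > 0 Then
			IsTroubleLine = True
			Exit Function
		End If
	Next
End Function

Sub Initialize
	Dim logPath As String
	Dim fileNum As Integer
	Dim logLine As String
	' Assumed location of a COPY of the console log; adjust to your server
	logPath = "G:\Lotus\Domino\Data\IBM_TECHNICAL_SUPPORT\console_copy.log"
	fileNum = FreeFile()
	Open logPath For Input As #fileNum
	Do Until EOF(fileNum)
		Line Input #fileNum, logLine
		If IsTroubleLine(logLine) Then Print logLine
	Loop
	Close #fileNum
End Sub
```

The marker list is deliberately short; you would extend it with whatever phrases matter in your own environment.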
81
Why did this happen?
 In this case there were a large number of email messages with big attachments
waiting to be processed in the MAIL.BOX files
 The server was relatively underpowered
 Plus I think the messages were part of a mass mailing sent by a CEO
– And we all know, the most visible executives have the worst time with any piece of
messaging software
82
Here’s another example of helpful Console logging
 I entered the following into the Domino
server console
 Tell traveler stat show
 That command generates hundreds of
lines of statistics and other information
 Clearly it shows on the server console.
83
Here’s another reason for Console logging
 Here’s the Domino server log, showing me furiously issuing Tell traveler stat show
to the Traveler task several times
 The log records that the command was issued, but none of its output
> tell traveler stat show
01/06/2013 12:24:49 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
01/06/2013 12:24:55 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
01/06/2013 12:24:55 PM Directory Cataloger finished processing names.nsf: Directory Catalog has no Configuration record
01/06/2013 12:25:43 PM AMgr: Start executing agent 'PullFromAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:43 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'PullFromAdmin4' in 'certreq.nsf'
01/06/2013 12:25:43 PM AMgr: 'Agent 'PullFromAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:44 PM AMgr: Start executing agent 'SubmitToAdmin4' in 'certreq.nsf' by Executive '1'
01/06/2013 12:25:44 PM AMgr: 'Admin/Servers/DomLab' is the agent signer of agent 'SubmitToAdmin4' in 'certreq.nsf'
01/06/2013 12:25:44 PM AMgr: 'Agent 'SubmitToAdmin4' in 'certreq.nsf' will run on behalf of 'Andrew M Pedisich/DomLab'
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
> tell traveler stat show
84
Check the IBM_TECHNICAL_SUPPORT folder
 CONSOLE.LOG from the IBM_TECHNICAL_SUPPORT folder on the server
 Whenever there are server issues, don’t forget to check the console.log for
evidence
01/06/2013 12:25:52 PM Remote console command issued by Andrew M Pedisich/DomLab: tell traveler stat show
tell traveler stat show
CPU.Pct.000-010 = 7
ClusterCache.Access = 1
Constrained.count = 0
Constrained.state = false
DB.Connections = 1
DB.Connections.Idle = 1
DB.Connections.Max = 7000
DCA.C.CheckAccessRights = 2
DCA.C.Count.NSFDbClose = 3
DCA.C.Count.NSFDbOpen = 3
DCA.C.Count.NSFNoteClose = 2
DCA.C.Count.NSFNoteOpen = 2
DCA.C.HTMLCreateConverter = 1
DCA.C.HTMLDestroyConverter = 1
DCA.C.ModDoc.RunCount = 1
DCA.C.ModDoc.SyncableDocs = 1
85
Console logging configuration
 To start a console log permanently on your servers, add this to the NOTES.INI
– Console_Log_Enabled = 1
 Use the following values
– 0 - Disable Console Log file logging
– 1 - Enable Console Log file logging
 You can also toggle logging to the Console Log file from the server console
– Use the start consolelog and stop consolelog commands
 Obviously this is an important feature and you’d want it to be enabled all the time
 Set a maximum size of almost 100MB for the console log using the following
parameter
– Console_Log_Max_Kbytes = 100000
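If you look after more than a handful of servers, one way to confirm the setting is in place everywhere might look like the sketch below. It assumes the show config console command is available on your release and that the signer has rights to issue remote console commands; the server names are placeholders, and the reply is printed as-is rather than parsed because its exact format is not shown in this session.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim servers(1) As String
	Dim reply As String
	Dim i As Integer
	' Placeholder server names; replace with your own
	servers(0) = "Mail1/domlab"
	servers(1) = "Mail2/domlab"
	For i = 0 To 1
		' Ask each server to report its current NOTES.INI value
		reply = session.SendConsoleCommand(servers(i), "show config Console_Log_Enabled")
		Print servers(i) & ": " & reply
	Next
End Sub
```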
86
Console Mirroring
 You can also use Console Mirroring, which is slightly different from normal
console logging
 Console log mirroring causes a new server thread to be created
– It monitors all messages written to the Console Log file and duplicates these messages
into another file
– When this new file is filled, the thread closes the mirrored file and creates a new file into
which subsequent messages are written. You can delete the closed mirrored files at
your discretion.
 Console log mirroring has three related NOTES.INI settings:
– Console_Log_Mirror=1 -- Enables the mirroring feature
– Retain_Mirror_Logs=1 -- Prevents deletion of previous mirrors when Domino starts
– Console_Log_Max_Kbytes= -- Sets the maximum size of the Console Log/mirror files
87
A little more about mirroring
 If the NOTES.INI setting Retain_Mirror_Logs=1 is not set, when the new task
starts it begins deleting previous mirror files
 Then a new file is created and assigned the name of the log with a number in it
– For example CONSOLE1.LOG is created
 When the log fills to the configured capacity it closes the current log and starts a
new one with a new number
88
Can you be an Admin/Dev person?
 When you’re an admin there are a lot of reasons to learn LotusScript
 You can write your own agents that gather statistics and monitor servers
 LotusScript lets you ask for a statistic on all of your servers, one by one, then store
it in a database and produce alerts and notifications that are more sophisticated
than native Notes monitoring
 The following are two examples of coding that you might find helpful
– If you have buddies on the Dev side of the house they might find this interesting
• Generally dev people don’t build applications that help administrators
• Their focus is on user applications
 These two snippets can give you an idea of the potential you have when dealing
with statistics and LotusScript
90
Gathering statistics using LotusScript is easy
 Here’s an agent that simply issues a Domino server console command
– Then shows you the returned value in a message box
 It’s pretty cool for 11 lines of code
Sub Initialize
Dim session As New NotesSession
Dim vServerName As String
Dim vConsoleCommand As String
Dim vConsoleReturn As String
'The console command to issue and the server to send it to
vConsoleCommand = "sho stat server.trans.total"
vServerName = "admin/domlab"
'SendConsoleCommand returns the console reply as a string
vConsoleReturn = session.SendConsoleCommand(vServerName,vConsoleCommand)
MessageBox(vConsoleReturn)
Exit Sub
End Sub
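Extending that idea to "all of your servers, one by one" could look something like the sketch below, which loops over a short list of servers and stores each raw reply in a document in the current database. The server list, the form name StatSample and the item names are all made up for illustration, and there is no error handling for a server that does not answer.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim doc As NotesDocument
	Dim servers(1) As String
	Dim vConsoleReturn As String
	Dim i As Integer
	Set db = session.CurrentDatabase
	' Placeholder server names; replace with your own
	servers(0) = "admin/domlab"
	servers(1) = "Mail1/domlab"
	For i = 0 To 1
		vConsoleReturn = session.SendConsoleCommand(servers(i), "sho stat server.trans.total")
		' Store one document per server per run (hypothetical form and items)
		Set doc = db.CreateDocument
		Call doc.ReplaceItemValue("Form", "StatSample")
		Call doc.ReplaceItemValue("ServerName", servers(i))
		Call doc.ReplaceItemValue("RawReply", vConsoleReturn)
		Call doc.ReplaceItemValue("Collected", Now)
		Call doc.Save(True, False)
	Next
End Sub
```

Keeping the raw reply alongside the server name means you can decide later how to parse it, or build a view over the documents to chart the values.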
91
The Mail.TotalPending statistic
 This stat was introduced in Release 5, and I use it all the time in monitoring
servers for mail backups
 From SPR# BSAW4HFMPY
– https://www304.ibm.com/support/docview.wss?uid=sim43d86a0d3e79e0e6785256a8500737f2b
 Added a new Mail.TotalPending statistic that shows the count of messages
pending in mail.box.
 This statistic is updated once every 5 minutes by the Server task, and therefore
does not depend on the Router task for updates.
 This provides information about total backlog of mail in the event that the router is
hung or not started, and may be useful to indicate that a mail routing problem
needs further investigation.
92
Here’s a similar code snippet that gets total pending mail
 This is from a much larger agent that runs every 5 minutes on 70 servers
 Remember, LotusScript lets you issue console commands
– Then take the results of the command and take other actions
 Our job is to parse out the number 130 from the show stat command
– Show stat mail.totalpending
 We’re grabbing the stat Mail.TotalPending, which looks like this on the console
Mail.TotalPending = 130
1 statistics found
93
Here’s the meat and potatoes
 Mail.TotalPending = 130
1 statistics found
 Then it’s being parsed out so that only the number is grabbed
– vLocStart = InStr(1,vConsoleReturn,"=",5 )+2
• Gives the position 2 characters past the equal sign, where the number starts
– vLocEnd = InStr(1,vConsoleReturn,Chr(13),5 ) - vLocStart
• Gives the length of the number: the position of the first carriage return (Chr(13)) minus the start position
– vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
• That’s the number as a string, which is then converted to a number with Val
vConsoleCommandPending = "sh stat mail.totalpending"
'let's ask the console how many messages are pending
vConsoleReturn = session.SendConsoleCommand(vServerName, vConsoleCommandPending)
vLocStart = InStr(1,vConsoleReturn,"=",5 )+2
vLocEnd = InStr(1,vConsoleReturn,Chr(13),5 ) - vLocStart
vStatStr = Mid(vConsoleReturn,vLocStart,vLocEnd)
'Print "Pending: " + Str(vMailTotalPending) + " Pending: " + vStatStr
vMailPending = Val(vStatStr)
94
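The parsing above relies on the reply having exactly the shape shown on the slide. A slightly more defensive variation, offered only as a sketch, is to wrap the same idea in a function that looks for the line end after the equal sign and returns -1 when no number is found. The function name ParseStatValue and the -1 convention are assumptions for illustration, not part of the original agent.

```lotusscript
' Hypothetical helper: returns the value from a one-statistic console
' reply such as "Mail.TotalPending = 130", or -1 if none can be found.
Function ParseStatValue(consoleReturn As String) As Double
	Dim locEq As Long
	Dim locEnd As Long
	Dim statStr As String
	ParseStatValue = -1
	locEq = InStr(1, consoleReturn, "=")
	If locEq = 0 Then Exit Function
	' Search for the end of the line only after the equal sign
	locEnd = InStr(locEq, consoleReturn, Chr(13))
	If locEnd = 0 Then locEnd = InStr(locEq, consoleReturn, Chr(10))
	If locEnd = 0 Then locEnd = Len(consoleReturn) + 1
	' Take everything between "=" and the line end, minus padding
	statStr = Trim(Mid(consoleReturn, locEq + 1, locEnd - locEq - 1))
	If IsNumeric(statStr) Then ParseStatValue = CDbl(statStr)
End Function

Sub Initialize
	Dim sample As String
	' Sample reply built by hand, so no server is needed to try it
	sample = "Mail.TotalPending = 130" & Chr(13) & Chr(10) & "1 statistics found"
	Print ParseStatValue(sample)
End Sub
```

In the agent you would pass vConsoleReturn instead of the sample string and skip any server whose result comes back as -1.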
Lotusscript and monitoring/alerting – a great pair of tools
 You get the advantage of automation with the power of monitoring and alerting
 Stop issues before they become problems
 Don’t forget, download the custom statrep Technotics Statrep 8.5.3 from
– Http://www.andypedisich.com
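To tie the automation to the alerting side, here is one possible shape for a notification step: once the agent has a pending-mail count for a server, it mails a warning when the count reaches a threshold. The sub name AlertIfBacklog, the recipient group "Domino Admins" and the wording of the message are all hypothetical; the threshold is whatever makes sense for your servers.

```lotusscript
' Hypothetical helper: mails a warning when the backlog reaches a threshold.
Sub AlertIfBacklog(serverName As String, pending As Double, threshold As Double)
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim memo As NotesDocument
	' Nothing to do while the backlog is below the threshold
	If pending < threshold Then Exit Sub
	Set db = session.CurrentDatabase
	Set memo = db.CreateDocument
	Call memo.ReplaceItemValue("Form", "Memo")
	Call memo.ReplaceItemValue("Subject", "Mail backlog on " & serverName & ": " & CStr(pending) & " pending")
	Call memo.ReplaceItemValue("Body", "Mail.TotalPending has reached " & CStr(pending) & " on " & serverName & ". Please investigate mail routing.")
	' Placeholder recipient; replace with your own group or address
	Call memo.Send(False, "Domino Admins")
End Sub
```

A call such as AlertIfBacklog "Mail1/domlab", 130, 100 would send one message; in a scheduled agent you would probably also record that an alert went out so the same backlog does not generate mail every 5 minutes.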
95
Thank you for attending our session!
 Please don’t forget to fill out your evaluations. We read them all!
 Please feel free to stop us and ask questions or just have pleasant conversations
Contact us!
[email protected]
[email protected]
http://www.technotics.com
http://www.andypedisich.com
96
Legal disclaimer
© IBM Corporation 2013. All Rights Reserved.
The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it
is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM
shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect
of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in
this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any
way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary
depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed.
Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel Centrino, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
97