Distributed Monitoring
NSClient++:
Distributed Monitoring
Adventures into the unknown
Disclaimer!
These slides represent the work and opinions
of the author and do not constitute official
positions of any organization sponsoring the
author’s work
This material has not been peer reviewed and
is presented here as-is with the permission of
the author.
The author assumes no liability for any
content or opinion expressed in this
presentation and or use of content herein.
My Background
Developer (not system manager)
◦ Not working with Nagios
Accidentally ended up in our NOC
◦ Hated BB so we migrated to Nagios
2003: The birth of NSClient++
◦ NSClient/NRPE_NT was not to my liking
2004: The open source of NSClient++
◦ “just for fun”
2007: The rebirth of NSClient++
◦ Got a lot of emails and hits on the webpage
2011: The Present
◦ 0.3.9 out last May
◦ 0.4.0 RC out
Agenda
About
◦ NSClient++
Distributed monitoring
◦ About
◦ Concepts
◦ Protocols
Distributed monitoring with NSClient++
◦ Configuration Changes
◦ Configuration Concepts
◦ Scenarios
Q/A
NSClient++
Quick Introduction
About NSClient++
Internals:
◦ C++
◦ Around 75.000 lines of code
◦ Actively developed (unfortunately only by me)
◦ Modularized design (use what you need)
Runs on:
◦ Windows: NT4, w2k, XP, w2k3, Vista, w2k8, X64, X86 …
◦ Unix: Linux/Debian (probably many/most others as well)
Current Version:
◦ 0.3.9 with 0.4.0 in beta
Most features require NRPE or NSCA (or NSCP)
Documentation online (WIKI)
◦ http://nsclient.org
About NSClient++ (cont.)
Not supported by a commercial entity
◦ Donations welcome
◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
◦ Impossible to estimate any figures
Please, Help out!
◦ Add documentation
◦ Report problems
◦ Come with ideas, thoughts, etc…
◦ Tell me what sucks!
Why should you use NSClient++
NSClient++
◦ Freedom!
Custom scripts
Decentralized or centralized
Active or Passive
Can monitor “anything” (including your application)
Can perform “tasks” (fix your problems)
Other options:
◦ SNMP
Generally complex to use and limited on “standard” hardware
◦ pNSClient/NRPE_NT/OpMonAgent/*
Old, outdated and usually limited functionality
◦ “Agentless” WMI
Limited functionality
Enforces centralized and active monitoring
But...
◦ I am biased, so you might not want to take my word for it...
Thank you!
Using NSClient++ (0.4.0)
New command line syntax!
◦ nscp <context> [<options>]
◦ nscp service --start
◦ nscp help
Testing
◦ nscp test
Configuration:
◦ nscp settings --help
◦ nscp settings --migrate-to ini
◦ nscp settings --set …
Run scripts:
◦ nscp python --exec run --script test.py
◦ nscp nrpe --query command --host 192.168.0.1
Roadmap (rough)
0.3.9
• Last 0.3.x
0.4.0
• Core switch
• Linux support
• Distributed Monitoring (v1)
0.4.1
• Monitoring Kits
• Bugfixes
0.4.2
• New Windows check subsystem
• True passive checks
0.4.3
• Distributed Monitoring (v2)
• Bugfixes
What’s new 0.4.0
Brand new core based upon libraries
◦ Things should *work* not just “work”
◦ Even more modular and extensible
Unix support
◦ Both as a client and server
New settings subsystem
◦ Registry, improved ini support, http, etc
New protocol
◦ NSCP (HTTP(s), MQ, Native)
Distributed monitoring
◦ Built for distributed monitoring
Python scripting
◦ Allows (for me) unit testing
Updated installer
◦ Wix 3.5, more customizable
Many many more things
Distributed Monitoring
Introduction
Distributed monitoring
What is it?
◦ Passive checks?
◦ Clustered Nagios?
◦ Nagios front ends?
◦ Distributed Nagios?
◦ Distributed Check “thingies”?
◦ …
For me:
◦ The ability to distribute check metrics across the network!
So…uhmm…
Should be simple right?
Internet
Distributed Monitoring
Concepts
Requirements
Support
◦ All paradigms
◦ Any size payload
◦ Multiple commands
◦ Multiple arguments
Security
◦ Encryption
◦ Strong authentication
Extensible
◦ No transport restrictions
◦ Allow customization
Firewall friendly (HTTP?)
…
3 main paradigms
Query
◦ Ask the status of something
◦ (Sometimes called Active)
Notification
◦ Send a notification to someone
◦ (sometimes called Passive, Message or Channel)
Command
◦ Tell someone what to do
But also:
◦ Configuration
◦ Installation
◦ Upgrade
◦ Push/pull file (scripts)
Query
Modeled on NRPE
The normal (active) scenario in Nagios/Icinga
Are you ok?
◦ Yes / no (or, OK, WARN, CRIT, UNKNOWN)
[Diagram: Query. check_nrpe sends a request over the network to NRPEServer inside NSClient++, which forwards the request to CheckEventlog]
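The yes/no answer of a query can be sketched as below, assuming the usual Nagios plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. The function `check_threshold` and its parameters are invented for illustration and are not part of NSClient++.

```python
from enum import IntEnum


class Status(IntEnum):
    """Return codes following the Nagios plugin convention."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def check_threshold(value, warn, crit):
    """Hypothetical check: compare one metric against warn/crit limits."""
    if value is None:
        # No measurement available, so the state cannot be decided
        return Status.UNKNOWN, "no data"
    if value >= crit:
        return Status.CRITICAL, "value %s >= %s" % (value, crit)
    if value >= warn:
        return Status.WARNING, "value %s >= %s" % (value, warn)
    return Status.OK, "value %s" % value


if __name__ == "__main__":
    status, message = check_threshold(85, warn=80, crit=90)
    print("%s: %s" % (status.name, message))
```

The caller asks, the check answers with a status and a short message. That is all the Query paradigm needs.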
Submission
Modeled on NSCA
The passive scenario in Nagios/Icinga
I am (not) ok!
◦ (or, OK, WARN, CRIT, UNKNOWN)
[Diagram: Submission. Elements shown: send_nsca, Network, NSCAServer and channel on the receiving side; CheckEventlog, the nsca channel, NSCAClient and Network on the sending side, all inside NSClient++]
Command
Event handlers (but more)
Normally implemented via NRPE (ish)
Restart service X, run script Y
Technically similar to Query
[Diagram: Command. check_nrpe sends a request over the network to NRPEServer inside NSClient++, which forwards the request to CheckEventlog]
Distributed Monitoring
Protocols
Distributed via NRPE?
Does not support:
◦ Passive
◦ Encryption (for real)
◦ Authentication
◦ Big payloads
◦ Multiple commands
◦ Firewall friendly protocol (i.e. HTTP)
◦ …
But we can still use it
◦ Inside the network
◦ Where big payloads are not required
◦ When we only need active monitoring
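Why big payloads are a problem can be sketched as follows. This assumes the commonly described NRPE v2 packet layout (version, type, CRC32, result code, a fixed 1024-byte buffer and two bytes of padding); that layout is an assumption taken from outside these slides, and `build_query` is a made-up helper name. Real traffic is normally wrapped in SSL as well, which is left out here.

```python
import struct
import zlib

NRPE_VERSION = 2       # assumed protocol version field
QUERY_PACKET = 1       # assumed packet type for a request
MAX_PAYLOAD = 1024     # fixed buffer size, matches the table in this talk

# version, type, crc32, result code, payload buffer, 2 bytes of padding
PACKET_FORMAT = "!hhIh%ds2s" % MAX_PAYLOAD


def build_query(command):
    """Pack a command string into one fixed-size query packet."""
    payload = command.encode("ascii")
    # The buffer must keep room for a terminating NUL byte
    if len(payload) >= MAX_PAYLOAD:
        raise ValueError("command does not fit in %d bytes" % MAX_PAYLOAD)

    def pack(crc):
        # struct pads the payload and the trailing padding with NUL bytes
        return struct.pack(PACKET_FORMAT, NRPE_VERSION, QUERY_PACKET,
                           crc, 0, payload, b"")

    # The checksum is computed over the packet with the CRC field zeroed
    crc = zlib.crc32(pack(0)) & 0xFFFFFFFF
    return pack(crc)


if __name__ == "__main__":
    print(len(build_query("check_ok")))
```

Every packet has the same size whatever the command, so anything longer than the buffer simply cannot be sent.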
Distributed via NSCA?
Does not support:
◦ Active
◦ Big payloads
◦ Firewall friendly protocol (i.e. HTTP)
◦ …
But we can still use it
◦ Inside the network.
◦ Where big payloads are not required
◦ When we only need passive monitoring
Distributed via NRDP?
Does not support
◦ Active
◦ Strong authentication
◦ …
But we can still use it
◦ When we only need passive monitoring
Distributed via SSH?
Does not support:
◦ Passive (?)
◦ Not very Windows friendly
◦ Not very firewall friendly
◦ Cumbersome to manage certificates
◦ …
But we can still use it
◦ On our *nix machines
Distributed via NSCP?
Does not support:
◦ Firewall friendly protocol (ie. HTTP)
Will come in next major release
◦ Experimental
But we can still use it
◦ When we want to play around
◦ Inside the network (currently)
Other options
Distributed NSCP
◦ No encryption support
◦ Not firewall friendly protocol (ie. HTTP)
Syslog
◦ Not really good for metrics/status
◦ No support for active checks
◦ Not firewall friendly protocol (ie. HTTP)
SMTP
◦ Not practical
◦ No support for active checks
Will come in next major release
◦ Not real time
◦ Firewall friendly?
Summary of protocols
Protocol
Paradigm
Encryption
Auth
Payload
NSClient
NRPE
NSCA
NSCP
D-NSCP
Syslog
SMTP
check_mk
NRDP
Active
Active
Passive
All
MQ
?
?
Active
Passive
No
No
Yes
Yes
No
No
Yes
?
Yes
Yes
No
Yes
No
1024
512
∞
∞
1024
∞
∞
Yes
Yes
No
?
No
Yes
∞
Multiple arguments
Multiple commands
Yes
Yes
Yes
Yes
Yes
N/A
Yes
No
Yes
No
No
Yes
Yes
Yes
N/A
Yes
Yes
Yes
HTTP
No
No
No
Yes
No
No
No
No
Yes
Distributed monitoring
with NSClient++
Configuration changes
NRPE
Main configuration change
◦ Allow multiple modules
◦ Allow more configuration
◦ No “NRPE Handler” support
Replaced by CheckExternalScripts
Compatible
◦ Except for NRPE Handlers
Upgradable
◦ nscp settings --migrate-to ini
Old configuration: NRPE
[modules]
NRPEListener.dll
CheckExternalScripts.dll
[NRPE]
port=5666
allow_arguments=0
allow_nasty_meta_chars=0
allowed_hosts=192.168.0.1
[External Script]
allow_arguments=0
allow_nasty_meta_chars=0
[External Scripts]
check_es_ok=scripts\ok.bat
[External Alias]
alias_cpu=checkCPU warn=80 time=1m
New configuration: NRPE
[/modules]
NRPEListener=
CheckExternalScripts=
[/settings/NRPE/server]
port=5666
allow arguments=0
allow nasty characters=0
use ssl=true
allowed hosts=192.168.0.1
certificate=${certificate-path}/nrpe_dh_512.pem
[/settings/external scripts]
allow arguments=0
allow nasty characters=0
[/settings/external scripts/scripts]
check_es_ok=scripts\ok.bat
[/settings/external scripts/alias]
alias_cpu=checkCPU warn=80 time=1m
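The renames between the old and new NRPE configuration can be sketched as a simple mapping. The two tables below are read off the two slides only; this is not the logic behind `nscp settings --migrate-to ini`, and `migrate` is a made-up helper that works on plain dictionaries.

```python
# Section renames as they appear on the old/new slides (illustrative only)
SECTION_MAP = {
    "NRPE": "/settings/NRPE/server",
    "External Script": "/settings/external scripts",
    "External Scripts": "/settings/external scripts/scripts",
    "External Alias": "/settings/external scripts/alias",
}

# Key renames as they appear on the old/new slides (illustrative only)
KEY_MAP = {
    "allow_arguments": "allow arguments",
    "allow_nasty_meta_chars": "allow nasty characters",
    "allowed_hosts": "allowed hosts",
}


def migrate(old):
    """Translate {section: {key: value}} from old names to new names."""
    new = {}
    for section, values in old.items():
        # Unknown sections and keys are passed through unchanged
        target = new.setdefault(SECTION_MAP.get(section, section), {})
        for key, value in values.items():
            target[KEY_MAP.get(key, key)] = value
    return new


if __name__ == "__main__":
    old = {
        "NRPE": {"port": "5666", "allow_arguments": "0"},
        "External Alias": {"alias_cpu": "checkCPU warn=80 time=1m"},
    }
    for section, values in migrate(old).items():
        print("[%s]" % section)
        for key, value in values.items():
            print("%s=%s" % (key, value))
```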
NSCA
Main changes
◦ Scheduling is a separate module
Main Configuration changes
◦ Schedules are much more configurable
◦ Supports multiple NSCA servers
Compatible
◦ Should be
Upgradable
◦ nscp settings --migrate-to ini
Old configuration: NSCA
[modules]
NSCAAgent.dll
CheckExternalScripts.dll
[NSCA Agent]
interval=5
encryption_method=14
password=foobar
nsca_host=192.168.0.1
nsca_port=5667
[NSCA Commands]
cpu=checkCPU warn=80 time=1m
host_check=check_ok
New configuration: NSCA
[/modules]
NSCAClient=
Scheduler=
[/settings/NSCA/client/targets/default]
host=192.168.0.1
port=5667
password=secret
encryption=none
time offset=-1h
[/settings/scheduler/schedules/default]
channel=NSCA
interval=5s
[/settings/scheduler/schedules]
cpu=checkCPU warn=80 time=1m
host_check=check_ok
Distributed monitoring
with NSClient++
New configuration concepts
Targets
A target defines a “target” host
There is usually a “default” target
There can be any number of targets
Targets can be either local or global
Targets consist of:
◦ host
◦ port
◦ address (=host:port)
◦ alias
◦ parent
◦ And any arbitrary strings required
Targets (sample)
[/settings/NRPE/client/targets]
test=192.168.0.1:5666
[/settings/NRPE/client/targets/foobar]
address=192.168.0.1:5666
ssl=false
[/targets]
foobar=192.168.0.1:5666
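How an address splits into host and port can be sketched with the Python standard library. This is only an illustration of the address (=host:port) idea; NSClient++ has its own parser, `parse_target` is a made-up name, and the default port 5666 is simply the NRPE port used on these slides.

```python
from urllib.parse import urlsplit


def parse_target(address, default_port=5666):
    """Split 'host:port' or 'scheme://host:port' into its parts."""
    if "://" not in address:
        # Prefix '//' so urlsplit treats the text as a network location
        address = "//" + address
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme or None,
        "host": parts.hostname,
        "port": parts.port or default_port,
    }


if __name__ == "__main__":
    print(parse_target("192.168.0.1:5666"))
    print(parse_target("nrpe://192.168.0.1:5666"))
```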
Command Handlers
(Command) handlers define commands
A list of command handlers
◦ <name> = <command line>
Syntax is the “same” (as for nscp.exe)
◦ In the future you will be able to configure these more
[/settings/NRPE/client/handlers]
test=query --host 192.168.0.1 --command $ARG1$
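The `$ARG1$` placeholder in the handler above can be pictured as plain text substitution. A minimal sketch, assuming Nagios-style numbering where `$ARG1$` is the first argument; `expand_handler` is a made-up name and the real expansion inside NSClient++ may differ.

```python
import re


def expand_handler(template, args):
    """Replace $ARG1$, $ARG2$, ... with the matching argument."""
    def replace(match):
        index = int(match.group(1)) - 1
        # Placeholders without a matching argument become empty strings
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", replace, template)


if __name__ == "__main__":
    handler = "query --host 192.168.0.1 --command $ARG1$"
    print(expand_handler(handler, ["check_ok"]))
```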
Distributed monitoring
with NSClient++
Scenarios
NRPE to NSCA proxy
Purpose
◦ Set up checking by proxy
Required components
◦ Scheduler
Running our checks
◦ NRPEClient
Execute checks
◦ NSCAClient
Forward results
Experimentalness
◦ Low
The Concept
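[Diagram: inside NSClient++, the Scheduler runs a check through NRPEClient, which forwards the request over the network via nrpe; the result is placed on the nsca channel and NSCAClient sends it over the network via nsca]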
Config: Schedule commands
[/modules]
Scheduler=1
[/settings/scheduler/schedules/sample]
channel=NSCA
alias=system_x_ok
command=check_r_ok x
interval=5s
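Values such as `interval=5s` (and `time=1m` or `time offset=-1h` on other slides) carry a unit suffix. A minimal sketch of reading them as seconds, assuming only the s, m and h suffixes seen in this talk; `parse_duration` is a made-up helper and not the NSClient++ parser.

```python
import re

# Only the suffixes that appear on the slides are handled
UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text):
    """Convert '5s', '1m' or '-1h' to a number of seconds."""
    match = re.fullmatch(r"(-?\d+)([smh])", text.strip())
    if match is None:
        raise ValueError("unsupported duration: %r" % text)
    return int(match.group(1)) * UNITS[match.group(2)]


if __name__ == "__main__":
    for value in ("5s", "1m", "-1h"):
        print(value, parse_duration(value))
```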
Config: Execute Commands
[/modules]
NRPEClient=1
[/settings/nrpe/client/targets]
x=nrpe://192.168.0.1:5666
[/settings/nrpe/client/handlers]
check_r_ok=query --command check_ok --target $ARG1$
Config: Forward results
[/modules]
NSCAClient=1
[/settings/nsca/client/targets/default]
host=192.168.0.1
password=secret
encryption=none
time offset=-1h
Testing
nscp test
Demo?
Eventlog to syslog forwarder
Purpose
◦ Forward eventlog errors to a syslog server
Required components
◦ CheckEventlog
Running in ”active mode”
◦ SyslogClient
Set up to forward notifications
Experimentalness
◦ Medium
The Concept
[Diagram: inside NSClient++, CheckEventlog publishes events on the syslog channel and SyslogClient forwards them over the network via syslog]
Config: Listening for events
[/modules]
CheckEventLog=1
[/settings/eventlog/real-time]
enabled=true
destination=syslog
filter=type NOT IN ('success', 'info', 'auditSuccess')
log=application,system
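What the filter and log settings above select can be sketched in plain Python. This only mirrors one reading of the two config lines (events from the listed logs whose type is not in the excluded list); the event dictionaries and `should_forward` are invented for illustration, and NSClient++ evaluates its own filter language.

```python
# Values copied from the filter= and log= lines above
EXCLUDED_TYPES = {"success", "info", "auditSuccess"}
LOGS = {"application", "system"}


def should_forward(event):
    """Return True when an event passes the log and type conditions."""
    return event["log"] in LOGS and event["type"] not in EXCLUDED_TYPES


if __name__ == "__main__":
    events = [
        {"log": "application", "type": "error", "message": "disk failure"},
        {"log": "application", "type": "info", "message": "service started"},
        {"log": "security", "type": "error", "message": "not a watched log"},
    ]
    for event in events:
        if should_forward(event):
            print(event["message"])
```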
Config: Listening for events (Short)
[/modules]
CheckEventLog=1
[/settings/eventlog/real-time]
enabled=true
destination=syslog
Config: Forward to syslog
[/modules]
SyslogClient=1
[/settings/syslog/client/targets/default]
facility=kernel
tag template=NSClient
message template=%message%
host=192.168.0.1
Config: Forward to syslog
(short)
[/modules]
SyslogClient=1
[/settings/syslog/client/targets]
default=192.168.0.1
Testing
nscp eventlog --exec insert
--source SQLBrowser
--id 3
--type warning
--event-argument a --event-argument b
--facility 78 --severity error
Qualifiers: 49230 = 1100 0000 0100 1110 (error + 78)
<Event>
<System>
<Provider Name="SQLBrowser" />
<EventID Qualifiers="49230">3</EventID>
<Level>3</Level>
<!-- ... -->
</System>
<EventData>
<Data>a</Data>
<Data>b</Data>
</EventData>
</Event>
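The Qualifiers value 49230 can be taken apart with a few bit operations. A minimal sketch, assuming the standard Windows event identifier layout for the upper 16 bits (2 bits severity, 1 customer bit, 1 reserved bit, 12 bits facility); the function names are made up for this example.

```python
# Severity values in the top two bits of the qualifiers word
SEVERITIES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_qualifiers(qualifiers):
    """Split the 16-bit Qualifiers value into severity and facility."""
    severity = (qualifiers >> 14) & 0x3
    customer = (qualifiers >> 13) & 0x1
    facility = qualifiers & 0x0FFF
    return SEVERITIES[severity], bool(customer), facility


def encode_qualifiers(severity, facility):
    """Build a Qualifiers value from a numeric severity and a facility."""
    return ((severity & 0x3) << 14) | (facility & 0x0FFF)


if __name__ == "__main__":
    print(decode_qualifiers(49230))
    print(format(encode_qualifiers(3, 78), "016b"))
```

With severity error (binary 11) in the top bits and facility 78 (binary 0100 1110) in the low bits, the result is the 1100 0000 0100 1110 pattern shown on the slide.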
Demo?
Scripting
Purpose
◦ Automatically add Nagios configuration
Required components
◦ PythonScript
Running the script
◦ NSCAServer
Responds to submissions
◦ NSCAClient
Forwards modified submissions
Experimentalness
◦ High
The Concept
[Diagram: send_nsca submits over the network to NSCAServer inside NSClient++; the result travels on channel 1 to PythonScript, then on channel 2 to NSCAClient, which sends it over the network via nsca]
Configuration: Receive Results
[/modules]
NSCAServer=1
[/settings/nsca/server]
port=5668
inbox=channel_1
encryption=none
password=secret
allowed hosts=192.168.0.1,127.0.0.1
Config: Forward results
[/modules]
NSCAClient=1
[/settings/nsca/client]
channel=channel_2
[/settings/nsca/client/targets/default]
host=192.168.0.1
password=secret
encryption=none
time offset=-1h
Configuration: Setup Python
[/modules]
PythonScript=1
[/settings/python/scripts]
f=forward.py
Writing a Script
from NSCP import Registry, Core, log, log_error
import unicodedata

core = Core.get()

def process(channel, source, command, code, message, perf):
    # Reduce the message to plain ASCII before forwarding it
    message = unicodedata.normalize('NFKD', message).encode('ascii', 'ignore')
    # Resubmit the result on channel_2 with a marker prefix
    core.simple_submit('channel_2', command, code,
                       'PythonEnhanced: %s' % message, perf)

def init(plugin_id, plugin_alias, script_alias):
    # Subscribe to everything arriving on channel_1
    reg = Registry.get(plugin_id)
    reg.simple_subscription('channel_1', process)

def shutdown():
    pass
Writing a Script
from NSCP import Registry, Core

core = Core.get()

def process(channel, src, cmd, code, msg, perf):
    core.simple_submit('channel_2', cmd, code,
                       'PythonEnhanced: %s' % msg, perf)

def init(plugin_id, plugin_alias, script_alias):
    reg = Registry.get(plugin_id)
    reg.simple_subscription('channel_1', process)

def shutdown():
    pass
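The channel flow of the script can be tried outside NSClient++ with a small stand-in. A minimal sketch: `FakeBus` is invented here and only borrows the method names `simple_subscription` and `simple_submit` from the slide. It is not the NSCP module and makes no claim about how the real one dispatches.

```python
class FakeBus(object):
    """Stand-in for the Registry/Core pair, for offline experiments only."""

    def __init__(self):
        self.subscribers = {}
        self.submitted = []

    def simple_subscription(self, channel, callback):
        # Remember who wants to hear about a channel
        self.subscribers.setdefault(channel, []).append(callback)

    def simple_submit(self, channel, command, code, message, perf):
        # Record the submission, then hand it to every subscriber
        self.submitted.append((channel, command, code, message, perf))
        for callback in self.subscribers.get(channel, []):
            callback(channel, "test", command, code, message, perf)


bus = FakeBus()


def process(channel, source, command, code, message, perf):
    # Same idea as the script above: tag the message, pass it on
    bus.simple_submit("channel_2", command, code,
                      "PythonEnhanced: %s" % message, perf)


bus.simple_subscription("channel_1", process)
bus.simple_submit("channel_1", "check_ok", 0, "Hello World", "")

for entry in bus.submitted:
    print(entry)
```

A submission on channel_1 triggers `process`, which produces a second submission on channel_2, the channel the NSCAClient is configured to listen on.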
Testing
nscp nsca
--exec submit
--message "Hello World"
--host 192.168.0.1
--password secret --encryption none
Distributed monitoring
with NSClient++
Summary
My vision for the future…
Should be simple right?
Distributed Internet
Monitoring Network!
Questions?
Thank You!
[email protected]
http://www.
.com/in/mickem
http://blog.medin.name/
http://nsclient.org
facebook.com/nsclient
http://nsclient.org/nscp/conferances/osmc/2011/