Nagios
System monitoring, the easy way
What is Nagios
• Nagios watches your computers through
user-defined commands
• It can be set to inform you when a service
or host becomes unavailable
• In fact, it can inform you, the Sysadmin,
your best friend, and even run commands
to try to bring a system back up
Nagios config
• The main configuration file is “nagios.cfg” in /etc
– cfg_file=/etc/contactgroups.cfg
– cfg_file=/etc/contacts.cfg
– cfg_file=/etc/dependencies.cfg
– cfg_file=/etc/escalations.cfg
– cfg_file=/etc/hostgroups.cfg
– cfg_file=/etc/hosts.cfg
– cfg_file=/etc/services.cfg
– cfg_file=/etc/timeperiods.cfg
• These are much like #include statements,
allowing you to structure your files.
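To make the #include comparison concrete, here is a minimal sketch of how a tool could collect the cfg_file entries from nagios.cfg. It is not how Nagios itself parses the file; the function name list_cfg_files and the log_file path in the sample are assumptions made for illustration.

```python
def list_cfg_files(main_cfg_text):
    """Return the paths named by cfg_file= directives, in file order."""
    paths = []
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "cfg_file":
            paths.append(value.strip())
    return paths


# Hypothetical excerpt of a main config file, for illustration only.
sample = """\
# object definitions are split across several files
cfg_file=/etc/hosts.cfg
cfg_file=/etc/services.cfg
log_file=/var/log/nagios.log
"""

print(list_cfg_files(sample))
```

Only the cfg_file lines are picked up; every other directive in the main file is left alone.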
Nagios.cfg
• There are a number of other controls for
Nagios, set through flags in this file.
• These are beyond the scope of my
presentation
• Next, we must set up a plan for what
Nagios will monitor
Monitoring plan
• Our main server hosts various services:
– Mail
– DNS
– DHCP
• Our second server hosts:
– DNS slave
– WWW – apache
– NFS shares
Hosts.cfg
define host{
    use                     generic-host          ; Name of host template
    host_name               server1               ; name of computer
    alias                   server1.localdomain   ; canonical name
    address                 10.0.0.1              ; ip address
    check_command           check-host-alive      ; defined in commands.cfg
    max_check_attempts      10                    ; used when check fails
    notification_interval   60                    ; how long between notification events
    notification_period     24x7                  ; defined in timeperiods.cfg
    notification_options    d,u,r
}
• Note that services are not defined in this file.
• When the host check command fails, Nagios treats the host as
down and suppresses notifications for its services
Services.cfg
define service{
    use                     generic-service   ; template
    host_name               server1           ; defined in hosts.cfg
    service_description     PING
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   5
    retry_check_interval    1
    contact_groups          peoplewhocare     ; defined in contactgroups
    notification_interval   60
    notification_period     24x7
    notification_options    c,r
    check_command           check_ping!100.0,20%!500.0,60%
}
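• This pings the server: warning at 100 ms round trip or 20% packet loss,
critical at 500 ms or 60% loss
• It notifies the contact group when the check goes critical or recovers (c,r)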
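The check_command line packs the command name and its arguments together with "!" separators. Below is a rough sketch of how the two threshold arguments map to a state. The function names are made up for illustration, and using ">=" at the boundary is an assumption about how the plugin compares values.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def split_check_command(check_command):
    """Split 'name!arg1!arg2' into the command name and its argument list."""
    name, *args = check_command.split("!")
    return name, args


def parse_threshold(arg):
    """Turn '100.0,20%' into (round_trip_ms, loss_percent)."""
    rta, loss = arg.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms, loss_pct, warn_arg, crit_arg):
    """Return OK, WARNING or CRITICAL for a measured round trip and loss."""
    warn_rta, warn_loss = parse_threshold(warn_arg)
    crit_rta, crit_loss = parse_threshold(crit_arg)
    # Check the critical thresholds first so the worse state wins.
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return CRITICAL
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return WARNING
    return OK


name, args = split_check_command("check_ping!100.0,20%!500.0,60%")
print(name, args)
print(classify_ping(150.0, 0.0, *args))
```

With the thresholds from the service definition, a 150 ms round trip with no loss lands in the warning range, so with notification_options set to c,r nobody is paged for it.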
Commands
• Check commands are installed with Nagios and take
arguments in various formats.
• When one returns a failure, Nagios counts
that against max_check_attempts.
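A check command reports its result through its exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus a line of output. The sketch below runs a stand-in command and counts consecutive failures against max_check_attempts. The fake plugin and both function names are assumptions for illustration, and the counting is a simplification of what Nagios really does.

```python
import subprocess
import sys

STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin-style command; return (exit_status, first_output_line)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = proc.stdout.splitlines()
    return proc.returncode, (lines[0] if lines else "")


def attempts_until_hard(results, max_check_attempts):
    """Return the 1-based position where failures reach the limit, else None."""
    failures = 0
    for index, status in enumerate(results, start=1):
        if status == 0:
            failures = 0  # an OK result resets the count
            continue
        failures += 1
        if failures >= max_check_attempts:
            return index
    return None


# Stand-in for a real plugin: prints one line and exits with status 2.
FAKE_PLUGIN = [
    sys.executable,
    "-c",
    "import sys; print('PING CRITICAL - no reply'); sys.exit(2)",
]

status, output = run_check(FAKE_PLUGIN)
print(STATE_NAMES.get(status, "UNKNOWN"), "|", output)
print(attempts_until_hard([2, 2, 0, 2, 2, 2], max_check_attempts=3))
```

The second call shows why a single recovery matters: the OK result in the middle resets the count, so the limit of 3 is only reached on the sixth result.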
Contacts.cfg
define contact{
    contact_name                    nagios
    alias                           Nagios Admin
    service_notification_period     24x7
    host_notification_period        24x7
    service_notification_options    w,u,c,r
    host_notification_options       d,u,r
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           [email protected]
    pager                           [email protected]
}
• Note: a contact can have different notification settings than a host
• It may be a good idea to have email go to an outside address
Contactgroups.cfg
define contactgroup{
    contactgroup_name   crit-admin
    alias               critical services
    members             Root
}
define contactgroup{
    contactgroup_name   peoplewhocare
    alias               minions
    members             nagios          ; defined in contacts
}
• A host or service refers to a contactgroup, whose member contacts get
notified according to their own notification settings
Timeperiods.cfg
define timeperiod{
    timeperiod_name   workhours
    alias             "Normal" Working Hours
    monday            08:00-17:00
    tuesday           08:00-17:00
    wednesday         08:00-17:00
    thursday          08:00-17:00
    friday            08:00-17:00
}
• This allows notifications only during certain times.
• Maybe you don’t want your pager going off at night?
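As a rough illustration of what the workhours period means, the sketch below tests whether a moment falls inside it. The dictionary mirrors the definition above; the function names and the choice to treat the end time as exclusive are assumptions, not Nagios's own logic.

```python
from datetime import datetime

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# Mirrors the workhours timeperiod; days not listed have no valid range.
WORKHOURS = {
    "monday": "08:00-17:00",
    "tuesday": "08:00-17:00",
    "wednesday": "08:00-17:00",
    "thursday": "08:00-17:00",
    "friday": "08:00-17:00",
}


def to_minutes(hhmm):
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def in_timeperiod(moment, period):
    """Return True if the datetime falls inside the period's range for that day."""
    day = DAYS[moment.weekday()]  # weekday() gives 0 for Monday
    if day not in period:
        return False
    start, end = period[day].split("-")
    now = moment.hour * 60 + moment.minute
    return to_minutes(start) <= now < to_minutes(end)


print(in_timeperiod(datetime(2024, 1, 1, 9, 30), WORKHOURS))   # a Monday morning
print(in_timeperiod(datetime(2024, 1, 1, 22, 0), WORKHOURS))   # the same Monday, at night
```

A contact whose notification period is workhours would be reachable for the first moment and left alone for the second.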
Awesome tactics
• Oh noes, the service is down!
– So, try to stop and start it
– Then get a person involved
• Perhaps we have something like Snort that
should raise an alert
– A script run by Nagios can look at its
output and then raise the alert through the
usual Nagios notification mechanism
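Both tactics come down to small scripts that Nagios runs. The sketch below shows the decision an event handler might make from the service state, state type (SOFT or HARD) and attempt number that Nagios can pass to it, plus a check that turns alert lines into a plugin-style result. The function names, the restart_on_attempt parameter, and the idea that alerts arrive as a list of text lines are all assumptions for illustration.

```python
def decide_action(state, state_type, attempt, restart_on_attempt=2):
    """Pick what an event handler should do for a service state change."""
    if state != "CRITICAL":
        return "none"
    if state_type == "HARD":
        # Retries are used up: leave it to notifications to reach a person.
        return "escalate"
    if attempt >= restart_on_attempt:
        # Still in soft retries: try stopping and starting the service.
        return "restart"
    return "none"


def check_alert_lines(lines):
    """Return (exit_status, message) in plugin style for a list of alert lines."""
    alerts = [line for line in lines if line.strip()]
    if alerts:
        return 2, "CRITICAL - %d alert(s), latest: %s" % (len(alerts), alerts[-1].strip())
    return 0, "OK - no alerts"


print(decide_action("CRITICAL", "SOFT", 2))
print(decide_action("CRITICAL", "HARD", 3))
print(check_alert_lines(["suspicious scan from 10.0.0.99\n"]))
```

Restarting during the soft retries gives the service a chance to come back before anyone is notified; once the state goes hard, the contacts take over.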
Thus ends the Nagios Brief
• Everyone go back to their stuff which is
not paying attention