
New features in PATROL version 3
Michael Jung (TU-Berlin), Waltraut Niepraschk (DESY)
System overview
Patrol actions and resources control
Configuration
WWW Interface
Patrol usage
August 28, 1998
New features in PATROL 3.0
Patrol 3.0



based on SLAC patrol by C. Boheim
modified and extended by M. Jung
WWW interface in Javascript available
supported architectures
AIX
IRIX
SunOS
Linux
Solaris
DEC-Unix
HP-UX
easy adaptation to new architectures by specifying patterns for the
output of certain system commands
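The pattern-table idea could be sketched in Python roughly as follows. It assumes one regular expression per architecture that pulls the percentage used and the mount point out of a line of `df` output. The table name `DF_PATTERNS`, the function `parse_df` and both patterns are illustrative assumptions, not patrol's actual (Perl) code. Under this scheme, supporting a new architecture means adding one table entry.

```python
import re

# Hypothetical per-architecture patterns (not patrol's real table): each one
# captures (percent_used, mount_point) from a single line of `df` output.
DF_PATTERNS = {
    # Linux `df -k`: Filesystem 1K-blocks Used Available Use% Mounted on
    "linux": re.compile(r"^\S+\s+\d+\s+\d+\s+\d+\s+(\d+)%\s+(\S+)$"),
    # AIX `df`: Filesystem 512-blocks Free %Used Iused %Iused Mounted on
    "aix": re.compile(r"^\S+\s+\d+\s+\d+\s+(\d+)%\s+\d+\s+\d+%\s+(\S+)$"),
}


def parse_df(ostype, output):
    """Return {mount_point: percent_used} parsed with the ostype's pattern."""
    pattern = DF_PATTERNS[ostype]
    usage = {}
    for line in output.splitlines():
        match = pattern.match(line.strip())
        if match:  # header lines and unrecognized lines are skipped
            percent, mount = match.groups()
            usage[mount] = int(percent)
    return usage


# Usage: a Linux sample with a header line and one file system.
sample = (
    "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    "/dev/sda1 1000 950 50 95% /usr1\n"
)
print(parse_df("linux", sample))  # {'/usr1': 95}
```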
Patrol resources control
Obtaining information on
processes and daemons (ps)
file systems (df)
file sizes (ls)
services and ports (netstat)
hosts (uptime)
return codes or timing (timeout) of arbitrary commands
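A Python sketch of collecting the return code or a timeout for an arbitrary command, using the standard `subprocess` module. The function name `check_command` and its return convention are assumptions. Patrol itself is written in Perl and may do this differently.

```python
import subprocess


def check_command(argv, timeout_s):
    """Run argv; return ("exit", code) or ("timeout", None).

    Hypothetical helper: ("exit", code) is what an exit-code check (CC)
    would test, ("timeout", None) what a time-out check (CT) would report.
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point
        return ("timeout", None)
    return ("exit", result.returncode)


print(check_command(["sh", "-c", "exit 3"], 5))  # ('exit', 3)
print(check_command(["sleep", "5"], 0.5))        # ('timeout', None)
```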
Resource checks are based on the value and on the change of value
(as compared to the last run of patrol)
Tests
on limits: value, value+delta, or (val1, val2, val3+delta)
with relops (>, <, =, !=, =~/regexp/)
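Here is one plausible reading of "value" and "value+delta" limits, sketched in Python. Under this reading, a test fires when the current value satisfies the relop and, if a delta is given, has also grown by at least that delta since the previous run. The JSON state file, all function names and this interpretation of delta are assumptions; the slides do not specify them.

```python
import json
import os
import re


def load_previous(path):
    """Values from the last run, or {} on the first run (hypothetical JSON state file)."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_current(path, values):
    """Store this run's values for the next run to compare against."""
    with open(path, "w") as f:
        json.dump(values, f)


def compare(value, relop, limit):
    """Apply one of the relops >, <, =, !=, =~ (regular-expression match)."""
    if relop == ">":
        return value > limit
    if relop == "<":
        return value < limit
    if relop == "=":
        return value == limit
    if relop == "!=":
        return value != limit
    if relop == "=~":
        return re.search(limit, str(value)) is not None
    raise ValueError(f"unknown relop {relop!r}")


def check(value, previous, relop, limit, min_delta=None):
    """Assumed semantics: the value must satisfy relop/limit and, if
    min_delta is given, must also have grown by at least min_delta since
    the previous run (no previous value means no delta can be shown)."""
    if not compare(value, relop, limit):
        return False
    if min_delta is None:
        return True
    return previous is not None and value - previous >= min_delta


# /usr1 at 96%: over a 95% limit and up 6 points since the last run at 90%.
print(check(96, 90, ">", 95, min_delta=5))  # True
print(check(96, 94, ">", 95, min_delta=5))  # False: grew by only 2
```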
Patrol actions
If tests fulfil specified criteria, perform actions:
mail (to users, to admins)
kill (processes)
nice (processes)
restart (daemons)
write (to syslog)
execute system commands
execute snippets of perl code (access to patrol internal variables)
Mail texts, system commands and perl snippets are defined in blocks for
easier reference
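The following Python sketch shows one way named text blocks and action dispatch might fit together. The block names `MD` and `MF1` come from the configuration examples in this talk. Their texts, the `$host`/`$resource`/`$value` placeholders and the `perform` function are hypothetical. Only mail and syslog-style writes are modeled, and they are returned as a list of effects rather than executed, so nothing is actually sent.

```python
from string import Template

# Hypothetical mail-text blocks; the names match the configuration examples,
# the texts and $placeholders are invented for this sketch.
BLOCKS = {
    "MD": "Daemon $resource was not running on $host and has been restarted.",
    "MF1": "File system $resource on $host is $value% full.",
}


def render(block_name, context):
    """Fill a named block with values from the current test."""
    return Template(BLOCKS[block_name]).safe_substitute(context)


def perform(actions, context):
    """Turn (name, args) actions into a list of effects instead of executing
    them; kill, nice, restart, commands and perl snippets are left out."""
    effects = []
    for name, args in actions:
        if name == "mail":
            *recipients, block = args  # last argument names the text block
            effects.append(("mail", recipients, render(block, context)))
        elif name == "write":
            effects.append(("syslog", None, render(args[0], context)))
        else:
            raise NotImplementedError(f"action {name!r} not modeled here")
    return effects


context = {"host": "hydra", "resource": "/usr1", "value": 96}
print(perform([("mail", ["admin", "MF1"])], context))
# [('mail', ['admin'], 'File system /usr1 on hydra is 96% full.')]
```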
Patrol configuration


patrol actions are defined as rules in a configuration file
rules act on targets (identified by hostname, ostype, netgroup, ...)
rule format: rule_type target resource condition action
rule types:
F    file system              HL   host, load limit
D    daemon                   HT   host, uptime limit
SP   system port              HU   host, user limit
PC   process, cpu limit       CC   command, exit code
PM   process, memory limit    CT   command, time out
PT   process, time limit      W    file size
PN   process, number limit    SP   service port
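Given the rule format `rule_type target resource condition action`, a single rule line could be split up as in the following Python sketch. It assumes that resource and condition may each be omitted (as in the daemon and load examples of this talk), that a condition starts with one of the relops, and that actions are `name(args)` calls. `=~/regexp/` conditions containing spaces or parentheses are not handled. `parse_rule` and `Rule` are hypothetical names, not patrol's own parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

RULE_TYPES = {"F", "D", "SP", "PC", "PM", "PT", "PN",
              "HL", "HT", "HU", "CC", "CT", "W"}
CONDITION = re.compile(r"^(!=|=~|>|<|=)")  # a condition starts with a relop
ACTION = re.compile(r"(\w+)\(([^)]*)\)")   # name(arg, arg, ...)


@dataclass
class Rule:
    rule_type: str
    target: str
    resource: Optional[str]
    condition: Optional[str]
    actions: List[Tuple[str, List[str]]]


def parse_rule(line):
    """Split one rule line (hypothetical parser, not patrol's own)."""
    start = re.search(r"\s(\w+\()", line)  # actions begin at the first name(
    if start is None:
        raise ValueError(f"no action in rule: {line!r}")
    head, action_text = line[:start.start()], line[start.start(1):]
    rule_type, target, *rest = head.split()
    if rule_type not in RULE_TYPES:
        raise ValueError(f"unknown rule type {rule_type!r}")
    resource = condition = None
    for token in rest:
        if CONDITION.match(token):
            condition = token
        else:
            resource = token
    actions = [
        (name, [a.strip().strip('"') for a in args.split(",") if a.strip()])
        for name, args in ACTION.findall(action_text)
    ]
    return Rule(rule_type, target, resource, condition, actions)


print(parse_rule("PC [irix] !{cod_} >50% nice(8), mail($user, admin, MPC)"))
```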
Configuration examples



restart sshd and notify admins by email
D    *            sshd                restart("sshd"), mail(admin, MD)
renice some jobs on IRIX systems (not Codine batch jobs)
PC   [irix]       !{cod_}      >50%   nice(8), mail($user, admin, MPC)
watch the /usr1 file system on host hydra
F    hydra        /usr1        >95%   mail(admin, MF1)
notify admins if the load on aisa machines is above 2
HL   aisa[0-9]                 >2     mail(admin, MHL)
notify admins if /etc/check has a nonzero return code (netgroup hps)
CC   (hps)        /etc/check   >0     mail(admin, MCC)
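The examples suggest a target syntax: `*` for all hosts, `[irix]` for an OS type, `(hps)` for a netgroup, and otherwise a hostname pattern such as `aisa[0-9]`. The Python sketch below encodes that reading. The syntax is inferred from the examples rather than documented here. The function name `target_matches` is hypothetical, and netgroup membership is passed in as a plain dictionary instead of being looked up in NIS.

```python
import re


def target_matches(target, host, ostype, netgroups):
    """Decide whether a rule target applies to a host.

    Syntax inferred from the examples (an assumption, not documented):
      *          every host
      [irix]     operating-system type
      (hps)      netgroup; membership is passed in as a dict here
      otherwise  regular expression matched against the whole hostname
    """
    if target == "*":
        return True
    if target.startswith("[") and target.endswith("]"):
        return ostype.lower() == target[1:-1].lower()
    if target.startswith("(") and target.endswith(")"):
        return host in netgroups.get(target[1:-1], set())
    return re.fullmatch(target, host) is not None


groups = {"hps": {"hps1", "hps2"}}  # hypothetical netgroup contents
print(target_matches("aisa[0-9]", "aisa3", "linux", groups))   # True
print(target_matches("aisa[0-9]", "aisa12", "linux", groups))  # False
print(target_matches("(hps)", "hps2", "aix", groups))          # True
```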
WWW Interface


patrol runs periodically (cron) on all hosts to be checked; there is no
communication between hosts and no central information retrieval
WWW interface runs periodically on a single host (the WWW server)
  gathers information on all hosts over a (configurable) period of time
  consists of a perl script (part of patrol) and Javascript HTML files
  (generated by patrol)
  provides a global view of the system as well as information on
  (configurable) subgroups and on individual hosts
  can also process and display data from other (monitoring) tools
  see the screen dump of our system in routine use
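The aggregation step of the WWW interface might look roughly like this Python sketch. Problem reports from all hosts are filtered to a configurable time window, counted per host and per subgroup, and rendered into a static HTML table. The record format, `summarize`, `to_html` and the table layout are assumptions. The real interface is a Perl script that produces Javascript HTML pages.

```python
import html


def summarize(reports, groups, now, period_s):
    """Count problem reports per host and per subgroup within a time window.

    reports: iterable of (host, unix_time, message); groups: {name: hosts}.
    Both formats are assumptions of this sketch.
    """
    per_host = {}
    for host, when, _message in reports:
        if now - when <= period_s:  # keep only the configurable period
            per_host[host] = per_host.get(host, 0) + 1
    per_group = {
        name: sum(per_host.get(h, 0) for h in hosts)
        for name, hosts in groups.items()
    }
    return per_host, per_group


def to_html(per_group):
    """Render the global view as a static HTML table (hypothetical layout)."""
    rows = "".join(
        f"<tr><td>{html.escape(name)}</td><td>{count}</td></tr>"
        for name, count in sorted(per_group.items())
    )
    return f"<table><tr><th>group</th><th>problems</th></tr>{rows}</table>"


reports = [("aisa1", 1000, "load > 2"), ("aisa1", 50, "old"),
           ("hydra", 950, "/usr1 > 95%")]
hosts, per_group = summarize(reports, {"aisa": {"aisa1", "aisa2"},
                                       "servers": {"hydra"}},
                             now=1000, period_s=900)
print(hosts)  # {'aisa1': 1, 'hydra': 1}
print(to_html(per_group))
```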
Patrol usage at DESY Zeuthen
currently about 100 hosts are controlled by patrol
patrol is started by cron every 15 minutes
Control tasks on all hosts
  Load monitoring
  Execution time monitoring of user processes (except batch)
  Presence of important daemons (cron, xntp, syslog, afs, batch, …)
Tasks on selected hosts (usually servers)
  File system usage (/, /usr, /tmp, /home, …)
  Presence of daemons (named, sendmail, …)
Depending on the problem, appropriate actions are taken
(mail, restart, log, …)
We observed increased stability of services for users
