Unclear on mapping of passive checks to state changes (was: dry alarm contact monitor)

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Thu Feb 12 21:58:42 CET 2004


Dear Sir,

I am writing to thank you for your letter and say,

On Thu, Feb 12, 2004 at 10:54:53AM -0600, Neil wrote:
> Stanley Hopcroft writes: 
> 
> > Dear Sir, 
> > 
> > 
> > Monitoring log files works like this:
> > 
> > 1 Daemon (Sec) _or_ code like check_log 'tails' (reads the records from
> > where it last finished reading to the end of file) the log file and
> > recognises any records of interest 
> > 
> > 2 This code (the daemon or the custom check or check log) determines
> > through some internal logic such as pattern matching whether there has
> > been a state change, whether this is a duplicate record, and whether any
> > state change is significant.
> 
> Hi Stanley, 
> 
> I tried searching Google for Sec and got tons of results. Can you
> share the link to this daemon?

Sec is distributed under the terms of GNU General Public License, 
and can be downloaded from http://kodu.neti.ee/~risto/sec/

> If it's ok, can I also get a copy of the
> script that checks the logs?
>

There is _no_ script that checks the logs. That is why it is so cool.

1 Sec startup parameters (the -input option) indicate which log(s) it will check.

2 Sec configuration must:

- recognise events (i.e. the messages in the log you want Nag to be aware
of, by submitting passive service check results)

- define actions corresponding to those events: in this case, formatting
a Nag passive service check result of the correct severity, for the Nag
service on the Nag host, and containing the 'plugin output'.
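What Sec does with the log(s) named at startup is the first step quoted from the earlier message: read only what has been appended since the last pass, and keep the records that match a pattern. Sec is written in Perl and this is not its code; the Python below is a rough sketch of the idea, and the function name read_new_records and the byte-offset bookkeeping are illustrative assumptions.

```python
import os


def read_new_records(path, offset, pattern):
    """Return (matching_lines, new_offset) for records appended since offset.

    Sketch only: 'pattern' is a compiled regular expression, and the caller
    is assumed to persist 'offset' between passes.
    """
    # A file shorter than the saved offset is assumed rotated or truncated.
    if os.path.getsize(path) < offset:
        offset = 0
    matches = []
    with open(path, "rb") as log:
        log.seek(offset)
        for raw in log:
            # An incomplete last record is left for the next pass.
            if not raw.endswith(b"\n"):
                break
            offset += len(raw)
            line = raw.decode("utf-8", errors="replace").rstrip("\n")
            if pattern.search(line):
                matches.append(line)
    return matches, offset
```

Calling it repeatedly with the returned offset gives the 'tail' behaviour: each record is examined once, however often the check runs.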

Each event (set of messages) corresponds to a rule in the Sec
configuration. Rules match patterns against the messages to recognise
the event and then perform actions such as sending email, writing to a
file or spawning a process.
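Whatever the event, the action that talks to Nag is a single external command line written to the Nagios command file, of the form [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output, where the return code is 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal Python sketch of building and sending that line, assuming the command file path used in the rule below; format_passive_result and submit are hypothetical helper names.

```python
import time

# Path taken from the Sec rule in this message; adjust for your install.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_passive_result(host, service, code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {code}")
    timestamp = int(time.time()) if now is None else int(now)
    # A newline would end the command early, so flatten the plugin output.
    output = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write the command to the Nagios command file (a named pipe)."""
    with open(command_file, "a") as cmd:
        cmd.write(line)
```

Because the command file is a named pipe, submit() may block if Nagios is not reading it.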

Here is a complete Sec rule to process the 'SNMP Authentication Failure
Trap' messages in the snmptrapd log.

# 5 Auth traps caused by spooler scanning for printers with Snmp

# Nov 11 08:53:21 tsitc snmptrapd[24368]: 10.0.0.2: Authentication Failure Trap (0) Uptime: 226
# Nov 11 08:53:21 tsitc snmptrapd[24368]: 10.0.0.25: Authentication Failure Trap (0) Uptime: 243, OLD-CISCO-CPU-MIB::lcpu.5.0 = IpAddress: 10.0.100.128

# $1 == IP Address responsible for Auth failure

# %n    Name (if spooler) of host responsible for trap
# %a    IP address responsible for trap
# %o    'plugin output'

# compress Auth Failure Trap events (1 each 15 minutes) _and_ set a context (effectively an alarm) that will either
# be reset      if there is another event between 15 (900 secs) and 16 minutes (960 secs) after the first one
# time out      if no events occur for 16 minutes after the first one.

# If the context times out, then a reset message will be sent to Nagios.

type=SingleWithSuppress
ptype=RegExp
pattern=Authentication Failure Trap .+?IpAddress: (\S+)
desc=Authentication traps
action=assign %a $1;
   eval %n ( $a = '%a'; %%hn = ('10.0.100.128' => 'foo',
                                '10.0.100.252' => 'bar',
                                '10.0.100.29'  => 'baz',
                                '10.0.0.201'   => 'blech');
             exists $hn{$a} ? "$hn{$a}/$a" : "Unknown hostname/$a" ;);
   eval %o ('Trap from %n. Print spooler may be scanning all addresses with Snmp to discover an offline printer.') ;
   assign %h tsitc;
   delete auth_traps_seen;
   create auth_traps_seen 960 ( assign %o No auth traps caused by %n for last 16 minutes.;
                                write /usr/local/nagios/var/rw/nagios.cmd ([%u] PROCESS_SERVICE_CHECK_RESULT;%h;%s;0;%o);
                              ) ;
   write /usr/local/nagios/var/rw/nagios.cmd ([%u] PROCESS_SERVICE_CHECK_RESULT;%h;%s;2;%o)
window=900

The last write (..;2;..) sends Nag a CRITICAL passive service check
result when an Auth Trap message is matched; SingleWithSuppress then
ignores further matches until the 15-minute window has passed.

The first write (..;0;..), part of the auth_traps_seen context's action
list, sends Nag an OK passive service check result if the context times
out, i.e. if 16 minutes pass without the rule firing again.

This ability to detect the absence of an event (by timeout) and thereby
reset the Nagios passive service check is very cool indeed.
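The interplay of window=900 and the 960-second context can be checked with a small event-driven simulation. This is a simplified Python model of the behaviour described in the rule's comments, not Sec's code; it assumes that an event arriving exactly 900 s after the window opened starts a new window, and that a context expiring at the same second as an event is handled first. The names simulate, WINDOW and LIFETIME are illustrative.

```python
WINDOW = 900     # window= of the SingleWithSuppress rule (seconds)
LIFETIME = 960   # lifetime given to 'create auth_traps_seen' (seconds)
CRITICAL, OK = 2, 0


def simulate(event_times, end_time):
    """Return (time, return_code) pairs the rule would write to nagios.cmd."""
    sent = []
    window_start = None    # when the current suppression window opened
    context_expiry = None  # when auth_traps_seen times out, if it exists
    for t in sorted(event_times):
        # A context that expired before this event runs its action list (OK).
        if context_expiry is not None and context_expiry <= t:
            sent.append((context_expiry, OK))
            context_expiry = None
        # Matches inside the window are suppressed.
        if window_start is not None and t - window_start < WINDOW:
            continue
        # Rule fires: new window, context deleted and recreated, CRITICAL sent.
        window_start = t
        context_expiry = t + LIFETIME
        sent.append((t, CRITICAL))
    # After the last event the context may still time out before end_time.
    if context_expiry is not None and context_expiry <= end_time:
        sent.append((context_expiry, OK))
    return sent
```

Feeding it a trap every 60 s from t=0 to t=900 gives CRITICAL at 0 and 900 and OK at 1860, the same 17:03:38, 17:18:38, 17:34:38 sequence as in the Nagios log further down.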

The first part of the action (the eval %n stuff) looks up the name of the
host responsible for the trap from a few known scanning culprits; the Nag
host itself is set by 'assign %h tsitc'.
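The eval %n lookup is ordinary Perl. The same match-and-lookup step in Python, using the rule's regular expression and its placeholder table (foo, bar, baz and blech stand in for real host names), might look like this; the function name culprit is hypothetical.

```python
import re

# Same regular expression as the rule's pattern= line.
AUTH_TRAP = re.compile(r"Authentication Failure Trap .+?IpAddress: (\S+)")

# Placeholder names copied from the rule; substitute your own spoolers.
KNOWN_HOSTS = {
    "10.0.100.128": "foo",
    "10.0.100.252": "bar",
    "10.0.100.29": "baz",
    "10.0.0.201": "blech",
}


def culprit(line):
    """Return 'name/address' for a matching trap line, else None."""
    match = AUTH_TRAP.search(line)
    if match is None:
        return None
    address = match.group(1)
    name = KNOWN_HOSTS.get(address, "Unknown hostname")
    return f"{name}/{address}"
```

Note that the first sample log line in the rule's comments carries no IpAddress field, so this pattern does not match it.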

Here is what Nag sees.

Thu Feb 12 17:03:38 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;tsitc;Authentication traps;2;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.
Thu Feb 12 17:03:38 SERVICE ALERT: tsitc;Authentication traps;CRITICAL;HARD;1;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.
Thu Feb 12 17:03:38 SERVICE NOTIFICATION: anwsum2;tsitc;Authentication traps;CRITICAL;notify-by-epager;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.

.. still scanning ...

Thu Feb 12 17:18:38 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;tsitc;Authentication traps;2;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.
Thu Feb 12 17:18:38 SERVICE ALERT: tsitc;Authentication traps;CRITICAL;HARD;1;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.
Thu Feb 12 17:18:38 SERVICE NOTIFICATION: anwsum2;tsitc;Authentication traps;CRITICAL;notify-by-epager;Trap from IpaPrint/10.0.100.128. Print spooler may be scanning all addresses with Snmp to discover an offline printer.

.. stopped ..

Thu Feb 12 17:34:38 EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;tsitc;Authentication traps;0;No auth traps caused by IpaPrint/10.0.100.128 for last 16 minutes.
Thu Feb 12 17:34:38 SERVICE ALERT: tsitc;Authentication traps;OK;HARD;1;No auth traps caused by IpaPrint/10.0.100.128 for last 16 minutes.
Thu Feb 12 17:34:38 SERVICE NOTIFICATION: anwsum2;tsitc;Authentication traps;OK;notify-by-epager;No auth traps caused by IpaPrint/10.0.100.128 for last 16 minutes.

Sec is _not_ easy to get started with (a little like Nag), but it is
_very_ easy to live with.

 
> Thank you in advance. 
> 
> Ronneil

Yours sincerely.

-- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




