Event handler OK from command line, not from nagios

Glenn A. Meisenheimer gmeisenheimer at itgroundwork.com
Wed Jul 7 23:31:44 CEST 2004


Friends,

My problem is this.  The print spooler on a Win 2k machine keeps hanging up.
Remedial action typically requires a sysadmin to stop and restart the
print spooler on the Windows box (there are multiple boxes, btw).

So I generated this .BAT file on the windows host:

ECHO. >> c:\nrpe-nt\rspooler.log
ECHO |DATE |find "current" >> c:\nrpe-nt\rspooler.log
ECHO |TIME |find "current" >> c:\nrpe-nt\rspooler.log
ECHO Resetting the Print Spooler >> c:\nrpe-nt\rspooler.log
NET STOP SPOOLER >> c:\nrpe-nt\rspooler.log
NET START SPOOLER >> c:\nrpe-nt\rspooler.log

This script appends a date/time stamp to rspooler.log (creating the file
if it does not already exist), echoes that it is resetting the spooler,
then redirects the output from the NET STOP and NET START commands into
that log file as well.

This gives us a record of restarts.

I have installed nrpe-nt on the Windows box and configured nrpe.cfg as follows:

command[check_rspooler]=C:\nrpe-nt\rspooler.bat

So if check_nrpe on the Nagios server calls check_rspooler on the win 2k
box, it should run the rspooler.bat script listed above.

On the Nagios server check_nrpe is configured like this:

#
#  NRPE Command
define command{
        command_name    check_nrpe
        command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

And indeed if, from the command line, I type:

/usr/local/nagios/libexec/check_nrpe -H 192.168.1.31 -c check_rspooler

The rspooler.bat file on the win 2k Box does run, and does log the event.

So what I need to do is write an event handler which calls rspooler.bat
using the check_nrpe command above.  That event handler is located in
/usr/local/nagios/eventhandler and is named reset_spooler_04.  It is
owned by nagios/nagios.

The reset_spooler_04 event handler script follows:

                                                     cut here
****************************************************************
#!/bin/bash
#
# Event handler script for executing nrpe recovery scripts on a remote machine
#
# Note: This script will only execute a recovery script if the service is
# retried 3 times (in a "soft" state) or if the associated monitor somehow
# manages to fall into a "hard" error state.
#
# What state is the service in?

case "$1" in
OK)
     # The service just came back up, so don't do anything...
     ;;

WARNING)
     # We don't really care about warning states, since the service is
     # probably still running...
     ;;
UNKNOWN)
     # We don't know what might be causing an unknown error, so don't do
     # anything...
     ;;

CRITICAL)
     # Aha! The service appears to have a problem - perhaps we should
     # run the recovery script...

     # Is this a "soft" or a "hard" state?

     case "$2" in

          # We're in a "soft" state, meaning that Nagios is in the middle of
          # retrying the check before it turns into a "hard" state and
          # contacts get notified...

          SOFT)
 
               # What check attempt are we on? We don't want to restart the
               # print spooler on the first check, because it may just be a fluke!

               case "$3" in

                    # Wait until the check has been tried 4 times before
                    # running the recovery script.  If the check fails on the
                    # 5th time (after recovery), the state type will turn to
                    # "hard" and contacts will be notified of the problem.
                    # Hopefully this will restart things successfully, so the
                    # 5th check will result in a "soft" recovery. If that
                    # happens no one gets notified because we fixed the problem!

               4)

                    echo -n "Restarting print spooler service (4th soft critical state)..."
                    # Call the check_nrpe plugin to execute the recovery script.

                    /usr/local/nagios/libexec/check_nrpe -H 192.168.1.31 -c check_rspooler

               ;;
               esac

          ;;
    
               # The monitor somehow managed to turn into a hard error
               # without getting fixed.  It should have been restarted by
               # the code above, but for some reason it didn't.
               # Let's give it one last try, shall we?
               # Note: Contacts have already been notified of a problem with
               # the service at this point (unless you disabled notifications
               # for this service)

          HARD)

               echo -n "Restarting service..."

               /usr/local/nagios/libexec/check_nrpe -H 192.168.1.31 -c check_rspooler

          ;;
          esac
;;
esac
exit 0

****************************************************************
                                                     cut here

Now if I cd to /usr/local/nagios/eventhandler and enter this command:

./reset_spooler_04 CRITICAL HARD 4

The rspooler.bat file on the Win 2k box actually runs and logs the reset.
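
To check the handler's decision logic without touching the Win 2k box, the
case statements can be mirrored in a small function.  This is only a minimal
sketch: would_restart is a hypothetical name, not part of reset_spooler_04,
and it merely reports whether the script above would call check_nrpe for a
given set of arguments.

```shell
#!/bin/bash
# Hypothetical helper mirroring the case logic in reset_spooler_04.
# Returns 0 when the handler would call check_nrpe, 1 otherwise.
would_restart() {
    local state="${1:-}" state_type="${2:-}" attempt="${3:-}"

    # Only CRITICAL states trigger any action at all.
    [ "$state" = "CRITICAL" ] || return 1

    case "$state_type" in
        SOFT) [ "$attempt" = "4" ] ;;   # only on the 4th soft attempt
        HARD) return 0 ;;               # always on a hard state
        *)    return 1 ;;               # anything else: do nothing
    esac
}

# Walk through a few argument sets, including an empty one.
for args in "CRITICAL SOFT 3" "CRITICAL SOFT 4" "CRITICAL HARD 5" "OK HARD 1" ""; do
    # Word-splitting of $args is intentional here.
    if would_restart $args; then verdict="restart"; else verdict="no action"; fi
    printf '%-16s -> %s\n' "${args:-<no arguments>}" "$verdict"
done
```

The empty case is included because the script silently does nothing when
its first argument matches none of OK, WARNING, UNKNOWN or CRITICAL.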

So far so good.  Now I need to set up a command definition for the event
handler:

#
#  reset_spooler_04 Command
define command{
        command_name    reset_spooler_04
        command_line    /usr/local/nagios/eventhandler/reset_spooler_04
        }

And I need to include the event handler in the service definition:

define service{
        host_name                  fc-ctx-04
        use                        rs-windows-service
        service_description        printq_service
        max_check_attempts         5
        event_handler              reset_spooler_04
        normal_check_interval      10
        retry_check_interval       1
        check_command              check_nt_perf!"\\Print Queue(_Total)\\Jobs"!4!5
        }

So now the big picture.  We have a working monitor on the Nagios server
which monitors the number of print jobs in the Win 2k print queue.  We
generate an alarm when the number of jobs in the queue exceeds 5.  I can
test this by pulling the paper tray on the printer and queueing up print
jobs.  The monitor works fine.  It goes into alarm when the print queue
gets up to 5 jobs.

But the event handler doesn't work properly.  Here's the log:

[07-02-2004 13:51:41] SERVICE EVENT HANDLER: fc-ctx-04;printq_service;CRITICAL;HARD;5;reset_spooler_04
[07-02-2004 13:51:41] SERVICE ALERT: fc-ctx-04;printq_service;CRITICAL;HARD;5;6
[07-02-2004 13:50:41] SERVICE EVENT HANDLER: fc-ctx-04;printq_service;CRITICAL;SOFT;4;reset_spooler_04
[07-02-2004 13:50:41] SERVICE ALERT: fc-ctx-04;printq_service;CRITICAL;SOFT;4;6
[07-02-2004 13:49:42] SERVICE EVENT HANDLER: fc-ctx-04;printq_service;CRITICAL;SOFT;3;reset_spooler_04
[07-02-2004 13:49:42] SERVICE ALERT: fc-ctx-04;printq_service;CRITICAL;SOFT;3;6
[07-02-2004 13:48:36] SERVICE EVENT HANDLER: fc-ctx-04;printq_service;CRITICAL;SOFT;2;reset_spooler_04
[07-02-2004 13:48:36] SERVICE ALERT: fc-ctx-04;printq_service;CRITICAL;SOFT;2;7
[07-02-2004 13:47:37] SERVICE EVENT HANDLER: fc-ctx-04;printq_service;CRITICAL;SOFT;1;reset_spooler_04
[07-02-2004 13:47:37] SERVICE ALERT: fc-ctx-04;printq_service;CRITICAL;SOFT;1;7

So it seems to me that Nagios is calling the event handler, and is
calling it in a CRITICAL HARD state.  Why, oh why, I wonder, isn't the
rest of the system working?  The above event did NOT result in an entry
in the rspooler.log on the Win 2k machine.

As I said earlier, if I call the event handler as user nagios from the
command line, it does run rspooler and makes the log entries on the
Win 2k machine.
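
One way to narrow this down would be to record exactly what the handler
receives when Nagios runs it, and compare that with the manual run.  The
following is a minimal sketch, not a fix: log_handler_args is a hypothetical
helper, and HANDLER_DEBUG_LOG with its default path under /tmp is an
assumption made for illustration.  Neither is part of the script above.

```shell
#!/bin/bash
# Hypothetical debugging helper; not part of the original reset_spooler_04.
# HANDLER_DEBUG_LOG and its default path are assumptions for illustration.
log_handler_args() {
    local logfile="${HANDLER_DEBUG_LOG:-/tmp/reset_spooler_04.debug}"

    # Record when we ran, as which user, how many arguments we were given,
    # and the arguments themselves (joined with spaces).
    printf '%s user=%s argc=%d args=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$(id -un)" "$#" "$*" >> "$logfile"
}
```

Pasting the function near the top of reset_spooler_04 and calling it as
log_handler_args "$@" before the first case statement would leave one line
per invocation, whether the handler is started by hand or by Nagios.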

Any help would be appreciated...

Glenn Meisenheimer