Weirdness with remote (passive) checks. Critical on remote, OK on local?

Brian Smith bsmith at fusionbroadband.com
Tue Aug 2 21:30:05 CEST 2005


Thanks Marc for the tips, but this has just gotten weirder - neither
submit_check_result nor submit_check_result_via_nsca ever seems to run.
NSCA is being invoked, I see its process pop up when checks happen.
Checks are being delivered to home base, because manual Critical states
get overridden after a few minutes.  Also, I can invoke this command and
deliver a single distributed check successfully to home base:

(folder)/submit_check_result_via_nsca remotehost 'Telnet' 2 'Because I said so'

That command successfully sends the service into a soft critical state
on the home server, and running it multiple times sends it to hard
critical.

I've tacked little "debug" lines into the submit_check_result and
submit_..._via_nsca scripts to echo their commands into a log file, and
the log file never gets appended to.  So I put commands in to echo the
word 'test' into the logfile, and that word never gets put in there either.

From the end of checkcommands.cfg:

   # 'submit_check_result' command definition
   define command{
           command_name    submit_check_result
           command_line    $USER1$/eventhandlers/distributed-
               monitoring/submit_check_result_via_nsca 
               $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$OUTPUT$'
        }

(except without the line breaks I inserted to make it behave in the
email.)

It appears I will have to trace, from the check queue to NSCA, how this
is being executed.  Can anyone tell me where in the config files the
following things could be set:

Where is the location of a custom script set, if it's not set in the
lines from checkcommands.cfg above?
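
One place to start that trace, sketched here under assumptions: in a stock distributed setup, Nagios only runs a command after each service check when the main config file has obsess_over_services=1 and names that command in ocsp_command (the service definitions also need obsess_over_service enabled). The helper name show_ocsp_settings and the default nagios.cfg path below are hypothetical; adjust them to the local install.

```shell
#!/bin/sh
# Print the main-config settings that decide whether, and with which
# command, Nagios "obsesses" over service check results.
# The default path is an assumption; pass the real nagios.cfg as $1.
show_ocsp_settings() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match only active (uncommented) directives.
    grep -E '^[[:space:]]*(obsess_over_services|ocsp_command|ocsp_timeout)[[:space:]]*=' "$cfg"
}

# Example call (path is a placeholder):
#   show_ocsp_settings /usr/local/nagios/etc/nagios.cfg
```

If ocsp_command names something other than the submit_check_result command defined in checkcommands.cfg, or the directives are missing or commented out, that would explain debug lines in the expected script never firing.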

What return codes are used for OK, Critical, Warning, etc?  So far it
appears Nagios is sending a 0 for all cases.  If not Nagios, whatever is
invoking NSCA is sending it, or whatever is invoking the script that
invokes NSCA is.  I can't figure out what the chain of commands is here,
but I know that home base Nagios is working correctly and NSCA is
sending / receiving correctly, and remote Nagios is writing Critical in
the status logs.
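
For reference, the standard plugin return codes are 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN), while the $SERVICESTATE$ macro expands to the state name rather than the number. The submit script therefore has to translate the name before handing the result to send_nsca. A minimal sketch of that translation follows; the function names state_to_code and nsca_line are hypothetical, and this is not necessarily what the installed script does.

```shell
#!/bin/sh
# Translate a Nagios state name (what $SERVICESTATE$ expands to) into
# the numeric return code. Anything unrecognised becomes UNKNOWN (3).
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;
    esac
}

# Build the tab-separated line send_nsca reads on stdin for a service:
#   host <TAB> service description <TAB> return code <TAB> plugin output
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Hypothetical use inside a submit script; the send_nsca host and config
# arguments depend on the NSCA version and are left out here:
#   nsca_line "$1" "$2" "$3" "$4" | send_nsca ...
```

With this sketch the third argument must be a state name: passing a bare 2, as in the manual test above, would fall through to UNKNOWN (3). A script that forwards its third argument untranslated would show the opposite behaviour, working for a manual 2 but not for the name Nagios passes.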


And, by the way, am I correct in assuming people on the mailing list
prefer text-only emails?  Otherwise I will send as html.

Thanks again,
-- Brian




> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Marc Powell
> Sent: Monday, August 01, 2005 4:46 PM
> To: nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Weirdness with remote (passive) checks.
> Critical on remote, OK on local?
> 
> 
> 
> > -----Original Message-----
> > From: nagios-users-admin at lists.sourceforge.net
> > [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Brian Smith
> > Sent: Monday, August 01, 2005 4:28 PM
> > To: nagios-users at lists.sourceforge.net
> > Subject: [Nagios-users] Weirdness with remote (passive) checks.
> > Critical on remote, OK on local?
> >
> > Hello again guys, and thanks for the previous useful replies to 
> > other questions.
> >
> >
> >
> > Weird problem going on here, will provide as much detail as I can.
> >
> >
> >
> > We have some hosts on private IPs being monitored passively through
> > NSCA using remote servers running Nagios.  It's basically your
> > textbook passive monitoring system.
> >
> >
> >
> > Currently every switch being monitored this way that is (in real
> > life) down or unreachable, is showing as "Status: OK,  Status
> > Information: Connection refused or timed out."
> 
> [Aggressive snip]
> 
> >
> > Submitting a manual Critical check result puts the host properly
> > into Critical, but it pops back to OK in a few minutes when a
> > passive check comes in.  (so passive checks are coming in and are
> > setting the state.)
> >
> >
> >
> > In the Nagios web interface it shows the hosts as a nice green OK,
> > with details "connection refused or timed out."
> 
> It looks like your submit_check_result script isn't sending the proper
> return code. If you look at the example script at
> http://nagios.sourceforge.net/docs/1_0/distributed.html and the 
> arguments passed to it in the command definition, does yours set it 
> properly? That return code is how nagios determines what state a 
> service is in, not the human readable text or plugin output. It will 
> correspond to the 3rd field passed to send_nsca.
> 
> --
> Marc
> 
> 
> -------------------------------------------------------
> SF.Net email is sponsored by: Discover Easy Linux Migration Strategies
> from IBM. Find simple to follow Roadmaps, straightforward articles,
> informative Webcasts and more! Get everything you need to get up to 
> speed, fast. http://ads.osdn.com/?ad_idt77&alloc_id492&op=ick
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when 
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null






