Service Configuration Question

joerg.helmert at aracomp.de
Wed Mar 24 15:08:10 CET 2004


> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net 
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf 
> Of Paul L. Allen
> Sent: Wednesday, March 24, 2004 12:36 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Re: Service Configuration Question
> 
> Andreas Ericsson writes: 
> 
> > I totally disagree. If the plugin fetches disk status input from 
> > nsclient
> > or nrpe (or snmp, for that matter) and can't get it, it's a 
> critical error 
> > (service not running).
> 
> Think it through.  Assume check_by_ssh is used to check disk 
> space on a remote machine.  Further assume that sshd on the 
> remote machine is down.  There is a critical problem because 
> sshd is down, but you already know that because you're using 
> check_ssh to test whether or not sshd is up, right?  The 
> state of the disk, however, is UNKNOWN.  In this situation 
> you don't know what state the disk is in, whether it is good, 
> bad or ugly. 
> 
> The same thing applies however you monitor the remote 
> service.  If whatever transfers the data about service X is 
> not working then the status of service X is unknown but the 
> transport mechanism itself is critical.  Doing things your 
> way confuses the issue and reports a critical error in the 
> wrong place (the service rather than the transport). 
> Doing it your way, if the transport fails you get told that 
> all services monitored that way are critical (which they 
> probably aren't) when the actual failure is elsewhere.  Your 
> way means that "sshd is down and needs to be fixed" would 
> mutate into "everything on that box is dead." 
> 
> Of course, by that argument, the Nagios behaviour with 
> passive service checks is wrong. 
> 
> -- 
> Paul Allen
> Softflare Support 
This is true, but you need additional service checks for this.
Maybe ssh is not the best example, because it might be needed for other
production work, not only for monitoring.

Take snmp. It is there only for monitoring.
If a plugin like check_snmp returns "unable to connect", I can see that this is
not the status of my disk and react accordingly.
The advantage is that I do not need an additional check for snmp.
Of course, if I run 20 or so checks by snmp against one host, it might be
wise to implement a check of whether snmp itself is up.
That way one can make all the "checks by snmp" dependent on that
"check of snmp".
That would give 1 notification instead of 20 if snmpd dies...
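
To illustrate the "1 notification instead of 20" idea, here is a small
Python sketch. It is only a model of the dependency logic, not how Nagios
implements it; the service names ("snmp-alive", "snmp-check-NN") and the
function name are made up for the example. Only the return codes follow the
usual plugin convention.

```python
# Standard plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def notifying_services(child_states, parent_state=None):
    """Return the names of the services that would send a notification.

    child_states: dict of service name -> state for the checks done via snmp.
    parent_state: state of a hypothetical "snmp-alive" check, or None if no
                  such check (and therefore no dependency) is configured.
    """
    if parent_state is not None and parent_state != OK:
        # The transport is down: the dependent checks stay silent and
        # only the check of snmp itself notifies.
        return ["snmp-alive"]
    return [name for name, state in sorted(child_states.items())
            if state != OK]


# snmpd dies: none of the 20 checks can fetch its data any more.
children = {"snmp-check-%02d" % i: UNKNOWN for i in range(1, 21)}

print(len(notifying_services(children)))        # no dependency configured
print(notifying_services(children, CRITICAL))   # with the "check of snmp"
```

In a real setup this would be expressed with service dependency definitions
in the object configuration rather than in code.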

Anyway,
a similar discussion was going on at nagiosplug-devel at lists.sourceforge.net.
In the end Karl DeBisschop and others helped me come to the following
conclusion:

<cited>
> > gets corrupted or whatever) It is true that the status of that check
> > in reality is unknown. But for me the overall picture is more
> > important. Something is going wrong after it was ok.
> > I want to KNOW a status but only find out that the status is unknown.
> > That is critical for me.
> 
> Then you set nagios to page you for UNKNOWN.
> 
Hmm, I do not know why I assumed that only CRITICAL sends
notifications...

You're right.

So I'll:
- Implement plugins with notifications for unknown disabled
- check if I get unknowns from missing options or similar
  and correct if necessary
- let it run a while
- enable notifications for unknown
</cited>

The conclusion is that it is enough to return an UNKNOWN state, and that what
_I_ wanted to achieve can be done with that.
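
As a rough sketch of that conclusion in Python: the plugin returns UNKNOWN
when it cannot get its data, and whether that pages anybody is decided
separately by the notification settings. The function names, the thresholds
and the fetch callable are assumptions made up for the example; only the
return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard
plugin convention.

```python
# Standard plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def disk_state(fetch_used_percent, warn=80.0, crit=90.0):
    """Map a disk usage reading to a plugin state.

    fetch_used_percent is a hypothetical callable that returns the used
    percentage, or raises OSError when the transport (snmp, nrpe, ssh...)
    cannot be reached.
    """
    try:
        used = fetch_used_percent()
    except OSError:
        # No data: we do not know the state of the disk.
        return UNKNOWN
    if used >= crit:
        return CRITICAL
    if used >= warn:
        return WARNING
    return OK


def should_notify(state, notify_on):
    """True if a notification goes out for this state."""
    return state in notify_on


def transport_down():
    # ConnectionError is a subclass of OSError.
    raise ConnectionError("unable to connect")


state = disk_state(transport_down)

# While the plugins are being tried out: no notifications for UNKNOWN.
print(should_notify(state, {WARNING, CRITICAL}))
# Later on: UNKNOWN notifies as well.
print(should_notify(state, {WARNING, CRITICAL, UNKNOWN}))
```

In Nagios itself the second step corresponds to including the unknown state
in the notification options of the service.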

Bye,

Joerg


