fun with (silent) change from HARD to SOFT state

Michal Svoboda pht at spatium.org
Fri Jan 23 15:59:15 CET 2009


Hello,

I've discovered some odd behavior, which can be reproduced as follows:

1. Let a service be configured for max attempts N before going to HARD
   non-ok state

2. Make the service fail and wait until N checks have run (i.e. until the
   service enters the N/N HARD non-ok state); at this point notifications
   are sent, etc.

3. Change the configuration of the service to have M > N max attempts
   and restart Nagios

4. Now the state of the service is N/M _HARD_ non-ok

5. If the (N+1)th check results in non-ok, then the service state goes to
   N+1/M _SOFT_

6. If some future check results in ok, then the service performs a SOFT
   recovery; at the very least, this means no recovery notifications are
   sent

6a. If the condition in (5) does not occur, i.e. the (N+1)th check results
    immediately in ok, the service still performs a SOFT recovery from
    an apparently HARD state (even according to the logs)
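
To make the sequence above concrete, here is a toy model in Python. It
is only a sketch, assuming the HARD/SOFT decision is re-derived from the
current attempt versus max attempts on every check result, which is what
the observed behavior suggests. It is not Nagios code, and the names
ToyService, process and replay are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyService:
    """Toy stand-in for a service; NOT the real Nagios data structure."""
    max_attempts: int
    attempt: int = 1
    ok: bool = True
    state_type: str = "HARD"

    def process(self, result_ok: bool) -> str:
        """Apply one check result and return a label for what happened."""
        if result_ok:
            if self.ok:
                event = "OK"
            elif self.attempt < self.max_attempts:
                # Assumption: recovery type depends only on attempt < max,
                # not on the state type that was stored before the check.
                event = "SOFT recovery"
            else:
                event = "HARD recovery"
            # Toy simplification: go straight back to a clean OK state.
            self.ok, self.attempt, self.state_type = True, 1, "HARD"
            return event
        if self.ok:
            self.attempt = 1          # first failure after an OK state
        elif self.attempt < self.max_attempts:
            self.attempt += 1         # another failure while still counting
        self.ok = False
        self.state_type = "HARD" if self.attempt >= self.max_attempts else "SOFT"
        return f"{self.attempt}/{self.max_attempts} {self.state_type}"


def replay(n: int, m: int, after_restart: list) -> list:
    """Steps 1-2 with max attempts n, then steps 3-4 with m, then more checks."""
    svc = ToyService(max_attempts=n)
    log = [svc.process(False) for _ in range(n)]   # fail until N/N HARD
    svc.max_attempts = m                           # config change + restart
    log.append(f"{svc.attempt}/{svc.max_attempts} {svc.state_type}")
    log += [svc.process(r) for r in after_restart]
    return log


print(replay(3, 5, [False, True]))  # steps 5 and 6
print(replay(3, 5, [True]))         # step 6a
```

With N=3 and M=5, the first call ends in "3/5 HARD", "4/5 SOFT",
"SOFT recovery", and the second goes from "3/5 HARD" directly to
"SOFT recovery".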

Now, one way to look at this behavior is that it is logical, because
I've fiddled with the config, and I can expect anomalies and blah blah.

Another way to look at it is that notifications were sent in step (2),
yet there are no recovery notifications; in other words, once the sirens
have been sounded (and the fire brigade is on the way, and the president
is being woken up), they should also be properly shut off.

So the question is whether or not to introduce a patch that prevents a
service (or a host) from entering a SOFT state once it is already in a
HARD non-ok state.
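
As a rough illustration of the kind of guard I mean, here is a sketch in
Python. The real change would have to live in the Nagios check-processing
code, so this only shows the intended rule; the function name and its
parameters are hypothetical and do not correspond to anything in the
Nagios source.

```python
def state_type_with_guard(was_ok: bool, was_hard: bool,
                          attempt: int, max_attempts: int) -> str:
    """Pick the state type to use for the next check result.

    Hypothetical rule: a HARD non-ok state is sticky, so both further
    failures and the eventual recovery are treated as HARD, regardless
    of how attempt compares to a (possibly changed) max_attempts.
    """
    if not was_ok and was_hard:
        return "HARD"
    # Otherwise fall back to the usual attempt-based decision.
    return "HARD" if attempt >= max_attempts else "SOFT"


# Already HARD non-ok at 3/3, config changed to 5: stays HARD at 4/5.
print(state_type_with_guard(False, True, 4, 5))
# Ordinary SOFT failure sequence is unaffected.
print(state_type_with_guard(False, False, 2, 5))
```

With this rule the recovery in steps (6) and (6a) would be a HARD
recovery, so the recovery notifications would go out.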


With regards,
Michal Svoboda
