several host notifications for 1 host

Hendrik Bäcker andurin at process-zero.de
Thu May 24 20:51:01 CEST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry for double posting...

I have some additional information.

In checks.c, around line 1189, the service checks take their course.

The duplicate host notifications seem to go out only if you are not
using use_aggressive_host_checking, so there seems to be a logical race
condition: Nagios just fakes the host check and doesn't care about
hard or soft state.

In my opinion it would be nicer to do one further check to determine
whether the cached host check is in a hard or soft state, in order to
suppress duplicate host notifications while regular host checks run in
parallel.

Please don't ask me what to do if the cached host check lookup results
in a soft state... there are some difficulties...

Hopefully we can discuss this:

If I am right, we have two parallel paths. First there are the regular
host checks, say with max_attempts of 10 and a retry interval of 1
(let's say 60 seconds per interval). No aggressive host checking enabled!

While doing host checking on a DOWN host we have:

1. Host check 1 -> DOWN (soft)
tick tack
2. Host check 2 -> DOWN (soft)
tick (no tack)
2a. a service check resulting in non-OK state ==> host checking!
2b. last host check (2.) was down
 ==> Nagios fakes the last known host check as a hard state and sends
out a host notification
...
...
...
10. Host check 10 -> DOWN (HARD) (max_attempts reached)
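
To make the sequence above concrete, here is a small toy model in Python.
It is not Nagios code: the Host class, both functions and the timestamps
are hypothetical, and the "cached state is treated as hard" step encodes
my reading of the behaviour, not a confirmed fact about checks.c.

```python
# Toy model of the timeline above. Not Nagios code; all names are hypothetical.
from dataclasses import dataclass, field

MAX_ATTEMPTS = 10  # max_attempts from the example above


@dataclass
class Host:
    attempt: int = 0
    state: str = "UP"
    state_type: str = "HARD"
    notifications: list = field(default_factory=list)


def regular_host_check(host, result, now):
    """Scheduled host check: stays SOFT until MAX_ATTEMPTS, then HARD + notify."""
    if result == "UP":
        host.attempt, host.state, host.state_type = 0, "UP", "HARD"
        return
    host.state = result
    host.attempt = min(host.attempt + 1, MAX_ATTEMPTS)
    if host.attempt >= MAX_ATTEMPTS:
        host.state_type = "HARD"
        host.notifications.append((now, "regular check, HARD"))
    else:
        host.state_type = "SOFT"


def service_check_non_ok(host, now):
    """Service went non-OK, aggressive checking off: the cached host state
    is reused and (assumption) handled as if it were a hard state."""
    if host.state != "UP":
        host.notifications.append((now, "cached state treated as HARD"))


def simulate():
    host = Host()
    for minute in range(1, MAX_ATTEMPTS + 1):
        regular_host_check(host, "DOWN", now=minute * 60)
        if minute == 2:
            # steps 2a/2b: a service check fails between host checks 2 and 3
            service_check_non_ok(host, now=minute * 60 + 30)
    return host.notifications


for when, why in simulate():
    print(when, why)
```

Under these assumptions the model sends two notifications for one host:
one at second 150 from the cached state and one at second 600 when
max_attempts is reached.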

Setting use_aggressive_host_checking = 1 only forces a real host
check instead of just looking at the last cached check.

I think the main improvement of the new check logic lies in using
cached host checks, but for as long as I have known Nagios there has
been the hard rule "No notifications until HARD state".

One step against blindly faking the host state would be to check whether
the host is in a hard or soft state.
When it is soft, don't send a host notification, but what else?
The cached host check tells us that the host is down/unreachable and the
service is in a non-OK state, so how should we decide what to do?
Should we just ignore this?
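
A minimal sketch of that extra check, in Python and not taken from the
Nagios source. The Action enum and the function name are hypothetical.
Deferring to the regular host checks in the soft case is only one possible
answer to the open question above, chosen here so the sketch is complete.

```python
# Sketch only: decide what to do with a cached host state.
# All names are hypothetical; this is not how checks.c is written.
from enum import Enum


class Action(Enum):
    NOTIFY = "send host notification"
    DEFER = "leave it to the regular host checks"
    NOTHING = "host is up, nothing to do"


def action_for_cached_host_state(state, state_type):
    """Only a HARD non-UP state may trigger a host notification."""
    if state == "UP":
        return Action.NOTHING
    if state_type == "HARD":
        return Action.NOTIFY
    # SOFT and not UP: the open question. Assumption: defer, do not notify.
    return Action.DEFER


print(action_for_cached_host_state("DOWN", "SOFT").name)
print(action_for_cached_host_state("DOWN", "HARD").name)
```

With this guard, step 2b in the timeline would end in DEFER instead of a
notification, and the first notification would come from host check 10.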

I really don't have an idea.

Another option might be to check the last notification time and suppress
any host notification after the first until the notification interval
has elapsed.
This would mean breaking the "notification on HARD state" rule, but it
solves the problem of more than one host notification.
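
The interval-based suppression could look roughly like this in Python. The
function name, the timestamps and the 3600 second interval are made up for
illustration and do not come from Nagios.

```python
# Sketch only: suppress repeat host notifications inside the interval.
# Names and numbers are hypothetical.
def allow_host_notification(now, last_notification, interval_seconds):
    """Allow the first notification, then nothing until the interval is over."""
    if last_notification is None:
        return True
    return now - last_notification >= interval_seconds


sent = []
last = None
# 150: notification from the cached state, 600: max_attempts reached,
# 3750: first attempt after a 3600 second interval has passed
for t in (150, 600, 3750):
    if allow_host_notification(t, last, 3600):
        sent.append(t)
        last = t

print(sent)  # [150, 3750]
```

The notification at second 600 is dropped, so only one goes out per
interval, but the one that does go out (second 150) is still sent while
the host is in a soft state.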

Any ideas?

(as in my other posts: hit me if I am wrong :-) )

Hendrik
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (MingW32)

iD8DBQFGVd6UlI0PwfxLQjkRAm4sAJ0c804AdQmrVfrFbAtNZnSs68P4zgCfWSaO
QOcMHxvDaA9ft0lI/QZESYQ=
=CZ9E
-----END PGP SIGNATURE-----
