Host check running before service check retry interval

Robert Nelson rnelson at windchannel.com
Wed Sep 22 12:16:50 CEST 2004


Hello,

I'm having a problem with a few hosts on our network. We're a WISP, and a few clients create their own problems, like the construction site that parks a crane in front of their radio for 10-15 minutes at a time while loading materials. We don't care about outages that short, but if the radio stays down for more than 30 minutes, we do. (Funny story: a very special crane operator lifted some steel beams and caught them under the edge of the trailer, almost flipping it. He then proceeded to snag the steel beams on our cat5 cable going to the radio...)

I set the service checks for this one host to have a max_check_attempts of 3 and a retry_check_interval of 10, which should give me 30 minutes. This never seems to happen, though. As soon as a service check fails once, a host check runs and fails too, which forces a hard down state, and we're back to being notified immediately.

"When a service check results in a non-OK state, Nagios will check the host that the service is associated with to determine whether or not is up (see the note below for info on how this is done). If the host is not up (i.e. it is either down or unreachable), Nagios will immediately put the service into a hard non-OK state and it will reset the current attempt number to 1."
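To make the quoted behaviour concrete, here is a minimal Python sketch of the rule as I read it. It is a simplified model, not Nagios's actual code: the function name next_service_state and its arguments are made up for illustration, and the "host is up" branch (soft retries until the maximum attempt is reached) is my assumption about the normal retry logic rather than something the quote states.

```python
def next_service_state(check_ok, host_up, attempt, max_attempts):
    """Model one service check result.

    Returns (state_type, next_attempt), where state_type is
    "OK", "SOFT" or "HARD". Hypothetical helper, not a Nagios API.
    """
    if check_ok:
        # Service is fine; the attempt counter starts over.
        return ("OK", 1)
    if not host_up:
        # The quoted rule: host down or unreachable means the service
        # goes hard immediately and the attempt number resets to 1.
        return ("HARD", 1)
    if attempt >= max_attempts:
        # Assumed normal path: retries are exhausted, so the state goes hard.
        return ("HARD", attempt)
    # Assumed normal path: still retrying, so the state stays soft.
    return ("SOFT", attempt + 1)
```

With max_attempts set to 3, the first failure only stays soft if the host check still passes. A crane blocking the radio fails both checks, so the service goes hard on the first attempt and the retry interval never comes into play.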

If I read the above correctly, that's why this is happening! Is there a suggested way to get around this and have an effective 30-minute non-OK window before ANY notifications go out?


Two ways I can think of:

1) Use check_dummy for the host check. The downside is that host reporting will be broken, and we'll be relying exclusively on the service check for reporting. This will also break the parent-child relationship built up for host UNREACHABLE notifications.

2) Find some check_whatever plugin that derives the host state from the last HARD state of its services. That is, if at least one service is HARD OK/WARNING or in a SOFT state change, return OK. If all the services are in a HARD CRITICAL/UNREACHABLE state, return DOWN. It seems like a useful check plugin to me, but I haven't found one.
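The decision rule in option 2 could be sketched roughly as follows in Python. This is only the decision logic, not a working plugin: the function name host_state_from_services is hypothetical, and how the (state, state_type) pairs would be read out of Nagios is deliberately left open. The return values follow the standard plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). Returning UNKNOWN for a host with no services is my own assumption.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def host_state_from_services(services):
    """Derive a host check result from the states of its services.

    services: iterable of (state, state_type) tuples, for example
    ("CRITICAL", "HARD"). Hypothetical input format; obtaining these
    values from Nagios is not shown here.
    """
    services = list(services)
    if not services:
        # Assumption: nothing to judge by, so report UNKNOWN.
        return UNKNOWN
    for state, state_type in services:
        # Any service still in a SOFT state, or HARD OK/WARNING,
        # keeps the host "up" for notification purposes.
        if state_type == "SOFT" or state in ("OK", "WARNING"):
            return OK
    # Every service is in a HARD failed state: report the host down.
    return CRITICAL
```

A real plugin would print a one-line status message and exit with the returned code. Used as the host check command, this would keep the host up until the service itself has used up its retries.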

Am I going about this the wrong way? I could also do escalations, but in the example I gave, I'd have to break that radio out of the radios hostgroup to eliminate early notifications, which would just plain break the usefulness of hostgroups.

Rob Nelson

