host checks not running until services are restored

Marc Powell marc at ena.com
Fri Apr 8 23:41:39 CEST 2005



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of frank
> Sent: Friday, April 08, 2005 3:24 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] host checks not running until services are
> restored
> 
> 
> Nagios 2.0b2 on Debian Sarge. Been running for about a month.
> 
> All my hosts use check_icmp (symlinked as check_host) for HOST checks.
> Service checks for this particular host are done over SNMP.
> 
> I had a host go down last night at 1:16am. When it was rebooted, the
> SNMP daemon was not restarted because I failed to add it to system
> startup scripts. My fault of course. So I would expect all the
> _service_ checks to fail in this case. And they did.
> 
> What confuses me is that the HOST checks (icmp) weren't running at all
> until I restarted snmpd, allowing the service checks to complete
> properly. It appears that the host checks ran 10 times at 1:16am (per
> our global host config), sent out the "host down" alert, and then slept
> for over 9.5 hours while the SNMP daemon was down. Meanwhile, service
> checks continued to run and return "UNKNOWN" values because of their
> inability to contact snmpd.
> 
> Is this expected behavior? Is it because the service checks return
> UNKNOWN

Yes, this is expected behavior. Nagios does not check host status on a
regular schedule. Host checks essentially run under only two conditions:

	1) A service goes down and the host is currently OK
	2) A service recovers and the host is currently not OK
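
The two conditions above can be sketched as a small decision function.
This is only an illustration of the rule as described here, not Nagios
source code; the function name and its boolean arguments are made up for
the example.

```python
def should_run_host_check(service_ok: bool, host_ok: bool) -> bool:
    """Hypothetical model of the on-demand host check rule described above."""
    # Condition 1: a service result is not OK while the host is considered OK.
    service_failed_on_up_host = (not service_ok) and host_ok
    # Condition 2: a service result is OK while the host is considered not OK.
    service_recovered_on_down_host = service_ok and (not host_ok)
    return service_failed_on_up_host or service_recovered_on_down_host


# The situation from the original report: the host is already marked down
# and the SNMP services keep failing, so nothing triggers a host check.
print(should_run_host_check(service_ok=False, host_ok=False))

# Once snmpd is restarted, a service recovers and the host gets rechecked.
print(should_run_host_check(service_ok=True, host_ok=False))
```

Under this model, a host that is marked down is only rechecked when one
of its services recovers.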


> instead of CRITICAL? I thought the proper action to be taken when a
> service check returns not-OK is to re-execute the host-check. Is this
> incorrect?

Your assumption is incorrect; see above. The exception appears to be
aggressive host checking: if you have it enabled, it forces a host check
regardless of the previous host state. That will likely increase your
check latency and is discouraged. Nagios stops _everything_ else it is
doing while it checks a host, so if you check the host every time a
service check comes back not-OK, you will severely limit the number of
checks you can perform. In a major outage, Nagios will be so bogged down
running host checks that it will be useless for actually determining
what's down on your network.

I'd suggest you just add a ping service check in addition to your
SNMP-based checks. If the host is up, the ping service will recover, and
that recovery will trigger a host check whether or not your snmpd
service is running.
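
A minimal sketch of such a ping service, assuming Nagios 2.x object
syntax and a check_ping command defined as in the sample configuration
(taking warning and critical thresholds as $ARG1$ and $ARG2$). The host
name "myhost", the time period "24x7", the contact group "admins", and
all intervals and thresholds are placeholders, not values from the
original setup.

```
# Hypothetical ping service alongside the existing SNMP-based services.
define service{
        host_name               myhost          ; placeholder host name
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        check_period            24x7            ; assumed time period name
        notification_interval   60
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          admins          ; assumed contact group
        }
```

Since the hosts here already use check_icmp for host checks, a service
built on that plugin should work the same way, provided a matching
command definition exists.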

