Service hard state generation and host hard or soft down status

Andreas Ericsson ae at op5.se
Fri May 4 13:30:03 CEST 2012


On 05/04/2012 12:16 PM, Paul Ezvan wrote:
> Hi dear Nagios users,
> 
> I have a question about hard state generation.
> 
> According to the documentation, one of the conditions for creating a hard
> non-ok state for a service is a non-ok check result while the
> associated host is down. But it does not say whether the host must be in
> a HARD down state.
> 
> Nagios currently ignores whether the host is in a SOFT or HARD down
> state. For example:
> 
> [1336039429] INITIAL HOST STATE: ces;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.42 ms
> [1336039429] INITIAL SERVICE STATE: ces;SV-SE-Linux-Memoire;OK;HARD;1;OK: Memory Usage (W>  95): 12%Swap Usage (W>  95, C>  99): 0%
> [1336039429] INITIAL SERVICE STATE: ces;SV-SE-Linux-SWAP;OK;HARD;1;SWAP OK - 100% free (3999 MB out of 3999 MB)
> [1336039747] HOST ALERT: ces;DOWN;SOFT;1;CRITICAL - Host Unreachable (10.235.72.159)
> [1336039812] HOST ALERT: ces;DOWN;SOFT;2;CRITICAL - Host Unreachable (10.235.72.159)
> [1336039822] SERVICE ALERT: ces;SV-SE-Linux-SWAP;CRITICAL;HARD;1;Connection refused or timed out
> [1336039822] SERVICE ALERT: ces;SV-SE-Linux-Memoire;CRITICAL;HARD;1;Connection refused or timed out
> [1336039877] HOST ALERT: ces;DOWN;HARD;3;CRITICAL - Host Unreachable (10.235.72.159)
> [1336040122] SERVICE ALERT: ces;SV-SE-Linux-Memoire;CRITICAL;HARD;1;Connection refused or timed out
> [1336040122] SERVICE ALERT: ces;SV-SE-Linux-SWAP;CRITICAL;HARD;1;Connection refused or timed out
> 
> The associated services immediately get a HARD non-ok state even though
> the host is still in a SOFT down state.
> 
> In the Nagios code, I found this in the non-ok state processing logic
> in base/checks.c:
> 
> 		/* if the host is down or unreachable ... */
> 		/* 05/29/2007 NOTE: The host might be in a SOFT problem state due to
> 		 * host check retries/caching.  Not sure if we should take that into
> 		 * account and do something different or not... */
> 		if(route_result != HOST_UP) {
> 
> I think we should take into account the SOFT or HARD host state to
> ensure consistency between host and service hard/soft state.
> 
> Is my analysis correct?
> 

Yes.

> What is your view on the proposal above?
> 

That, from a practical perspective, Nagios is doing the Right Thing(tm)
already. Avoiding false positives is almost as important as catching
all true positives, and adding soft-state logic to this would mean we
send one false positive for each failing host that happens to have a
service-check occur after the soft state but before the hard state.
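
To make the difference concrete, here is a minimal sketch in C of the
two behaviors under discussion. It is not the code from base/checks.c:
the names (host_view, service_state_current, service_state_proposed)
are hypothetical, and the attempt counting is simplified to "a service
goes HARD once its attempt number reaches max_attempts".

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; not Nagios identifiers. */
enum state_type { STATE_SOFT, STATE_HARD };

struct host_view {
    bool up;                    /* false when the host is DOWN or UNREACHABLE */
    enum state_type state_type; /* SOFT while host check retries are pending */
};

/* Behavior described in this thread: a non-OK service result becomes
 * HARD as soon as the host is not UP, whether the host problem state
 * is SOFT or HARD. */
enum state_type service_state_current(const struct host_view *host,
                                      int attempt, int max_attempts)
{
    if (!host->up)
        return STATE_HARD;
    return attempt >= max_attempts ? STATE_HARD : STATE_SOFT;
}

/* Proposed variant: only a HARD host problem forces the service HARD;
 * otherwise the service goes through its own retries. */
enum state_type service_state_proposed(const struct host_view *host,
                                       int attempt, int max_attempts)
{
    if (!host->up && host->state_type == STATE_HARD)
        return STATE_HARD;
    return attempt >= max_attempts ? STATE_HARD : STATE_SOFT;
}
```

For a host that is DOWN but still SOFT and a service on attempt 1 of 3,
the first function returns STATE_HARD, which matches the log quoted
above, while the second returns STATE_SOFT.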

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
