Nagios v3.5.0 transitioning immediately to a HARD state upon host problem

C. Bensend benny at bennyvision.com
Sat May 25 16:11:56 CEST 2013


> On 2013-05-23 17:43, C. Bensend wrote:
>>
>> Hey folks,
>>
>>     I recently made two major changes to my Nagios environment:
>>
>> 1) I upgraded to v3.5.0.
>> 2) I moved from a single server to two pollers sending passive
>>     results to one central console server.
>>
>>     Now, this new distributed system was in place for several months
>> while I tested, and it worked fine.  HOWEVER, since this was running
>> in parallel with my production system, notifications were disabled.
>> Hence, I didn't see this problem until I cut over for real and
>> enabled notifications.
>>
>> (please excuse any cut-n-paste ugliness, had to send this info from
>> my work account via Outlook and then try to cleanse and reformat
>> via Squirrelmail)
>>
>>     As a test and to capture information, I rebooted 'hostname'.  This
>> log is from the nagios-console host, which is the host that accepts
>> the passive check results and sends notifications.  Here is the
>> console host receiving a service check failure when the host is
>> restarting:
>>
>> May 22 15:57:10 nagios-console nagios: SERVICE ALERT: hostname;/var disk queue;CRITICAL;SOFT;1;Connection refused by host
>>
>>
>> So, the distributed poller system checks the host and sends its
>> results to the console server:
>>
>> May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;SOFT;1;CRITICAL - Host Unreachable (a.b.c.d)
>>
>>
>> And then the centralized server IMMEDIATELY goes into a hard state,
>> which triggers a notification:
>>
>> May 22 15:57:30 nagios-console nagios: HOST ALERT: hostname;DOWN;HARD;1;CRITICAL - Host Unreachable (a.b.c.d)
>> May 22 15:57:30 nagios-console nagios: HOST NOTIFICATION: cbensend;hostname;DOWN;host-notify-by-email-test;CRITICAL - Host Unreachable (a.b.c.d)
>>
>>
>>     Um.  Wat?  Why would the console immediately trigger a hard
>> state? The config files don't support this decision.  And this
>> IS a problem with the console server - the distributed monitors
>> continue checking the host 6 times, as they should.  But
>> for some reason, the centralized console just immediately
>> calls it a hard state.
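
The message doesn't say how the two pollers forward their results to the console (NSCA is one common option). Whatever the transport, a passive host result can be handed to Nagios as a `PROCESS_HOST_CHECK_RESULT` external command, where the host status codes are 0 = UP, 1 = DOWN and 2 = UNREACHABLE. The minimal Python sketch below only formats that line; the `format_host_result` and `submit` helpers are hypothetical, and the command-file path you would pass to `submit` is an assumption (check `command_file` in the console's nagios.cfg).

```python
import time
from typing import Optional

# Status codes Nagios expects for passive *host* results
# (service results use 0-3 for OK/WARNING/CRITICAL/UNKNOWN instead).
HOST_STATUS = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}


def format_host_result(host_name: str, state: str, output: str,
                       timestamp: Optional[int] = None) -> str:
    """Build one PROCESS_HOST_CHECK_RESULT external command line.

    Hypothetical helper: it only formats the text a poller would hand
    to the central server; it does not talk to Nagios itself.
    """
    if ";" in host_name or "\n" in host_name + output:
        # Semicolons separate fields and a newline ends the command.
        raise ValueError("host name or output would break the command")
    ts = int(time.time()) if timestamp is None else timestamp
    return (f"[{ts}] PROCESS_HOST_CHECK_RESULT;"
            f"{host_name};{HOST_STATUS[state]};{output}\n")


def submit(command_file: str, line: str) -> None:
    """Append the command to Nagios's command file (a named pipe).

    The path is an assumption; use the command_file setting from
    nagios.cfg on the central server.
    """
    with open(command_file, "a") as fh:
        fh.write(line)


if __name__ == "__main__":
    print(format_host_result("hostname", "DOWN",
                             "CRITICAL - Host Unreachable (a.b.c.d)",
                             timestamp=1369231050), end="")
```

The console then applies its own state logic to every line it receives, which is why the relevant setting has to live on the console (master) server rather than on the pollers.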

*snip*

>
>
> Set passive_host_checks_are_soft=1 in nagios.cfg on your master
> server and things should start working as intended.
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se

Oh lord, THANK YOU.  That appears to have fixed that problem, which
was a pain in the ass.  In my defense, I *did* see that option, but
the way I interpreted the comments didn't quite match up with the
behavior I was seeing.  I should have experimented with it, I guess.
A slight adjustment to the comments would have raised a red flag for
me - perhaps this is just a matter of personal interpretation, but
maybe the comments could be a bit more specific:


diff -uNp nagios.cfg nagios-updated.cfg
--- nagios.cfg  Sat May 25 09:02:37 2013
+++ nagios-updated.cfg  Sat May 25 09:05:09 2013
@@ -981,9 +981,9 @@ translate_passive_host_checks=0

 # PASSIVE HOST CHECKS ARE SOFT OPTION
 # This determines whether or not Nagios will treat passive host
-# checks as being HARD or SOFT.  By default, a passive host check
-# result will put a host into a HARD state type.  This can be changed
-# by enabling this option.
+# checks as being HARD or SOFT.  By default, a single passive host
+# check result will put a host into an immediate HARD state type.
+# This can be changed by enabling this option.
 # Values: 0 = passive checks are HARD, 1 = passive checks are SOFT

 passive_host_checks_are_soft=0


Does that make sense?  If I had read something like that, it would
have been immediately clear to me what was happening.
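
To make the difference concrete, here is a deliberately simplified Python model of the two settings, assuming max_check_attempts is 6 (matching the six checks the pollers make). It is not Nagios's actual code: flapping, scheduling and any active checks the console might run on its own are left out, and the `HostModel` class and `run` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostModel:
    """Hypothetical, simplified model of a Nagios host's state type.

    Not Nagios's implementation: it ignores flapping, scheduling and
    any active checks the central server might run on its own.
    """
    max_check_attempts: int
    passive_checks_are_soft: bool = False  # passive_host_checks_are_soft
    state: str = "UP"
    state_type: str = "HARD"
    current_attempt: int = 1

    def process_result(self, new_state: str, passive: bool) -> str:
        if new_state == "UP":
            # Recovery: back to a HARD UP state with the counter reset.
            self.state, self.state_type, self.current_attempt = "UP", "HARD", 1
        elif passive and not self.passive_checks_are_soft:
            # Default (passive_host_checks_are_soft=0): a passive problem
            # result becomes HARD at once, with no retries counted.
            if self.state == "UP":
                self.current_attempt = 1
            self.state, self.state_type = new_state, "HARD"
        elif self.state_type == "HARD" and self.state != "UP":
            # Already in a HARD problem state; it stays HARD.
            self.state = new_state
        else:
            # Retry logic: count problem results up to max_check_attempts.
            if self.state == "UP":
                self.current_attempt = 1
            else:
                self.current_attempt += 1
            self.state = new_state
            if self.current_attempt >= self.max_check_attempts:
                self.state_type = "HARD"
            else:
                self.state_type = "SOFT"
        # Same field order as the state;type;attempt part of a HOST ALERT.
        return f"{self.state};{self.state_type};{self.current_attempt}"


def run(passive_soft: bool, results: int = 6) -> list[str]:
    """Feed the model a series of passive DOWN results from a poller."""
    host = HostModel(max_check_attempts=6, passive_checks_are_soft=passive_soft)
    return [host.process_result("DOWN", passive=True) for _ in range(results)]


if __name__ == "__main__":
    print("passive_host_checks_are_soft=0:", run(False))
    print("passive_host_checks_are_soft=1:", run(True))
```

With the default setting every passive DOWN result comes back as `DOWN;HARD;1`, the same HARD, attempt 1 combination seen in the console log above; with the option enabled, the same six results step through SOFT attempts 1 to 5 before going HARD on the sixth.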

Thank you so much, Andreas!  On to the next problem with the
upgrade (something that can wait until next week)...

Benny


-- 
"The very existence of flamethrowers proves that sometime, somewhere,
someone said to themselves, 'You know, I want to set those people
over there on fire, but I'm just not close enough to get the job
done.'"                          -- George Carlin


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users