Notifications or host checks stopped working

Andrew Laden Andrew.Laden at tudor.com
Wed Oct 19 15:09:31 CEST 2005


I was thinking about it a bit more, and yup, it's hard to think of a use for
a notification interval with no notification, other than the first (which is
where I am having the problem, of course), that can't be done with suitable
settings in escalations, as long as you have the notification interval in
the host or service definition set higher than any escalation interval.
The exception is when you are dealing with timeperiods as well, e.g. an
escalation that is only valid during certain time periods (only escalate to
the operations center if it is running hours). In that case, you could have
an escalation defined with a valid group, but the filters kick in at the
contactgroup settings and stop the notification. I'm not sure how Nagios
would behave there, or whether it would increment the Notification Number.
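
To make that question concrete, here is a toy model in Python of the
behaviour reported further down in this thread. It is not Nagios source
code: it assumes the notification number only advances when at least one
contact is actually notified, and the escalation tuple format and the
`increment_when_empty` switch are hypothetical. It also deliberately leaves
out Nagios's fallback to the host's normal contact groups. A round whose
contact list ends up empty (no contacts left, or all filtered out by a
timeperiod) stalls the counter, so the later escalation is never reached.

```python
from typing import List, Tuple

# Hypothetical escalation entry: (first_notification, last_notification, contacts).
# last_notification == 0 means "no upper bound", as in Nagios escalation definitions.
Escalation = Tuple[int, int, List[str]]


def contacts_for(number: int, escalations: List[Escalation]) -> List[str]:
    """Collect the contacts of every escalation that covers this notification number."""
    contacts: List[str] = []
    for first, last, names in escalations:
        if number >= first and (last == 0 or number <= last):
            contacts.extend(names)
    return contacts


def simulate(escalations: List[Escalation], rounds: int,
             increment_when_empty: bool) -> List[List[str]]:
    """Return the contacts notified in each notification round.

    increment_when_empty=False models the reported behaviour: the counter only
    advances when somebody is actually notified, so an empty round stalls it.
    """
    number = 0
    sent: List[List[str]] = []
    for _ in range(rounds):
        candidate = number + 1
        contacts = contacts_for(candidate, escalations)
        if contacts or increment_when_empty:
            number = candidate
        sent.append(contacts)
    return sent


# Round 1 has no contacts left; the operations contact only starts at round 2.
escalations = [(1, 1, []), (2, 0, ["ops"])]
print(simulate(escalations, 3, increment_when_empty=False))  # [[], [], []]
print(simulate(escalations, 3, increment_when_empty=True))   # [[], ['ops'], ['ops']]

# The dummy-contact workaround: a no-op contact keeps round 1 non-empty.
workaround = [(1, 1, ["dummy"]), (2, 0, ["ops"])]
print(simulate(workaround, 3, increment_when_empty=False))   # [['dummy'], ['ops'], ['ops']]
```

In this model the dummy contact works around the stall simply because it
makes the first round non-empty, which matches what was observed.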

As for the second issue: yup, a host that was unreachable becomes reachable
when it gets an OK service. There is no way to set it back to unreachable if
the service that was OK becomes critical again, though. Would it be better
logic to say that if a host is down and its parents are down, then it should
be unreachable, regardless of its services' status? Don't know. Just a
question. Sometimes you can have services that are out of band that may be
monitored. For example, I want to monitor the console connection to the
host, which is out of band. That connection may stay up even though the
network goes down and all meaningful services go down. I would want the host
to be unreachable even though the console service may still be up; otherwise
I will get lots of notifications that I don't need.
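
The two rules can be compared in a toy Python model. This is not how Nagios
implements reachability: the `Host` fields and both functions are
hypothetical. `state_as_described` simply encodes Andreas's statement in this
thread that hosts with OK services are never unreachable, and
`state_as_proposed` encodes the question above (failed host check plus all
parents down gives UNREACHABLE, whatever the services report).

```python
from dataclasses import dataclass, field
from typing import List

UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


@dataclass
class Host:
    name: str
    check_ok: bool                                   # result of the host's own check
    parents: List["Host"] = field(default_factory=list)
    any_service_ok: bool = False                     # e.g. an out-of-band console service


def parents_all_down(host: Host, state_of) -> bool:
    """True if the host has parents and none of them is UP under the given rule."""
    return bool(host.parents) and all(state_of(p) != UP for p in host.parents)


def state_as_described(host: Host) -> str:
    """Rule as described in the thread: a host with an OK service is never UNREACHABLE."""
    if host.check_ok:
        return UP
    if host.any_service_ok:
        return DOWN
    return UNREACHABLE if parents_all_down(host, state_as_described) else DOWN


def state_as_proposed(host: Host) -> str:
    """Proposed rule: failed check and all parents down means UNREACHABLE."""
    if host.check_ok:
        return UP
    return UNREACHABLE if parents_all_down(host, state_as_proposed) else DOWN


router = Host("router", check_ok=False)
server = Host("server", check_ok=False, parents=[router], any_service_ok=True)
print(state_as_described(server))  # DOWN
print(state_as_proposed(server))   # UNREACHABLE
```

With the out-of-band console service still OK, the first rule reports the
server as DOWN (and notifies), while the proposed rule keeps it UNREACHABLE
behind the dead router.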


-----Original Message-----
From: Andreas Ericsson [mailto:ae at op5.se] 
Sent: Tuesday, October 18, 2005 5:43 PM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Notifications or host checks stopped working

Andrew Laden wrote:
> Ran a few more tests, and it seems that escalations were the cause of
> the notification issue.
> 
> If you use escalations, and you configure them such that you do not have 
> any escalations in the 1st notification interval, nagios assumes there 
> are no notifications to be sent, and never increments the Notification 
> Number, and never runs through the rest of the notifications. I didn't 
> test further, but I suspect if you ever have a level with no 
> notifications, it will not continue.
> 
> I had one user left in the 1st notification interval, and he was 
> removed this morning. To workaround, I created a dummy user, with a 
> no-op notification command, and put him (her? it?) in the 1st round.
> Host notifications immediately started working again.
> 
> I'd consider this a design bug, I can see many uses for notification 
> intervals with no notification.
> 

It's more likely just a common everyday kind of bug. I don't really see any
uses for notification intervals with no notifications though, unless you're
talking about any notification but the first.

> Still have the issue with an unreachable host being marked as down, 
> but as that was caused by a buggy service check reporting OK for an 
> unreachable host, I am not going to spend a lot of time on that.
> 

Hosts with OK services are never unreachable, insofar as Nagios is
concerned. I remember a discussion about that exact thing quite some time
ago.

> 
> -----Original Message-----
> From: Andreas Ericsson [mailto:ae at op5.se]
> Sent: Tuesday, October 18, 2005 2:30 PM
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Notifications or host checks stopped working
> 
> Andrew Laden wrote:
> 
>>I just recently upgraded to 2.0b4.
> 
> 
> From?
> 
> 
>>Notifications were working ok when I
>>first upgraded. 
>>
> 
> 
> Not from 1.x then, since the macros have changed between the versions.
> 
> 
>>Our company is having a DR test. So we shut down the routers 
>>connecting one of our sites.
>>
>>The GUI mostly looks correct. The two routers are listed in Network
>>Outages, and it seems that the hosts that are children of those
>>routers are all being marked as unreachable instead of down.
>>
>>But I am seeing some oddities. It looks like host checks are no longer 
>>being scheduled at all. I have host escalations in place, and there 
>>are no notifications going out on the two down routers. Current 
>>Notification Number isn't increasing. They are in a Down Hard state, 
>>but current attempt is stuck at a 1/5 count.
>>
> 
> 
> Are they behind the outage, or are they the ones causing the outage?
> 
> 
>>So, questions
>>Is there a way to tell if host checks are being run?
> 
> 
> Yes, by checking the status data age on the host detail view.
> 
> 
>>They aren't in the
>>scheduled queue. I set one of the down routers to up using a passive
>>check. And it looks like even when the service for it went down, the host
>>check never ran. Though when I forced the check, it ran OK.
>>
> 
> 
> This is weird. I expect you've double-checked check_period for the 
> host definitions?
> 
> 
>>I had a host that was in an unreachable state. I ran a service check
>>for that host that succeeded. The host went into a down state. But
>>again, no further host checks seem to have been run. And no 
>>notifications have been sent out.
>>
>>Any ideas where I can look for problems?
>>
> 
> 
> You could try recompiling Nagios with debug output enabled
> (run ./configure --help to see which debug options to enable) and then
> run the same scenario while running nagios in the foreground. This 
> will produce quite a bit of output, so you'll likely want to pipe it 
> through tee for later perusal as well.
> 
> Please don't post the debug output to the list, though. If you need
> help with viewing it, you can put it on a web page somewhere and then
> submit a link.
> SourceForge is quite busy enough without hauling 5 MB files to 6000
> subscribers.
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

