Notifications or host checks stopped working

Andrew Laden Andrew.Laden at tudor.com
Tue Oct 18 21:48:18 CEST 2005


I ran a few more tests, and it seems that the escalation configuration was
the cause of the notification problem.

If you use escalations and configure them so that no escalation covers the
1st notification interval, Nagios assumes there are no notifications to be
sent, never increments the Notification Number, and never runs through the
rest of the notifications. I didn't test further, but I suspect that any
level with no notifications will stop the sequence in the same way.

I had one user left in the 1st notification interval, and he was removed
this morning. As a workaround, I created a dummy user with a no-op
notification command and put him (her? it?) in the 1st round. Host
notifications immediately started working again.
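
For readers who want to reproduce the workaround, it could look roughly like
the fragment below. This is only a sketch under assumptions, not the
configuration actually used: the names noop-notify, dummy-contact,
dummy-group and router1 are hypothetical, the 24x7 timeperiod is assumed to
be defined elsewhere, /bin/true may live at a different path on your system,
and the interval value is a placeholder. It also assumes the placeholder is
attached through an escalation covering notification 1; it could equally be
added to the host's own contact group.

```
# Hypothetical names throughout; adjust to your own configuration.

# A notification command that does nothing and always succeeds.
define command{
        command_name    noop-notify
        command_line    /bin/true
        }

# Placeholder contact whose notifications go nowhere.
define contact{
        contact_name                    dummy-contact
        alias                           Placeholder contact
        host_notification_period        24x7
        service_notification_period     24x7
        host_notification_options       d,u,r
        service_notification_options    w,u,c,r
        host_notification_commands      noop-notify
        service_notification_commands   noop-notify
        }

define contactgroup{
        contactgroup_name       dummy-group
        alias                   Placeholder group
        members                 dummy-contact
        }

# Ensure the 1st notification has at least one contact, so the
# notification number keeps incrementing and later escalations fire.
define hostescalation{
        host_name               router1
        contact_groups          dummy-group
        first_notification      1
        last_notification       1
        notification_interval   30
        }
```

The point of the design is that the 1st notification always has a
recipient, even though that recipient never receives anything.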

I'd consider this a design bug; I can see many uses for notification
intervals with no notification.

Still have the issue with an unreachable host being marked as down, but as
that was caused by a buggy service check reporting OK for an unreachable
host, I am not going to spend a lot of time on that.


-----Original Message-----
From: Andreas Ericsson [mailto:ae at op5.se] 
Sent: Tuesday, October 18, 2005 2:30 PM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Notifications or host checks stopped working

Andrew Laden wrote:
> I just recently upgraded to 2.0b4.

From?

> Notifications were working ok when I
> first upgraded. 
> 

Not from 1.x then, since the macros have changed between the versions.

> Our company is having a DR test. So we shut down the routers 
> connecting one of our sites.
> 
> The GUI shows mostly correct. The two routers are listed in Network 
> outages, And it seems that the hosts that are children of those 
> routers are all being marked as unreachable instead of down.
> 
> But I am seeing some oddities. It looks like host checks are no longer 
> being scheduled at all. I have host escalations in place, and there 
> are no notifications going out on the two down routers. Current 
> Notification Number isn't increasing. They are in a Down Hard state, 
> but current attempt is stuck at a 1/5 count.
> 

Are they behind the outage, or are they the ones causing the outage?

> 
> So, questions
> Is there a way to tell if host checks are being run?

Yes. By the status data age on the host detail view.

> They aren't in the
> scheduled queue. I set one of the down routers to up using a passive
> check.
> And it looks like even when the service for it went down, the host 
> check never ran. Though when I forced the check, it ran ok.
> 

This is weird. I expect you've double-checked check_period for the host
definitions?

> I had a host that was in an unreachable state. I ran a service check 
> for that host that succeeded. The host went into a down state. But 
> again, no further host checks seem to have been run. And no 
> notifications have been sent out.
> 
> Any ideas where I can look for problems?
> 

You could try re-compiling Nagios with debug-output enabled (./configure
--help to know which debug-options to enable) and then run the same scenario
while running nagios in the foreground. This will produce quite a bit of
output, so you'll likely want to pipe it through tee for later perusal as
well.

Please don't post the debug output to the list though. If you need help with
viewing it you can put it on a web-page somewhere and then submit a link.
Sourceforge is quite busy enough without hauling 5 MB files to 6000
subscribers.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


-------------------------------------------------------
This SF.Net email is sponsored by:
Power Architecture Resource Center: Free content, downloads, discussions,
and more. http://solutions.newsforge.com/ibmarch.tmpl
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null


