HOST DOWN notification not getting resent

Andreas Ericsson ae at op5.se
Fri Aug 27 08:46:56 CEST 2004


Quanah Gibson-Mount wrote:
>> 6. Have you tried running Nagios as a foreground process while producing
>> errors like this in the configuration?
> 
> I'm not quite sure what you mean here.  We always check Nagios through 
> "-v" before we apply our configuration, and our script that applies our 
> configuration won't let you install a bad configuration.  So I'm not 
> sure what "errors like this in the configuration" you are referring to?
> 

If you start nagios without -d it runs as a foreground process and 
prints a lot of messages to stdout. You should try running it like this:
nagios nagios.cfg 2>&1 | tee nagios.output
(this sends both stdout and stderr to the terminal and to the file 
named nagios.output).

When I say 'errors like this', I mean you should set up a host that 
doesn't exist and watch it fail. You can do this with a separate 
instance that has its own copy of the configuration files, and merrily 
run it as a foreground process on the same system without impacting 
your production server in the slightest.

I would recommend configuring only one host in total, with just one 
service (any service, really), and watching it fail. Set interval_length 
(the timing unit in nagios.cfg) to 30 and the host's 
notification_interval to 1. Then you should get a second notification 
going out 30 seconds after the first one. If you don't, kill the process 
and go through the output to see what Nagios is doing. You might want to 
recompile with debugging support, so it can tell you a bit more 
precisely.
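
For going through the captured output, here is a rough sketch that 
counts host notifications per host. It assumes notifications show up as 
lines containing "HOST NOTIFICATION:" followed by semicolon-separated 
fields, which is how they look in nagios.log; check that against your 
own output first. The function name is made up for illustration.

```shell
#!/bin/sh
# Count host notifications per host in a captured output or log file.
# ASSUMPTION: each notification appears as a line of the form
#   [timestamp] HOST NOTIFICATION: contact;host;state;command;output
# Adjust the pattern if your output looks different.
count_host_notifications() {
    file="$1"
    # Keep only notification lines, strip everything up to the marker,
    # then tally the second semicolon-separated field (the host name).
    grep 'HOST NOTIFICATION:' "$file" \
        | sed 's/^.*HOST NOTIFICATION: *//' \
        | cut -d';' -f2 \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

Run it as "count_host_notifications nagios.output". A host that stays 
at a count of 1 while it is still down is the case we're chasing.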

> 
>> 8. What's the normal load on the machine you're running Nagios on?
> 
> 3-4 in the Solaris world.  Note again that all service checks work just 
> fine at this load level.
> 

That's a bit steep, and should cause Nagios to lag behind. It should be 
unrelated though, as you've pointed out already.

>> 9. Are you using the default notification commands, or have you written
>> your own ones? If so, do they adhere to the NOTIFICATIONNUMBER macro?
> 
> 
> I'm using the default notification command that came with Nagios.
> 

Try writing your own (it's really easy), and have it write its own very 
simple log file of what it's being asked to do. Also include the 
notification number in there.
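
A minimal sketch of such a command, assuming it gets the notification 
number, host name and host state as its three arguments. The script 
name, the argument order, the NOTIFY_LOG variable and the default log 
path are all made up for illustration; only the macro names are 
Nagios's own.

```shell
#!/bin/sh
# Minimal notification logger: appends one line per invocation.
# ASSUMPTION: Nagios calls it via a command definition along the lines of
#   command_line /path/to/notify-log.sh "$NOTIFICATIONNUMBER$" "$HOSTNAME$" "$HOSTSTATE$"
# The script path, argument order and log location are hypothetical.
log_notification() {
    number="$1"
    host="$2"
    state="$3"
    # NOTIFY_LOG is a made-up override for the log location.
    logfile="${NOTIFY_LOG:-/tmp/notify-test.log}"
    printf '%s number=%s host=%s state=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$number" "$host" "$state" >> "$logfile"
}

# Only log when called with the three expected arguments.
if [ "$#" -ge 3 ]; then
    log_notification "$@"
fi
```

If the log only ever shows number=1 for a host that stays down, Nagios 
never asked for a second notification, and the mail side is off the 
hook.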

>> 13. If you're still out of luck then set up the simplest possible
>> configuration (one host that you can bring up and down at will), and make
>> sure several notifications go out before you move to more advanced
>> configuration. Make a host-template that you KNOW works with this, and
>> use it for all hosts you want to resend notifications with.
> 
> I'll do that as soon as I have a secondary host to fiddle with the 
> configurations on.  I can't just take out our production monitoring 
> service. ;)
> 

You can run it perfectly fine on the same host if you like. Just set up 
a different directory with configuration files, and make sure the test 
one doesn't overwrite the production one's logs.
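
One way to double-check that, sketched under the assumption that the 
test instance's main config uses plain key=value lines with absolute 
paths (as nagios.cfg does): list every setting that points outside the 
test directory. The function name and the directory layout are made up 
for illustration.

```shell
#!/bin/sh
# Print every key=/absolute/path line in a main config file whose path
# lies outside the given test directory; return non-zero if any exist.
# ASSUMPTION: settings are written as key=value, one per line.
check_isolated() {
    cfg="$1"
    test_dir="$2"
    # Trailing slash so /x/test does not also match /x/test2.
    awk -F= -v d="${test_dir%/}/" '
        /^[a-z_]+=\// && index($2, d) != 1 { print; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$cfg"
}
```

Anything it prints deserves a look. Paths that are only read (plugins, 
for instance) are harmless, but log, status and command files should 
all live under the test directory.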

>> 17. If the problems still persist, buy 3 hours of support from someone,
>> and send them your configuration in a gzipped tarball.
> 
> 
> No thanks.
> 
> Before even implementing Nagios here at Stanford, I read through the 
> configuration files & played with the setup for a few weeks.  Then we 
> implemented it, and pushed it out.  The configuration pieces are rather 
> simple, and the documentation was quite thorough.  I'm not some 2-bit 
> hack who has problems understanding command prompts, etc.

I've noticed. You seem far too intelligent to have stayed on the asking 
end of this list for as long as you have.

There is however another arrangement that wouldn't cost either you or 
Stanford anything at all, and could possibly gain some goodwill for the 
school as well.
There's a program called fping. It was written by someone at Stanford, 
and was once, and possibly still is, licensed under the Stanford General 
Public Software License (I might be off on the name, but it was 
something along those lines). I hacked this program to pieces to 
produce a very fast and efficient version of check_ping (it's called 
check_icmp so as to not confuse anyone) that doesn't rely on the output 
of whatever ping happens to be installed on the system it runs on, so 
we could most likely cut the noise a bit in plugin-devel if this got 
included.

However, the check_icmp plugin can't be included in the standard plugin 
distribution until we've cleared the licensing issues. If you could 
ask one of the lawyer/licensing folks over at Stanford to have a look at 
how fping is currently licensed (it might be in the public domain or 
still under the Stanford license; no one knows for sure, and the current 
maintainer seems to be off on a very long email-less vacation), I'd be 
more than willing to run your configuration on a variety of test hosts 
available at our software lab.

> I've been administering UNIX based systems & applications for over 10
> years.  I've yet to see anyone be able to find anything in our configuration
> that explains Nagios' behavior.  Personally, I think it is a bug in Nagios 
> running under Solaris, and I've yet to see anything that contradicts 
> that assumption at all.  We will be moving our Nagios service onto 
> Debian soon, and I'm most curious to see if the problem disappears at 
> that time.  If it does, then at least I'll be able to point at the root 
> cause.
> 

If you send me your configuration, I'd be able to tell you if running 
Solaris or Linux makes a difference (although I must admit I've never 
run into this problem on any of our (Linux-based) installations).

> --Quanah
> 

Oh.. Almost forgot. I don't remember seeing this, but I'm sure you've 
already checked that you're running the latest stable release anyway.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




