Getting Nagios to "go active"

Andreas Ericsson ae at op5.se
Tue Jun 30 09:00:09 CEST 2009


David Krider wrote:
> I've gotten a second Nagios server set up to work as a failover for a
> primary server. I think I've been thorough. The secondary server is
> successfully receiving and processing both passive host and passive
> service checks. Notifications and both kinds of active checks are turned
> off. When I stop the Nagios process on the primary server, the secondary
> fails the freshness check of the primary instance check. However,
> nothing happens after this, and it seems that I have 2 problems.
> 
> 1) Even though I have max_check_attempts set to 1 on the master server's
> "check_nagios" check, it just continues to force active checks when the
> freshness times out. I expect it to fail hard, and stop checking the
> freshness. Maybe I'm wrong, though. Maybe the expected behavior here is
> to get the active host check going, and then the freshness will stop
> complaining.
> 
> 2) All of my scripts seem to be lined up. The event handler fires, and I
> see proper things in the nagios.cmd file.
> 
> [1246294798] ENABLE_NOTIFICATIONS
> [1246294798] START_EXECUTING_SVC_CHECKS
> [1246294820] START_EXECUTING_HOST_CHECKS
> 
> I know the command file is being processed because I can get the
> secondary server to force checks from the CGIs. However, none of these
> commands ever work, whether I force them from the command line
> or from the CGIs. What could be keeping these from taking effect? I've
> been all over this thing for a couple days now, and I think my eyes are
> starting to glaze over.
> 
> The only thing I can think of would be to enable all of these things in
> the master config file, but then immediately force them "off" when I
> start up the process. Then maybe it will work to turn them back on
> later? That can't be right...
> 
> Desperately,
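
For reference, the three lines quoted above follow the external command
file format, "[epoch-seconds] COMMAND_NAME", with any arguments appended
after semicolons. Below is a minimal Python sketch of building and
writing such lines. The helper names (format_external_command, submit)
are made up for illustration, and the command file path in the usage
note is only an assumption; use whatever command_file is set to in your
nagios.cfg. It only shows the line format and does not by itself explain
why the commands have no effect.

```python
import time

# The three commands the event handler in the quoted message submits.
FAILOVER_COMMANDS = (
    "ENABLE_NOTIFICATIONS",
    "START_EXECUTING_SVC_CHECKS",
    "START_EXECUTING_HOST_CHECKS",
)


def format_external_command(command, *args, now=None):
    """Build one command-file line: "[epoch] COMMAND;arg1;arg2" plus newline.

    `now` lets a caller pin the timestamp; by default the current time is used.
    """
    timestamp = int(time.time() if now is None else now)
    fields = ";".join([command, *(str(arg) for arg in args)])
    return f"[{timestamp}] {fields}\n"


def submit(command_file, command, *args):
    """Write a single command line to the command file.

    The real command file is a named pipe, so this open() call blocks
    until a reader (the Nagios daemon) has the other end open.
    """
    with open(command_file, "w") as pipe:
        pipe.write(format_external_command(command, *args))
```

With the path assumed to be /usr/local/nagios/var/rw/nagios.cmd, an event
handler could call submit(path, name) for each name in FAILOVER_COMMANDS.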

Use Merlin. It was designed for setting up redundant and load-balanced
systems, and it transfers check results between your two Nagios instances
seamlessly. Check takeover happens automagically too: both servers try to
schedule each check, and whichever instance executes it first keeps
executing it until its latency reaches 15 seconds. At that point the
server that didn't originally execute the check takes it over, because
the check is already in its scheduling queue at the right moment.
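
As a toy model of that takeover rule (not Merlin's actual code, just the
rule as described above restated in Python), the decision could be
sketched like this. The function name and the instance labels are
hypothetical; only the 15-second threshold comes from the description.

```python
TAKEOVER_LATENCY = 15.0  # seconds; the threshold mentioned above


def next_executor(current, peer, latency, threshold=TAKEOVER_LATENCY):
    """Return the instance that should run the check next.

    The instance that ran the check last keeps it while its latency stays
    below the threshold; once the latency reaches the threshold, the peer,
    which has the same check in its own scheduling queue, takes it over.
    """
    return peer if latency >= threshold else current
```

For example, next_executor("primary", "secondary", 2.5) keeps the check
on "primary", while a latency of 15 seconds or more hands it to
"secondary".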

You can find merlin at http://git.op5.org/git/nagios/merlin.git. The
project page is at http://www.op5.org/community/projects/merlin

HTH

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




