Way to replicate external commands to failover server?

Mark Wagner markwag at u.washington.edu
Wed May 14 02:54:17 CEST 2008


> On Mon, May 12, 2008 at 12:11:44PM -0700, nagios-users-request at lists.sourceforge.net wrote:

> What has me a little concerned is that if someone went into the web 
> interface on the main server and say scheduled downtime or disabled 
> notifications, the backup server would never know about it.  In the event
> of failure people could find themselves getting alerts for a host that
> should have been in scheduled downtime (or it was on the main server).

> While I realize I would not want to capture and retransmit *all* 
> external commands to the backup host, if I could somehow get at them I 
> could filter them over to the backup host (i.e. "ignore most commands, 
> but pass a few like downtime or host notifications", etc).
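
A rough sketch of the filtering half, in Python. It assumes you have
already captured the commands as lines in the external command file format
("[timestamp] COMMAND;args"); how you capture them is left open here. The
allowlist and the function names are my own picks for illustration, not
anything Nagios ships.

```python
import re

# Assumed allowlist: the few commands worth mirroring to the backup host.
RELAY_COMMANDS = {
    "SCHEDULE_HOST_DOWNTIME",
    "SCHEDULE_SVC_DOWNTIME",
    "DISABLE_HOST_NOTIFICATIONS",
    "ENABLE_HOST_NOTIFICATIONS",
    "DISABLE_SVC_NOTIFICATIONS",
    "ENABLE_SVC_NOTIFICATIONS",
}

# External command lines look like: [<unix time>] COMMAND_NAME;arg1;arg2;...
COMMAND_LINE = re.compile(r"^\[(\d+)\] ([A-Z_]+)(?:;(.*))?$")


def should_relay(line):
    """True if the line is a well-formed command whose name is allowlisted."""
    match = COMMAND_LINE.match(line.strip())
    return bool(match) and match.group(2) in RELAY_COMMANDS


def filter_commands(lines):
    """Keep only the command lines that should go to the backup host."""
    return [line.strip() for line in lines if should_relay(line)]
```

Everything not on the list (passive check results, forced checks and so
on) is dropped, which matches the "ignore most commands" idea.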

While I'm at it, here are more wrinkles.

Suppose your main and backup web servers are truly passive. What happens
when somebody runs the SCHEDULE_FORCED_SVC_CHECK command from the web
interface? Nothing, unless you relay this command to the Nagios box that
actually does the checking (i.e. the collector).
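
To make "relay" concrete, here is a minimal Python sketch of what has to
end up in the collector's command file. The helper names are made up, and
getting the line from the web server to the collector (ssh, NSCA-style
relay, whatever) is deliberately left out.

```python
import time


def forced_svc_check_line(host, service, check_time=None, now=None):
    """Format a SCHEDULE_FORCED_SVC_CHECK line for the external command file."""
    now = int(time.time()) if now is None else int(now)
    check_time = now if check_time is None else int(check_time)
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{check_time}"


def submit(command_file, line):
    """Write one command line to the given command file.

    On a collector this would be the Nagios command file (a named pipe);
    the path is whatever command_file is set to in your nagios.cfg.
    """
    with open(command_file, "w") as handle:
        handle.write(line + "\n")
```

The point is only that the command must be written on the box that runs
the check; writing it on a passive web server changes nothing.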

Now, you may not care about this. Perhaps you are thinking "I'll just set my
retry_interval to 1 minute for everything", but then you have constrained
yourself. We use Nagios to check SSL certs with a day-long interval
for checking and retrying. In that case setting the retry interval to 1 minute
doesn't hurt, but there will come a day when you have a service that requires
a different retry_interval.

Alternatively you can educate the ops people that some things in the web
interface don't work (SCHEDULE_FORCED_SVC_CHECK). Educating ops is hard,
especially when commands are presented that actually do nothing.

What happens when Nagios is beating up a box and you want to
DISABLE_HOST_SVC_CHECKS? Again, you'll need to relay this to the
collector.

Possibly you just tell the ops people to do these things through the
web interface on the collector. I think that would lead to
confusion. Even assuming you have top-notch ops who won't get confused,
you still have to run Apache on the collector and manage users on it.

In our simple redundant collector config there are at least two collectors
checking each service. Now you have to do the above twice, after you
have figured out which collectors are checking a service.
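
The bookkeeping looks roughly like the Python sketch below. The assignment
table is hypothetical; Nagios 2.x does not keep one for you, so you would
have to build it from your own collector configs.

```python
import time

# Hypothetical table: (host, service) -> collectors that check it.
ASSIGNMENTS = {
    ("web1", "HTTP"): ["collector1", "collector2"],
    ("web1", "SSL cert"): ["collector2", "collector3"],
    ("db1", "MySQL"): ["collector1", "collector3"],
}


def collectors_for_host(host, assignments=ASSIGNMENTS):
    """Every collector that checks at least one service on the host."""
    found = set()
    for (assigned_host, _service), collectors in assignments.items():
        if assigned_host == host:
            found.update(collectors)
    return sorted(found)


def disable_host_svc_checks_plan(host, now=None, assignments=ASSIGNMENTS):
    """List (collector, command line) pairs needed to silence checks on a host."""
    timestamp = int(time.time()) if now is None else int(now)
    line = f"[{timestamp}] DISABLE_HOST_SVC_CHECKS;{host}"
    return [(collector, line) for collector in collectors_for_host(host, assignments)]
```

With redundant collectors the same command goes out once per collector,
and the table has to be kept in step with the real configs by hand.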

From considerations like these I don't think Nagios works well in a
distributed config. It works "good enough" and for me the other features
far outweigh these issues.

I've been talking about 2.x. The 3.0 version may have solved these issues
by making a distributed config something more than a hack. I envision
adding the following to a host config stanza:

	web		<web1>,<web2>
	collector	<collector1>,<collector2>

Then Nagios could figure out the details of synchronizing the information
by itself.

-- 
Mark Wagner <markwag at u.washington.edu>
System Administrator, UW Medicine IT Services
206-616-6119
