Problem with OCP_daemon in distributed environment

Craig Stewart Craig.Stewart at corp.xplornet.com
Wed Aug 17 14:10:23 CEST 2011


Michel,

Okay, I understand now.

So, if I understand correctly, when you were using the obsess method,
everything was working fine from the central server's point of view, but
when you moved one remote unit from the obsess method to the OCP_daemon,
the central server stopped doing all active checks?

The way I have my central/probe configuration set up here, the central
server accepts passive checks through the nscad process.  My remote
servers send their results in either via the OCP_daemon (which calls
send_nsca) or via a custom obsess script.  Neither method requires any
changes on my central server.
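
For illustration, here is a minimal Python sketch of the kind of line a
probe hands to send_nsca on stdin: host, service description, return
code and plugin output, separated by tabs (send_nsca's default
delimiter).  The function names and the host/service names in the usage
note are hypothetical, and this is not how OCP_daemon itself is
implemented; it only shows the record format that ends up at the
central server.

```python
def format_service_result(host, service, return_code, output, delim="\t"):
    """Build one passive service check line for send_nsca's stdin."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("expected 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    # A delimiter or newline inside the output would corrupt the record,
    # so flatten both to spaces.
    clean = output.replace(delim, " ").replace("\n", " ")
    return delim.join([host, service, str(return_code), clean]) + "\n"


def format_host_result(host, return_code, output, delim="\t"):
    """Build one passive host check line (no service description field)."""
    if return_code not in (0, 1, 2):
        raise ValueError("expected 0 (UP), 1 (DOWN) or 2 (UNREACHABLE)")
    clean = output.replace(delim, " ").replace("\n", " ")
    return delim.join([host, str(return_code), clean]) + "\n"
```

For example, format_service_result("lan-probe-host", "HTTP", 0, "HTTP OK")
gives one tab-separated line that could be piped to send_nsca.
Whichever mechanism on the probe produces these lines, the central
server sees the same input.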

So, unless you are doing something strange, you should be able to get
the central server executing active checks as well as accepting passive
checks.  The method the probe uses is independent of the central server,
as long as it's consistent with the way the central server picks up
checks (send_nsca/nscad in my case).  Once you get this working,
switching the probe from the obsess method to the OCP_daemon method
should not affect the central server, or even require a restart.

Am I making any sense here or have I confused the issue?

Craig
--
Craig Stewart
Systems Integration Analyst
Craig.Stewart at corp.xplornet.com
Xplornet - Broadband, Everywhere

On 08/16/2011 05:02 PM, michel.vdv at wxs.nl wrote:
> Hello Craig,
>  
> First of all, thanks for the fast response.
> Maybe I need to explain a bit more clearly why ACTIVE checks are
> happening on the central server.
> We have a distributed setup with a central machine in the DMZ that is
> reachable by all the remote Nagios machines we have out there.
> One of those is the LAN machine I mentioned, where OCP_daemon was set up
> today.
> The central Nagios machine in the DMZ must perform active checks of
> all our equipment in the same DMZ; the other hosts only send passive data.
> The DMZ machine cannot perform ACTIVE checks on the services monitored
> by one or more of the remote machines.
> So this is why it is a problem when the central server does not
> perform its own checks.
>  
> I've been experimenting with reaper frequencies on the central server
> because I saw "reaper frequency exceeded" messages in the nagios.debug
> (-1) output.
> These messages no longer appear, but the result is still the same.
> I also lowered the frequency of all template-related check_interval
> settings on the OCP_daemon remote machine, but that does not help either.
>  
> If you have any more suggestions, please let me know.
>  
> Regards,
>  
> Michel
> ------------------------------------------------------------------------
> *From:* Craig Stewart [mailto:Craig.Stewart at corp.xplornet.com]
> *Sent:* Tue 16-8-2011 21:47
> *To:* Nagios Users List
> *CC:* michel.vdv at wxs.nl
> *Subject:* Re: [Nagios-users] Problem with OCP_daemon in distributed
> environment
> 
> Michel,
> 
> I just did the same thing for my setup and I didn't see this issue.
> That being said, I don't *want* the central master to execute service
> checks at all unless a result is stale.
> 
> What may be happening is that the remote passive check may be getting
> inserted while the system is waiting to execute the next check.  This is
> probably resetting the clock as it were and the count down starts over.
> 
> For example:
> 
> - NOW is an arbitrary point in time.
> - Nagios schedules the check to be executed at NOW + 5 min. (recheck
> interval)
> - The passive check comes in at NOW + 3 min.  Nagios resets the clock to
> NOW + 3 min + check interval.
> 
> If the remote is submitting checks at an interval shorter than the
> central's recheck interval, I can see this happening.  The clock never
> runs out, unless the remote system misses a submission.
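
The countdown idea in the quoted example can be sketched as a small
Python model.  It only encodes the assumption made there (every passive
result pushes the next active check back by one full interval); it is
not taken from Nagios's scheduler, and the function name and the numbers
are made up for illustration.

```python
def active_check_times(interval, passive_arrivals, horizon):
    """Times (minutes after NOW) at which the active check would fire,
    ASSUMING each passive result resets the countdown to arrival + interval."""
    fired = []
    next_check = interval              # first check scheduled at NOW + interval
    arrivals = sorted(passive_arrivals)
    i = 0
    while True:
        if i < len(arrivals) and arrivals[i] < next_check:
            # Passive result arrives before the deadline: countdown restarts.
            next_check = arrivals[i] + interval
            i += 1
        elif next_check <= horizon:
            # Nothing arrived in time: the active check runs.
            fired.append(next_check)
            next_check += interval
        else:
            break
    return fired
```

Under that assumption, a 5 minute interval with passive results every
3 minutes never lets the active check run, while passive results every
7 minutes still leave room for it.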
> 
> A couple of things to check are the check intervals on both the central
> and the probe, and if you can tolerate the hit, shut down the probe and
> see if the central server starts executing checks on its own.
> 
> I may be out in left field as well.
> 
> Cheers!
> 
> Craig
> --
> Craig Stewart
> Systems Integration Analyst
> Craig.Stewart at corp.xplornet.com
> Xplornet - Broadband, Everywhere
> 
> On 08/16/2011 04:22 PM, michel.vdv at wxs.nl wrote:
>> Dear readers,
>> 
>> I have a strange problem related to the use of OCP_daemon.
>> I implemented it today on a "remote" Nagios machine responsible for
>> monitoring our LAN hosts.
>> Until now, all messages and performance data were sent from that machine
>> to our central Nagios machine via obsess_over_hosts and
>> obsess_over_services.
>> But because a large number of services on the remote host, combined with
>> relatively short check_interval periods, caused high service and host
>> check latencies, I started looking for an alternative and read about
>> OCP_daemon.
>> I followed the install instructions, and sending data via OCP_daemon
>> works fine and is very fast; the remote Nagios machine's latencies also
>> stay low.
>> However, the central server keeps processing all passive service and
>> host check results (including those from other send_nsca based hosts) but
>> no longer executes its own ACTIVE checks.
>> As soon as I stop Nagios on the remote monitor and restart Nagios on the
>> central server, it starts executing ACTIVE checks again.
>> The load on both servers has remained about the same since the switch to
>> OCP_daemon, and the only thing I noticed is that the number of
>> buffers/slots used for the external command file (nagios.cmd) on the
>> central server reaches rather higher values than before, but no more than
>> 30 - 40% of the available 4096 slots.
>> 
>> Please advise.
>> 
>> Michel
>> 
>>
>> --
>> This message has been scanned for viruses and
>> dangerous content by *MailScanner* <http://www.mailscanner.info/>, and is
>> believed to be clean.
> 
> 

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null