Problem with OCP_daemon in distributed environment

michel.vdv at wxs.nl michel.vdv at wxs.nl
Tue Aug 16 22:02:39 CEST 2011


Hello Craig,
 
First of all thanks for the fast response.
Maybe I need to clarify why ACTIVE checks have to happen on the central server.
We have a distributed setup with a central machine in the DMZ that is reachable by all the remote Nagios machines we have out there.
One of those is the LAN machine I mentioned, where OCP_daemon was set up today.
The central Nagios machine in the DMZ must perform active checks of all our equipment in that same DMZ; the other hosts only send passive data.
The DMZ machine cannot perform ACTIVE checks on the services monitored by the remote machines.
So this is why it is a problem when the central server does not perform its own checks.
 
I've been experimenting with reaper frequencies on the central server because I saw "reaper frequency exceeded" messages in the nagios.debug (-1) output.
Those messages no longer appear, but the result is still the same.
I also lowered the frequency of all template-related check_interval settings on the OCP_daemon remote machine, but that does not help either.
 
If you have any more suggestions, please let me know.
 
Regards,
 
Michel

________________________________

From: Craig Stewart [mailto:Craig.Stewart at corp.xplornet.com]
Sent: Tue 16-8-2011 21:47
To: Nagios Users List
CC: michel.vdv at wxs.nl
Subject: Re: [Nagios-users] Problem with OCP_daemon in distributed environment



Michel,

I just did the same thing for my setup and I didn't see this issue.
That being said, I don't *want* the central master to execute service
checks at all unless it's stale.

What may be happening is that the remote passive check may be getting
inserted while the system is waiting to execute the next check.  This is
probably resetting the clock as it were and the count down starts over.

For example:

- NOW is an arbitrary point in time.
- Nagios schedules the check to be executed at NOW + 5 min. (recheck
interval)
- The passive check comes in at NOW + 3 min.  Nagios resets the clock to
NOW + 3 min + check interval.

If the remote is submitting checks at an interval shorter than the
central's recheck interval, I can see this happening.  The clock never
runs out, unless the remote system stops submitting checks.
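
To make the timeline above concrete, here is a minimal toy simulation of
that hypothesis. It is not Nagios's actual scheduler; the function name
`simulate` and its parameters are made up for illustration, and it simply
assumes that every passive result pushes the next active check out to
"arrival time + check interval". All times are in minutes.

```python
import math


def simulate(check_interval, passive_interval, horizon):
    """Return the times at which an active check would run.

    Toy model (an assumption, not Nagios code): each passive result that
    arrives before the scheduled active check resets the schedule to
    arrival time + check_interval.
    Pass passive_interval=None to model "no passive results at all".
    """
    if passive_interval is None:
        passive_interval = math.inf

    next_active = check_interval      # first active check at NOW + interval
    next_passive = passive_interval   # first passive result arrives here
    active_runs = []

    while min(next_active, next_passive) <= horizon:
        if next_passive < next_active:
            # Passive result arrives first: the countdown starts over.
            next_active = next_passive + check_interval
            next_passive += passive_interval
        else:
            # Countdown ran out: the active check is executed.
            active_runs.append(next_active)
            next_active += check_interval

    return active_runs


if __name__ == "__main__":
    # Passive results every 3 min, recheck interval 5 min, one hour window.
    print(simulate(5, 3, 60))     # -> []  (the active check is always pushed back)
    # Probe shut down: no passive results, active checks run normally.
    print(simulate(5, None, 20))  # -> [5, 10, 15, 20]
```

In this model the active check only ever runs when the passive interval is
longer than the check interval, or when the probe goes quiet.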

A couple of things to check are the check intervals on both the central and
the probe, and if you can tolerate the hit, shut down the probe and see
if the central server starts executing checks on its own.

I may be out in left field as well.

Cheers!

Craig
--
Craig Stewart
Systems Integration Analyst
Craig.Stewart at corp.xplornet.com
Xplornet - Broadband, Everywhere

On 08/16/2011 04:22 PM, michel.vdv at wxs.nl wrote:
> Dear readers,
> 
> I have a strange problem related to the use of OCP_daemon.
> I've implemented this today on a "remote" nagios machine responsible for
> monitoring our LAN hosts.
> Until now all messages and performance data was sent from that machine
> to our Central Nagios machine via obsess_over_hosts and
> obsess_over_services.
> But because the large number of services on the remote host, combined with
> relatively short check_interval periods, caused high service and host check
> latencies, I started looking for an alternative and read about OCP_daemon.
> I followed the install instructions and sending data via OCP_daemon
> works fine and very fast, also the remote nagios machine's latencies
> stay low.
> However, the central server keeps processing all passive service and
> host check results (also from other send_nsca based hosts) but no longer
> executes its own ACTIVE checks.
> As soon as I stop nagios on the remote monitor and restart nagios on the
> central server, it starts executing ACTIVE checks again.
> The load on both servers has remained about the same since introducing
> OCP_daemon, and the only thing I noticed is that the number of
> buffers/slots used for the external command file (nagios.cmd) on the
> central server reaches higher values than before, but no more than
> 30 - 40% of the available 4096 slots.
> 
> Please advise.
> 
> Michel
> 
>

