Problems with distributed monitoring

Marcel mitsuto at gmail.com
Fri May 14 20:39:06 CEST 2010


Make sure you understand the underlying design of a distributed Nagios setup.
The obsessive-compulsive service processor command (ocsp_command) needs to be
configured correctly on the Nagios slave.
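As an illustration of what an OCSP handler on the slave has to produce, here is a minimal Python sketch in the spirit of the file-based send_service_check setup Trisha describes further down. It is not her script. It assumes the send_service_check command definition passes the standard macros $HOSTNAME$, $SERVICEDESC$, $SERVICESTATEID$ and $SERVICEOUTPUT$ to a wrapper that calls these functions. The spool path (SPOOL_PATH) and the function names are hypothetical. The output line uses the documented external command format `[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output`.

```python
import time

# Hypothetical spool file on the slave; the real path used in the thread is not given.
SPOOL_PATH = "/var/spool/nagios/ocsp.queue"


def format_result(host, service, state_id, output, now=None):
    """Return one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # External commands are line-based, so a newline in the plugin output
    # would cut the command short; flatten it to a space.
    flat_output = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{int(state_id)};{flat_output}\n"
    )


def append_to_spool(line, path=SPOOL_PATH):
    """Append one formatted result to the spool file that is shipped to the master."""
    with open(path, "a", encoding="utf-8") as spool:
        spool.write(line)
```

On the slave this only runs if nagios.cfg has obsess_over_services=1 and ocsp_command pointing at the command that invokes the wrapper. Keep the handler fast, because Nagios runs it after every service check.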

2010/5/14 Sérgio Afonso <sergioafonsojr at gmail.com>

> Hello Marcel,
>
> My Nagios version is 3.2.0. I didn't quite understand what you meant about
> command_check_interval. Mine is set to -1.
>
> Rgs,
>
> Sérgio.
>
> On Fri, May 14, 2010 at 1:50 PM, Marcel <mitsuto at gmail.com> wrote:
> > With only 150 services, it should not cause that much delay, let alone stop
> > execution of the main process.
> > Please check your main nagios.cfg file and look for command_check_interval;
> > if the value assigned to that variable isn't "-1", that is your problem.
> >
> > Also, which nagios version are you running?
> >
> >
> > On Fri, May 14, 2010 at 2:28 PM, Trisha Hoang <trisha at rockyou.com> wrote:
> >>
> >> Hi Sergio,
> >> Some of the directives I found helpful for our MASTER server are listed
> >> below.
> >>
> >> Since status.dat and nagios.cmd are disk-bound, putting them on a ramdisk
> >> makes them faster.
> >> status_file=/mnt/ramdisk/status.dat
> >> command_file=/mnt/ramdisk/nagios.cmd
> >>
> >> I don't think aggressive host checking is needed, as Nagios checks the host
> >> when a service is in error anyway.
> >> use_aggressive_host_checking=0
> >> check_host_freshness=0
> >>
> >> Service freshness is important because the MASTER tends to process passive
> >> checks much more slowly, so the services may go stale. However, since our
> >> checks run at a 5-minute interval, having the MASTER wait for the next
> >> round of checks is fine.
> >> check_service_freshness=1
> >> service_freshness_check_interval=420
> >>
> >> We use nagios-3.2.1, and I think these directives are still experimental,
> >> but they seem to help. You will see defunct nagios processes that come and
> >> go. My theory is that this happens because children are forked once instead
> >> of twice, so one gets killed, but again, it seems to be running OK.
> >> use_large_installation_tweaks=0
> >> child_processes_fork_twice=0
> >>
> >> Our MASTER receives ~7000 passive checks from the SLAVE, but it can only
> >> process at most ~5000 passive checks per 5 minutes, with a latency under
> >> 10 seconds. The MASTER actively checks the rest. If you or anyone else
> >> knows a way to improve passive check processing, that would be great.
> >>
> >> Also, in our setup, we don't use NSCA. The slaves have
> >> ocsp_command=send_service_check, which writes the check results into a
> >> file that is sent to the master every 5 seconds. On the master, a script
> >> opens this file and writes the lines directly into the nagios.cmd pipe
> >> every 5 seconds.
> >>
> >> Trisha
> >>
> >> ------------------------------------------------------------------------------
> >>
> >>
> >> _______________________________________________
> >> Nagios-users mailing list
> >> Nagios-users at lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/nagios-users
> >> ::: Please include Nagios version, plugin version (-v) and OS when
> >> reporting any issue.
> >> ::: Messages without supporting info will risk being sent to /dev/null
>
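The master side of Trisha's NSCA-free transport can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not her script: it assumes the slaves' results arrive as ready-made PROCESS_SERVICE_CHECK_RESULT lines in a file whose path (INCOMING) is hypothetical, and it writes them into the command pipe named by the command_file setting quoted above. The function names are also hypothetical.

```python
import os
import time

# Hypothetical location where the 5-second transfer from the slaves drops results.
INCOMING = "/var/spool/nagios/incoming.queue"
# Command pipe path taken from the command_file setting quoted above.
COMMAND_FILE = "/mnt/ramdisk/nagios.cmd"


def relay_once(incoming=INCOMING, command_file=COMMAND_FILE):
    """Feed queued result lines into the Nagios command pipe; return how many."""
    work = incoming + ".processing"
    if not os.path.exists(work):  # a leftover work file from a failed run is retried first
        if not os.path.exists(incoming):
            return 0
        # Rename before reading so results that arrive meanwhile land in a new file.
        os.replace(incoming, work)
    with open(work, encoding="utf-8") as src:
        lines = [
            line if line.endswith("\n") else line + "\n"
            for line in src
            if line.startswith("[") and "PROCESS_SERVICE_CHECK_RESULT;" in line
        ]
    # Opening a FIFO for writing blocks until Nagios has it open for reading.
    with open(command_file, "a", encoding="utf-8") as pipe:
        pipe.writelines(lines)
    os.remove(work)
    return len(lines)


def run_forever(interval=5.0):
    """Poll the incoming file every `interval` seconds, as in the setup above."""
    while True:
        relay_once()
        time.sleep(interval)
```

Renaming the file before reading it means a transfer that lands mid-run starts a fresh queue instead of being truncated, and malformed lines are dropped rather than passed to Nagios.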