More passive problems

Dan Rich drich at employees.org
Tue May 20 19:40:14 CEST 2003


Thanks to everyone who offered suggestions on this problem; I think I have it
licked.  As a few of you said, most of it comes down to the
command_check_interval.  Setting it to 1 seems to have taken care of most of
the problems.  I also dropped the service_reaper_frequency down to 5 and
turned off external command logging (just to take away the I/O).  I still see
occasional high loads, but they tend to be in the 15-20 range rather than in
the 100+ range.
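
For reference, the three changes described above would look roughly like the
following nagios.cfg fragment.  This is only a sketch: the directive names are
the ones I believe Nagios 1.x uses, and log_external_commands is an assumed
reading of "external command logging".  How a bare number is interpreted for
command_check_interval depends on interval_length (an "s" suffix, as in the
quoted message below, forces seconds), so check the comments in your own
nagios.cfg before copying these values.

```
# Check the external command file every interval (was -1, "as often as possible")
command_check_interval=1

# Reap service check results every 5 seconds
service_reaper_frequency=5

# Assumed directive for "external command logging"; 0 turns it off to save I/O
log_external_commands=0
```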

I would have sent this out to everyone sooner, but I wanted to give things a
week or so to "burn in".  We even had an event the other day where most of our
farm went down, and Nagios handled it with flying colors (I wish the same
could be said of the person who got the flock of pages when things started to
come down. :)

Thanks again to everyone who offered suggestions!

Now back to trying to make the current plugin snapshot version of check_disk
stop dumping core on IRIX......


Erik Larkin said:
>
> You might want to look at "command_check_interval" in nagios.cfg.  I have a
> huge number of passive checks coming in as well (via nsca), and I found that
> leaving it at -1, which equates to "check as often as possible", didn't
> check often enough.  Consequently my pipe size would hit the kernel maximum
> (linux redhat) pretty quickly, and all new incoming nsca connections would
> hang until nagios cleared the pipe.  I set the command_check_interval to 1s,
> and it seemed to help a good deal.  You can also increase the allowed
> maximum size of a pipe, but for redhat at least that requires recompiling
> the kernel.
>
> -----Original Message-----
> From: Dan Rich [mailto:drich at employees.org]
> Sent: Saturday, May 10, 2003 12:20 PM
> To: nagios-users at lists.sourceforge.net;
> nagios-devel at lists.sourceforge.net
> Subject: [Nagios-devel] More passive problems
>
>
>
> I am concerned with the way Nagios appears to handle passive alerts.  As I
> mentioned before, I am using a script to monitor a system farm of several
> hundred machines.  Every five minutes this script submits passive checks for
> each machine into Nagios.
>
> Doing the above, I frequently see many (for large values of many, sometimes
> more than 100) Nagios processes blocked on a lock file in the var directory.
> It looks like this is due to the process that is reading the passive checks
> from the named pipe.  However, this has frequently led to system loads over
> 100, and this morning brought the system to a grinding halt.
>
> Does anyone have any idea why the passive checks are causing this problem?
> If I stop the cron job that generates the checks and restart Nagios, the load
> goes away and doesn't return.  My whole point in doing this in the first
> place with passive checks was to avoid the load on the system caused by
> hundreds of processes having to run every few minutes, but that seems to
> have backfired.
>
> --
> Dan Rich <drich at employees.org> |   http://www.employees.org/~drich/
>                                |  "Step up to red alert!"  "Are you sure, sir?
>                                |   It means changing the bulb in the sign..."
>                                |          - Red Dwarf (BBC)
>
>
>
> -------------------------------------------------------
> Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara
> The only event dedicated to issues related to Linux enterprise solutions
> www.enterpriselinuxforum.com
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
>
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
>
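
The passive-check submission described in the quoted message can be sketched
as below.  This is a minimal illustration, not the actual script from the
thread: the host and service names and the command file path are hypothetical,
and it assumes results are written directly to the Nagios external command
file using the PROCESS_SERVICE_CHECK_RESULT external command.  The command
file is opened once per run rather than once per machine.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one external-command line for a passive service check result."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each command must fit on a single line, so flatten embedded newlines.
    clean = str(output).replace("\n", " ")
    return "[{0}] PROCESS_SERVICE_CHECK_RESULT;{1};{2};{3};{4}\n".format(
        timestamp, host, service, return_code, clean
    )


def submit_results(results, command_file):
    """Write every result through a single open of the command file.

    `results` is an iterable of (host, service, return_code, output) tuples.
    Returns the number of lines written.
    """
    count = 0
    # Line-buffered text mode: each result is flushed when its newline is written.
    with open(command_file, "w", buffering=1) as pipe:
        for host, service, return_code, output in results:
            pipe.write(format_passive_result(host, service, return_code, output))
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical path and check data; adjust both for a real installation.
    demo = [("farm001", "farm_status", 0, "OK - node responding")]
    print(format_passive_result(*demo[0]), end="")
    # submit_results(demo, "/usr/local/nagios/var/rw/nagios.cmd")
```

Note that opening a named pipe for writing blocks until Nagios has it open for
reading, which is why the real submit call is left commented out here.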


-- 
Dan Rich <drich at employees.org> |   http://www.employees.org/~drich/
                               |  "Step up to red alert!"  "Are you sure, sir?
                               |   It means changing the bulb in the sign..."
                               |          - Red Dwarf (BBC)







