More passive problems

Dan Rich drich at employees.org
Sat May 10 21:20:04 CEST 2003


I am concerned with the way Nagios appears to handle passive alerts.  As I
mentioned before, I am using a script to monitor a system farm of several
hundred machines.  Every five minutes this script submits passive checks for
each machine into Nagios.
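
For reference, here is a minimal sketch of how a submission script of this kind might batch its results. It is not the script described above. It assumes the standard PROCESS_SERVICE_CHECK_RESULT external command format. The function names and the command file path in the usage note are hypothetical, and the path depends on the local install.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # A newline ends an external command, so flatten multi-line plugin output.
    clean = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, clean)


def submit_batch(results, command_file):
    """Write every result with a single open() of the command file.

    `results` is an iterable of (host, service, return_code, output) tuples.
    `command_file` is the Nagios external command pipe (path is an assumption).
    """
    lines = [format_passive_result(*r) for r in results]
    with open(command_file, "w") as pipe:
        pipe.write("".join(lines))
    return len(lines)
```

A call might look like submit_batch([("web01", "PING", 0, "PING OK")], "/usr/local/nagios/var/rw/nagios.cmd"). Opening the pipe once per run, rather than once per machine, keeps the number of writers small. Whether that has any effect on the lock contention described below is untested.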

When I do this, I frequently see many Nagios processes (for large values of
many, sometimes more than 100) blocked on a lock file in the var directory.
It looks like this is caused by the process that reads the passive checks
from the named pipe.  In any case, it has frequently led to system loads over
100, and this morning it brought the system to a grinding halt.

Does anyone have any idea why the passive checks are causing this problem?  If
I stop the cron job that generates the checks and restart Nagios, the load goes
away and doesn't return.  My whole point in using passive checks in the first
place was to avoid the load on the system caused by hundreds of processes
having to run every few minutes, but that seems to have backfired.

-- 
Dan Rich <drich at employees.org> |   http://www.employees.org/~drich/
                               |  "Step up to red alert!"  "Are you sure, sir?
                               |   It means changing the bulb in the sign..."
                               |          - Red Dwarf (BBC)






