Hundreds of passive checks in a second

Dan Stromberg strombrg at dcs.nac.uci.edu
Fri Sep 3 23:44:42 CEST 2004


If you use check_cluster2, you can write a wrapper around check_tcp (or
whatever plugin you're running), pass the exit statuses to
check_cluster2, and so serialize your checks while giving a single
result back to nagios.  You might have to raise the plugin timeout quite
a bit, though.  It's helpful to write the complete status description to
a file on the head node, so you can easily find out specifically what's
wrong once nagios tells you something is wrong.
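
A rough sketch of that kind of wrapper, in case it helps.  This is not
check_cluster2 itself and makes no claim about its command line: the
threshold rule in aggregate() is a stand-in I made up for illustration,
and run_serial, the warn/crit parameters and the detail file format are
all hypothetical names.  It just runs the checks one after another,
writes the full detail to a file, and hands back one status.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def aggregate(statuses, warn, crit):
    """Illustrative rule (an assumption): count non-OK results against thresholds."""
    non_ok = sum(1 for s in statuses if s != OK)
    if non_ok >= crit:
        return CRITICAL, non_ok
    if non_ok >= warn:
        return WARNING, non_ok
    return OK, non_ok


def run_serial(commands, detail_path, warn=1, crit=2, per_check_timeout=10):
    """Run (name, argv) checks one at a time; return (status, summary)."""
    statuses = []
    with open(detail_path, "w") as detail:
        for name, argv in commands:
            try:
                proc = subprocess.run(argv, capture_output=True, text=True,
                                      timeout=per_check_timeout)
                status, output = proc.returncode, proc.stdout.strip()
            except (subprocess.TimeoutExpired, OSError) as exc:
                # A check that hangs or cannot be started counts as UNKNOWN.
                status, output = UNKNOWN, str(exc)
            if status not in (OK, WARNING, CRITICAL):
                status = UNKNOWN
            statuses.append(status)
            # Full per-check detail goes to the file, not to nagios.
            detail.write(f"{name}: status={status} {output}\n")
    overall, non_ok = aggregate(statuses, warn, crit)
    summary = f"{non_ok} of {len(statuses)} checks not OK, details in {detail_path}"
    return overall, summary
```

The caller would print the summary and exit with the returned status.
Because the checks run back to back, the total run time is roughly the
sum of the individual checks, which is why the plugin timeout has to go
up.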

Not precisely the same thing, but we had a similar problem with cron
jobs and NFS servers: we wound up writing a "rand_sleep" program that
sleeps for a random number of seconds.  So we do rand_sleep 600 &&
/dir/cronjob, which spreads the jobs out instead of having them all hit
the server at the same moment.
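
Our rand_sleep isn't shown here, but the idea is small enough to sketch.
The following is a guess at the behaviour, not the actual program: it
assumes the argument is an upper bound in seconds, and pick_delay and
rand_sleep_then_run are names invented for this example.

```python
import random
import subprocess
import time


def pick_delay(max_seconds, rng=random):
    """Pick a delay between 0 and max_seconds (assumed meaning of the argument)."""
    return rng.uniform(0, max_seconds)


def rand_sleep_then_run(max_seconds, argv, sleep=time.sleep, rng=random):
    """Sleep for a random delay, then run argv and return its exit status."""
    sleep(pick_delay(max_seconds, rng))
    return subprocess.call(argv)
```

Calling rand_sleep_then_run(600, ["/dir/cronjob"]) would play the role
of the shell line above.  The sleep and rng parameters are only there so
the delay can be swapped out when trying it.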

On Fri, 2004-09-03 at 14:00, Sean Dilda wrote:
> I have a plugin that I'm running for a few hundred hosts that checks
> what SGE (Sun GridEngine) thinks the status of the node is.  The thing
> is, these checks don't hit the actual nodes.  Instead they all interact
> with the main SGE master daemon.  This means that all these checks can
> quickly run up the load if they're run close enough together.
> 
> In order to keep from running the load up I was thinking about setting
> up a cron job that would do a single query against the SGE master daemon
> for all the hosts (only slightly more overhead than querying about a
> single host) and submit all the results to nagios as passive checks.
> The problem is that nagios uses a named pipe for the command file, and
> the pipe buffer on Linux is only 4k.  So I can't write all those checks
> at once, as it would overrun the 4k buffer.
> 
> I looked and there's an option for nagios to check the file as often as
> possible, but I read the code and found out that's a lie.  I would think
> that using select(2) (as opposed to sleep(3)) would really allow nagios
> to check as often as possible.
> 
> Does anyone have any ideas of how to work around this?  Or has anyone
> already tried replacing that sleep() call with a select() call?
> 
> Thanks,
> 
> 
> Sean
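
For the passive-check route described in the quoted message, here is a
minimal sketch of feeding results to the command file in pieces rather
than all at once.  The PROCESS_SERVICE_CHECK_RESULT line format is the
standard nagios external command; everything else is an assumption made
for illustration: the 4096-byte limit is just the 4k figure quoted
above, and format_result, batch_lines and submit are hypothetical
names.  Whether this is enough depends on how quickly nagios drains the
pipe, since a blocking write simply waits while the pipe is full.

```python
import os
import time

CHUNK_LIMIT = 4096  # assumption: the 4k figure from the quoted message


def format_result(host, service, code, output, now=None):
    """Build one passive service check line in external command format."""
    ts = int(now if now is not None else time.time())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def batch_lines(lines, limit=CHUNK_LIMIT):
    """Group whole lines into byte chunks no larger than limit."""
    batch, size = [], 0
    for line in lines:
        data = line.encode("utf-8")
        if len(data) > limit:
            raise ValueError("a single command line exceeds the chunk limit")
        if batch and size + len(data) > limit:
            yield b"".join(batch)
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield b"".join(batch)


def submit(results, command_file, limit=CHUNK_LIMIT):
    """Write (host, service, code, output) tuples to the command file in chunks."""
    lines = [format_result(*r) for r in results]
    fd = os.open(command_file, os.O_WRONLY)
    try:
        for chunk in batch_lines(lines, limit):
            view = chunk
            while view:  # keep going until the whole chunk is written
                written = os.write(fd, view)
                view = view[written:]
    finally:
        os.close(fd)
```

Chunks never split a command line, so nagios should not see half a
command at a chunk boundary.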