[Fwd: Hundreds of passive checks in a second]

Aaron K. Moore amoore at dekalbmemorial.com
Mon Sep 6 20:11:30 CEST 2004


 
You could buffer the results within the program that is doing the check, and then have it watch the size of the pipe.  It could then send responses until the pipe is full, and then back off until the pipe is sufficiently empty to start sending them again.
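
A rough sketch of that idea in Python, for illustration only. It assumes the sending program already holds the command pipe open in non-blocking mode and that each result is a single line no longer than PIPE_BUF; the function name and the backoff values are made up for this example.

```python
import os
import select
import time


def drain_to_pipe(fd, lines, backoff=0.05, max_wait=5.0):
    """Write buffered lines to a non-blocking pipe, backing off while it is full.

    Returns the lines that could not be sent within max_wait seconds of
    continuous waiting, so the caller can keep them buffered.
    """
    pending = list(lines)
    waited = 0.0
    while pending:
        data = pending[0].encode()
        if len(data) > select.PIPE_BUF:
            raise ValueError("line is longer than PIPE_BUF")
        try:
            # A non-blocking pipe write of at most PIPE_BUF bytes is
            # all-or-nothing: it succeeds whole or fails with EAGAIN.
            os.write(fd, data)
        except BlockingIOError:
            if waited >= max_wait:
                break  # pipe stayed full; hand the rest back to the caller
            time.sleep(backoff)
            waited += backoff
            continue
        pending.pop(0)
        waited = 0.0
    return pending
```

The descriptor would come from something like os.open(path, os.O_WRONLY | os.O_NONBLOCK), where path is wherever your command file lives. Anything returned by the function is still unsent and can be retried on the next run.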
 
Just my 2 cents.
 
Aaron

________________________________

Message: 3
From: Sean Dilda <agrajag at dragaera.net>
To: nagios-devel at lists.sourceforge.net
Date: Mon, 06 Sep 2004 09:57:41 -0400
Subject: [Nagios-devel] [Fwd: Hundreds of passive checks in a second]


Someone on the nagios-users list indicated this email probably had some
relevance on this list, so I'm sending it here as well.

Subject: Hundreds of passive checks in a second
From: Sean Dilda <agrajag at dragaera.net>
To: nagios-users at lists.sourceforge.net
Message-Id: <1094245258.4398.105.camel at pel>
Date: Fri, 03 Sep 2004 17:00:58 -0400

I have a plugin that I'm running for a few hundred hosts that checks
what SGE (Sun GridEngine) thinks the status of the node is.  The thing
is, these checks don't hit the actual nodes.  Instead they all interact
with the main SGE master daemon.  This means that all these checks can
quickly run up the load if they're run close enough together.

In order to keep from running the load up I was thinking about setting
up a cron job that would do a single query against the SGE master daemon
for all the hosts (only slightly more overhead than querying about a
single host) and submit all the results to Nagios as passive checks.
The problem is that Nagios uses a named pipe for the command file, and
the pipe buffer on Linux is only 4k.  So I can't write all those checks
at once, as that would overrun the 4k buffer.
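
To make the size problem concrete, here is a small Python sketch that formats passive service results as external command lines and groups them into chunks that each fit in 4k. The host names, the "SGE status" service description, and both function names are placeholders for this example; the 4096 limit is simply the 4k figure above taken as an assumption.

```python
import time

PIPE_LIMIT = 4096  # assumed: the 4k pipe buffer described above


def format_result(host, service, code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def batch_lines(lines, limit=PIPE_LIMIT):
    """Group whole lines into chunks of at most `limit` bytes each."""
    chunk, size = [], 0
    for line in lines:
        n = len(line.encode())
        if n > limit:
            raise ValueError("a single line exceeds the limit")
        if chunk and size + n > limit:
            yield "".join(chunk)
            chunk, size = [], 0
        chunk.append(line)
        size += n
    if chunk:
        yield "".join(chunk)


# Hypothetical data: 300 nodes, all reporting OK.
lines = [format_result(f"node{i:03d}", "SGE status", 0, "OK") for i in range(300)]
chunks = list(batch_lines(lines))
```

A few hundred results come to well over 4k in total, so they end up in several chunks, and the writer still has to wait for the pipe to be read between chunks.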

I looked and there's an option for Nagios to check the command file as
often as possible, but I read the code and found out that's a lie: it
still calls sleep(3) between checks.  I would think that using select(2)
instead would really allow Nagios to check as often as possible.
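
As an illustration of the difference (in Python, and not taken from the Nagios source), a reader built around select() wakes up the moment the pipe has data instead of waiting out a fixed sleep. The function name, the timeout, and the max_idle cutoff are assumptions made for this sketch.

```python
import os
import select


def read_commands(fd, timeout=1.0, max_idle=1):
    """Collect newline-terminated commands from fd as soon as they arrive.

    Stops when every writer has closed the pipe, or after max_idle
    consecutive timeouts with nothing to read.
    """
    buf = b""
    commands = []
    idle = 0
    while idle < max_idle:
        # Blocks until fd is readable or the timeout expires, so there is
        # no fixed delay between a write and the matching read.
        readable, _, _ = select.select([fd], [], [], timeout)
        if not readable:
            idle += 1
            continue
        idle = 0
        data = os.read(fd, 4096)
        if not data:  # EOF: all writers closed their end
            break
        buf += data
        # Keep any trailing partial line in buf for the next read.
        *complete, buf = buf.split(b"\n")
        commands.extend(line.decode() for line in complete)
    return commands
```

Because the reader drains the pipe as fast as data shows up, a writer that blocks on a full pipe is released almost immediately rather than after the next sleep interval.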

Does anyone have any ideas of how to work around this?  Or has anyone
already tried replacing that sleep() call with a select() call?

Thanks,


Sean

