Problem with high latencies after going distributed

Frost, Mark {PBG} mark.frost1 at pepsi.com
Thu Jan 24 04:47:37 CET 2008


 

>-----Original Message-----
>From: Thomas Guyot-Sionnest [mailto:dermoth at aei.ca] 
>Sent: Wednesday, January 23, 2008 10:24 PM
>To: Frost, Mark {PBG}
>Cc: Nagios Users
>Subject: Re: [Nagios-users] Problem with high latencies after going distributed
>
>On 23/01/08 10:41 AM, Frost, Mark {PBG} wrote:
>> This seems like a serious impediment to normal functioning of a
>> distributed Nagios setup.  That is, in order to make all but the
>> smallest distributed node setups work you have to come up with this
>> roll-your-own setup.  I haven't read the "new in Nagios 3" doc in a
>> while.  Is this something that is fixed in some way there?
>
>I don't think so. I remember an email from Ton Voon some time ago asking
>Ethan why the oc[hs]p commands are run serially, but I don't recall if
>there was a reply or what else was said...
>
>I believe it's either documented in the official doc or some
>user-contributed doc that the oc[hs]p commands should return as soon as
>possible. It's usually done in Perl using a fork:
>
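>my $pid = fork;
>die "fork failed: $!" unless defined $pid;
>if ($pid == 0) {
>  # child: send stuff via NSCA here...
>}
>exit(0);  # parent returns to Nagios right away; child exits when done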
>
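
For reference, the same return-immediately pattern can be sketched in
plain shell by backgrounding the send.  This is only a sketch under
assumptions: submit_async, send_result and the spool file are made-up
names, and "cat" stands in for a real send_nsca invocation, whose
arguments are not shown here.

```shell
#!/bin/sh
# Hypothetical sketch: hand a check result to a sender in the background
# so the calling command does not wait for the send to finish.

spool=$(mktemp)                      # stand-in destination for the demo

# Stand-in for the real sender (assumption: it reads one line on stdin).
send_result() { cat >> "$spool"; }

submit_async() {
    # Tab-separated host, service, status, output: the same layout the
    # Perl one-liner in this thread prints.
    ( printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" | send_result ) &
}

submit_async host service 0 "OK - example"

# Only for this demo, so the spool file can be inspected afterwards.
# A real ocsp/ochp wrapper would simply exit here.
wait
```

The subshell plus "&" plays the role of the fork: the caller gets control
back as soon as the background job is started.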

I guess what I'm thinking here is that, unlike a custom check, I can't see
most people needing to customize the passive check result process.  All the
solutions I've seen seem to include a named pipe.  So why couldn't Nagios
support making the ocsp/ochp "commands" just named pipes instead?  Then,
instead of a standalone send_nsca binary, the nsca source could build a
send_nscaD binary (I'm making that name up) that reads from the pipe that
Nagios writes to and sends directly to nsca on the server.  That more or
less eliminates the middle-man in the process of reporting passive check
results.

I know, I know, I'm free to write the send_nscaD.c code and send it to Ethan :-)
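
To make the shape of the named-pipe idea concrete, here is a rough shell
sketch.  Nothing in it is real Nagios or nsca behaviour: send_nscaD does
not exist, the FIFO path is made up, and "cat" into a file stands in for
"send to nsca on the server".

```shell
#!/bin/sh
# Hypothetical sketch: one long-lived reader drains a FIFO and passes
# every line on, so the writer never has to start a sender per result.

workdir=$(mktemp -d)
fifo="$workdir/ocsp.fifo"        # made-up path, not a Nagios default
sent="$workdir/sent.txt"         # stand-in for "delivered to nsca"
mkfifo "$fifo"

# Reader side (the imagined send_nscaD); "cat" stands in for the sender.
cat "$fifo" > "$sent" &
reader_pid=$!

# Writer side (what Nagios would do per check result): write and move on.
{
    printf 'web1\tHTTP\t0\tOK\n'
    printf 'web1\tPING\t2\tCRITICAL\n'
} > "$fifo"

# The reader sees end-of-file once the writer closes the pipe.
wait "$reader_pid"
```

Two things a real daemon would have to handle that this sketch does not:
the reader here exits at the first end-of-file, and a writer opening the
FIFO blocks if no reader is running.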

>Although it may work for you, that solution will not scale as well as my
>OCP_Daemon because running the perl script to fork takes some time. Just
>as an example, running the following command on my Nagios server takes
>between 1 and 2.5 seconds:
>
>$ time for ((i=0; i<100; i++)); do perl -e 'if (fork==0) { open (CAT, "|/bin/cat >/dev/null") or die $!; print CAT "$ARGV[0]\t$ARGV[1]\t$ARGV[2]\t$ARGV[3]\n"; close (CAT); }' host service status result; done
>
>That's obviously not counting the time it takes for Nagios to process
>the macros, set the environment, etc. send_nsca will also add much more
>load to the system than a "cat >/dev/null". On any system running near
>Nagios' limits, that additional time will just be too much.
>
>I don't know how many people use OCP_Daemon, but I have had reports from a
>few people who greatly reduced their latency using it, and no bugs have
>been reported yet. I believe it's well documented as well, but if you
>have any feedback on this I'll be happy to get it.
>
>
>Thomas

I'm playing with it a bit and have so far had good results.  I'll have some
feedback after I've played with it a bit longer.  Thanks for writing it and
writing up the docs for it as well!

Mark

