[PATCH] common/macros.c:2185:grab_standard_servicegroup_macro() speed up & Service check execution problem report

Thomas Guyot-Sionnest dermoth at aei.ca
Tue Jan 4 08:43:10 CET 2011


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11-01-03 10:37 PM, Stephane LAPIE wrote:
> Hello list,
> 
> I apologize in advance should this topic have already been raised in the
> past.
> 
> 
> 
> We make fairly intensive use of Nagios at our company (around 1700
> machines and 26000 services), using a cluster of OpenBSD machines.
> 
> We run distributed monitoring over NSCA (with a Ruby reimplementation
> of the server) and use external handler programs to offload sending the
> packets, which leaves Nagios with the sole task of writing results to a
> named pipe.
> 
> While tuning my configuration and creating several service groups
> (simply for display purposes), I stumbled upon several problems:
> 
> 1) An actual bug: beyond a certain number of service group members,
> Nagios fails to run service checks for the affected services in its
> child processes, and then reports the failure with a very misleading
> error message: "Warning : Return code 127 was out of bounds. Make sure
> the plugin you're trying to run actually exists". (The EXACT same
> configuration, minus the service groups, works perfectly fine.)
> 
> I haven't pinpointed the root cause of this one, and I think I have
> simply found a triggering case, but it seems to hint at a deeper
> problem in the check handling. (Additionally, the message associated
> with code 127 should be made more accurate, as I spent several days
> trying to figure out whether some combination of odd PATH environment
> variables and the like could prevent my scripts from executing.)
> 
> As a temporary fix for my setup, I removed the related servicegroup
> entries, and I am running fine for now, but I hope this will be fixed
> in a future version, as it is really more than just a small
> annoyance. :(
[...]
> More on the aforementioned bug:
> 
> I have found a value at which (and probably beyond which) the bug can
> be reproduced, although it does not seem to be the direct cause. The
> "symptoms" can be traced to MACRO_SERVICEGROUPMEMBERS generating a
> 338084-byte string (35 services, assigned to 294 machines via templates).

I believe this bug might be related to the length of the command line
passed to popen(). Is it possible that this macro somehow ends up on the
command line?
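
To make the concern concrete, here is a minimal sketch of how one could
compare a string's size against the system's exec limit. It is not
Nagios code: fits_exec_limit() is a hypothetical helper written for this
illustration. POSIX specifies that when popen() cannot execute the
shell, the child terminates as if by exit(127), which would be
consistent with the return code reported above. The limit returned by
sysconf(_SC_ARG_MAX) covers arguments and environment together, so the
macro would count against it whether it reached the command line or was
exported to the child as an environment variable; the latter path is
only an assumption on my part.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Nagios.
 *
 * Returns  1 if payload_len bytes fit below the exec limit,
 *          0 if they do not,
 *         -1 if the system does not report a limit.
 *
 * payload_len is assumed to be the combined size of the command line
 * and of any environment strings handed to the child process.
 */
int fits_exec_limit(size_t payload_len)
{
    /* Maximum combined size of argv and envp accepted by exec*(). */
    long arg_max = sysconf(_SC_ARG_MAX);

    if (arg_max <= 0)
        return -1; /* limit is indeterminate on this system */

    return payload_len < (size_t)arg_max ? 1 : 0;
}
```

Calling fits_exec_limit(338084) on the affected host would show whether
that macro alone is already over the limit. The answer depends on the
platform, so no result is assumed here.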

- -- 
Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0iz4YACgkQ6dZ+Kt5BchaPUwCgl2FosXu8j/pFY9V0BUgNxG8O
YrcAnip2qkAbZ8p1LY2zCWpBMm0GiyrE
=bJBa
-----END PGP SIGNATURE-----
