Suppress "Max concurrent service checks" messages.

Andreas Ericsson ae at op5.se
Fri Nov 12 22:22:01 CET 2010


On 11/12/2010 06:03 PM, Ton Voon wrote:
> 
> On 12 Nov 2010, at 15:30, Paul M. Dubuc wrote:
> 
>> We're running Nagios 3.2.3 with concurrent service checks set to 40.
>> We can't go much higher than this due to resource constraints outside
>> of Nagios but we're running 329 services at 5 minute intervals (this
>> is a "load test" of sorts not production load ... yet).  Average
>> execution time/latency is 36/11 seconds so we're seeing quite a few
>> messages like this in the Nagios log file:
>>
>> (Informational Message) [11-11-2010 14:55:57] Max concurrent service checks (40) has been reached. Nudging <host>:<service> by 9 seconds...
>>
>> Is there any way to suppress these messages from being logged?  I
>> don't see an option for logging these in the config file documentation.
> 
> I put those messages in.
> 
> Firstly, 40 doesn't necessarily mean there are 40 concurrent service
> checks running as they may have finished but not been reaped yet (to
> decrement the counter).
> 
> Secondly, if you are getting these messages, then either (1) this
> limit is too low - increase it and keep an eye on the load on your
> nagios server; or (2) you've got too many checks running - reduce
> frequencies/numbers or set up a slave server.
> 
> The trouble with the way the nudging works is that it hides the fact
> that you have latency issues (as the check is rescheduled to a future
> time). This means nagiostats will not include the additional latency
> time here.
> 
> If someone has a better way of working this out, I'm all ears.
> 

We could do something like pnp4nagios does, and issue a check to make
sure load is below a certain threshold before firing off new checks.
There's a (reasonably) portable way of getting the number of online
CPUs, so we could even make an educated guess at how many checks we
can run to saturate the CPUs while still not running too many checks.
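
Roughly what I have in mind, sketched in Python rather than as a Nagios
patch. The function names and the 1.0 load-per-CPU limit are made up
for illustration; only os.sysconf(), os.cpu_count() and os.getloadavg()
are real calls, and they assume a Unix-like system.

```python
import os

def online_cpus():
    """Return the number of online CPUs, falling back to 1 if unknown."""
    try:
        n = os.sysconf("SC_NPROCESSORS_ONLN")
    except (AttributeError, ValueError, OSError):
        n = os.cpu_count() or 1
    return n if n and n > 0 else 1

def can_start_check(load_1min, num_cpus, max_load_per_cpu=1.0):
    """Hypothetical gate: allow a new check only while load per CPU is low."""
    return load_1min / num_cpus < max_load_per_cpu

if __name__ == "__main__":
    load_1min = os.getloadavg()[0]  # 1-minute load average (Unix only)
    print(can_start_check(load_1min, online_cpus()))
```

The gate would be consulted right before a check is forked, in
addition to (or instead of) the fixed concurrent-checks limit.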

Of course, some checks are more heavy-duty than others. As a first stab
at maintaining reasonable load, we should probably ignore that. At a
later point, we might want to introduce a "probable load increase of
running this check" estimate and nudge checks into the future when
we're in danger of load / num_cpus > 0.9 or some other suitable number.
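
As a sketch of that later stage (again Python, not Nagios code): the
est_cost parameter, the 5 second nudge and the function name are
assumptions for illustration; only the 0.9 ratio comes from the
paragraph above.

```python
def nudge_delay(current_load, num_cpus, est_cost=0.0,
                threshold=0.9, nudge_seconds=5):
    """Return how many seconds to push a check into the future.

    est_cost is a hypothetical per-check estimate of the load the check
    adds; leaving it at 0.0 gives the "ignore check weight" first stab.
    """
    projected = (current_load + est_cost) / num_cpus
    return nudge_seconds if projected > threshold else 0

if __name__ == "__main__":
    # 4 CPUs, load 3.5: fine on its own, but a check estimated to add
    # 0.2 load would push load / num_cpus past 0.9, so it gets nudged.
    print(nudge_delay(3.5, 4))
    print(nudge_delay(3.5, 4, est_cost=0.2))
```

Any nudge applied this way should still be counted as latency,
otherwise nagiostats keeps hiding the problem Ton describes.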

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
