Nagios scheduling question

william(at)elan.net william at elan.net
Wed Apr 4 03:23:18 CEST 2007


Ok. So I ran it with debug enabled today. It looks like the maximum number
of service checks nagios is willing to run at the same time is 90. How can
I control this number and force more than 90 to run?

$ grep "Current/Max Outstanding Service Checks" /var/log/nagios/nagios_debug.log | sort -r | uniq | more
Current/Max Outstanding Service Checks: 90/0
Current/Max Outstanding Service Checks: 9/0
Current/Max Outstanding Service Checks: 89/0
Current/Max Outstanding Service Checks: 88/0
Current/Max Outstanding Service Checks: 87/0
Current/Max Outstanding Service Checks: 86/0
Current/Max Outstanding Service Checks: 85/0
Current/Max Outstanding Service Checks: 84/0
Current/Max Outstanding Service Checks: 83/0
...
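
Note that "sort -r" compares the lines as text, which is why 9/0 lands
between 90/0 and 89/0 above. A minimal Python sketch that extracts the
numeric peak instead is given here. The function name peak_outstanding
is made up for illustration, and treating the second number as the
configured limit (0 here) is an assumption based on the "Current/Max"
label, not something confirmed from the nagios source.

```python
import re

# Matches debug lines such as "Current/Max Outstanding Service Checks: 90/0".
PATTERN = re.compile(r"Current/Max Outstanding Service Checks: (\d+)/(\d+)")


def peak_outstanding(lines):
    """Return (highest current value, limit on that line), or None if no line matches.

    `lines` is any iterable of strings, for example an open log file.
    The meaning of the second number (the limit) is an assumption.
    """
    peak = None
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue
        current, limit = int(match.group(1)), int(match.group(2))
        # Compare numerically, unlike the text comparison done by sort -r.
        if peak is None or current > peak[0]:
            peak = (current, limit)
    return peak
```

It could be called as peak_outstanding(open("/var/log/nagios/nagios_debug.log")),
using the same log file as the grep command above.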

BTW - Host checks are disabled but they still show up in the "scheduling
queue", although the checks are not executed. I'm unsure whether this should
be considered a bug and, if it is, whether it could limit service checks.
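
As a rough sanity check on the figures from the original message quoted
further down (2500 services, about 1.5 seconds per check, a target of one
check per service every 3-4 minutes), the concurrency needed can be
estimated with Little's law. This is only a back-of-envelope sketch: it
assumes checks are spread evenly over the interval and that every check
takes the average time, and the function names are made up for
illustration.

```python
def required_concurrency(services, avg_exec_seconds, interval_seconds):
    """Average number of checks in flight needed to run every service
    once per interval (Little's law: in flight = start rate * duration)."""
    return services * avg_exec_seconds / interval_seconds


def full_pass_seconds(services, avg_exec_seconds, concurrency):
    """Time for one pass over all services if `concurrency` checks
    run in parallel the whole time (idealized, no scheduling gaps)."""
    return services * avg_exec_seconds / concurrency


# Figures from the thread: 2500 services, 1.5 s average, 3-4 minute target.
for minutes in (3, 4):
    needed = required_concurrency(2500, 1.5, minutes * 60)
    print(f"{minutes} min interval: about {needed:.1f} checks in flight")

# Idealized pass time at the observed peak of 90 outstanding checks.
print(f"pass at 90 concurrent: about {full_pass_seconds(2500, 1.5, 90):.1f} s")
```

The results are only as good as the 1.5 second average and the even-spread
assumption, but they give a number to compare against the observed peak
of 90.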

On Thu, 29 Mar 2007, william(at)elan.net wrote:

> On Wed, 28 Mar 2007, Ethan Galstad wrote:
>
>> Make sure the max_concurrent_checks is high enough.  Running nagios with
>> the -s option should throw a warning if it isn't.
>
> It's set to 0 so should be unlimited. This was the first thing I checked.
>
>> You can also compile the Nagios daemon with some debugging option for
>> printing information on scheduled tasks.  Run the configure script like
>> such:
>> 
>> ./configure --enable-DEBUG3
>> 
>> Then run Nagios as a foreground process and pipe the output to a file
>> that you can examine for potential problem messages.
>
> I compiled a new binary for 2.8 and one with debug options. I have to
> go through a certain procedure of testing binaries outside the production
> environment before I can try it. The next time I'll be there to deal with
> it is Tuesday, so I'll know more then. I'm also trying to convince them
> to try 3.0 as it has a number of improvements for larger operations and
> no parallelism limitations for host checks.
>
>> william(at)elan.net wrote:
>>> Thanks for the pointer. I heard about DNX but did not look at it closely
>>> yet; I'll take a look. However, I have a bad feeling the company I'm setting
>>> it up for would not like it because it's listed as "alpha" software; also
>>> DNX is more for "distributed" monitoring, whereas there everything runs
>>> on the same box.
>>> 
>>> I'm more interested in learning how nagios decides how many processes
>>> it can run based on system load so as to try to tune it to have more
>>> processes & service checks done simultaneously.
>>> 
>>> On Tue, 27 Mar 2007 bobi at netshel.net wrote:
>>> 
>>>> Have you checked out the Distributed Nagios eXecutive (DNX) at Source 
>>>> Forge?
>>>> 
>>>> The purpose of this project was to increase service check capacity and
>>>> throughput by creating a multi-threaded and distributed service check
>>>> architecture around Nagios (it's based on Nagios 2.7)
>>>> 
>>>> Bob
>>>> 
>>>>> I have an issue with one of the client nagios installations where
>>>>> nagios is executing checks too rarely, and none of the tuning options
>>>>> I've tried helped. Currently they have 2500 services on about
>>>>> 120 hosts and nagios seems to execute checks about every 8-9 minutes,
>>>>> whereas what is needed is about every 3-4 minutes. I've tried manual
>>>>> tuning by setting 'service_inter_check_delay_method' (I set it to
>>>>> 0.05, which is even more aggressive than needed, but it did cause a
>>>>> slight improvement over 's') and 'service_interleave_factor' (tried
>>>>> setting it to '1' and '2' but results were worse). Now as far as I
>>>>> can tell the issue is not scheduling (which nagios does correctly,
>>>>> within the range I want) but the service check execution time, which
>>>>> is on average 1.5 seconds, and nagios does not want to run more
>>>>> concurrent processes.
>>>>> 
>>>>> Now the question I have is how best to deal with this and tune it,
>>>>> both using current config options and, if I'm pointed in the right
>>>>> direction, by looking at the source code to see if
>>>>> it can be improved in some way.
>>>>> 
>>>>> On a related note, I was looking at the source code. I had
>>>>> always thought nagios was more of a multi-threaded application,
>>>>> but based on what I can see (utils.c) it does multi-process
>>>>> execution, creating a new process for each service check (my_system
>>>>> function). Is there any interest in improving it? What I'm
>>>>> particularly interested in is having several worker threads
>>>>> capable of executing embedded perl plugins without going
>>>>> through creation of a new process every time.
>>>>> 
>>>>> --
>>>>> William Leibzon
>>>>> Elan Networks
>>>>> william at elan.net
