accommodate 7426 passive checks on nagios 3.0.3

Mark Young myoung at nagios.org
Mon Nov 17 17:14:11 CET 2008


On Nov 17, 2008, at 9:40 AM, Marc Ismael wrote:

> Hi Mark,
>
> Thanks for the response. I just realized that I've opened the old
> templates.cfg file (apologies, I'm an idiot).
> My freshness_threshold is actually 600 (10 mins), but I'm still seeing
> this issue.

Then the problem may be that your system is unable to process all the
results in the time you set (max_check_result_reaper_time = 90).
Try doubling that to 180 or even more.  If it really needs that long,
though, your system is badly bogged down and will need further
debugging.  Passive checks should not be that demanding on your
system.
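
As a rough illustration of why the reaper time matters, here is a small
Python sketch (my own arithmetic, not anything taken from Nagios). It
assumes the worst case where all 7426 results are waiting when a reaper
run starts and one run has to clear them; the function name
per_result_budget is made up for the example.

```python
def per_result_budget(pending_results: int, max_reaper_time: float) -> float:
    """Seconds available per result if a single reaper run must clear the backlog."""
    if pending_results <= 0:
        raise ValueError("pending_results must be positive")
    return max_reaper_time / pending_results


# Assumption: every one of the 7426 passive results is queued at once.
for limit in (90, 180):
    budget_ms = per_result_budget(7426, limit) * 1000
    print(f"max_check_result_reaper_time={limit}: {budget_ms:.1f} ms per result")
```

That works out to roughly 12 ms per result at 90 seconds and 24 ms at
180.  If your box cannot keep up with that, the reaper time is not the
real problem.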

>
> template.cfg snippet
> ===============
> define service{
>         name                            e_passive
>         active_checks_enabled           0
>         passive_checks_enabled          1
>         parallelize_check               0
>         obsess_over_service             0
>         check_freshness                 1
>         freshness_threshold             600
>         check_command                   check_stale_passive
>         notifications_enabled           1
>         event_handler_enabled           0
>         flap_detection_enabled          1
>         failure_prediction_enabled      0
>         process_perf_data               0
>         retain_status_information       1
>         retain_nonstatus_information    1
>         is_volatile                     0
>         check_period                    e_reboots
>         max_check_attempts              1
>         normal_check_interval           1
>         retry_check_interval            1
>         contact_groups                  e_server_team
>         notification_options            c
>         notification_interval           0
>         notification_period             e_reboots
>         register                        0
>         }
> Thanks,
> Marc
>
> On Mon, Nov 17, 2008 at 11:25 PM, Mark Young <myoung at nagios.org>  
> wrote:
>>
>> On Nov 16, 2008, at 9:16 PM, Marc Ismael wrote:
>>>
>>> Hi all,
>>>
>>> I have 7426 incoming passive checks on my nagios server. I turned  
>>> on freshness check at every 60 seconds,  
>>> check_result_reaper_frequency at 60 and  
>>> max_check_result_reaper_time at 90. I am getting a lot of stale  
>>> passive results. Anything off with these settings, or the rest of  
>>> my config settings?
>>

What made you first try these settings?  Trial and error?  Normally we  
try to set the reaper_frequency down to under 30 seconds (such as 5  
seconds) and the max reaper_time well under 60 (such as 30 seconds).  
But looking at your other config snippets, you have
'service_freshness_check_interval=360' and
'host_freshness_check_interval=60', so you may be confusing some of
the settings.

<example "normal" snippet from main config>
...
check_result_reaper_frequency=5
max_check_result_reaper_time=30
service_freshness_check_interval=60
host_freshness_check_interval=60
additional_freshness_latency=15
...


To really debug this problem we need to know a few things.  The main
one for me is the number of check results piling up every 60 seconds,
and how long it normally takes to process them.  Also, if you are
taxing your system, it is important to see what the system load and
disk I/O are doing.  Then there is the relevant information about your
environment: what Nagios version are you running?  What distribution?
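
One way to get a feel for the backlog is to watch the spool directory.
The sketch below is only a suggestion: it assumes your results are
spooled as files under the directory set by check_result_path in the
main config (whether passive results land there can depend on your
version and setup), and spool_backlog is a hypothetical helper name.

```python
import os
import time


def spool_backlog(spool_dir, now=None):
    """Return (file_count, oldest_age_seconds) for regular files in spool_dir."""
    now = time.time() if now is None else now
    count = 0
    oldest_age = 0.0
    with os.scandir(spool_dir) as entries:
        for entry in entries:
            # Skip subdirectories and anything else that is not a plain file.
            if not entry.is_file():
                continue
            count += 1
            # Age is measured from the file's last modification time.
            oldest_age = max(oldest_age, now - entry.stat().st_mtime)
    return count, oldest_age
```

Call it with your own check_result_path value every few seconds.  A file
count that keeps growing, or an oldest age well above
check_result_reaper_frequency, would suggest the reaper is falling
behind.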

You may want to check out the relevant docs for passive checks and
freshness, as I had to do when looking into your problem.
http://nagios.sourceforge.net/docs/3_0/configmain.html
http://nagios.sourceforge.net/docs/3_0/freshness.html
http://nagios.sourceforge.net/docs/3_0/passivechecks.html



>>
>> You have some interesting choices with your settings: you have
>> the freshness threshold and the reaper_frequency both set to 60
>> seconds.  The freshness threshold is the time after which Nagios
>> should consider a check to be stale.  This is done by looking at the
>> last check's timestamp and comparing it to the threshold you set (60
>> seconds).  The reaper_frequency is how often Nagios takes all the
>> collected passive results and processes them, which you also have
>> set at 60 seconds.  You are setting up a condition where most of
>> your checks are running close to stale and, given any processing
>> time, will give you many stale results.
>>
>> Depending on how powerful your system is, you will need to
>> either increase your freshness threshold (try 300 seconds),  
>> decrease the reaper frequency, or do both.  You may have to play  
>> around with the exact settings that will work with your system and  
>> the number of checks you are performing.  I would recommend you  
>> start with increasing the freshness threshold.
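
To make the interaction between the threshold and the reaper frequency
concrete, here is a deliberately simplified Python model.  It is my own
approximation, not how Nagios computes freshness internally: it treats
the worst-case age of a result as the interval between submissions plus
the wait for the next reaper run plus the processing time.  The
submit_interval values are assumptions, since the thread does not say
how often the passive results arrive.

```python
def worst_case_age(submit_interval, reaper_frequency, processing_time):
    """Longest time (seconds) the last processed result can be old, in this model."""
    return submit_interval + reaper_frequency + processing_time


def looks_stale(age, freshness_threshold, extra_latency=0):
    """True if the age exceeds the threshold plus any additional latency allowance."""
    return age > freshness_threshold + extra_latency


# Original settings: threshold 60, reaper every 60 s, results every 60 s (assumed).
age = worst_case_age(submit_interval=60, reaper_frequency=60, processing_time=0)
print(age, looks_stale(age, freshness_threshold=60))

# Suggested direction: threshold 300, reaper every 5 s, up to 30 s of processing.
age = worst_case_age(submit_interval=60, reaper_frequency=5, processing_time=30)
print(age, looks_stale(age, freshness_threshold=300, extra_latency=15))
```

With the first set of numbers the result looks stale even with zero
processing time; with the second there is plenty of headroom.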


Perhaps others who are dealing with more passive checks than I am
can help you out with better base numbers to work from.

Good luck!

Mark Young
___
Nagios Enterprises, LLC
Web:    www.nagios.com


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




