Nagios 3.1.1 eats cpu like mad

Ricardo Maraschini ricardo.maraschini at opservices.com.br
Mon Aug 10 14:55:23 CEST 2009


Hi,

----- "Hiren Patel" <hir3npatel at gmail.com> wrote:
> if you have minimal configs that are able to reproduce this, could you
> post them to me? I'll gladly try to have a look at what the possible
> cause is.

I couldn't reproduce the problem with a static configuration, so let me try to explain how I reproduce it by changing the timeperiod configuration:

0. Create a service with active checks enabled, scheduled to be checked every 5 minutes.

1. Associate this service with a timeperiod (initially it can be 24x7).

2. Wait until the service check runs and the next check is scheduled.
   Let's say that the check runs at 10:00AM and the next check is scheduled for 10:05AM.

3. Stop nagios

4. Change your timeperiod configuration so that the next service check becomes invalid:
   Using the example above, change the service's timeperiod so that checks are allowed only from 10:07AM to 24:00. The important thing for reproducing the problem is that the check that would follow it (10:10AM) remains valid.

5. Start nagios

6. Wait until the previously scheduled service check (10:05AM) runs.
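
The two conditions in step 4 can be checked with a small helper before trying the reproduction. This is only a sketch: times are minutes since midnight, the timeperiod is reduced to a single daily window, and `toy_in_window` and `toy_reproduces` are hypothetical names, not Nagios functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: is t inside the daily window [start, end)?
 * All times are minutes since midnight. */
bool toy_in_window(int t, int start, int end) {
    assert(start < end);
    return t >= start && t < end;
}

/* Hypothetical helper: the scenario in the steps above needs the pending
 * check to fall outside the new window while the check one interval
 * later falls inside it. */
bool toy_reproduces(int pending_check, int interval, int start, int end) {
    return !toy_in_window(pending_check, start, end)
        && toy_in_window(pending_check + interval, start, end);
}
```

With the values from the example, `toy_reproduces(10*60 + 5, 5, 10*60 + 7, 24*60)` is true: 10:05 is outside 10:07-24:00 and 10:10 is inside it.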

The behaviour will vary according to your Nagios version. In previous versions the service is rescheduled for next year; in the latest stable release it is rescheduled for next week and a message is printed in the log files.

Below is an email I sent on April 2nd about the same issue; it may be useful.
Good luck. If you need any other info, please let me know.

-rm


------

Hi,

----- "Hendrik Baecker" <andurin at process-zero.de> wrote:
> Are you sure that the problem hits us when the calculation is
> 'inside' a timeperiod?

If I pass a valid timestamp to get_next_valid_time, it returns that same timestamp.
Take a look at base/checks.c, line 280:

get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);

Suppose preferred_time is a valid timestamp within svc->check_period_ptr.
What is this call supposed to return?

I assume it returns next_valid_time = preferred_time.

A few lines below we find:

if (time_is_valid==FALSE && next_valid_time==preferred_time) {
     //schedule this check for next year
}

time_is_valid == FALSE can be caused by a timeperiod change.
Let's try an example:

Service X has timeperiod 24x7 and a check_interval of 5 minutes.
Nagios is running and the next check for X is scheduled for 11:48.
Then somebody changes the timeperiod of service X to 11:50-23:00.

When the next check runs, time_is_valid becomes FALSE because 11:48 is outside the new timeperiod.
preferred_time is calculated as 11:53, and get_next_valid_time, called with preferred_time, returns 11:53.

The code below then speaks for itself:
if (time_is_valid==FALSE && next_valid_time==preferred_time) {
     //schedule this check for next year
}
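
To make the 11:48 example concrete, here is a minimal sketch of the same logic. It is not the Nagios implementation: times are minutes since midnight, the timeperiod is a single daily window, and `toy_period`, `toy_time_is_in_period`, `toy_get_next_valid_time` and `toy_hits_next_year_branch` are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy timeperiod: one daily window [start, end), in minutes since midnight. */
typedef struct {
    int start;
    int end;
} toy_period;

/* Toy validity test: is t inside the window? */
bool toy_time_is_in_period(int t, const toy_period *p) {
    return t >= p->start && t < p->end;
}

/* Toy stand-in for get_next_valid_time(): the earliest valid time that is
 * not before `preferred`. A valid input is returned unchanged. */
int toy_get_next_valid_time(int preferred, const toy_period *p) {
    assert(p->start < p->end);
    if (preferred < p->start)
        return p->start;          /* before the window: wait for it to open */
    if (preferred < p->end)
        return preferred;         /* already valid: same time comes back */
    return p->start + 24 * 60;    /* after the window: tomorrow's opening */
}

/* True when the "schedule this check for next year" condition holds. */
bool toy_hits_next_year_branch(int current_time, int interval,
                               const toy_period *p) {
    bool time_is_valid = toy_time_is_in_period(current_time, p);
    int preferred_time = current_time + interval;
    int next_valid_time = toy_get_next_valid_time(preferred_time, p);
    return !time_is_valid && next_valid_time == preferred_time;
}
```

For the window 11:50-23:00 and a 5 minute interval, a check running at 11:48 satisfies the condition (11:48 is invalid, 11:53 is valid and comes back unchanged), while a check running at 11:40 does not, because 11:45 is moved forward to 11:50.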

I propose using

get_next_valid_time(current_time,&next_valid_time,svc->check_period_ptr);

instead of

get_next_valid_time(preferred_time,&next_valid_time,svc->check_period_ptr);

I will write this patch against the development release and send it to the list again.
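
The effect of the proposed change on the 11:48 example can be sketched as follows. This is a toy model under stated assumptions, not the actual patch: the timeperiod is a single daily window in minutes since midnight, every `toy_` name and `TOY_NEXT_YEAR` is hypothetical, and the "otherwise use next_valid_time" branch is an assumption about what happens when the condition is false.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NEXT_YEAR (-1)   /* marker for the "next year" branch */

/* Toy timeperiod: one daily window [start, end), minutes since midnight. */
typedef struct {
    int start;
    int end;
} toy_period;

bool toy_in_period(int t, const toy_period *p) {
    return t >= p->start && t < p->end;
}

/* Toy stand-in for get_next_valid_time(): earliest valid time >= from. */
int toy_next_valid(int from, const toy_period *p) {
    assert(p->start < p->end);
    if (from < p->start)
        return p->start;
    if (from < p->end)
        return from;
    return p->start + 24 * 60;
}

/* use_current_time == false: pass preferred_time (current behaviour).
 * use_current_time == true:  pass current_time (the proposal).
 * Returns the chosen check time, or TOY_NEXT_YEAR. */
int toy_reschedule(int current_time, int interval, const toy_period *p,
                   bool use_current_time) {
    bool time_is_valid = toy_in_period(current_time, p);
    int preferred_time = current_time + interval;
    int anchor = use_current_time ? current_time : preferred_time;
    int next_valid_time = toy_next_valid(anchor, p);

    if (!time_is_valid && next_valid_time == preferred_time)
        return TOY_NEXT_YEAR;
    return next_valid_time;   /* assumed: otherwise use the valid time */
}
```

In this model, a check at 11:48 against the window 11:50-23:00 returns `TOY_NEXT_YEAR` when preferred_time is passed, and 11:50 when current_time is passed. Note that in the toy model a check running at an already valid time gets its own time back when current_time is passed, so a real patch would still need to handle that case.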

-rm
