Nagios is ignoring the retry_interval setting

FTL Nagios ftlnagios at gmail.com
Fri Dec 7 11:56:16 CET 2012


Hi,

Apologies for the delay, I've been very busy with other things.

Right, I put Nagios into debug mode this morning and reran the tests.

I let it get a couple of successful pings to the server, then pulled the
network cable from it.

Behaviour is completely different this morning!!!!

The host check is behaving now and rechecking every 3 minutes, as it's told
to in the host template. I got my text and email alert to say the host was
down when I expected it!

But now it's the service check that is running every 1 minute, which it's
not told to do when in a problem state.

My service template clearly sets a retry_interval of 3 minutes for the
problem state:

define service{
    name                          service-server              ; The name of this service template (used above in the checks)
    check_period                  server_24x7                 ; Servers are monitored at all times
    check_interval                1                           ; Servers are checked every 1 minute when in OK state
    retry_interval                3                           ; Servers are checked every 3 minutes if in problem state
    max_check_attempts            3                           ; Servers are checked 3 times to determine if they are in an Up or Down state
    notification_period           server_24x7                 ; Emails and texts are sent out at any time of day
    notification_interval         3                           ; Resend notifications every 3 minutes
    notification_options          c,r                         ; Only send alerts for servers in CRITICAL or RECOVERY state
    notifications_enabled         0                           ; Notifications are disabled
    contact_groups                servers email, servers sms  ; Alerts are sent to contacts in these groups
    event_handler_enabled         1                           ; Event handler is enabled
    process_perf_data             1                           ; Performance data is processed
    retain_status_information     1                           ; Status info is kept between server restarts
    retain_nonstatus_information  1                           ; Non-status information is kept between server restarts
    passive_checks_enabled        0                           ; Passive checks are disabled
    obsess_over_service           0                           ; We do not obsess over the server if in problem state
    check_freshness               0                           ; We do not check this server for freshness
    flap_detection_enabled        0                           ; Flap detection is disabled
    failure_prediction_enabled    0                           ; We will wait for it to actually fail, thank you!!
    }

And even though it's checking every minute, it went straight to a hard state
on the first check that detected it down, and has stayed on check 1/3 hard
state throughout.
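
For comparison, the schedule one would expect from the template above can be
written out as a minimal Python sketch. It only illustrates the expectation
and says nothing about how Nagios schedules checks internally. The function
name expected_checks is made up for this example, interval_length is assumed
to be the default of 60 seconds, and the start time is an arbitrary 1pm.

```python
from datetime import datetime, timedelta


def expected_checks(first_failure, retry_interval, max_check_attempts,
                    interval_length=60):
    """Return (attempt, time) pairs for the rechecks of a failing service.

    Assumes every recheck happens exactly retry_interval time units after
    the previous one and that the state turns hard on the last attempt.
    """
    step = timedelta(seconds=retry_interval * interval_length)
    return [(attempt, first_failure + (attempt - 1) * step)
            for attempt in range(1, max_check_attempts + 1)]


if __name__ == "__main__":
    # Values from the service template: retry_interval 3, max_check_attempts 3.
    for attempt, when in expected_checks(datetime(2012, 12, 7, 13, 0), 3, 3):
        print(f"check {attempt}/3 at {when:%H:%M}")
```

With these values the sketch lists check 1/3 at 13:00, check 2/3 at 13:03 and
check 3/3 at 13:06, which is the spacing the retry_interval of 3 should give.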


I really don't understand what is happening here.

The only thing different between this setup and my old Nagios box is the
version: the old box was 3.3.1, this new server is 3.4.1. I am using the same
config files that worked fine before.

Here are the debug log files from the above testing.

http://dl.dropbox.com/u/895609/nagios.debug1
http://dl.dropbox.com/u/895609/nagios.debug2


If you see anything please let me know, I'm getting angry with all the
alerts!!! :-)

Thank you

-----Original Message-----
From: Giorgio Zarrelli [mailto:zarrelli at linux.it] 
Sent: 29 November 2012 19:24
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios is ignoring the retry_interval setting

Hi,

I do not see anything wrong. Could you set debug_level=-1, reproduce the
problem and put the log online?

Giorgio

<quote who="Andrew Thompson">
> Hi Giorgio,
>
> The whole test cfg I am using to try to troubleshoot this can be found at:
>
> http://dl.dropbox.com/u/895609/test.cfg
>
> This is a direct copy of my main server's config, but with the rest of
> the servers and some templates for other server checks taken out.
>
>
>
> Kind Regards
> Andrew
>
> From: Andrew Thompson
> Sent: 29 November 2012 16:11
> To: nagios-users at lists.sourceforge.net
> Subject: Nagios is ignoring the retry_interval setting
>
> Hi,
>
> My nagios box has decided to stop listening to the retry_interval 
> entry in my templates.
>
> My server template reads:
>
> define host{
>      name                       host-server
>      check_period              server_24x7
>      check_interval            1
>      retry_interval            3
>      max_check_attempts        3
>      notification_period       server_24x7
>      notification_interval      3
>      notification_options      d,r
>      notifications_enabled      1
>      contact_groups            servers email, servers sms
>      event_handler_enabled      1
>      process_perf_data         1
>      retain_status_information    1
>      retain_nonstatus_information 1
>      passive_checks_enabled          0
>      obsess_over_host          0
>      check_freshness          0
>      flap_detection_enabled          0
>      failure_prediction_enabled   0
>      }
>
> Now this is what happens:
>
> * Server goes down at 1pm.
> * I check the next scheduled check and it clearly states 1.03pm.
> * But at 1.01pm it checks again and then spits out an email and
>   text message saying the server is down.
>
> Completely ignoring the retry_interval setting!!!
>
> I'd expect from the above:
>
> * 1pm: server goes down.
> * 1.03pm: check 2 is done.
> * 1.06pm: check 3 is done and a hard state is determined.
> * At 1.06pm the notification should be sent out.
>
> Why is this? Is something in my config wrong?
>
> Ubuntu 12.04 desktop and Nagios 3.4.1
>
> Thanks
>
>
> ------------------------------------------------------------------------------
> Keep yourself connected to Go Parallel:
> VERIFY Test and improve your parallel project with help from experts
> and peers.
> http://goparallel.sourceforge.net
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when 
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null



