Nagios Check Time Issue

Rus Hughes russell.hughes at gmail.com
Thu Feb 24 14:15:55 CET 2011


max_check_attempts is set to 4 for all services. An example service
configuration, followed by the generic-service template it uses:

define service {
    retry_check_interval           1
    contact_groups                 admins
    check_command                  check_nrpe!check_swap
    check_period                   24x7
    host_name                      somehostrarrarrar
    max_check_attempts             4
    normal_check_interval          1
    notification_period            24x7
    notification_interval          960
    ## --PUPPET_NAME-- (called '_naginator_name' in the manifest) check_swap_vfantprov2
    use                            generic-service
    service_description            swap
}

define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1               ; Active service checks are enabled
        passive_checks_enabled          1               ; Passive service checks are enabled/accepted
        parallelize_check               1               ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1               ; We should obsess over this service (if necessary)
        check_freshness                 0               ; Default is to NOT check service 'freshness'
        notifications_enabled           1               ; Service notifications are enabled
        event_handler_enabled           1               ; Service event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        is_volatile                     0               ; The service is not volatile
        check_period                    24x7            ; The service can be checked at any time of the day
        max_check_attempts              3               ; Re-check the service up to 3 times in order to determine its final (hard) state
        normal_check_interval           1               ; Check the service every minute under normal conditions
        retry_check_interval            1               ; Re-check the service every minute until a hard state can be determined
        contact_groups                  admins          ; Notifications get sent out to everyone in the 'admins' group
        notification_options            w,u,c,r         ; Send notifications about warning, unknown, critical, and recovery events
        notification_interval           60              ; Re-notify about service problems every hour
        notification_period             24x7            ; Notifications can be sent out at any time
        register                        0               ; DON'T REGISTER THIS DEFINITION - IT'S NOT A REAL SERVICE, JUST A TEMPLATE!
        }
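
To make the numbers above concrete, here is a rough Python sketch (not
anything Nagios itself runs) of how the two definitions combine and how long
a failing service should nominally take to reach a hard state. It assumes a
single template, that values set on the service override the template, and
the default interval_length of 60 seconds; the helper names are made up for
illustration and scheduler latency is ignored.

```python
# Simplified model of Nagios object inheritance: values set directly on the
# service win over values inherited from the template named in `use`.
TEMPLATE = {
    "max_check_attempts": 3,
    "normal_check_interval": 1,
    "retry_check_interval": 1,
}

SERVICE = {
    "use": "generic-service",
    "service_description": "swap",
    "max_check_attempts": 4,
    "normal_check_interval": 1,
    "retry_check_interval": 1,
}

INTERVAL_LENGTH = 60  # assumption: default interval_length from nagios.cfg


def effective_settings(service, templates):
    """Merge a service over its single template; local values win."""
    merged = dict(templates.get(service.get("use"), {}))
    merged.update({k: v for k, v in service.items() if k != "use"})
    return merged


def seconds_to_hard_state(settings, interval_length=INTERVAL_LENGTH):
    """Nominal delay between the first failed check and the hard state.

    The first failure counts as attempt 1; every remaining attempt is
    scheduled one retry interval after the previous one.
    """
    retries = settings["max_check_attempts"] - 1
    return retries * settings["retry_check_interval"] * interval_length


settings = effective_settings(SERVICE, {"generic-service": TEMPLATE})
print(settings["max_check_attempts"])   # 4 (service value overrides template's 3)
print(seconds_to_hard_state(settings))  # 180
```

So with this configuration a problem should nominally go hard about three
minutes after the first failed check, which is why gaps of 15 minutes or more
between rechecks look wrong.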


On Thu, Feb 24, 2011 at 12:38 PM, Yueh-Hung Liu <yuehung.liu at gmail.com> wrote:
> How many attempts do you configure before a non-OK state becomes hard?
>
>
> On Thu, Feb 24, 2011 at 7:24 PM, Rus Hughes <russell.hughes at gmail.com> wrote:
>> Hi,
>>
>> I've been investigating an issue we have with Nagios Core 3.2.0, which
>> we're running on Red Hat 5.4. We're being a bit ruthless and have
>> configured retry_check_interval and normal_check_interval to both be 1
>> on all hosts and services (20 hosts and 293 services).
>>
>> We're seeing massive delays between checks getting run for services
>> flagged as DOWN, even though the box has little load (0.2).
>>
>> Looking at the extended information page for a service that was DOWN,
>> we're seeing events like this occur:
>>
>> At 10:40, a service that was DOWN had a check scheduled for 10:25
>> that still hadn't run.
>> At 11:02, I refreshed the page for the Nagios check. Nagios had run
>> the check and changed the service state to UP.
>> The last check time was shown as 10:25, even though the check
>> actually ran between 10:40 and 11:02.
>>
>> Does anyone know:
>>
>> 1) Why Nagios is being 'lazy' when rechecking services marked as DOWN?
>> We've configured retry_check_interval to 1 for all checks, there's
>> little load on the box, and at most about 4 Nagios processes are
>> running at a time, so there are resources free to be used.
>>
>> 2) Why Nagios sets the Last Check Time to the predicted Next
>> Scheduled Check time, even though the check actually ran well
>> after that? (Bug in Nagios?)
>>
>> Thanks,
>>
>> Rus
>>
>> ------------------------------------------------------------------------------
>> Free Software Download: Index, Search & Analyze Logs and other IT data in
>> Real-Time with Splunk. Collect, index and harness all the fast moving IT data
>> generated by your applications, servers and devices whether physical, virtual
>> or in the cloud. Deliver compliance at lower cost and gain new business
>> insights. http://p.sf.net/sfu/splunk-dev2dev
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null
>>
>
