passive host checks status instability, bug or configuration error?

Artur D'Assumpção artur.dassumpcao at di.com.pt
Sun Apr 10 10:29:51 CEST 2005


I guess I was wrong... I left this workaround running through the night 
with notifications on, and today I received a few status change 
notifications. So this doesn't work reliably either.

AD

Artur D'Assumpção wrote:

> I am still having the same problem under the same conditions, but I think 
> I've found a workaround that can help you debug this issue:
>
> I've replaced the service-is-stale command with another plugin that 
> returns one of two possible states, depending on the $HOSTSTATE$ macro:
>
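> sr-0 plugins-di # cat service_is_stale
> #!/bin/sh
> # $1 is the $HOSTSTATE$ macro passed in by the service-is-stale command.
> if [ "$1" = "DOWN" ]; then
>        # host is DOWN: report the stale service as CRITICAL (2)
>        /usr/nagios/libexec/check_dummy 2 "Service results not received"
> elif [ "$1" = "UP" ]; then
>        # host is UP: report the stale service as UNKNOWN (3)
>        /usr/nagios/libexec/check_dummy 3 "Service results not received"
> fi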
>
>
> -- 
>
> # default command used when nsca results for a given host's service weren't received
> define command {
>    command_name    service-is-stale
>    command_line    $USER2$/service_is_stale $HOSTSTATE$
> }
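
One gap in a wrapper like this: $HOSTSTATE$ can also be UNREACHABLE, and when neither branch matches, the script exits 0, which Nagios reads as OK. A minimal sketch of a variant that covers every case follows. The function name stale_service_result and the choice to treat UNREACHABLE like DOWN are assumptions made for illustration, and it returns the plugin exit codes directly instead of calling check_dummy.

```shell
#!/bin/sh
# Sketch only: map the host state passed as $1 to a result for a stale
# service. Plugin exit codes: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
stale_service_result() {
    case "$1" in
        DOWN|UNREACHABLE)
            # Assumption: treat an unreachable host like a down one.
            echo "CRITICAL: Service results not received (host is $1)"
            return 2
            ;;
        UP)
            echo "UNKNOWN: Service results not received"
            return 3
            ;;
        *)
            # Empty or unexpected macro value: never fall through to OK.
            echo "UNKNOWN: unexpected host state '$1'"
            return 3
            ;;
    esac
}

# As a plugin, the file would end with:
#   stale_service_result "$1"; exit $?
```

Whether this avoids the flapping is untested here; it only removes the silent exit-0 path.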
>
> Now, when the service checks go stale, the status returned depends on the 
> $HOSTSTATE$ macro. In this specific case, while the host status is DOWN, 
> the value returned for the stale services is CRITICAL, so the host 
> status does not change.
>
> Anyway, I still have the same question: isn't Nagios supposed to ignore 
> all the service checks if the host is in a DOWN state?
>
> AD
>
>
>
> Artur D'Assumpção wrote:
>
>> Hi ppl,
>>
>> I'm getting very strange, unstable results with passive host checks. I 
>> don't know if I've found a bug or if I am actually doing something 
>> wrong here.
>>
>> I'll try to describe the network first, before getting to the actual 
>> problem:
>>
>> I have some hosts that are behind firewalled networks, so 
>> service and host checks have to be submitted passively using send_nsca.
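
For reference, send_nsca reads tab-separated lines on standard input: host, service description, return code and plugin output for a service result, and host, return code and output for a host result. A rough sketch of building those lines follows. The function names, the central server name and the config path in the comment are placeholders, not values from this setup.

```shell
#!/bin/sh
# Sketch only: format passive check results the way send_nsca expects
# them on stdin (fields separated by tabs, one result per line).

# Service result: host, service description, return code, plugin output.
nsca_service_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Host result: host, return code (0=UP, 1=DOWN, 2=UNREACHABLE), output.
nsca_host_line() {
    printf '%s\t%s\t%s\n' "$1" "$2" "$3"
}

# Hypothetical usage on a monitored client (server name and path made up):
#   nsca_host_line "domain.pt_sfci-dr-0" 0 "Host alive" |
#       send_nsca -H central.example.org -c /etc/nagios/send_nsca.cfg
```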
>>
>> In the main config I have these freshness options:
>>
>> check_service_freshness=1
>> check_host_freshness=1
>>
>> service_freshness_check_interval=300
>> host_freshness_check_interval=60
>>
>> The status retention options are also disabled.
>>
>> A generic host in these conditions uses this template configuration:
>>
>> define host {
>>    name                            generic-passive-unreachable-host
>>
>>    active_checks_enabled           0
>>    passive_checks_enabled          1
>>
>>    obsess_over_host                1
>>    event_handler_enabled           0
>>    flap_detection_enabled          0
>>    process_perf_data               0
>>    retain_status_information       0
>>    retain_nonstatus_information    0
>>
>>    check_command                   host-is-stale
>>    check_freshness                 1
>>    freshness_threshold             120
>>    max_check_attempts              1
>>
>>    notifications_enabled           1
>>    notification_interval           60
>>    notification_period             24x7
>>    notification_options            d,u,r
>>
>>    contact_groups                  dummy-contacts
>>
>>    register                        0
>> }
>>
>> and analogously for services:
>>
>> define service {
>>    name                            generic-passive-service
>>
>>    active_checks_enabled           0
>>    passive_checks_enabled          1
>>
>>    obsess_over_service             1
>>    event_handler_enabled           0
>>    flap_detection_enabled          0
>>    process_perf_data               1
>>    retain_status_information       0
>>    retain_nonstatus_information    0
>>    is_volatile                     0
>>
>>    check_command                   service-is-stale
>>    check_freshness                 1
>>    freshness_threshold             300
>>    parallelize_check               1
>>    check_period                    24x7
>>    max_check_attempts              2
>>    normal_check_interval           5
>>    retry_check_interval            5
>>
>>    notifications_enabled           1
>>    notification_interval           60
>>    notification_period             24x7
>>    notification_options            c,r
>>
>>    contact_groups                  dummy-contacts
>>
>>    register                        0
>> }
>>
>>
>> Now to the real problem. The host status keeps 
>> flapping between UP and DOWN. In my tests only the 
>> monitoring server is up; the other clients/servers are down. Every time 
>> the host freshness threshold expires, 'host-is-stale' gets run, 
>> always returning a DOWN state:
>>
>> Apr  9 17:52:09 sr-0 nagios: Warning: The results of host 
>> 'domain.pt_sfci-dr-0' are stale by 60 seconds (threshold=120 
>> seconds).  I'm forcing an immediate check of the host.
>>
>> This is the expected behavior, so far so good...
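
The numbers in that message can be read as follows, assuming a result expires at the time of the last check plus freshness_threshold and the log reports how far past that point the current time is. That reading is inferred from the log text, not checked against the Nagios source. A small sketch of the arithmetic, with a made-up function name:

```shell
#!/bin/sh
# Sketch only: seconds by which a result is stale, or 0 if still fresh.
# Assumed rule: a result expires at last_check + freshness_threshold.
# $1 = time of last result (epoch), $2 = threshold (s), $3 = now (epoch)
stale_by() {
    expires=$(( $1 + $2 ))
    if [ "$3" -gt "$expires" ]; then
        echo $(( $3 - expires ))
    else
        echo 0
    fi
}

# With threshold=120, a result that is 180 seconds old is stale by 60:
stale_by 1000 120 1180
```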
>>
>> The problem starts when the freshness threshold of this host's 
>> passive services also expires, even though the host is in 
>> a DOWN state:
>>
>> Apr  9 17:52:17 sr-0 nagios: Warning: The results of service '[SYS] 
>> Swap Usage' on host 'domain.pt_sfci-dr-0' are stale by 40 seconds 
>> (threshold=500 seconds).  I'm forcing an immediate check of the service.
>>
>> When this happens, the 'service-is-stale' command gets executed, 
>> placing the service in an UNKNOWN state, and consequently the host 
>> status changes to UP.
>>
>> Now, let me ask my question: aren't the service checks 
>> for a host that is DOWN supposed to be ignored? This is causing the 
>> UP/DOWN flapping. Note that there aren't any other 
>> distributed servers or clients submitting results; NSCA isn't even 
>> running at this time. Any clues?
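
To see how the host and service freshness checks interleave, the relevant syslog lines can be pulled out per host. A rough sketch, assuming each message sits on a single line in the log (the wrapping in the excerpts above comes from the mail client); the function name and the log path in the comment are placeholders.

```shell
#!/bin/sh
# Sketch only: print the freshness warnings for one host, in log order.
# $1 = host name; syslog lines are read from stdin.
stale_events() {
    grep -F "are stale by" | grep -F "host '$1'"
}

# Hypothetical usage (log path depends on the syslog setup):
#   stale_events "domain.pt_sfci-dr-0" < /var/log/messages
```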
>>
>> I'm running Nagios version 2.0b2. (I know there is a 2.0b3, but since I 
>> haven't found any changelog references on this subject, I am assuming 
>> a configuration problem.)
>>
>> Thanks very much,
>>
>> AD
>>
>> -------------------------------------------------------
>> SF email is sponsored by - The IT Product Guide
>> Read honest & candid reviews on hundreds of IT Products from real users.
>> Discover which products truly live up to the hype. Start reading now.
>> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when 
>> reporting any issue. ::: Messages without supporting info will risk 
>> being sent to /dev/null



