passive host checks status instability, bug or configuration error?

Artur D'Assumpção artur.dassumpcao at di.com.pt
Sat Apr 9 20:42:17 CEST 2005


I am still having the same problem under the same conditions, but I think 
I've found a workaround that may help you debug this issue:

I've replaced the service-is-stale command with a wrapper plugin that 
returns one of two states, depending on the $HOSTSTATE$ macro:

sr-0 plugins-di # cat service_is_stale
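#!/bin/sh
# $1 is the $HOSTSTATE$ macro passed in by the command definition.
if [ "$1" = "DOWN" ]; then
        # Host is DOWN: report the stale service as CRITICAL (exit code 2)
        /usr/nagios/libexec/check_dummy 2 "Service results not received"
elif [ "$1" = "UP" ]; then
        # Host is UP: report the stale service as UNKNOWN (exit code 3)
        /usr/nagios/libexec/check_dummy 3 "Service results not received"
fi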


--

# default command used when NSCA results for a given host's service
# weren't received
define command {
    command_name    service-is-stale
    command_line    $USER2$/service_is_stale $HOSTSTATE$
}

Now, when the service checks go stale, the status returned depends on the 
$HOSTSTATE$ macro. In this specific case, when the host status is DOWN, 
the value returned for the stale services is CRITICAL, so the host status 
does not change.
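
The mapping described above can also be written as a single case 
statement. This is only a sketch: the function name stale_exit_code is 
made up for illustration, and treating UNREACHABLE like DOWN (and 
anything unexpected as UNKNOWN) is my assumption, not something the 
script above does.

```shell
#!/bin/sh
# Hypothetical helper: map a $HOSTSTATE$ value to a plugin exit code.
# 2 = CRITICAL, 3 = UNKNOWN (standard Nagios plugin return codes).
stale_exit_code() {
    case "$1" in
        DOWN|UNREACHABLE) echo 2 ;;  # assumption: UNREACHABLE is handled like DOWN
        UP)               echo 3 ;;  # host is up, so the stale service is UNKNOWN
        *)                echo 3 ;;  # assumption: unexpected or empty value -> UNKNOWN
    esac
}

# In the plugin itself this would be the last line (left commented out
# here so the function can be sourced and tried by hand):
# /usr/nagios/libexec/check_dummy "$(stale_exit_code "$1")" "Service results not received"
```

The catch-all branch means the plugin never falls through without 
returning a state, which the if/elif version does when $HOSTSTATE$ is 
neither UP nor DOWN.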

Anyway, I still have the same question: isn't Nagios supposed to ignore 
all service checks while the host is in a DOWN state?

AD



Artur D'Assumpção wrote:

> Hi ppl,
>
> I'm getting very strange, unstable results with passive host checks. I 
> don't know if I've found a bug or if I am actually doing something 
> wrong here.
>
> I'll try to describe the network first, before getting to the actual 
> problem:
>
> I have some hosts that are behind firewalled networks, so 
> service and host checks have to be submitted passively using send_nsca.
>
> In the main config I have these refresh options:
>
> check_service_freshness=1
> check_host_freshness=1
>
> service_freshness_check_interval=300
> host_freshness_check_interval=60
>
> The retain status options are also disabled.
>
> A generic host in these conditions uses this template configuration:
>
> define host {
>    name                            generic-passive-unreachable-host
>
>    active_checks_enabled           0
>    passive_checks_enabled          1
>
>    obsess_over_host                1
>    event_handler_enabled           0
>    flap_detection_enabled          0
>    process_perf_data               0
>    retain_status_information       0
>    retain_nonstatus_information    0
>
>    check_command                   host-is-stale
>    check_freshness                 1
>    freshness_threshold             120
>    max_check_attempts              1
>
>    notifications_enabled           1
>    notification_interval           60
>    notification_period             24x7
>    notification_options            d,u,r
>
>    contact_groups                  dummy-contacts
>
>    register                        0
> }
>
> analogous for services:
>
> define service {
>    name                            generic-passive-service
>
>    active_checks_enabled           0
>    passive_checks_enabled          1
>
>    obsess_over_service             1
>    event_handler_enabled           0
>    flap_detection_enabled          0
>    process_perf_data               1
>    retain_status_information       0
>    retain_nonstatus_information    0
>    is_volatile                     0
>
>    check_command                   service-is-stale
>    check_freshness                 1
>    freshness_threshold             300
>    parallelize_check               1
>    check_period                    24x7
>    max_check_attempts              2
>    normal_check_interval           5
>    retry_check_interval            5
>
>    notifications_enabled           1
>    notification_interval           60
>    notification_period             24x7
>    notification_options            c,r
>
>    contact_groups                  dummy-contacts
>
>    register                        0
> }
>
>
> Now to the real problem. The host status keeps flapping between UP 
> and DOWN. In my tests only the monitoring server is up; the other 
> clients/servers are down. Every time the host freshness threshold 
> expires, 'host-is-stale' is run and always returns a DOWN state:
>
> Apr  9 17:52:09 sr-0 nagios: Warning: The results of host 
> 'domain.pt_sfci-dr-0' are stale by 60 seconds (threshold=120 
> seconds).  I'm forcing an immediate check of the host.
>
> This is the expected behavior, so far so good...
>
> The problem starts when the freshness threshold of this host's passive 
> services also expires, even though the host is in a DOWN state:
>
> Apr  9 17:52:17 sr-0 nagios: Warning: The results of service '[SYS] 
> Swap Usage' on host 'domian.pt_sfci-dr-0' are stale by 40 seconds 
> (threshold=500 seconds).  I'm forcing an immediate check of the service.
>
> When this happens, the 'service-is-stale' command is executed, 
> placing the service in an UNKNOWN state, and consequently the host 
> status changes to UP.
>
> So here is my question: aren't service checks for a host that is DOWN 
> supposed to be ignored? This is what causes the UP/DOWN flapping. 
> Note that there are no other distributed servers or clients 
> submitting results; NSCA isn't even running at this time. Any clues?
>
> I'm running Nagios version 2.0b2. (I know there is a 2.0b3, but since I 
> haven't found any changelog references on this subject, I suspect 
> a configuration problem.)
>
> Thanks very much,
>
> AD
>
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide
> Read honest & candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when 
> reporting any issue. ::: Messages without supporting info will risk 
> being sent to /dev/null








