Passive host check status instability: bug or configuration error?

Artur D'Assumpção artur.dassumpcao at di.com.pt
Sat Apr 9 19:03:15 CEST 2005


Hi ppl,

I'm getting very strange, unstable results from passive host checks. I don't
know whether I've found a bug or whether I'm doing something wrong here.

I'll describe the network first, before getting to the actual problem:

Well, I have some hosts on firewalled networks, so their service and
host check results have to be submitted passively using send_nsca.
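For reference, send_nsca reads one result per line on standard input. Below is a minimal Python sketch of building such lines, assuming the tab-delimited formats described in the NSCA README (a host result is host, return code, output; a service result adds the service description after the host name). The function name `nsca_line` is hypothetical, and whether host results are accepted depends on your NSCA version.

```python
def nsca_line(host, return_code, output, service=None, delim="\t"):
    """Build one line of send_nsca input (hypothetical helper).

    Host result:    host<delim>return_code<delim>output
    Service result: host<delim>service<delim>return_code<delim>output
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2 or 3")
    # Fields must not contain the delimiter or a newline, or send_nsca
    # would split the line in the wrong place.
    fields_to_check = (host, output) + ((service,) if service is not None else ())
    for field in fields_to_check:
        if delim in field or "\n" in field:
            raise ValueError(f"field contains delimiter or newline: {field!r}")
    fields = [host]
    if service is not None:
        fields.append(service)
    fields += [str(return_code), output]
    return delim.join(fields) + "\n"


if __name__ == "__main__":
    # Example: one host result and one service result for the same host.
    print(nsca_line("domain.pt_sfci-dr-0", 0, "Host alive"), end="")
    print(nsca_line("domain.pt_sfci-dr-0", 0, "Swap OK",
                    service="[SYS] Swap Usage"), end="")
```

The output could then be piped into the client, e.g. `python3 submit_results.py | send_nsca -H <nagios-server> -c <path-to-send_nsca.cfg>`, where the script name and paths are placeholders.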

In the main config I have these freshness options:

check_service_freshness=1
check_host_freshness=1

service_freshness_check_interval=300
host_freshness_check_interval=60

Status retention options are also disabled.
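The interplay between a freshness threshold and the freshness check interval can be worked through with a small model. This sketch assumes freshness checks fire at a fixed period starting at `offset`, and that a result counts as stale once the check time is strictly past (last result + threshold). It is a simplification, not Nagios's actual scheduler code, and `first_stale_detection` is a hypothetical name.

```python
def first_stale_detection(last_result, threshold, interval, offset=0):
    """Return (tick, overdue): the first freshness-check tick at which the
    result is stale, and how far past last_result + threshold that tick is.

    Simplified model: checks fire at offset, offset + interval, ...
    and a result is stale when tick > last_result + threshold.
    """
    expires = last_result + threshold
    # Smallest k such that offset + k * interval > expires
    # (floor division also handles expires < offset correctly).
    k = max((expires - offset) // interval + 1, 0)
    tick = offset + k * interval
    return tick, tick - expires


if __name__ == "__main__":
    # Host settings above: threshold 120 s, host freshness interval 60 s.
    print(first_stale_detection(0, 120, 60))
    # Service settings above: threshold 300 s, service freshness interval 300 s.
    print(first_stale_detection(0, 300, 300))
```

Under this model, a host whose last result arrived at t=0 is flagged at t=180, i.e. 60 seconds past its 120-second threshold, which lines up with the "stale by 60 seconds" host warning in the log further down. In the worst case, detection lags by threshold + interval (180 s for hosts, 600 s for services with these settings).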

A generic host in this situation uses the following template:

define host {
    name                            generic-passive-unreachable-host

    active_checks_enabled           0
    passive_checks_enabled          1

    obsess_over_host                1
    event_handler_enabled           0
    flap_detection_enabled          0
    process_perf_data               0
    retain_status_information       0
    retain_nonstatus_information    0

    check_command                   host-is-stale
    check_freshness                 1
    freshness_threshold             120
    max_check_attempts              1

    notifications_enabled           1
    notification_interval           60
    notification_period             24x7
    notification_options            d,u,r

    contact_groups                  dummy-contacts

    register                        0
}

and analogously for services:

define service {
    name                            generic-passive-service

    active_checks_enabled           0
    passive_checks_enabled          1

    obsess_over_service             1
    event_handler_enabled           0
    flap_detection_enabled          0
    process_perf_data               1
    retain_status_information       0
    retain_nonstatus_information    0
    is_volatile                     0

    check_command                   service-is-stale
    check_freshness                 1
    freshness_threshold             300
    parallelize_check               1
    check_period                    24x7
    max_check_attempts              2
    normal_check_interval           5
    retry_check_interval            5

    notifications_enabled           1
    notification_interval           60
    notification_period             24x7
    notification_options            c,r

    contact_groups                  dummy-contacts

    register                        0
}
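Both templates reference `host-is-stale` and `service-is-stale`, whose command definitions aren't shown here. As a hypothetical sketch, a single Python script could play both roles: it returns 2 in the host case (a host check result Nagios treats as not UP, i.e. DOWN) and 3 (UNKNOWN) in the service case, matching the states described below. The script name `stale_check.py` and its `host|service NAME` arguments are assumptions for illustration, not the poster's actual commands.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def stale_result(kind, name):
    """Return (exit_code, message) for a stale passive result.

    kind is "host" or "service" (hypothetical arguments for this sketch).
    """
    if kind == "host":
        # A host check returning 2 is treated as a DOWN result.
        return CRITICAL, f"DOWN - no passive result received for host {name}"
    if kind == "service":
        return UNKNOWN, f"UNKNOWN - no passive result received for service {name}"
    raise ValueError(f"unknown kind: {kind!r}")


def main(argv):
    if len(argv) != 3:
        print("usage: stale_check.py host|service NAME")
        return UNKNOWN
    try:
        code, message = stale_result(argv[1], argv[2])
    except ValueError as err:
        print(f"UNKNOWN - {err}")
        return UNKNOWN
    print(message)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```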


Now to the real problem. The host status keeps flapping between UP and
DOWN. In my tests only the monitoring server is up; all the other
clients/servers are down. Every time the host freshness threshold
expires, 'host-is-stale' gets run and always returns a DOWN state:

Apr  9 17:52:09 sr-0 nagios: Warning: The results of host 
'domain.pt_sfci-dr-0' are stale by 60 seconds (threshold=120 seconds).  
I'm forcing an immediate check of the host.

This is the expected behavior, so far so good...

The problem starts when the freshness thresholds of this host's passive
services also expire, even while the host is DOWN:

Apr  9 17:52:17 sr-0 nagios: Warning: The results of service '[SYS] Swap 
Usage' on host 'domain.pt_sfci-dr-0' are stale by 40 seconds 
(threshold=500 seconds).  I'm forcing an immediate check of the service.

When this happens, the 'service-is-stale' command gets executed, placing
the service in an UNKNOWN state, and as a consequence the host status
changes to UP.

So here is my question: aren't service checks for a host that is known
to be DOWN supposed to be ignored? This is what causes the UP/DOWN
flapping. Keep in mind that there are no other distributed servers or
clients submitting results; NSCA isn't even running at this time. Any
clues?

I'm running Nagios 2.0b2. (I know 2.0b3 is out, but since I haven't
found any changelog entries on this subject, I suspect a configuration
problem.)

Thanks very much,

AD

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users