Trouble with passive checks and freshness

Christopher Odenbach odenbach at uni-paderborn.de
Mon Sep 4 14:24:28 CEST 2006


Hi,

> > This works fine for nearly every host. But there is one host, which
> > is not different from the others, that makes trouble. The data is
> > coming in every 5 minutes, but Nagios keeps flipping between active
> > and passive mode:
>
> Perhaps some individual configuration that crept into your system?
> I'd recommend to check the objects.cache file and see if this host is
> actually set up identical to the others.
>
> Hope this helps,

I just checked the objects.cache file. The host and service entries for 
rana and another host are completely identical:

define host {
        host_name       rana
        check_command   check-host-alive
        contact_groups  No-Alarm
        notification_period     always
        check_interval  0
        max_check_attempts      3
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess_over_host        1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        freshness_threshold     0
        check_freshness 0
        notification_options    d,u,r
        notifications_enabled   1
        notification_interval   0
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        }

define service {
        host_name       rana
        service_description     Local disk
        check_period    always
        check_command   check_disk!-w 15% -c 10% -x /afs -e
        contact_groups  Server
        notification_period     always
        normal_check_interval   10
        retry_check_interval    1
        max_check_attempts      1
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   0
        passive_checks_enabled  1
        obsess_over_service     1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        freshness_threshold     2000
        check_freshness 1
        notification_options    w,c,r,f
        notifications_enabled   1
        notification_interval   0
        stalking_options        n
        process_perf_data       1
        failure_prediction_enabled      1
        retain_status_information       1
        retain_nonstatus_information    1
        }
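
One way to double-check that the definitions really match is to compare them mechanically instead of by eye. The following is a minimal Python sketch, not part of Nagios: it parses `define ... { ... }` blocks in the layout shown above and lists every directive that differs between the services of two hosts. The function names and the host name `otherhost` are made up for illustration, and the sample is a shortened copy of the service definition above.

```python
def parse_objects(text):
    """Parse objects.cache-style text into a list of (type, {directive: value})."""
    objects, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("define") and line.endswith("{"):
            # "define service {" -> object type "service"
            current = (line[len("define"):].rstrip("{").strip(), {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None and line:
            # Directive name, then whitespace, then the rest of the line as value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


def diff_services(objects, host_a, host_b):
    """List (service, directive, value_a, value_b) for every difference."""
    def services(host):
        return {attrs["service_description"]: attrs
                for kind, attrs in objects
                if kind == "service" and attrs.get("host_name") == host}

    a, b = services(host_a), services(host_b)
    diffs = []
    for desc in sorted(set(a) | set(b)):
        if desc not in a or desc not in b:
            # Service defined for only one of the two hosts.
            diffs.append((desc, "<definition>", desc in a, desc in b))
            continue
        for key in sorted(set(a[desc]) | set(b[desc])):
            if key != "host_name" and a[desc].get(key) != b[desc].get(key):
                diffs.append((desc, key, a[desc].get(key), b[desc].get(key)))
    return diffs


# Shortened sample; "otherhost" is a hypothetical second host.
SAMPLE = """
define service {
        host_name       rana
        service_description     Local disk
        active_checks_enabled   0
        freshness_threshold     2000
        check_freshness 1
        }

define service {
        host_name       otherhost
        service_description     Local disk
        active_checks_enabled   0
        freshness_threshold     2000
        check_freshness 1
        }
"""

# An empty list means no directive differs between the two hosts.
print(diff_services(parse_objects(SAMPLE), "rana", "otherhost"))
```

Run against the real objects.cache, any non-empty result would point at a directive that differs between rana and the host it is compared with.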

Still, all passive checks for rana keep flapping. Here is the log file:

root@giedi3[nagios]# tail -2000 nagios.log | grep rana | naglog.pl
[...] (naglog.pl just converts the timestamps into a readable format)

Here the passive check results come in - everything ok:

[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade: 
base-config, libc6, libc6-sparc64, libgnutls11, libsasl2, 
libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
perl-modules, perl-suid
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free 
space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 
MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process 
with command name 'bosserver'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes 
with args '/usr/sbin/cfexecd'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with 
args '/usr/sbin/cron'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process 
with args '/sbin/klogd'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with 
args '/usr/sbin/ntpd'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1 
process with args '/usr/sbin/nullmailer-send'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process 
with command name 'ptserver'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process 
with args '/sbin/syslogd'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process 
with command name 'vlserver'
[04.09. 13:17:16]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average: 
0.33, 0.15, 0.10|load1=0.330;3.000;5.000;0; 
load5=0.150;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;

Five seconds later Nagios updates the service states:

[04.09. 13:17:21]  SERVICE ALERT: rana;4upgrades;WARNING;HARD;1;PKG 
WARNING - Upgrade: base-config, libc6, libc6-sparc64, libgnutls11, 
libsasl2, libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
perl-modules, perl-suid
[04.09. 13:17:21]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - 
free space:
[04.09. 13:17:21]  SERVICE ALERT: rana;Proc: cron;OK;HARD;1;PROCS OK: 1 
process with args '/usr/sbin/cron'
[04.09. 13:17:21]  SERVICE ALERT: rana;Proc: klogd;OK;HARD;1;PROCS OK: 1 
process with args '/sbin/klogd'
[04.09. 13:17:21]  SERVICE ALERT: rana;Proc: syslogd;OK;HARD;1;PROCS OK: 
1 process with args '/sbin/syslogd'
[04.09. 13:17:21]  SERVICE ALERT: rana;Proc: vlserver;OK;HARD;1;PROCS 
OK: 1 process with command name 'vlserver'

Twenty seconds later some services drop to UNKNOWN (this happens because 
Nagios switches them to an active check). This should not happen, because 
valid data arrived just a few lines above!

[04.09. 13:17:41]  SERVICE ALERT: rana;Local 
swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:18:12]  SERVICE ALERT: rana;Proc: 
ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:20:31]  SERVICE ALERT: rana;Proc: 
nullmailer-send;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not 
running?
[04.09. 13:20:41]  SERVICE ALERT: rana;System 
load;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:20:51]  SERVICE ALERT: rana;Proc: 
bosserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:21:32]  SERVICE ALERT: rana;Proc: 
ptserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:21:32]  SERVICE ALERT: rana;Proc: 
cfexecd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
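
To put numbers on this, the age of the last passive result at the moment of each UNKNOWN alert can be computed from the log. This is a rough Python sketch with several assumptions: the naglog.pl timestamp layout shown here, one log entry per line (the mail wrapped them), the year 2006, and that the freshness_threshold of 2000 seconds from the "Local disk" definition applies to the other services as well, which the excerpt does not confirm. The function names are invented for this example.

```python
import re
from datetime import datetime

# Assumption: same threshold as in the "Local disk" definition (seconds).
FRESHNESS_THRESHOLD = 2000

# Four entries from the excerpt, re-joined onto single lines.
LOG = """\
[04.09. 13:17:16]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:17:16]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with args '/usr/sbin/ntpd'
[04.09. 13:17:41]  SERVICE ALERT: rana;Local swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:18:12]  SERVICE ALERT: rana;Proc: ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
"""

LINE = re.compile(
    r"^\[(\d\d\.\d\d\. \d\d:\d\d:\d\d)\]\s+(EXTERNAL COMMAND|SERVICE ALERT): (.*)$"
)


def parse_time(stamp, year=2006):
    """Turn a naglog.pl stamp such as '04.09. 13:17:16' into a datetime."""
    return datetime.strptime(f"{year} {stamp}", "%Y %d.%m. %H:%M:%S")


def unknown_alert_ages(log_text, threshold=FRESHNESS_THRESHOLD):
    """Return (service, age_in_seconds, still_fresh) for each UNKNOWN alert."""
    last_passive = {}   # (host, service) -> time of the latest passive result
    findings = []
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        when = parse_time(match.group(1))
        kind = match.group(2)
        fields = match.group(3).split(";")
        if len(fields) < 3:
            continue
        if kind == "EXTERNAL COMMAND" and fields[0] == "PROCESS_SERVICE_CHECK_RESULT":
            last_passive[(fields[1], fields[2])] = when
        elif kind == "SERVICE ALERT" and fields[2] == "UNKNOWN":
            key = (fields[0], fields[1])
            if key in last_passive:
                age = int((when - last_passive[key]).total_seconds())
                findings.append((fields[1], age, age < threshold))
    return findings


for service, age, fresh in unknown_alert_ages(LOG):
    print(f"{service}: last passive result {age}s old, within threshold: {fresh}")
```

For the two sample services the last passive result is 25 and 56 seconds old when the UNKNOWN alert is logged, far below 2000 seconds.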

Five minutes later the same thing happens again. Fresh data comes in:

[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade: 
base-config, libc6, libc6-sparc64, libgnutls11, libsasl2, 
libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
perl-modules, perl-suid
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free 
space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 
MB out of 512 MB) |swap=511MB;102;51;0;511
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process 
with command name 'bosserver'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes 
with args '/usr/sbin/cfexecd'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with 
args '/usr/sbin/cron'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process 
with args '/sbin/klogd'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with 
args '/usr/sbin/ntpd'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1 
process with args '/usr/sbin/nullmailer-send'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process 
with command name 'ptserver'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process 
with args '/sbin/syslogd'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process 
with command name 'vlserver'
[04.09. 13:22:41]  EXTERNAL COMMAND: 
PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average: 
0.24, 0.12, 0.10|load1=0.240;3.000;5.000;0; 
load5=0.120;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;

The services toggle to OK again:

[04.09. 13:22:51]  SERVICE ALERT: rana;Local swap;OK;HARD;1;SWAP OK - 
100% free (512 MB out of 512 MB)
[04.09. 13:22:51]  SERVICE ALERT: rana;Proc: bosserver;OK;HARD;1;PROCS 
OK: 1 process with command name 'bosserver'
[04.09. 13:22:51]  SERVICE ALERT: rana;Proc: cfexecd;OK;HARD;1;PROCS OK: 
2 processes with args '/usr/sbin/cfexecd'
[04.09. 13:22:51]  SERVICE ALERT: rana;Proc: ntpd;OK;HARD;1;PROCS OK: 1 
process with args '/usr/sbin/ntpd'
[04.09. 13:22:51]  SERVICE ALERT: rana;Proc: 
nullmailer-send;OK;HARD;1;PROCS OK: 1 process with args 
'/usr/sbin/nullmailer-send'
[04.09. 13:22:51]  SERVICE ALERT: rana;Proc: ptserver;OK;HARD;1;PROCS 
OK: 1 process with command name 'ptserver'
[04.09. 13:22:51]  SERVICE ALERT: rana;System load;OK;HARD;1;OK - load 
average: 0.24, 0.12, 0.10

Then the services that should still be OK drop to UNKNOWN:

[04.09. 13:24:31]  SERVICE ALERT: rana;Proc: 
cron;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:24:41]  SERVICE ALERT: rana;Proc: 
syslogd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:24:41]  SERVICE ALERT: rana;4upgrades;UNKNOWN;HARD;1;UNKNOWN: 
No data from host - nsce not running?
[04.09. 13:25:41]  SERVICE ALERT: rana;Proc: 
vlserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
[04.09. 13:25:41]  SERVICE ALERT: rana;Local 
disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
root@giedi3[nagios]#

What is going on here?

Thanks,

Christopher

-- 
======================================================
    Dipl.-Ing. Christopher Odenbach
    Zentrum fuer Informations- und Medientechnologien
    Universitaet Paderborn
    Raum N5.110
    odenbach at uni-paderborn.de
    Tel.: +49 5251 60 5315
======================================================