Trouble with passive checks and freshness

Arno Lehmann al at its-lehmann.de
Mon Sep 4 21:57:12 CEST 2006


Hi,

On 9/4/2006 2:24 PM, Christopher Odenbach wrote:
> Hi,
> 
> 
>>>This works fine for nearly every host. But there is one host, which
>>>is no different from the others, that causes trouble. The data is
>>>coming in every 5 minutes, but Nagios keeps flipping between active
>>>and passive mode:
>>
>>Perhaps some individual configuration that crept into your system?
>>I'd recommend checking the objects.cache file to see whether this host
>>is actually set up identically to the others.
>>
>>Hope this helps,
> 
> 
> I just checked the objects.cache file. The host and service entries for 
> rana and another host are completely identical:
> 
> define host {
>         host_name       rana
>         check_command   check-host-alive
>         contact_groups  No-Alarm
>         notification_period     always
>         check_interval  0
>         max_check_attempts      3
>         active_checks_enabled   1

Try disabling active checks in the configuration.

>         passive_checks_enabled  1
>         obsess_over_host        1
>         event_handler_enabled   1
>         low_flap_threshold      0.000000
>         high_flap_threshold     0.000000
>         flap_detection_enabled  1
>         freshness_threshold     0
>         check_freshness 0
>         notification_options    d,u,r
>         notifications_enabled   1
>         notification_interval   0
>         stalking_options        n
>         process_perf_data       1
>         failure_prediction_enabled      1
>         retain_status_information       1
>         retain_nonstatus_information    1
>         }
> 
> define service {
>         host_name       rana
>         service_description     Local disk
>         check_period    always
>         check_command   check_disk!-w 15% -c 10% -x /afs -e
>         contact_groups  Server
>         notification_period     always
>         normal_check_interval   10
>         retry_check_interval    1
>         max_check_attempts      1
>         is_volatile     0
>         parallelize_check       1
>         active_checks_enabled   0

Or rather, in the web front end... I guess you overlooked this
difference :-)

Arno
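
A comparison like this is easy to get wrong by eye, so here is a minimal Python sketch of how two object definitions could be diffed directive by directive. It assumes the definitions use the `define <type> { ... }` layout quoted in this mail; the function names `parse_definitions` and `differing_directives` are made up for illustration and are not part of Nagios. The sample text is a shortened excerpt of the two blocks quoted here.

```python
import re


def parse_definitions(text):
    """Return a list of (object_type, {directive: value}) tuples
    for every 'define <type> { ... }' block found in text."""
    blocks = []
    for match in re.finditer(r"define\s+(\w+)\s*\{(.*?)\}", text, re.DOTALL):
        directives = {}
        for line in match.group(2).splitlines():
            # Split each line into the directive name and the rest.
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                directives[parts[0]] = parts[1].strip()
        blocks.append((match.group(1), directives))
    return blocks


def differing_directives(a, b):
    """Directives present in both objects whose values differ."""
    return {k: (a[k], b[k]) for k in sorted(a.keys() & b.keys()) if a[k] != b[k]}


# Shortened excerpt of the definitions quoted in this mail.
SAMPLE = """
define host {
        host_name       rana
        active_checks_enabled   1
        passive_checks_enabled  1
        freshness_threshold     0
        check_freshness 0
        }

define service {
        host_name       rana
        service_description     Local disk
        active_checks_enabled   0
        passive_checks_enabled  1
        freshness_threshold     2000
        check_freshness 1
        }
"""

objects = parse_definitions(SAMPLE)
host = next(d for t, d in objects if t == "host")
service = next(d for t, d in objects if t == "service")
diff = differing_directives(host, service)
for name, (left, right) in diff.items():
    print(f"{name}: host={left} service={right}")
```

The same two functions could be pointed at the entries for rana and for a host that behaves correctly, which is probably the more telling comparison.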

>         passive_checks_enabled  1
>         obsess_over_service     1
>         event_handler_enabled   1
>         low_flap_threshold      0.000000
>         high_flap_threshold     0.000000
>         flap_detection_enabled  1
>         freshness_threshold     2000
>         check_freshness 1
>         notification_options    w,c,r,f
>         notifications_enabled   1
>         notification_interval   0
>         stalking_options        n
>         process_perf_data       1
>         failure_prediction_enabled      1
>         retain_status_information       1
>         retain_nonstatus_information    1
>         }
> 
> But still all passive checks are flapping. Let me show you the log file:
> 
> root@giedi3[nagios]# tail -2000 nagios.log | grep rana | naglog.pl
> [...] (naglog.pl just makes the timestamps human-readable)
> 
> Here the passive check results come in - everything ok:
> 
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade: 
> base-config, libc6, libc6-sparc64, libgnutls11, libsasl2, 
> libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
> perl-modules, perl-suid
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free 
> space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 
> MB out of 512 MB) |swap=511MB;102;51;0;511
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process 
> with command name 'bosserver'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes 
> with args '/usr/sbin/cfexecd'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with 
> args '/usr/sbin/cron'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process 
> with args '/sbin/klogd'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with 
> args '/usr/sbin/ntpd'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1 
> process with args '/usr/sbin/nullmailer-send'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process 
> with command name 'ptserver'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process 
> with args '/sbin/syslogd'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process 
> with command name 'vlserver'
> [04.09. 13:17:16]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average: 
> 0.33, 0.15, 0.10|load1=0.330;3.000;5.000;0; 
> load5=0.150;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
> 
> Five seconds later Nagios updates the service states:
> 
> [04.09. 13:17:21]  SERVICE ALERT: rana;4upgrades;WARNING;HARD;1;PKG 
> WARNING - Upgrade: base-config, libc6, libc6-sparc64, libgnutls11, 
> libsasl2, libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
> perl-modules, perl-suid
> [04.09. 13:17:21]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - 
> free space:
> [04.09. 13:17:21]  SERVICE ALERT: rana;Proc: cron;OK;HARD;1;PROCS OK: 1 
> process with args '/usr/sbin/cron'
> [04.09. 13:17:21]  SERVICE ALERT: rana;Proc: klogd;OK;HARD;1;PROCS OK: 1 
> process with args '/sbin/klogd'
> [04.09. 13:17:21]  SERVICE ALERT: rana;Proc: syslogd;OK;HARD;1;PROCS OK: 
> 1 process with args '/sbin/syslogd'
> [04.09. 13:17:21]  SERVICE ALERT: rana;Proc: vlserver;OK;HARD;1;PROCS 
> OK: 1 process with command name 'vlserver'
> 
> 20 seconds later some services drop to UNKNOWN state (which happens 
> when they are switched to active). This should not happen because there 
> was correct data a few lines above!
> 
> [04.09. 13:17:41]  SERVICE ALERT: rana;Local 
> swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:18:12]  SERVICE ALERT: rana;Proc: 
> ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:20:31]  SERVICE ALERT: rana;Proc: 
> nullmailer-send;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not 
> running?
> [04.09. 13:20:41]  SERVICE ALERT: rana;System 
> load;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:20:51]  SERVICE ALERT: rana;Proc: 
> bosserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:21:32]  SERVICE ALERT: rana;Proc: 
> ptserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:21:32]  SERVICE ALERT: rana;Proc: 
> cfexecd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> 
> After five minutes the same thing. Fresh data comes in:
> 
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;4upgrades;1;PKG WARNING - Upgrade: 
> base-config, libc6, libc6-sparc64, libgnutls11, libsasl2, 
> libsasl2-modules, login, passwd, perl, perl-base, perl-doc, 
> perl-modules, perl-suid
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free 
> space:| /=282MB;1297;1374;0;1527 /dev/shm=0MB;427;452;0;503 /var=118MB;205;217;0;242 /boot=10MB;39;42;0;47 /var/log=83MB;630;667;0;742 /var/cache/openafs=32MB;420;445;0;495 /tmp=0MB;398;422;0;469
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 
> MB out of 512 MB) |swap=511MB;102;51;0;511
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: bosserver;0;PROCS OK: 1 process 
> with command name 'bosserver'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cfexecd;0;PROCS OK: 2 processes 
> with args '/usr/sbin/cfexecd'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: cron;0;PROCS OK: 1 process with 
> args '/usr/sbin/cron'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: klogd;0;PROCS OK: 1 process 
> with args '/sbin/klogd'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with 
> args '/usr/sbin/ntpd'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: nullmailer-send;0;PROCS OK: 1 
> process with args '/usr/sbin/nullmailer-send'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ptserver;0;PROCS OK: 1 process 
> with command name 'ptserver'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: syslogd;0;PROCS OK: 1 process 
> with args '/sbin/syslogd'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;Proc: vlserver;0;PROCS OK: 1 process 
> with command name 'vlserver'
> [04.09. 13:22:41]  EXTERNAL COMMAND: 
> PROCESS_SERVICE_CHECK_RESULT;rana;System load;0;OK - load average: 
> 0.24, 0.12, 0.10|load1=0.240;3.000;5.000;0; 
> load5=0.120;9999.000;9999.000;0; load15=0.100;9999.000;9999.000;0;
> 
> The services toggle to OK again:
> 
> [04.09. 13:22:51]  SERVICE ALERT: rana;Local swap;OK;HARD;1;SWAP OK - 
> 100% free (512 MB out of 512 MB)
> [04.09. 13:22:51]  SERVICE ALERT: rana;Proc: bosserver;OK;HARD;1;PROCS 
> OK: 1 process with command name 'bosserver'
> [04.09. 13:22:51]  SERVICE ALERT: rana;Proc: cfexecd;OK;HARD;1;PROCS OK: 
> 2 processes with args '/usr/sbin/cfexecd'
> [04.09. 13:22:51]  SERVICE ALERT: rana;Proc: ntpd;OK;HARD;1;PROCS OK: 1 
> process with args '/usr/sbin/ntpd'
> [04.09. 13:22:51]  SERVICE ALERT: rana;Proc: 
> nullmailer-send;OK;HARD;1;PROCS OK: 1 process with args 
> '/usr/sbin/nullmailer-send'
> [04.09. 13:22:51]  SERVICE ALERT: rana;Proc: ptserver;OK;HARD;1;PROCS 
> OK: 1 process with command name 'ptserver'
> [04.09. 13:22:51]  SERVICE ALERT: rana;System load;OK;HARD;1;OK - load 
> average: 0.24, 0.12, 0.10
> 
> Then the services which should still be ok fall to unknown state:
> 
> [04.09. 13:24:31]  SERVICE ALERT: rana;Proc: 
> cron;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:24:41]  SERVICE ALERT: rana;Proc: 
> syslogd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:24:41]  SERVICE ALERT: rana;4upgrades;UNKNOWN;HARD;1;UNKNOWN: 
> No data from host - nsce not running?
> [04.09. 13:25:41]  SERVICE ALERT: rana;Proc: 
> vlserver;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> [04.09. 13:25:41]  SERVICE ALERT: rana;Local 
> disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?
> root@giedi3[nagios]#
> 
> What is going on here?
> 
> Thanks,
> 
> Christopher
> 
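
To put numbers on the log above, the time between each passive result and the following UNKNOWN alert can be measured per service. The following is a rough Python sketch, not a tested tool: it assumes the naglog.pl output format shown above with each entry on a single unwrapped line, it assumes the year 2006 because the timestamps carry none, and the function name `unknown_delays` is invented for this example.

```python
import re
from datetime import datetime

# Matches lines such as "[04.09. 13:17:16]  SERVICE ALERT: rana;...".
LINE = re.compile(
    r"\[(\d\d\.\d\d\. \d\d:\d\d:\d\d)\]\s+(EXTERNAL COMMAND|SERVICE ALERT): (.*)"
)


def parse_time(stamp, year=2006):
    # The log format has no year, so one is assumed here.
    return datetime.strptime(f"{year} {stamp}", "%Y %d.%m. %H:%M:%S")


def unknown_delays(lines):
    """Return (service, seconds) pairs: the time from the last passive
    result for a service to its next UNKNOWN alert."""
    last_result = {}
    delays = []
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        when = parse_time(m.group(1))
        fields = m.group(3).split(";")
        if m.group(2) == "EXTERNAL COMMAND":
            if fields[0] == "PROCESS_SERVICE_CHECK_RESULT" and len(fields) >= 3:
                last_result[(fields[1], fields[2])] = when
        elif len(fields) >= 3 and fields[2] == "UNKNOWN":
            key = (fields[0], fields[1])
            if key in last_result:
                seconds = int((when - last_result[key]).total_seconds())
                delays.append((key[1], seconds))
    return delays


# Four entries taken from the log above, joined onto single lines.
log = [
    "[04.09. 13:17:16]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local swap;0;SWAP OK - 100% free (512 MB out of 512 MB) |swap=511MB;102;51;0;511",
    "[04.09. 13:17:16]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Proc: ntpd;0;PROCS OK: 1 process with args '/usr/sbin/ntpd'",
    "[04.09. 13:17:41]  SERVICE ALERT: rana;Local swap;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?",
    "[04.09. 13:18:12]  SERVICE ALERT: rana;Proc: ntpd;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce not running?",
]

delays = unknown_delays(log)
for service, seconds in delays:
    print(f"{service}: UNKNOWN {seconds} s after the last passive result")
```

If the delays come out far below the configured freshness_threshold of 2000 seconds, as they do for these four lines, then a passive result simply ageing past the threshold seems an unlikely explanation for the UNKNOWN states.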

-- 
IT-Service Lehmann                    al at its-lehmann.de
Arno Lehmann                  http://www.its-lehmann.de

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users