Trouble with passive checks and freshness

Arno Lehmann al at its-lehmann.de
Fri Sep 1 22:57:44 CEST 2006


Hi,

On 9/1/2006 1:48 PM, Christopher Odenbach wrote:
> Hi,
> 
> we have the following setup:
> 
> Nagios 2.5
> 73 hosts
> 335 active checks
> 765 passive checks
> 
> Each host submits its passive check results to the Nagios command
> file every 5 minutes. The freshness threshold is set to 2000 seconds,
> so the service stays passive as long as everything runs as it should.
> The check_command runs check_dummy, which generates the message
> "No data from host" when executed actively.
> 
> This works fine for nearly every host. But one host, which is no
> different from the others, causes trouble. The data comes in every
> 5 minutes, but Nagios keeps flipping between active and passive
> mode:

Perhaps some host-specific configuration has crept into your system? I'd
recommend checking the objects.cache file to see whether this host is
actually set up identically to the others.
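
One way to do that comparison without reading the file by eye is a small
script. The following is only a sketch: it assumes objects.cache uses the
usual "define <type> { key value ... }" block layout, and the function
names (parse_objects, find_service, diff_services) are made up for this
illustration; they are not part of Nagios.

```python
import re


def parse_objects(text):
    """Parse 'define <type> { key value ... }' blocks into (type, attrs) pairs."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        start = re.match(r"define\s+(\w+)\s*\{", line)
        if start:
            current = (start.group(1), {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            # Directive name and value are separated by whitespace.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1] if len(parts) > 1 else ""
    return objects


def find_service(objects, host, description):
    """Return the attribute dict of one service, or None if it is not defined."""
    for kind, attrs in objects:
        if (kind == "service"
                and attrs.get("host_name") == host
                and attrs.get("service_description") == description):
            return attrs
    return None


def diff_services(objects, host_a, host_b, description):
    """Map each differing directive to its (host_a value, host_b value)."""
    a = find_service(objects, host_a, description)
    b = find_service(objects, host_b, description)
    if a is None or b is None:
        raise KeyError("service %r not found for both hosts" % description)
    diffs = {}
    for key in sorted(set(a) | set(b)):
        if key == "host_name":
            continue  # expected to differ
        if a.get(key) != b.get(key):
            diffs[key] = (a.get(key), b.get(key))
    return diffs
```

Fed the contents of objects.cache, a call such as
diff_services(objects, "rana", "etamin", "Local disk") returns only the
directives whose values differ, so an empty result would suggest the two
services really are defined identically.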

Hope this helps,

Arno

> root@giedi3[nagios]# tail -10000 nagios.log | grep rana | grep disk | naglog.pl | cut -c-100
> [01.09. 12:29:04]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:31:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 12:34:24]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:34:29]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 12:39:44]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:41:50]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 12:45:04]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:45:10]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 12:50:29]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:51:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 12:55:49]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 12:55:59]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 13:01:09]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:01:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 13:06:29]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:06:39]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 13:11:50]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 13:11:50]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:11:50]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 13:17:09]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:21:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 13:22:30]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:22:39]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> [01.09. 13:27:54]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:31:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n
> [01.09. 13:33:14]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s
> [01.09. 13:33:19]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:
> 
> All other hosts work just fine:
> 
> root@giedi3[nagios]# tail -10000 nagios.log | grep etamin | grep disk | naglog.pl | cut -c-100
> [01.09. 12:30:49]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 12:36:09]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 12:41:29]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 12:46:50]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 12:52:09]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 12:57:29]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:02:49]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:08:09]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:13:34]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:18:54]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:24:14]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:29:34]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> [01.09. 13:34:54]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;etamin;Local disk;0;DISK OK - free
> 
> The configuration is:
> 
> define service {
>         use                             check_local_disk_templ
>         host_name                       rana
> }
> define service{
>         use                             passive_templ
>         name                            check_local_disk_templ
>         service_description             Local disk
>         servicegroups                   local-disk-services
>         check_command                   check_disk!-w 15% -c 10% -x /afs -e
>         notifications_enabled           1
>         register                        0
> }
> define service{
>         name                            passive_templ
>         register                        0
>         max_check_attempts              1
>         normal_check_interval           10
>         retry_check_interval            1
>         active_checks_enabled           0
>         passive_checks_enabled          1
>         check_freshness                 1
>         freshness_threshold             2000
>         check_period                    always
>         notification_interval           0
>         notification_period             always
>         notification_options            w,c,r,f
>         notifications_enabled           0
>         contact_groups                  Server
>         process_perf_data               1
> }
> define command{
>         command_name            check_disk
>         command_line            $USER1$/check_dummy 3 "No data from host - nsce not running?"
> }
> 
> Any ideas what is going wrong here? Why is Nagios flipping the
> service to active when data has arrived less than 2000 seconds ago?
> 
> Thanks,
> 
> Christopher
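
To put a number on "less than 2000 seconds", the filtered log can be
checked mechanically. A rough sketch, assuming the naglog.pl timestamp
format shown in the excerpts ("[DD.MM. HH:MM:SS]"), input already
filtered to one host and service as in the grep pipeline, and the year
2006 taken from the date of this thread. staleness_at_alerts is a
hypothetical helper name, and the sample lines are copied from the rana
excerpt.

```python
import re
from datetime import datetime

LINE_RE = re.compile(
    r"^\[(\d{2}\.\d{2}\. \d{2}:\d{2}:\d{2})\]\s+"
    r"(EXTERNAL COMMAND|SERVICE ALERT): (.*)$"
)


def parse_time(stamp, year=2006):
    # The log prints no year; 2006 is an assumption based on the post date.
    return datetime.strptime("%d %s" % (year, stamp), "%Y %d.%m. %H:%M:%S")


def staleness_at_alerts(lines, threshold=2000):
    """For each UNKNOWN alert, report the age of the last passive result.

    Returns (timestamp, age_in_seconds, age_exceeds_threshold) tuples.
    """
    last_result = None
    report = []
    for line in lines:
        match = LINE_RE.match(line.lstrip("> ").rstrip())
        if not match:
            continue
        stamp, kind, rest = match.groups()
        when = parse_time(stamp)
        if kind == "EXTERNAL COMMAND" and rest.startswith("PROCESS_SERVICE_CHECK_RESULT;"):
            last_result = when
        elif kind == "SERVICE ALERT" and ";UNKNOWN;" in rest and last_result is not None:
            age = int((when - last_result).total_seconds())
            report.append((stamp, age, age > threshold))
    return report


sample = [
    "> [01.09. 12:29:04]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "> [01.09. 12:31:49]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n",
    "> [01.09. 12:34:24]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "> [01.09. 12:34:29]  SERVICE ALERT: rana;Local disk;OK;HARD;1;DISK OK - free space:",
    "> [01.09. 12:39:44]  EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;rana;Local disk;0;DISK OK - free s",
    "> [01.09. 12:41:50]  SERVICE ALERT: rana;Local disk;UNKNOWN;HARD;1;UNKNOWN: No data from host - nsce n",
]

report = staleness_at_alerts(sample)
for stamp, age, stale in report:
    print("%s  last passive result %d s old, over threshold: %s" % (stamp, age, stale))
```

For these six lines the last passive result is 165 and 126 seconds old
when the UNKNOWN alerts appear, far below the 2000-second threshold.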

-- 
IT-Service Lehmann                    al at its-lehmann.de
Arno Lehmann                  http://www.its-lehmann.de

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




