nsca service going critical

Marko Riedel mriedel at neuearbeit.de
Wed Jun 11 14:56:31 CEST 2003


Hi folks,

We have a number of passive services that report to a central Nagios
host every five minutes. The log file nagios.log shows the external
commands arriving at exactly the expected intervals.

The problem is that, at seemingly random times, a service goes
critical and stays critical for a while before it recovers. The
freshness check reports something absurd, such as the results being
"stale by 1920 seconds," yet the logs show that nsca submits service
check results every five minutes.

This is my template:

define service{
        use                             generic-service
        name                            nsca-service
        active_checks_enabled           0
        passive_checks_enabled          1
        is_volatile                     0
        check_period                    none
        max_check_attempts              1
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  linux-admins
        notification_interval           240
        notification_period             24x7
        notification_options            r,c,w
        check_freshness                 1
        freshness_threshold             480
        check_command                   check_dummy!2
        register                        0
        }

define service{
        use                             nsca-service
        service_description             TRACEROUTE
        host_name                       somename
}
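
For reference, this is how I would expect the freshness test to behave with the values from the template. It is only a minimal sketch: the function name `is_stale` is made up for illustration, and the rule that age is measured from the last received result is an assumption about Nagios, not something taken from its source.

```python
FRESHNESS_THRESHOLD = 480   # seconds, freshness_threshold from the template
SUBMIT_INTERVAL = 300       # seconds, nsca submits a result every five minutes


def is_stale(last_result: int, now: int, threshold: int = FRESHNESS_THRESHOLD) -> bool:
    """Assumed rule: a result is stale once it is older than the threshold."""
    return now - last_result > threshold


# Regular submissions: the result is at most one interval old.
print(is_stale(0, SUBMIT_INTERVAL - 1))       # 299 s old -> False

# One missed submission: the result can be almost two intervals old.
print(is_stale(0, 2 * SUBMIT_INTERVAL - 1))   # 599 s old -> True
```

Under that assumption a service that receives a result every 300 seconds should never trip a 480-second threshold, and staleness figures above 1000 seconds would require several consecutive missed results.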




Here is an excerpt from the log:

[1055283610] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055283910] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284210] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284510] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284530] SERVICE ALERT: somename;TRACEROUTE;CRITICAL;HARD;1;Status is CRITICAL
[1055284540] Warning: The results of service 'TRACEROUTE' on host 'somename' are stale by 1140 seconds (threshold=480 seconds).  I'm forcing an immediate check of the service.
[1055284810] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055285080] Warning: The results of service 'TRACEROUTE' on host 'somename' are stale by 60 seconds (threshold=480 seconds).  I'm forcing an immediate check of the service.
[1055285110] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055285380] SERVICE ALERT: somename;TRACEROUTE;OK;HARD;1;somename traceroute okay
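
The timestamps in this excerpt can be checked with a short script. The following Python sketch measures the gaps between passive submissions and works out which "last check" time each freshness warning would imply. It assumes that "stale by N seconds" means now - (last check + threshold); that reading is a guess, not confirmed against the Nagios source. The SERVICE ALERT lines are left out, and the wrapped warning lines are rejoined.

```python
import re

# Lines copied from the excerpt above (submissions and freshness warnings only).
LOG = """\
[1055283610] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055283910] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284210] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284510] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055284540] Warning: The results of service 'TRACEROUTE' on host 'somename' are stale by 1140 seconds (threshold=480 seconds).  I'm forcing an immediate check of the service.
[1055284810] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
[1055285080] Warning: The results of service 'TRACEROUTE' on host 'somename' are stale by 60 seconds (threshold=480 seconds).  I'm forcing an immediate check of the service.
[1055285110] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;somename;TRACEROUTE;0;somename traceroute okay
"""

submit_re = re.compile(r"^\[(\d+)\] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;")
stale_re = re.compile(
    r"^\[(\d+)\] Warning: .* stale by (\d+) seconds \(threshold=(\d+) seconds\)"
)

submits = []   # timestamps of passive submissions
implied = []   # (warning time, implied last-check time) pairs

for line in LOG.splitlines():
    m = submit_re.match(line)
    if m:
        submits.append(int(m.group(1)))
        continue
    m = stale_re.match(line)
    if m:
        now, stale_by, threshold = map(int, m.groups())
        # Assumed formula: stale_by = now - (last_check + threshold)
        implied.append((now, now - stale_by - threshold))

intervals = [later - earlier for earlier, later in zip(submits, submits[1:])]
print(intervals)   # [300, 300, 300, 300, 300]
for now, last_check in implied:
    print(now, last_check)
# 1055284540 1055282920
# 1055285080 1055284540
```

If the assumed formula is right, the first warning points to a last check at 1055282920, well before the excerpt starts, and the second points to 1055284540, which is the time of the forced check rather than any of the passive submissions. That would be consistent with the passive results not resetting the freshness clock, although the excerpt alone does not prove it.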


Best regards,


-- 
+------------------------------------------------------------+
| Marko Riedel, EDV Neue Arbeit gGmbH, mriedel at neuearbeit.de |
| http://www.geocities.com/markoriedelde/index.html          |
+------------------------------------------------------------+


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users