PING service stuck in CRITICAL - can't unstick it

Skip Montanaro skip at pobox.com
Thu Jun 19 00:36:59 CEST 2003


I was browsing through the availability reports and noticed that while most
machines are reported as being OK 99.996% of the time, some machines are
reported as being CRITICAL 99.996% of the time.  They don't show up on the
service problems or host problems pages, though.  Here are the state changes
for the PING service on one host (somewhat abbreviated so as to be readable
in 80 columns):

Event Start     Event End       Duration        State   Info
06-07 06:17:58  06-07 06:17:59  1s              OK      First State Assumed
06-10 22:06:01  06-10 22:26:01  20m 0s          WARN    PING RTA = 110.19 ms
06-10 22:26:01  06-10 22:33:00  6m 59s          OK      PING OK
06-10 22:33:00  06-10 23:01:32  28m 32s         WARN    PING RTA = 123.74 ms
06-10 23:01:32  06-10 23:23:10  21m 38s         CRIT    Plugin timed out
06-10 23:23:10  06-10 23:30:51  7m 41s          WARN    PING Packet loss = 20%
06-10 23:30:51  06-10 23:36:30  5m 39s          CRIT    Plugin timed out
06-10 23:36:30  06-10 23:42:01  5m 31s          WARN    PING Packet loss = 20%
06-10 23:42:01  06-18 08:20:21  7d 8h 38m 20s   CRIT    Plugin timed out

Now looking at today's nagios.log I see:

% egrep ' ad;PING;' nagios.log
[1055921331] SERVICE ALERT: ad;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 20%, RTA = 20.42 ms
[1055921381] SERVICE ALERT: ad;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 18.91 ms
[1055930982] SERVICE ALERT: ad;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 0%, RTA = 207.67 ms
[1055931051] SERVICE ALERT: ad;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 20.96 ms
[1055952733] SERVICE ALERT: ad;PING;WARNING;SOFT;1;PING WARNING - Packet loss = 20%, RTA = 2.52 ms
[1055952782] SERVICE ALERT: ad;PING;OK;SOFT;2;PING OK - Packet loss = 0%, RTA = 1.88 ms

The first timestamp is 06-18 02:28:51 local time.  The last is 06-18
11:13:02.  Why does nothing in the service availability report reflect any
of today's state changes?  In fact, looking back at yesterday's log I see 24
entries for the PING service on that host, with lots of WARNING and OK
transitions.
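
For anyone who wants to check those times: the bracketed number at the start
of each nagios.log line is a Unix epoch timestamp, and the rest of a SERVICE
ALERT line is a semicolon-separated record (host, service, state, state
type, attempt, plugin output).  Below is a minimal sketch of decoding such a
line.  The helper name parse_service_alert and the fixed UTC-5 offset (US
Central Daylight Time, which is what makes the times quoted above line up)
are my assumptions, not anything Nagios provides.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the log was read on a machine at UTC-5 (US Central Daylight Time).
LOCAL_TZ = timezone(timedelta(hours=-5))


def parse_service_alert(line):
    """Split one SERVICE ALERT line from nagios.log into named fields.

    Hypothetical helper for illustration; not part of Nagios.
    """
    stamp, rest = line.split("] ", 1)
    epoch = int(stamp.lstrip("["))
    label, payload = rest.split(": ", 1)
    if label != "SERVICE ALERT":
        raise ValueError("not a SERVICE ALERT line: " + line)
    # The plugin output is the last field, so limit the split to 5 cuts.
    host, service, state, state_type, attempt, output = payload.split(";", 5)
    when = datetime.fromtimestamp(epoch, LOCAL_TZ)
    return {
        "when": when.strftime("%m-%d %H:%M:%S"),
        "host": host,
        "service": service,
        "state": state,
        "state_type": state_type,  # SOFT or HARD
        "attempt": int(attempt),
        "output": output,
    }


first = parse_service_alert(
    "[1055921331] SERVICE ALERT: ad;PING;WARNING;SOFT;1;"
    "PING WARNING - Packet loss = 20%, RTA = 20.42 ms"
)
last = parse_service_alert(
    "[1055952782] SERVICE ALERT: ad;PING;OK;SOFT;2;"
    "PING OK - Packet loss = 0%, RTA = 1.88 ms"
)
print(first["when"], first["state"], first["state_type"])  # 06-18 02:28:51 WARNING SOFT
print(last["when"], last["state"], last["state_type"])     # 06-18 11:13:02 OK SOFT
```

Run over the six lines above, this reports SOFT as the state type for every
entry.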

The machine is clearly pinging fine at the moment:

    % ~nagios/libexec/check_ping -H ad.northwestern.edu -w 100,5% -c 200,10% -p 1
    PING OK - Packet loss = 0%, RTA = 0.62 ms
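
To spell out what the -w and -c arguments mean: each is a pair of round-trip
average in milliseconds and packet loss in percent.  The sketch below shows
how a result would be classified against such a pair.  The function names
are hypothetical, and I am assuming that a value which reaches a threshold
triggers it (inclusive comparison).  The defaults are the thresholds from
the command line above, which may differ from those in the service
definition Nagios itself uses.

```python
def parse_threshold(text):
    """Parse a check_ping style threshold such as '100,5%' into (rta_ms, loss_pct)."""
    rta, loss = text.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify(rta_ms, loss_pct, warn="100,5%", crit="200,10%"):
    """Return OK, WARNING or CRITICAL for one ping result.

    Assumption: thresholds are inclusive, and CRITICAL is checked first.
    """
    warn_rta, warn_loss = parse_threshold(warn)
    crit_rta, crit_loss = parse_threshold(crit)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


print(classify(0.62, 0.0))  # OK
```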

Is this an example of a flapping host?  If so, shouldn't something show up
on the service problems or host problems pages?

One note of caution:  I'm testing this out on my laptop, which goes back and
forth between work and home.  When it is running at home, firewall and
router access control lists prevent Nagios from performing some checks.  I
don't think the ping test on this particular host is among that bunch, but I
can't be sure because I'm at work at the moment.

Is there a virtual two-by-four I can use to whack Nagios in the head?  I
executed the "Schedule an immediate check" command from the web interface a
couple of times, but it appears to have had no effect.

perplexed-ly y'rs,

-- 
Skip Montanaro
skip at pobox.com
Got spam? http://spambayes.sf.net/

