Gaps in availability report (ie the end of a down != the start of the next up)

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Wed Feb 26 04:45:38 CET 2003


Dear Ladies and Gentlemen,

I am writing to report a perplexing observation in the avail.cgi report
from Nagios-1.0, for both a host and a service on the host.

The problem is that the report presents alternating down and up
intervals with the start and end times of each interval. To my surprise,
however, the start of the up interval after a down is not the same as
the end of the down.

Formally
     Start   End
Down d1      u1
Up   u2      d2 ... with u1 != u2.


For example,

tsitc> lynx 'http://<nagios>/cgi-bin/avail.cgi?t1=1046143534&t2=1046229934&show_log_entries=&host=hpa-bne&service=ISDN+access+to+HPA&assumeinitialstates=yes&assumestateretention=yes&initialassumedstate=6&backtrack=0&timeperiod=thismonth'


Event Start Time     Event End Time       Event Duration   Event/State Type  Event/State Information
02-01-2003 00:00:00  02-03-2003 15:18:57  2d 15h 18m 57s   SERVICE OK        First Service State Assumed (Faked Log Entry)
02-20-2003 20:42:08  02-21-2003 10:49:46  0d 14h 7m 38s    SERVICE CRITICAL  CRITICAL - Plugin timed out after 15 seconds
02-21-2003 10:52:56  02-21-2003 12:06:15  0d 1h 13m 19s    SERVICE OK        PING ok - Packet loss = 0%, RTA = 76.12 ms
02-21-2003 19:18:06  02-22-2003 18:02:13  0d 22h 44m 7s    SERVICE CRITICAL  CRITICAL - Plugin timed out after 15 seconds
02-24-2003 12:29:39  02-24-2003 15:09:47  0d 2h 40m 8s     SERVICE OK        PING ok - Packet loss = 0%, RTA = 436.29 ms
02-25-2003 12:46:41  02-25-2003 13:02:33  0d 0h 15m 52s    SERVICE CRITICAL  CRITICAL - Plugin timed out after 15 seconds
02-25-2003 17:14:54  02-26-2003 14:30:04  0d 21h 15m 10s+  SERVICE OK        PING ok - Packet loss = 20%, RTA = 110.98 ms

(This is the service availability report but the host availability is 
the same)
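
To make the gaps explicit, here is a minimal Python sketch that compares the end of each reported interval with the start of the next. The rows are copied from the report above; the names ROWS and find_gaps are only illustrative and have nothing to do with how avail.cgi works internally.

```python
from datetime import datetime

FMT = "%m-%d-%Y %H:%M:%S"  # date format used in the report above

# (start, end, state) rows copied from the availability report
ROWS = [
    ("02-01-2003 00:00:00", "02-03-2003 15:18:57", "OK"),
    ("02-20-2003 20:42:08", "02-21-2003 10:49:46", "CRITICAL"),
    ("02-21-2003 10:52:56", "02-21-2003 12:06:15", "OK"),
    ("02-21-2003 19:18:06", "02-22-2003 18:02:13", "CRITICAL"),
    ("02-24-2003 12:29:39", "02-24-2003 15:09:47", "OK"),
    ("02-25-2003 12:46:41", "02-25-2003 13:02:33", "CRITICAL"),
    ("02-25-2003 17:14:54", "02-26-2003 14:30:04", "OK"),
]


def find_gaps(rows):
    """Return (previous_end, next_start, gap) for every non-contiguous pair."""
    gaps = []
    for (_, prev_end, _), (next_start, _, _) in zip(rows, rows[1:]):
        end = datetime.strptime(prev_end, FMT)
        start = datetime.strptime(next_start, FMT)
        if start != end:
            gaps.append((end, start, start - end))
    return gaps


gaps = find_gaps(ROWS)
for end, start, gap in gaps:
    print(f"{end} -> {start}: {gap} unaccounted for")
```

Every consecutive pair of rows in this report shows up as a gap, including the one between 13:02:33 and 17:14:54 on 25 February.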

Nagios was running the whole time the service was being monitored, and
the service (check_ping) does not return UNKNOWNs.

Here is the log extract for the intervals above,

tsitc> tail -2000 nagios.log | grep hpa-bne | grep HARD | ./ns_log_localtime

Tue Feb 25 12:46:39 HOST ALERT: hpa-bne;DOWN;HARD;10;CRITICAL - Plugin timed out after 10 seconds

Tue Feb 25 12:46:41 SERVICE ALERT: hpa-bne;ISDN access to HPA;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 15 seconds

Tue Feb 25 17:14:53 HOST ALERT: hpa-bne;UP;HARD;1;PING ok - Packet loss = 0%, RTA = 75.09 ms

Tue Feb 25 17:14:54 SERVICE ALERT: hpa-bne;ISDN access to HPA;OK;HARD;1;PING ok - Packet loss = 20%, RTA = 110.98 ms
tsitc> 

tsitc> tail -2000 nagios.log | grep PROGRAM | ./ns_log_localtime
tsitc> tail -2000 nagios.log | grep UNKNOWN | grep hpa-bne | ./ns_log_localtime
tsitc>

I expected the SERVICE CRITICAL end time in the report to be
17:14 (the start of the SERVICE OK interval) instead of 13:02.
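
That expectation amounts to treating each HARD state as lasting until the next HARD state change. A minimal Python sketch of that reading, using the two SERVICE ALERT lines from the log extract above, follows. It assumes the line layout shown there (the output of my ns_log_localtime filter, which carries no year, so 2003 is supplied as a default); the function names are hypothetical, and this is not a description of what avail.cgi actually does.

```python
import re
from datetime import datetime

# The two HARD service alerts from the log extract, one per line
LOG = """\
Tue Feb 25 12:46:41 SERVICE ALERT: hpa-bne;ISDN access to HPA;CRITICAL;HARD;1;CRITICAL - Plugin timed out after 15 seconds
Tue Feb 25 17:14:54 SERVICE ALERT: hpa-bne;ISDN access to HPA;OK;HARD;1;PING ok - Packet loss = 20%, RTA = 110.98 ms
"""

# weekday, then "Mon DD HH:MM:SS", then host;service;state;HARD;...
ALERT = re.compile(
    r"^\w{3} (\w{3} +\d+ [\d:]{8}) SERVICE ALERT: ([^;]+);([^;]+);(\w+);HARD;"
)


def hard_transitions(log_text, year=2003):
    """Extract (timestamp, state) pairs for HARD service alerts.

    The year is an assumption: the log lines shown do not include one.
    """
    events = []
    for line in log_text.splitlines():
        m = ALERT.match(line)
        if m:
            stamp, _host, _service, state = m.groups()
            when = datetime.strptime(f"{stamp} {year}", "%b %d %H:%M:%S %Y")
            events.append((when, state))
    return events


def contiguous_intervals(events):
    """Each state runs from its own alert until the next alert."""
    return [
        (start, end, state)
        for (start, state), (end, _next_state) in zip(events, events[1:])
    ]


intervals = contiguous_intervals(hard_transitions(LOG))
for start, end, state in intervals:
    print(f"{state}: {start} -> {end} ({end - start})")
```

Read this way, the CRITICAL interval runs from 12:46:41 to 17:14:54, with no gap before the following OK interval.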

The check_interval is 10 minutes; the log rotation period is monthly.

The Nagios log can be posted if necessary.

I haven't done any research apart from searching the Nagios FAQs for
'availability'.

This isn't too much of a problem, as I usually determine down and up
times from the log. It is surprising and unsettling, though, to see that
it can't be done from the availability CGI; it suggests that the
availability computation doesn't allow for the gap intervals.

Please let me know if this is a STFW.

Yours sincerely,


 -- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.






