escalation bug?

Adam R. Forsyth forsytad at luther.edu
Tue May 11 05:19:58 CEST 2004


I think I may have found a bug in Nagios escalations.  We're currently
using Nagios 1.0.

Tonight we have a service that has been flip-flopping between yellow and red.
That's the way picking a set tolerance goes sometimes, and I understand
that.  What I don't get is why paging for this service occurred the way it
did.

Correlating the state changes, the notifications, and the acknowledgements,
I get a story that goes something like this:

--State goes to Critical
--Primary gets paged
--Primary acknowledges

--State goes to Warning
--State goes to Critical
--Primary gets paged
--Primary acknowledges

--State goes to Warning
--State goes to Critical
--Primary gets paged
--State goes to Warning
--State goes to Critical
--Escalation gets paged

Now, our paging rules say that Nagios should page the primary 3 times, and
then escalate.  I guess that's what it did, but what I don't understand
is why the state change between Warning and Critical seems to clear the
acknowledgement, but does not restart the primary escalation cycle.

Based on how it behaves, I'm guessing it takes going back to a Green state
before the paging cycle would start back over at the first page to the
primary.  If that's the case, though, shouldn't it also take going back to
a Green state to remove the acknowledged state?
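
As a rough illustration, the behavior described above can be modeled in a
few lines of Python.  This is only a sketch of what the logs appear to show,
not how Nagios is actually implemented.  The class and function names are
made up for the example, and it assumes that contacts are paged only on
Critical, that any state change clears the acknowledgement, and that only a
return to OK (Green) resets the page count.

```python
from dataclasses import dataclass


@dataclass
class ServiceModel:
    """Toy model of the observed behavior; NOT Nagios source code."""
    escalate_after: int = 3       # pages to the primary before escalating
    notification_number: int = 0  # pages sent since the last OK state
    acknowledged: bool = False

    def change_state(self, new_state):
        """Apply a hard state change; return the contact paged, or None."""
        self.acknowledged = False            # any state change clears the ack
        if new_state == "OK":
            self.notification_number = 0     # only recovery restarts the cycle
            return None
        if new_state != "CRITICAL":
            return None                      # assumption: pages go out on CRITICAL only
        self.notification_number += 1
        if self.notification_number <= self.escalate_after:
            return "primary"
        return "escalation"

    def acknowledge(self):
        self.acknowledged = True


def replay(events):
    """Feed a list of state names and "ACK" markers through the model."""
    svc = ServiceModel()
    paged = []
    for event in events:
        if event == "ACK":
            svc.acknowledge()
        else:
            contact = svc.change_state(event)
            if contact:
                paged.append(contact)
    return paged


story = ["CRITICAL", "ACK", "WARNING", "CRITICAL", "ACK",
         "WARNING", "CRITICAL", "WARNING", "CRITICAL"]
print(replay(story))  # ['primary', 'primary', 'primary', 'escalation']
```

Under these assumptions the fourth Critical goes to the escalation contact
even though the primary acknowledged twice, which matches the story above.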

As it is, it was an unfortunate and annoying series of events that kept
paging the primary, but I don't see any benefit in having the escalation
notified when the primary was attempting to acknowledge that he was
watching this situation.

So, is this a bug, or a feature that I'm not understanding the benefit of?

Thanks for any insight that anyone can provide.


From event log:
Total Processes
CRITICAL
05-10-2004 21:24:46
monesc
notify-by-epager
CRITICAL - 417 processes running

students
Total Processes
CRITICAL
05-10-2004 21:24:44
moncall
notify-by-epager
CRITICAL - 417 processes running

students
Total Processes
CRITICAL
05-10-2004 21:04:43
monesc
notify-by-epager
CRITICAL - 426 processes running

students
Total Processes
CRITICAL
05-10-2004 20:49:44
monesc
notify-by-epager
CRITICAL - 427 processes running

students
Total Processes
CRITICAL
05-10-2004 20:39:43
moncall
notify-by-epager
CRITICAL - 416 processes running

students
Total Processes
ACKNOWLEDGEMENT (CRITICAL)
05-10-2004 20:31:33
moncall
notify-by-epager
Aknowledged -thw

students
Total Processes
CRITICAL
05-10-2004 20:29:44
moncall
notify-by-epager
CRITICAL - 424 processes running

students
Total Processes
ACKNOWLEDGEMENT (CRITICAL)
05-10-2004 20:16:18
moncall
notify-by-epager
Called Chris - Said to acknowledge and let it go... THW

students
Total Processes
CRITICAL
05-10-2004 20:09:43
moncall
notify-by-epager
CRITICAL - 432 processes running

From the service alert history for this service:

[05-10-2004 21:29:43] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 415 processes running
[05-10-2004 21:24:44] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 417 processes running
[05-10-2004 21:19:43] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 410 processes running
[05-10-2004 21:04:43] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 426 processes running
[05-10-2004 20:59:44] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 415 processes running
[05-10-2004 20:49:44] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 427 processes running
[05-10-2004 20:44:43] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 414 processes running
[05-10-2004 20:39:43] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 416 processes running
[05-10-2004 20:34:43] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 414 processes running
[05-10-2004 20:29:44] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 424 processes running
[05-10-2004 20:24:43] SERVICE ALERT: students;Total Processes;WARNING;HARD;3;WARNING - 406 processes running
[05-10-2004 20:09:43] SERVICE ALERT: students;Total Processes;CRITICAL;HARD;3;CRITICAL - 432 processes running
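
For anyone who wants to check the timing, here is a small Python sketch
that parses alert lines like the ones above and reports how long the
service stayed in each state.  It assumes each entry sits on a single line
in the form "[MM-DD-YYYY HH:MM:SS] SERVICE ALERT: host;service;state;type;
attempt;output", which is inferred from this log rather than taken from any
Nagios specification.  The function names are hypothetical.

```python
import re
from datetime import datetime

# Field layout inferred from the log lines in this message (an assumption).
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(line):
    """Return a dict for one alert line, or None if it does not match."""
    m = ALERT_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%m-%d-%Y %H:%M:%S")
    rec["attempt"] = int(rec["attempt"])
    return rec


def minutes_in_state(lines):
    """Return (state, minutes until the next change) pairs, oldest first."""
    records = sorted(
        (r for r in map(parse_alert, lines) if r), key=lambda r: r["ts"]
    )
    return [
        (cur["state"], (nxt["ts"] - cur["ts"]).total_seconds() / 60)
        for cur, nxt in zip(records, records[1:])
    ]


sample = [
    "[05-10-2004 20:24:43] SERVICE ALERT: students;Total Processes;"
    "WARNING;HARD;3;WARNING - 406 processes running",
    "[05-10-2004 20:09:43] SERVICE ALERT: students;Total Processes;"
    "CRITICAL;HARD;3;CRITICAL - 432 processes running",
]
for state, minutes in minutes_in_state(sample):
    print(state, minutes)  # CRITICAL 15.0
```

The last entry in the log has no following change, so it is not reported.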

-- 
Adam Forsyth
Senior Systems Administrator
Luther College

