RFC: Downtime and flapping

Jochen Bern Jochen.Bern at LINworks.de
Fri Feb 4 11:30:46 CET 2011


On 02/03/2011 11:59 PM, Andreas Ericsson wrote:
> On 02/03/2011 07:53 PM, Ton Voon wrote:
>> From the code, I can see that Nagios does not record any soft
>> non-OK states in this state history. Any objections if I add "host
>> or service in downtime" to that exception?
> None at all. In fact, +1 on doing so. This way, downtime makes all
> effects of statechanges void and null

Umh, not quite, I'm afraid. It means that hosts/services will emerge
from downtime with the history they had when they entered downtime
way-back-when - which may well be the non-OK or FLAPPING state that
prompted you to schedule urgent repairs in the first place.

IIUC, it also means that during the downtime, the CGI-bins will keep
displaying the *historic* flapping state alongside the *current*
host/service state.
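
To illustrate the concern, here is a minimal sketch of a state history
that ignores results while in downtime. It is NOT the Nagios code: the
class name, the field names and the unweighted percentage are made up
for illustration, and the history length of 21 is an assumption (the
real flap detection also weights newer changes more heavily, AFAIK).

```python
from collections import deque

HISTORY_LEN = 21  # assumed size of the flap-detection history


class CheckedObject:
    """Hypothetical stand-in for a host or service with a state history."""

    def __init__(self):
        self.history = deque(maxlen=HISTORY_LEN)
        self.in_downtime = False

    def record(self, state, hard=True):
        # Proposed exception: skip soft non-OK states, and skip
        # everything that is seen while the object is in downtime.
        if self.in_downtime or (state != "OK" and not hard):
            return
        self.history.append(state)

    def percent_state_change(self):
        # Simplified, unweighted share of transitions in the history.
        states = list(self.history)
        if len(states) < 2:
            return 0.0
        changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
        return 100.0 * changes / (len(states) - 1)


svc = CheckedObject()
for state in ["OK", "CRITICAL"] * 5:   # flapping before the downtime
    svc.record(state)
before = svc.percent_state_change()

svc.in_downtime = True
for _ in range(30):                    # repaired, steadily OK in downtime
    svc.record("OK")
after = svc.percent_state_change()
```

In this sketch `after` equals `before`: the object leaves downtime with
exactly the flapping history it had when it entered, no matter how many
clean results came in meanwhile.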

Downtime disables notifications anyway, and there already is logic to
trigger actions when downtime ends (*). IMHO, the proper way to provide
a clean slate after a downtime would be to flush (**) the entire history
at that point.

(*) Notification option "s" (scheduled downtime starts/ends) - BTW,
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html#contact
lists "s" for the service notification options in the Definition Format
but not in the Directive Descriptions.

(**) Whether the bins should be reset to OK, PENDING,
last-before-downtime or the current post-downtime $*STATE$ (if one is
already available) is up for discussion ...
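
A rough sketch of the flush-at-end-of-downtime idea, with the four
candidate reset values as selectable policies. Everything here is
hypothetical: the function name, the policy names, the history length
of 21, and the fallback to PENDING when no post-downtime state is
available yet are assumptions for illustration, not existing Nagios
behaviour.

```python
from collections import deque

HISTORY_LEN = 21  # assumed size of the flap-detection history


def flush_history(policy, last_before_downtime=None, current_state=None):
    """Return a fresh, uniform history to install when downtime ends."""
    if policy == "ok":
        fill = "OK"
    elif policy == "pending":
        fill = "PENDING"
    elif policy == "last_before_downtime":
        if last_before_downtime is None:
            raise ValueError("no pre-downtime state was saved")
        fill = last_before_downtime
    elif policy == "current":
        # Assumption: fall back to PENDING if no check result has
        # arrived since the downtime ended.
        fill = current_state if current_state is not None else "PENDING"
    else:
        raise ValueError("unknown policy: %r" % (policy,))
    # Every slot holds the same value, so the history contains no
    # state changes at all.
    return deque([fill] * HISTORY_LEN, maxlen=HISTORY_LEN)


history = flush_history("current", current_state="CRITICAL")
```

Whichever value is chosen, a uniform history has zero transitions, so
the object would come out of downtime as non-flapping and would have to
earn a new FLAPPING state from post-downtime results only.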

Regards,
								J. Bern
-- 
Jochen Bern, Systemingenieur --- LINworks GmbH <http://www.LINworks.de/>
Postfach 100121, 64201 Darmstadt | Robert-Koch-Str. 9, 64331 Weiterstadt
PGP (1024D/4096g) FP = D18B 41B1 16C0 11BA 7F8C DCF7 E1D5 FAF4 444E 1C27
Tel. +49 6151 9067-231, Zentr. -0, Fax -299 - Amtsg. Darmstadt HRB 85202
Unternehmenssitz Weiterstadt, Geschäftsführer Metin Dogan, Oliver Michel




