More on notifications and reboot monitoring

Ethan Galstad nagios at nagios.org
Mon Jan 10 07:36:54 CET 2005


On 7 Jan 2005 at 14:37, Andreas Ericsson wrote:

> Carson Gaspar wrote:
> > [ Resending from correct From address ]
> > 
> > --On Thursday, January 06, 2005 2:51 PM +0100 Andreas Ericsson
> > <ae at op5.se> wrote:
> > 
> >> Ehrm. The idea of scheduled downtime is to do this sort of thing.
> >> If you want to add a script submitting a 5 minute (or something)
> >> downtime whenever you run reboot, then by all means feel free. If
> >> you make it clean I'm sure lots of other users would be interested.
> >> I don't think it's a very good idea to keep that logic in the
> >> Nagios daemon though, as it can never possibly guess if a host has
> >> been shut down or crashed, so I don't quite see the point of this
> >> email. Care to clarify?
> > 
> > 
> > I'll try again (3rd time lucky? ;-) ).
> > 
> > We need:
> > 
> > - Alarms when machines reboot unexpectedly
> > - Alarms when machines fail to come back after a reboot
> > - No alarms during normal scheduled reboots
> > 
> > Scheduled downtime is great, except for one thing - if any alarms
> > are received during scheduled downtime, no notifications go out.
> > Ever.
> 
> This is a bug or a missing feature. It will be fixed.
> 
> > Even after downtime has ended. This is a result of the design
> > decision to only see if notifications are required when receiving
> > a new check result.
> 
> Nagios handles the host when it comes out of scheduled downtime, so
> there's no real reason it shouldn't check what the status was prior to
> downtime and match against current upon a host exiting. It's a minor
> change, and shouldn't be too hard to add.

This isn't really a bug that you want to fix, as fixing it would cause
a lot of not-so-great side effects.  When you schedule downtime for a
host, anything that happens during that time is fair game and is
ignored for purposes of notification (that's why it's in downtime).
When downtime ends, Nagios will not notify about a problem that
happened during that downtime - that's what downtime was for.  If the
problem continues after downtime (i.e. an active check returns a
problem), then a notification can occur.
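
As a rough illustration of the rule described above, here is a toy
model in Python.  It is not taken from the Nagios source; the names
CheckResult, in_downtime and should_notify are hypothetical, and the
model ignores soft/hard states, notification options and everything
else the real daemon considers.  It only captures the idea that a
notification decision is made when a check result arrives, and that
results arriving inside the downtime window are ignored.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """A simplified check result (hypothetical type, for illustration)."""
    time: int          # Unix timestamp at which the result was processed
    is_problem: bool   # True for any non-OK / non-UP state


def in_downtime(t: int, downtime_start: int, downtime_end: int) -> bool:
    """Return True if timestamp t falls inside the downtime window."""
    return downtime_start <= t < downtime_end


def should_notify(result: CheckResult,
                  downtime_start: int,
                  downtime_end: int) -> bool:
    """Toy rule: only problem results processed outside downtime notify."""
    if not result.is_problem:
        return False
    return not in_downtime(result.time, downtime_start, downtime_end)


# Downtime scheduled from t=1000 to t=1300 (assumed values).
during = CheckResult(time=1100, is_problem=True)   # ignored
after = CheckResult(time=1400, is_problem=True)    # problem persists
print(should_notify(during, 1000, 1300))
print(should_notify(after, 1000, 1300))
```

In this model nothing happens at the moment downtime ends; a
notification only becomes possible once a later check result still
reports a problem.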

> 
> > As the only "pull" monitor in my environment is Ping, it's the only
> > thing I can safely schedule downtime against (ignoring freshness
> > checks for now). This is only really an issue when trying to get a
> > "failed to reboot" alarm. I finally gave up, and just have the Ping
> > service alarm if a reboot fails (as opposed to a more specific
> > alarm).
> > 
> > If you re-read my previous message, the only logic on the central
> > nagios server is some basic dependency logic to prevent false alarms
> > - all the work is done on the client in an init script (which
> > submits passive check results and schedules downtime via an in-house
> > queueing agent to Nagios' named pipe). It does work, I was just
> > asking for opinions about it (as it seems a bit complex for my
> > tastes).
> > 
> 
> It was unclear to me that you were simply asking the opinion, which is
> why I responded the way I did. As for my opinion; Whatever works.
> 
> > And yes, I fully understand freshness checks - they're wonderful for
> > continuously monitored services, but don't really work for reboots
> > (unless you have your agent constantly send "Reboot OK" status msgs
> > while the machine is up), as they are hopefully rare events ;-)
> > 
> 
> Why not simply set a higher max_check_attempts or retry_interval for
> the ping services? That way you'll get soft down when the machine is
> actually down, but no alerts will go out.
> 

I would use active checks as Andreas suggested for checking host 
availability.  Passive-only checks might be troublesome to implement 
reliably.
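
To get a feel for the max_check_attempts / retry interval suggestion,
the sketch below estimates how long a host can stay down before a
hard state (and thus a notification) is reached.  It is only
back-of-the-envelope arithmetic under stated assumptions: the first
failed check counts as attempt 1, rechecks are spaced exactly one
retry interval apart, and check latency and execution time are
ignored.  The function names are hypothetical.

```python
def minutes_until_hard_state(max_check_attempts: int,
                             retry_interval_minutes: float) -> float:
    """Minutes between the first failed check and the hard state.

    Assumes attempt 1 is the first failure and each further attempt
    follows one retry interval later.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval_minutes


def covers_reboot(max_check_attempts: int,
                  retry_interval_minutes: float,
                  reboot_minutes: float) -> bool:
    """True if a reboot of the given length ends before the final retry."""
    window = minutes_until_hard_state(max_check_attempts,
                                      retry_interval_minutes)
    return window > reboot_minutes


# Example values (assumptions, not recommendations): 10 attempts with a
# 1-minute retry interval give a 9-minute soft-state window.
print(minutes_until_hard_state(10, 1))
print(covers_reboot(10, 1, reboot_minutes=5))
print(covers_reboot(3, 1, reboot_minutes=5))
```

The trade-off is that a genuinely dead host is also reported later by
the same amount of time.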


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
