More on notifications and reboot monitoring

Carson Gaspar carson+nagiosusers at taltos.org
Thu Jan 6 23:09:40 CET 2005


[ Resending from correct From address ]

--On Thursday, January 06, 2005 2:51 PM +0100 Andreas Ericsson <ae at op5.se> 
wrote:

> Ehrm. The idea of scheduled downtime is to do this sort of thing. If you
> want to add a script submitting a 5 minute (or something) downtime
> whenever you run reboot, then by all means feel free. If you make it
> clean I'm sure lots of other users would be interested. I don't think
> it's a very good idea to keep that logic in the Nagios daemon though, as
> it can never possibly guess if a host has been shut down or crashed, so I
> don't quite see the point of this email. Care to clarify?

I'll try again (3rd time lucky? ;-) ).

We need:

- Alarms when machines reboot unexpectedly
- Alarms when machines fail to come back after a reboot
- No alarms during normal scheduled reboots

Scheduled downtime is great, except for one thing - if any alarms are
received during scheduled downtime, no notifications go out. Ever. Even
after downtime has ended. This is a result of the design decision to check
whether notifications are required only when a new check result arrives. As
the only "pull" monitor in my environment is Ping, it's the only thing I can
safely schedule downtime against (ignoring freshness checks for now). This
is only really an issue when trying to get a "failed to reboot" alarm. I
finally gave up, and just have the Ping service alarm if a reboot fails (as
opposed to a more specific alarm).

If you re-read my previous message, the only logic on the central Nagios
server is some basic dependency logic to prevent false alarms - all the
work is done on the client in an init script (which submits passive check
results and schedules downtime via an in-house queueing agent to Nagios'
named pipe). It does work; I was just asking for opinions about it (as it
seems a bit complex for my tastes).
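
For readers unfamiliar with the mechanism, here is a rough sketch of what
writing to the named pipe involves. It is not the init script or the
queueing agent described above; it writes to the pipe directly, and the
function names, host name, service name, author and pipe path are all made
up for illustration. PROCESS_SERVICE_CHECK_RESULT and SCHEDULE_HOST_DOWNTIME
are Nagios external commands, but the downtime field layout shown (with a
trigger_id field) is assumed from later Nagios releases and may differ in
older versions, so check the external command documentation for yours.

```python
import time


def passive_service_result(host, service, return_code, output, now=None):
    """Format a passive service check result as an external command line."""
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


def schedule_host_downtime(host, duration_s, author, comment, now=None):
    """Format a fixed host downtime starting now and lasting duration_s.

    Assumed field order: host;start;end;fixed;trigger_id;duration;author;comment
    """
    now = int(time.time()) if now is None else int(now)
    end = now + duration_s
    # fixed=1 (starts and ends at the given times), trigger_id=0 (no trigger)
    return (f"[{now}] SCHEDULE_HOST_DOWNTIME;"
            f"{host};{now};{end};1;0;{duration_s};{author};{comment}\n")


def submit(command_line, pipe_path):
    """Write one command line to the command pipe.

    pipe_path is whatever command_file points to in nagios.cfg. Opening a
    FIFO for writing blocks until Nagios has it open for reading.
    """
    with open(pipe_path, "w") as pipe:
        pipe.write(command_line)
```

A shutdown hook could call schedule_host_downtime("web01", 300, "init",
"scheduled reboot") before going down, and the startup side could send
passive_service_result("web01", "Reboot", 0, "Reboot OK") once the machine
is back, passing each line to submit().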

And yes, I fully understand freshness checks - they're wonderful for
continuously monitored services, but don't really work for reboots (unless
you have your agent constantly send "Reboot OK" status messages while the
machine is up), as reboots are hopefully rare events ;-)

-- 
Carson


