decreasing sensitivity of host (down) checks?

Frost, Mark {PBG} mark.frost1 at pepsi.com
Tue May 15 21:31:30 CEST 2007


I had a feeling someone was going to say "scheduled outage" :-).

We already have scheduled outages for these boxes for the times when we
can schedule them.  We're trying to cover the other 2 cases -- 1) where a
host is rebooted to resolve a problem and 2) where calculating the day
of the outage is too difficult.  For #2, I mean something like "the last
Sunday of a quarter at 2am".  Nagios gives me no ability to calculate a
date.  It's either a specific day of the week, or it isn't.
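
For illustration only, a date like "the last Sunday of a quarter at 2am"
can be computed outside Nagios and then fed to whatever schedules the
downtime.  This is a minimal sketch, not something Nagios provides; the
function name last_sunday_of_quarter is hypothetical.

```python
from datetime import date, datetime, time, timedelta


def last_sunday_of_quarter(year: int, quarter: int) -> datetime:
    """Return 02:00 on the last Sunday of the given quarter (1-4)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")

    # Find the first day of the month that follows the quarter,
    # then step back one day to get the quarter's last day.
    end_month = quarter * 3
    if end_month == 12:
        next_month_start = date(year + 1, 1, 1)
    else:
        next_month_start = date(year, end_month + 1, 1)
    last_day = next_month_start - timedelta(days=1)

    # date.weekday(): Monday == 0 ... Sunday == 6.
    # Walk back from the last day to the nearest Sunday (0 to 6 days).
    days_back = (last_day.weekday() - 6) % 7
    sunday = last_day - timedelta(days=days_back)
    return datetime.combine(sunday, time(2, 0))


if __name__ == "__main__":
    for q in (1, 2, 3, 4):
        print(q, last_sunday_of_quarter(2007, q))
```

A cron job could run something like this daily and only submit a
downtime when the computed date is close.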

Beyond that, we're generally talking about too many hosts to have these
teams go into the interface and say "it's going to be down this Friday".
On top of that, the team that does the reboots is another team entirely
and doesn't use our Nagios system.  Having Nagios just ignore host
downtime (for certain hosts) that lasts less than a certain duration was
considered a very attractive option.

We have had discussions about which box outages the teams want to know
about and which they don't.  The ones they don't want to know about are
a subset of the whole (hundreds of boxes).  And of course, they're
Windows boxes, not something more stable and reliable :-)

I hadn't thought of the escalation option.  I'll check that out.

Thanks, Jim.

Mark

-----Original Message-----
From: nagios-users-bounces at lists.sourceforge.net
[mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Jim
Avery
Sent: Tuesday, May 15, 2007 2:57 PM
To: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] decreasing sensitivity of host (down)
checks?

On 15/05/07, Frost, Mark {PBG} <mark.frost1 at pepsi.com> wrote:
> > Recently, some of the teams who get the alerts have asked if they
> > could not get host UP/DOWN alerts if the boxes are down for less than
> > 10 minutes.  (These are Windows boxes being rebooted).  They've
> > indicated that they don't care about a box being rebooted, but they
> > would care if the box went down and stayed down for longer than 10
> > minutes.

I would say that if the host is being rebooted, that should be a
scheduled outage.  You should submit a scheduled outage in Nagios for
the period around the time the reboot is going to happen; during this
time the notifications will not be sent.  There are plenty of scripts
available on nagiosexchange.org which let you easily automate this
from cron for outages which are scheduled for the same time each
day/week/whatever.
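
As a rough sketch of what such a cron-driven script might do, the
Python below builds a SCHEDULE_HOST_DOWNTIME external command and
writes it to the Nagios command file.  The host name, author, comment
and the command file path are assumptions for illustration; the path in
particular depends on how Nagios was installed.

```python
import time


def downtime_command(host, start, duration_s, author, comment, now=None):
    """Build one SCHEDULE_HOST_DOWNTIME line for the external command file.

    Field order: host;start;end;fixed;trigger_id;duration;author;comment.
    fixed=1 means the downtime runs exactly from start to end;
    trigger_id=0 means it is not triggered by another downtime.
    """
    now = int(time.time()) if now is None else int(now)
    end = start + duration_s
    return (
        f"[{now}] SCHEDULE_HOST_DOWNTIME;{host};{start};{end};"
        f"1;0;{duration_s};{author};{comment}\n"
    )


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command to the Nagios command pipe (path is an assumption)."""
    with open(command_file, "w") as fh:
        fh.write(command_line)


if __name__ == "__main__":
    start = int(time.time()) + 300  # five minutes from now
    # "winbox01" is a hypothetical host name.
    print(downtime_command("winbox01", start, 900, "cron", "weekly reboot"),
          end="")
```

Calling submit() with the returned line would hand it to Nagios,
provided external commands are enabled and the script can write to the
command file.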

I'm not 100% familiar with the logic of when host checks are done, but
like you I've tried increasing the number of retries for host checks
to do the same thing, and it hasn't worked for the same reason.  If you
really don't want to use scheduled outages, one way (which might not
be the best way but would work) would be to use escalation.  You could
have the initial notification go to /dev/null and only send the
escalation some minutes later to the real email or pager address.
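
To make the timing of that idea concrete, here is a small sketch of the
arithmetic.  It assumes Nagios re-sends the problem notification every
notification_interval minutes while the host stays down, so
notification number n goes out roughly (n - 1) * notification_interval
minutes after the first one.  The function name is hypothetical, and
the result is only a starting point for the first_notification value of
a host escalation.

```python
import math


def first_escalated_notification(delay_min, interval_min):
    """Return the notification number that first falls at or after delay_min.

    Notification 1 is sent at t = 0 (when the host is found down),
    notification 2 at t = interval_min, and so on.  Sending only from the
    returned number onward to the real contacts means they hear nothing
    for at least delay_min minutes.
    """
    if delay_min < 0 or interval_min <= 0:
        raise ValueError("delay must be >= 0 and interval must be > 0")
    return math.ceil(delay_min / interval_min) + 1


if __name__ == "__main__":
    # A 10-minute grace period with a 5-minute notification interval.
    print(first_escalated_notification(10, 5))
```

With a 5-minute interval and a 10-minute grace period, notifications 1
and 2 would go to the throwaway contact and the escalation to the real
contacts would start at notification 3.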

I take the view that if a server is being rebooted and it isn't
scheduled I want to know about it.  If someone runs a server in such a
way that it is being rebooted willy-nilly, they don't deserve to have
out of hours support for it!   In that case I don't configure it to
forward any notifications to our on-call team.

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null
