decreasing sensitivity of host (down) checks?

Frost, Mark {PBG} mark.frost1 at pepsi.com
Tue May 15 19:10:08 CEST 2007


> I feel like this is a dumb question, but I've got to ask it anyway :-)
> 
> Our host checks are done on demand by Nagios, which I guess is the
> common approach and doesn't hit us hard performance-wise.  I use
> check_fping.
> 
> Recently, some of the teams who get the alerts have asked whether they
> could stop getting host UP/DOWN alerts for boxes that are down for less
> than 10 minutes (these are Windows boxes being rebooted).  They've
> indicated that they don't care about a box being rebooted, but they
> would care if a box went down and stayed down for longer than 10
> minutes.
> 
> We already do this kind of thing (setting a minimum threshold at which
> we want to be bothered) for service checks, which are most of what we
> do, but it seems more problematic with host checks.
> 
> For my host checks I have the following defined:
> 
>         notifications_enabled           1
>         event_handler_enabled           1
>         flap_detection_enabled          1
>         process_perf_data               1
>         retain_status_information       1
>         retain_nonstatus_information    1
>         check_command                   check-host-alive
>         check_interval                  0
>         check_freshness                 0
>         max_check_attempts              10
>         notification_interval           0
>         notification_period             24x7
>         notification_options            d,u,r
> 
> So with max_check_attempts set to 10, I can see that Nagios will try
> 10 successive pings of this host before it sends an alert.  Looking
> at the history for downed hosts, it appears to rerun this check 10
> times, about once per second.  Setting check_interval to 0 causes the
> checks to be performed only on demand.
> 
> I could bump up the max_check_attempts to something like 600 (10
> minutes of successive 1-second pings), but I imagine that's not too
> good from a performance perspective either.
> 
> I'm not really sure what I could do here to keep the checks "on
> demand" but still not send an alert unless the host has been down
> for more than 10 minutes.  Am I right that cranking max_check_attempts
> up to 600 would kill my performance?
> 
> Thanks
> 
> Mark
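
A rough back-of-the-envelope model of the trade-off described above may help frame the performance question. It is only a sketch under stated assumptions, not how Nagios itself is implemented: it assumes an on-demand re-check roughly once per second while the host is down (as seen in the history), that every check during the outage fails, and that an alert goes out only once max_check_attempts consecutive failures have been seen. The function name `simulate_outage` is hypothetical and exists only for this illustration.

```python
import math


def simulate_outage(outage_seconds: float, max_check_attempts: int,
                    seconds_per_attempt: float = 1.0) -> tuple[int, bool]:
    """Hypothetical model: return (checks_run, alert_sent) for one outage.

    Assumptions (not taken from Nagios source): a check runs every
    `seconds_per_attempt` seconds while the host is down, every one of
    those checks fails, and an alert is sent only after
    `max_check_attempts` consecutive failures.
    """
    # Checks that start inside the outage window, at t = 0, s, 2s, ...
    # At least one check is needed to notice the host is down at all.
    failures_possible = max(1, math.ceil(outage_seconds / seconds_per_attempt))
    # Retries stop once the attempt limit is reached (or the host comes back).
    checks_run = min(failures_possible, max_check_attempts)
    # Only a full run of failed attempts produces an alert.
    alert_sent = failures_possible >= max_check_attempts
    return checks_run, alert_sent


if __name__ == "__main__":
    # 5-minute reboot, current setting of 10 attempts: alert after ~10 checks.
    print(simulate_outage(300, 10))   # (10, True)
    # 5-minute reboot with 600 attempts: no alert, but ~300 checks run.
    print(simulate_outage(300, 600))  # (300, False)
    # 15-minute outage with 600 attempts: alert after ~600 checks.
    print(simulate_outage(900, 600))  # (600, True)
```

Under these assumptions, raising max_check_attempts to 600 suppresses the reboot alerts but turns each reboot into a few hundred check_fping runs instead of about 10; whether that is a real load problem depends on how many hosts tend to be down at the same time.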

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users