Lots of hosts, only a couple of services?

Andreas Ericsson ae at op5.se
Tue Aug 24 20:08:36 CEST 2004


Demetri Mouratis wrote:
> On Tue, 24 Aug 2004, Jason Byrns wrote:
> 
> 
>>Here's part of the problem: If any device misses a single service check,
>>a host check is immediately triggered.  But sometimes a device can miss
>>a ping even though there is no problem, just a burst of network traffic.
>>
>>So here's my question: how can I improve our Nagios setup?
>>
>>Here are my goals:
>>1) Prevent false positives with max_check_attempts (set to 5)
>>2) Get Nagios to respect max_check_attempts
>>3) Have the Status Map correctly show the situation if any devices are down.
>>
>>Could I...
>>1) Check telnet instead of just pinging these devices?  (And change the
>>host checks back to the regular host_check_alive?)
>>2) Not check services at all, unless necessary, and only do host checks?
>>  (Nagios throws lots of warnings if you do this, and I suppose I'd
>>rather avoid that)

Nagios 2.0 supports regularly scheduled host checks, which Nagios 1.x 
doesn't. If you don't have any services for a host, then Nagios 1.x will 
NEVER perform the host check.

>>3) ...?  (Profit?)
>>
> 
> 
> Jason,
> 
> Sounds like the source of your problem is that the service check and host
> check are both ultimately using PING.  I think you could remedy your
> problem with your suggestion 1 above, check telnet instead of, or better
> yet in addition to, PING.  A successful check of the telnet service
> should cause Nagios to bypass the forced host check.
> 

... but a failed check of the PING service will still trigger the host 
check, which will fail since it's also PING based. Solution: cut the 
PING service entirely, or make ICMP a privileged protocol on the network 
(NOT recommended, although sometimes the right thing to do if you're sure 
no one will kill the network with a sudden ping-storm).

> One other suggestion is to modify the parameters of the check_ping and
> check_host_alive invocations to send and expect say 20 packets.  You might
> also try increasing the warning and critical packet loss parameters.
> These changes would allow you to weather the storm of the network traffic
> burst without sending out premature false alarms.
> 

Or roll a plugin of your own to do the host check. I'm thinking the as 
yet nonexistent check_conn_refused should do the trick if it measures the 
time it takes to get a "connection refused" from any arbitrary port.
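
A rough sketch of what such a plugin might look like, in Python. Nothing 
like this ships with the plugins today, so everything here is an 
assumption for illustration: the function names probe() and classify(), 
the host/port argument order, and the 1s/3s thresholds. The idea is that 
a TCP "connection refused" from a closed port shows the host's IP stack 
is answering, without depending on ICMP. An accepted connection is 
treated the same way, since it proves the same thing.

```python
"""Sketch of a hypothetical check_conn_refused host-check plugin."""
import socket
import sys
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def probe(host, port, timeout=5.0):
    """Attempt a TCP connect and return (answered, seconds_elapsed).

    A refused or an accepted connection both count as an answer.
    Timeouts, unreachable hosts and lookup failures count as no answer.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            answered = True
    except ConnectionRefusedError:
        answered = True
    except OSError:
        answered = False
    return answered, time.monotonic() - start


def classify(answered, elapsed, warn, crit):
    """Map a probe result to a Nagios exit code."""
    if not answered or elapsed >= crit:
        return CRITICAL
    if elapsed >= warn:
        return WARNING
    return OK


def main(argv):
    """Usage (assumed): check_conn_refused <host> <port>"""
    host, port = argv[1], int(argv[2])
    answered, elapsed = probe(host, port)
    # Thresholds are placeholders; tune them for your network.
    status = classify(answered, elapsed, warn=1.0, crit=3.0)
    label = ("OK", "WARNING", "CRITICAL")[status]
    print(f"CONN {label} - answered={answered}, time={elapsed:.3f}s")
    return status
```

To run it as a plugin, end the file with sys.exit(main(sys.argv)) and 
point the host check command at it. Note that a firewall which silently 
drops packets to the chosen port will make a healthy host look dead, so 
pick a port that is known to be closed but not filtered.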

> Ohh, and make sure to bitch at your network guys for dropping those
> packets in the first place ;-)
> 
> Hope that helps.
> ---------------------------------------------------------------------
> Demetri Mouratis
> dmourati at linfactory.com
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

