Lots of hosts, only a couple of services?

Jason Byrns jason-sourceforge at microlnk.net
Wed Aug 25 15:48:13 CEST 2004


Thanks to everyone for their input, I certainly appreciate it.

To summarize, it sounds like the place to start is to change our service
checks from ping to telnet checks, or possibly even SNMP.  I am also
going to change the check_host_alive settings, since it currently sends
only one packet.  (Its critical thresholds were already five seconds and
100% packet loss, which still seems fair.)
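
To illustrate why a one-packet host check is fragile, here is a minimal
Python sketch.  It is not the check_ping plugin's code; the function name
and the rule "critical when either threshold is reached" are assumptions
made for illustration, and warning thresholds are left out.

```python
def ping_state(packets_sent, packets_received, rta_ms,
               crit_rta_ms=5000.0, crit_loss_pct=100.0):
    """Hypothetical helper: classify one ping check as OK or CRITICAL.

    Assumes critical is raised when packet loss or round-trip time
    reaches its threshold. rta_ms is None when no reply came back.
    """
    loss_pct = 100.0 * (packets_sent - packets_received) / packets_sent
    if loss_pct >= crit_loss_pct:
        return "CRITICAL"
    if rta_ms is not None and rta_ms >= crit_rta_ms:
        return "CRITICAL"
    return "OK"


# One packet sent, one lost: loss is 100%, so the check goes critical.
print(ping_state(1, 0, None))
# Five packets sent, one lost: loss is 20%, so the check stays OK.
print(ping_state(5, 4, 20.0))
```

With a single packet, packet loss can only ever be 0% or 100%, so one
dropped reply from a busy access point is enough to hit the critical
threshold.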

(Is there any advantage to checking SNMP instead of telnet?)

As someone else already mentioned, check_telnet is basically already 
defined as "check_tcp -H (host address) -p 23".
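
For anyone curious what a plain TCP check on port 23 amounts to, here is
a rough Python sketch.  It only tests whether a connection can be opened;
it is not the check_tcp plugin itself, and the function name and default
timeout are hypothetical.

```python
import socket


def tcp_port_open(host, port=23, timeout=10.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Unlike a ping, this exercises the device's TCP stack and its telnet
listener, so it checks that the management service is reachable and not
just that the device answers ICMP.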

As for QoS, I'm not sure that's an option.  If one of our wireless
access points is too busy to reply, wouldn't the AP itself need some
kind of QoS features to help us?  I don't think they have any.  We've
got a mixture of older and a few newer Cisco access points, and those
are usually the ones that miss a check or two here and there...

As for max_check_attempts and how it relates to host and service
checks, I believe I originally found my answer in the Nagios FAQ pages.
However, when I searched yesterday I couldn't find it again.  All I
could find was this page, which mentions exceptions to the monitoring logic:
http://nagios.sourceforge.net/docs/1_0/statetypes.html

...but says it will not discuss those exceptions for now.

The information I found before basically stated what I said earlier:
when a single service check fails, a host check is triggered.  If that
host check also fails, Nagios skips the "soft" error states and goes
straight to a "hard" error state.  In other words, it ignores
max_check_attempts and sends out notifications right away.  That's not
a bug; it's because, y'know, your HOST is down!  Not just a service!
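
The behavior described above can be sketched in a few lines of Python.
This models only my understanding as stated in this message, not
confirmed Nagios logic; the function name, arguments, and return values
are all hypothetical.

```python
def next_state(service_ok, host_ok, attempt, max_check_attempts):
    """Hypothetical model: return (state_type, notify) for one service check.

    service_ok: result of the service check
    host_ok:    result of the host check triggered by a service failure
    attempt:    which consecutive failed attempt this is (1-based)
    """
    if service_ok:
        return ("OK", False)
    if not host_ok:
        # Host is down: skip the soft states and notify immediately.
        return ("HARD", True)
    if attempt >= max_check_attempts:
        # Host is up but the service has failed enough times in a row.
        return ("HARD", True)
    # Host is up, service failed, retries remain: stay soft, no notification.
    return ("SOFT", False)


# First failure with the host also down: hard state right away.
print(next_state(False, False, 1, 5))
# First failure with the host still up: soft state, keep retrying.
print(next_state(False, True, 1, 5))
```

If this model is right, max_check_attempts on the service only matters
while the host check keeps passing, which is why a flaky host check
would produce a notification after a single failed service check.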

But tweaking our host checks is probably the answer to any single
false-positive warning.  Besides, I'm going to go ahead and slap Nagios
onto one of my test servers and put together a very simple setup to test
again how Nagios handles service and host checks and max_check_attempts.
I'm virtually certain that we were being warned every time any host
failed just a single check, even though my settings look like it
should take five failed checks in a row.

Thanks again, everybody!

--
Jason Byrns
System Administrator, MicroLnk


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null