passive host - distributed monitoring

Marco Supino Marco at praxell.com
Sat Apr 14 00:33:27 CEST 2007


Hi,

Thanks for the answer, but according to the docs, a host does have
max_check_attempts:

max_check_attempts: This directive is used to define the number of times
that Nagios will retry the host check command if it returns any state
other than an OK state. Setting this value to 1 will cause Nagios to
generate an alert without retrying the host check again. Note: If you do
not want to check the status of the host, you must still set this to a
minimum value of 1. To bypass the host check, just leave the
check_command option blank.  

Also, in my setup, I can see this in the logs:

[13-04-2007 14:20:55] HOST ALERT: drpdb;DOWN;HARD;5;PING CRITICAL - Packet loss = 100%
[13-04-2007 14:20:45] HOST ALERT: drpdb;DOWN;SOFT;4;PING CRITICAL - Packet loss = 100%
[13-04-2007 14:20:35] HOST ALERT: drpdb;DOWN;SOFT;3;PING CRITICAL - Packet loss = 100%
[13-04-2007 14:20:24] HOST ALERT: drpdb;DOWN;SOFT;2;PING CRITICAL - Packet loss = 100%
[13-04-2007 14:20:14] HOST ALERT: drpdb;DOWN;SOFT;1;PING CRITICAL - Packet loss = 100%

With active checks, a notification is sent only once max_check_attempts
(5 in this case) is reached. With passive checks, the host changes to
HARD immediately and a notification is sent.
I am getting false positives because of this: for checks over slow
links, the active check fails into SOFT once, and a notification is sent
immediately.
I know the host check retries are run immediately one after the other;
still, doing several of them would reduce false alerts.
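
To make the difference concrete, here is a small Python sketch of the
behaviour described above. It is a simplified model and not Nagios code;
the function names are made up for illustration, and it only covers a run
of consecutive DOWN results.

```python
def down_sequence(n_results, max_check_attempts, passive=False):
    """Model the (state_type, attempt) pairs for n consecutive DOWN results.

    Simplified illustration only: active results stay SOFT until
    max_check_attempts is reached, passive results are HARD at once.
    """
    states = []
    for i in range(1, n_results + 1):
        if passive:
            # Passive host results go straight to HARD.
            states.append(("HARD", 1))
        else:
            attempt = min(i, max_check_attempts)
            state_type = "HARD" if attempt >= max_check_attempts else "SOFT"
            states.append((state_type, attempt))
    return states


def first_notification(states):
    """Return the 1-based index of the first HARD result, or None."""
    for idx, (state_type, _attempt) in enumerate(states, start=1):
        if state_type == "HARD":
            return idx
    return None


active = down_sequence(5, max_check_attempts=5)
passive = down_sequence(5, max_check_attempts=5, passive=True)
print(first_notification(active), first_notification(passive))
```

With max_check_attempts set to 5, the active sequence matches the log
excerpt (SOFT for attempts 1 to 4, HARD on attempt 5), while the passive
sequence is HARD on the very first result.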

Also, from the performance tuning doc:

Optimize host check commands. If you're checking host states using the
check_ping plugin you'll find that host checks will be performed much
faster if you break up the checks. Instead of specifying a max_attempts
value of 1 in the host definition and having the check_ping plugin send
10 ICMP packets to the host, it would be much faster to set the
max_attempts value to 10 and only send out 1 ICMP packet each time. This
is due to the fact that Nagios can often determine the status of a host
after executing the plug-in once, so you want to make the first check as
fast as possible. This method does have its pitfalls in some situations
(i.e. hosts that are slow to respond may be assumed to be down), but
you'll see faster host checks if you use it. Another option would be to
use a faster plugin (i.e. check_fping) as the host_check_command instead
of check_ping.

Besides the performance gain, it will also reduce false alerts.

I am currently on Nagios 2.8, and a workaround I found is in the OCHP
command: submit through NSCA only if the state is HARD, so the active
check will only submit host results for HARD OK and non-OK states,
although SOFT errors do still appear sometimes.
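
A rough sketch of what such an OCHP filter could look like, written in
Python. This is not my actual script. The function names, the send_nsca
and config paths are assumptions, and the arguments are expected to come
from the $HOSTNAME$, $HOSTSTATE$, $HOSTSTATETYPE$ and $HOSTOUTPUT$ macros
in the ochp_command definition.

```python
import subprocess

# Return codes for passive host check results.
STATE_CODES = {"UP": 0, "DOWN": 1, "UNREACHABLE": 2}


def build_nsca_line(host, state, state_type, output):
    """Return the tab-separated send_nsca input line, or None for SOFT states."""
    if state_type != "HARD":
        # Drop SOFT results so the central server never sees them.
        return None
    return "{}\t{}\t{}\n".format(host, STATE_CODES[state], output)


def submit(line, nsca_server,
           send_nsca="/usr/local/nagios/bin/send_nsca",      # assumed path
           config="/usr/local/nagios/etc/send_nsca.cfg"):    # assumed path
    """Pipe one result line to send_nsca on standard input."""
    subprocess.run([send_nsca, "-H", nsca_server, "-c", config],
                   input=line.encode(), check=True)


line = build_nsca_line("drpdb", "DOWN", "HARD", "PING CRITICAL - Packet loss = 100%")
print(repr(line))
```

The filtering is kept in build_nsca_line so it can be tried without a
running NSCA daemon; submit is only called when the line is not None.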

I started working on allowing passive SOFT alerts, but I think the
service reaper is breaking my logic. I will continue working on it.

One more thing: while working in a distributed setup, I noticed that
service freshness checking is not done if check_period is NONE. I think
this is problematic: if I disable active checks, the services show as
"Disabled" in the TAC; if I enable checks, Nagios runs checks even when
the results are fresh, but my remote hosts are unreachable from the
"master" Nagios. I have a patch for it, if you think this is an issue,
or maybe add a config option to "force freshness even if out of
check_period".
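
To show what I mean by that option, here is the decision as a small
Python sketch. It is not taken from the Nagios source or from my patch;
the function and the force_out_of_period flag are hypothetical names for
the proposed behaviour.

```python
def should_check_freshness(now, last_check, freshness_threshold,
                           in_check_period, force_out_of_period=False):
    """Decide whether a stale result should be acted on.

    force_out_of_period is the hypothetical new option: when set,
    freshness is evaluated even outside the check_period.
    """
    if not in_check_period and not force_out_of_period:
        # Behaviour described above: no freshness checking at all.
        return False
    return (now - last_check) > freshness_threshold


# check_period is NONE, so the service is never inside its check period.
print(should_check_freshness(1000, 100, 300, in_check_period=False))
print(should_check_freshness(1000, 100, 300, in_check_period=False,
                             force_out_of_period=True))
```

The first call models the behaviour I am seeing (stale results are
ignored), the second one models the proposed option.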

Thanks.

Marco.


-----Original Message-----
From: nagios-devel-bounces at lists.sourceforge.net
[mailto:nagios-devel-bounces at lists.sourceforge.net] On Behalf Of Ethan
Galstad
Sent: Friday, April 13, 2007 22:57
To: Nagios Developers List
Subject: Re: [Nagios-devel] passive host - distributed monitoring

Marco Supino wrote:
> Hi,
> 
> I am facing a problem with distributed monitoring, it seems that when a
> passive host check is received, it immediately goes into HARD state,
> without noticing the max_attempt directive, I tried fixing it in the
> source, and make it go into SOFT before attempt = max_attempts, but some
> other function is messing my check_attempt , anyone can assist
> 
> Marco.
> 

Nagios 2 doesn't support a max_attempts directive for hosts and all 
passive host check results will immediately force the host into a HARD 
state.  This has changed a bit in Nagios 3 - hosts do have a 
max_attempts directive, but passive results still put the host into a 
HARD state.

Hope that helps.


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org

------------------------------------------------------------------------
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share
your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDE
V
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

