freshness check on passive service fails

Antoine Reid areid at logient.com
Fri May 28 21:25:10 CEST 2004


--On Friday, May 28, 2004 9:46 AM +0200 jan gregor 
<pamela at rak.bb.euroweb.sk> wrote:

>> For what it's worth, I'm having similar issues myself too. My setup is a
>> bit different so I'll post it below.  What happens here is that I have
>> two Nagios processes running on two different hosts, in different
>> subnets. The one doing the actual checks is obsessing over services and
>> sends the results through nsca to the main nagios host.  The main host
>> seems to decide my services results aren't fresh enough, then runs the
>> check_command, which is a dummy script returning WARNING (originally
>> CRITICAL but it generated too many notifications..), then, a couple
>> seconds or minutes later, a new passive check comes in, which brings the
>> service(s) back to OK, then a couple minutes later, it switches back to
>> WARNING and so on..
>
> Why are you doing freshness checking on the master host? Is that of any
> use? Please correct me if I'm wrong, but freshness checking is mainly for
> active checking. The only case I can think of where it is usable with
> passive checks is a passive+active setup, where a service is configured
> to accept passive checks and also runs active checks every so often (to
> check that we have not missed something). Again, maybe I overlooked
> something important; please correct me if I'm terribly wrong.

Actually, the idea is that when active checks are disabled, the
check_command is never run as long as the passive checks come in frequently
enough.  According to the docs (the part about distributed monitoring
and/or freshness checking), IF the results are not fresh enough, then the
check_command will be executed.  In a failover/redundancy situation, that
would be ideal, as your main machine does not usually perform the tests but
will if the results are getting stale.
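
To make that rule concrete, here is a minimal sketch of the decision as I
understand it from the docs. It is only a model, not Nagios's actual code;
the function names and the idea of passing the threshold in as a timedelta
are my own assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_stale(last_result_at, now, threshold):
    """True when the newest result is older than the freshness threshold."""
    return now - last_result_at > threshold


def main_host_action(last_result_at, now, threshold):
    """What the main host does for a passive-only service in this model."""
    if is_stale(last_result_at, now, threshold):
        # Results are too old: fall back to the configured check_command.
        return "run check_command"
    # Results are recent enough: the check_command is never executed.
    return "keep waiting for passive results"
```

For example, with a ten-minute threshold, a result that is two minutes old
leaves the main host waiting, while one that is fifteen minutes old makes it
run the check_command.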

In my situation though, the main machine *cannot* access the services that 
the second host is monitoring.  What is configured instead is a 
check_command that will always return an error (right now, I return WARNING 
but I would like it to be "CRITICAL") stating that the results are stale. 
This would indicate that the nagios process on the 2nd machine is no longer 
sending passive checks OR that the checks somehow don't make it through to 
the main machine.  In any case, I would get a notification and would start 
investigating.
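
For illustration, a dummy check of this kind could look roughly like the
sketch below. This is not my actual script; the function names and the
message text are made up. The only thing taken as given is the standard
plugin exit code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Severities this dummy check is allowed to report (hypothetical choice).
LABELS = {WARNING: "WARNING", CRITICAL: "CRITICAL"}


def stale_result(severity=WARNING):
    """Return (exit_code, status_line) for a 'results are stale' check."""
    if severity not in LABELS:
        raise ValueError("severity must be WARNING or CRITICAL")
    line = f"{LABELS[severity]}: no passive check result received recently (stale)"
    return severity, line


def run(severity=WARNING, out=None):
    """Print the status line and return the exit code for the plugin."""
    code, line = stale_result(severity)
    print(line, file=out or sys.stdout)
    return code
```

A real plugin would finish with something like sys.exit(run()), and
switching to CRITICAL would just mean calling run(CRITICAL) instead.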

This is exactly what I am trying to achieve.  Now, my problem is the
following:  the second nagios process is doing active checks, and the
service(s) checked never or rarely go down (eg: fping on an otherwise
working machine).  I can see on the MAIN host that the passive checks are
being received AND processed by nagios, yet it decides for some reason that
the results are not fresh and runs the check_command defined (which returns
WARNING).

Net result is, according to the second machine, my services are up 100% of 
the time.  According to the MAIN machine, those services go OK - WARNING - 
OK - WARNING - OK - WARNING every couple of minutes..
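
To show the pattern, here is a toy simulation of one way this flip-flop
could arise: if the freshness threshold in effect were shorter than the
interval between passive results. That is purely an assumption on my part,
not a diagnosis, and all the numbers and names below are hypothetical. The
model also assumes the dummy check's own result counts as a fresh result.

```python
def simulate(passive_interval, freshness_threshold, check_every, duration):
    """Return the (time, state) transitions seen by the main host.

    passive_interval:    seconds between passive OK results arriving
    freshness_threshold: max age in seconds before a result counts as stale
    check_every:         how often the main host looks at the result age
    duration:            total simulated time in seconds
    """
    transitions = []
    state = None
    last_result = 0
    for now in range(duration + 1):
        if now % passive_interval == 0:
            # A passive OK result arrives from the second host.
            last_result = now
            new_state = "OK"
        elif now % check_every == 0 and now - last_result > freshness_threshold:
            # Stale: the dummy check_command runs and reports WARNING.
            last_result = now
            new_state = "WARNING"
        else:
            continue
        if new_state != state:
            transitions.append((now, new_state))
            state = new_state
    return transitions
```

With passive results every 300 seconds, a 120 second threshold and a look
every 60 seconds, simulate(300, 120, 60, 900) alternates between OK and
WARNING, much like what I see. Raising the threshold to 600 in the same
model keeps the service at OK.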

Would anyone know which timeout or setting to tweak so that it HAS to wait
much, much longer without having received any passive checks before it
actually decides to take matters into its own hands and run the
check_command defined?  (Please see my previous post for my configuration
details, service definitions, etc.)

> Best regards
>
> Jan Gregor


thank you!
Antoine

--
Antoine Reid
Administrateur Système - System Administrator

          __________________________________________________

Logient Inc.
 Solutions de logiciels Internet - Internet Software Solutions
 417 St-Pierre, Suite #700
 Montréal (Qc) Canada H2Y 2M4
 T. 514-282-4118 ext.32
 F. 514-288-0033
 www.logient.com




_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users