Distributed checks

Andrew Meredith andrew at anvil.org
Tue Feb 4 16:12:31 CET 2003


Hi Folks,

Before I start I should say that I have searched the mail archives for
questions similar to the following, but didn't come up with anything.

I then unpacked the source code and went through the logic for accepting
checks and for initiating checks, with particular reference to the
control variables.

I therefore hope you will forgive me if these are FAQs.

Platform & version
------------------

Platform: Red Hat 7.3 on Athlon
Nagios:   1.0

enable_active_checks
--------------------

I tried following the documented procedure in html/docs/distributed.html
where it recommends using a variable of this name. I did so and it
seemed to have absolutely no effect. It turns out that the source code
does not include this variable.

A number of different variables of the type "stop the thing doing active
checks for the purpose of distributed testing" are mentioned in the
source code and/or the documentation, but none of them quite goes all
the way to the following setup.

The distributed architecture I am after
---------------------------------------

  Server

The server should be configured to know about a remote host and
optionally be able to check that the host itself is actually up, and to
externally check some/all of the network services. It should, however,
be possible to prevent it from trying to check a remote machine where
the network topology does not permit this. Attempting the check anyway
would give a false negative.

If the remote machine stops submitting results to the server, the server
should follow the usual timeouts as if it were performing the checks
itself.

  Client

The client has its own configuration .. hopefully differing from the
server version of its config by a very few parameters. It can stack up
results in the event of a network outage between the client and the
server .. sending them through when things come back. This way the stats
for the box aren't affected by network outages that don't necessarily
affect its actual uptime and would be recorded separately anyway.
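
The "stack up results and send them through later" behaviour could be
sketched roughly as follows. This is only an illustration in Python, not
anything taken from the Nagios source: the ResultSpool class, the shape
of a result tuple and the send() transport are all hypothetical names.

```python
from collections import deque
from typing import Callable, Deque, Tuple

# (timestamp, host, service, return_code, plugin_output)
# Hypothetical result shape, chosen for illustration only.
Result = Tuple[float, str, str, int, str]


class ResultSpool:
    """Hold check results locally until the server accepts them."""

    def __init__(self, send: Callable[[Result], bool]) -> None:
        # send() is a hypothetical transport; it returns True on success.
        self._send = send
        self._pending: Deque[Result] = deque()

    def submit(self, result: Result) -> None:
        """Queue a new result, then try to drain the queue."""
        self._pending.append(result)
        self.flush()

    def flush(self) -> int:
        """Send queued results oldest first; stop at the first failure."""
        sent = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # link is down; keep the result for the next attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Each result carries its own timestamp, so results delivered late can
still be recorded against the time the check actually ran.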

How have I done
---------------

I am quite close, but for two points .. one cosmetic and one
show-stopper.

Cosmetic: The only way I have found of shutting off active checks from
the server is to set the checks_enabled flag in the host description.
Some parts of the UI take this to mean that the services are to be
tested by a remote host. Other parts, however, see the service as being
disabled and display it as such.

Show-stopper: With the above configuration the server quite happily
receives and displays service checks from the distributed remote and
will show state changes and everything in the case of a service failure.
If however something stops the delivery of service checks, the server
simply keeps displaying the last state. If the checks had been done from
the server and the network dropped, the failure would be flagged.

Is there any way of making the active checks stop without stopping the
server from changing state to at least "Unknown", if not "Critical",
when the results stop arriving?
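
To make the behaviour I am after concrete, here is a rough sketch in
Python. It is not how Nagios does anything; effective_state() and its
max_age parameter are hypothetical names, and the only thing borrowed
from Nagios is the usual plugin return codes (0 to 3).

```python
import time
from typing import Optional

# Standard plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def effective_state(
    last_state: int,
    last_result_time: float,
    max_age: float,
    now: Optional[float] = None,
) -> int:
    """Return last_state while it is fresh, otherwise UNKNOWN.

    max_age is a hypothetical per-service limit, in seconds, on how old
    the most recent passive result may be before it stops being trusted.
    """
    if now is None:
        now = time.time()
    if now - last_result_time > max_age:
        return UNKNOWN  # the remote has gone quiet for too long
    return last_state
```

With a max_age of 300 seconds, a service whose last "OK" arrived 301
seconds ago would be shown as "Unknown" rather than "OK".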

If you got this far .. thanks for reading :)

Andrew Meredith BEng CEng MBCS MIEE
_______________________________________________________________
                  The Anvil Organisation Ltd.
                          Director
Tel: +44 (0) 1249 444240 | Email:              andrew at anvil.org
Fax: +44 (0) 1249 460560 | Web:           http://www.anvil.org/
Mob: +44 (0) 7802 389007 | WAPMail:  andrew.meredith at orange.net
_______________________________________________________________
   The box says Win95 or better .. Must run under Linux then!


