Questions on migrating to Distributed Environment on Nag 1.1

Hochberg, Keith Keith.Hochberg at mtvi.com
Mon Oct 13 18:37:02 CEST 2003


> 1.  Where do I do check_host_alive? Central or Distributed?

The central server executes the host check command defined in hosts.cfg
after it receives data passively that a service has gone into a HARD
state.  Passive host checks should be possible with Nagios 2.0 (per
Ethan).

For 2 and 3, I have never seen this happen on my build.  The only thing
I can think of is to make sure your freshness_threshold is set to an
interval longer than the one at which the active (distributed) server
should be sending data for that service.
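
To illustrate the freshness point, here is a rough Python sketch. It is not
Nagios code: the function names and the 60-second slack are made up for
illustration, and the comparison only approximates what the central server
does. It shows why a threshold shorter than the distributed server's
reporting interval would produce stale detects even when everything is
healthy.

```python
from datetime import datetime, timedelta


def suggested_freshness_threshold(report_interval_s, slack_s=60):
    """Return a threshold (seconds) slightly longer than the reporting interval.

    slack_s is an arbitrary illustrative allowance for network and queue delay.
    """
    return report_interval_s + slack_s


def is_stale(last_result, now, threshold_s):
    """True if the last passive result is older than the threshold."""
    return (now - last_result).total_seconds() > threshold_s


# Hypothetical service reported by the distributed server every 5 minutes.
now = datetime(2003, 10, 13, 12, 0, 0)
last_result = now - timedelta(seconds=300)

too_short = is_stale(last_result, now, threshold_s=240)
with_slack = is_stale(last_result, now, suggested_freshness_threshold(300))
```

With a 240-second threshold the result counts as stale just before the next
report is due; with the interval plus some slack it does not.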

> 4.  For members of the list who are managing large Nagios
> implementations "thousands of services on hundreds of hosts", how are
> your configs managed?  I'm currently using Nagmin, which seems ok, but
> was just curious what big sites might be doing.

I manage my configs by hand... if you ask me it's the only way to do it.


-Keith

-----Original Message-----
From: James Harrison [mailto:james.harrison at amcg.com] 
Sent: Monday, October 13, 2003 12:09 PM
To: nagios-users at lists.sourceforge.net
Cc: James Harrison
Subject: [Nagios-users] Questions on migrating to Distributed
Environment on Nag 1.1


List,

I have recently outgrown my central server only configuration and have
begun migrating over to a central poller polling some sites and a
distributed poller taking up the slack.

I have successfully set up NSCA and have approximately 20 sites being
polled (simple PING using the check_ping service) via the distributed
poller, with results being sent to the central box.  Everything seems to
be working properly, with just a few questions raised along the way.
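
For readers unfamiliar with this setup: the send_nsca client reads service
check results as tab-separated lines (host name, service description, return
code, plugin output) on standard input. A minimal Python sketch of building
such a line follows. The helper name and the host "site01-router" are
hypothetical, and this only formats the line; it does not call send_nsca.

```python
def nsca_service_line(host, service, return_code, output):
    """Format one passive service result as a tab-separated send_nsca line."""
    # Standard plugin return codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0, 1, 2 or 3")
    # Tabs and newlines are field and record separators, so reject them
    # in the host and service names and flatten them in the output text.
    for field in (host, service):
        if "\t" in field or "\n" in field:
            raise ValueError("host and service must not contain tabs or newlines")
    clean_output = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean_output}\n"


# Hypothetical result for one of the remotely polled sites.
line = nsca_service_line("site01-router", "PING", 0, "PING OK - rta 12ms")
```

The host name and service description have to match the definitions on the
central server, otherwise the result cannot be matched to a service there.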

I'm 95% of the way there!

1.  Where do I do check_host_alive? Central or Distributed?

In theory, if I want to do all my polling (host alive and service checks)
from the distributed box, can I do check_host_alive plus service polling
from the distributed box, and if so, how?  Or maybe the better question
is whether, and how, I can set up the central box's check_host_alive (or
equivalent command [check_dummy, etc.]) so that it is "ignored" and
completed via a "passive" check.  I am using the "check period set to
none" method for setting up my passive services on the central server.
I cannot find a similar option for host information, or I'm just
missing it.

Or, as I suspect but can't verify through the docs, check_host_alive is
an active process that must always be done from the central server, while
additional service checks for that host can be performed via the
distributed box.

2.  Out of bounds (after a stale detect)

On the passive checks that I'm getting from my distributed box, I'm
intermittently receiving "Warning: Return code of 127 for check of
service" errors in my event log.  This appears to occur after a stale
detect, when the central server says "I'm forcing an immediate check of
this service".  These errors never create a "HARD" down state, and
therefore no notifications are sent.  Is this cause for concern?  I
obviously have Freshness Checking turned on for the passive checks, as
recommended.
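
As background on the number itself: 127 is the exit status a POSIX shell
conventionally returns when it cannot find the command it was asked to run.
A common cause in this situation is that the forced check runs a
check_command whose plugin path does not exist on the central server, though
that is an assumption about this particular install. A small Python sketch
reproduces the status, assuming the command is run through /bin/sh; the
path below is deliberately nonexistent and the helper name is hypothetical.

```python
import subprocess


def run_check(command_line):
    """Run a command line through the shell and return its exit status."""
    result = subprocess.run(
        command_line, shell=True, capture_output=True, text=True
    )
    return result.returncode


# A plugin path that does not exist: the shell reports "not found"
# and exits with status 127.
missing_plugin_status = run_check("/nonexistent/plugins/check_dummy 0")

# For comparison, a command that runs and exits with CRITICAL (2).
critical_status = run_check("exit 2")
```

Running the service's check_command by hand on the central server, as the
nagios user, is a quick way to see whether it produces the same status.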

3.  Why am I getting stale detects as mentioned in question 2?

4.  For members of the list who are managing large Nagios
implementations "thousands of services on hundreds of hosts", how are
your configs managed?  I'm currently using Nagmin, which seems ok, but
was just curious what big sites might be doing.

Thanks
-- 
James Harrison RHCE, CCNA


-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
SourceForge.net hosts over 70,000 Open Source Projects.
See the people who have HELPED US provide better services:
Click here: http://sourceforge.net/supporters.php
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null






