Issues with Distributed Nagios

Jonah Horowitz JHorowitz at looksmart.net
Wed Nov 14 21:45:20 CET 2007


Hello all,

I'm running a distributed Nagios installation with about 500 hosts and
4000 checks.  Almost all of my checks run at five-minute intervals.

I'm trying to move from a centralized to a distributed setup because
the average check latency on the central server has grown to about
75 seconds.

I've built a new central server and two distributed nodes.  The
distributed nodes are having no trouble keeping up with the checks: the
latency is less than 10 seconds on both of them.  The latency on the
central server, however, is very high, at 350-650 seconds.  I'm seeing
freshness timeouts and errors.

The checks are getting submitted, but there seems to be a very long
delay between when a check result is produced and when the central
server receives it.

Any help would be appreciated.

Thanks,

Jonah




submit_service_check is as follows:

$USER1$/eventhandlers/submit_service_check $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$'

#!/bin/bash
###########################################################
# Arguments:
#  $1 = host_name (Short name of host that the service is
#       associated with)
#  $2 = svc_description (Description of the service)
#  $3 = return_code (Numeric state ID of the given service,
#       as passed in via $SERVICESTATEID$ - 0 = OK,
#       1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN)
#  $4 = plugin_output (A text string that should be used
#       as the plugin output for the service checks)
#
###########################################################
PRINTF="/usr/bin/printf"
CMD="/usr/pkg/nsca/bin/send_nsca"
CFG="/usr/pkg/nsca/etc/send_nsca.cfg"

HOST=$1
SRV=$2
RESULT=$3
OUTPUT=$4

# pipe the service check info into the send_nsca program, which
# in turn transmits the data to the nsca daemon on the central
# monitoring server

# /bin/printf "%s:%s:%s:%s\n" "$1" "$2" "$RESULT" "$4" | $CMD -H plato -c $CFG -d :
#$PRINTF "%b" "$HOST:$SRV:$RESULT:$OUTPUT" | $CMD -H plato -c $CFG -d :

echo "$HOST:$SRV:$RESULT:$OUTPUT -- FYI: this is a distributed passive SERVICE alert fed from $(hostname)" | "$CMD" -H central -p 5667 -c "$CFG" -d :
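
For reference, here is a minimal sketch of the single line the script
above hands to send_nsca for each result, using the same ":" delimiter
that is passed with -d.  The function name format_passive_result and the
sample host, service and output values are made up for illustration and
are not part of the original script; the sketch only prints the line and
does not call send_nsca.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): builds the
# colon-delimited line that would be piped into send_nsca -d :
#   $1 = host_name, $2 = svc_description,
#   $3 = numeric return code, $4 = plugin_output
format_passive_result() {
    local host=$1 svc=$2 code=$3 output=$4
    printf '%s:%s:%s:%s\n' "$host" "$svc" "$code" "$output"
}

# Sample values are assumptions chosen only to show the format.
format_passive_result "web01" "HTTP" 0 "HTTP OK - 0.012s response time"
# prints: web01:HTTP:0:HTTP OK - 0.012s response time
```

Each service check result on a distributed node produces one such line
and one invocation of send_nsca.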
