NSCA and Latency

Maxwell,Brady maxwellb at oclc.org
Thu Oct 23 16:41:51 CEST 2008


My Environment:
3 x Dell 2950 with dual dual-core CPUs and 8 GB of RAM
One system runs checks against our Linux servers
One runs checks against our Windows servers
We are running SLES10 update 3

Both systems use nsca to send their check results to a third server that
displays the service checks for our operators.

All three systems are on the same VLAN but on separate Cisco switches.

I am running nsca in daemon mode on the central server with this command:

/usr/local/nagios/bin/nsca -c /usr/local/nagios/etc/nsca.cfg -daemon

nsca.cfg is as follows:

pid_file=/var/run/nsca.pid
server_port=5667
#server_address=192.168.1.1
nsca_user=nagios
nsca_group=nagios
#nsca_chroot=/var/run/nagios/rw
debug=1
command_file=/usr/local/nagios/var/rw/nagios.cmd
alternate_dump_file=/usr/local/nagios/var/rw/nsca.dump
aggregate_writes=1
append_to_file=1
max_packet_age=300
password=xxxxxxxxxx
decryption_method=14


I only just set the aggregate_writes and append_to_file options to try to
fix the problem. They were not set before; either way, the results are
the same.

OK, so on the 2 servers doing the checks, everything runs fine, even
with OCSP running my send_service_check_results script. My script is
pretty much straight out of the book.

#!/bin/sh
# Arguments:
# $1 = Hostname of the host (using the $HOSTNAME$ macro)
# $2 = Service description of the service (using the $SERVICEDESC$ macro)
# $3 = Service status id of the service (using the $SERVICESTATUSID$ macro)
# $4 = Output of the Service Check (using the $SERVICEOUTPUT$ macro)
/bin/echo "$1","$2","$3","N3 - $4" | \
    /usr/local/nagios/libexec/send_nsca -H 10.10.129.37 \
    -c /usr/local/nagios/etc/send_nsca.cfg -d ","

Like I said, everything is fine on the 2 servers even with OCSP on.
Between the 2 servers we are running about 10k service checks, and
latency is very low, just a few seconds. However, if I turn on the NSCA
daemon on the central server, latency on both remotes creeps up to
1500+ seconds within an hour and just gets worse from there. The checks
that should run every 5 minutes on the 2 remote servers end up running
every few hours or less. The central server is doing 0 active checks.

I set debug mode (debug=1 in nsca.cfg), but that provided very little
insight into the problem.
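
Since the debug output says little, one thing that can be measured
directly is how long a single submission takes. As rough arithmetic,
10k checks every 5 minutes is about 33 results per second, so if the
OCSP command is run one at a time by Nagios (an assumption that should
be verified for the Nagios version in use), each call has a budget of
roughly 30 ms. The sketch below times repeated calls of any command
that reads a result line on stdin. The function name time_runs and the
probe values testhost/testsvc are hypothetical, and it relies on
date +%s, so it only resolves whole seconds.

```shell
#!/bin/sh
# time_runs N CMD [ARGS...]
# Runs CMD N times, feeding it one dummy check result per run on stdin,
# and prints the total elapsed time in whole seconds.
# "testhost" and "testsvc" are placeholder names, not real objects.
time_runs() {
    n=$1
    shift
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$n" ]; do
        # Same comma-delimited layout as the OCSP script in this post.
        # A failing command is ignored so the loop always completes.
        printf '%s,%s,%s,%s\n' testhost testsvc 0 "timing probe" \
            | "$@" >/dev/null 2>&1 || true
        i=$((i + 1))
    done
    end=$(date +%s)
    echo $((end - start))
}
```

For example, time_runs 100 /usr/local/nagios/libexec/send_nsca -H
10.10.129.37 -c /usr/local/nagios/etc/send_nsca.cfg -d "," could be run
once with the NSCA daemon stopped and once with it running. Dividing the
result by 100 gives an approximate per-call cost to compare against the
30 ms budget.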

CPU and memory usage are both very low on all three servers. The same
can be said for the network: utilization is less than 2% and there are
no errors on the interfaces. Overall hardware utilization is 10% or less
on these three systems.

So my question is: has anyone had this kind of problem with NSCA? What am
I missing? Should I be batching my service checks on the remote servers?
Should I be using xinetd for NSCA instead of daemon mode?
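
To make the batching idea concrete, here is a minimal sketch of what it
might look like, assuming send_nsca accepts several newline-separated
results on stdin in one invocation. The OCSP command would only append
a line to a spool file, and a separate job (cron or a loop) would flush
the spool through a single send_nsca call. The function names and the
spool path are hypothetical, and the rename-then-send step is not fully
race-free under heavy write load.

```shell
#!/bin/sh
# Hypothetical spool file; any path writable by the nagios user works.
SPOOL="${SPOOL:-/usr/local/nagios/var/nsca.spool}"

# queue_result HOST SERVICE STATUSID OUTPUT
# Cheap replacement for the per-check send: just append one line.
queue_result() {
    printf '%s,%s,%s,%s\n' "$1" "$2" "$3" "N3 - $4" >> "$SPOOL"
}

# send_batch reads result lines on stdin and forwards them in one go.
# Host, config path and delimiter are the ones from the original script.
send_batch() {
    /usr/local/nagios/libexec/send_nsca -H 10.10.129.37 \
        -c /usr/local/nagios/etc/send_nsca.cfg -d ","
}

# flush_results
# Moves the spool aside, sends it, and removes it only if sending worked.
flush_results() {
    [ -s "$SPOOL" ] || return 0        # nothing queued
    batch="$SPOOL.$$"
    mv "$SPOOL" "$batch" || return 1   # new results go to a fresh spool
    send_batch < "$batch" && rm -f "$batch"
}
```

The OCSP script would then call queue_result "$1" "$2" "$3" "$4" in
place of the echo/send_nsca pipeline, and flush_results would run every
few seconds. Note that the "," delimiter will misparse plugin output
that itself contains commas.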

Thanks
Brady