Problem with high latencies after going distributed

Frost, Mark {PBG} mark.frost1 at pepsi.com
Wed Jan 23 03:13:43 CET 2008


 

>-----Original Message-----
>From: Steve Shipway [mailto:s.shipway at auckland.ac.nz] 
>Sent: Tuesday, January 22, 2008 8:45 PM
>To: Frost, Mark {PBG}; Nagios Users
>Subject: RE: [Nagios-users] Problem with high latencies after going distributed
>
>> As I'd mentioned in a previous message, I'm in the process of
>> converting from a centralized Nagios 2.10 setup all running on a
>> single host to a distributed setup running on at least 3 hosts (3 to
>> start anyway).  The centralized setup has 572 hosts and 2900
>> services, 99.9% of which are active checks.
>...
>> 	Active Service Latency:               0.000 / 7267.198 / 4241.019 sec
>
>This isn't much help, but...
>
>We've just done exactly the same (Nagios 2.9), and we have a comparable
>size of system (actually a bit larger - 713 hosts, 5834 services).
>After going distributed, we too have this insanely high latency on the
>satellites.
>
>The only possible cause is the OCSP command slowing things down somehow.
>This is using the supplied send_nsca call to send the status off to the
>central server...
>
>define command {
>    command_name    relay
>    command_line    $USER1$/submit_check_result "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATEID$" "$SERVICEOUTPUT$"
>}
>
>So it should work.  I guess things would be better if it packaged the
>updates up into batches, although it can't do that normally.
>
>I think it might be better to make the OCSP command just dump the
>status to a file, and then have a cronjob every 60 seconds that reads
>the file and sends the statuses off as a batch.  I will try this here,
>when I get the chance.
>
>Steve
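
The spool-and-batch idea described above could look roughly like the
sketch below. It is only an illustration under assumptions: the spool
path, the central host name, the send_nsca install paths and both
function names are made up here, and the cron side relies on send_nsca
reading tab-separated "host, service, return code, output" lines on
stdin.

```python
import os
import subprocess
import time

# Hypothetical locations; adjust to the local install.
SPOOL = "/var/spool/nagios/ocsp.spool"
SEND_NSCA = ["/usr/local/nagios/bin/send_nsca", "-H", "central.example.org",
             "-c", "/usr/local/nagios/etc/send_nsca.cfg"]


def spool_result(host, service, state_id, output, spool=SPOOL):
    """OCSP side: append one result to the spool file and return at once."""
    # Tabs and newlines are field/record separators, so flatten them.
    clean = output.replace("\t", " ").replace("\n", " ")
    line = "\t".join([host, service, str(state_id), clean]) + "\n"
    # O_APPEND keeps short writes from concurrent OCSP runs from interleaving.
    fd = os.open(spool, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)


def flush_batch(spool=SPOOL, sender=None):
    """Cron side: send everything spooled so far over a single connection."""
    batch = "%s.%d.%d" % (spool, os.getpid(), int(time.time()))
    try:
        # rename is atomic, so later results land in a fresh spool file.
        # A writer that opened the old file just before the rename can
        # still lose its line; a real version would need locking.
        os.rename(spool, batch)
    except FileNotFoundError:
        return 0  # nothing was spooled since the last run
    with open(batch, "rb") as fh:
        data = fh.read()
    if sender is None:
        # Raises on failure and leaves the batch file behind for a retry.
        subprocess.run(SEND_NSCA, input=data, check=True)
    else:
        sender(data)
    os.remove(batch)
    return data.count(b"\n")
```

The OCSP command would then call something equivalent to spool_result()
instead of running send_nsca once per check, and a cron entry would call
flush_batch() every minute. The optional sender argument exists only so
the batching can be tried without a live NSCA daemon.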


But if submit_check_result is running slowly, that would only affect the
service execution time, wouldn't it?  My understanding of check latency
is that it's the difference between the time Nagios schedules a check to
run and the time the check actually starts to execute.
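
To make that distinction concrete, here is a minimal sketch of the two
quantities as I understand them. The class and field names are invented
for illustration and are not Nagios internals.

```python
from dataclasses import dataclass


@dataclass
class CheckRun:
    scheduled: float  # when the check was due to start (epoch seconds)
    started: float    # when the check actually began executing
    finished: float   # when the check returned

    @property
    def latency(self):
        # Delay between the scheduled start and the actual start.
        return self.started - self.scheduled

    @property
    def execution_time(self):
        # How long the check itself took once it was running.
        return self.finished - self.started


run = CheckRun(scheduled=1000.0, started=1004.5, finished=1006.0)
print(run.latency, run.execution_time)  # prints "4.5 1.5"
```

On this reading, a slow plugin raises execution_time, while latency only
grows when checks start later than they were scheduled.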

But maybe I'm misunderstanding something here.  When it comes to working
with Nagios, I tend to learn the most when I have the biggest problems :-).

Do you do the same thing I mentioned, where you define all the checks on
both distributed nodes but disable complementary halves of those checks
on each node?

Thanks

Mark

