ocsp slows nagios a great deal

Daniel Reynolds danny at deakin.edu.au
Tue Aug 15 00:47:37 CEST 2006


Hello Loren,



My Nagios setup, currently in development, is running just under 18000 service checks.  I have 5 servers, all running active checks; four of them have their service check results picked up by one 'central' server.



Attached is a PDF screenshot of the central server's health check CGI.



Yes, I found the same problem as you.  I can only assume that Nagios pauses and waits for each ocsp command to complete before running other checks.  Each send_nsca invocation:

1) Starts,
2) Does a three-way TCP handshake, possibly with a DNS lookup as well,
3) Calls read_init_packet(), reading 128 bytes of data from the server,
4) Sends the server 1 packet containing 1 service result,
5) Tears down the TCP connection,
6) Exits.

This takes a while.
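
For comparison, one way to spread that per-result cost is to hand send_nsca several results in a single invocation, since it reads one result per line from standard input.  This is not what my setup does; it is only a minimal sketch in Python, assuming send_nsca's default tab delimiter.  The helper name and the sample hosts are made up for illustration.

```python
def format_nsca_batch(results):
    """Build one stdin payload for send_nsca from many service results.

    Each result is (host, service, return_code, plugin_output).
    Assumes the default field delimiter (a tab) and one result per line.
    """
    lines = []
    for host, service, return_code, output in results:
        # Tabs or newlines inside the plugin output would corrupt the record,
        # so flatten them to spaces.
        clean = output.replace("\t", " ").replace("\n", " ")
        lines.append(f"{host}\t{service}\t{return_code}\t{clean}\n")
    return "".join(lines)


# Hypothetical sample data: two results packed into one payload.
batch = format_nsca_batch([
    ("web01", "HTTP", 0, "HTTP OK"),
    ("db01", "Load", 1, "WARNING - load 5.2"),
])
```

Piping such a payload to one send_nsca process means the start-up, handshake, teardown and exit steps happen once per batch rather than once per result.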



My setup was attempting to send 20 service results a second from some remote servers.  This solution was not working for me (latency for service checks went through the roof).



My solution is as follows:

1) Configure Nagios to write out service perfdata to a file for all service checks.  

2) Transport the data from each 'remote' Nagios server to the local ('central') Nagios server.  I am doing this with a daemon written in Perl that effectively does a stateful tail, across the network (over ssh), of the service-perfdata files.

3) Put the service results into Nagios via the /var/log/nagios/cmd/nagios.cmd file.
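
As a rough illustration of steps 1 and 3 (this is not the attached code), the sketch below turns one perfdata line into a PROCESS_SERVICE_CHECK_RESULT external command.  It assumes a hypothetical perfdata file template of $TIMET$, $HOSTNAME$, $SERVICEDESC$, $SERVICESTATEID$ and $SERVICEOUTPUT$ separated by tabs; the function names are made up, and your template may well differ.

```python
def perfdata_to_command(line):
    """Convert one perfdata line into a Nagios external command string.

    Assumed (hypothetical) line layout, tab-separated:
    timestamp, host name, service description, state id, plugin output.
    """
    timestamp, host, service, state_id, output = line.rstrip("\n").split("\t", 4)
    # Nagios external command syntax for a passive service check result.
    return (f"[{int(timestamp)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{int(state_id)};{output}\n")


def submit(lines, command_file="/var/log/nagios/cmd/nagios.cmd"):
    """Append the converted results to the command file in one write."""
    payload = "".join(perfdata_to_command(line) for line in lines)
    # The command file is normally a named pipe that Nagios reads from.
    with open(command_file, "a") as cmd:
        cmd.write(payload)


# Hypothetical sample line, converted but not submitted.
example = perfdata_to_command("1155595657\tweb01\tHTTP\t0\tHTTP OK - 0.01s\n")
```

Building the whole payload first and writing it in one call keeps the number of writes to the command file low when many results arrive together.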



It works for me since the remote Nagios servers do not pause to execute an ocsp command, only to write to a file.  As a side benefit, if the WAN connection is lost, then once it is re-established all of the historical service results from the remote server are fed back into the 'central' Nagios server.  I have been careful to make full use of buffering to reduce network packets and context switching.
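
The "stateful tail" idea can be illustrated with a small sketch.  This is not the attached Perl daemon, only an assumption about the general technique, written in Python: remember a byte offset into the perfdata file, and on each pass read whatever complete lines have been appended since.  The function name and the truncation handling are hypothetical.

```python
import os


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) for a file that only grows.

    `offset` is the byte position reached on the previous call; persisting it
    between runs is what lets a restarted or reconnected reader catch up.
    """
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or rotated: start again from the top
    with open(path, "rb") as fh:
        fh.seek(offset)
        data = fh.read()
    # Only consume up to the last newline so a half-written line is retried later.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    lines = text.split("\n")[:-1]  # drop the empty piece after the final newline
    return lines, offset + end
```

A caller would start with an offset of 0 (or one saved from an earlier run), forward the returned lines, and store the returned offset only after they have been delivered, so that an outage delays results rather than losing them.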



Attached are tarballs, RPMs and source RPMs.  The code is under the GPL.  If people are interested, I will look into setting up a site on SourceForge.





Hope I was of some assistance.



Regards

Danny Reynolds
Service Operations Centre Coordinator.
Deakin University.


p.s.

My apologies for the lack of man pages (they are on the way).  Any questions are welcome.




On Mon, Aug 14, 2006 at 09:57:27AM -0500, Marc Powell wrote:

> > Could you be more specific about how I might do said testing?
> > I guess I could start by compiling with "--enable-DEBUGALL".. would
> > that tell me what I need to know? Any other things?
>
> I'd personally start by adding time markers to your submit_check_result
> script around important sections --

I tried this but I need to get millisecond timings to be helpful (most
send_nsca events are triggered & return in the same second, it turns
out) and I can't get embedded perl interpreter to use Time::HiRes for
some reason (nor can I really see the error messages, since error
output from ocsp commands doesn't seem to be saved anywhere...  using
mini_epn it works fine at the command line).

Also, since I can't get timings when ocsp is turned off (since it's
the ocsp command that does the logging), I can't establish a baseline,
and the information would be of limited use anyway.

Even when /bin/true is my ocsp command, I lose 600 checks every 5
minutes... that definitely points to the ocsp command itself not being
totally to blame.

I think I'm going to have to switch to a different product. I've been
trying to get nagios working for months now, but it's just been one
battle after another, and finding support has been very difficult. I
hate to lose all the work I've already done & start from scratch, but
I don't see how to get past this problem, and running 12 monitoring
servers that cost $5000 a piece (plus maintenance) isn't really an
option for me.   :-( 

Loren



-------------- next part --------------
A non-text attachment was scrubbed...
Name: screen_shot.pdf
Type: application/pdf
Size: 86449 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-ServiceResult-0.01.tar.gz
Type: application/x-gzip
Size: 2663 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Cmd-0.03-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 4633 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Cmd-0.03-1.deakin.src.rpm
Type: application/octet-stream
Size: 6689 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Cmd-0.03.tar.gz
Type: application/x-gzip
Size: 4249 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0001.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Pid-0.01-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 3426 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Pid-0.01-1.deakin.src.rpm
Type: application/octet-stream
Size: 4692 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0003.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-Pid-0.01.tar.gz
Type: application/x-gzip
Size: 2311 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0002.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-State-0.04-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 4022 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-State-0.04-1.deakin.src.rpm
Type: application/octet-stream
Size: 5532 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0005.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-File-State-0.04.tar.gz
Type: application/x-gzip
Size: 3122 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-nagios_logwatch-0.06-1.deakin.src.rpm
Type: application/octet-stream
Size: 25626 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0006.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-nagios_logwatch-0.06.tar.gz
Type: application/x-gzip
Size: 22437 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-nagios_logwatch-client-0.06-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 17654 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0007.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-nagios_logwatch-server-0.06-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 24671 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0008.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-ServiceResult-0.01-1.deakin.noarch.rpm
Type: application/octet-stream
Size: 3672 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0009.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SOCC-ServiceResult-0.01-1.deakin.src.rpm
Type: application/octet-stream
Size: 5037 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20060815/7e5241ba/attachment-0010.obj>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

