ocsp slows nagios a great deal

Fred f1216 at yahoo.com
Wed Aug 23 17:29:26 CEST 2006


Not sure if this was discussed (I didn't find later threads), but I would suggest
that you batch your send_nsca requests.  Realize that nagios invokes the
OCSP command, if one is defined, after *every* service check.  That means
a fork, an exec, and then whatever that command does.  If you have a perl
ocsp script, for example, perl has to compile that script, execute your
code, and most likely then fork/exec send_nsca.

Send_nsca can accept batch input: several check results, one per line,
in a single invocation.  I streamline my ocsp script so that the data is
batched up in a file and sent later with send_nsca.  Given that you have
a good number of checks, nagios is making the ocsp call very frequently.
You can use that to your advantage.
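
To make "batch input" concrete: send_nsca reads service check results from
standard input, one per line, with the fields (host, service description,
return code, plugin output) separated by tabs by default.  A minimal Python
sketch of building such a batch follows.  The function names format_result
and build_batch are made up for this illustration and are not part of NSCA.

```python
def format_result(host, service, return_code, output):
    """Build one send_nsca input line for a service check result."""
    # Tabs or newlines inside the plugin output would break the
    # tab-separated, one-result-per-line framing, so collapse them.
    clean = " ".join(str(output).split())
    return f"{host}\t{service}\t{int(return_code)}\t{clean}\n"


def build_batch(results):
    """Join many (host, service, return_code, output) tuples into one batch."""
    return "".join(format_result(*r) for r in results)
```

The string returned by build_batch is what would be piped into a single
send_nsca process instead of starting one process per result.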

Each time you run your script:

1) stat the queue/batch file (if it exists)
2) flock the batch file (if it exists)
3) If it is older than an acceptable amount of time (make this
    a configurable parameter), set a flag to remember you will
    be pushing the data on this iteration.
4) If the file is larger than an acceptable size, set your
    flag again.

5) write your ocsp args to the end of your batch queue file.
6) if the flag is set, run send_nsca and pipe your batch queue file
    into it.
7) if you sent the batch, truncate the file to zero length
8) unlock it
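
The steps above could be sketched in Python roughly as follows.  This is an
illustration under assumptions, not a tested production script: the host
name, config path, thresholds, and the names queue_result and send_batch are
all hypothetical.  Appending to the file refreshes its mtime on every call,
so this sketch records when the current batch started in a separate ".stamp"
file; that sidecar file is my addition, not part of the steps above.  The
stat is also done after taking the lock rather than before it.

```python
import fcntl
import os
import subprocess
import time


def send_batch(batch: bytes) -> None:
    # Hypothetical central host and config path; adjust for your site.
    subprocess.run(
        ["send_nsca", "-H", "central.example.org",
         "-c", "/etc/nagios/send_nsca.cfg"],
        input=batch,
        check=True,
    )


def queue_result(queue_path, line, max_age=30.0, max_bytes=65536,
                 sender=send_batch, now=time.time):
    """Append one result line; send the batch if it is old or large.

    Returns True if the batch was sent on this call.
    """
    stamp_path = queue_path + ".stamp"   # assumption: sidecar timestamp file
    # "a+b" creates the file if needed and always writes at the end.
    with open(queue_path, "a+b") as queue:
        fcntl.flock(queue, fcntl.LOCK_EX)                 # step 2
        try:
            size = os.fstat(queue.fileno()).st_size       # step 1
            started = None
            if size > 0:
                try:
                    started = os.stat(stamp_path).st_mtime
                except FileNotFoundError:
                    pass
            if started is None:
                # First entry of a new batch: remember when it began.
                started = now()
                with open(stamp_path, "w"):
                    pass
                os.utime(stamp_path, (started, started))

            data = line.rstrip("\n").encode() + b"\n"
            queue.write(data)                             # step 5
            queue.flush()

            too_old = now() - started >= max_age          # step 3
            too_big = size + len(data) >= max_bytes       # step 4
            if too_old or too_big:
                queue.seek(0)
                sender(queue.read())                      # step 6
                queue.truncate(0)                         # step 7
                return True
            return False
        finally:
            fcntl.flock(queue, fcntl.LOCK_UN)             # step 8
```

If the sender raises, the file is left untouched, so the queued results
are retried on the next call instead of being lost.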

You will dramatically cut down on the send_nsca fork/exec's and
you will also cut down on the network traffic and system noise
that you create as a result of making so many connections.

Go back over your code and streamline it.  An alternate implementation
could be to start a perl daemon that reads from a FIFO and does steps 1-8,
and simply make your OCSP routine an "echo $@ >>fifo".
The perl program could then also wake up at regular intervals and
flush the queue rather than having to rely on the next OCSP request
to come through.

If you stay with the script, you could also use a cron job or a nagios
plug-in to flush the queue periodically.  Make the ocsp command both
plug-in and ocsp compatible: calling it with a zero timeout forces the
flush, and null args skip adding anything to the queue when it is called
as a plug-in.
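
The daemon variant might look something like the sketch below.  It is
written in Python rather than perl purely for illustration, and the names
Batcher, serve, and send_batch, the host, the config path, and the
thresholds are all assumptions.  The loop wakes up on a timer, so a batch
is flushed even when no further OCSP request arrives.

```python
import os
import select
import subprocess
import time


class Batcher:
    """Collect result lines; flush when the batch is old or full."""

    def __init__(self, sender, max_age=10.0, max_lines=200,
                 now=time.monotonic):
        self.sender = sender
        self.max_age = max_age
        self.max_lines = max_lines
        self.now = now
        self.lines = []
        self.started = 0.0

    def add(self, line):
        if not self.lines:
            self.started = self.now()    # batch age counts from first line
        self.lines.append(line)
        if len(self.lines) >= self.max_lines:
            self.flush()

    def tick(self):
        """Called periodically; flushes a batch that has waited too long."""
        if self.lines and self.now() - self.started >= self.max_age:
            self.flush()

    def flush(self):
        if self.lines:
            self.sender(b"".join(self.lines))
            self.lines = []


def send_batch(batch):
    # Hypothetical central host and config path; adjust for your site.
    subprocess.run(
        ["send_nsca", "-H", "central.example.org",
         "-c", "/etc/nagios/send_nsca.cfg"],
        input=batch,
        check=True,
    )


def serve(fifo_path, batcher, poll=1.0):
    # Opening read-write keeps the FIFO from reporting EOF whenever the
    # last writer exits.  This works on Linux; POSIX leaves it undefined.
    fd = os.open(fifo_path, os.O_RDWR)
    pending = b""
    while True:
        ready, _, _ = select.select([fd], [], [], poll)
        if ready:
            pending += os.read(fd, 65536)
            # Keep any trailing partial line for the next read.
            *complete, pending = pending.split(b"\n")
            for line in complete:
                if line:
                    batcher.add(line + b"\n")
        batcher.tick()
```

With a FIFO created beforehand by mkfifo, the daemon would run
serve(path, Batcher(send_batch)), and the OCSP command only has to write
one line to that path.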

-FredC

 



----- Original Message ----
From: loren jan wilson <loren at uchicago.edu>
To: nagios-users at lists.sourceforge.net
Sent: Friday, August 11, 2006 10:38:31 AM
Subject: [Nagios-users] ocsp slows nagios a great deal

dear nagios users,

I'm in the process of trying to set up a distributed nagios
environment monitoring about 9,000 services on 2,500 hosts.
I'm using Sunfire V210 servers running Solaris 10.

I've found that the distributed servers which monitor the active
services can run about 1700 checks every 5 minutes if ocsp isn't
enabled, but once I enable ocsp, the number of active checks I can do
goes WAY down. Here's a breakdown:

- ocsp disabled: 1700 checks / 5 min.

- ocsp command set to /bin/true: 1200 checks / 5 min.

- ocsp command set to a perl program that forks, then pipes output to
  send_nsca: 800 checks / 5 min. 

- ocsp command set to a shell program that pipes output to
  send_nsca: 500 checks / 5 min.

What's the deal? I've followed the instructions in the "performance
tuning" place in the manual, but nothing seems to help much, and I
don't know what else to check. Resources on the machines are not being
fully utilized.... there's about 30% free cpu at any given time, and
plenty of RAM (only 500 MB used of 2 GB). Any help would be much
appreciated!

Solaris 10 is fully patched with recommended updates from last week.
I'm running Nagios 2.5 and it's configured like this:

        --with-perlcache \
        --enable-embedded-perl \
        --enable-nanosleep \
        --with-gd-inc=$GD_INC_PATH \
        --with-gd-lib=$GD_LIB_PATH



Thanks, 
Loren


-- 
loren jan wilson
network engineering, uchicago.edu
1155 rm. 327 ; 773/702-8189

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




