scalability of active (NRPE) vs. passive (NSCA) checks

Carroll, Jim P [Contractor] jcarro10 at sprintspectrum.com
Fri Nov 29 20:10:54 CET 2002


I recently (on Nov. 27) added several service definitions to 26 hosts (via
NRPE), bumping my total service count from 800+ to 1100+.  As a
result, my poor Nagios server slowly but surely succumbed to the electronic
tar pits.  I had to force a reboot, log in, then stop Nagios and pick over
the carcass with sar.  Before shutting down Nagios, I ran "iostat -x 5" and
noticed some extremely heavy disk I/O, which I think I can work around (I
need to split a mirror).  I'm also pretty sure I need more swap space (the
box has 1GB of RAM but only 500MB of swap), so I'll likely do a rebuild
early next week to take care of that.

Looking through the many e-mails that Nagios kicked out over the past
couple of days, I found that essentially all of the alerts were timeouts.
Clearly the Nagios host is running out of steam.

And I still have to add these same service checks to another 50+ hosts.
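
For a rough sense of scale, here is a back-of-the-envelope sketch in Python.
The service and host counts are the approximate figures above with the "+"
dropped. The 5-minute check interval, and the idea that the remaining hosts
would get the same number of new services per host as the first 26, are
assumptions on my part, not measurements.

```python
# Back-of-the-envelope estimate of active-check load on the Nagios server.
# Assumptions (not measurements): the counts are rounded lower bounds,
# every service is checked once per interval, and the pending hosts get
# the same number of new services per host as the first 26 did.

def checks_per_second(services, interval_seconds=300):
    """Average number of active checks the server must start per second."""
    return services / interval_seconds

services_before = 800   # "800+" before the change
services_now = 1100     # "1100+" after adding checks to 26 hosts
hosts_done = 26
hosts_pending = 50      # "50+" hosts still to be done

added_per_host = (services_now - services_before) / hosts_done
services_projected = services_now + added_per_host * hosts_pending

for label, count in [("before", services_before),
                     ("now", services_now),
                     ("projected", services_projected)]:
    rate = checks_per_second(count)
    print(f"{label:>9}: {count:6.0f} services, {rate:.2f} checks/sec")
```

Each of those checks is a fork plus a network round trip on the central
server when done via NRPE, which is why the average rate matters.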

So now I've come full circle, pondering the same question I pondered when I
was first setting up Nagios:  NRPE or NSCA?  I'm coming to the (possibly
erroneous) conclusion that NRPE doesn't scale well.  But this snippet from
the Nagios docs seems to discourage using NSCA, except for async events:

"Unless you're implementing a distributed monitoring environment with the
central server accepting only passive service checks (and not performing any
active checks), you'll probably be using both types of checks in your setup.
As mentioned before, active checks are more suited for services that lend
themselves to periodic checks (availability of an FTP or web server, etc),
whereas passive checks are better off at handling asynchronous events that
occur at variable intervals (security alerts, etc.)."

I'd like to get some feedback from others on this list, to see whether
you've run into similar performance problems, or whether you're happily
using NSCA for regular system checks (procs, memory, swap, users, disk
space, etc.) and having a cron job (or similar arrangement) punt the
results over to Nagios once every 5 minutes.
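
To make the cron arrangement concrete, here is a minimal sketch of what I
have in mind, in Python. It runs a couple of plugins locally and pipes the
results to send_nsca, which reads tab-separated lines (host, service
description, return code, plugin output) on stdin. All paths, the server
name, the thresholds, and the service descriptions are hypothetical
placeholders; the descriptions would have to match the passive service
definitions on the Nagios server.

```python
#!/usr/bin/env python3
"""Sketch: run local plugins and hand the results to send_nsca (from cron)."""
import os
import socket
import subprocess

# Hypothetical values: adjust paths, server, and checks for your site.
SEND_NSCA = "/usr/local/nagios/bin/send_nsca"
SEND_NSCA_CFG = "/usr/local/nagios/etc/send_nsca.cfg"
NAGIOS_SERVER = "nagios.example.com"
PLUGIN_DIR = "/usr/local/nagios/libexec"

# Service description (as defined on the Nagios server) -> plugin command.
CHECKS = {
    "Disk /": [PLUGIN_DIR + "/check_disk", "-w", "20%", "-c", "10%", "-p", "/"],
    "Users": [PLUGIN_DIR + "/check_users", "-w", "5", "-c", "10"],
}

def run_check(argv):
    """Run one plugin; return (return_code, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN: {exc}"
    lines = proc.stdout.strip().splitlines()
    output = lines[0] if lines else "(no output)"
    # Anything outside 0..3 (e.g. killed by a signal) is reported as UNKNOWN.
    code = proc.returncode if 0 <= proc.returncode <= 3 else 3
    return code, output

def format_result(host, service, code, output):
    """Build one send_nsca input line: host<TAB>service<TAB>code<TAB>output."""
    clean = output.replace("\t", " ").replace("\n", " ")
    return "\t".join([host, service, str(code), clean]) + "\n"

def main():
    host = socket.gethostname()  # must match host_name on the Nagios server
    payload = "".join(
        format_result(host, service, *run_check(argv))
        for service, argv in CHECKS.items()
    )
    if not os.access(SEND_NSCA, os.X_OK):
        # Dry run: send_nsca is not installed here, so just show the payload.
        print(payload, end="")
        return
    subprocess.run(
        [SEND_NSCA, "-H", NAGIOS_SERVER, "-c", SEND_NSCA_CFG],
        input=payload, text=True, check=True,
    )

if __name__ == "__main__":
    main()
```

A crontab entry along the lines of "*/5 * * * * /usr/local/bin/nsca_push.py"
(script name hypothetical) on each monitored host would then push results
every 5 minutes, so the central server only has to accept them rather than
fork and wait on every check itself.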

Thanks in advance,

jc

