Problems with distributed monitoring

Trisha Hoang trisha at rockyou.com
Fri May 14 19:28:32 CEST 2010


Hi Sergio,
Some of the directives I found helpful for our MASTER server are listed
below.

Since status.dat and nagios.cmd are disk bound, putting them on a ramdisk
will be faster.
status_file=/mnt/ramdisk/status.dat
command_file=/mnt/ramdisk/nagios.cmd

I don't think aggressive_host_checking is needed, as nagios checks the host
when a service is in error anyway.
use_aggressive_host_checking=0
check_host_freshness=0

Service freshness is important, as the MASTER tends to process passive checks
much more slowly, so the services may go stale. However, since our checks run
at a 5 min interval, having the MASTER wait for the next round of checks is
fine.
check_service_freshness=1
service_freshness_check_interval=420
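
To make the idea of a "stale" service concrete, here is a minimal sketch of a
freshness sweep. It is a simplified model, not how nagios implements it: the
function name, the service names and the 600 sec threshold are all
hypothetical, chosen only for illustration.

```python
def stale_services(last_results, freshness_threshold, now):
    """Return the services whose newest result is older than the threshold.

    last_results maps a service name to the time (in seconds) its last
    passive result arrived. All names and values here are hypothetical.
    """
    return sorted(
        name
        for name, seen in last_results.items()
        if now - seen > freshness_threshold
    )


# Hypothetical example: one recent result, one that is 700 sec old.
results = {"web01/HTTP": 900, "db01/MySQL": 300}
stale = stale_services(results, freshness_threshold=600, now=1000)
```

In this model only the second service would be flagged, and the master would
then fall back to an active check for it.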

We use nagios-3.2.1, and I think these directives are still experimental, but
they seem to help. You will see defunct nagios processes that come and go. I
think it's caused by the child forking once instead of twice, so one gets
killed (my theory), but again, it seems to be running ok.
use_large_installation_tweaks=0
child_processes_fork_twice=0

Our MASTER receives ~7000 passive checks from the SLAVE, but it can only
process a maximum of ~5000 passive checks per 5 min. The latency is under 10
secs. The MASTER actively checks the rest. If you or someone else knows a way
to improve passive check processing, that would be great.
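
As a rough worked example of those numbers, the sketch below turns the
per-window counts into per-second rates. It uses only the approximate figures
quoted above; the variable names are illustrative.

```python
WINDOW_SECONDS = 5 * 60  # one 5 min check interval

received = 7000   # approx. passive checks arriving per window
processed = 5000  # approx. passive checks the master handles per window


def per_second(count, window=WINDOW_SECONDS):
    """Convert a per-window count into an average per-second rate."""
    return count / window


shortfall = received - processed
shortfall_share = shortfall / received

print("arrival rate:    %.1f checks/sec" % per_second(received))
print("processing rate: %.1f checks/sec" % per_second(processed))
print("left for active checks: %d (%.0f%%)" % (shortfall, shortfall_share * 100))
```

So the master would need to sustain roughly 23 results per second to keep up,
against the roughly 17 per second it manages now.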

Also, in our setup, we don't use NSCA. The slaves have
ocsp_command=send_service_check where this command inserts the checks into a
file that gets sent every 5 sec to the master. On the master, there's a
script that opens this file and inserts the lines directly into the
nagios.cmd pipe every 5 sec.
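
A minimal sketch of that relay, for anyone who wants to try the same thing. It
is not our actual script: the function names, the spool file layout and the
".work" suffix are assumptions. The line format is the standard
PROCESS_SERVICE_CHECK_RESULT external command.

```python
import os
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # An external command must fit on one line, so flatten the plugin output.
    clean_output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, clean_output)


def flush_spool(spool_path, command_path):
    """Copy every spooled line into the command file; return the line count."""
    if not os.path.exists(spool_path):
        return 0
    # Rename first so results written during the flush land in a fresh spool.
    work_path = spool_path + ".work"  # hypothetical naming convention
    os.rename(spool_path, work_path)
    with open(work_path) as spool:
        lines = [line for line in spool if line.strip()]
    # For a named pipe, opening for write blocks until nagios has it open.
    with open(command_path, "w") as pipe:
        pipe.writelines(lines)
    os.remove(work_path)
    return len(lines)
```

On the master this would be called in a loop with a 5 sec sleep, e.g.
flush_spool("/path/to/spool", "/mnt/ramdisk/nagios.cmd"), where the spool path
is whatever file the slaves deliver to.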

Trisha