FW: Problems with nagios

Wheeler, JF (Jonathan) J.F.Wheeler at rl.ac.uk
Fri May 2 12:45:33 CEST 2008


-----Original Message-----
Sent: 14 March 2008 10:52
To: nagios-users at lists.sourceforge.net

> In the past I have reported problems when our master server has failed
> with "Out of memory" problems caused by all server memory and swap
> space being used up.  I have largely (but not completely) solved these
> by increasing the number of "Command" and "Check result" buffers.

Regular readers of the list will remember that I reported this problem,
which was affecting our Nagios installation.  I finally solved it about
a month ago.  The key is that I am using the NDOUTILS package to write
the Nagios logs and configuration to a MySQL database.  On the MySQL
server there is a cron job which uses a program called mysqlhotcopy to
create a snapshot of all of the MySQL databases.  It does this by
locking the tables whilst they are being copied.  This causes the Nagios
daemon on the master server to wait until its latest write request to
MySQL has completed.  Whilst the Nagios daemon is waiting, the NSCA
daemon is still busy writing results to the command file, and these
cannot be processed until the MySQL table locks are released.  By then,
however, there are too many commands to be processed before the command
reaper starts again.  This uses up command buffer slots and eventually
the system runs out of memory and swap space, processes are killed by
the OOM killer (Linux OS) and the system may crash because all memory is
used up.

The solution to the problem was to exclude the nagios database from
consideration by the mysqlhotcopy backup (there is a configuration
option to do this).  The lesson to learn is that when there is a problem
you need to consider what is happening on all the computer systems
involved in Nagios, not only on the Nagios server itself.
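
To illustrate why a short table lock can lead to a long-lived backlog,
here is a rough Python sketch of the mechanism described above.  It is
only a toy model, not how Nagios works internally: the function name
simulate_backlog and all of the rates and durations are made-up figures
chosen for illustration, not measurements from our installation.

```python
def simulate_backlog(arrivals_per_sec, drained_per_sec,
                     stall_start, stall_secs, total_secs):
    """Toy model: return the number of pending commands after each second.

    arrivals_per_sec: results submitted per second (e.g. by NSCA)
    drained_per_sec:  commands the daemon can process per second
    stall_start:      second at which the consumer becomes blocked
    stall_secs:       how long the consumer stays blocked (the table lock)
    All values are hypothetical.
    """
    pending = 0
    history = []
    for t in range(total_secs):
        # Producers keep submitting results whether or not the daemon is blocked.
        pending += arrivals_per_sec
        stalled = stall_start <= t < stall_start + stall_secs
        if not stalled:
            # The daemon only drains the queue when it is not waiting on MySQL.
            pending -= min(pending, drained_per_sec)
        history.append(pending)
    return history


if __name__ == "__main__":
    # Hypothetical figures: a two-minute lock with little spare capacity.
    history = simulate_backlog(arrivals_per_sec=50, drained_per_sec=60,
                               stall_start=10, stall_secs=120,
                               total_secs=600)
    print("peak backlog:", max(history))
    print("backlog at end:", history[-1])
```

With little headroom between the arrival rate and the processing rate,
the queue built up during the lock takes far longer to drain than the
lock itself lasted, and in the meantime it has to be held somewhere.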

Jonathan Wheeler
e-Science Centre
Rutherford Appleton Laboratory
