FW: Nagios 3.0.5 problem

jonathan.wheeler at stfc.ac.uk jonathan.wheeler at stfc.ac.uk
Mon Feb 1 10:31:15 CET 2010


From: Rick Mangus [mailto:rick.mangus+nagios at gmail.com] 
Sent: 29 January 2010 17:02

> Hello, all.
>
> Forgive me, I am new to the list, and have only begun working with nagios recently.  I have
> searched this list and googled furiously with little result, so must cease my lurking and
> present my problem to you.
>
> I will begin with the problem: Sometime after midnight every night, my nagios server starts
> to have trouble processing service checks.  I don't know the cause, and cannot find a
> solution.  I can describe the symptoms in detail and hope we can diagnose it.
>
> The web interface shows the last service check came in at 02:28:34 (EST).  I know that
> around 4:15 every morning, xinetd starts refusing connections to nsca due to high load
> (max_load is 18), and that eventually I will have 32000+ nsca connections using up all
> available PIDs leading to an inability to fork new processes, effectively killing the
> machine.  While all this happens, the nagios.log appears to periodically stall, making no
> new entries for 15 minutes at a time, and then flush 15000 in the space of a single
> second.  Also, it seems the checkresults directory is empty most of the time, but sometimes
> pops up to 2045 files (it's on a ramdisk with 2048 inodes) and not a single one gets
> deleted in a time period I have been patient enough to observe.
>
> The periods in which the nagios log is going nowhere are accompanied by nagios taking 100%
> of 2 CPUs.  One thread appears to poll() approximately every 25 usecs, and another is
> inscrutable, with mprotect() the only strace-visible syscall.  All the nsca processes have
> a blocking write() they are waiting on.  When the log is showing new entries, there are
> still no updates made to the services, and it seems that that is what is filling up
> checkresults.  I admit I have not checked to find the order of the log and checkresults
> processes, though I assumed they would operate in the opposite order of what this appears
> to show.
>
> I know this behavior has been ongoing for at least 1 month.  I have disabled all cron jobs
> that I feared might be interfering.  I will answer any and all questions to the best of my
> ability, and hope someone here can shed some light on the situation.

1. Do you run ndoutils (to write results to a MySQL database)?  If so, which version?  I ask because I used to have a similar problem, which I eventually traced to a backup interfering with the MySQL server that hosted the database.

2. Do you run other services on the Nagios server that might interfere with Nagios (e.g. backups that start sometime after midnight)?

3. Have you thought of upgrading to Nagios 3.2.0, which is the latest stable version?
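
To help pin down what else is running when the trouble starts (question 2), it may be worth logging a few of the symptoms described above at regular intervals and comparing the timestamps against backup and cron schedules. The following is only a rough sketch, not a tested tool: the function names are made up for illustration, the checkresults path must be whatever check_result_path is set to in your nagios.cfg, and the process count assumes a Linux kernel recent enough to provide /proc/<pid>/comm.

```python
import os
import time


def sample_checkresults(path):
    """Return (number of entries in path, free inodes on its filesystem)."""
    entries = os.listdir(path)
    st = os.statvfs(path)
    # f_favail: inodes still available to unprivileged users
    return len(entries), st.f_favail


def count_processes(name, proc_root="/proc"):
    """Count processes whose command name equals `name`.

    Assumption: a Linux /proc layout with one numeric directory per PID,
    each containing a `comm` file.
    """
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    count += 1
        except OSError:
            # The process exited between listdir() and open(), or has no comm file.
            continue
    return count


def format_sample(checkresults_path, proc_root="/proc", now=None):
    """Build one timestamped log line summarising the current state."""
    files, free_inodes = sample_checkresults(checkresults_path)
    nsca = count_processes("nsca", proc_root)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s checkresults=%d free_inodes=%d nsca=%d" % (
        stamp, files, free_inodes, nsca)
```

Calling format_sample() once a minute from a small wrapper script and appending the result to a file would show whether the nsca pile-up and the full checkresults directory begin at the same time each night, and whether that time lines up with any scheduled job.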

Jonathan Wheeler 
e-Science Centre 
Rutherford Appleton Laboratory
