nagios with pgsql, stuck doing DELETEs, services not being tested

Youngblood, Gregory (SAIC) gyoungblood at saicmail.jsc.nasa.gov
Wed Aug 11 17:13:02 CEST 2004


Today marks the second or third time I have come in and found nagios "stuck"
while doing some kind of database operation.

From what I can gather, there are two nagios connections to the database
active. One is running a SELECT and the other is running a DELETE. This
morning is the second time the DELETE process has been active for over
3 hours. The SELECT process comes and goes from time to time, which
suggests that it disconnects and reconnects. I am guessing it is the CGI
connecting to report things back to the web browser. The problem is the
DELETE process.
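
As a rough illustration of how a statement like that multi-hour DELETE could
be picked out of a session list, here is a minimal Python sketch. It assumes
the sessions have already been exported as (pid, statement, start time)
tuples; that layout, the function name long_running, and the sample rows are
all hypothetical and do not come from nagios or from any PostgreSQL view.

```python
from datetime import datetime, timedelta


def long_running(sessions, now, threshold=timedelta(hours=1)):
    """Return (pid, statement, age) for sessions older than threshold, oldest first.

    `sessions` is a hypothetical list of (pid, statement, started) tuples;
    the layout is an assumption made for this sketch only.
    """
    stuck = [
        (pid, statement, now - started)
        for pid, statement, started in sessions
        if now - started > threshold
    ]
    # Sort by age so the longest-running statement is listed first.
    return sorted(stuck, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Invented sample data: one old DELETE and one short-lived SELECT.
    now = datetime(2004, 8, 11, 8, 0)
    sessions = [
        (4101, "DELETE (text elided)", datetime(2004, 8, 11, 4, 45)),
        (4102, "SELECT (text elided)", datetime(2004, 8, 11, 7, 59, 30)),
    ]
    for pid, statement, age in long_running(sessions, now):
        print(pid, statement, age)
```

With the invented sample above, only the DELETE session passes the one-hour
threshold; the short-lived SELECT is filtered out.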

While this is going on, nagios appears to stop performing active tests;
only passive tests are being reported right now. My configuration has no
testing performed between 6PM and 5AM. According to the last checked
column on the services page, none of the tests nagios is supposed to start
performing at 5AM or later have been performed: the date/time of every
active service test ends right at 6PM yesterday. That means nagios, the
monitoring system, has not actively monitored anything since 6PM last
night.
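
The reasoning above (no last-checked time falls inside today's check window,
so active checking must have stopped) can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the service
names, the dictionary layout, and the function stale_services are
hypothetical, and nothing here reads real nagios status data.

```python
from datetime import datetime


def stale_services(last_checked, window_start, now):
    """Return names of services not checked since the current window opened.

    `last_checked` is a hypothetical mapping of service name to the time of
    its most recent check; it is an assumption made for this sketch.
    """
    if now < window_start:
        # The check window has not opened yet, so nothing is overdue.
        return []
    return sorted(
        name for name, checked in last_checked.items() if checked < window_start
    )


if __name__ == "__main__":
    # Invented sample data: one active service last checked just before 6PM
    # the previous day, one passive service that reported this morning.
    last_checked = {
        "web-http": datetime(2004, 8, 10, 17, 58),
        "passive-trap": datetime(2004, 8, 11, 7, 30),
    }
    window_start = datetime(2004, 8, 11, 5, 0)
    now = datetime(2004, 8, 11, 8, 0)
    print(stale_services(last_checked, window_start, now))
```

A service is listed only if its last check predates the 5AM window start,
which matches the symptom described: active checks stop at 6PM and never
resume, while passive results keep arriving.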

I stopped nagios, killed the errant database connections, and started nagios
again. Nagios did some database operations, then ran a vacuum command, and
then resumed with the massive SELECT/DELETEs.

The database is vacuumed daily during the maintenance window. The other
applications running against this database do not have these issues.

The first time this problem showed up, I thought the system was being
slowed down because I had X running, plus lots of other programs, and had
pgsql statement logging turned on; I figured the combination of swapping
and logging had brought the system to a crawl. But that does not appear to
be the case. I wish I had kept that log so I could look at it more closely
and see exactly what nagios is trying to do.

I am running this on a workstation. Yes, the system could have more memory
and faster hard drives, but until now that has not been an issue. I am also
sure that Postgres could be tuned better for this environment, though,
again, it hasn't been an issue until now.

Does anyone have any ideas on this? Just what exactly is nagios doing? How
mature is the SQL layer (especially for pgsql) in nagios? Any suggestions
on index changes, or other tweaks to improve performance? How often does
it do this (whatever it is doing)? It doesn't seem to be every day; it
seems more like every other day, though it is possible I just missed it
yesterday.

Thanks for any advice and assistance with this.
Greg






