Possible patch to cure CGI's not finding data for objects in status.dat

Cary Petterborg PetterborgCa at ldschurch.org
Fri Aug 7 18:00:41 CEST 2009


Putting the web server on the Nagios server box would not work for us. Our web server, which has 4 CPUs, is currently overloaded: CPU usage typically averages over 80%, with bursts of 100% occurring several times a minute and lasting up to 20 seconds. We are using DNX with 3 client servers, the main Nagios server, and a DB server (each with 4 CPUs). There are times when we would probably use 100% of 16 CPUs on the web server if it had them, because we have more than 15 status.cgi and extinfo.cgi processes running at once, each using 30MB status.dat and objects.cache files. What would help us most is to go to 3.x with a DB (like Merlin or similar), but until we can properly migrate, we are stuck with 2.7.

We haven't tried CIFS, so that is something I guess we should look at as well.

We are looking at possibly using rsync to keep the files up to date from the Nagios server to the web server (using an in-memory tmpfs on the web server, which might lower our CPU usage). If we get rid of the problem with the last_update value in status.dat, the rsync should run very quickly, because only a small percentage of the file will change from one rsync to the next. We were looking at the fsync() issue to make sure the file is complete before we rsync; otherwise the rsync would just copy incomplete data. We weren't looking at NFS as the cause of the lack-of-data problem, but I guess it should be looked at now.
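One common way to make sure a copy job such as rsync never picks up a half-written file is to write the new contents to a temporary file, fsync() it, and then rename it over the real name. The sketch below illustrates that general pattern in Python. It is not taken from the Nagios source or from the patch under discussion, and the function name write_atomically and the ".status-" temp-file prefix are made up for the example.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays within one filesystem (hypothetical prefix/suffix).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to commit it to disk
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this pattern a local reader that opens the file by name gets either the previous complete version or the new complete version. Note that mkstemp() creates the file readable only by its owner, so a real deployment would probably need to adjust permissions before the rename, and the atomicity guarantee is a local-filesystem property that may not carry over to an NFS client.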

Thanks for the suggestions.

Cary

________________________________________
From: Andreas Ericsson [ae at op5.se]
Sent: Friday, August 07, 2009 2:30 AM
To: Nagios Developers List
Subject: Re: [Nagios-devel] Possible patch to cure CGI's not finding data for objects in status.dat

Cary Petterborg wrote:
> In response to your request for details of our system: We are running
> SuSE 9 writing to a ReiserFS (with a separate web server reading the
> status.dat, etc. from an NFS mount off the main Nagios server). Our
> status.dat file is 37MB, and objects.cache is 32MB. If you need more
> details than this, please let me know what you need.
>

I blame NFS. Don't use it for sync()-sensitive data, as caching happens
on multiple levels. The patch hurts the normal case (webserver on same
system as Nagios) though, so I'd prefer if it wasn't applied.

>
> I may be wrong in this next information, but I did homework on it
> before proceeding to try to implement the fix on our system, and I'm
> taking the information from what I found. The fsync() call is the
> more important function call in the fix. fclose() almost always
> guarantees fflush(), but it doesn't guarantee that it will be written
> to the disk immediately, especially if the program doesn't exit.

It doesn't have to be written to disk. After the fclose() the kernel
will cache the data so the next reader will still see the full file
contents no matter if it's actually committed to disk or not.
fsync() and fflush() are primarily meant to make sure data stays
intact across power outages.
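A small sketch of the point above, assuming the writer and the reader are on the same host and use a local filesystem: the file is written and closed without any fsync(), and a subsequent open still sees the complete contents. The helper names and the placeholder payload are made up for illustration and are not real status.dat data.

```python
import os
import tempfile


def write_without_fsync(path: str, data: bytes) -> None:
    """Write and close a file, deliberately skipping fsync()."""
    with open(path, "wb") as f:
        f.write(data)
    # Leaving the with-block closes the file, which flushes Python's
    # user-space buffer to the kernel. No fsync() is issued.


def read_all(path: str) -> bytes:
    """Read the whole file the way a separate local reader would."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    payload = b"placeholder line\n" * 100000  # not real status.dat content
    path = os.path.join(tempfile.mkdtemp(), "status.dat")
    write_without_fsync(path, payload)
    print("reader saw full contents:", read_all(path) == payload)
```

This only demonstrates the local case. It says nothing about what a reader on an NFS client sees, which is the situation in question here.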

NFS breaks this sometimes. CIFS is a better option, I think.
What happens if you use a webserver on the same host?
What happens if you use CIFS instead of NFS?

--
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel








