Possible patch to cure CGIs not finding data for objects in status.dat

Cary Petterborg PetterborgCa at ldschurch.org
Fri Aug 7 23:32:57 CEST 2009


I have one question and one observation in addition to what I already said.

Question:
In the case where the web server and Nagios are on the same system, how much is an fsync() really going to affect the performance of the system? Can someone "quantify" the impact of a single fsync? (I guess that is technically a second question.) If it were to do a sync() (which flushes pending writes for every filesystem, not just one file) instead of an fsync(), I would definitely agree: much bigger impact. But the single fsync at the end of writing a file should not be of any real significance, except that when it *is* done, the data is actually where it should be (as in my case).
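One way to put a number on the question above is to time a single fsync() directly on the machine in question. The following is a minimal sketch, assuming a POSIX system with `clock_gettime()`; the helper name `time_one_fsync()` is hypothetical and is not part of Nagios or of the patch being discussed.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not from Nagios): write `len` bytes of `buf` to `path`
 * and return how many seconds the fsync() call alone took, or -1.0 on error. */
static double time_one_fsync(const char *path, const char *buf, size_t len)
{
    struct timespec t0, t1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;

    /* write() only copies the data into the kernel's page cache. */
    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1.0;
    }

    /* Time only the fsync(): it blocks until this file's data has been
     * handed to the storage device. */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int rc = fsync(fd);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    if (rc != 0)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To get a meaningful figure, run it against the filesystem that holds status.dat, using a buffer of realistic size (ours is 37MB). The result depends heavily on the filesystem, its journaling mode, the disk, and how much other dirty data is pending, so a number from one machine says little about another.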



Observation:

I think that retention.dat and the other files that need to survive crashes would definitely benefit from the fsync. Though Nagios would still work after such a failure, I would rather that data be preserved properly, especially since our retention.dat file is 40MB and rescheduling, etc. is handled so much more effectively when it survives intact. So, for what it is worth, I think it is important that those files be fsync'd.
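The retention.dat case is the classic durable-write problem. A common pattern is to write to a temporary file, fflush() and fsync() it, and then rename() it over the original. Below is a minimal sketch, assuming POSIX. The helper name `save_state_file()` and the `.tmp` suffix are hypothetical, and this is not a claim about how Nagios itself writes retention.dat.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not Nagios's actual code): replace `path` with
 * `contents` by writing a temporary file, forcing it to storage, and
 * renaming it over the original. Returns 0 on success, -1 on error. */
static int save_state_file(const char *path, const char *contents)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    FILE *fp = fopen(tmp, "w");
    if (fp == NULL)
        return -1;

    size_t len = strlen(contents);
    if (fwrite(contents, 1, len, fp) != len
        || fflush(fp) != 0            /* stdio buffer -> kernel page cache */
        || fsync(fileno(fp)) != 0) {  /* kernel page cache -> storage */
        fclose(fp);
        unlink(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {
        unlink(tmp);
        return -1;
    }

    /* On POSIX, rename() replaces `path` atomically: readers see either the
     * old file or the complete new one, never a partial write. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

For the rename itself to survive a crash, the containing directory must also be fsync'd. That step is omitted here for brevity.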



Thanks.

Cary

________________________________________
From: Andreas Ericsson [ae at op5.se]
Sent: Friday, August 07, 2009 2:30 AM
To: Nagios Developers List
Subject: Re: [Nagios-devel] Possible patch to cure CGI's not finding data for objects in status.dat

Cary Petterborg wrote:
> In response to your request for details of our system: We are running
> SuSE 9 writing to a ReiserFS filesystem (with a separate web server reading the
> status.dat, etc. from an NFS mount off the main Nagios server). Our
> status.dat file is 37MB, and objects.cache is 32MB. If you need more
> details than this, please let me know what you need.
>

I blame NFS. Don't use it for sync()-sensitive data, as caching happens
on multiple levels. The patch hurts the normal case (webserver on the same
system as Nagios) though, so I'd prefer that it not be applied.

>
> I may be wrong about this next point, but I did my homework on it
> before trying to implement the fix on our system, and I'm
> basing this on what I found. The fsync() call is the
> more important function call in the fix. fclose() flushes the stream's
> buffers just as fflush() does, but it doesn't guarantee that the data
> will be written to disk immediately, especially if the program doesn't exit.

It doesn't have to be written to disk. After the fclose(), the kernel
caches the data, so the next reader will still see the full file
contents whether or not it has actually been committed to disk.
fsync() is primarily meant to make sure data stays intact across power
outages; fflush() only moves the data from the program's stdio buffer
into the kernel.

NFS breaks this sometimes. CIFS is a better option, I think.
What happens if you use a webserver on the same host?
What happens if you use CIFS instead of NFS?
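The point that a second reader on the same host sees the data once it has left the program's own buffer, with or without an fsync(), can be checked with a small local experiment. This is a sketch assuming a local filesystem, and the function name `reader_sees_unsynced_write()` is hypothetical. It says nothing about NFS, where a client on another host may serve reads from its own cache.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical demo: write `text` through one stdio stream, fflush() it
 * (no fsync() anywhere), then read it back through a second, independent
 * stream. Returns 1 if the reader saw exactly what was written. */
static int reader_sees_unsynced_write(const char *path, const char *text)
{
    char buf[256] = {0};
    size_t len = strlen(text);
    if (len >= sizeof buf)
        return 0;

    FILE *w = fopen(path, "w");
    if (w == NULL)
        return 0;
    fputs(text, w);
    /* Until this fflush() (or an fclose()), the bytes may still be sitting
     * in w's user-space buffer, where no other reader can see them. */
    fflush(w);

    /* The second stream is served from the kernel's page cache; whether the
     * data has reached the disk yet makes no difference to it. */
    FILE *r = fopen(path, "r");
    if (r == NULL) {
        fclose(w);
        return 0;
    }
    size_t n = fread(buf, 1, sizeof buf - 1, r);
    fclose(r);
    fclose(w);
    return n == len && memcmp(buf, text, len) == 0;
}
```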

--
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day
trial. Simplify your report design, integration and deployment - and focus on
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

