Bug? Flapping detection during startup causes comment IDs to change

Kamens, Jonathan jkamens at Advent.COM
Mon Feb 1 20:15:29 CET 2010


If a host or service is found to be flapping while state is being read in by xrddefault.c during startup, a new comment about the flapping is added to the system.

Since this happens before all of retention.dat has been read in, the comment ID assigned to this new comment could conflict with a comment ID of an existing comment later in retention.dat.

Then, when that later comment is read in, it gets assigned a new comment ID. (Or, if you're using my recently submitted startup time speed-up patch, it gets assigned a duplicate comment ID.  D'oh!)
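
To make the sequence concrete, here is a minimal sketch of the collision in Python. It is a toy model and not Nagios code: the class CommentStore and its methods add_new and load_retained are hypothetical names, and the ID allocation rule (hand out the next unused integer) is an assumption made only for illustration.

```python
# Toy model of comment loading at startup. NOT Nagios code: all names
# here are hypothetical and the allocation rule is an assumption.
class CommentStore:
    def __init__(self):
        self.comments = {}   # comment ID -> comment text
        self.next_id = 1     # next ID to hand out to a brand-new comment

    def add_new(self, text):
        """Add a brand-new comment (e.g. a flapping comment) with a fresh ID."""
        while self.next_id in self.comments:
            self.next_id += 1
        cid = self.next_id
        self.comments[cid] = text
        self.next_id += 1
        return cid

    def load_retained(self, cid, text):
        """Load a saved comment, keeping its saved ID only if it is still free."""
        if cid in self.comments:
            # The saved ID is already taken, so the saved comment is renumbered.
            return self.add_new(text)
        self.comments[cid] = text
        self.next_id = max(self.next_id, cid + 1)
        return cid


store = CommentStore()
# A flapping host is seen before any saved comment has been read in.
flap_id = store.add_new("host is flapping")
# Later in the file, a comment that was saved with ID 1 is read in.
loaded_id = store.load_retained(1, "persistent user comment")
```

In this model the flapping comment takes ID 1, so the saved comment that used to be ID 1 comes back as ID 2 after the restart.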

A comment ID that changes when Nagios restarts can cause problems wherever comment IDs are stored and used persistently outside of the Nagios process itself.  One place I know this happens is the CGI.  If my browser is displaying a page with a comment on it, the Nagios server restarts, and I then try to remove that comment after it has been renumbered, I'll end up removing the wrong comment.  I don't know where else, if anywhere, comment IDs are used outside of the Nagios process, so I don't know how far beyond that the problem extends.

I have been trying to think of a good fix for this problem, so far without success.  I've thought of a fix that I think will work, but it makes me uncomfortable in a "boy, that's gross, that can't possibly be the correct fix" sort of way.  It depends on two facts: (a) only persistent comments will be read from retention.dat on startup, and (b) flapping comments are non-persistent.

So what I'm thinking is that we should allow duplicate comment IDs to be created when adding flapping comments (as my patch does).  Then, when we're done reading retention.dat, we sort the list of comments, scan it for duplicates, and whenever a duplicate is found, renumber the one that isn't marked persistent (when there are duplicates, exactly one of them will always be non-persistent).  Like I said, that's really gross, but I think it will do the right thing.
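
The sort-and-renumber pass described above could look roughly like the following Python sketch. It is not a patch against Nagios: the Comment class and renumber_duplicates function are hypothetical, and taking "highest ID in use plus one" as the source of fresh IDs is an assumption. It also relies on the assumption stated above that among duplicates at most one comment is persistent.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    comment_id: int
    persistent: bool
    text: str


def renumber_duplicates(comments):
    """Resolve duplicate IDs in place; the persistent comment keeps its ID."""
    # Assumption: fresh IDs start just above the highest ID currently in use.
    next_free = max((c.comment_id for c in comments), default=0) + 1
    # Sort by ID with persistent comments first, so that within a run of
    # equal IDs the comment that keeps the ID is seen before the others.
    comments.sort(key=lambda c: (c.comment_id, not c.persistent))
    previous_id = None
    for c in comments:
        if c.comment_id == previous_id:
            # Duplicate of the preceding comment: give it a fresh ID.
            c.comment_id = next_free
            next_free += 1
        else:
            previous_id = c.comment_id
    comments.sort(key=lambda c: c.comment_id)
    return comments


comments = [
    Comment(1, False, "flapping comment added during startup"),
    Comment(1, True, "user comment A from retention.dat"),
    Comment(2, True, "user comment B from retention.dat"),
]
renumber_duplicates(comments)
```

With this input the two persistent comments keep IDs 1 and 2, and the flapping comment is moved to ID 3, so no ID that a user could have seen before the restart changes.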

Thoughts?

Jonathan Kamens
Operations Manager
201 South Street, Suite 300, Boston, MA  02111
Phone: +1 617 261 0264 ext. 133 | Mobile: +1 617 417 8989 | Fax: +1 617 812 0330
jkamens at advent.com | www.advent.com
