Nagios ignores broken file descriptor?

Steven D. Morrey smorrey at ldschurch.org
Tue Nov 18 18:28:07 CET 2008


Hello Everyone,

Over the weekend my test implementation of Nagios stopped recording
results.
I checked with ps and it is still running along just fine, but it appears
to have lost the ability to write out results.
After some checking I noticed that it stopped writing at 5pm last
Saturday.
On a hunch I checked /var/log/messages and found this little beauty of an
error.

Nov 15 05:01:43 test-system kernel: SCSI error : <0 0 0 0> return code = 0x20008
Nov 15 05:01:43 test-system kernel: end_request: I/O error, dev sda, sector 27780153
Nov 15 05:01:43 test-system kernel: buffer layer error at fs/buffer.c:2996
Nov 15 05:01:43 test-system kernel: Call Trace:
Nov 15 05:01:43 test-system kernel:  [<c0160649>] drop_buffers+0x149/0x1c0
Nov 15 05:01:43 test-system kernel:  [<c01606e4>] try_to_free_buffers+0x24/0x70
Nov 15 05:01:43 test-system kernel:  [<f8cbe5cc>] reiserfs_releasepage+0x5c/0xa0 [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<f8cbe570>] reiserfs_releasepage+0x0/0xa0 [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<c0160765>] try_to_release_page+0x35/0x50
Nov 15 05:01:43 test-system kernel:  [<f8cbe76c>] reiserfs_invalidatepage+0x15c/0x1b0 [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<c01494c4>] do_invalidatepage+0x14/0x30
Nov 15 05:01:43 test-system kernel:  [<c01499ce>] truncate_complete_page+0x9e/0xc0
Nov 15 05:01:43 test-system kernel:  [<c0149a93>] truncate_inode_pages+0xa3/0x300
Nov 15 05:01:43 test-system kernel:  [<f8cc2b70>] reiserfs_delete_inode+0x0/0xdc [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<f8cc2b88>] reiserfs_delete_inode+0x18/0xdc [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<c017608d>] __d_move+0xed/0x1f0
Nov 15 05:01:43 test-system kernel:  [<c016bcb4>] vfs_rename_other+0x74/0x110
Nov 15 05:01:43 test-system kernel:  [<f8cc2b70>] reiserfs_delete_inode+0x0/0xdc [reiserfs]
Nov 15 05:01:43 test-system kernel:  [<c0177c14>] generic_delete_inode+0x94/0x120
Nov 15 05:01:43 test-system kernel:  [<c0176de7>] iput+0x57/0x90
Nov 15 05:01:43 test-system kernel:  [<c0175537>] dput+0x17/0x180
Nov 15 05:01:43 test-system kernel:  [<c016eccb>] sys_rename+0x24b/0x2c0
Nov 15 05:01:43 test-system kernel:  [<c0107db9>] sysenter_past_esp+0x52/0x79
Nov 15 05:01:43 test-system kernel:
Nov 15 05:01:54 test-system kernel: REISERFS: abort (device dm-1): Journal write error in flush_commit_list
Nov 15 05:01:54 test-system kernel: REISERFS: Aborting journal for filesystem on dm-1

I think the root cause of the filesystem error was that ntpd was
running and set the system time to somewhere in the past, thereby
confusing the filesystem.  But that is neither here nor there.

I would normally expect the program to either receive a SIGPIPE or, at a
minimum, have the write operation return an error of some sort, and then
either shut the system down or restart Nagios.  But in this case nothing
is happening.  Is this normal behavior for Nagios, or am I missing
something?
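
To make the expectation concrete: SIGPIPE is only raised for writes to a
pipe or socket with no reader. A failed write to a regular file instead
shows up as an error return (for example EIO, or EROFS if the filesystem
has gone read-only), and with buffered I/O it may not surface until
fsync() or close(). Below is a minimal sketch of the kind of check I have
in mind, written in Python rather than Nagios's C. The write_result
helper and the result line format are hypothetical and are not taken
from the Nagios source.

```python
import errno
import os


def write_result(path, line):
    """Append one line to path and force it to disk.

    Returns None on success, or the errno of the first failure
    (for example errno.EIO or errno.EROFS on a broken filesystem).
    Hypothetical helper for illustration; not part of Nagios.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    except OSError as exc:
        return exc.errno

    err = None
    try:
        data = line.encode("utf-8") + b"\n"
        while data:
            # os.write may write fewer bytes than requested.
            data = data[os.write(fd, data):]
        # Without fsync the data may only reach the page cache, and a
        # disk error might never be reported back to this process.
        os.fsync(fd)
    except OSError as exc:
        err = exc.errno

    try:
        os.close(fd)
    except OSError as exc:
        # Keep the first error if there already was one.
        err = err or exc.errno
    return err
```

A caller could treat any non-None return as fatal, log
os.strerror(err) somewhere that does not depend on the same disk, and
exit so that whatever supervises the daemon can restart it or raise an
alert.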

For the record, we are running a modified version of Nagios 2.7 with
DNX 0.19 on SLES 9 patch level 4, so if this is a known bug that was
fixed in a later version of Nagios, I would really appreciate knowing
about that as well.

Thanks in advance!

Sincerely,
Steven Morrey

