Bug with --enable-nanosleep?

Thomas Guyot-Sionnest Thomas at zango.com
Fri Aug 25 21:04:33 CEST 2006


I've been running a fairly big Nagios setup (600+ checks) for a few years
now... The only issue so far is some lost passive checks under load (I posted
about it some time ago; it was dismissed as a non-issue, which I disagree with).

Some time ago (Aug 16 to be precise) I noticed there was an --enable-nanosleep
option, so I tried it to see if it would help with the passive checks problem.
I couldn't see any change in performance or passive check reliability;
however, I did run into an issue.

Twice since then I found that Nagios had stopped running active checks and
processing passive checks, leaving a stale daemon that monitored nothing
while still showing everything as fine. The first time was shortly after the
nanosleep change, right after a restart, so I dismissed it as an odd startup
bug. The second time was today (Nagios had been running fine for 2-3 days;
the last restart was for a config change). For no apparent reason it stopped
running checks.

Running check_nagios manually showed that the status file wasn't being
updated and that the process count was oscillating between 3 and 6.
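
For readers who want to reproduce the "status file wasn't updated" symptom without the plugin, here is a minimal sketch in Python. It only compares the file's modification time against a maximum age. The function names and the example path are hypothetical, and this is not how check_nagios itself is implemented.

```python
import os
import time


def status_file_age(path, now=None):
    """Return the number of seconds since `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def looks_stale(path, max_age_seconds, now=None):
    """Return True when the file has not been rewritten recently enough."""
    return status_file_age(path, now) > max_age_seconds
```

Usage would look like `looks_stale("/path/to/status.dat", 300)`, where both the path and the 300-second limit are placeholders to adapt to the local setup.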

I'm running nagios-2-x-cvs (2006-07-07 10:11:49); the last commit was for a
bug I reported:
* Bug fix for segfault during startup due to extended service definition
duplication

Here are the last entries in the log (edited, newest first). Service X is a
custom script scheduled to run every 5 minutes on some servers and reporting
through send_nsca:

[2006-08-25 13:46:47] Caught SIGHUP, restarting... <--- ME RESTARTING NAGIOS 
(STALE)
Informational Message[2006-08-25 13:15:20] Auto-save of retention data 
completed successfully.
Service Ok[2006-08-25 13:15:13] SERVICE ALERT: hostx.example.com;Service 
X;OK;SOFT;2;OK: Everything looks fine
Service Ok[2006-08-25 13:15:13] SERVICE ALERT: hosty.example.com;Service 
X;OK;SOFT;2;OK: Everything looks fine
Service Critical[2006-08-25 13:11:29] SERVICE ALERT: hosty.example.com;Service 
X;CRITICAL;SOFT;1;CRITICAL: Didn't recieved Service X results.
Service Critical[2006-08-25 13:11:29] SERVICE ALERT: hostx.example.com;Service 
X;CRITICAL;SOFT;1;CRITICAL: Didn't recieved Service X results.
Informational Message[2006-08-25 13:11:20] Warning: The results of service 
'Service X' on host 'hosty.example.com' are stale by 47 seconds (threshold=330 
seconds). I'm forcing an immediate check of the service.
Informational Message[2006-08-25 13:11:20] Warning: The results of service 
'Service X' on host 'hostx.example.com' are stale by 48 seconds (threshold=330 
seconds). I'm forcing an immediate check of the service.
Informational Message[2006-08-25 13:10:20] Auto-save of retention data 
completed successfully.
Informational Message[2006-08-25 13:05:20] Auto-save of retention data 
completed successfully.
Informational Message[2006-08-25 13:00:21] Auto-save of retention data 
completed successfully.
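
The freshness warnings in the log above can be used to estimate when the last result for Service X arrived. The sketch below assumes that "stale by N seconds" counts the seconds elapsed past the freshness threshold, so that the last result is N + threshold seconds older than the warning. That reading of the message is an assumption, not something stated in the post, and `last_result_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def last_result_time(logged_at, stale_by, threshold):
    """Estimate when the last result arrived.

    Assumes stale_by = logged_at - (last_result + threshold), all in seconds.
    """
    return logged_at - timedelta(seconds=stale_by + threshold)


# Values taken from the two warnings logged at 13:11:20.
logged_at = datetime(2006, 8, 25, 13, 11, 20)
print(last_result_time(logged_at, 47, 330))  # hosty: 2006-08-25 13:05:03
print(last_result_time(logged_at, 48, 330))  # hostx: 2006-08-25 13:05:02
```

Under that assumption both hosts last reported at about 13:05, which fits the 5-minute schedule mentioned for Service X.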


Thanks,

Thomas 