nagios 3 host checks logic problem on some kernels/distros

Thomas Stolle it0a60 at retail-sc.com
Tue Sep 18 10:11:00 CEST 2007


From: SCHAER Frederic <frederic.schaer <at> cea.fr>
Subject: nagios 3 host checks logic problem on some kernels/distros
Newsgroups: gmane.network.nagios.devel
Date: 2007-09-10 16:17:30 GMT
Hi, 
 
I think I have identified a problem (but not the solution) in the Nagios 3 
source tree. 
I tried both the 3.0b3 and the CVS HEAD sources and could not get 
rid of the problem. 
I'm running a 2.4.21 kernel on a RHEL3 box. 
 
What happens is that as soon as I start nagios 3, it starts eating all of 
the CPU. 
Stracing the nagios process shows this (and almost only this): 
gettimeofday({1189419621, 161574}, NULL) = 0 
time([1189419621])                      = 1189419621 
time([1189419621])                      = 1189419621 
gettimeofday({1189419621, 183742}, NULL) = 0 
gettimeofday({1189419621, 183780}, NULL) = 0 
gettimeofday({1189419621, 183814}, NULL) = 0 
time([1189419621])                      = 1189419621 
gettimeofday({1189419621, 184172}, NULL) = 0 
gettimeofday({1189419621, 184326}, NULL) = 0 
time([1189419621])                      = 1189419621 
time([1189419621])                      = 1189419621 
gettimeofday({1189419621, 184734}, NULL) = 0 
gettimeofday({1189419621, 184861}, NULL) = 0 
 
I tried stracing Nagios on an Ubuntu Feisty (7.04) box, and the output is 
quite different: there are nanosleep calls. 
I tried enabling and disabling nanosleep at Nagios compile time, but 
this did not solve my problem. 
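
To compare the two traces more systematically, the syscalls in a saved strace output can be tallied. The following is only a rough sketch: the helper names `tally_syscalls` and `looks_like_busy_loop` are made up for illustration, and the list of "sleeping" syscalls is an assumption, not something taken from the Nagios sources.

```python
import re
from collections import Counter

# Matches the syscall name at the start of an strace line,
# optionally preceded by a "[pid 1234]" prefix (as printed by strace -f).
SYSCALL_RE = re.compile(r"^\s*(?:\[pid\s+\d+\]\s*)?(\w+)\(")


def tally_syscalls(lines):
    """Count how often each syscall name appears in strace output lines."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def looks_like_busy_loop(counts, sleep_calls=("nanosleep", "select", "poll")):
    """Heuristic (assumption): syscalls were traced, but none of them sleeps."""
    traced_any = sum(counts.values()) > 0
    slept = any(counts[name] for name in sleep_calls)
    return traced_any and not slept


# A few lines copied from the trace above.
sample = [
    "gettimeofday({1189419621, 161574}, NULL) = 0",
    "time([1189419621])                      = 1189419621",
    "time([1189419621])                      = 1189419621",
    "gettimeofday({1189419621, 183742}, NULL) = 0",
]
counts = tally_syscalls(sample)
print(counts, looks_like_busy_loop(counts))
```

A trace consisting only of gettimeofday and time calls is flagged by this heuristic; a trace that also contains nanosleep calls is not. This describes the symptom only and says nothing about the cause.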
 
With full debugging enabled, I get this kind of output when Nagios starts: 
[1189438977.881574] [016.0] [pid=18234] Attempting to run scheduled check of host 'wn010': check options=0, latency=0.874000 
[1189438977.881651] [001.0] [pid=18234] run_async_host_check_3x() 
[1189438977.881665] [016.0] [pid=18234] ** Running async check of host 'wn010'... 
[1189438977.881678] [001.0] [pid=18234] check_host_check_viability_3x() 
[1189438977.881691] [001.0] [pid=18234] check_time_against_period() 
[1189438977.881712] [001.0] [pid=18234] check_host_dependencies() 
[1189438977.881726] [016.1] [pid=18234] A check of this host is already being executed, so we'll pass for the moment... 
[1189438977.881739] [016.1] [pid=18234] Unable to run scheduled host check at this time 
 
Even if I run Nagios for just 2 seconds and then hit CTRL+C, I still see 
this: 
>grep "A check of this host is already being executed" /var/log/nagios/nagios.debug | wc -l 
    971 
 
>grep "Attempting to run scheduled check of host 'wn010'" /var/log/nagios/nagios.debug | wc -l 
    971 
>grep "Attempting to run scheduled check of host" /var/log/nagios/nagios.debug | wc -l 
    971 
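
The same comparison can be made for all hosts at once by counting the scheduling attempts per host in the debug log. This is a minimal sketch, assuming each debug entry sits on a single line in the log file; `attempts_per_host` is a hypothetical helper name.

```python
import re
from collections import Counter

# Message text as it appears in the debug output quoted above.
ATTEMPT_RE = re.compile(r"Attempting to run scheduled check of host '([^']+)'")


def attempts_per_host(lines):
    """Count 'Attempting to run scheduled check' entries per host name."""
    counts = Counter()
    for line in lines:
        match = ATTEMPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Log path as used in the grep commands above.
    with open("/var/log/nagios/nagios.debug", errors="replace") as log:
        for host, count in attempts_per_host(log).most_common():
            print(f"{count:6d}  {host}")
```

If a single host accounts for every attempt, as the grep counts above suggest, the output has exactly one line.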
 
I have 53 hosts defined, and I don't understand why Nagios keeps checking 
the same host over and over, or why this does not happen on all systems. 
 
Deactivating host checks magically "solves" the problem. 
 
I just found out that commenting out the hosts' check_command caused this 
behaviour (with host_checks_enabled=true), and that defining a correct 
check_command prevented Nagios from being so CPU hungry. 
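
To find affected definitions quickly, the object configuration can be scanned for host blocks that have no active check_command line. This is only a sketch under some assumptions: `hosts_without_check_command` is a hypothetical helper, it looks at each "define host" block in isolation, and it does not follow "use" inheritance, so a host that gets its check_command from a template is reported as well.

```python
import re

DEFINE_RE = re.compile(r"^\s*define\s+(\w+)\s*\{")


def hosts_without_check_command(text):
    """Return the names of 'define host' blocks that lack check_command."""
    missing = []
    block = None  # directives of the host block being read, else None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue  # blank line or commented-out directive
        match = DEFINE_RE.match(line)
        if match:
            block = {} if match.group(1) == "host" else None
            continue
        if line.startswith("}"):
            if block is not None and "check_command" not in block:
                name = block.get("host_name", block.get("name", "<unnamed>"))
                missing.append(name)
            block = None
            continue
        if block is not None:
            parts = line.split(None, 1)
            block[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return missing
```

Calling it on the contents of each object configuration file gives a list of host names to review by hand.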
 
Hope this helps. 
 
Cheers 

Dear List,

I can confirm the problem Frederic reported.
I am using Nagios 3.0b3 on CentOS 4.4.
After starting Nagios, the process uses nearly 100% CPU (see the top 
output below).
Disabling host checks lets the process return to normal values.
As far as I can remember, the problem did not occur with Nagios 3.0a (but 
I cannot verify that at the moment).

Tasks:  89 total,   3 running,  86 sleeping,   0 stopped,   0 zombie
Cpu(s): 26.0% us,  1.3% sy,  0.0% ni, 72.6% id,  0.0% wa,  0.1% hi,  0.0% si
Mem:   4041580k total,  1373844k used,  2667736k free,    60200k buffers
Swap:  4192956k total,        0k used,  4192956k free,  1137348k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
28617 nagios    25   0 29756  10m 1056 R   96  0.3  17:12.48 nagios
    1 root      16   0  4752  552  460 S    0  0.0   0:02.75 init
    2 root      RT   0     0    0    0 S    0  0.0   0:00.04 migration/0


Thomas



--
RSC Commercial Services OHG
Wanheimer Strasse 70, D-40468 Duesseldorf
Registergericht: Duesseldorf, HRA 12655
