[naemon-users] (No output returned from host check) / Naemon SIGKILLs jobs

Kevin Hauf - ratiokontakt GmbH kh at ratiokontakt.de
Mon May 22 11:07:07 CEST 2017


Hello,

I'm experiencing a rather strange issue and I'm at a point where I have
no idea how to debug it further.

I see a lot of host and service checks returning "(No output returned
from host check)". Any check can be affected, and most of the time
multiple checks fail simultaneously.

After turning on the debug log, the only two lines I can see that are
associated with the failure of such a check are:

[1494431930.179385] [4096.0] [pid=2361] wproc: job 19690 from worker Core Worker 29950 died by signal 9 after 8.99 seconds
[1494431930.179390] [4096.1] [pid=2361] wproc:   command: /usr/lib64/naemon/plugins/check_ping -H 212.223.101.126 -w 40.00,1% -c 80.00,1% -p 10
[1494431930.179394] [4096.1] [pid=2361] wproc:   early_timeout=0; exited_ok=1; wait_status=9; error_code=0;
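Assuming the wait_status field above is the raw status returned by waitpid(2), which the "died by signal 9" message suggests, it can be decoded with the standard POSIX macros. A minimal Python sketch for Linux (the helper name describe_wait_status is hypothetical, not part of Naemon):

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Interpret a raw wait status as returned by waitpid(2) (POSIX only)."""
    if os.WIFSIGNALED(status):
        # Low 7 bits hold the number of the signal that terminated the child.
        sig = os.WTERMSIG(status)
        return f"killed by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        # Normal exit: the exit code lives in bits 8..15.
        return f"exited with code {os.WEXITSTATUS(status)}"
    return "stopped or continued"


print(describe_wait_status(9))    # killed by signal 9 (SIGKILL)
print(describe_wait_status(512))  # exited with code 2
```

A status of 9 therefore means the plugin never exited on its own; it was terminated by SIGKILL, which matches the "No output returned" symptom.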

So apparently Naemon kills the job with signal 9. I verified this with a
SystemTap script that catches every SIGKILL sent on the system:

Mon May 22 10:40:41 2017 CEST SIGKILL was sent to check_snmp_load (pid:310) by naemon uid:496
Mon May 22 10:40:41 2017 CEST SIGKILL was sent to check_snmp_stor (pid:311) by naemon uid:496
Mon May 22 10:40:41 2017 CEST SIGKILL was sent to head (pid:334) by naemon uid:496
Mon May 22 10:40:41 2017 CEST SIGKILL was sent to grep (pid:333) by naemon uid:496

I can't see anything leading up to that kill in the log, so I'm at a
loss as to why it is happening. The time elapsed before the kill doesn't
seem to matter either: the log shows anything from 0.01 seconds up to
the 8.99 seconds shown above.
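To see how the elapsed times are spread across many kills, the "died by signal" lines in the debug log (the file set as debug_file in naemon.cfg) could be tallied with a short script. A minimal Python sketch, assuming every such line has the same shape as the one quoted above; the regex and the helper names killed_jobs and summarize are hypothetical:

```python
import re

# Assumed shape of the wproc line, taken from the debug output quoted above.
KILLED_RE = re.compile(
    r"wproc: job (?P<job>\d+) from worker (?P<worker>.+?) "
    r"died by signal (?P<signal>\d+) after (?P<secs>[\d.]+) seconds"
)


def killed_jobs(lines):
    """Yield (job_id, signal, seconds) for every 'died by signal' line."""
    for line in lines:
        m = KILLED_RE.search(line)
        if m:
            yield int(m["job"]), int(m["signal"]), float(m["secs"])


def summarize(lines):
    """Return (count, min_secs, max_secs) for jobs killed by SIGKILL (9)."""
    durations = [secs for _, sig, secs in killed_jobs(lines) if sig == 9]
    if not durations:
        return 0, None, None
    return len(durations), min(durations), max(durations)


# Demo on the single line from this post; pass an open debug file instead.
SAMPLE = ("[1494431930.179385] [4096.0] [pid=2361] wproc: job 19690 from worker "
          "Core Worker 29950 died by signal 9 after 8.99 seconds")
count, lo, hi = summarize([SAMPLE])
print(f"{count} job(s) killed by SIGKILL, elapsed {lo}..{hi} s")
# 1 job(s) killed by SIGKILL, elapsed 8.99..8.99 s
```

Because summarize accepts any iterable of lines, it can be given an open file object directly without loading the whole debug log into memory.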

The killed checks usually recover when they're rechecked, but
occasionally the same check fails several times in a row and
enters a hard state.

I'm running naemon-1.0.6-1.el6.x86_64 with thruk-2.14-1.x86_64 and
naemon-livestatus-1.0.6-1.el6.x86_64 on CentOS 6.9. There are about 6500
checks defined (hosts + services). The average check latency is below 1
second.

Any suggestions?

Thanks,
Kevin Hauf

