Problem with passive check

Thomas Nilsen Thomas.Nilsen at roxar.com
Fri Apr 8 13:07:03 CEST 2005


Hope someone can shed some light on this problem.

I've got two Nagios 1.2 servers installed. Both do active monitoring, and one of them (the main server) also accepts passive check results from the other. One of the services on the main server, which receives the passive results, keeps switching between CRITICAL and OK. What's strange is that the server which actively checks the service has reported OK for this particular service for months. However, the main Nagios server reports "Return code of 127 ..." and then, soon after, returns to OK.

I can't figure out why this happens, as the server doing the service checks never sends anything but OK to the main server.
The main server (the one which receives the passive results) has 3 CPUs and 768 MB RAM. Average load is around 24%. It monitors a total of 421 services on 102 hosts, of which 24 servers and 137 services are passive.

Nagios.log shows the following for the service in question.
-------------------------------------------------------------------------------------
[1112947261] Warning: Return code of 127 for check of service 'CPU Load' on host 'micmac' was out of bounds. Make sure the plugin you're trying to run actually exists.
[1112947265] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[1112947278] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;CPU Load;0;OK - load average: 0.00, 0.00, 0.00
[1112947278] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;Telnet;0;TCP OK - 0.001 second response time on port 23
[1112947306] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;ClearCase TCP;0;TCP OK - 0.000 second response time on port 371
[1112947340] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;Disk /;0;DISK OK [10617 MB (80%) free on /]
[1112947340] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
[1112947340] SERVICE ALERT: micmac;Telnet;OK;HARD;1;TCP OK - 0.001 second response time on port 23
-------------------------------------------------------------------------------------
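One way to narrow this down is to separate the log entries that record a passively submitted result from the ones that report a plugin return code. Exit status 127 is what a shell returns when a command cannot be found, so a "Return code of 127" warning that is not preceded by a matching EXTERNAL COMMAND line suggests the main server ran a check command itself instead of relying on the passive result. The following is a minimal sketch under that assumption; the labels `passive-result`, `plugin-return-code` and `alert` and the function names are hypothetical and not Nagios terminology, and the sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime, timezone

# Sample lines copied from the log excerpt above.
LOG = """\
[1112947261] Warning: Return code of 127 for check of service 'CPU Load' on host 'micmac' was out of bounds. Make sure the plugin you're trying to run actually exists.
[1112947265] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
[1112947278] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;CPU Load;0;OK - load average: 0.00, 0.00, 0.00
[1112947340] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
"""

# Each log line starts with a Unix timestamp in square brackets.
LINE_RE = re.compile(r"^\[(\d+)\] (.*)$")


def classify(message, host, service):
    """Return a (hypothetical) label for a log message, or None if unrelated."""
    passive_prefix = "EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;%s;%s;" % (host, service)
    alert_prefix = "SERVICE ALERT: %s;%s;" % (host, service)
    if message.startswith(passive_prefix):
        return "passive-result"
    if (message.startswith("Warning: Return code of")
            and "'%s'" % service in message
            and "'%s'" % host in message):
        return "plugin-return-code"
    if message.startswith(alert_prefix):
        return "alert"
    return None


def summarize(log_text, host, service):
    """List (UTC time, label) pairs for one host/service, in log order."""
    rows = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        kind = classify(match.group(2), host, service)
        if kind is None:
            continue
        when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        rows.append((when.strftime("%H:%M:%S"), kind))
    return rows


for when, kind in summarize(LOG, "micmac", "CPU Load"):
    print(when, kind)
```

If the `plugin-return-code` entries show up on their own schedule, independent of the `passive-result` entries, it may be worth checking whether active checks are still enabled for this service on the main server and whether its check command exists there.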


And the GUI event log for the service in question shows this:
-------------------------------------------------------------------------------------

Service Critical[08-04-2005 10:26:28] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:25:35] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.03, 0.00
Service Critical[08-04-2005 10:25:15] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Program Start[08-04-2005 10:23:30] Nagios 1.2 starting... (PID=1144)
Program Restart[08-04-2005 10:23:07] Caught SIGHUP, restarting...
Service Ok[08-04-2005 10:20:04] SERVICE ALERT: micmac;CPU Load;OK;HARD;3;OK - load average: 0.12, 0.09, 0.02
Service Critical[08-04-2005 10:18:59] SERVICE ALERT: micmac;CPU Load;CRITICAL;HARD;3;(Return code of 127 is out of bounds - plugin may be missing)
Service Critical[08-04-2005 10:18:42] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;2;(Return code of 127 is out of bounds - plugin may be missing)
Service Critical[08-04-2005 10:17:41] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:14:52] SERVICE ALERT: micmac;CPU Load;OK;SOFT;3;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:13:53] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;2;(Return code of 127 is out of bounds - plugin may be missing)
Service Critical[08-04-2005 10:11:28] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:10:50] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:10:50] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:09:40] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:08:44] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:05:37] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:04:52] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:02:20] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:01:05] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
-------------------------------------------------------------------------------------
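To see whether the failures follow a fixed schedule, the gaps between successive CRITICAL entries can be computed from the event log. This is only a sketch: it assumes the timestamps are in day-month-year order (consistent with the date of this post), the function name `critical_gaps` is hypothetical, and the sample lines are the last seven entries of the excerpt above.

```python
import re
from datetime import datetime

# Last seven entries of the GUI event log excerpt above (newest first).
EVENTS = """\
Service Critical[08-04-2005 10:10:50] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:09:40] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:08:44] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:05:37] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:04:52] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
Service Ok[08-04-2005 10:02:20] SERVICE ALERT: micmac;CPU Load;OK;SOFT;2;OK - load average: 0.00, 0.00, 0.00
Service Critical[08-04-2005 10:01:05] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code of 127 is out of bounds - plugin may be missing)
"""

# Matches the state word and the bracketed timestamp at the start of a line.
EVENT_RE = re.compile(r"^Service (\w+)\[(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\]")


def critical_gaps(text):
    """Return the seconds between successive 'Service Critical' entries."""
    times = []
    for line in text.splitlines():
        match = EVENT_RE.match(line)
        if match and match.group(1) == "Critical":
            # Assumption: dates are written day-month-year.
            times.append(datetime.strptime(match.group(2), "%d-%m-%Y %H:%M:%S"))
    times.sort()  # the GUI lists newest first; sort into chronological order
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


print(critical_gaps(EVENTS))
```

Gaps that cluster around one value would point to something scheduled on the main server. Irregular gaps would point elsewhere.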

I've also got a server on the same LAN as the main Nagios server which fails its host and service checks every 30 minutes or so. The timeout on the host check is 10 seconds. However, if I ping it from the Nagios box, it never drops a single packet.

Anyone got any good ideas?

Regards, 
Thomas Nilsen
BGO / SVG Support
Roxar AS
Tel: +47 55599505 / Mob: +47 916 98 229
