Problem with passive check

Marc Powell marc at ena.com
Fri Apr 8 15:03:01 CEST 2005



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Thomas Nilsen
> Sent: Friday, April 08, 2005 6:07 AM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] Problem with passive check
> 
> Hope someone can shed some light on this problem.
> 
> I've got 2 Nagios 1.2 servers installed, both doing active monitoring, but
> one also supports passive monitoring of the other one. One of the services
> on the server which receives the passive monitoring is constantly
> switching between FAIL and OK. What's strange about it is that the
> server which is actively checking the service has had an OK status for
> months for this particular service. However, the main nagios server will
> report "Return Code of 127...", and then soon after it will return OK.
> 
> I can't figure out why this is, as the server doing the service checks
> never sends anything but OK to the main server.
> The hardware for the main server (the one which receives the passive
> info) is a 3 CPU 768 MB RAM. Load is around average 24%. A total of 421
> services on 102 hosts, where 24 servers and 137 services are passive.
> 
> Nagios.log shows the following for the service in question.
> ---------------------------------------------------------------------
> [1112947261] Warning: Return code of 127 for check of service 'CPU Load'
> on host 'micmac' was out of bounds. Make sure the plugin you're trying to
> run actually exists.
> 
> [1112947265] SERVICE ALERT: micmac;CPU Load;CRITICAL;SOFT;1;(Return code
> of 127 is out of bounds - plugin may be missing)

Nagios on this machine is attempting to actively check this service, but
the plugin referenced by the check_command for this host and service
either does not exist or is not in the specified location. If it's not
supposed to be actively checking this service, set active_checks_enabled
to 0 in its service definition. If it is supposed to be actively checking,
then make sure your command definition is correct and the plugin exists
where you've told nagios to find it.
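
To see where the 127 comes from: it is the exit status a POSIX shell
returns when it cannot find the command it was asked to run, and plugins
are only supposed to return 0 through 3 (OK, WARNING, CRITICAL, UNKNOWN).
A minimal Python sketch of that, assuming the check command is launched
through /bin/sh; the plugin path and the run_check/in_bounds helpers are
made up for illustration and are not part of Nagios:

```python
import subprocess

# Plugin return codes Nagios accepts: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.
VALID_CODES = range(0, 4)


def run_check(command_line: str) -> int:
    """Run a command line through the shell and return its exit status."""
    completed = subprocess.run(
        command_line,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def in_bounds(code: int) -> bool:
    """True if the exit status is one a plugin is allowed to return."""
    return code in VALID_CODES


# Hypothetical path that does not exist, standing in for a missing plugin.
missing = run_check("/nonexistent/libexec/check_load -w 5,4,3 -c 10,8,6")
# A command that exits with 2, standing in for a plugin reporting CRITICAL.
critical = run_check("exit 2")

print(missing, in_bounds(missing))
print(critical, in_bounds(critical))
```

If you run the check_command by hand as the nagios user and get 127 back,
the path in the command definition is the thing to fix.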

> 
> [1112947278] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;micmac;CPU
> Load;0;OK - load average: 0.00, 0.00, 0.00
> [1112947278] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;micmac;Telnet;0;TCP OK - 0.001 second
> response time on port 23

This external command (passive service check) overrides the CRITICAL
state from the failed active check above. That's why you see it
flip-flop between the two states.
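
As a rough illustration of the flip-flop, here is a simplified model in
which each new check result, active or passive, simply replaces the
previous state. It ignores soft/hard states and retries, and the
state_timeline and count_flips helpers are hypothetical names, not
anything Nagios provides. The timestamps and codes are the ones from the
log above:

```python
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_timeline(results):
    """Return (timestamp, source, state name) tuples in time order."""
    return [
        (timestamp, source, STATE_NAMES[code])
        for timestamp, source, code in sorted(results)
    ]


def count_flips(timeline):
    """Count how often the state differs from the one just before it."""
    states = [state for _, _, state in timeline]
    return sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)


results = [
    (1112947265, "active", 2),   # failed active check, logged as CRITICAL
    (1112947278, "passive", 0),  # PROCESS_SERVICE_CHECK_RESULT with code 0
]

timeline = state_timeline(results)
for entry in timeline:
    print(entry)
print("flips:", count_flips(timeline))
```

Every scheduled active check adds another CRITICAL and every passive
result adds another OK, so the state keeps alternating until the active
check is disabled or fixed.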

> 
> I've also got a server on the same LAN as the main nagios server which
> constantly fails its host and service checks every 30 minutes or so.
> Timeout on the host check is 10 sec. However, if I run a local ping
> against it from the nagios box, it will never drop a single packet...

How does it 'fail'? There are lots of ways. Plugin timeout? 100% packet
loss? High latency? What's the service and command definition?

> Anyone got any good ideas?

You tell us, we just provide the suggestions.
 
--
Marc

