No Output

Marc Powell marc at ena.com
Tue Jun 27 15:31:49 CEST 2006



> -----Original Message-----
> From: Williard, Jason [mailto:Jason.Williard at chartercom.com]
> Sent: Monday, June 26, 2006 5:58 PM
> To: nagios-users at lists.sourceforge.net; Marc Powell
> Subject: RE: [Nagios-users] No Output
> 
> > -----Original Message-----
> > From: Williard, Jason [mailto:Jason.Williard at chartercom.com]
> > Sent: Monday, June 26, 2006 5:03 PM
> > To: Marc Powell; nagios-users at lists.sourceforge.net
> > Subject: RE: [Nagios-users] No Output
> >
> > HOSTS.CFG ENTRY
> > ----------------
> > define host{
> >         use     generic-hos
> >         host_name               22xx-WOR-905
> >         alias                   WOR 905 Tunnel 1
> >         address                 172.29.xxx.xxx
> >         parents                 KWA-Core-7206,MOR-7206
> >         check_command           check-host-alive
> >         max_check_attempts      10
> >         notification_interval   60
> >         notification_period     24x7
> >         notification_options    d,u,r
> >         contact_groups          field-admins
> >         }
> >
> > CHECKCOMMANDS.CFG ENTRY
> > ------------------------
> 
> Thanks. Looks good.
> >
> > TEST RUN
> > ---------
> > [root at web nagios]# /usr/lib/nagios/plugins/check_ping -H 172.29.xxx.xxx -w 3000.0,80% -c 5000.0,100% -p 1
> > CRITICAL - Plugin timed out after 10 seconds
> 
> You should always perform your tests as the nagios user. Root can run
> commands that a normal user might not be able to (including ping) and
> you may see different output. In any event, the output above for this
> host looks good. Do you have the same check-host-alive command
> specified for KWA-Core-7206 and MOR-7206? I am presuming that you
> can't ping this host because one/both of those are down? How about the
> same kind of test run for those two hosts. Also, try pinging them
> directly from the command line with /bin/ping -n -U -c 1 172.29.xxx.xxx.
> 
> Did you upgrade the plugins when you upgraded nagios? Were there any
> other system upgrades performed at the same time?
> 
> 
> 
> I ran the exact same test as the nagios user and got the same result:
> [root at vuo02web nagios]# su nagios
> sh-3.00$ /usr/lib/nagios/plugins/check_ping -H 172.29.248.75 -w 3000.0,80% -c 5000.0,100% -p 1
> CRITICAL - Plugin timed out after 10 seconds
> 
> 
> As for the KWA-Core-7206 & MOR-7206 sites, these are both UP and
> pingable.  I know that the two sites currently showing UNREACHABLE are
> down.  We can confirm this by looking at their tunnel status.  However,
> in previous versions of Nagios, it would show the site as DOWN rather
> than UNREACHABLE.

That should still be the case. Nagios determines down vs. unreachable
by running the host check_command for the parents. If that returns down
then the initial host is unreachable, otherwise the host is just down. I
don't use host checks so I am unable to verify this myself but I haven't
heard of anyone else having the same issue. I've looked through checks.c
and the only place an UNREACHABLE status is assigned is here --

[parent checks removed. Sets value of route_blocked to FALSE if they are
all up]
/* if this host has at least one parent host and the route to this host
is blocked, it is unreachable */
if(route_blocked==TRUE && hst->parent_hosts!=NULL)
	return_result=HOST_UNREACHABLE;

/* else the parent host is up (or there isn't a parent host), so this
host must be down */
else
	return_result=HOST_DOWN;
 
This is why I believe that the UNREACHABLE status is coming from the
checks of the parents. 
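
To make that decision concrete, here is a rough standalone sketch of the same logic. It is not the actual checks.c code: the enum, the function name classify_failed_host and its parameters are made up for illustration, and it assumes that a single UP parent is enough to clear route_blocked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the real Nagios structures. */
enum host_state { HOST_UP = 0, HOST_DOWN = 1, HOST_UNREACHABLE = 2 };

/*
 * Classify a host whose own check has already failed.
 * parent_states holds the last known state of each parent host;
 * n_parents may be 0 for a host with no parents.
 */
enum host_state classify_failed_host(const enum host_state *parent_states,
                                     size_t n_parents)
{
    int route_blocked = 1; /* treat the route as blocked until a parent is UP */
    size_t i;

    assert(parent_states != NULL || n_parents == 0);

    for (i = 0; i < n_parents; i++) {
        if (parent_states[i] == HOST_UP) {
            route_blocked = 0; /* assumption: one UP parent clears the route */
            break;
        }
    }

    /* Same final test as the excerpt: blocked route and at least one parent. */
    if (route_blocked && n_parents > 0)
        return HOST_UNREACHABLE;

    /* A parent is up, or there are no parents, so the host itself is down. */
    return HOST_DOWN;
}
```

Under those assumptions, a host with both parents UP (as reported for KWA-Core-7206 and MOR-7206) comes back as HOST_DOWN, and HOST_UNREACHABLE only appears when every parent check fails.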

> As well, the Status Information would display something like "CRITICAL
> - Plugin timed out after 10 seconds" rather than "(No output!)".  I am
> assuming this is a plugin issue, but, as mentioned before, I am
> assuming the plugins are working as the same check-host-alive command
> works for sites that are up.

The "(No output!)" error is interesting and my feeling is that check_ping is
having problems parsing the output of /bin/ping when a host is down.
That's why I was expecting an error when you ran it manually. Try
extending the timeout to something longer to see if you get different
output --

/usr/lib/nagios/plugins/check_ping -H 172.29.xxx.xxx -w 3000.0,80% -c 5000.0,100% -p 1 -t 30

It definitely shouldn't have taken 10 seconds to send 1 ping to that
host in the first place. Using strace might show something interesting
if you know how to use that application.
 
> When we did the upgrade, we basically wiped the whole Nagios system and
> installed the new version.  The latest plugins were installed along
> with version 2.4 of Nagios.  The old cfg files were copied into the
> config directory and modified to fit the new parameters.

Ok. Just on the off chance, have you verified that you have only 1
nagios daemon running?

--
Marc

