Host status never returns to UP

Kirk Hoganson khoganson at comcast.net
Tue Apr 20 23:08:10 CEST 2004


I don't think the source of the problem is entirely obvious.  The
defaults work fine for most people, which is probably why there were so
few responses: almost everyone can use the defaults and never see this
problem.  The solution should be pretty simple though, at least if it's
the same problem.

Each host definition in hosts.cfg has a check_command line.  This is the
command Nagios uses to determine and set the host state, and most likely
it is this command that is failing for some reason.  The check_ping
service check is not what determines host status, so even if that
service is OK, the host will not be set to UP unless the command
specified in hosts.cfg can also return OK.  The failing command is NOT
set in services.cfg.  It is set in hosts.cfg, and the details of that
command are defined in checkcommands.cfg.
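
To make the relationship between the files concrete, here is a rough
sketch of how the pieces fit together.  The host name, address,
generic-host template, gateway host, plugin path and the
check-host-alive-by-ssh command name are all made up for illustration,
and the check-host-alive definition is only what the sample config
commonly ships with, so compare it against your own files rather than
copying it.

```
# hosts.cfg: the check_command here, not any service check, decides
# whether the host is UP or DOWN.  "fileserver1", its address and the
# "generic-host" template (assumed to supply the other required
# directives) are hypothetical.
define host{
        use             generic-host
        host_name       fileserver1
        alias           File Server 1
        address         192.168.10.5
        check_command   check-host-alive
        }

# checkcommands.cfg: the definition commonly found in the sample config.
# It pings the host directly from the Nagios server.
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
        }

# Hypothetical alternative for hosts that can only be reached through an
# SSH hop: run check_ping on the gateway instead.
define command{
        command_name    check-host-alive-by-ssh
        command_line    $USER1$/check_by_ssh -H gateway.example.com -C '/usr/local/nagios/libexec/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1'
        }
```

If the Nagios server cannot ping the host directly, pointing the host's
check_command at something like the second definition (or at whatever
check does work in your topology) should let the host return to UP once
that command returns OK.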

XeloQ Comms wrote:
> Yes, me: I have exactly the same problem. I tried reducing the number of
> retries from 10 to 3, and that seemed to improve my situation, but then I
> ran into the same problem again. Even weirder, the service shows OK but
> the server state is DOWN, although you can ping the unit. A Nagios
> shutdown and restart resolves the problem for me, but I don't know how to
> fix this. It happens only with ping; the rest I have changed to SNMP and
> TCP, and that runs fine. I think it is something in the config file. I
> posted my problem before on this list and got only one reaction. This
> suggests that it is something obvious or an RTFM case. Dunno. Tjapko.
> 
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net]On Behalf Of Kirk
> Hoganson
> Sent: Tuesday, April 20, 2004 05:19 a.m.
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Host status never returns to UP
> 
> 
> I figured this out, and decided to post the solution for those who have
> had similar problems.  I have recently seen posts about similar
> problems.  Although I don't believe they were using check_by_ssh, the
> solution should be the same.
> 
> The problem is really a simple configuration problem.  Even if you are
> using a check_ping check that is returning an OK result, this is not the
> check that nagios uses to determine the UP/DOWN status of a host.  The
> check used by nagios for each host is determined in the hosts.cfg, and
> is not always the same as any command you might be using to correctly
> ping the host.  By default that command is check-host-alive.  If for
> example you are using check_by_ssh to ping hosts in a private network, the
> check-host-alive will fail and your hosts will never return to an UP
> status, even if the host comes up and your ping using check_by_ssh
> returns an OK.
> 
> Once you understand the problem, the solution is obvious.
> 
> Kirk Hoganson wrote:
> 
>>I am monitoring the status of several servers using check_by_ssh to run
>>a check_ping (this is necessary given the network topology).  It will
>>monitor the status without difficulty until the server goes down.  At
>>which point it will send a critical alert.  However, once the server
>>comes up it will never send a recovery alert.
>>
>>When the server comes up, the status log will show that the PING is now
>>OK, but the host status will never reset to UP.  It will remain in a
>>DOWN state, until Nagios is restarted.
>>
>>Any thoughts from anyone...  The notification options for the service
>>specify notification on recovery.  The problem is that even though the
>>ping becomes OK, the server status never becomes UP.
>>
>>Has anyone seen anything like this?
>>
>>
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null

