Host status never returns to UP

Kirk Hoganson khoganson at comcast.net
Wed Apr 21 07:36:01 CEST 2004


The problem is that Nagios ignores all of the working services when it 
decides whether a host is UP.  It uses only the one check that you 
specify (per host) as the check_command in hosts.cfg.  By default that 
is check-host-alive, a normal one-packet ping.  Because the hosts behind 
the firewall can't be pinged normally, that default check fails.  Don't 
worry about all the checks that are working... those don't matter here.  
As long as the host's check_command fails, the host will not be 
considered UP, regardless of whatever other commands work.
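
For reference, the default wiring looks roughly like the sketch below.  
The host name, alias and address are made up for illustration, the 24x7 
time period is assumed to exist in your timeperiods.cfg, and the 
check-host-alive definition is written from memory of the sample 
checkcommands.cfg, so compare it with your own files rather than copying 
it.

```
# hosts.cfg (host_name, alias and address are hypothetical)
define host{
        host_name               gw-endpoint-01
        alias                   H323 gateway behind the firewall
        address                 192.0.2.10
        # The one check that decides UP/DOWN for this host:
        check_command           check-host-alive
        max_check_attempts      3
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }

# checkcommands.cfg (as in the sample config; yours may differ)
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
        }
```

A check_ping entry in services.cfg is a separate service check.  Its OK 
result is never fed back into the host state.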

To fix it, go to your hosts.cfg and find the check_command for the 
problem hosts.  Create a new check in checkcommands.cfg that reaches 
those hosts on the port the firewall allows.  Then go back to your 
hosts.cfg and change the check_command for the problem hosts so that it 
uses the new check instead of the default ping.
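
A rough sketch of those two steps, assuming the firewall allows a TCP 
connection to some port on the endpoint.  The command name 
check-host-alive-tcp, the port 1720 and the host entry are placeholders I 
made up, not anything from your config.  check_tcp and its -H and -p 
options come from the standard plugins.

```
# checkcommands.cfg: new host check that connects to the allowed port.
# Command name and port 1720 are hypothetical; use the port your
# firewall actually permits.
define command{
        command_name    check-host-alive-tcp
        command_line    $USER1$/check_tcp -H $HOSTADDRESS$ -p 1720
        }

# hosts.cfg: point the problem host at the new command.
# Host name, alias and address are hypothetical.
define host{
        host_name               gw-endpoint-01
        alias                   H323 gateway behind the firewall
        address                 192.0.2.10
        check_command           check-host-alive-tcp
        max_check_attempts      3
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
        }
```

Before restarting, it is worth running the check_tcp command line by 
hand from the Nagios server to confirm it returns OK for a problem host, 
and running nagios -v against your main config file to verify the 
configuration.  Then restart Nagios so it picks up the change.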

XeloQ Comms wrote:
> Thanks for your answer although I find it a bit hard to understand.
> 
> Let me explain my situation. I have about 300 servers in my
> system and about 17,000 endpoints that are checked with different
> variations of check_nt, check_nrpe, check_snmp, MRTG and TCP checks, mainly.
> 
> I created all the commands myself from the manual and the READMEs.
> 
> Almost everything is working except some endpoints that can't be reached by
> SNMP or TCP checks due to firewall issues and that I can only check with a
> ping on an allowed port.
> 
> Due to the nature of the endpoints (SNMP-aware H323 one-port gateways), the
> ping command is an excellent way to tell me when the ping delays are
> reaching certain limits. And from here we go. I receive the WARNING and the
> CRITICAL that I configured in hosts.cfg and services.cfg. Then eventually the
> host goes down in Nagios, telling me that it can't be reached anymore. This
> is NOT TRUE. In real life the host never went down. Then after a while,
> when the ping is within its limits again, the service goes green but the HOST
> still shows DOWN and RED. The configs that I showed you have all the
> notification options set: d, u and r in hosts.cfg, and w, u, c and r for the
> services. I have read the manual a couple of hundred times to be sure I
> understood this correctly.
> I am not sure; maybe I should introduce flap detection?
> 
> The same hosts are also checked with SNMP and a TCP check, and those are
> working fine. If I manually turn off one host, they go red and come back
> green again.
> 
> I limited the initial Nagios test setup to 6 servers (3 W-2000 and 3
> rh9's) and 20 H323 endpoints, to analyse the ping problem first. I
> run a second system that scans all the H323 endpoints with a check_snmp
> command, and it runs excellently.
> 
> I hope this explains my issue a little. Thanks again for answering.
> 
> Tjapko.
> 
> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net]On Behalf Of Kirk
> Hoganson
> Sent: Tuesday, April 20, 2004 05:08 p.m.
> To: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Host status never returns to UP
> 
> 
> I don't think that the source of the problem is entirely obvious.  I
> think that the defaults work just fine for most people.  That is
> probably the reason for so few responses.  Almost everyone can use the
> defaults and never have this problem.  The solution should be pretty
> simple though, at least if it's the same problem.  The hosts.cfg file
> has a line for each host that starts with check_command.  This is the
> command used to determine and set the server state.  The problem you are
> having is most likely that this command is failing for some reason.  The
> check_ping command is not the command used to determine host status.  So
> even if that is OK, it won't set the server state to UP, unless the
> command specified in the hosts.cfg can also return an OK.  The command
> that is failing is NOT set in the services.cfg.  It is set in hosts.cfg,
> and the specifics for that command are in the checkcommands.cfg.
> 
> XeloQ Comms wrote:
> 
>>Yes, me. I have exactly the same problem. I tried reducing the number of
>>retries from 10 to 3, and that seemed to improve my situation, but then I
>>ran into the same problem. Even weirder: the service shows OK but the
>>server state is DOWN, although you can ping the unit. A Nagios shutdown
>>and restart resolves the problem for me, but I don't know how to fix this.
>>It happens only with ping; the rest I have changed to SNMP and TCP, and that
>>runs fine. I think it is something in the config file. I posted my problem
>>on this list before and got only 1 reaction. This indicates that it is
>>something obvious or RTFM. Dunno. Tjapko.
>>
>>-----Original Message-----
>>From: nagios-users-admin at lists.sourceforge.net
>>[mailto:nagios-users-admin at lists.sourceforge.net]On Behalf Of Kirk
>>Hoganson
>>Sent: Tuesday, April 20, 2004 05:19 a.m.
>>To: nagios-users at lists.sourceforge.net
>>Subject: Re: [Nagios-users] Host status never returns to UP
>>
>>
>>I figured this out, and decided to post the solution for those who have
>>had similar problems.  I have recently seen posts about similar
>>problems.  I don't believe those posters were using check_by_ssh, but
>>the solution should be the same.
>>
>>The problem is really a simple configuration problem.  Even if you are
>>using a check_ping check that is returning an OK result, this is not the
>>check that nagios uses to determine the UP/DOWN status of a host.  The
>>check used by nagios for each host is determined in the hosts.cfg, and
>>is not always the same as any command you might be using to correctly
>>ping the host.  By default that command is check-host-alive.  If, for
>>example, you are using check_by_ssh to ping hosts in a private network,
>>check-host-alive will fail and your hosts will never return to an UP
>>status, even if the host comes up and your ping using check_by_ssh
>>returns an OK.
>>
>>Once you understand the problem, the solution is obvious.
>>
>>Kirk Hoganson wrote:
>>
>>
>>>I am monitoring the status of several servers using check_by_ssh to run
>>>a check_ping (this is necessary given the network topology).  It
>>>monitors the status without difficulty until the server goes down, at
>>>which point it sends a critical alert.  However, once the server
>>>comes up it never sends a recovery alert.
>>>
>>>When the server comes up, the status log will show that the PING is now
>>>OK, but the host status will never reset to UP.  It will remain in a
>>>DOWN state, until Nagios is restarted.
>>>
>>>Any thoughts from anyone...  The notification options for the service
>>>specify notification on recovery.  The problem is that even though the
>>>ping becomes OK, the server status never becomes UP.
>>>
>>>Has anyone seen anything like this?
>>>
>>>
>>
>>
>>
>>-------------------------------------------------------
>>This SF.Net email is sponsored by: IBM Linux Tutorials
>>Free Linux tutorial presented by Daniel Robbins, President and CEO of
>>GenToo technologies. Learn everything from fundamentals to system
>>administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
>>_______________________________________________
>>Nagios-users mailing list
>>Nagios-users at lists.sourceforge.net
>>https://lists.sourceforge.net/lists/listinfo/nagios-users
>>::: Please include Nagios version, plugin version (-v) and OS when reporting
>>any issue.
>>::: Messages without supporting info will risk being sent to /dev/null
>>

