After svc recovery, host still down | Svc warning results in host down

Bas van der Veen bas.vanderveen at kahuna.nl
Fri Mar 7 16:04:18 CET 2003


Hi,

I got Nagios up and running just fine on FreeBSD. It pings 92 hosts, and everything works fine as long as everything is OK :)

I am experiencing trouble with 2 things:

1) I have defined a ping service (with check_ping) for each host, in addition to the check-host-alive host check command. If the service fails and its state becomes critical, both the 'Hosts' and 'Services' panes (tactical overview) show the critical state. However, when the service state returns to OK, the host state doesn't change. When I look at the hosts via the 'Hosts' pane (tactical overview), all the critical hosts are listed, including the host for which the state changed to OK. The state is even printed in green!
What am I doing wrong here? I looked at the manual, especially the part on state changes, but couldn't find the answer :(

2) With check_ping, you can define a warning level and a critical level. If the ping returns a value that triggers a warning, the service is marked 'warning' and the host is marked down. This surprises me, as it is only a warning about packet loss or a high RTA value.
My question: does a ping warning result really mean the host is down?
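
To make the two threshold levels concrete, here is a minimal sketch of how thresholds such as 4000.0,50% (warning) and 5000.0,60% (critical) could map a ping result to a service state. It is an illustration only, not check_ping's actual code: the function names are hypothetical, and it assumes a threshold triggers when the measured value is at or above the configured one. It covers the service result only and says nothing about how the host state is derived.

```python
def parse_threshold(arg):
    """Split a threshold such as '4000.0,50%' into (rta_ms, loss_pct)."""
    rta, loss = arg.split(",")
    return float(rta), float(loss.rstrip("%"))


def ping_state(rta_ms, loss_pct, warn_arg, crit_arg):
    """Classify a ping result against warning and critical thresholds.

    Assumption: a threshold is reached when the measured round-trip
    average or packet loss is greater than or equal to the limit.
    """
    w_rta, w_loss = parse_threshold(warn_arg)
    c_rta, c_loss = parse_threshold(crit_arg)
    # Check the critical limits first so they take precedence.
    if rta_ms >= c_rta or loss_pct >= c_loss:
        return "CRITICAL"
    if rta_ms >= w_rta or loss_pct >= w_loss:
        return "WARNING"
    return "OK"
```

With the thresholds from the service definition below, a reply at 4500 ms with no packet loss would be classified as WARNING by this sketch, and 60% packet loss as CRITICAL.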

Thanks in advance,

Bas

Sample host and service definitions:
------------
# nl_test host definition
define host {
        use                     generic-host
        host_name               nl_test
        alias                   alias
        address                 62.177.182.x
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   0
        notification_period     5x12
        notification_options    d
        }

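#
# Service definition: ping check with a warning threshold of
# RTA 4000.0 ms or 50% packet loss and a critical threshold of
# RTA 5000.0 ms or 60% packet loss
#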
define service{
        use                             generic-service
        host_name                       nl_test
        service_description             hst_ping_24x7_5x12
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        contact_groups                  on_call_support
        notification_interval           0
        notification_period             5x12
        notification_options            c
        check_command                   check_ping!4000.0,50%!5000.0,60%
        }
---------------


-----Original Message-----
From: Daniel Finn [mailto:DFinn at studentadvantage.com]
Sent: donderdag 6 maart 2003 19:14
To: 'Carroll, Jim P [Contractor]'; Daniel Finn;
'nagios-users at lists.sourceforge.net'
Subject: RE: [Nagios-users] problems using check_snmp to monitor
sendmail


The way our mail app is set up, we do need sendmail to listen on localhost:25
and only there.  We've considered allowing the nagios server access to port
25 on each machine but it would mean tweaking the sendmail config on 50+
servers.  At this point for ease of management it's much easier to have
every server run the same sendmail.cf.

-----Original Message-----
From: Carroll, Jim P [Contractor] [mailto:jcarro10 at sprintspectrum.com]
Sent: Thursday, March 06, 2003 12:44 PM
To: 'Daniel Finn'; 'nagios-users at lists.sourceforge.net'
Subject: RE: [Nagios-users] problems using check_snmp to monitor
sendmail


I can't comment on the SNMP approach, but I'm curious:  Do you need to have
sendmail listening on socket 127.0.0.1:25?  I usually disable the listen
functionality of sendmail for outbound-only hosts.  Alternatively, if you're
going to have sendmail listen on port 25 at all, you could permit
connections from your Nagios server.  (I know, it would mean tweaking all
the hosts that you're interested in monitoring....)

Food for thought.

jc


> -----Original Message-----
> From: Daniel Finn [mailto:DFinn at studentadvantage.com]
> Sent: Thursday, March 06, 2003 10:42 AM
> To: 'nagios-users at lists.sourceforge.net'
> Subject: [Nagios-users] problems using check_snmp to monitor sendmail
> 
> 
> I'm having issues trying to implement check_snmp to monitor the status of
> sendmail.  Your first question is probably why I am not just using
> check_smtp: it's because all the mail servers that I need to implement
> this on only listen on 127.0.0.1:25.
> 
> I've enabled the MIB or OID in snmpd.conf on the test box and that appears
> to be working fine:
> 
> [root at sadqalx38 plugins]# snmpwalk -v 1 l52m-be sapublic .1.3.6.1.4.1.2021.2
> 
> enterprises.ucdavis.prTable.prEntry.prIndex.1 = 1
> enterprises.ucdavis.prTable.prEntry.prNames.1 = sendmail
> enterprises.ucdavis.prTable.prEntry.prMin.1 = 1
> enterprises.ucdavis.prTable.prEntry.prMax.1 = 10
> enterprises.ucdavis.prTable.prEntry.prCount.1 = 2
> enterprises.ucdavis.prTable.prEntry.prErrorFlag.1 = 0
> enterprises.ucdavis.prTable.prEntry.prErrMessage.1 = 
> enterprises.ucdavis.prTable.prEntry.prErrFix.1 = 0
> enterprises.ucdavis.prTable.prEntry.prErrFixCmd.1 = 
> 
> [root at sadqalx38 plugins]# snmpwalk -v 1 -On l52m-be sapublic .1.3.6.1.4.1.2021.2
> .1.3.6.1.4.1.2021.2.1.1.1 = 1
> .1.3.6.1.4.1.2021.2.1.2.1 = sendmail
> .1.3.6.1.4.1.2021.2.1.3.1 = 1
> .1.3.6.1.4.1.2021.2.1.4.1 = 10
> .1.3.6.1.4.1.2021.2.1.5.1 = 3
> .1.3.6.1.4.1.2021.2.1.100.1 = 0
> .1.3.6.1.4.1.2021.2.1.101.1 = 
> .1.3.6.1.4.1.2021.2.1.102.1 = 0
> .1.3.6.1.4.1.2021.2.1.103.1 = 
> 
> 
> So, it appears I can either monitor
> enterprises.ucdavis.prTable.prEntry.prCount.1, which lists the number of
> sendmail processes running, or
> enterprises.ucdavis.prTable.prEntry.prErrorFlag.1, which is 0 when sendmail
> is running or 1 when it's either not running or there are more procs running
> than what you've specified as acceptable (10 in this case).
> 
> 
> Here's the problem I'm having with the check_snmp program.  It always
> reports back that it's either at a WARNING state or a CRITICAL state.
> 
> [root at sadqalx38 plugins]# ./check_snmp -H 10.209.68.92 -o .1.3.6.1.4.1.2021.2.1.5.1 -w '1:10' -c '1:10' -C sapublic
> SNMP WARNING - 1
> 
> 
> Am I doing something wrong?  Shouldn't it return OK because the value it's
> returning is between 1 and 10?  Now if I change the warning and critical
> thresholds to be outside of what it's returning I get:
> 
> [root at sadqalx38 plugins]# ./check_snmp -H 10.209.68.92 -o .1.3.6.1.4.1.2021.2.1.5.1 -w '3:10' -c '3:10' -C sapublic
> SNMP CRITICAL - *1*
> 
> I've downloaded nagios-plugins-1.3.0.tar.gz and compiled them for this
> system and I still have the same problem.  If anyone could help that would
> be great.  Also, I'm not subscribed to this list so if you could reply to me
> via email that would be great.
> 
> Thanks
> Dan Finn
> Unix Systems Administrator
> Student Advantage Inc
> dfinn at studentadvantage.com
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: Etnus, makers of 
> TotalView, The debugger 
> for complex code. Debugging C/C++ programs can leave you 
> feeling lost and 
> disoriented. TotalView can help you find your way. Available 
> on major UNIX 
> and Linux platforms. Try it free. www.etnus.com
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS 
> when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
> 
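
The behaviour the quoted check_snmp question expects from -w '1:10' -c '1:10' can be sketched as follows. This is only an illustration of inclusive min:max range checking as the poster describes it, not the check_snmp implementation and not an explanation of the output shown in the quote; the function names are hypothetical.

```python
def in_range(value, spec):
    """Return True if value lies inside an inclusive 'min:max' range."""
    low, high = (float(part) for part in spec.split(":"))
    return low <= value <= high


def expected_state(value, warn_spec, crit_spec):
    """State the poster expects: OK while the value stays inside both ranges."""
    # A value outside the critical range is the more severe condition.
    if not in_range(value, crit_spec):
        return "CRITICAL"
    if not in_range(value, warn_spec):
        return "WARNING"
    return "OK"
```

Under these assumptions a returned value of 1 would be OK for the ranges '1:10' and '1:10', and CRITICAL for '3:10' and '3:10'.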




http://www.kahuna.nl







