Host Check Clarification

Bishop, Dean dean.bishop at tcdsb.org
Tue Oct 22 14:05:52 CEST 2002


The way to do this is to use the retry_check_interval and
max_check_attempts.

upon failure (this applies to both services and hosts) the
normal_check_interval is not used.  Rather the retry_check_interval is used.
The host/service will not become hard Non-OK until/unless max_check_attempts
is reached without getting an OK result.

so to avoid notifications for temporary outages, retry [max_check_attempts]
times every [retry_check_interval] minutes.
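
As a rough illustration of the arithmetic, here is a minimal Python sketch of how long a check stays in the soft state before going hard. It assumes the first Non-OK result counts as attempt 1, that each later attempt runs retry_check_interval minutes after the previous one, and that interval_length is left at 60 seconds. The function name is made up for this example and is not part of Nagios.

```python
def minutes_until_hard_state(max_check_attempts, retry_check_interval):
    """Minutes from the first Non-OK result to the hard Non-OK state.

    Assumption: the first failing check is attempt 1, so only
    (max_check_attempts - 1) retries remain, each spaced
    retry_check_interval minutes apart.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_check_interval


# Example: 4 attempts, retried every 1 minute.
print(minutes_until_hard_state(4, 1))  # 3
```

Under those assumptions, an outage that clears within 1 or 2 minutes would recover before the hard state is reached, so no notification would go out.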

hope this helps,
dean

-----Original Message-----
From: Kevin Miller [mailto:kmiller at inflow.com]
Sent: Monday, October 21, 2002 6:53 PM
To: 'Bishop, Dean'
Subject: RE: [Nagios-users] Host Check Clarification


Thanks, that is what I assumed.  What I am actually looking for is a way to
suppress host down alerts from notifying me so quickly.  I am monitoring
hosts across the internet and therefore cannot control everything.  Very
often there will be a temporary routing problem that will clear up after 1
or 2 mins.  I would like nagios to keep trying for a few mins before paging
me.  


Any ideas?

Thanks

-----Original Message-----
From: Bishop, Dean [mailto:dean.bishop at tcdsb.org] 
Sent: Monday, October 21, 2002 3:03 PM
To: 'Kevin Miller '; 'nagios-users at lists.sourceforge.net '
Subject: RE: [Nagios-users] Host Check Clarification



i am away from my docs right now but here is how it works.


if a service check, any service check (this would include the first of
many) returns a Non-OK status, then the host is checked.

if the host checks OK, then the services are scheduled for check using the
service's retry_check_interval.  If the service stays Non-OK until
max_check_attempts, then the service notification is sent.

if the host check is Non-OK, then the host is pounded.  If it stays Non-OK
until max_attempts (for the host) then the host notification is sent.

under both of these circumstances the service is now rescheduled at its
normal_check_interval.

the difference is that if the host is down, then service notifications are
squelched.
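
The flow described above can be sketched in a few lines of Python. This is only a simplified model of the description in this message, not Nagios code: the function name and arguments are hypothetical, True stands for an OK result, and the first failing service check is assumed to count as attempt 1.

```python
def notification_for(host_results, service_retry_results,
                     host_max_attempts, service_max_attempts):
    """Return "host", "service", or None for one failing episode.

    host_results: results of the host checks triggered by the first
        Non-OK service check (True = OK).
    service_retry_results: results of the service retries that follow
        the first Non-OK service check (True = OK).
    """
    # The host is checked up to host_max_attempts times; one OK
    # result is enough to consider it up.
    host_up = any(list(host_results)[:host_max_attempts])
    if not host_up:
        # Host is down: the host notification is sent and service
        # notifications are squelched.
        return "host"

    # Host is up: the service is retried. The first failure was
    # attempt 1, so service_max_attempts - 1 retries remain.
    retries = list(service_retry_results)[:service_max_attempts - 1]
    if any(retries):
        return None  # service recovered before going hard
    return "service"


print(notification_for([False] * 10, [], 10, 3))            # host
print(notification_for([False, True], [False, False], 10, 3))  # service
print(notification_for([True], [False, True], 10, 3))        # None
```

The sketch leaves out scheduling entirely; it only shows which notification, if any, results from a given run of check results.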



later,
dean


-----Original Message-----
From: Kevin Miller
To: nagios-users at lists.sourceforge.net
Sent: 10/21/2002 4:14 PM
Subject: [Nagios-users] Host Check Clarification

Looking for some clarification on Nagios Host checking.  I am monitoring
the SSH service on multiple hosts, from what I understand when the SSH
service check has problems, Nagios then tries to do a Host check.  
 
From the documentation:
"One instance where Nagios checks the status of a host is when a service
check results in a non-OK status. Nagios checks the host to decide
whether or not the host is up, down, or unreachable. If the first host
check returns a non-OK state, Nagios will keep pounding out checks of
the host until either (a) the maximum number of host checks (specified
by the max_attempts option in the host definition) is reached or (b) a
host check results in an OK state. "
 
The documentation states that Nagios dedicates all resources to checking
this host and then sends a notification that the host is down.  The part
that seems a little strange to me is that often I will get a Host Down
notification while Nagios is still doing test 1 out of 3 for the SSH
service.  I have my max_attempts set to 10 for each host; what is the
interval between these attempts?  Is there any way to tell Nagios to
perform host checks that are a certain interval apart (just like in
service checks) before sending a notification?  
 
 
Thanks
 
  
 
 

