Service Alerts and Notifications

wnorth wnorth at verizon.net
Sat Jan 6 01:19:23 CET 2007


That is actually interesting: when the host goes down I see a HARD host
alert as follows:

HOST ALERT: ebro;DOWN;HARD;5;CRITICAL - Host Unreachable (10.0.33.8)

But for the check_http I only see the following:

SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
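Since notifications only go out once a problem reaches a HARD state, it helps
to pull the state type and attempt number out of these log lines. Below is a
minimal Python sketch that does this for the two alert formats shown in this
thread. The names parse_alert and Alert are made up for illustration and are
not part of Nagios; the field order is assumed from the log lines quoted here.

```python
import re
from typing import NamedTuple, Optional


class Alert(NamedTuple):
    kind: str               # "HOST" or "SERVICE"
    host: str
    service: Optional[str]  # None for host alerts
    state: str              # e.g. DOWN, UP, CRITICAL
    state_type: str         # SOFT or HARD
    attempt: int
    output: str


# Optional "[epoch] " prefix, then "HOST ALERT: ..." or "SERVICE ALERT: ...".
_ALERT_RE = re.compile(r"^(?:\[\d+\]\s*)?(HOST|SERVICE) ALERT: (.*)$")


def parse_alert(line: str) -> Optional[Alert]:
    """Parse one alert line; return None if it does not look like an alert."""
    match = _ALERT_RE.match(line.strip())
    if match is None:
        return None
    kind, rest = match.group(1), match.group(2)
    if kind == "SERVICE":
        # host;service;state;state_type;attempt;plugin output
        parts = rest.split(";", 5)
        if len(parts) != 6:
            return None
        host, service, state, state_type, attempt, output = parts
    else:
        # host;state;state_type;attempt;plugin output
        parts = rest.split(";", 4)
        if len(parts) != 5:
            return None
        host, state, state_type, attempt, output = parts
        service = None
    if not attempt.isdigit():
        return None
    return Alert(kind, host, service, state, state_type, int(attempt), output)


if __name__ == "__main__":
    samples = [
        "HOST ALERT: ebro;DOWN;HARD;5;CRITICAL - Host Unreachable (10.0.33.8)",
        "SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused",
    ]
    for sample in samples:
        alert = parse_alert(sample)
        print(alert.kind, alert.state_type, alert.attempt)
```

For the two lines above, the host alert comes back as HARD on attempt 5 and
the service alert as SOFT on attempt 3, which is why only the host alert
produced an email.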

Once I changed the retry interval to 1 and the max attempts to 1 I saw the
email, so I just wasn't waiting long enough...makes sense. Ideally I would
want it to try 3 times in a row, send an email if all three fail, then wait
5 minutes and try again.

For that to work I tried the following: 
max_check_attempts 3
retry_check_interval 5
normal_check_interval 5

This should force it to try 3 times before setting a HARD alert and wait 5
minutes between normal checks. However, that is not what it does. It appears
that retry_check_interval is the wait between checks while the service is in
a non-OK state, so if I tell it to try 3 times, it will try 3 times and wait
5 minutes in between tries. If I set the retry interval to 2 it will wait 2
minutes in between tries, which comes out to a total of 6 minutes. I'd
rather it fail after a minute or so, but if I set it to 0 it will inherit a
standard minute...so the only way to solve this is to set it to a 1 minute
interval and just wait.
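
As a rough check on the arithmetic, here is a small Python sketch of how long
a service stays in SOFT state before going HARD. It assumes the first failed
check counts as attempt 1 and each later attempt comes one retry interval
after the previous one, which matches the 180-second spacing of the SOFT
alerts in the quoted log below (retry_check_interval 3). It also assumes
interval_length is at its default of 60 seconds. The function name is made up
for illustration. Measured this way the delay is (max_check_attempts - 1)
retry intervals, a little less than attempts times interval.

```python
def seconds_until_hard(max_check_attempts: int,
                       retry_check_interval: float,
                       interval_length: int = 60) -> float:
    """Seconds from the first failed check to the check that sets HARD.

    Assumes attempt 1 is the first failed check and every further attempt
    is scheduled one retry interval after the previous one.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    retries = max_check_attempts - 1
    return retries * retry_check_interval * interval_length


if __name__ == "__main__":
    # (max_check_attempts, retry_check_interval) pairs discussed in the thread
    for attempts, retry in [(5, 3), (3, 5), (3, 2), (3, 1), (1, 1)]:
        minutes = seconds_until_hard(attempts, retry) / 60
        print(f"max_check_attempts={attempts} retry_check_interval={retry}: "
              f"{minutes:g} min after the first failure")
```

Under these assumptions, 3 attempts with a 1 minute retry interval reach HARD
about 2 minutes after the first failed check, and max_check_attempts 1 goes
HARD on the first failure. The first failure itself can still be up to one
normal_check_interval after the service actually broke.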

Sound about right?

-----Original Message-----
From: Josh Yost [mailto:Josh.Yost at epsiia.com] 
Sent: Friday, January 05, 2007 3:56 PM
To: wnorth at verizon.net
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications

Hi,
	This is kind of stupid/obvious, but

a) I don't see a HARD service alert in your log snip for the service.
Did it actually get to that state?  Your retry interval is 3 min, so it
would take you 15 min or so to get an alert.

b) If it did get to HARD, what was the cmd it tried to run & is that a
valid cmd?

c) Did you kill all the old processes and restart Nagios w/ the new config?

I don't see anything obvious in your cfgs that wouldn't be working.

- Josh


wnorth at verizon.net wrote:
> I have set up a few host and HTTP service checks and alerts. When a host
> goes down I receive an email, but when the check_http service fails (e.g.
> the TCP port is shut down on the web server) I see the service alert in the
> nagios.log as follows:
> 
> [1168038639] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ebro;Website App Server MS2;1168038636
> [1168038644] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;1;Connection refused
> [1168038824] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;2;Connection refused
> [1168039004] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
> 
> But I do not receive an email. The following service is defined:
> 
> define service{
>         host_name               ebro
>         service_description     Website App Server MS2
>         check_command           check_http_fitness_app
>         max_check_attempts      5
>         normal_check_interval   5
>         retry_check_interval    3
>         check_period            24x7
>         contact_groups          jboss-admins
>         notification_interval   30
>         notification_period     24x7
>         notification_options    w,u,c,r,f
> }
> 
> The following contact group is set up for jboss-admins:
> 
> define contactgroup{
>  contactgroup_name jboss-admins
>  alias JBoss Administrators
>  members wnorth
> }
> 
> The following contact is set up for wnorth:
> define contact{
>         contact_name                    wnorth
>         alias                           Wes North
>         service_notification_period     24x7
>         host_notification_period        24x7
>         service_notification_options    w,u,c,r,f
>         host_notification_options       d,u,r,f
>         service_notification_commands   notify-by-email
>         host_notification_commands      host-notify-by-email
>         email                           wnorth at verizon.net
> }
> 
> If I bring a host offline I see the following alert in the nagios.log:
> 
> [1168037707] HOST NOTIFICATION: wnorth;ebro;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.33.8)
> [1168037767] HOST ALERT: ebro;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.40 ms
> [1168037767] HOST NOTIFICATION: wnorth;ebro;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.40 ms
> 
> But if I bring a web service offline it fails to email me. I don't know
> why; as far as I can tell I have specified everything correctly. Any
> insight would be much appreciated.
> 
> -Wes
> 
> 
> -------------------------------------------------------------------------
> Take Surveys. Earn Cash. Influence the Future of IT
> Join SourceForge.net's Techsay panel and you'll get the chance to share your
> opinions on IT & business topics through brief surveys - and earn cash
> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null







