Service Alerts and Notifications

wnorth wnorth at verizon.net
Sat Jan 6 02:39:36 CET 2007


It does, thanks much. It makes perfect sense; I didn't even realize that one
can specify the interval in seconds. I am pretty impressed with Nagios as
is. Compared to things like Netcool or Topaz it has quite a ways to go, but
for simple checks, and even advanced scripting, it is very powerful.

Thanks again,

-Wes

-----Original Message-----
From: Andy Shellam (Mailing Lists)
[mailto:andy.shellam-lists at mailnetwork.co.uk] 
Sent: Friday, January 05, 2007 4:34 PM
To: wnorth
Cc: 'Josh Yost'; nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] Service Alerts and Notifications

Don't know if it helps, but I check my services every 5 minutes.
If a check fails, it retries 3 times at 30 second intervals (so a max. of
1.5 minutes), then it sends me an e-mail/SMS (because the service switches
to a HARD state).

What this will do...

max_check_attempts 3
retry_check_interval 5
normal_check_interval 5


...is check your service every 5 minutes. If it goes off-line, Nagios
sets it to a SOFT fail, waits 5 minutes and checks again. If it still
fails (2nd SOFT fail), it waits another 5 minutes and checks again, and if
it fails a third time you get a HARD fail. So if the service is down, you
may not find out for up to 15 minutes.
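
To make the SOFT/HARD sequence concrete, here is a rough Python sketch (not
Nagios code) that walks a service which stays down through its check
attempts. The function name is made up for illustration, and it assumes the
default interval_length of 60 seconds (so intervals are minutes) and ignores
scheduling jitter.

```python
def failure_timeline(max_check_attempts, retry_check_interval):
    """Return (minute, state_type, attempt) tuples for a service that stays down.

    Minute 0 is the first failed check. Every attempt before the last one is
    a SOFT state; the attempt that reaches max_check_attempts is HARD.
    """
    timeline = []
    for attempt in range(1, max_check_attempts + 1):
        minute = (attempt - 1) * retry_check_interval
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        timeline.append((minute, state_type, attempt))
    return timeline


# The settings discussed above: 3 attempts, 5 minutes between retries.
for minute, state_type, attempt in failure_timeline(3, 5):
    print(f"t+{minute:>2} min  CRITICAL;{state_type};{attempt}")
```

With these settings the HARD state arrives 10 minutes after the first failed
check. The outage itself may have started up to 5 minutes before that check,
which is where the 15 minute worst case comes from.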

What might be better is:

max_check_attempts 3
retry_check_interval 1
normal_check_interval 5


This will check your service every 5 minutes. If it fails, it'll re-check
at 1 minute intervals, up to 3 checks in total, so you'll get notified if
it's still down about 2 minutes after the first failed check.

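What you can also do is work in seconds. The check intervals are counted in
units of interval_length (a nagios.cfg setting, 60 seconds by default), so
if you set interval_length=1 you can use:

max_check_attempts 3
retry_check_interval 5
normal_check_interval 300

This tells Nagios to wait 5 seconds between re-checks of a non-OK service,
and 300 seconds (5 minutes) between normal checks. Every other interval in
your config, such as notification_interval, is then in seconds as well.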

You could of course also set "max_check_attempts" to 2 and 
"retry_check_interval" to 1 - so the first-time it fails, it waits a 
minute then checks again - and if it still fails you get a notification, 
so in theory you only get a minute's lag.
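
If you want to compare the options side by side, a small Python sketch like
this one does the arithmetic. It is only an illustration: the function name
is hypothetical, intervals are assumed to be in minutes, check execution
time is ignored, and it assumes the notification goes out as soon as the
service reaches a HARD state.

```python
def minutes_until_hard(max_check_attempts, retry_check_interval,
                       normal_check_interval):
    """Return (best_case, worst_case) minutes from outage start to HARD state.

    Best case: the outage starts just before a scheduled check.
    Worst case: it starts just after one, so a full normal interval
    passes before the first failed check.
    """
    retry_time = (max_check_attempts - 1) * retry_check_interval
    return retry_time, retry_time + normal_check_interval


# (max_check_attempts, retry_check_interval, normal_check_interval)
candidates = [(3, 5, 5), (3, 1, 5), (2, 1, 5)]
for attempts, retry, normal in candidates:
    best, worst = minutes_until_hard(attempts, retry, normal)
    print(f"attempts={attempts} retry={retry} normal={normal}: "
          f"{best} to {worst} min")
```

The retries only add (max_check_attempts - 1) * retry_check_interval on top
of the first failed check, so lowering the retry interval shortens the lag
much more cheaply than checking more often.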

hope this random rambling works for you :)

Andy.

wnorth wrote:
> That is actually interesting. When the host goes down I see a HARD host
> alert as follows:
>
> HOST ALERT: ebro;DOWN;HARD;5;CRITICAL - Host Unreachable (10.0.33.8)
>
> But for the check_http I only see the following:
>
> SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
>
> Once I changed the retry interval to 1 and the max attempts to 1 I saw the
> email, so I just wasn't waiting long enough...makes sense. In theory I would
> want it to try 3 times in a row, if it fails send an email, then wait 5
> minutes and retry again.
>
> For that to work I tried the following: 
> max_check_attempts 3
> retry_check_interval 5
> normal_check_interval 5
>
> This should force it to try 3 times before setting a HARD alert and wait 5
> minutes between normal checks. However, that is not what it does: it
> appears to use retry_check_interval as the wait between checks of a non-OK
> service, so if I tell it to try 3 times, it will try 3 times and wait 5
> minutes between tries. If I set the retry interval to 2 it will wait 2
> minutes between tries, which comes out to a total of 6 minutes. I'd rather
> it fail after a minute or so. If I set it to 0 it will inherit a standard
> minute, so the only way to solve this is to set it to a 1 minute interval
> and just wait.
>
> Sound about right?
>
> -----Original Message-----
> From: Josh Yost [mailto:Josh.Yost at epsiia.com] 
> Sent: Friday, January 05, 2007 3:56 PM
> To: wnorth at verizon.net
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Service Alerts and Notifications
>
> Hi,
> 	This is kind of stupid/obvious, but
>
> a) I don't see a HARD service alert in your log snip for the service.
> Did it actually get to that state?  Your retry interval is 3 min, so it
> would take you 15 min or so to get an alert.
>
> b) If it did get to HARD, what was the cmd it tried to run & is that a
> valid cmd?
>
> c) Did you kill all the old processes and restart Nagios w/ the new config?
>
> I don't see anything obvious in your cfgs that wouldn't be working.
>
> - Josh
>
>
> wnorth at verizon.net wrote:
>> I have set up a few host and HTTP service checks and alerts. When a host
>> goes down I receive an email, but when the check_http service fails (e.g.
>> the TCP port is shut down on the web server) I see the service alert in the
>> nagios.log as follows:
>>
>> [1168038639] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ebro;Website App Server MS2;1168038636
>> [1168038644] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;1;Connection refused
>> [1168038824] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;2;Connection refused
>> [1168039004] SERVICE ALERT: ebro;Website App Server MS2;CRITICAL;SOFT;3;Connection refused
>>
>> But I do not receive an email. The following service is defined:
>>
>> define service{
>>         host_name               ebro
>>         service_description     Website App Server MS2
>>         check_command           check_http_fitness_app
>>         max_check_attempts      5
>>         normal_check_interval   5
>>         retry_check_interval    3
>>         check_period            24x7
>>         contact_groups          jboss-admins
>>         notification_interval   30
>>         notification_period     24x7
>>         notification_options    w,u,c,r,f
>> }
>>
>> The following contact group is set up for jboss-admins:
>>
>> define contactgroup{
>>  contactgroup_name jboss-admins
>>  alias JBoss Administrators
>>  members wnorth
>> }
>>
>> The following contact is set up for wnorth:
>> define contact{
>>         contact_name                    wnorth
>>         alias                           Wes North
>>         service_notification_period     24x7
>>         host_notification_period        24x7
>>         service_notification_options    w,u,c,r,f
>>         host_notification_options       d,u,r,f
>>         service_notification_commands   notify-by-email
>>         host_notification_commands      host-notify-by-email
>>         email                           wnorth at verizon.net
>> }
>>
>> If I bring a host offline I see the following alert in the nagios.log:
>>
>> [1168037707] HOST NOTIFICATION: wnorth;ebro;DOWN;host-notify-by-email;CRITICAL - Host Unreachable (10.0.33.8)
>> [1168037767] HOST ALERT: ebro;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.40 ms
>> [1168037767] HOST NOTIFICATION: wnorth;ebro;UP;host-notify-by-email;PING OK - Packet loss = 0%, RTA = 0.40 ms
>>
>> But if I bring a web service offline it fails to email me. I don't know
>> why, I have specified everything correctly. Any insight would be much
>> appreciated.
>>
>> -Wes
>>
>>
>> -------------------------------------------------------------------------
>> Take Surveys. Earn Cash. Influence the Future of IT
>> Join SourceForge.net's Techsay panel and you'll get the chance to share your
>> opinions on IT & business topics through brief surveys - and earn cash
>> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
>> ::: Messages without supporting info will risk being sent to /dev/null


-- 
Andy Shellam
NetServe Support Team

the Mail Network
"an alternative in a standardised world"

p: +44 (0) 121 288 0832/0839
m: +44 (0) 7818 000834



