How to troubleshoot when not receiving alerts?

Marc Powell marc at ena.com
Fri Jul 25 06:12:55 CEST 2008


On Jul 24, 2008, at 4:59 PM, John Oliver wrote:

> No, nothing is getting logged.  But then, there are very few logs
> compared to the number of hosts / services it's monitoring... it looks
> like only emails are being logged.  I looked in nagios.cfg for a logging
> level type of option, but no dice.

There are several variables that control logging in nagios.cfg. Look
for the log_ and debug_ options in the main configuration file
documentation (http://nagios.sourceforge.net/docs/3_0/configmain.html).
I believe the default configuration logs initial states, hard state
changes, event handlers and notifications.
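A quick way to see which of those options a given nagios.cfg actually sets is to pull out every log_ and debug_ line. Below is a minimal Python sketch; the function name logging_settings is hypothetical, and it assumes the simple key=value layout of nagios.cfg with '#' comment lines.

```python
def logging_settings(cfg_text: str) -> dict[str, str]:
    """Collect the log_* and debug_* directives from nagios.cfg text.

    Blank lines and '#' comment lines are skipped. If a directive appears
    more than once, this sketch simply keeps the last value it sees.
    """
    settings: dict[str, str] = {}
    for raw_line in cfg_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith(("log_", "debug_")):
            settings[key] = value.strip()
    return settings


sample = """
# Log notifications sent to contacts
log_notifications=1
log_service_retries=1
interval_length=60
debug_level=0
"""
print(logging_settings(sample))
# {'log_notifications': '1', 'log_service_retries': '1', 'debug_level': '0'}
```

In practice you would pass it the contents of your own nagios.cfg (for example, open(path).read()).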

> It was working yesterday.  I was getting emails from this plugin every
> 24 minutes (notification_interval was 1440).  They were all errors.  I

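Unless you've changed interval_length from its default of 60 (seconds),
all _interval parameters are in minutes, not seconds. A
notification_interval of 1440 should therefore mean one notification
every 24 hours, so getting one every 24 minutes seems strange.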
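To make the unit arithmetic concrete, here is a minimal Python sketch (the helper name interval_to_timedelta is hypothetical) that multiplies an _interval value by interval_length, assuming the default of 60 seconds per unit.

```python
from datetime import timedelta


def interval_to_timedelta(value: float, interval_length: int = 60) -> timedelta:
    """Convert a Nagios *_interval value to wall-clock time.

    Each unit of an *_interval setting lasts interval_length seconds
    (60 by default), so with the default one unit is one minute.
    """
    return timedelta(seconds=value * interval_length)


print(interval_to_timedelta(1440))  # 1 day, 0:00:00
print(interval_to_timedelta(60))    # 1:00:00
```

Only with interval_length=1 would a value of 1440 come out as 24 minutes.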

> thought I had the errors fixed... the last email I got said RECOVERED
> (even though I should be getting CRITICAL alerts, as there is 1% disk
> space left).  I changed the notification_interval, and never saw another
> email.

Does the web interface show the status as CRITICAL? If you received a  
recovery notification the service was considered to be OK. What did  
you fix?

> This AM, I set notification_interval to 60.  I should get an email every
> minute.  I'm not.  And, yes, I'm restarting nagios ;-)
>
> Here's the stanza in services.cfg:
>
> define service{
>        use                             generic-service         ; Name of service template to use
>        host_name                       ftp
>        service_description             Disk Space
>        is_volatile                     0
>        check_period                    normalbusinesshours
>        max_check_attempts              3
>        normal_check_interval           120
>        retry_check_interval            10
>        contact_groups                  FTP_Alerts
>        notification_interval           60
>        notification_period             normalbusinesshours
>        notification_options            w,u,c,r
>        check_command                   check_remote_disk1
>        register                        1
>        }

Having notification_interval < normal_check_interval might be
problematic. I am under the distinct impression that notification
logic only runs after a check of the host/service. If so,
re-notifications could go out no more often than the service is
checked: every 120 minutes here, not every 60. I don't have convenient
access to the source right now to verify, though.
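That hypothesis can be illustrated with a toy Python model. It assumes, unverified as noted above, that notification logic runs only right after a check, and it ignores notification periods, escalations and everything else Nagios actually considers; renotification_times is a hypothetical helper, not Nagios code.

```python
def renotification_times(check_interval: int, notification_interval: int,
                         horizon: int) -> list[int]:
    """Minutes at which notifications would go out for a service stuck in
    a non-OK hard state, ASSUMING notifications are only evaluated right
    after a check. The first notification is sent at t=0.
    """
    sent = [0]
    for t in range(check_interval, horizon + 1, check_interval):
        # Re-notify only if enough time has passed since the last one.
        if t - sent[-1] >= notification_interval:
            sent.append(t)
    return sent


# Values from the stanza above: checks every 120 min, notifications every 60.
print(renotification_times(120, 60, 480))  # [0, 120, 240, 360, 480]
```

Under this assumption the effective re-notification interval is the larger of the two settings, rounded up to a multiple of the check interval.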

Additionally, this service is not volatile (is_volatile 0, as most
services are). Nagios will only send a notification for it on a hard
state _change_ unless some escalation definition applies to it. This
is expected behavior.
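Here is a simplified Python model of hard state changes. It is a sketch, not Nagios's actual algorithm: it only counts consecutive non-OK results against max_check_attempts and ignores retry scheduling, volatility, flapping, re-notification and escalations. The name hard_state_changes is hypothetical.

```python
def hard_state_changes(results: list[str], max_check_attempts: int) -> list[tuple[int, str]]:
    """Return (check_number, new_state) for every hard state change.

    Simplified model: a non-OK result becomes a hard state once
    max_check_attempts consecutive non-OK results have been seen; an OK
    result after a hard non-OK state is a hard recovery. Soft states that
    recover before reaching max_check_attempts produce no change.
    """
    changes = []
    hard_state = "OK"
    consecutive_failures = 0
    for number, state in enumerate(results, start=1):
        if state == "OK":
            consecutive_failures = 0
            if hard_state != "OK":
                hard_state = "OK"
                changes.append((number, "OK"))
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts and state != hard_state:
            hard_state = state
            changes.append((number, state))
    return changes


# A check that keeps returning OK never produces a hard state change:
print(hard_state_changes(["OK"] * 5, max_check_attempts=3))  # []

# Three CRITICALs in a row go hard; the later OK is a recovery:
print(hard_state_changes(["CRITICAL", "CRITICAL", "CRITICAL", "CRITICAL", "OK"], 3))
# [(3, 'CRITICAL'), (5, 'OK')]
```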

> And I can check the remote system from the command line:
>
> [root@cerberus ~]# /usr/lib/nagios/plugins/check_nrpe -H ftp -c check_disk
> DISK OK - free space: / 2321 MB (1% inode=99%);| /=133114MB;142786;142796;0;142806

We'd have to see the actual command definition for check_disk in
nrpe.conf on the remote host, but it seems you've configured it so
that 1% free disk space counts as OK. Could it be that you specified
your warning and critical levels as absolute amounts rather than
percentages? That's an easy mistake to make. Also, as a general rule
you shouldn't test Nagios plugins as root. It's common, though not
likely in this case, to see different results because of the
difference in privilege levels between the nagios user and root.
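The performance data in the check_nrpe output hints at this. Assuming the usual check_disk layout (label=used;warn;crit;min;max, with thresholds expressed as used space in the same unit as the value, MB here), the thresholds work out to 142806 - 142786 = 20 MB and 142806 - 142796 = 10 MB free. The Python sketch below does that arithmetic; parse_disk_perfdata and DiskThresholds are hypothetical names, and the sketch does not handle Nagios range syntax.

```python
import re
from typing import NamedTuple


class DiskThresholds(NamedTuple):
    label: str
    used: float              # used space, in the perfdata unit (MB here)
    free_at_warning: float   # free space at which WARNING would trigger
    free_at_critical: float  # free space at which CRITICAL would trigger


def parse_disk_perfdata(item: str) -> DiskThresholds:
    """Parse one check_disk perfdata item like '/=133114MB;142786;142796;0;142806'.

    Assumes plain numeric thresholds (no range syntax) stated as used space.
    """
    label, _, fields = item.partition("=")
    value, warn, crit, _minimum, maximum = fields.split(";")[:5]
    # Strip the unit of measure (e.g. "MB") from the value.
    used = float(re.match(r"[-0-9.]+", value).group())
    total = float(maximum)
    return DiskThresholds(label, used, total - float(warn), total - float(crit))


print(parse_disk_perfdata("/=133114MB;142786;142796;0;142806"))
# DiskThresholds(label='/', used=133114.0, free_at_warning=20.0, free_at_critical=10.0)
```

Thresholds of roughly 20 MB and 10 MB free would explain why a volume with about 1% free still reports DISK OK.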

> Yes, I just noticed the discrepancy between contact_groups in
> services.cfg and hosts.cfg.  I doubt that's the issue, as I was getting
> emails yesterday.

It seems to me you're not receiving notifications because no hard
state changes are occurring: the check keeps returning OK. Given that
configuration, this is the expected behavior.

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null