Nagios Hang?

Mike Koponick mkoponick at redhawk.info
Wed Feb 15 17:32:35 CET 2006


Marc,

I double-checked the disk space last night, thinking that might be the
issue, but I have plenty of space:

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3             109G   70G   33G  68% /
/dev/hda1              99M   28M   66M  30% /boot

As for the processes, I also thought of that scenario. All were killed
prior to restarting. I'm going to build a version of nagios with
debugging turned on this morning and run it.

Thanks!

Mike

Here are a couple of samples of my hosts/services from the sensor:

############################################################################

define  host {
        host_name                       Switch-35
        alias                           Switch-35
        address                         10.xx.xx.xx
        hostgroups                      Company_Switches
        max_check_attempts              10
        check_interval                  1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        obsess_over_host                1
        check_freshness                 0
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           2
        notification_period             24x7
        notification_options            d,u,r
        notifications_enabled           0
        register                        1
        }

############################################################################

############################################################################
define  service {
        hostgroup_name                  Company_Switches
        service_description             check_ping
        is_volatile                     1
        check_command                   check_ping!150.0,20%!200.0,60%
        max_check_attempts              2
        normal_check_interval           1
        retry_check_interval            1
        passive_checks_enabled          0
        active_checks_enabled           1
        check_period                    24x7
        parallelize_check               0
        obsess_over_service             1
        check_freshness                 0
        event_handler_enabled           0
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           99
        notification_period             24x7
        notification_options            w,u,c,r,f
        notifications_enabled           0
        register                        1
        }

############################################################################


Hosts/Services from the Central Server:

############################################################################

define  host {
        host_name                       Switch-35
        alias                           Switch-35
        address                         10.xx.xx.xx
        hostgroups                      Company_Switches
        max_check_attempts              1
        check_interval                  1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        obsess_over_host                1
        check_freshness                 0
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               0
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           1
        notification_period             24x7
        notification_options            d,u,r
        notifications_enabled           1
        register                        1
        }

############################################################################

############################################################################

define  service {
        hostgroup_name                  Company_Switches
        service_description             check_ping
        is_volatile                     1
        check_command                   check_stale
        max_check_attempts              1
        normal_check_interval           2
        retry_check_interval            1
        active_checks_enabled           0
        passive_checks_enabled          1
        check_period                    24x7
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 2
        freshness_threshold             660
        event_handler_enabled           1
        low_flap_threshold              0
        high_flap_threshold             0
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  Support
        notification_interval           0
        notification_period             24x7
        notification_options            w,u,c,r
        notifications_enabled           1
        register                        1
        }

############################################################################


-----Original Message-----
From: nagios-users-admin at lists.sourceforge.net
[mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Marc Powell
Sent: Wednesday, February 15, 2006 8:21 AM
To: Nagios Users
Subject: RE: [Nagios-users] Nagios Hang?



> -----Original Message-----
> From: nagios-users-admin at lists.sourceforge.net
> [mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Mike Koponick
> Sent: Wednesday, February 15, 2006 10:10 AM
> To: Nagios Users
> Subject: [Nagios-users] Nagios Hang?
>
> I'm running Nagios 2.0 (Stable) on Red Hat 9.0 in a distributed
> environment. I'm using NSCA to send check results, and all appears to
> be working properly.
>
> I'm running into several issues that seem to have started "all of a
> sudden".
>
> 1) On my distributed server, I no longer see syslog messages, with the
> exception of "INITIAL SERVICE STATE" messages. Syslog is working, and
> nagios.cfg contains "use_syslog=1". I used to see all the check
> messages, etc. Nothing in the configuration has changed, to the best of
> my knowledge.
> 

Make sure you haven't run out of disk space. Verify your log_ settings
in nagios.cfg.
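Both checks can be scripted. The hypothetical Python sketch below parses nagios.cfg as plain key=value lines, prints use_syslog and every log_* option, and reports free space on the filesystem holding the log file. The path /usr/local/nagios/etc/nagios.cfg is an assumption (a common default for source builds); the function names are illustrative only.

```python
import shutil
from pathlib import Path


def parse_main_config(text):
    """Parse key=value lines from nagios.cfg text.

    Blank lines and '#' comments are skipped. Repeated keys (such as
    cfg_file) keep only their last value, which is fine for the
    single-valued logging options inspected here.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def read_main_config(path):
    """Read and parse a nagios.cfg file from disk."""
    return parse_main_config(Path(path).read_text())


def logging_settings(settings):
    """Return use_syslog plus every log_* option, ordered by name."""
    return {k: settings[k] for k in sorted(settings)
            if k == "use_syslog" or k.startswith("log_")}


def free_gib(directory):
    """Free space, in GiB, on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free / 2**30


if __name__ == "__main__":
    cfg_path = Path("/usr/local/nagios/etc/nagios.cfg")  # hypothetical location
    if not cfg_path.exists():
        print(f"{cfg_path} not found; edit cfg_path to point at your nagios.cfg")
    else:
        cfg = read_main_config(cfg_path)
        for key, value in logging_settings(cfg).items():
            print(f"{key} = {value}")
        # Fall back to the config directory if log_file is not set.
        log_dir = Path(cfg.get("log_file", str(cfg_path))).parent
        print(f"free space near {log_dir}: {free_gib(log_dir):.1f} GiB")
```

Diffing this output against a copy saved while syslog output was still working would show whether any logging option changed.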
 
> 
> 2) Nagios appears to "hang" on the remote sensor. Once I receive
> notifications that network devices are down, I never see a recovery,
> even though the devices have recovered. The workaround is to restart
> nagios with "service nagios restart"; sometimes this takes multiple
> tries.

Could be related to multiple nagios processes, as below: one daemon sees
the down state and another sees the up. What have you verified so far?
I'd check disk space, use strace to see what the daemon is doing, turn up
logging as much as possible for both nagios and nsca, and watch the logs.
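To test the multiple-daemon theory, the hypothetical Python sketch below lists the PIDs whose command name is exactly nagios by reading /proc/<pid>/stat. It is Linux-only and assumes the binary is named nagios; /proc/<pid>/stat is used because it also exists on 2.4-era kernels such as Red Hat 9's, unlike /proc/<pid>/comm.

```python
import os


def comm_from_stat(stat_line):
    """Extract the command name from a /proc/<pid>/stat line.

    The name is wrapped in parentheses and may itself contain spaces
    or ')', so split on the first '(' and the *last* ')'.
    """
    start = stat_line.index("(") + 1
    end = stat_line.rindex(")")
    return stat_line[start:end]


def find_processes(name, proc_root="/proc"):
    """Return sorted PIDs whose command name equals `name`."""
    if not os.path.isdir(proc_root):
        return []  # no procfs (non-Linux system)
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries like /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                if comm_from_stat(f.read()) == name:
                    pids.append(int(entry))
        except (OSError, ValueError):
            # The process exited between listdir() and open(), or its
            # stat line could not be parsed; ignore it.
            continue
    return sorted(pids)


if __name__ == "__main__":
    pids = find_processes("nagios")
    if len(pids) > 1:
        print(f"WARNING: {len(pids)} nagios processes: {pids}")
    else:
        print(f"nagios processes: {pids}")
```

The most useful moment to run it is right after "service nagios stop": any PID still listed is a leftover daemon to kill before restarting. While Nagios is running, it forks children to execute checks, so a brief extra PID is not by itself proof of a second daemon; repeat the scan before drawing conclusions.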
 
> 3) When I have a massive network outage, I receive the appropriate
> alerts, but I receive multiple "PROBLEM" notifications. I'm only using
> service checks (currently only check_ping), with notification_interval
> set to "0", which according to the documentation should limit the
> number of notifications I receive to one, unless I'm using service
> escalations, which I am not at this time. I am not receiving multiple
> notifications for "OK" messages, which is what I would expect.

Without seeing any example host and service config information, this
sounds very much like you might have multiple nagios daemons running at
the same time. Stop nagios, make sure they're _all_ stopped, and restart
nagios.
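The notification reasoning in item 3 can be made concrete with a simplified model. The hypothetical Python class below is not Nagios source code; NotificationModel and run are illustrative names. It encodes three documented rules as understood here: a state change notifies, notification_interval 0 suppresses repeats for an unchanged problem, and a volatile service notifies on every non-OK result. It assumes every result is a hard state, as with max_check_attempts 1 on the central server. Both posted check_ping definitions set is_volatile 1, so the model includes that flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class NotificationModel:
    """Simplified model of one daemon's notifications for one service.

    Every result is treated as a HARD state. Not derived from Nagios
    source; it only encodes the three rules described in the lead-in.
    """
    is_volatile: bool = False
    notification_interval: int = 0       # minutes; 0 = one PROBLEM per problem
    state: str = "OK"
    last_problem: Optional[int] = None   # minute of the last PROBLEM sent
    sent: List[Tuple[int, str]] = field(default_factory=list)

    def process(self, minute: int, new_state: str) -> None:
        changed = new_state != self.state
        self.state = new_state
        if new_state == "OK":
            if changed:
                self.sent.append((minute, "RECOVERY"))
            self.last_problem = None
            return
        # A repeat is only due when a nonzero interval has elapsed.
        repeat_due = (
            self.notification_interval > 0
            and self.last_problem is not None
            and minute - self.last_problem >= self.notification_interval
        )
        if changed or self.is_volatile or repeat_due:
            self.sent.append((minute, "PROBLEM"))
            self.last_problem = minute


def run(model, results):
    """Feed (minute, state) results to the model; return what it sent."""
    for minute, state in results:
        model.process(minute, state)
    return model.sent


if __name__ == "__main__":
    outage = [(0, "CRITICAL"), (1, "CRITICAL"), (2, "CRITICAL"), (3, "OK")]
    print(run(NotificationModel(), outage))
    print(run(NotificationModel(is_volatile=True), outage))
```

With one non-volatile daemon, the three-minute outage yields a single PROBLEM and a single RECOVERY. Two independent daemons would each send that pair, duplicating the RECOVERY as well; the volatile case repeats only the PROBLEM, which in this model matches the report in item 3 more closely. It may be worth retesting with is_volatile 0 once extra daemons are ruled out.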

--
Marc


-------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc. Do you grep through log
files
for problems?  Stop!  Download the new AJAX search engine that makes
searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
http://sel.as-us.falkag.net/sel?cmd=k&kid3432&bid#0486&dat1642
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

