Attempting to monitor the "Nagios Server" itself

Marc Powell marc at ena.com
Wed Aug 13 00:33:57 CEST 2008


Please always respond on list so that others, now and in the future,  
obtain the benefit of your experience. More below...


On Aug 12, 2008, at 4:36 PM, Bret Goodfellow wrote:

> Hi Marc,
>
> Thanks for the quick response!  I added the following host definition:
>
> ######################################################################################
> # 'colorado' host definition                                                         #
> ######################################################################################
> define host{
>        use                     generic-host            ; Name of host template to use
>
>        host_name               colorado
>        alias                   colorado
>        address                 10.8.64.201
>        check_command           check-host-alive
>        contact_groups          linux-admins,linux-admins-page,oracle-admins
>        max_check_attempts      10
>        notification_interval   480
>        notification_period     24x7
>        notification_options    d,u,r
>        }
>
> After adding this definition, I noticed on the nagios monitor that I  
> have "1" pending host.  This "pending status" never changes.  The  
> pending host of course, is colorado.

Pending is normal for a host with no services. It'll never be checked.  
That's expected.

> Adding the host definition for "colorado" only, does not cause  
> nagios to fail.

Good to know.

> The failure occurs when I add the attached "services" config file.   
> If I remove the "colorado" services config file, then nagios starts  
> up and runs fine.  My belief though, is that there is something  
> wrong with the "host" definition file.

Why do you believe that? It looks OK to me. I would think the problem  
is somehow associated with the newly included services file or  
something external to nagios.

> Since the server I want to monitor is the "localhost", do I need to  
> replace the alias with the name "localhost"?

No, the alias doesn't matter, it's just for humans to know what the  
machine is. I don't see anything obviously wrong with the file*. I  
would try adding it in chunks of a few definitions at a time and see  
which one causes nagios to segfault. Jon Agliss's strace suggestion is  
good as well. I'd use
'strace -fs512 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'
myself.
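
A rough sketch of the chunked approach, assuming the host and services
are pulled in with cfg_file lines in nagios.cfg. The file names below
are made up for illustration; only the cfg_file directive itself is
real:

```
# nagios.cfg (excerpt) - file names are hypothetical
cfg_file=/usr/local/nagios/etc/colorado-host.cfg
# Split the services file into a few pieces and enable one more at a
# time, restarting nagios after each, until it segfaults again.
cfg_file=/usr/local/nagios/etc/colorado-services-1.cfg
#cfg_file=/usr/local/nagios/etc/colorado-services-2.cfg
#cfg_file=/usr/local/nagios/etc/colorado-services-3.cfg
```

Whichever piece brings the crash back can then be split again until a
single definition is left.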

* Some of the tests you're doing, just based on their names, could be  
implemented better. For example, why are you ssh'ing to the same  
machine, essentially a localhost IP (check_ssh_disk), or using snmp  
(check_snmp_storage) to check disks, when it seems that just using  
check_disk would do the same thing without the hoops and  
points-of-failure?
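
For what it's worth, a local disk check might look something like the
following. This is only a sketch: the command name check_local_disk,
the generic-service template, the service description and the 20%/10%
thresholds are assumptions, not taken from your config. $USER1$ is the
usual plugin-path macro from resource.cfg.

```
# Hypothetical command and service names; adjust to your setup
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
        }

define service{
        use                     generic-service         ; assumed template
        host_name               colorado
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }
```

This assumes generic-service already supplies the other required
service directives (check period, contact groups and so on).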

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




