Understanding check_cluster

Chris Beattie cbeattie at geninfo.com
Tue Feb 24 23:18:15 CET 2009


I need some help understanding the check_cluster plugin, please.  I'm
using version 1.4.13 of the plugins on Nagios 3.1.0, all compiled from
source on 64-bit CentOS 5.2.  We use VMware ESX clusters, and I'd like
each Nagios host that is a virtual machine to have a single parent
instead of a parent list containing every ESX host in the cluster.
Recently an ESX host was moved from one cluster to another, so I had to
change the parents of many hosts.  If there's a better way to represent
VMs and their ESX hosts, I'm open to suggestions too.

I don't have any problem running it as the Nagios user from the command
line and feeding it states, like so:

./check_cluster --host --data=0,0,2,1 --warning=0 --critical=1

CLUSTER CRITICAL: Host cluster: 2 up, 1 down, 1 unreachable

./check_cluster --host --data=0,0,0,0 --warning=0 --critical=1

CLUSTER OK: Host cluster: 4 up, 0 down, 0 unreachable

./check_cluster --host --data=0,0,0,1 --warning=0 --critical=1

CLUSTER WARNING: Host cluster: 3 up, 1 down, 0 unreachable

Adding --verbose only prints "check_cluster - Warning: start=0 end=0;
Critical: start=0 end=1" before the result.
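
The three results above are consistent with a simple rule: count the
hosts whose state is not 0 (UP), then compare that count against the
ranges printed by --verbose (warning 0..0, critical 0..1), checking
critical first. The sketch below models that rule. It is an assumption
reconstructed from the observed output, not the plugin's source, and
cluster_status is a hypothetical name.

```python
# A model of how check_cluster appears to evaluate --host data,
# reconstructed from the outputs above; NOT the plugin's actual source.
# Host state IDs: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.


def cluster_status(data: str, warn_end: int, crit_end: int) -> str:
    """Return a check_cluster-style status line for comma-separated states.

    warn_end/crit_end model the simple forms --warning=N / --critical=N,
    which --verbose reports as the range start=0 end=N.
    """
    states = [int(tok) for tok in data.split(",")]
    up = states.count(0)
    down = states.count(1)
    unreachable = states.count(2)
    problems = down + unreachable  # hosts that are not UP

    # A 0..N range is violated when the problem count exceeds N.
    # Critical is evaluated before warning.
    if problems > crit_end:
        label = "CRITICAL"
    elif problems > warn_end:
        label = "WARNING"
    else:
        label = "OK"
    return (f"CLUSTER {label}: Host cluster: "
            f"{up} up, {down} down, {unreachable} unreachable")


if __name__ == "__main__":
    for data in ("0,0,2,1", "0,0,0,0", "0,0,0,1"):
        print(cluster_status(data, warn_end=0, crit_end=1))
```

Under this model, --warning=0 --critical=1 means "warn on any host that
is not UP, go critical on two or more".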

However, whenever I use the $HOSTSTATEID$ macro, the result is always
OK, even with made-up host names:

./check_cluster --host --data=$HOSTSTATEID:duck$,$HOSTSTATEID:cow$,$HOSTSTATEID:chicken$ --warning=0 --critical=1

CLUSTER OK: Host cluster: 3 up, 0 down, 0 unreachable
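
When this command is typed at a shell prompt, Nagios is not involved:
the shell treats $HOSTSTATEID as an ordinary environment variable, which
is normally unset and so expands to an empty string. The plugin
therefore receives --data=:duck$,:cow$,:chicken$. On-demand macros such
as $HOSTSTATEID:hostname$ are expanded only by Nagios when it runs a
command. The sketch below shows why those tokens would all count as UP,
assuming (not verified against the 1.4.13 source) that check_cluster
converts each comma-separated token with C's atoi(), which returns 0
when a string does not begin with a number. c_atoi and count_states are
hypothetical helpers.

```python
# ASSUMPTION: check_cluster splits --data on commas and converts each
# token like C atoi(). This emulates that behaviour; it is not plugin code.


def c_atoi(token: str) -> int:
    """Skip leading whitespace, read an optional sign and leading decimal
    digits, and return 0 if no digits are found (as C atoi() does)."""
    s = token.lstrip()
    sign = 1
    if s[:1] in ("+", "-"):
        if s[0] == "-":
            sign = -1
        s = s[1:]
    digits = ""
    for ch in s:
        if ch not in "0123456789":
            break
        digits += ch
    return sign * int(digits) if digits else 0


def count_states(data: str) -> tuple[int, int, int]:
    """Return (up, down, unreachable) for comma-separated state tokens."""
    # Empty tokens are skipped, mimicking how C strtok() collapses them.
    values = [c_atoi(tok) for tok in data.split(",") if tok]
    return values.count(0), values.count(1), values.count(2)


if __name__ == "__main__":
    # What the plugin receives after the shell expands the unset variable:
    received = ":duck$,:cow$,:chicken$"
    print(count_states(received))
```

Quoting the argument in single quotes would stop the shell expansion,
but the plugin would then receive the literal macro text, which is still
not a state ID.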

I thought macros might work better when executed by Nagios, so I added
a check_host_cluster command and a host that uses it as its
check_command:

define command {
        command_name    check_host_cluster
        command_line    $USER1$/check_cluster --host --label=$HOSTNAME$ --warning=$ARG1$ --critical=$ARG2$ --data=$ARG3$
}

define host {
        use             linux-server
        host_name       ProductionCluster1
        alias           Production Cluster 1
        address         127.0.0.1
        parents         gisesx1,gisesx3,gisesx4
        check_command   check_host_cluster!1!2!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$
        hostgroups      nogsupport
}
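
In these definitions, Nagios splits the host's check_command on "!":
the first field names the command, and the remaining fields fill
$ARG1$, $ARG2$ and $ARG3$ in that command's command_line. The sketch
below illustrates only that substitution step. expand_args is a
hypothetical helper; Nagios' real macro processor also resolves
$USER1$, $HOSTNAME$ and on-demand macros, which this sketch leaves
untouched.

```python
# Illustrates how a "!"-separated check_command fills $ARGn$ macros.
# Hypothetical helper: only $ARGn$ is substituted, nothing else.


def expand_args(command_line: str, check_command: str) -> str:
    """Substitute $ARG1$..$ARGn$ in command_line from a check_command."""
    _command_name, *args = check_command.split("!")
    for i, value in enumerate(args, start=1):
        command_line = command_line.replace(f"$ARG{i}$", value)
    return command_line


if __name__ == "__main__":
    command_line = ("$USER1$/check_cluster --host --label=$HOSTNAME$ "
                    "--warning=$ARG1$ --critical=$ARG2$ --data=$ARG3$")
    check_command = ("check_host_cluster!1!2!"
                     "$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$")
    print(expand_args(command_line, check_command))
```

Note that this host passes --warning=1 --critical=2, not the 0 and 1
used in the command-line tests.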

The check_interval in the linux-server template is set to 3.  I assumed
the address doesn't matter, since I'm only interested in the state of
other hosts and the address isn't referenced in the check_command.

 

It shows up in the host information web page as being up, but I don't
have any hosts named foo:

Host Status:         UP (for 0d 3h 41m 9s+)
Status Information:  CLUSTER OK: ProductionCluster1: 3 up, 0 down, 0 unreachable
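
One possible explanation for this UP result, offered as an assumption
rather than confirmed Nagios behaviour: if Nagios cannot resolve an
on-demand macro because the named host does not exist, the observed
"3 up" would be consistent with it passing the macro text through
unchanged, so the plugin again receives tokens that are not state IDs.
The sketch below models resolution against a hypothetical table of
known hosts (KNOWN_HOST_STATES, with made-up states), leaves unknown
macros as literal text, and then applies an atoi-style parse that
treats non-numeric tokens as 0 (UP).

```python
import re

# ASSUMPTION: unresolved on-demand macros are passed through literally.
# KNOWN_HOST_STATES is hypothetical sample data, not real Nagios state.
KNOWN_HOST_STATES = {"gisesx1": 0, "gisesx3": 0, "gisesx4": 1}

HOSTSTATEID_MACRO = re.compile(r"\$HOSTSTATEID:([^$]+)\$")


def resolve_on_demand(text: str, host_states: dict[str, int]) -> str:
    """Replace $HOSTSTATEID:host$ with the host's state ID if it is known."""
    def substitute(match) -> str:
        host = match.group(1)
        if host in host_states:
            return str(host_states[host])
        return match.group(0)  # unknown host: keep the macro text as-is
    return HOSTSTATEID_MACRO.sub(substitute, text)


def leading_int(token: str) -> int:
    """atoi()-style parse: the leading integer, or 0 if there is none."""
    m = re.match(r"\s*([+-]?[0-9]+)", token)
    return int(m.group(1)) if m else 0


if __name__ == "__main__":
    data = "$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$"
    resolved = resolve_on_demand(data, KNOWN_HOST_STATES)
    states = [leading_int(tok) for tok in resolved.split(",")]
    print(resolved)
    print(states.count(0), "up")
```

With real host names (for example gisesx1 and gisesx4 in this table)
the same data string would resolve to numeric state IDs instead.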

I had better luck with check_icmp, but it looks like it goes straight to
CRITICAL if one host is down.

Thank you.
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users