Understanding check_cluster

Lee Azzarello lee at dropio.com
Wed Feb 25 01:03:16 CET 2009


Here's my config. It's functional:

define command{
  command_name	    check-cluster-health
  command_line	    /usr/lib/nagios/plugins/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

define service{
  service_description	check-cluster-health
  host			app-proxy
  check_command		check-cluster-health!"App Thread Health"!0!1!$SERVICESTATEID:app-1:mongrel-count$,$SERVICESTATEID:app-2:mongrel-count$,$SERVICESTATEID:app-3:mongrel-count$,$SERVICESTATEID:app-4:mongrel-count$
  use			serviceClusterTemplate
}

define service{
  service_description	mongrel-count
  hostgroup		app-servers,manager-servers
  check_command		check_nrpe_1arg!check_mongrel_count
  notifications_enabled	0
  use			serviceClusterTemplate
}
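
For what it's worth, here is how I read the -w 0 -c 1 thresholds, written as a rough Python sketch and not the plugin's actual code. It assumes check_cluster counts every state of 1, 2 or 3 as a problem, treats a bare threshold N as "alert when the problem count exceeds N", and parses each --data token the way C's atoi() would, so a token with no leading digits becomes 0. The names parse_state and cluster_status are made up for illustration.

```python
import re


def parse_state(token):
    """Assumed atoi()-like parsing: leading integer if present, else 0."""
    match = re.match(r"\s*[+-]?\d+", token)
    return int(match.group()) if match else 0


def cluster_status(data, warning, critical):
    """Return OK/WARNING/CRITICAL for a comma-separated list of state IDs.

    Simplification: any state of 1, 2 or 3 counts as a problem, and a
    threshold N means "alert when more than N members have a problem".
    """
    states = [parse_state(token) for token in data.split(",")]
    problems = sum(1 for state in states if state in (1, 2, 3))
    if problems > critical:
        return "CRITICAL"
    if problems > warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    for data in ("0,0,2,1", "0,0,0,0", "0,0,0,1", ":duck$,:cow$,:chicken$"):
        print(data, "->", cluster_status(data, warning=0, critical=1))
```

Under those assumptions the sketch gives CRITICAL, OK and WARNING for the three command-line runs quoted below. It also suggests why the made-up host names come back as "3 up": typed at a shell prompt, $HOSTSTATEID:duck$ is expanded by the shell, not by Nagios, which probably leaves something like :duck$ for the plugin to parse, and that reads as 0.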

-lee

On Tue, Feb 24, 2009 at 5:18 PM, Chris Beattie <cbeattie at geninfo.com> wrote:
> I need some help understanding the check_cluster plugin, please.  I’m using
> version 1.4.13 of the plugins on Nagios 3.10, all compiled from source on
> 64-bit CentOS 5.2.  We use VMware ESX clusters, and I’d like the hosts in
> Nagios that happen to be virtual machines to have one parent instead of a
> list of parents comprising every ESX host in the cluster.  Recently, an ESX
> host was moved from one cluster to another, so I had to change a lot of
> parents.  If there’s a better way to represent VMs and their hosts, I’m open
> to suggestions too.
>
>
>
> I don’t have any problem running it as the Nagios user from the command line
> and feeding it states, like so:
>
> ./check_cluster --host --data=0,0,2,1 --warning=0 --critical=1
>
> CLUSTER CRITICAL: Host cluster: 2 up, 1 down, 1 unreachable
>
> ./check_cluster --host --data=0,0,0,0 --warning=0 --critical=1
>
> CLUSTER OK: Host cluster: 4 up, 0 down, 0 unreachable
>
> ./check_cluster --host --data=0,0,0,1 --warning=0 --critical=1
>
> CLUSTER WARNING: Host cluster: 3 up, 1 down, 0 unreachable
>
>
>
> Adding --verbose just says “check_cluster - Warning: start=0 end=0;
> Critical: start=0 end=1” first.
>
>
>
> However, if I try anything with the $HOSTSTATEID$ macro, everything is
> always OK, even if I just make up host names:
>
> ./check_cluster --host --data=$HOSTSTATEID:duck$,$HOSTSTATEID:cow$,$HOSTSTATEID:chicken$ --warning=0 --critical=1
>
> CLUSTER OK: Host cluster: 3 up, 0 down, 0 unreachable
>
>
>
> I thought maybe macros work better when executed by Nagios, so I added a
> check_host_cluster command and a host with that as its check_command.
>
> define command {
>         command_name    check_host_cluster
>         command_line    $USER1$/check_cluster --host --label=$HOSTNAME$ --warning=$ARG1$ --critical=$ARG2$ --data=$ARG3$
> }
>
>
>
> define host {
>         use             linux-server
>         host_name       ProductionCluster1
>         alias           Production Cluster 1
>         address         127.0.0.1
>         parents         gisesx1,gisesx3,gisesx4
>         check_command   check_host_cluster!1!2!$HOSTSTATEID:foo1$,$HOSTSTATEID:foo3$,$HOSTSTATEID:foo4$
>         hostgroups      nogsupport
> }
>
>
>
> The check_interval for the linux-server template is set to 3.  I made the
> assumption that it didn’t matter what I set the address to since I’m only
> interested in the state of other hosts, and it’s not being referenced in the
> check_command.
>
>
>
> It shows up in the host information web page as being up, but I don’t have
> any hosts named foo:
>
> Host Status:          UP (for 0d 3h 41m 9s+)
> Status Information:   CLUSTER OK: ProductionCluster1: 3 up, 0 down, 0 unreachable
>
>
>
> I had better luck with check_icmp, but it looks like it goes straight to
> CRITICAL if one host is down.
>
>
> ------------------------------------------------------------------------------
> Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
> -OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
> -Strategies to boost innovation and cut costs with open source participation
> -Receive a $600 discount off the registration fee with the source code: SFAD
> http://p.sf.net/sfu/XcvMzF8H
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting
> any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
