Problem with check_cluster2?

Greg Vickers g.vickers at qut.edu.au
Thu Jan 27 06:57:57 CET 2005


Hi all,

I'm using Nagios to gather statistics on our central lab computers, and 
use that information to display availability graphs to the student body.

These lab hosts have a passive 'Lab PC available' service, which is OK
if the PC is sitting at the Ctrl-Alt-Del (C-A-D) logon screen and
CRITICAL if it is not (logged on, rebooting, got coffee spilt on the
motherboard...).

We run a service cluster check over these services to see how many hosts
are available (i.e. the PC can be logged on to). We then parse the OK
and CRITICAL counts out of the cluster check output and turn them into
a pretty web display.

I've just moved from Nagios 1.2 to 2.0b1 and upgraded to check_cluster2
(because of the new status record format). I then noticed something
strange: the numbers the cluster checks return do not add up to the
numbers of OK and CRITICAL service checks shown in the Service Status
Details view for a given lab.

Here's an example:
A201 cluster check ok: 19 ok, 0 warning, 0 unknown, 2 critical

There are 87 hosts in A201, so the cluster check should return a total
of 87 OK/CRITICAL states, but the line above accounts for only 21. The
A201 hostgroup actually has more than two CRITICAL 'Lab PC available'
services and more than 19 in an OK state.

The same happens with every other cluster check we run: at most 22
results are counted, even though many more services are listed in the
service definition and therefore passed to the cluster check (see the
example below).

I've had a squiz at the check_cluster2.c code and can't see anything
that would hiccup when more than 22 services are shoved into it. (If
there were a limit on the line length, wouldn't I expect an error, or
at least some visible truncation?)

I've done some testing and found the following:
1) For the only lab with fewer than 21 hosts (nine hosts in this lab),
the cluster check works as expected.
2) I incrementally changed the 'Lab PC available' service in another
lab to CRITICAL. As soon as I had set the first 22 services to
CRITICAL, the cluster check returned only '1 ok, 21 critical'.

So it seems that check_cluster2 only counts the status of the first 22
services, and when those 22 are all CRITICAL it reports only '1 ok, 21
critical'. Has anyone seen this problem before? Argh!


Configs (service definition sanitised; the check_command line is a
single line in the real config):
define service{
    host_name                    <cluster-host>
    service_description          Lab PC availability in CA A201
    check_command                check_service_cluster!CA-A201!88!89!$SERVICESTATEID:CA-A201-PC01:Lab PC available$,$SERVICESTATEID:CA-A201-PC02:Lab PC available$,$SERVICESTATEID:CA-A201-PC03:Lab PC available$,$SERVICESTATEID:CA-A201-PC04:Lab PC available$,<snip hosts 05-83>,$SERVICESTATEID:CA-A201-PC84:Lab PC available$,$SERVICESTATEID:CA-A201-PC85:Lab PC available$,$SERVICESTATEID:CA-A201-PC86:Lab PC available$,$SERVICESTATEID:CA-A201-PC87:Lab PC available$
    contact_groups               <contact-group>
    use                          <template>
}

define command{
    command_name check_service_cluster
    command_line $USER1$/check_cluster2 -s -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

Thanks,
-- 
Greg Vickers
Lab Monitor Project Manager
Teaching and Learning Services
Information Technology Services
Queensland University of Technology

email: g.vickers at qut.edu.au
phone: (07) 3864 8276

CIROS code: 00213J


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users