Nagios and cluster setup...few questions

Tarak Patel Tarak.Patel at ec.gc.ca
Tue Oct 9 15:32:07 CEST 2007


Hi all,

Here is a quick background of my current setup for monitoring:

I have an in-house tool monitoring clusters. The tool simply uses ssh to
launch perl scripts on remote machines, grabs all of the output, and
stores it in a logfile at a central location. This output is parsed for
pre-defined tags (WARNING/CRITICAL/ERROR). If any of these tags are
found, the message is logged using syslog. The scripts residing on the
remote hosts are a collection of perl functions, executed one after
another. Some of these functions use a status file from the previous run
to verify whether the state of items has changed since last time. Some
of them can be given a special argument to set the current state as the
default state for the next iteration of checks.
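
To make the tag scan described above concrete, here is a minimal Python
sketch. It is not the in-house tool itself (that one is written in perl);
the function names, the regular expression, and the mapping from tag to
syslog priority are all assumptions made for illustration.

```python
import re
import syslog

# The three tags come from the description above; the regex is an assumption.
TAG_PATTERN = re.compile(r"\b(WARNING|CRITICAL|ERROR)\b")

# Assumed mapping from tag to syslog priority.
PRIORITY = {
    "WARNING": syslog.LOG_WARNING,
    "CRITICAL": syslog.LOG_CRIT,
    "ERROR": syslog.LOG_ERR,
}


def find_tagged_lines(output):
    """Return (tag, line) pairs for every line that contains a known tag."""
    hits = []
    for line in output.splitlines():
        match = TAG_PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.strip()))
    return hits


def log_hits(host, hits):
    """Send each tagged line to syslog, prefixed with the host name."""
    for tag, line in hits:
        syslog.syslog(PRIORITY[tag], f"{host}: {line}")
```

Only the first tag found on a line decides its priority, which keeps the
sketch short; a real parser might want to pick the most severe one.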

Clusters are monitored from the head nodes since not all nodes are
accessible from the central location. Head node checks contain a special
function that simply uses DSH to launch checks on all nodes.

After looking at nagios and its check_cluster plugins I realized I would
really like to monitor each of the nodes individually, since I want to be
able to disable a particular check on a particular node. I also want to
be able to use status files for some of the checks. So far I have not
found any plugin that uses a status file to monitor hosts. All the
plugins simply use the current output from commands to verify the status.

I will be using active checks on the clusters, so I will configure
nrpe on all nodes. My plan of attack is to use the head node as a
gateway, with all nodes and services defined on the head node
(under nrpe). From the central location I can then execute a check_nrpe
type script to verify the backend nodes.
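
The gateway idea could look roughly like the Python sketch below: a small
relay that nrpe on the head node runs, which in turn calls check_nrpe
against a backend node and passes the result back. The install path of
check_nrpe, the function names, and the timeout values are assumptions;
only the -H, -c and -t options of check_nrpe are relied on.

```python
import subprocess

# Assumed install path of check_nrpe on the head node; adjust to the local layout.
CHECK_NRPE = "/usr/local/nagios/libexec/check_nrpe"


def build_relay_command(node, command, timeout=10):
    """Build the check_nrpe argv that the head node runs against a backend node."""
    return [CHECK_NRPE, "-H", node, "-c", command, "-t", str(timeout)]


def relay_check(node, command, timeout=10):
    """Run the check on the backend node and return (exit_code, output).

    Exit codes follow the Nagios plugin convention:
    0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
    """
    argv = build_relay_command(node, command, timeout)
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout + 5
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A relay that cannot run at all is reported as UNKNOWN, not CRITICAL.
        return 3, f"UNKNOWN: could not run check_nrpe: {exc}"
    return result.returncode, result.stdout.strip()
```

The central server would then call check_nrpe against the head node with
a command that maps to this relay. Passing the node and check names as
arguments needs command arguments enabled in nrpe on the head node
(dont_blame_nrpe=1); defining one nrpe command per node and check avoids
that, at the cost of a longer nrpe.cfg.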

I still haven't figured out how I can use status files from each
iteration of checks to validate status. I'd appreciate some input on
the best options for monitoring clusters where the backend nodes are
hidden from the central monitoring server, and also some help with the
use of state files.
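
One possible shape for a state-file check, as a minimal Python sketch
rather than an existing plugin: the current state is compared with a
baseline saved in a file, and a reset flag stores the current state as
the new baseline. The JSON file format, the function names, and the
choice to report any change as CRITICAL are assumptions.

```python
import json
import os

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin exit codes


def load_state(path):
    """Return the saved state, or None if no usable state file exists."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except (OSError, ValueError):
        return None


def save_state(path, state):
    """Write to a temporary file, then rename, so readers never see a partial file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)


def compare_with_baseline(path, current, reset=False):
    """Compare current (a dict) with the saved baseline; return (exit_code, message).

    With reset=True, or when no baseline exists yet, the current state is
    stored as the new baseline and the check reports OK.
    """
    previous = load_state(path)
    if reset or previous is None:
        save_state(path, current)
        return OK, "OK: baseline stored"
    if current == previous:
        return OK, "OK: state unchanged"
    changed = sorted(
        key for key in set(current) | set(previous)
        if current.get(key) != previous.get(key)
    )
    return CRITICAL, "CRITICAL: changed items: " + ", ".join(changed)
```

A plugin wrapper would print the message and exit with the returned
code. Because the baseline is only rewritten on reset, a changed item
keeps alerting until someone accepts the new state, which matches the
special argument described earlier.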

Thanks all,

TP.


