First rev of plugin to batch up check_by_ssh calls

Steven Grimm koreth-nagios at midwinter.com
Fri Jan 17 05:57:58 CET 2003


I started setting up Nagios this week and quickly found that with my
site's mix of servers, checking the statuses of remote services was
getting to be a real headache.  I didn't want to scatter duplicate
per-host NRPE configuration files on all our clustered application and
HTTP servers, and I didn't want to suffer the overhead of a separate ssh
connection for each service on each host using check_by_ssh.  I saw
discussion on this list of using check_by_ssh to run multiple checks
in one go and report them back to Nagios as passive results, but that
would have meant constructing a separate check_by_ssh command object
for each unique combination of services on our various hosts.

So I wrote a Perl script called "batch_by_ssh" that acts as a frontend
to check_by_ssh and automates the construction of
passive-results-fetching command lines.

At a high level, the approach I took was to add a new "batch" service
mode that's halfway between active and passive.  You define these batch
services in the Nagios configuration as if they were going to run on
the monitoring host (with a slightly different syntax for specifying
the check command in the service object) but you set active_checks_enabled
to 0 and check_freshness to 1.  You can specify any number of batch
services for a given host.
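
As a rough illustration of how such definitions could be picked out of
the config files, here is a minimal sketch in Python. It is not the
actual batch_by_ssh code (which is Perl); the function names and the
parsing rules are assumptions made for the example, and it only looks
at host_name, ignoring hostgroup_name and templates.

```python
import re

# Hypothetical parser, for illustration only.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\n\s*\}", re.S)

def parse_objects(text):
    """Return a list of (object_type, fields) from Nagios-style config text.

    Lines starting with '#<>' are kept as extra directives; other comment
    lines and trailing ';' comments are dropped.
    """
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        fields = {}
        for line in body.splitlines():
            line = line.strip()
            if line.startswith("#<>"):
                line = line[3:].strip()           # keep the batch directive
            elif line.startswith("#"):
                continue                          # ordinary comment
            line = line.split(";", 1)[0].strip()  # strip trailing comment
            if not line:
                continue
            parts = line.split(None, 1)
            fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, fields))
    return objects

def batch_services(objects, host):
    """Service definitions for `host` that carry a batch_type directive."""
    return [fields for obj_type, fields in objects
            if obj_type == "service"
            and "batch_type" in fields
            and host in [h.strip()
                         for h in fields.get("host_name", "").split(",")]]

SAMPLE = """
define service {
      service_description     User Count
      host_name               myserver,otherserver
      active_checks_enabled   0
      check_freshness         1
#<>   batch_type              ssh
#<>   batch_command           check_local_users!20!25
}

define service {
      service_description     ssh
      host_name               myserver
      check_command           batch_by_ssh
}
"""

found = batch_services(parse_objects(SAMPLE), "myserver")
for svc in found:
    print(svc["service_description"], "->", svc["batch_command"])
```

Only the first service is reported, because the second one has no
batch_type directive and is an ordinary active check.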

Then you add one active service for the host, which runs batch_by_ssh.
batch_by_ssh scans the Nagios configuration to find all the batch
services for the host in question and runs check_by_ssh with the
appropriate command line to execute them all on the remote host.
Then it reports the results back to Nagios.
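
To make the idea concrete, here is a minimal Python sketch of how one
combined check_by_ssh invocation might be assembled. It assumes
check_by_ssh's passive-mode options (-C once per remote command, -s for
the colon-separated service names, -n for the host's short name, -O for
the external command file); the helper name and both default paths are
hypothetical, and the real script may build its command line
differently.

```python
import shlex

def build_check_by_ssh(address, host_name, checks,
                       check_by_ssh="/usr/local/nagios/libexec/check_by_ssh",
                       command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Build the argv for a single check_by_ssh call in passive mode.

    `checks` is a list of (service_description, remote_command_line)
    pairs. Both default paths are assumptions; adjust for your install.
    """
    argv = [check_by_ssh, "-H", address, "-n", host_name,
            "-s", ":".join(desc for desc, _ in checks),
            "-O", command_file]
    for _, remote_cmd in checks:
        argv += ["-C", remote_cmd]   # one -C per batched service
    return argv

checks = [
    ("User Count", "/usr2/nagios/check_users -w 20 -c 25"),
    ("/home disk space", "/usr2/nagios/check_disk -w 10% -c 5% -p /home"),
]
argv = build_check_by_ssh("1.2.3.4", "myserver", checks)
print(" ".join(shlex.quote(a) for a in argv))
```

Since the service names are joined with ':', a service description that
itself contains a colon would need special handling in this sketch.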

I also have an auxiliary script that generates servicedependency objects
based on the same configuration, so each host's batch services can be
marked as dependent on its batch_by_ssh service.
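
For illustration, the generated objects might look like what the
following Python sketch prints. This is not make_batch_dependencies
itself; the function name, the choice of failure criteria, and the
assumption that the batch_by_ssh service is described as "ssh" are all
made up for the example.

```python
def make_dependencies(host, batch_descriptions, master="ssh"):
    """Return servicedependency definitions as Nagios config text.

    Each batch service on `host` is made dependent on the `master`
    service, i.e. the active service that runs batch_by_ssh.
    """
    blocks = []
    for desc in batch_descriptions:
        blocks.append(
            "define servicedependency {\n"
            f"      host_name                       {host}\n"
            f"      service_description             {master}\n"
            f"      dependent_host_name             {host}\n"
            f"      dependent_service_description   {desc}\n"
            # Assumed criteria: never block execution, but suppress
            # notifications while the master service is not OK.
            "      execution_failure_criteria      n\n"
            "      notification_failure_criteria   w,u,c\n"
            "}\n"
        )
    return "\n".join(blocks)

text = make_dependencies("myserver", ["User Count", "/home disk space"])
print(text)
```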

Hopefully I haven't duplicated someone else's work here, but I didn't
see anything like this after searching around the net and I think it
makes setting up remote monitoring a *LOT* easier.

batch_by_ssh can be found at

	http://www.midwinter.com/~koreth/nagios/batch_by_ssh

And the dependency-generating script:

	http://www.midwinter.com/~koreth/nagios/make_batch_dependencies

See the top of batch_by_ssh for documentation on the new config items.

Comments, bugfixes, etc. appreciated!  Once a few people other than me have
had a chance to try this out, I'll submit it to the Nagios plugins project,
naturally.

Below is an example configuration to probe the disk space and user
count on a remote host. This is the example in the documentation at the
top of batch_by_ssh, which has more details about what it all means, but
hopefully it'll give you an idea of what I'm talking about.  All of it
goes in the standard Nagios config files; the new keywords are prefixed
with "#<>" so that Nagios treats them as comments and ignores them.

-Steve

P.S. Obviously you need to have passwordless ssh logins working before
     you can use this -- if you can't successfully run a remote command
     using the standard check_by_ssh plugin, this script won't be useful.


---
define host {
      host_name               myserver
      address                 1.2.3.4
#<>   $USER1$                 /usr2/nagios
}

define command {
      command_name            batch_by_ssh
      command_line            $USER1$/batch_by_ssh $HOSTNAME$
}

define command {
      command_name            check_local_disk
      command_line            $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}

define command {
      command_name            check_local_users
      command_line            $USER1$/check_users -w $ARG1$ -c $ARG2$
}

define service {
      use                     generic_service
      service_description     ssh
      host_name               myserver
      active_checks_enabled   1
      check_command           batch_by_ssh
      normal_check_interval   5
      retry_check_interval    1
}

define service {
      use                     generic_service
      service_description     User Count
      host_name               myserver,otherserver
      active_checks_enabled   0
      check_freshness         1
      freshness_threshold     430     ; just over 7 minutes (check interval + 2 retries = 420 s)
      check_command           no_report       ; see Nagios freshness checking docs
#<>   batch_type              ssh
#<>   batch_command           check_local_users!20!25
}

define service {
      use                     generic_service
      service_description     /home disk space
      host_name               myserver,otherserver
      hostgroup_name          group1,group2
      active_checks_enabled   0
      check_freshness         1
      freshness_threshold     430
      check_command           no_report
#<>   batch_type              ssh
#<>   batch_command           check_local_disk!10%!5%!/home
}
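
As a worked example of how a batch_command in the configuration above
could be turned into the command line that runs on the remote host,
here is a small Python sketch. It assumes the usual Nagios convention
that '!' separates the command name from $ARG1$, $ARG2$, and so on, and
that the "#<>" $USER1$ entry in the host definition supplies the remote
plugin directory; the function name is hypothetical and this is not the
script's actual code.

```python
def expand_command(batch_command, commands, macros):
    """Expand 'name!arg1!arg2...' into a full remote command line.

    `commands` maps command_name to command_line; `macros` maps macro
    names such as '$USER1$' to their per-host values.
    """
    name, *args = batch_command.split("!")
    line = commands[name]
    for i, arg in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", arg)   # $ARG1$, $ARG2$, ...
    for macro, value in macros.items():
        line = line.replace(macro, value)
    return line

commands = {
    "check_local_disk": "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$",
    "check_local_users": "$USER1$/check_users -w $ARG1$ -c $ARG2$",
}
macros = {"$USER1$": "/usr2/nagios"}   # from the "#<>" line in the host

disk = expand_command("check_local_disk!10%!5%!/home", commands, macros)
users = expand_command("check_local_users!20!25", commands, macros)
print(disk)    # /usr2/nagios/check_disk -w 10% -c 5% -p /home
print(users)   # /usr2/nagios/check_users -w 20 -c 25
```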

