Best way to manage host-specific thresholds?

Thomas Guyot-Sionnest dermoth at aei.ca
Fri Dec 21 07:16:23 CET 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 20/12/07 09:34 AM, Cipriani, Robert C wrote:
> Group,
>
> I am trying to wrap my head around the best way to manage things like
> ping RTA/packet loss, partition free space warn/critical, cpu/memory
> use, etc.  I can set these up using custom object variables for a host
> template, and reference them via macros in the command definition.  If I
> need a different setting for a particular host, I can override these
> inherited values in that host’s config.  check_nrpe throws a bit of a
> wrench in this – I’d probably need a different command set up for each
> item I’d like to check via NRPE.  Does it sound like I’m on the right track?

The way I decided to go is to have variable NRPE configs on the servers.
I use a generic nrpe.conf with no commands defined and use "include_dir"
directives to add directories where the config files reside. I have
default config definitions that I push to all servers or groups of
servers, and the ability to override commands on specific servers (when
a command is redefined, the later definition takes precedence, so an
include_dir for the override configs that comes after the generic
include_dir will do the trick).
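
A minimal sketch of such a layout might look like the following. All
paths, file names and thresholds here are illustrative assumptions, not
the actual configuration described above; the three files are shown in
one block and separated by comments.

```cfg
# --- main NRPE config file (generic, defines no command[] itself) ---
# (usual server settings such as allowed_hosts omitted)

# Defaults pushed to every server or group of servers
include_dir=/etc/nagios/nrpe.d/default

# Per-server overrides, listed last so a redefinition here is read later
include_dir=/etc/nagios/nrpe.d/local

# --- /etc/nagios/nrpe.d/default/disk.cfg (hypothetical default) ---
command[check_disk0]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /

# --- /etc/nagios/nrpe.d/local/disk.cfg (only on a server that needs it) ---
command[check_disk0]=/usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p /
```

With this arrangement the Nagios side keeps calling check_disk0 on every
host, and only the file in the override directory differs per server.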

I have a few bash scripts that help me copy the various config files to
the servers when needed. Ideally I'd like to write a small Perl
application that parses the Nagios config or status file (I suggest the
status file, as I have never been able to get Nagios::Config to handle my
special templated configuration) so that Nagios hostgroups/servicegroups
can be specified as the list of target hosts. It might get done one day,
but there's no promise :)

> Another item I’m struggling with is how to monitor partitions easily. I
> can check all filesystems by just passing “/” as the argument. This
> makes it easy since I don’t care what the separate partitions are – if
> there is /var, /usr, and so on these will automatically be checked.  One
> problem is that if any one of these exceeds the threshold, the
> notification will occur, even if all the others are fine.  I am trying
> to avoid having to set up a service for each partition on each host. 
> Any thoughts on this? I’d rather use Nagios for this than
> something like Veritas Volume Manager’s space monitoring.

For that one, I must say we don't have many partitions to monitor... 95%
of our servers have only one partition, "/" (the other ones have a SCSI
RAID, FC-attached storage or a ramdisk (tmpfs) used by a server-specific
application).

The way we go is to have the check commands numerically ordered for all
unusual partitions. check_disk0 is / and is distributed on all servers
(if needed we can override it as I pointed out earlier), then in
server-specific config files we have some check_disk1 and check_disk2
(we're not going any further than that!). We also have check_ramdisk for
some hostgroups (it's a tmpfs if you care). On the Nagios side, I simply
have a service definition that includes all hostgroups (via template)
for check_disk0, then service definitions for check_disk1, check_disk2
and check_ramdisk that include the proper servers/hostgroups.
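
On the Nagios side, that scheme could be sketched roughly as below. The
template, hostgroup and host names (generic-service, web-servers,
db-servers, cache-servers, db01) are hypothetical, and the check_nrpe
command definition is assumed to pass the remote command name as $ARG1$.

```cfg
# Assumed wrapper: $ARG1$ is the command[] name defined in the NRPE config
define command {
    command_name            check_nrpe
    command_line            $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

# check_disk0 exists on every server, so it is attached through hostgroups
define service {
    use                     generic-service
    hostgroup_name          web-servers,db-servers,cache-servers
    service_description     Disk 0
    check_command           check_nrpe!check_disk0
}

# check_disk1 exists only where a server-specific NRPE file defines it
define service {
    use                     generic-service
    host_name               db01
    service_description     Disk 1
    check_command           check_nrpe!check_disk1
}

# check_ramdisk is tied to the hostgroups that actually have the tmpfs
define service {
    use                     generic-service
    hostgroup_name          cache-servers
    service_description     Ramdisk
    check_command           check_nrpe!check_ramdisk
}
```

The thresholds live entirely in the NRPE config on each server, so none
of these service definitions needs host-specific arguments.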


> My generic host template:

Roughly my config goes like this:

generic host template
  => hostgroup host template (1 per hostgroup)
    => host definition

The hostgroup host template defines the "hostgroups" and optionally
other hostgroup-specific options.
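
As an illustration only, a three-level host chain of that shape might
look like this. The object names, the address and the few directives
shown are assumptions, and a real generic template would carry more
settings (check periods, contacts, notification options).

```cfg
# Level 1: generic host template, never registered as a real host
define host {
    name                    generic-host
    check_command           check-host-alive
    max_check_attempts      3
    register                0
}

# Level 2: one template per hostgroup, sets the hostgroup membership
define host {
    name                    web-host
    use                     generic-host
    hostgroups              web-servers
    register                0
}

# Level 3: the actual host, which only needs its own identity
define host {
    use                     web-host
    host_name               web01
    address                 192.0.2.10
}
```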

generic service template
  => hostgroup-specific service template (optional)
    => service definition


The hostgroup-specific service templates are used to group similar
hostgroups, for example all web servers, all HTTPS servers, all database
servers, etc. They only define a "hostgroup_name" list, so I can still
add individual hosts to the final definition with "host_name" (AFAIK
Nagios 3 allows stacking templates, so this trick isn't that useful on a
well-designed N3 config).
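
A rough sketch of the service side, again with hypothetical names
(web-service, web-servers, https-servers, intranet01) and only a few
representative directives:

```cfg
# Generic service template
define service {
    name                    generic-service
    max_check_attempts      3
    register                0
}

# Hostgroup-specific template: only contributes a hostgroup_name list
define service {
    name                    web-service
    use                     generic-service
    hostgroup_name          web-servers,https-servers
    register                0
}

# Final definition: inherits the hostgroups, and host_name adds one more host
define service {
    use                     web-service
    host_name               intranet01
    service_description     HTTP
    check_command           check_http
}
```

Because the hostgroup list comes from the template, host_name in the
final definition stays free for the odd host that is not in any of those
hostgroups.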



The Nagios documentation is very complete and an excellent reference for
all the kinds of configuration you can do (the rest is up to your
creativity). It is definitely the best place to learn Nagios and most of
the tricks you can do with it. If you have questions regarding the above,
feel free to ask too :).

Thomas
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHa1o36dZ+Kt5BchYRAhpcAJ4uHC7Q7YTrBJ3QPZ1hdEDijVjTtACg5+Lj
dya5sMGiae33ivzxgXd1ZA0=
=dcoe
-----END PGP SIGNATURE-----
