Monitoring large (ish) numbers of servers with exceptions to the rules...

Anthony Montibello amontibello at gmail.com
Wed Jun 18 01:00:16 CEST 2008


Hi,

Using regular expressions and object templates is key to reducing maintenance.

I read some good details on what needs to be configured explicitly and what
can be inherited and automatically associated in the current Nagios 3
documentation.  I think much of the framework was already in Nagios 2, but
the documentation is a bit easier to read in Nagios 3, so look at that for
some tips, then check the Nagios 2 docs to see if the option is also in there.

A few years ago I converted a Nagios 1.2 installation, where all hosts and
services were defined in a single file, to a scalable configuration similar
to what was initially described here.

I found that if you need to support different clients with daily changes, it
is convenient to have one config directory for each client, and in that
directory a single host file plus a separate config file for each host.

When a host is removed, it is just a matter of removing it from the host
file and renaming its config file.
Adding a new host only means adding it to the host file, then copying an
existing service file and cutting and pasting to get all the services
defined.
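
As a rough sketch of the layout described above (the paths, client names and
host names here are made up for illustration, not taken from my setup), the
main nagios.cfg can pull in one directory per client with cfg_dir.  As far as
I know, Nagios only reads files ending in .cfg from such a directory, which
is why renaming a host's file is enough to take its services out:

```
# nagios.cfg (hypothetical paths and client names)
cfg_dir=/usr/local/nagios/etc/clients/client-a
cfg_dir=/usr/local/nagios/etc/clients/client-b

# Assumed contents of one client directory:
#   clients/client-a/hosts.cfg            all host definitions for this client
#   clients/client-a/web01.cfg            services for host web01
#   clients/client-a/db01.cfg             services for host db01
#   clients/client-a/old01.cfg.removed    renamed, so no longer loaded
```

Remember that the host also has to come out of hosts.cfg, otherwise it is
still defined but has no services attached.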

Then maintain the entire directory structure through CVS or some other
version control.
As noted, this does get tedious to maintain, but it allows for customization
of services per host without much thinking.
The disadvantage is the time involved in maintaining it when only a few
changes are being made.

Other options using templates work well.
Setting up inheritance, using regular expressions, and other techniques using
hostgroups all assist with organizing the files but, depending on skill
levels, sometimes lead to less readability (while for other admins they would
lead to easier maintenance).
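
To make the template, hostgroup and regular expression ideas a bit more
concrete, here is an untested sketch.  The template name, hostgroup name,
contact group and host naming pattern are all invented for the example.  The
regular expression part assumes use_regexp_matching=1 is set in nagios.cfg;
check the documentation for your version to see exactly which strings are
then treated as patterns.

```
# Template only: "register 0" means it is inherited from, never checked itself.
define service{
        name                    base-service    ; hypothetical template name
        max_check_attempts      3
        normal_check_interval   5
        retry_check_interval    1
        check_period            24x7
        notification_interval   60
        notification_period     24x7
        contact_groups          admins          ; hypothetical contact group
        register                0
}

# One definition covers every host in the group.
define service{
        use                     base-service
        hostgroup_name          web-servers     ; hypothetical host group
        service_description     HTTP
        check_command           check_http
}

# Assumes use_regexp_matching=1; matches hosts named web1, web02, ...
define service{
        use                     base-service
        host_name               ^web[0-9]+$
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
}
```

The trade-off is the one mentioned above: there are far fewer lines to
maintain, but you can no longer open one file and see everything that is
checked on a given host.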

Hope this helps,


On Tue, Jun 17, 2008 at 8:22 AM, Wheeler, JF (Jonathan) <
J.F.Wheeler at rl.ac.uk> wrote:

> > -----Original Message-----
> > From: nagios-users On Behalf Of Matthew Macdonald-Wallace
> > Sent: 17 June 2008 13:14
> >
> > I currently help maintain and monitor around 50 servers across various
> > parts of the UK using Nagios 2.  At the moment, we have a configuration
> > file for each host (%hostname%.cfg) and in that file we specify all the
> > services for the named host.
> >
> > We are trying to reduce the number of configuration files as we take on
> > more and more servers because there are a large number of checks that
> > need to be rolled out to all servers and we feel that we are
> > duplicating our workload.
> >
> > I'm open to ideas on how to achieve this; however, my thoughts were a
> > setup along the lines of the following:
> >
> >  - A "master" host template is created in which all services are
> >    defined for a host.
> >
> >  - If a check does not need to be run for a given host (for example it
> >    is not a web server), a stanza is added to that particular host's
> >    config file that effectively tells nagios "don't check for this
> >    service on this host"
> >
> > I've tried defining all the services in a master templates file and
> > this works perfectly; however, when I come to exclude certain services,
> > I am at a loss on how to do it.
> >
> > Initially I tried adding a stanza with the same service name and
> > "register 0" as one of the options, however this didn't work.
> >
> > We have used HostGroups in the past to achieve a similar goal, however
> > we ran into the issue that whilst we need to check the CPU Usage on all
> > of the servers, a few of the servers that we monitor can take a lot
> > more of a beating than the majority.  This led to us defining the CPU
> > checks on a per-host basis, as if we defined it separately from the
> > hostgroup for the more powerful servers we were presented with a load
> > of errors regarding duplicate service names.
> >
> > I hope I've made myself clear on what we're after and I look forward to
> > receiving your input on this.
>
> One thing that I use in the configuration that I maintain is to have
> something like this:
>
> define service{
>        use                     generic-hung-mounts
>        hostgroup_name          experiments
>        hosts                   !lcg0448
>        contact_groups          experiments
> }
>
> where "lcg0448" is a host in host group "experiments" and I want to
> apply the "generic-hung-mounts" check to all hosts in that group except
> for "lcg0448".
>
> This can lead to configuration like this:
>
> define service{
>        use                     check-pbs-offline
>        hostgroup_name          workers
>        hosts                   !lcg0614,!lcg0617,!lcg0618,!lcg0626
>        contact_groups          tier1a
> }
> define service{
>        use                     check-pbs-offline
>        hosts                   lcg0614,lcg0617,lcg0618,lcg0626
>        contact_groups          tier1a,grid-team
> }
>
> where the only difference is that the hosts in the second definition
> have a second contact group.
>
> HTH
>
> Jonathan Wheeler
> e-Science Centre
> Rutherford Appleton Laboratory
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>