Design question

Sean McAfee smcafee at collaborativefusion.com
Thu Jul 31 17:12:17 CEST 2008


Michael Weiner wrote:
> Hmm, now you've piqued my interest. Anything you can share before
> I start building? I like the idea and wouldn't mind implementing a
> similar solution.
>
> Michael
I just spent a while trying to come up with a comprehensive quick 
explanation, but it's just not possible.  The internal documentation for 
the system design is something like 15+ pages, the majority of which 
contains data that needs to be thoroughly sanitized.  This is the meat 
of it though and should give you an idea of what things need to be 
considered.  Feel free to ask questions about how I solved specific 
problems or suggest ways to improve it.

Here is the repo layout:
| root -- All non-object configs (nagios.cfg, cgi.cfg, resources.cfg, nsca.cfg, etc.)
|-- config -- All object configs (notify_cmds.cfg, templates, etc.)
|   `-- contacts
|-- htpasswd
|-- scripts -- All shell scripts (event handlers, self-promotion, etc.)
|   `-- checks -- Custom checks not found in the FreeBSD nagios-plugin port
`-- targets -- All hosts and services
    |-- exemptions -- See step 2.1 below; removes some "global" checks from individual facilities
    |   |-- facil0
    |   |-- facil1
    |   `-- facil2
    |-- global -- Checks to be run from ALL facilities
    |-- facil0 -- Checks for the slave instance at facil0
    |-- facil1
    `-- facil2


In order to comply with the automation requirements, a handful of DNS 
entries had to be created at each slave facility:

nagios-host.[facil].example.com
    This is a CNAME to the slave instance at each facility. It is used 
as the destination target for rsyncing configs.

nagios-master.[facil].example.com
    This is a CNAME to the master server. Due to the distributed nature 
of our setup and Nagios' use of hostnames as unique identifiers, this 
was required to give each slave server a unique target to monitor for 
self-promotion purposes.


The svn post-commit script does the following from the master instance:

   1. Checks out the newest version of the repo to /var/tmp/nagios/staging
   2. Creates directories at /var/tmp/nagios/[master|facil0|facil1|facil2]
      and rsyncs the repo into each one
         1. During this step, '.svn' is excluded and the --exclude-from
            option is used to specify exclusions for slaves:
            rsync -avz --delete-before --exclude=.svn \
              --exclude-from=$STAGING_DIR/targets/exemptions/${this_facil} \
              $STAGING_DIR/ ./${this_facil}
   3. Moves nagios-hq.cfg or nagios-slave.cfg (as appropriate) to nagios.cfg
   4. Uses grep & sed to perform search & replace on "magic" words:
          * FACIL_PLACEHOLDER: maximizes portability and automation of
            configs. Examples: nagios-slave.cfg references
            cfg_dir=targets/FACIL_PLACEHOLDER to eliminate the need for
            hand-manipulation; the "from" address in email is set to
            nagios_FACIL_PLACEHOLDER@; check_snmpagent!FACIL_PLACEHOLDER_[common
            suffix]; etc.
          * FACIL_ROLE: dynamically adjusts service_templates.cfg. This is
            necessary to get the master instance to schedule active checks
            for ONLY its local checks (Nagios slaves, its own nsca daemon,
            its gsm modem). It is set to 0 on the master and 1 on slaves.
   5. On slaves without GSM capabilities only: changes
      [host|service]_notification_commands from
      notify-[host|service]-by-sms to notify-[host|service]-by-epager.
   6. Performs a local Nagios config validation for each facility 
(nagios -v /var/tmp/nagios/{facil})
   7. Rsyncs /var/tmp/nagios/{facil} to
      nagios-host.[facil].example.com:/usr/local/etc/nagios/
         1. $RSYNC -avz --delete-before $STAGING_ROOT/$this_facil/ \
              nagios-host.[facil].example.com:/usr/local/etc/nagios/
   8. Performs a remote Nagios config validation on each system
   9. Reloads Nagios via the rc script on each server

Self-promotion is done via an event handler script that echoes
ENABLE_NOTIFICATIONS, STOP_OBSESSING_OVER_HOST_CHECKS, and
STOP_OBSESSING_OVER_SVC_CHECKS into the external command file should the
slave lose contact with the master instance.  Self-demotion is simply the
inverse of that.
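
A minimal sketch of such an event handler follows. It is not the real
script. It assumes the handler is attached to the nagios-master host object
and receives $HOSTSTATE$ and $HOSTSTATETYPE$ as its two arguments, that
promotion should only happen on a HARD state, and that CMD_FILE matches the
command_file setting in nagios.cfg (the default path here is a guess). The
function names are made up for the example.

```shell
#!/bin/sh
# Hypothetical event handler sketch. Assumed invocation:
#   handler.sh $HOSTSTATE$ $HOSTSTATETYPE$
# CMD_FILE is an assumed path; use the command_file value from nagios.cfg.
CMD_FILE=${CMD_FILE:-/var/spool/nagios/rw/nagios.cmd}

# Write one external command in the "[timestamp] COMMAND" format.
send_cmd() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$CMD_FILE"
}

# Master is unreachable: this slave starts notifying on its own.
promote() {
    send_cmd ENABLE_NOTIFICATIONS
    send_cmd STOP_OBSESSING_OVER_HOST_CHECKS
    send_cmd STOP_OBSESSING_OVER_SVC_CHECKS
}

# Master is back: undo the promotion.
demote() {
    send_cmd DISABLE_NOTIFICATIONS
    send_cmd START_OBSESSING_OVER_HOST_CHECKS
    send_cmd START_OBSESSING_OVER_SVC_CHECKS
}

# Decide what to do from the master's host state and state type.
handle_master_state() {
    state=$1; state_type=$2
    case "$state:$state_type" in
        DOWN:HARD|UNREACHABLE:HARD) promote ;;
        UP:*)                       demote ;;
    esac
}

# A real handler script would end with:
#   handle_master_state "$1" "$2"
```

Acting only on HARD states keeps a single missed check from flipping a
slave into promoted mode. Whether that matches your retry settings is
something to decide per site.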

Sean McAfee
System Engineer

Collaborative Fusion, Inc.
 smcafee at collaborativefusion.com
 412-422-3463 x 4025

5849 Forbes Avenue
Pittsburgh, PA 15217



