How do distributed setups work? (longish)

Tobias Klausmann klausman at schwarzvogel.de
Wed Nov 22 21:41:24 CET 2006


Hi all,

I'm having a conceptual/logical/mindset problem which I hope you
can help me with. It's a bit long, but the question/problem I
have is complex, so please bear with me.

What I dream of:

I have a central machine which is the interface to the users.
Using the/a web interface, the users can do exactly the same
stuff they can do with a single-host installation: acknowledge
problems, schedule downtimes, disable checks etc. It's the
CGI-side of current affairs, so to speak.

However, there are also N dedicated checking machines (I call
them "checkers") which work the same way as the Nagios core does
(i.e. without the CGIs). There's no Apache running and none of
the users really know about them (except that they had to poke
holes for them into their firewalls). These machines ideally only
do automatic scheduling of checks and execute the checks
themselves. As for the return values, info strings and perfdata
returned by the plugins, they simply pass them on to the central
machine described above.
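
To make concrete what a checker would have to pass on, here is a
minimal Python sketch of one check result. It assumes the usual
plugin convention that the first line of output is
"info string | perfdata"; the CheckResult class and the
parse_plugin_output() helper are hypothetical names for illustration,
not anything Nagios ships.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    """One plugin result as a checker would hand it to the central machine."""
    host: str
    service: str
    return_code: int  # plugin exit code: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN
    info: str         # human-readable info string
    perfdata: str     # everything after the first "|", may be empty


def parse_plugin_output(host, service, return_code, raw_output):
    """Split the first line of plugin output into info string and perfdata.

    Assumption: the plugin follows the "text | perfdata" convention.
    """
    first_line = raw_output.splitlines()[0] if raw_output else ""
    info, _, perfdata = first_line.partition("|")
    return CheckResult(host, service, return_code,
                       info.strip(), perfdata.strip())
```

A plugin that prints no "|" simply yields an empty perfdata field, so
the same structure works for plugins without performance data.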

This way, I can scale the entire setup if the/a checking machine
runs out of CPU/memory when scheduling checks. Also, I can build
dedicated checkers inside DMZs and the like.

As for notification, this could possibly be done by the checkers
directly, but then, acknowledgments and disabled notifications
(which are entered centrally) would have to find their way to the
checkers. I think handling notification centrally would be
better. Even if the central machine were overloaded with
notifications, sending them could be delegated to a dedicated
machine that is used as a smart host.

As far as the marketing goes ;) I had the impression that Nagios
and friends can do this kind of setup. However when I tried to
set up something like this, I ran into numerous problems.

1) Documentation for NSCA is - mildly put - lacking. As far as I
can tell, send_nsca expects tab-separated data on stdin. It
would've been nice to actually see an example for getting host
and service data into it. Am I supposed to do something like
"printf $X$\t$Y$\t$Z$|send_nsca -H ..." for the OCSP command?

2) How does the information that a check should be disabled get
from the central machine to the checkers? I've found no "usual"
way of doing it. Would it be necessary to set up some
distribution via SSH to the checkers?

3) All machines set up to be checked passively (i.e. by a checker)
are displayed as "disabled" in the web front end. This is very
counter-intuitive (they *are* checked, after all).

4) There would have to be some mechanism of config distribution.
Both the central machine and the checker need to agree on which
services there are. Otherwise, some checks would never be
executed or the central machine would ignore the submitted
results.
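
For point 1, a minimal Python sketch of the formatting step, assuming
that send_nsca reads one result per line on stdin, with the fields
host, service description, return code and plugin output separated by
tabs (and host results simply leaving out the service description).
The function names and the config path are hypothetical; -H and -c are
the only send_nsca options used.

```python
import subprocess


def _clean(output):
    # Tabs or newlines inside the plugin output would break the
    # tab-separated, one-result-per-line framing, so flatten them.
    return output.replace("\t", " ").replace("\n", " ")


def service_result_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin."""
    return f"{host}\t{service}\t{return_code}\t{_clean(output)}\n"


def host_result_line(host, return_code, output):
    """Format one passive host check result (no service description)."""
    return f"{host}\t{return_code}\t{_clean(output)}\n"


def submit(lines, central_host, config="/etc/nagios/send_nsca.cfg"):
    """Pipe the formatted lines into send_nsca.

    Assumptions: send_nsca is on PATH, and the config path above is
    only a placeholder for wherever the checker keeps its send_nsca.cfg.
    """
    subprocess.run(
        ["send_nsca", "-H", central_host, "-c", config],
        input="".join(lines),
        text=True,
        check=True,
    )
```

An OCSP command on a checker could then call a small wrapper script
that builds one line with service_result_line() from the service
macros and hands it to submit().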

The only solution I have thought of so far which *might* work is
running NRPEs on the checkers which get used by the central
machine. This would mean that the checkers only have an NRPE and
the Nagios plugins. For host-internal checks, I'd have an "NRPE
cascade" or NRPE using check_snmp. This has the downside that the
central machine might run into congestion problems when
scheduling.

Another "solution" would be to have multiple completely separate
Nagios installations for different (sets of) projects. I'm very wary of
this.  I'm part of the team that is responsible for the whole
enchilada, i.e.  we need to have monitoring access to all of
those projects.  Having to log into N web front ends for a
"quick" overview is not really an option.  One might be able to
work with reverse proxying and/or custom-tailored CGIs here, but
I'd rather not.

So my question to the "big boys" out there: how exactly is a
distributed setup *supposed* to work?

Thanks for your time!

Regards,
Tobias

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users