Trying to understand check_by_ssh

Paul L. Allen pla at softflare.com
Wed Mar 30 01:01:56 CEST 2005


Noel Carroll writes: 

> I'm trying to figure out the best way to do some monitoring on some remote 
> hosts.

There is no "best" way for everyone.  Just various ways that have tradeoffs
and you have to decide which is the least-worst choice given your
circumstances. 

My take on the various options: 

1) NRPE 

This is OK if each machine you're monitoring is directly accessible and
running the NRPE daemon, but the config syntax is greatly different
from that of nagios itself so it's a bit of a pain to configure and keep
things in sync when you add services.  If you have a choice between
configuring NRPE checks and banging your head against a brick wall, find
yourself a brick wall - the headaches go away a lot quicker after you stop
doing it. 
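
To make the "two syntaxes" point concrete, here is a rough Python sketch
that renders one check in both forms: the command[...] line NRPE expects
in nrpe.cfg, and the service definition on the nagios side.  The host
name, check name and plugin path are made up for illustration, and it
assumes a generic-service template and a check_nrpe command (taking the
NRPE command name as $ARG1$) are already defined.

```python
# Hypothetical check; host name, check name and plugin path are examples only.
CHECK = {
    "host": "web1",
    "description": "Root Disk",
    "nrpe_name": "check_root_disk",
    "plugin": "/usr/local/nagios/libexec/check_disk",
    "args": ["-w", "20%", "-c", "10%", "-p", "/"],
}


def nrpe_command_line(check):
    """Return the line that goes in nrpe.cfg on the monitored host."""
    command = " ".join([check["plugin"]] + check["args"])
    return "command[{}]={}".format(check["nrpe_name"], command)


def nagios_service_block(check):
    """Return the matching service definition for the monitoring host.

    Assumes a 'generic-service' template and a 'check_nrpe' command that
    takes the NRPE command name as $ARG1$ already exist (assumptions).
    """
    directives = [
        ("use", "generic-service"),
        ("host_name", check["host"]),
        ("service_description", check["description"]),
        ("check_command", "check_nrpe!" + check["nrpe_name"]),
    ]
    body = "\n".join("    {:<24}{}".format(k, v) for k, v in directives)
    return "define service{\n" + body + "\n}"


if __name__ == "__main__":
    print(nrpe_command_line(CHECK))
    print(nagios_service_block(CHECK))
```

Generating both from one description is one way to stop the two files
drifting apart; it is not something NRPE itself provides.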

If you have machines behind a firewall that are not accessible directly
then you can do all the monitoring by NRPE from a directly-reachable
machine (such as the firewall itself) by having it invoke check_by_ssh
or some other method.  But then the configuration headaches multiply
exponentially - unless you allow NRPE to accept arguments (by default
it doesn't), which simplifies configuration but is a BIG security risk
(unless you use the SSL option and even then it's not very safe).  I have
tried monitoring hosts that were not directly-reachable this way and I'd
rather flay my skin off with a cheese grater then roll in some salt than
do that again. 

NRPE does NOT do any authentication.  The only protection you have is
NRPE's allowed_hosts config option (rudimentary checks) and tcpwrappers.
But both go by the source IP address, which can be spoofed, although only
somebody sniffing your network traffic would see the answers.  In your
situation you may not care
that some packet-sniffing techy can tell if one of your services is down
if he spoofs his IP address.  OK, if you use the SSL option the packet-
sniffing techy can't tell what requests he needs to send to NRPE, but
trial-and-error can probably expose a lot of your services if you use
comprehensible naming conventions.  If you check the disk space on
not-directly-visible machine X with check_w12rJksdh he probably won't
get anywhere, but if you use check_filestore_disk he probably will (after
a lot of failed attempts that you'll catch with an IDS if you run one). 

If your monitoring host and monitored hosts are all on the same internal
network then the packet-sniffing stuff isn't a problem (provided you
trust everyone on that network segment).  If you're monitoring external
sites then it is a problem. 

2) Check_by_ssh 

This works if you're checking machines directly.  The overhead of
establishing an SSH connection on a congested network may cause you
problems with plugin timeouts unless you override the defaults. 
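
As a sketch of what overriding the timeout looks like, the Python below
builds the argument list for a check_by_ssh call with an explicit -t
value.  -H, -C and -t are the plugin's host, remote command and timeout
options; the plugin path, the host name and the 30 second figure are
assumptions, so pick values that suit your own network.

```python
def check_by_ssh_argv(host, remote_command, timeout_seconds=30,
                      plugin="/usr/local/nagios/libexec/check_by_ssh"):
    """Build the argument list for a check_by_ssh call.

    The plugin path and the 30 second default are assumptions for
    illustration, not recommended values.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return [plugin, "-H", host, "-t", str(timeout_seconds), "-C", remote_command]


if __name__ == "__main__":
    # Hypothetical host and remote plugin invocation.
    argv = check_by_ssh_argv(
        "web1.example.com",
        "/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /",
    )
    print(" ".join(argv))
```

Nagios has its own timeout for service checks as well, so the plugin
timeout needs to stay below that or nagios will kill the check first.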

If you have to check machines indirectly then you need a script that has
been posted here before called something like deep_check_by_ssh.  This
is going to increase the time delays and may result in plugin timeouts
unless you override them.  It's also very inelegant (but inelegant
sometimes pays the bills). 

Andreas will be along shortly to point out a security hole in check_by_ssh
IF somebody can compromise your monitoring host to the extent that they
can become the nagios user.  That hole allows somebody who can become
nagios on your monitoring host to become nagios on your monitored hosts
and execute arbitrary commands as the nagios user.  That may expose
sensitive data if you're lax about permissions.  That may allow local root
exploits on your monitored hosts.  However, if your monitoring host runs a
subset of the services your monitored hosts do then any compromise of the
monitoring host can be applied directly to the monitored hosts, so it's
not a real problem.  If your monitoring host runs services that your
monitored hosts do not then it IS a worry because your monitoring host
could be compromised via one of the services your monitored hosts don't
run - any service you run on your monitoring host that isn't run on your
monitored hosts is likely to be little-used and therefore not examined
much for security holes. 

3) NSCA 

The NSCA daemon runs on your monitoring host.  However, at least one of
the directly-visible monitored hosts must have nagios (bare bones, you
don't need the web interface) installed on it.  The directly-visible
monitored host checks itself and hosts behind firewalls that are not
directly visible (using check_by_ssh or NRPE or NSCA) and submits passive
check results to the monitoring host. 
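
For illustration, the passive result that gets submitted is just one
tab-separated line per service (host name, service description, return
code, plugin output), which the send_nsca client reads on standard input.
A minimal Python sketch of building that line follows; the host and
service names are made up, and actually invoking send_nsca is left out.

```python
# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def service_result_line(host, service, return_code, output):
    """Format one passive service check result for send_nsca's stdin."""
    if return_code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError("return code must be 0, 1, 2 or 3")
    # Tabs and newlines are field and record separators, so keep them
    # out of the free-text plugin output.
    clean_output = output.replace("\t", " ").replace("\n", " ")
    return "\t".join([host, service, str(return_code), clean_output]) + "\n"


if __name__ == "__main__":
    # Hypothetical host and service names.
    print(service_result_line("db1", "Root Disk", WARNING, "DISK WARNING - 15% free"), end="")
```

The host name and service description have to match what the monitoring
host has in its own config, otherwise the result is thrown away.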

NSCA uses one of a user-configurable choice of encryption algorithms for
the data in a way which also authenticates the user.  As long as you choose
a good encryption algorithm no packet spoofer/sniffer (apart from the US
National Security Agency, the UK's GCHQ and the like) will learn that one
of your services is down or submit a bogus passive check result. 

Configuration on the host submitting passive check results is very
similar to that on the monitoring host.  In simple situations you could
write a script to take the config from the host submitting passive check
results and turn it into a config for the monitoring host, but you'll
probably have to generate it by hand (but it's still a lot easier than
translating from nagios config to NRPE config). 
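
A rough idea of what such a script would do for the simple case: take a
service definition from the submitting host and flip it to passive-only
for the monitoring host.  The sketch below works on a plain dict of
directives rather than parsing real config files, and the example values
are hypothetical.  It leaves check_command alone, which is an assumption;
you may prefer to point it at a dummy command on the monitoring host.

```python
def to_passive(service):
    """Return a copy of a service definition that only accepts passive results."""
    passive = dict(service)
    passive["active_checks_enabled"] = "0"
    passive["passive_checks_enabled"] = "1"
    return passive


def render_service(service):
    """Render a dict of directives as a nagios service definition."""
    body = "\n".join("    {:<24}{}".format(k, v) for k, v in service.items())
    return "define service{\n" + body + "\n}"


if __name__ == "__main__":
    # Hypothetical definition as it might look on the submitting host.
    active = {
        "host_name": "db1",
        "service_description": "Root Disk",
        "check_command": "check_by_ssh_disk",
        "active_checks_enabled": "1",
    }
    print(render_service(to_passive(active)))
```

Anything beyond this (templates, host groups, per-site contact settings)
is where generating it by hand starts to look attractive.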

With nagios 1.x you cannot submit passive host checks, which means that
if you're monitoring machines which are not directly visible you can't
tell if they're down or not.  With nagios 2.x you can submit passive host
checks.  Actually, the documentation is contradictory about 1.x passive
host checks - in one place Ethan says it's not possible, in another he says
it's possible but complex so he won't bother explaining how it can be done. 

NSCA has only one password, which is a problem if you want to monitor
customer sites and ensure that none of your customers can interfere with
any of the others.  If you're in that situation you have to run separate
instances of the NSCA daemon with different config files on different ports
so each customer has a different password.  A hassle, but not a major one. 
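
A sketch of the one-daemon-per-customer setup, in Python: give each
customer its own port and a random password, and write out the handful
of nsca.cfg settings that differ.  server_port, password and
decryption_method are nsca.cfg options; the customer names, the starting
port and the decryption_method value of 3 are placeholders, so check the
sample nsca.cfg for what each method number means before using one.

```python
import secrets


def nsca_cfg(port, password, decryption_method):
    """Return the per-instance part of an nsca.cfg file."""
    return "server_port={}\npassword={}\ndecryption_method={}\n".format(
        port, password, decryption_method
    )


def per_customer_configs(customers, first_port=5667, decryption_method=3):
    """Map each customer name to its own nsca.cfg fragment.

    Ports are handed out in sorted-name order starting at first_port.
    The method number is a placeholder (assumption), not a recommendation.
    """
    configs = {}
    for offset, name in enumerate(sorted(customers)):
        password = secrets.token_hex(16)  # a different secret per customer
        configs[name] = nsca_cfg(first_port + offset, password, decryption_method)
    return configs


if __name__ == "__main__":
    # Hypothetical customer names.
    for name, cfg in per_customer_configs(["alpha", "beta"]).items():
        print("# nsca-{}.cfg".format(name))
        print(cfg)
```

Each customer's send_nsca config then needs the matching password and
encryption method, and each daemon instance is started with its own
config file.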

Where NSCA scores is if you have a large network.  Having satellite
monitoring servers submit passive host checks takes a lot of load off
your monitoring server. 

Like I said, there are trade-offs.  I doubt I've listed all of them.  I
doubt I've listed all the ones that I've seen mentioned here, just the ones
I can remember right now.  And I've probably made some mistakes (coming up
on midnight here).  Unless your boss is paranoid about security then
it sounds like check_by_ssh is what you need, unless you're going to be
monitoring a large network or need redundant monitoring.  In the end the
only way you'll know for sure is to try all of them yourself and form your
own opinion rather than blindly accept the opinions of others. 

-- 
Paul Allen
Softflare Support 



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users