check_by_ssh returning UNKNOWN

Paul L. Allen pla at softflare.com
Fri Jan 9 16:29:16 CET 2004


Rasmus Plewe writes: 

> I refer to it by IP. 

Oh.  That's that theory dead, then. 

> That's what I always do (for no logical reason whatsoever, I don't like 
> su). 

There is good reason in this case.  If you su to nagios from a different
user and then ssh somewhere, the host key for the destination machine
goes into the original user's list of host keys, not the nagios user's
list.  Which means that you think you've established that check_by_ssh
will work because you accepted the host key but really you haven't.  A
bit of a gotcha if you don't know where SSH really puts the key. 
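
One way to confirm which user's list actually holds the key is to look for the host in each candidate known_hosts file. The following is a minimal sketch, not part of any Nagios tooling: the function name is made up for illustration, and it only understands plain-text entries (hashed entries beginning with "|1|" are skipped).

```python
def host_in_known_hosts(known_hosts_text: str, host: str) -> bool:
    """Return True if `host` is listed in plain-text known_hosts content.

    Assumption: entries are not hashed. Hashed host names ("|1|...")
    cannot be matched by simple string comparison and are ignored.
    """
    for line in known_hosts_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # An optional marker such as @cert-authority precedes the host list.
        if fields[0].startswith("@"):
            fields = fields[1:]
        # The first remaining field is a comma-separated list of names/IPs.
        if fields and host in fields[0].split(","):
            return True
    return False
```

Reading, say, the nagios user's ~/.ssh/known_hosts and the original user's file and passing each to this function with the monitored machine's IP would show where the key went.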

>> 3) Fix whatever SSH complains about (usually the first manual login
>> gets rid of the xauthority message that trips up check_by_ssh). 
> 
> No complaints. And, as I said: Sometimes it returns OK, but mostly 
> UNKNOWN. 

Check the number of sshd processes running on the box being monitored.
You could be hitting some sort of connection limit.  Another possibility
is that there are intermittent network problems between the two machines -
especially if one of them is on ADSL (ADSL from some companies in the
UK is very unreliable and often goes away for 5 or 10 minutes late at
night). 
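
Counting the sshd processes can be scripted so the number can be sampled whenever a check comes back UNKNOWN. This is a rough sketch that assumes the daemon's processes show up under the command name "sshd" in ps output; the function names are illustrative. If the count climbs toward a configured limit (MaxStartups in sshd_config is one candidate to compare against), that would support the connection-limit theory.

```python
import subprocess


def count_sshd(ps_output: str) -> int:
    """Count lines of `ps -e -o comm=` output whose command name is sshd."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == "sshd")


def sshd_process_count() -> int:
    """Run ps on the local machine and count its sshd processes."""
    # `ps -e -o comm=` prints one command name per process, with no header.
    result = subprocess.run(
        ["ps", "-e", "-o", "comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_sshd(result.stdout)
```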

>> Now, if you know a way of using check_by_ssh to itself run check_by_ssh
>> on another box, that would be something I'd find useful. :) 
> 
> If I understood you correctly: for certain... sick values of $useful ;-) 

It's for complex reasons, which involve decisions made by a PHB.  We
want to check a remote site with external firewall -> dmz/internal
firewall -> internal machines.  The internal machines can only be
reached from the internal firewall. 

I could get around it with nrpe, and that's what I started out doing.
But it's not very secure because IP spoofing would let somebody mount
a DoS attack.  More to the point, the machine doing the monitoring
lives at the end of the PHB's 2M ADSL and sometimes the IP changes. 

Yes, I know, but bandwidth at our public servers (the logical place to put
it) is charged to us by the amount we use while the PHB's ADSL is flat
rate and is not fully utilized.  Also, if nagios is on our public servers
and the link to our public servers goes down, it's going to have a
difficult job letting anybody know.  Also, the servers are in a secure
facility about 30 minutes away from any of our engineers whereas the PHB
can get at his machine very quickly most of the time.  So I'm stuck
with a monitoring machine on an IP that can change and I don't want
e-mail and SMS tormenting me because his IP changed and I then have to
go around dozens of machines changing hosts.allow lines.  The only fix,
if I stick with NRPE, is to allow anyone to connect, and that's something
I'm not happy with. 

I could get around it with NSCA, I think, but it looks a little messy
and I'm not entirely sure it's feasible for what we want to do.  Also,
as with NRPE, it involves two sets of configurations to be maintained,
which is a bit of a pain. 

I could get around it by opening a port on the external firewall and
redirecting it to the ssh port on the internal firewall.  Which is
probably the way I'll end up doing it. 

check_by_ssh chaining was something I tried but kept getting a broken pipe.
Maybe I was doing something wrong, maybe not. 
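
For anyone who wants to experiment with chaining, the fiddly part is that the inner command has to survive one extra round of shell parsing per hop. The sketch below only builds the nested command string; it is an illustration under assumptions, not a known fix for the broken pipe. The plugin path is the common default install location and may differ on your system, the host names are placeholders, and the same path is assumed on every hop. It needs Python 3.8 or later for shlex.join.

```python
import shlex


def chained_check(hops, final_command,
                  plugin="/usr/local/nagios/libexec/check_by_ssh"):
    """Wrap final_command in one check_by_ssh call per hop.

    hops is ordered from the first machine contacted to the last.
    Assumption: the plugin lives at the same path on every hop.
    """
    command = final_command
    for host in reversed(hops):
        # -H names the host to log in to, -C the command to run there.
        # shlex.join quotes the inner command as a single argument.
        command = shlex.join([plugin, "-H", host, "-C", command])
    return command


if __name__ == "__main__":
    # Placeholder hosts: internal firewall first, then an internal machine.
    print(chained_check(["fw.example", "10.0.0.5"],
                        "/usr/local/nagios/libexec/check_disk -w 20% -c 10%"))
```

Each hop strips one layer of quoting, so the innermost machine ends up running the plain check command.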

> *sigh* Since I'm out of logically possible explanations, I will 
> probably have to start with wild guesses and phenomenological 
> trial and error.

Rebuild the machine being monitored.  Windows users have to do that
every few months... 

-- 
Paul Allen
Softflare Support 
