Nagios crashes badly and takes out both machines!

Jonathan Soong jon.soong at imvs.sa.gov.au
Thu Apr 29 09:21:50 CEST 2004


Hi there

I'm looking for some help.

Over the last week my mail server and the machine monitoring it with
Nagios have crashed 3 times, both at the same time.

I'm not sure if it is the Nagios machine crashing and taking my mail
server with it somehow or the other way around.

In both situations I have seen increased load on my mail server, to the
point of nrpe sending me a socket timeout warning. Shortly after this
the machines become unusable and a hard reboot is the only way to fix it.

When both machines crash (mail server = Red Hat 9, Nagios = Fedora), I've gone
to the console on both machines and they are both filled with messages
saying "status=0". This is on BOTH machines. At this point the console does
not accept a login (you can still type, but it hangs once you put the
username in).

I'm running nrpe on the mailserver checking load, number of processes,
disk space etc. The only anomalous thing is that I run my own plugin,
which I called check_ps, which scans 'ps' for a given process (just so I
know postfix is actually running!).
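
For illustration, a plugin of this kind might look roughly like the
sketch below. It is not the actual check_ps: the function names, the
`ps -e -o comm=` invocation, and the 5-second timeout are all
assumptions. The only fixed convention it relies on is the standard
Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def count_matches(ps_output, name):
    """Count lines of ps output whose command name equals `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def evaluate(count, name):
    """Map a process count to an (exit code, status line) pair."""
    if count > 0:
        return OK, f"PROCS OK: {count} '{name}' process(es) running"
    return CRITICAL, f"PROCS CRITICAL: no '{name}' process found"


def main(argv):
    """Run the check for the process name given as the single argument."""
    if len(argv) != 1:
        print("UNKNOWN: usage: check_ps <process-name>")
        return UNKNOWN
    name = argv[0]
    try:
        # Assumed ps invocation: every process, command name only, no header.
        # The timeout (an assumption) stops the check itself from hanging
        # indefinitely if the machine is already under heavy load.
        result = subprocess.run(
            ["ps", "-e", "-o", "comm="],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"UNKNOWN: could not run ps: {exc}")
        return UNKNOWN
    code, message = evaluate(count_matches(result.stdout, name), name)
    print(message)
    return code

# When installed as a script, finish with: sys.exit(main(sys.argv[1:]))
```

Run from nrpe with the process name as its only argument. For postfix
that name would normally be `master`, since that is the name of the
main postfix daemon in the process table.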

I was wondering if anyone could confirm whether or not it is Nagios that
is crashing my machines?

Kind Regards

Jon

-- 
Jonathan Soong
Information Services
Institute of Medical and Veterinary Science (IMVS)
Email:   jon.soong at imvs.sa.gov.au
Web  :   http://www.imvs.sa.gov.au
Tel  :   +61 8 82223095
Fax  :   +61 8 82223147	




