Nagios crashed?

Mick michaelkintzios at gmail.com
Wed Jan 30 22:26:10 CET 2008


Hi All,

For about a week Nagios had gone quiet, and although I should have twigged that 
something suspicious had happened, I had no previous reason for concern.  
That changed when a user told me they were getting no alerts even though they 
knew that their network device had gone offline.

I quickly checked and indeed the nagios service status was shown as stopped!  
Nagios is installed on CentOS and looking at the logs I saw just this:
============================================
[1200807131] SERVICE ALERT: router1.XXXXXXXXX.com;Camera;CRITICAL;HARD;1;CRITICAL: - failed: A temporary error occurred on an authoritative name server. Try again later.
[1200807301] SERVICE ALERT: router1.XXXXXXXXX.com;WAP;CRITICAL;SOFT;1;(Service Check Timed Out)
[1200807381] SERVICE ALERT: router1.XXXXXXXX.com;Router;CRITICAL;SOFT;1;(Service Check Timed Out)
[1200807461] HOST ALERT: router2.XXXXXXXXX.com;DOWN;SOFT;1;(No output returned from host check)
[1200807461] SERVICE ALERT: router2.XXXXXXXXX.com;WAP;CRITICAL;SOFT;1;(Service Check Timed Out)
[1200807551] SERVICE ALERT: router2.XXXXXXXXX.com;Camera;CRITICAL;HARD;1;CRITICAL: - failed: A temporary error occurred on an authoritative name server. Try again later.
[1200807551] SERVICE ALERT: router2.XXXXXXXXX.com;Router;CRITICAL;HARD;1;CRITICAL: - failed: A temporary error occurred on an authoritative name server. Try again later.
[1200807551] SERVICE ALERT: router1.XXXXXXXXX.com;WAP;CRITICAL;HARD;1;(Service Check Timed Out)
[1200807551] SERVICE ALERT: router1.XXXXXXXXX.com;Camera;CRITICAL;HARD;1;(Service Check Timed Out)
[1200807551] SERVICE ALERT: router1.XXXXXXXXX.com;Router;CRITICAL;HARD;1;(Service Check Timed Out)
[1200807551] SERVICE ALERT: router2.XXXXXXXXX.com;WAP;CRITICAL;HARD;1;(Service Check Timed Out)
[1200807591] HOST ALERT: router2.XXXXXXXXX.com;DOWN;SOFT;2;(Host Check Timed Out)
[1200807691] HOST ALERT: router2.XXXXXXXXX.com;DOWN;HARD;3;(Host Check Timed Out)
[1201469001] Nagios 3.0a3 starting... (PID=20058)
============================================

The last line is when I restarted the Nagios service.  I can't see anything 
else in the system logs.
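
The bracketed numbers in the log are Unix epoch timestamps, so the silent 
period can be measured directly.  A minimal sketch, assuming Python 3 is 
available on the box (the helper name parse_entry is only for illustration 
and is not part of Nagios), which converts the last alert and the restart 
line to UTC and prints the gap between them:

```python
import re
from datetime import datetime, timezone

# A Nagios log entry starts with "[<epoch seconds>] " followed by the message.
LOG_LINE = re.compile(r"^\[(\d+)\]\s*(.*)$")


def parse_entry(line):
    """Return (UTC datetime, message) for a log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    stamp = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
    return stamp, match.group(2)


# Last alert before the silence, and the manual restart (message abbreviated).
last_alert = parse_entry(
    "[1200807691] HOST ALERT: router2.XXXXXXXXX.com;DOWN;HARD;3;(Host Check Timed Out)"
)
restart = parse_entry("[1201469001] Nagios 3.0a3 starting...")

gap = restart[0] - last_alert[0]

print(last_alert[0].isoformat())  # 2008-01-20T05:41:31+00:00
print(restart[0].isoformat())     # 2008-01-27T21:23:21+00:00
print(gap)                        # 7 days, 15:41:50
```

So nothing at all was logged for a little over seven and a half days, which 
matches the "about a week" of silence.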

Any ideas as to what might have happened?  Is the message about the DNS server 
something that could have caused Nagios to implode?  (I don't think so, but 
I'm asking in case you have experience of this.)

BTW, the server receives the usual unwanted attention from script-kiddies 
trying to crack their way in.  Fail2ban tries its best to keep them out.  
Could it be that the server has been compromised?  A Nagios vulnerability?  
What else could I look at, or test?
-- 
Regards,
Mick
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users