Separate mail server problems cause Nagios to plotz (or vice versa?)

up at up at
Fri Jun 24 18:54:20 CEST 2011

We have Nagios monitoring a variety of services on roughly 50 separate servers.  Several of them
are mail servers, but only the "main" one (which hosts most of the Nagios notification
recipients) has this problem.

The mail server will start to become unresponsive to just about any input (though it still pings
fine).  Simultaneously, Nagios, which is on a separate server, will send out notifications that
every service on every server is down because Nagios cannot reach them.  Since almost all of the
notifications go through this problem mail server, including those that forward to text messaging
services, they stop and then resume when the mail server is rebooted or otherwise brought back
to life, sometimes by restarting the LDAP server process on it.

There are perhaps a few dozen total email destinations for notifications.  Even multiplying this
by the total number of services that Nagios monitors, it doesn't seem likely that the volume of
emails generated by Nagios alone would cause all this.  It is a fairly modern, multiprocessor
server (CentOS/Sendmail).
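
For a rough sense of scale, here is a back-of-the-envelope sketch in Python.  Only the 50-server
figure comes from the description above; the services-per-server and recipient counts are
placeholder guesses, not measured values, and the function name is made up for illustration.

```python
def worst_case_notifications(servers, services_per_server, recipients):
    """Upper bound on emails for one round in which every service alerts every recipient."""
    return servers * services_per_server * recipients


# 50 servers is from the post; the other two values are hypothetical placeholders.
burst = worst_case_notifications(servers=50, services_per_server=10, recipients=36)
print(burst)
```

This is a worst case for a single notification round; in practice each recipient would likely be
subscribed to only a subset of services, so the real number should be lower.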

Can anyone offer any insight or similar experiences?

Thanks in advance!
