Strategies for coping with self-DOS?

Chris Beattie cbeattie at geninfo.com
Thu Aug 23 16:42:33 CEST 2012


Where I work, the server engineers want Nagios to notify them fairly 
quickly when a problem develops.  During the day, the settings are fine. 
Recently, however, the nightly backups and scheduled antivirus scans 
began causing enough load that monitored hosts become unavailable: only 
briefly, but long enough that Nagios sends notifications that make it to 
their pagers.

What are some of the strategies you use to deal with this?

The last time I dealt with this, I had two service template files, which 
specified different max_check_attempts and retry_intervals for day and 
night.  I used a cron job to copy the appropriate template file to a 
name Nagios was configured to load, and restart Nagios.
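
A minimal sketch of that kind of setup follows.  The file names, the 
template name and the values are all hypothetical, not taken from the 
original configuration.  Only the copy made by cron is loaded by Nagios, 
so both files can define the same template name.

```
# services-day.cfg (hypothetical file name): reach a hard state quickly
define service{
    name                 site-service      ; hypothetical template name
    use                  generic-service   ; assumes the stock sample template is present
    max_check_attempts   3                 ; illustrative value
    retry_interval       1                 ; in minutes with the default interval_length of 60
    register             0                 ; template only, not a real service
    }

# services-night.cfg (hypothetical file name): same template, more patient
define service{
    name                 site-service
    use                  generic-service
    max_check_attempts   6                 ; illustrative value
    retry_interval       5                 ; roughly 25 minutes of retries before a hard state
    register             0
    }
```

Services inherit the timing with "use site-service", so the swap never 
touches the individual service definitions.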

As we upgraded things, the problem went away, so I ditched that setup. 
It always seemed like a kludge.  Scheduled restarts of Nagios just smell 
like failure to me, and they don't scale well if you have multiple 
thousands of hosts and services.  Well, our server estate has continued 
to expand 
and now we're back to committing own-goals with the midnight pages.

This time, I'm thinking about defining escalations with different 
timeperiods, but I'm curious to find out what other approaches have been 
successful.
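
For concreteness, here is a rough sketch of the escalation idea.  All 
object names (timeperiods, hostgroup, service, contact groups) are 
hypothetical.  It assumes the service's own contact_groups is an 
email-only group and that its notification_interval is non-zero, so that 
a third notification can occur at all.

```
define timeperiod{
    timeperiod_name  daytime               ; hypothetical name
    alias            Outside the backup window
    sunday           06:00-20:00
    monday           06:00-20:00
    tuesday          06:00-20:00
    wednesday        06:00-20:00
    thursday         06:00-20:00
    friday           06:00-20:00
    saturday         06:00-20:00
    }

define timeperiod{
    timeperiod_name  nighttime             ; hypothetical name
    alias            Backup and antivirus window
    sunday           00:00-06:00,20:00-24:00
    monday           00:00-06:00,20:00-24:00
    tuesday          00:00-06:00,20:00-24:00
    wednesday        00:00-06:00,20:00-24:00
    thursday         00:00-06:00,20:00-24:00
    friday           00:00-06:00,20:00-24:00
    saturday         00:00-06:00,20:00-24:00
    }

# Daytime: page from the very first notification.
define serviceescalation{
    hostgroup_name         all-servers     ; hypothetical hostgroup
    service_description    PING            ; hypothetical service
    first_notification     1
    last_notification      0               ; 0 = keep escalating until recovery
    notification_interval  10              ; illustrative value, in minutes
    escalation_period      daytime
    contact_groups         server-pagers   ; hypothetical pager contact group
    }

# Night: page only if the problem survives to the third notification.
define serviceescalation{
    hostgroup_name         all-servers
    service_description    PING
    first_notification     3
    last_notification      0
    notification_interval  10
    escalation_period      nighttime
    contact_groups         server-pagers
    }
```

One thing to watch: when an escalation matches, only its contacts are 
notified, so the email-only group would have to be added to 
contact_groups above if it should keep receiving the later notifications.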

Thanks!
-- 
-Chris
