Overloaded master

Martin Melin martinm at op5.org
Tue Jan 26 09:12:30 CET 2010


I would stop doing the time-consuming ticket integration (or whatever is
taking up the time) from within the notification command itself. I don't
think you need to build a separate notification server - after all,
notification logic is one of the things Nagios does best - but I do think
you could build a quick program to do the heavy lifting for notifications,
per Nagios' instructions. Without knowing what kind of info you need, I
would probably have the Nagios notification command write to a spool
directory, then send a signal to a separate program to have it read the
spool directory and create/search for tickets etc.
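
A minimal sketch of that spool-directory idea, in Python. Everything here is an assumption for illustration: the function names, the JSON file format, and the file naming are hypothetical, and it assumes Nagios exports its macros to the notification command as NAGIOS_* environment variables (as the $ENV{NAGIOS_*} mention in the quoted mail suggests). The notification command would only call spool_notification() and exit quickly; a separate long-running program would call drain_spool() to do the slow ticket work.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path


def spool_notification(spool_dir, environ=None):
    """Write the NAGIOS_* variables to a new spool file and return its path."""
    environ = os.environ if environ is None else environ
    spool = Path(spool_dir)
    spool.mkdir(parents=True, exist_ok=True)
    event = {k: v for k, v in environ.items() if k.startswith("NAGIOS_")}

    # Write under a temporary name first, then rename into place, so the
    # worker never picks up a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=spool, prefix=".tmp-")
    with os.fdopen(fd, "w") as fh:
        json.dump(event, fh)

    # Zero-padded timestamp prefix keeps filenames sortable by arrival time;
    # the random suffix avoids collisions between concurrent notifications.
    final = spool / f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    os.rename(tmp_name, final)
    return final


def drain_spool(spool_dir, handle):
    """Pass each spooled event to handle(), oldest first; return the count."""
    processed = 0
    for path in sorted(Path(spool_dir).glob("*.json")):
        with open(path) as fh:
            event = json.load(fh)
        handle(event)   # slow work (ticket lookup etc.) happens here
        path.unlink()   # removed only after handle() returned without error
        processed += 1
    return processed
```

The signalling step is left out of the sketch. The worker could simply call drain_spool() in a loop with a short sleep, or wait for a signal from the notification command before each pass; which is better depends on how quickly the tickets need to appear.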

On Tue, Jan 26, 2010 at 2:02 AM, Mike Lindsey <mike-nagios at 5dninja.net> wrote:

> A typical first tier notification goes to 20 people.  One of those will be
> a pager, and is very simple.
>
> The rest are fairly complex.
>
> Notifications include a link to existing and recent tickets in our
> ticketing system (this also allows me to not send a ticket opening
> notification if a ticket already exists).  I populate the notification with
> links to cacti graphs and links to wiki documentation for the event, and I
> also fire off a secondary notification handler that adds in additional
> information based on the host, service, and state.
>
> The first notification of the cycle does all the heavy lifting and takes
> about 6 seconds.  The other 19 finish relatively quickly.
>
> I've been thinking of building a notification server - so I could have
> separate and discrete notification escalations for different service states
> - which would also let me fire off one notification with just the contents
> of $ENV{NAGIOS_*}.  Perhaps that's my best option?
>
> Martin Melin wrote:
>
>> What kind of notifications are you doing and how many are you sending out?
>> Why does a notification cycle take 9 seconds to complete?
>>
>> On Sat, Jan 23, 2010 at 12:13 AM, Mike Lindsey <mike-nagios at 5dninja.net> wrote:
>>
>>    What kind of options does one have, if your master nagios server is
>>    getting overloaded?
>>
>>    I have half a dozen slaves doing polling, submitting passive check
>>    results back via send_nsca.  The master does no active polling, just
>>    event processing, notifications, and web ui.
>>
>>    Under normal circumstances, it works alright.  But after a restart it
>>    can take up to half an hour before the master catches up; and if there
>>    are a lot of events, the act of sending out notifications can cause it
>>    to fall behind.
>>
>>    I'm pre-caching my object file, I'm skipping circular dependency checks,
>>    and I've gotten a notification cycle down to 9 seconds.  I tried
>>    modifying nagios to fork before notifications, but that failed pretty
>>    spectacularly; so that 9 seconds is a time where 900 or so passive check
>>    submissions block until the notifications are done.
>>
>>    Are there any options for running a dual-master setup, or other ways to
>>    spread the load across multiple machines?
>>
>>    Has anyone patched nsca to submit check results into the checkresults
>>    directory, instead of via the nagios.cmd pipe?  What kind of improvement
>>    can one expect from that?
>>
>>    Any other advice?
>>
>
>
> --
> Mike Lindsey
>



-- 
Martin Melin
____________________________
op5 AB
http://www.op5.com

http://www.op5.org/
http://www.op5.com/op5/products/network-monitor/nagios/