Overloaded master

Mike Lindsey mike-nagios at 5dninja.net
Tue Jan 26 02:02:26 CET 2010


A typical first-tier notification goes to 20 people. One of those is a
pager, and that one is very simple.

The rest are fairly complex.

Notifications include links to existing and recent tickets in our
ticketing system (this also lets me skip the ticket-opening
notification if a ticket already exists). I populate the notification
with links to Cacti graphs and to wiki documentation for the event, and
I also fire off a secondary notification handler that appends more
information based on the host, service, and state.
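A minimal sketch of that enrichment step, for illustration only. The ticket lookup is a hypothetical stand-in (a plain dict keyed by host and service), and the Cacti and wiki URL patterns are made-up placeholders, not the actual setup described above:

```python
from dataclasses import dataclass, field
from urllib.parse import quote

# Hypothetical URL patterns; substitute the real Cacti and wiki locations.
CACTI_URL = "https://cacti.example.com/host/{host}"
WIKI_URL = "https://wiki.example.com/Nagios/{host}/{service}"


@dataclass
class Notification:
    host: str
    service: str
    state: str
    links: list = field(default_factory=list)
    send_ticket_opening: bool = True  # False when a ticket already exists


def enrich(host, service, state, open_tickets):
    """Build an enriched notification.

    open_tickets is a hypothetical stand-in for the ticketing system:
    a dict mapping (host, service) to a list of existing ticket URLs.
    """
    existing = open_tickets.get((host, service), [])
    note = Notification(host, service, state, links=list(existing))
    # Suppress the ticket-opening notification if a ticket is already open.
    note.send_ticket_opening = not existing
    note.links.append(CACTI_URL.format(host=quote(host)))
    note.links.append(WIKI_URL.format(host=quote(host), service=quote(service)))
    return note
```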

The first notification of the cycle does all the heavy lifting and
takes about 6 seconds. The other 19 finish relatively quickly.
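Nagios runs the notification command once per contact, each in its own process, so a "first one does the work, the rest reuse it" pattern needs a cache outside process memory. A minimal on-disk sketch, assuming the event is identified by host, service, state and notification number; `build` is a hypothetical callable standing in for the expensive lookups:

```python
import hashlib
import json
import os


def cached_enrichment(cache_dir, host, service, state, notification_number, build):
    """Return enrichment data for one notification event, computing it at most once."""
    key = "|".join([host, service, state, str(notification_number)])
    path = os.path.join(cache_dir, hashlib.sha1(key.encode()).hexdigest() + ".json")
    try:
        with open(path) as fh:
            return json.load(fh)  # a later contact for the same event: reuse
    except FileNotFoundError:
        pass
    data = build(host, service, state)  # first contact: do the heavy lifting
    # Write to a temp file and rename, so a reader never sees a partial file.
    tmp = path + ".%d.tmp" % os.getpid()
    with open(tmp, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)
    return data
```

Old cache files would need periodic cleanup; that is left out of the sketch.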

I've been thinking of building a notification server, so I could have
separate, discrete notification escalations for different service
states. That would also let me fire off one notification containing just
the contents of $ENV{NAGIOS_*}. Perhaps that's my best option?
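With enable_environment_macros turned on, Nagios exports its macros as NAGIOS_-prefixed environment variables to the commands it runs. A minimal sketch of the notification-server idea, assuming a single dedicated contact whose notification command just snapshots those variables and drops them into a spool directory; the spool directory and the server that drains it are hypothetical:

```python
import json
import os
import time
import uuid


def snapshot_nagios_env(environ=None):
    """Collect every NAGIOS_* macro from the environment, prefix stripped."""
    environ = os.environ if environ is None else environ
    prefix = "NAGIOS_"
    return {k[len(prefix):]: v for k, v in environ.items() if k.startswith(prefix)}


def spool(payload, spool_dir):
    """Write the payload where a separate notification server can pick it up.

    Writing a file and exiting lets Nagios move on immediately; the slow
    enrichment and fan-out happen later, outside the Nagios process.
    """
    name = "%d-%s.json" % (time.time(), uuid.uuid4().hex)
    tmp = os.path.join(spool_dir, "." + name)  # hidden until complete
    with open(tmp, "w") as fh:
        json.dump(payload, fh)
    final = os.path.join(spool_dir, name)
    os.replace(tmp, final)  # atomic rename within one filesystem
    return final
```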

Martin Melin wrote:
> What kind of notifications are you doing and how many are you sending 
> out? Why does a notification cycle take 9 seconds to complete?
> 
> On Sat, Jan 23, 2010 at 12:13 AM, Mike Lindsey <mike-nagios at 5dninja.net> wrote:
> 
>     What options does one have if the master Nagios server is getting
>     overloaded?
> 
>     I have half a dozen slaves doing polling, submitting passive check
>     results back via send_nsca.  The master does no active polling, just
>     event processing, notifications, and the web UI.
> 
>     Under normal circumstances, it works all right. But after a restart it
>     can take up to half an hour for the master to catch up, and if there
>     are a lot of events, sending out notifications can cause it to fall
>     behind.
> 
>     I'm pre-caching my object file, I'm skipping circular dependency checks,
>     and I've gotten a notification cycle down to 9 seconds. I tried
>     modifying Nagios to fork before notifications, but that failed pretty
>     spectacularly; so for those 9 seconds, 900 or so passive check
>     submissions block until the notifications are done.
> 
>     Are there any options for running a dual-master setup, or other ways to
>     spread the load across multiple machines?
> 
>     Has anyone patched nsca to submit check results into the checkresults
>     directory instead of via the nagios.cmd pipe? What kind of improvement
>     can one expect from that?
> 
>     Any other advice?
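On the question of bypassing the nagios.cmd pipe, a minimal sketch of both submission paths. The first function uses the documented PROCESS_SERVICE_CHECK_RESULT external command format. The second writes a result file directly, and its layout is an assumption based on Nagios 3.x behaviour as we understand it (files named "c" plus six characters, picked up only once a matching ".ok" file exists); verify it against your Nagios version before relying on it:

```python
import os
import random
import string
import time


def service_result_command(host, service, return_code, output, now=None):
    """Format a passive service result as a Nagios external command line."""
    now = int(time.time()) if now is None else now
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        now, host, service, return_code, output)


def write_checkresult(result_dir, host, service, return_code, output, now=None):
    """Write a passive service result straight into the check result directory.

    ASSUMPTION: Nagios 3.x layout; field names and file naming may differ by version.
    """
    now = int(time.time()) if now is None else int(now)
    alphabet = string.ascii_letters + string.digits
    while True:
        name = "c" + "".join(random.choice(alphabet) for _ in range(6))
        path = os.path.join(result_dir, name)
        try:
            # O_EXCL avoids clobbering a file another writer created with this name.
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
            break
        except FileExistsError:
            continue
    lines = [
        "### Passive Check Result File ###",
        "file_time=%d" % now,
        "",
        "host_name=%s" % host,
        "service_description=%s" % service,
        "check_type=1",  # 1 = passive check
        "check_options=0",
        "scheduled_check=0",
        "reschedule_check=0",
        "latency=0.0",
        "start_time=%d.0" % now,
        "finish_time=%d.0" % now,
        "early_timeout=0",
        "exited_ok=1",
        "return_code=%d" % return_code,
        # Assumed: multi-line output is stored escaped on a single line.
        "output=%s" % output.replace("\n", "\\n"),
    ]
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    # The empty .ok marker signals that the result file is complete.
    open(path + ".ok", "w").close()
    return path
```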


-- 
Mike Lindsey

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users