Multithreaded Macro Support wrapper proposal

Ton Voon ton.voon at opsera.com
Sat Aug 22 08:48:17 CEST 2009


Hi Steven,

On 21 Aug 2009, at 18:35, Steven D. Morrey wrote:

> To that end I have decided to robustify the macro system by creating  
> a handful of wrapper functions that will make the macros thread safe  
> (as long as all macro calls are passed through them).
> These functions are

Taking a different approach: which part of the macro-setting routines
is taking the most time? My guess is that the summary macros take the
most time, because they have to walk through the entire list of hosts
and services. http://nagios.sourceforge.net/docs/3_0/macrolist.html

You could disable summary macro processing with the large installation
tweaks (http://nagios.sourceforge.net/docs/3_0/largeinstalltweaks.html)
and see if the timings still show the macro portion to be causing the
bottleneck. I think you are on Nagios 2 though, so this option is not
available. You could try just commenting out that entire block and see
how it affects the profiling.

For Opsview, we had a customer whose CPU was spinning at 100%. Using
strace, we found it was in the notifications logic, setting all the
macro environment variables. But we knew that the customer
**didn't have notifications enabled for any contacts**. It turns out
that when Nagios got an alert event, it would set the macros first and
only then work out whether the contact should be notified. We changed
the loop so that it first checks whether the contact should be
notified and calculates the macros only if so. This reduced their CPU
usage to 10%.
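
To illustrate the reordering, here is a minimal sketch in C. It is not
the actual patch and not real Nagios code: contact_t,
set_notification_macros(), should_notify() and the macro_setups counter
are all hypothetical stand-ins, assuming only that the macro setup is
the expensive step and the viability check is cheap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified contact record; not the real Nagios struct. */
typedef struct {
    const char *name;
    bool notifications_enabled;
} contact_t;

/* Counts how many times the expensive macro setup has run. */
int macro_setups = 0;

/* Stand-in for the costly step of computing the macros and exporting
 * them as environment variables. */
void set_notification_macros(const contact_t *c)
{
    (void)c;
    macro_setups++;
}

/* Stand-in for the cheap check of whether a contact wants this alert. */
bool should_notify(const contact_t *c)
{
    return c->notifications_enabled;
}

/* Old order: pay for the macros, then find out whether they were needed. */
int notify_macros_first(const contact_t *contacts, size_t n)
{
    int sent = 0;
    assert(contacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        set_notification_macros(&contacts[i]);
        if (!should_notify(&contacts[i]))
            continue;
        sent++; /* the notification command would run here */
    }
    return sent;
}

/* New order: skip uninterested contacts before doing any macro work. */
int notify_check_first(const contact_t *contacts, size_t n)
{
    int sent = 0;
    assert(contacts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!should_notify(&contacts[i]))
            continue;
        set_notification_macros(&contacts[i]);
        sent++; /* the notification command would run here */
    }
    return sent;
}
```

Both loops send the same notifications. The only difference is that
the second one does no macro work for contacts that would be skipped
anyway, so with notifications disabled for every contact it does no
macro work at all.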

Patch for Nagios 2.10: https://secure.opsera.com/svn/opsview/branches/BRAN-2.14/opsview-base/patches/nagios_reduce_notifications_load.patch

Patch for Nagios 3: https://secure.opsera.com/svn/opsview/branches/BRAN-3.1/opsview-base/patches/nagios_reduce_notifications_load.patch

I haven't put this into core code yet because I'm trying to work out a  
way to test this. Even though I know this works for the thousands of  
users using Opsview, I set myself a different standard when it comes  
to the hundreds of thousands of users of Nagios :)

I'd be grateful if anyone wants to write a libtap test that
demonstrates this problem, so that I can get the fix applied to core
code.

Ton

