nagios blocking on notifications?

Mike Lindsey mike-nagios at 5dninja.net
Thu Jan 14 22:56:38 CET 2010


I've got a high volume site.  Everything seems to keep up reasonably 
well, unless there are a good number of state changes.  Once services 
start changing state, and notifications start getting sent out, nagios 
falls behind.

I did some digging in the logs, and it looks like while a batch of 
notifications is being sent out, they are rate limited to about one per 
five seconds.  Also, from the first notification for a service to the 
last notification for that service, nothing else is written to the logs.

Since a typical notification goes out to 15+ people, that's over a 
minute with no service check handling.

Is there something going on under the hood that I'm not aware of?  For 
example, is it just not writing to the log while still handling passive 
service checks, with something else causing my latency?

Is that delay configurable?  I don't see anything in the docs for that.

I've even changed my notification script to just launch a secondary 
script in the background, to rule out a delay in the notification 
script itself, but that did not seem to change anything.  Should I be 
forking the notification script instead?
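
For illustration, here is a minimal sketch of what a fully detached launch 
could look like, written in Python.  It rests on an assumption that this 
post does not confirm: that a backgrounded child can still hold the 
caller's attention if it inherits the parent's stdout/stderr, so the 
sketch points all three standard streams at /dev/null and starts the 
child in its own session.  The name spawn_detached is hypothetical, and 
the real notification script path is left to the caller.

```python
import subprocess
import sys


def spawn_detached(argv):
    """Start argv without waiting for it to finish.

    Assumption (not confirmed by the post): the caller may block for as
    long as the child holds inherited output pipes open, so every standard
    stream is redirected to /dev/null.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # run the child in a new session (setsid)
        close_fds=True,          # do not leak other open descriptors
    )


# Hypothetical usage from a wrapper script:
#     spawn_detached(["/path/to/real-notify-script"] + sys.argv[1:])
```

If the delay stays the same even with a launcher like this, the 
notification script itself is probably not where the time goes.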

Here's a log snippet:
[1263505735] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;<redacted>;System Check;0;OK load mem ntp swap cfengine disk|
[1263505735] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;<redacted>;System Check;0;OK load mem ntp swap cfengine disk|
[1263505735] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;<redacted>;System Check;1;WARNING [swap utilization 25%] [/data/ at 77% (inodes 0%)]|
[1263505735] PASSIVE SERVICE CHECK: <redacted>;check_mtime-redlist.txt;0;OK - redlist.txt 102 seconds old
[1263505735] PASSIVE SERVICE CHECK: <redacted>;pre_queuedepth;2;CRITICAL - <redacted> pre_queuedepth status: 2159 > 500
<There are close to 50 log entries with that timestamp>
[1263505735] SERVICE NOTIFICATION: <redacted>;<redacted>;pre_queuedepth;CRITICAL;notify-by-email;CRITICAL - <redacted> pre_queuedepth status: 2159  500
[1263505741] SERVICE NOTIFICATION: <redacted>;<redacted>;pre_queuedepth;CRITICAL;notify-by-email;CRITICAL - <redacted> pre_queuedepth status: 2159  500


The SERVICE NOTIFICATION entries keep rolling in every 5-6 seconds for 
the next minute+, then it goes back to its usual happy speed.
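
To measure that spacing rather than eyeball it, a small script along 
these lines could pull the gaps out of the log.  This is only a sketch: 
it assumes each entry sits on a single line in the "[epoch] TYPE: ..." 
layout shown above, and the function name notification_gaps and the 
shortened sample lines are made up for illustration.

```python
import re

# Matches "[<epoch seconds>] <ENTRY TYPE>:" at the start of a log line.
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+):")


def notification_gaps(lines):
    """Return the seconds between consecutive SERVICE NOTIFICATION entries."""
    stamps = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(2) == "SERVICE NOTIFICATION":
            stamps.append(int(m.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Shortened sample lines, using the two timestamps from the snippet above.
sample = [
    "[1263505735] PASSIVE SERVICE CHECK: <redacted>;pre_queuedepth;2;CRITICAL",
    "[1263505735] SERVICE NOTIFICATION: <redacted>;<redacted>;pre_queuedepth;CRITICAL",
    "[1263505741] SERVICE NOTIFICATION: <redacted>;<redacted>;pre_queuedepth;CRITICAL",
]
print(notification_gaps(sample))  # [6]
```

Run against the full log, the same function would show whether the gap 
is constant, and whether any other entry types appear inside it.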

Is this an artifact of the way it logs, or is the whole system choking 
while it sends email?  I've searched the list archives and not found 
anything on this.

-- 
Mike Lindsey





