Message storms, service statuses, intervals and so on

Marc Powell marc at ena.com
Wed Oct 20 23:00:56 CEST 2004


----Original Message----
From: nagios-users-admin at lists.sourceforge.net
[mailto:nagios-users-admin at lists.sourceforge.net] On Behalf Of Mohr James
Sent: Wednesday, October 20, 2004 7:33 AM
To: Nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Message storms, service statuses, intervals and so on

[lots of stuff related to trying to overwhelm nagios removed]

> We have a notification set up to create a trouble ticket in our help
> desk tool. That mechanism works fine.
>
> There are a couple of things that have made me curious that I cannot
> explain. To begin with, it is not the first event that creates the
> trouble ticket. Once it was at count 150; another time it was at
> count 351.
> So why isn't the first event triggering the trouble ticket? If I
> understand it correctly, "max_check_attempts 1" says Nagios should
> react the very first time. How come it then waits for the 351st event
> before reacting? I could understand it if the notification is
> triggered and then the notification program reads the current state
> (including the current event with the current count). By the time the
> notification gets around to reading the state info, the count has
> increased.

It should, and it does work for me, unless there is something unique to
setting max_check_attempts to 1 that I am not aware of (I use 3). What
do your host definition, and specifically its check_command (and the
corresponding command definition), look like? Before sending a service
notification, Nagios will execute the host check, and if that returns
critical it will only send a host notification, not a service
notification. Are you changing the host status during your testing? What
does your nagios.log file look like around the time that it should send
the first service notification (check result 1)? Does it try to send it?
Are there any hints or pointers there?
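
To narrow down the log question above, a small script along these lines
might help. It is only a sketch: it assumes the usual
"[epoch] TYPE: field;field;..." layout of nagios.log entries, and the
host name, contact name, command name and sample lines below are made up
for illustration. The location of nagios.log depends on your install.

```python
import re
from datetime import datetime, timezone

# Assumed layout of a log entry: "[<epoch seconds>] <TYPE>: <fields separated by ';'>"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")
INTERESTING = {
    "SERVICE ALERT",
    "SERVICE NOTIFICATION",
    "HOST ALERT",
    "HOST NOTIFICATION",
}


def relevant_entries(lines, host):
    """Yield (utc_time, entry_type, fields) for alert/notification lines about host."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # not a "TYPE: ..." style entry
        epoch, entry_type, rest = match.groups()
        if entry_type not in INTERESTING:
            continue
        fields = rest.split(";")
        # Alerts begin with the host name; notifications begin with the
        # contact name followed by the host name, so look at both positions.
        if host not in fields[:2]:
            continue
        when = datetime.fromtimestamp(int(epoch), tz=timezone.utc)
        yield when, entry_type, fields


# Hypothetical sample lines, only to show the shape of the input.
SAMPLE = [
    "[1098270000] SERVICE ALERT: testhost;Test Service;CRITICAL;HARD;1;count 1",
    "[1098270001] SERVICE NOTIFICATION: helpdesk;testhost;Test Service;CRITICAL;notify-by-ticket;count 1",
    "[1098270002] SERVICE ALERT: otherhost;Ping;OK;HARD;1;fine",
    "[1098270003] Auto-save of retention data completed successfully.",
]

if __name__ == "__main__":
    for when, entry_type, fields in relevant_entries(SAMPLE, "testhost"):
        print(when.isoformat(), entry_type, ";".join(fields))
```

If the first SERVICE ALERT for the test service is not followed shortly
by a SERVICE NOTIFICATION, or a HOST ALERT shows up in between, that
would point at the host check rather than at max_check_attempts.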
 
> 
> The next is the "Status Information" field in the web browser. The
> content is obviously changing, as I can see that the count value
> increases almost up to 10000. The status does not change, but the
> system is always getting the "current" message. My problem is I am not
> sure what mechanism is used to determine how long the system waits
> before updating.

I believe this is what you're looking for. In nagios.cfg --

# AGGREGATED STATUS UPDATES
# This option determines whether or not Nagios will 
# aggregate updates of host, service, and program status
# data.  Normally, status data is updated immediately when
# a change occurs.  This can result in high CPU loads if
# you are monitoring a lot of services.  If you want Nagios
# to only refresh status data every few seconds, disable
# this option.
# Values: 1 = enable aggregate updates, 0 = disable aggregate updates

aggregate_status_updates=1

# AGGREGATED STATUS UPDATE INTERVAL
# Combined with the aggregate_status_updates option,
# this option determines the frequency (in seconds!) that
# Nagios will periodically dump program, host, and 
# service status data.  If you are not using aggregated
# status data updates, this option has no effect.

status_update_interval=15
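
If you want to double-check what a given nagios.cfg actually says, here
is a rough sketch that reads the two options and summarises them. It
goes only by the "Values" line and the interval comment quoted above,
the function names are my own, and it deliberately does not guess at
Nagios defaults when an option is missing from the file.

```python
def parse_main_config(text):
    """Parse 'name=value' lines, skipping blank lines and '#' comments.

    Later occurrences of a name overwrite earlier ones, which is good
    enough for the two single-valued options examined here.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options


def describe_status_updates(options):
    """Summarise how status data is refreshed, based on the quoted comments."""
    aggregated = options.get("aggregate_status_updates")
    if aggregated == "1":
        interval = options.get("status_update_interval")
        if interval is None:
            return "aggregated, but status_update_interval is not set in this file"
        return f"aggregated: status data dumped every {int(interval)} seconds"
    if aggregated == "0":
        return "not aggregated: status data updated on every change"
    return "aggregate_status_updates is not set in this file"


# The two settings exactly as quoted above.
EXAMPLE = """
# AGGREGATED STATUS UPDATES
aggregate_status_updates=1

# AGGREGATED STATUS UPDATE INTERVAL
status_update_interval=15
"""

if __name__ == "__main__":
    print(describe_status_updates(parse_main_config(EXAMPLE)))
```

With the values shown, the web interface would only see fresh status
data about every 15 seconds, however fast the count is changing.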

--
Marc

