RE: Message storms, service statuses, intervals and so on

Mohr James james.mohr at elaxy.com
Thu Oct 21 15:09:18 CEST 2004


> -----Original Message-----
> From: Marc Powell [mailto:marc at ena.com] 
> Sent: Wednesday, October 20, 2004 23:01
> To: Mohr James; Nagios-users at lists.sourceforge.net
> Subject: RE: [Nagios-users] Message storms, service statuses, 
> intervals and so on
> 
> [lots of stuff related to trying to overwhelm nagios removed]
> 
> > We have a notification set up to create a trouble ticket in our help
> > desk tool. That mechanism works fine.
> > 
> > There are a couple of things that have made me curious that I cannot
> > explain. To begin with, it is not the first event that creates the
> > trouble ticket. Once it was at count 150; another time it was at
> > count 351. So why isn't the first event triggering the trouble
> > ticket? If I understand it correctly, "max_check_attempts 1" says
> > Nagios should react the very first time. How come it then waits for
> > the 351st event before reacting? I could understand it if the
> > notification is triggered and then the notification program reads
> > the current state (including the current event with the current
> > count). By the time the notification gets around to reading the
> > state info, the count has increased.
> 
> It should and does work for me unless there is something that 
> is unique to setting the max_check_attempts to 1 which I am 
> not aware of (I use 3). What do your host definition and 
> specifically the check_command look like (and its 
> corresponding command definition)? Before sending a service 
> notification, nagios will execute the host check and if that 
> returns critical it will only send a host notification, not a 
> service notification. Are you changing the host status during 
> your testing? What does your nagios.log file look like around 
> the time that it should send the first service notification 
> (check result 1)? Does it try to send it? Are there any hints 
> or pointers there?

Hmmmmm. For the host I am using the default check-host-alive command, and I am not changing the status of the host, just the service. What I got in nagios.log was this:

[1098363547] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;sol-sys-02;http;2;HELLO WORLD. This is my message. 38
[1098363549] SERVICE NOTIFICATION: ovsd;sol-sys-02;http;CRITICAL;generate-incident;HELLO WORLD. This is my message. 1

This says it was on message number 38, but sent message 1 to the trouble ticket system. I tried it a couple of other times and still got message 1 being sent to the trouble ticket system. Hmmmmm.
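
To make the mismatch easier to spot over a longer log, the counter at the end of each entry can be pulled out mechanically. The following is only a rough sketch in Python: the function name trailing_counter is made up for illustration, and it assumes that the plugin output is the last ';'-separated field and that it ends in a space followed by the counter, as in the two log lines above.

```python
import re

# The two nagios.log lines quoted above, copied verbatim.
LOG_LINES = [
    "[1098363547] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;sol-sys-02;http;2;HELLO WORLD. This is my message. 38",
    "[1098363549] SERVICE NOTIFICATION: ovsd;sol-sys-02;http;CRITICAL;generate-incident;HELLO WORLD. This is my message. 1",
]

# "[timestamp] ENTRY TYPE: payload"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def trailing_counter(line):
    """Return (timestamp, entry_type, counter) for a log line, or None.

    Assumption: the plugin output is the last ';'-separated field and
    ends in a space followed by a decimal counter.
    """
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp, entry_type, payload = match.groups()
    output = payload.split(";")[-1]       # plugin output text
    tail = output.rsplit(" ", 1)[-1]      # last word of the output
    if not tail.isdigit():
        return None
    return int(timestamp), entry_type, int(tail)


if __name__ == "__main__":
    for line in LOG_LINES:
        print(trailing_counter(line))
```

Run against the two lines above, this reports counter 38 for the external command and counter 1 for the notification, two seconds apart.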

I **know** that it created a trouble ticket with a different number yesterday because I can look at the old tickets and see it. Hmmmm. Hmmmm. 
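
For anyone who wants to repeat the test, here is a minimal sketch of how a numbered series of passive results like the one in the log could be generated. It is not the script actually used here. The helper names passive_result_line and storm are hypothetical, and the sketch only builds the command lines; writing them to the Nagios external command file is left out because its path depends on the installation.

```python
import time


def passive_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line.

    Layout: [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
    """
    if now is None:
        now = int(time.time())
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        now, host, service, return_code, output)


def storm(host, service, count, start=1, now=None):
    """Build `count` CRITICAL (return code 2) results with a running counter."""
    return [
        passive_result_line(
            host, service, 2,
            "HELLO WORLD. This is my message. %d" % n, now)
        for n in range(start, start + count)
    ]


if __name__ == "__main__":
    # Print three sample lines instead of writing to the command file.
    for line in storm("sol-sys-02", "http", 3):
        print(line, end="")
```

Putting the counter in the plugin output is what makes it possible to compare the number in the EXTERNAL COMMAND entry with the number that ends up in the notification.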

<aggregation info snipped>

We have aggregate_status_updates=1, so that explains why it only shows intermittent values. I don't think we're going to change this, but now I know why. Thanks. 

Regards,

Jim Mohr

