Inexplicable service escalation behaviour

Ralph.Grothe at itdz-berlin.de
Thu Oct 6 16:08:13 CEST 2005


Dear List Subscribers,

although I already asked here a couple of weeks ago how to properly
set up an escalation scheme, I desperately need further assistance.
(Sorry, I only find time to continue fumbling with Nagios every now
and then at work, and at home it would be useless for lack of a
testing ground. OK, I could emulate a host and network farm with
VMware or Xen etc., but that's too much fuss.)

I am not getting anywhere with this.

I swear that I've carefully read the sections on escalations in the
Nagios docs at least three times by now.
The examples presented in the docs sound very convincing to me
(though a bit far-fetched), so I think I can gather how it should
work, in *theory*.

My objective seems very trivial to me.

I just want Nagios to send a *single* notification to our trouble
ticketing system, using my "file-service-sc-ticket" (misc)command
definition, while it keeps sending repeated notifications to the
various admin recipients at the common notification interval
(at least the latter is working).

The filing of the ticket works great.
In fact too great, as it turns out to flood the service center.

Tickets keep being generated at the common notification interval,
even for recovery alerts (which I never intended).

Tickets are also generated for downed hosts that I wouldn't have
thought able (per my Nagios definitions) to send a ticketing request
to the service center.

I wonder what use the host_name directive in the serviceescalation
definition is if tickets are filed for other hosts regardless?

A hostescalation definition doesn't exist yet.

I deliberately restricted it to a test host called "fiddle" until I
get this trivial task working, after which I would of course extend
it to all my monitored hosts and services.


So this is the only escalation definition so far:

$ cat escalations.cfg 
define serviceescalation {
    host_name                   fiddle
    service_description         icmp-host-alive
    first_notification          3
    last_notification           3
    notification_interval       0
    contact_groups              service_center
}
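
To spell out what I expect from that definition, here is a small
Python sketch of my reading of the docs. It is only a model of my
assumption: an escalation applies when host and service match and the
notification number lies between first_notification and
last_notification, with a last_notification of 0 meaning no upper
limit. The function name and the dictionary layout are made up for
illustration; this is not Nagios code and may well differ from what
Nagios really does.

```python
# Hypothetical model of how I read first_notification/last_notification.
# Not Nagios source code; all names are invented for illustration.

def escalation_applies(escalation, host, service, number):
    """Return True if the escalation should cover this notification number."""
    if escalation["host_name"] != host:
        return False
    if escalation["service_description"] != service:
        return False
    if number < escalation["first_notification"]:
        return False
    last = escalation["last_notification"]
    # A last_notification of 0 means "no upper limit" per the docs.
    return last == 0 or number <= last


# Same values as in the serviceescalation definition above.
ticket_escalation = {
    "host_name": "fiddle",
    "service_description": "icmp-host-alive",
    "first_notification": 3,
    "last_notification": 3,
}

# Notification numbers for which I expect a ticket on host fiddle.
expected = [n for n in range(1, 7)
            if escalation_applies(ticket_escalation, "fiddle",
                                  "icmp-host-alive", n)]
print(expected)
```

Under this reading only the third notification for fiddle would reach
the service center, and no other host would match at all.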


This is the above service:

define service {
    use                         generic-service
    service_description         icmp-host-alive
    hostgroup_name              non_fwalled_hosts
    check_command               check-host-alive
    contact_groups              nagiosadmin,service_center
}


This is the inherited service template:

define service {
    name                                generic-service
    is_volatile                         0
    max_check_attempts                  5
    normal_check_interval               5
    retry_check_interval                3
    check_period                        24x7
    active_checks_enabled               1
    passive_checks_enabled              0
    parallelize_check                   1
    obsess_over_service                 0
    check_freshness                     0
    event_handler                       notify-by-email
    event_handler_enabled               0
    flap_detection_enabled              0
    process_perf_data                   0
    retain_status_information           1
    retain_nonstatus_information        1
    notification_interval               30
    notification_period                 24x7
    notification_options                w,u,c,r
    notifications_enabled               1
    contact_groups                      nagiosadmin
    register                            0
}



This is the host definition for fiddle:


define host {
    use                                 generic-host
    host_name                           fiddle
    alias                               MC/SG Cluster Package FIDDLE
    address                             123.123.123.123
    hostgroups                          non_fwalled_hosts
    contact_groups                      nagiosadmin
}


This is the contact group definition receiving the tickets (i.e. the
service center):


define contactgroup { 
    contactgroup_name   service_center
    alias               Service Center TT Filer Accounts
    members             scadmin
}


And finally this is the contact (including its template, but with a
bogus mail address here):

define contact {
    name                                generic-contact
    register                            0
    contact_name                        grothe
    alias                               Must be overridden
    contactgroups                       sazadmin 
    host_notification_period            workhrs
    service_notification_period         workhrs
    host_notification_options           d,u,r
    service_notification_options        w,u,c,r
    host_notification_commands          host-notify-by-email
    service_notification_commands       notify-by-email
    email                               nagios
}

define contact {
    use                                 generic-contact
    contact_name                        scadmin
    alias                               Service Center TT Filer
    email                               scadmin at our.rotten.com
    host_notification_period            24x7
    service_notification_period         24x7
    host_notification_commands          file-host-sc-ticket
    service_notification_commands       file-service-sc-ticket
    address1                            SC Token
    address2                            Another SC Token
}

I think I can skip the command definition for "file-service-sc-ticket"
here (I know from the sheer ticket flood that at least this part is
doing its duty as expected).



I am absolutely clueless as to why the service center is receiving
those ticket filing requests repeatedly, and even from other hosts of
host group "non_fwalled_hosts", when I did in fact specify host fiddle
in the service escalation definition (something I would consider a
clear disambiguating directive).

If I can't get this trivial but important functionality of ticket
generation working, I will have to abandon the whole Nagios experience
and look for another tool, which I think would be very sad, given the
time spent so far and the positive impressions from the working parts.


P.S. I don't know if this is of any importance at all, but these are
the releases I run:


$ printf "%s\n\n" "$(uname -srv)";/opt/sw/nagios/bin/nagios -V|head -5
AIX 3 4


Nagios 2.0b3
Copyright (c) 1999-2005 Ethan Galstad (www.nagios.org)
Last Modified: 04-03-2005
License: GPL


Many thanks for your kind attention

Ralph


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




