Service status not resetting

Brett Stevens brett.stevens at hubbub.com.au
Thu Feb 17 05:50:18 CET 2005


Hi. I've only just started using Nagios, and so far it has been a great
system. I've configured a plugin, check_rrd_data, to check RRDs created by
Cacti. This seems to work well, but if a service goes critical it never
returns to OK. For example, when server x's CPU0 utilization goes to 99.9999
for a few minutes, I get a CRITICAL in the GUI and an email. This is exactly
what I would expect:
    server x    Down    Date    duration    message CPU Util CRITICAL: 99.9999
However, when it comes back online I get the same line, except that the
message shows a good value such as "CPU OK: 25.99", yet the GUI still shows
the host as down.
 
I think this shows that the plugin is working, but I may have messed up the
server config or the service config. The same behaviour occurs when a server
is not contactable, and it also shows up in the host detail CGI.
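One caveat on that reasoning: Nagios decides a service's state from the plugin's exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), and the printed text is only displayed. A plugin that prints "CPU OK: 25.99" but exits with 2 will keep the service CRITICAL. A quick way to check is to run the plugin by hand and look at its exit code. The following is a minimal Python 3.7+ sketch; the helper name run_check is hypothetical, and treating out-of-range codes as UNKNOWN is this sketch's own choice.

```python
import subprocess

# Nagios plugin exit codes and the service states they map to.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin command line; return (state, first line of stdout).

    Hypothetical helper: any exit code outside 0-3 is reported as UNKNOWN.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    return state, first_line
```

Usage would be something like run_check(["/path/to/check_rrd_data", ...]) with whatever arguments your check_rrd_data command definition passes (that definition isn't shown here, so the path and argument order are placeholders). If the returned state says CRITICAL while the text says OK, the plugin's exit code is the problem rather than the host or service config. From a shell, running the plugin and then `echo $?` gives the same information.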
 
Service config for the example (sanitised)
define service{
        use                     generic-service
        host_name               problem_server
        service_description     CPU0 Utilization
        check_command           check_rrd_data!$USER3$/grandma_cpudpc_71.rrd!cpuProcessor!50!70!CPU
        }
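The check_command above passes 50 and 70 to check_rrd_data. Assuming those are the warning and critical thresholds (an assumption, since the plugin's argument order isn't documented here), a plugin following the Nagios conventions would map the RRD value to an exit code roughly as in this hypothetical sketch. The function name evaluate and the message format are illustrative only.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warn, crit, label="CPU"):
    """Return (exit_code, message) for a 'higher is worse' metric.

    Hypothetical helper: assumes warn < crit and that reaching a
    threshold counts as crossing it.
    """
    if value >= crit:
        return CRITICAL, f"{label} CRITICAL: {value}"
    if value >= warn:
        return WARNING, f"{label} WARNING: {value}"
    return OK, f"{label} OK: {value}"
```

For example, evaluate(25.99, 50, 70) returns (0, "CPU OK: 25.99"). Returning the code and the message together from the same branch means the text Nagios displays and the state it records cannot disagree; a real plugin would print the message and exit with the code.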
 
Generic service def
define service{
        name                            generic-service
        active_checks_enabled           1 ; Active service checks are enabled
        passive_checks_enabled          1 ; Passive service checks are enabled/accepted
        parallelize_check               1 ; Active service checks should be parallelized
        obsess_over_service             1 ; We should obsess over this service (if necessary)
        check_freshness                 0 ; Default is to NOT check service 'freshness'
        check_period                    24x7
        notifications_enabled           1 ; Service notifications are enabled
        event_handler_enabled           1 ; Service event handler is enabled
        flap_detection_enabled          1 ; Flap detection is enabled
        process_perf_data               1 ; Process performance data
        retain_status_information       1 ; Retain status information across program restarts
        retain_nonstatus_information    1 ; Retain non-status information across program restarts
        max_check_attempts              3 ; Number of times to check before sending an alert
        normal_check_interval           5 ; Check the service every 5 mins
        retry_check_interval            1 ; Time to wait before scheduling a re-check of a service
        notification_interval           5 ; The number of "time units" to wait before re-notifying a contact that this service is still in a non-OK state. Time units are minutes
        notification_period             24x7
        notifications_enabled           1 ; Enable notifications
        contact_groups                  Win32-Admins
        register                        0
        }

 
host def
define host{
        host_name                       problem_server
        alias                           problem_server
        address                         www.xxx.yyy.zzz
        use                             generic-win32
        parents                         vLan3,vLan4
        }

generic host def
define host{
        name                            generic-win32
        check_command                   check_http
        max_check_attempts              5
        process_perf_data               0
        retain_nonstatus_information    0
        notification_interval           30
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  Win32-Admins
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
 
I've probably messed up a definition somewhere, as I have been mucking around
a bit trying different config layouts. We have quite a few servers and other
gear to monitor.
Any help would be greatly appreciated.

Thanks in advance,
 
Brett Stevens
