Distributed Monitoring

Burnson, Richard rburnson at cps.k12.il.us
Thu Jan 16 18:49:35 CET 2003


I have gotten closer to resolving this issue.  I set up the same distributed
server configuration on another Red Hat 7.2 box, and it worked flawlessly.
The only difference between the two distributed servers is that the original
one was installed with the high security setting during the Red Hat
installation.  The only change on the central server was to allow NSCA
connections from the new server.  I'm not sure what security feature would
prevent the service check results from being sent.  I had already
double-checked the permissions and found no issues.  Any ideas on what could
cause this?  I could just rebuild the box, but if it's a simple
configuration change I would rather do that.

TIA,
Richard

-----Original Message-----
From: Burnson, Richard [mailto:rburnson at cps.k12.il.us] 
Sent: Tuesday, January 07, 2003 10:16 AM
To: nagios-users at lists.sourceforge.net
Subject: RE: [Nagios-users] Distributed Monitoring


I haven't seen any responses yet; perhaps more information is required?

Here is some of the debugging I have done:

I verified that the services have "obsess over service" enabled:

define service{
        name                            check-service ; The 'name' of this service template, referenced in other service definitions
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           15
        retry_check_interval            2
        notification_interval           120
        notification_period             24x7
        notification_options            w,u,c,r
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        # ================================================
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        # ================================================
        check_freshness                 1       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts

        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

 
The "submit_check_result" command is defined as follows:

define command{
        command_name    submit_check_result
        command_line    /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$OUTPUT$'
        }
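For reference, the distributed-monitoring example in the Nagios documentation implements submit_check_result as a small shell script: it turns the state string into a numeric code and pipes one tab-separated line to send_nsca. The sketch below follows that approach under stated assumptions: the send_nsca and send_nsca.cfg paths and the host name central_server are placeholders, and the helper names state_to_code and format_result are hypothetical, split out only so the formatting can be checked without sending anything.

```shell
#!/bin/sh
# Sketch of an OCSP submit script (paths and host name are assumptions).
# Arguments, as passed by the submit_check_result command definition:
#   $1 = host name           ($HOSTNAME$)
#   $2 = service description ($SERVICEDESC$)
#   $3 = state string        ($SERVICESTATE$: OK, WARNING, CRITICAL, UNKNOWN)
#   $4 = plugin output       ($OUTPUT$)

SEND_NSCA=/usr/local/nagios/bin/send_nsca          # assumed location
SEND_NSCA_CFG=/usr/local/nagios/etc/send_nsca.cfg  # assumed location
CENTRAL_SERVER=central_server                      # hypothetical host name

# Map the state string to a numeric return code.
# UNKNOWN (or anything unexpected) becomes -1, as in the documentation's example.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo -1 ;;
    esac
}

# Build the tab-separated line: host, service, return code, output.
format_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Only send when called with all four arguments, so the functions above
# can be exercised on their own.  The send_nsca invocation uses a positional
# host as in the Nagios 1.x docs; newer NSCA releases expect -H <host>.
if [ "$#" -eq 4 ]; then
    format_result "$1" "$2" "$3" "$4" |
        "$SEND_NSCA" "$CENTRAL_SERVER" -c "$SEND_NSCA_CFG"
fi
```

Note that this version expects the state as a string (which is what $SERVICESTATE$ expands to), so a manual test in this style would pass WARNING rather than 1 as the third argument.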
    

The "ocsp_command" is configured in nagios.cfg as follows, with obsess over
services enabled:

obsess_over_services=1

ocsp_command=submit_check_result
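One way to rule out a typo, or a second copy of a directive further down the file, is to check what nagios.cfg actually contains. The sketch below is a grep-based check under two assumptions: the config path is a guess for a default source install (override it with NAGIOS_CFG), and when a directive appears more than once, the last occurrence is the one that takes effect. It does not verify that ocsp_command names a command actually defined in the object configuration; the helper names are hypothetical.

```shell
#!/bin/sh
# Report the OCSP-related directives from nagios.cfg (assumed path).
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Print the value of the last uncommented "name=value" line for a directive.
# Usage: cfg_value DIRECTIVE FILE
cfg_value() {
    grep "^[[:space:]]*$1=" "$2" | tail -n 1 | sed "s/^[[:space:]]*$1=//"
}

# Print both settings; return non-zero if either would keep the
# OCSP command from running (obsessing off, or no command set).
check_ocsp_settings() {
    obsess=$(cfg_value obsess_over_services "$1")
    ocsp=$(cfg_value ocsp_command "$1")
    echo "obsess_over_services=${obsess:-<unset>}"
    echo "ocsp_command=${ocsp:-<unset>}"
    [ "$obsess" = "1" ] && [ -n "$ocsp" ]
}

if [ -r "$NAGIOS_CFG" ]; then
    check_ocsp_settings "$NAGIOS_CFG"
fi
```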


I have removed the original commands from the documentation and put in place
the two scripts that came with the Nagios tarball in the contrib/distributed
monitoring directory.  I am able to send a service check manually, as the
nagios user, to the central server.

[nagios@ilnetmon03 nagios]$ /usr/local/nagios/libexec/eventhandlers/submit_check_result 3390-RTR PING-RI 1 test
1 data packet(s) sent to host successfully.

However, the OCSP command never seems to be executed after any service
check.  Here is the output with Nagios compiled with debug set to 3:

*** Event Details ***
        Event type: 0 (service check)
                Service Description: PING
                Associated Host:     3390-KRO
        Event time: Tue Jan  7 08:41:36 2003
        Checking service 'PING' on host '3390-KRO'...

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:36 2003
        Next High Priority Event Time: Tue Jan  7 08:41:39 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:37 2003
        Next High Priority Event Time: Tue Jan  7 08:41:39 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:38 2003
        Next High Priority Event Time: Tue Jan  7 08:41:39 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:39 2003
        Next High Priority Event Time: Tue Jan  7 08:41:39 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Details ***
        Event type: 10 (status save)
        Event time: Tue Jan  7 08:41:39 2003

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:39 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:40 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:41 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:42 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:43 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0

*** Event Check Loop ***
        Current time: Tue Jan  7 08:41:44 2003
        Next High Priority Event Time: Tue Jan  7 08:41:44 2003
        Next Low Priority Event Time:  Tue Jan  7 08:43:15 2003
Current/Max Outstanding Checks: 1/0
*** Event Details ***
        Event type: 7 (service check reaper)
        Event time: Tue Jan  7 08:41:44 2003
Starting to reap service check results...

        Found check result for service 'PING' on host '3390-KRO'
                Check Type:    ACTIVE
                Parallelized?: Yes
                Exited OK?:    Yes
                Return Status: 0
                Plugin Output: 'FPING OK - 10.1.1.1 (loss=0.000000%, rta=27.800000 ms)'
Finished reaping service check results.
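The debug output above shows the result being reaped but gives no sign of the OCSP command. A quick way to tell whether Nagios invokes the command at all, as opposed to invoking it and the send failing, is to temporarily point the command_line at a wrapper that logs every call and then hands off to the real script. This is a hypothetical debugging aid: the log path, the variable names and the wrapped script path are assumptions, and the log must be writable by the nagios user.

```shell
#!/bin/sh
# Hypothetical OCSP debugging wrapper: record each call, then run the
# real submit script with the same arguments.
LOG=${OCSP_DEBUG_LOG:-/tmp/ocsp_debug.log}
REAL=${OCSP_REAL_SCRIPT:-/usr/local/nagios/libexec/eventhandlers/submit_check_result}

# Append one line: a timestamp followed by each argument, tab-separated.
log_call() {
    printf '%s' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG"
    for arg in "$@"; do
        printf '\t%s' "$arg" >> "$LOG"
    done
    printf '\n' >> "$LOG"
}

# Nagios always passes arguments; with none, do nothing.
if [ "$#" -gt 0 ]; then
    log_call "$@"
    exec "$REAL" "$@"
fi
```

After restarting Nagios, watch the log with tail -f while a few service checks run. An empty log would mean Nagios never runs the command; entries in the log would shift the suspicion to the send side (send_nsca, the network path, or the central server).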


Any ideas on why it's not working?

TIA,
Richard



-----Original Message-----
From: Burnson, Richard 
Sent: Friday, January 03, 2003 3:53 PM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Distributed Monitoring


I am trying to set up distributed monitoring with Nagios 1.0 (Stable) on
Red Hat 7.2.  I have the NSCA daemon running on the central server, and I
have been able to successfully send a service check result via send_nsca
from the distributed server.  The issue appears to be that the distributed
server is not executing the ocsp_command.  Here are the settings on the
distributed server:
1.      Obsess over services is enabled both globally and per service.
2.      The ocsp_command is defined in nagios.cfg as
        ocsp_command=submit_check_result
3.      "submit_check_result" is defined in the command definitions section
        exactly as in the documentation.
4.      The submit_check_result script was created in the libexec directory,
        and the command definition points directly to this file.
I can log in as nagios and run the submit_check_result shell script
successfully, and the central server receives the service check.  It simply
seems that Nagios is not executing the OCSP command after each service
check.  I've tried to watch the service checks as they happen via the
nagios.log file and even compiled Nagios with debug 3.  Is there a better
way to debug this?  Does anyone have any ideas on what I may be missing?
Thanks,
Richard


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
