Setting up a passive check problem

Marc Powell marc at ena.com
Wed Apr 13 01:09:17 CEST 2005



> -----Original Message-----
> From: Lewis Getschel [mailto:lgetschel at denver.westerngeco.slb.com]
> Sent: Tuesday, April 12, 2005 5:08 PM
> To: Marc Powell
> Cc: Nagios Users
> Subject: Re: [Nagios-users] Setting up a passive check problem
> 
> Sorry to describe so much and then leave out my actual problem...
> 
> Being an impatient person I've changed my services.cfg a little... now
> they are:
> 
> services.cfg:
> define service{
>         use                             linux-service
>         name                            ibm_disk_array_status
>         service_description             ibm_disk_array_status
>         active_checks_enabled           0
>         passive_checks_enabled          1
>         check_command                   check_dummy
>         check_freshness                 0
>         register                        0
>         }
> 
> same config- hosts.cfg:
> # service definition
> define service{
>      use           ibm_disk_array_status
>      host_name     fs004,fs005,fs006,fs007,fs008
> }
> 
> commands.cfg:
> # 'check_dummy' command definition
> define command{
>         command_name    check_dummy
>         command_line    $USER1$/check_dummy 0
>         }

Yup. Still looks ok.

> 
> Now, If I understand ...
> the idea of  "active_checks_enabled           0",   means do NOT
> actually check anything (don't run the command_line defined).
> the idea of  "passive_checks_enabled          1"   means that nagios
> will only get updates that I put into the  command_file
> ("/usr/local/nagios/var/rw/nagios.cmd") through another script that is

Correct. I believe freshness checking ignores the value of
active_checks_enabled, but that only comes into play if you've enabled
freshness checking.

> called. This much IS working because I see the following line in my
> event log:
> [04-12-2005 14:57:15] EXTERNAL COMMAND:
> PROCESS_SERVICE_CHECK_RESULT;fs008;ibm_disk_array_status;0;OK - No
> errors reported
> 
> 

This indicates that nagios saw an external command, not necessarily that
it accepted it. I'm going to guess it did as the next line would have
been an error of some type if nagios rejected it.
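
For reference, here is a minimal sketch of how a script might build and
submit that line. The function names and the NAGIOS_CMD_FILE variable are
made up for illustration; the command file path and the
PROCESS_SERVICE_CHECK_RESULT format are the ones quoted in this thread.
It assumes a `date` that supports +%s.

```shell
# Hypothetical helper: build one external-command line for a passive
# service check result.
# Arguments: epoch timestamp, host name, service description,
#            return code, plugin output.
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Hypothetical helper: stamp the result with the current time and append
# it to the Nagios command file. NAGIOS_CMD_FILE is an assumed override;
# the default is the path mentioned in this thread.
submit_passive_result() {
    cmd_file="${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"
    format_passive_result "$(date +%s)" "$@" >> "$cmd_file"
}

# Example call (not executed here):
#   submit_passive_result fs008 ibm_disk_array_status 0 "OK - No errors reported"
```

The timestamp in brackets is seconds since the epoch, and the return code
follows the usual plugin convention (0 for OK, 2 for CRITICAL).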

> When I look at the scheduling queue it shows that my service
> "ibm_disk_array_status" is scheduled to be run!
> fs004    ibm_disk_array_status    04-12-2005 14:34:16    04-12-2005
> 14:54:16    ENABLED
> 
> When I view my fileserver services, it shows:
> fs004 ibm_disk_array_status          OK 04-12-2005 14:34:16 0d 1h 33m
> 37s 1/4 Status is OK
> 
> The problem is that the "Status is OK" message is coming from the
> check_dummy command, and it _SHOULD_ be "OK - No errors reported" as my
> external command shows.

This could be explained if you have state retention enabled in
nagios.cfg. See the notes on Retention at
http://nagios.sourceforge.net/docs/1_0/xodtemplate.html.
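
A quick way to see whether retention is in play is to pull the relevant
directives out of nagios.cfg. This is only a sketch: the function name is
made up, and the default path is taken from the ps output in this thread,
so adjust it if your install differs.

```shell
# Hypothetical helper: print the active (uncommented) state retention
# directives from a Nagios main config file.
# Argument: path to nagios.cfg (defaults to the path seen in this thread).
show_retention_settings() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    grep -E '^[[:space:]]*(retain_state_information|state_retention_file|retention_update_interval|use_retained_program_state)[[:space:]]*=' "$cfg"
}

# Example call (not executed here):
#   show_retention_settings /usr/local/nagios/etc/nagios.cfg
```

If retain_state_information is 1, the state saved in the retention file
is reloaded when nagios starts.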

> 
> ------------I've done the following commands:---------------
>  $ sudo /etc/rc.d/init.d/nagios stop
> Stopping network monitor: nagios
> $ ps -ef | grep nagios | grep -v grep
> $ sudo /etc/rc.d/init.d/nagios start
> Starting network monitor: nagios
>   PID TTY          TIME CMD
> 30767 ?        00:00:00 nagios
> $ ps -ef | grep nagios | grep -v grep
> nagios   30767     1  8 15:05 ?        00:00:00
> /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
> $
> -----------------------------------------------------------------------
> So I don't have an extra copy of nagios running.

Good thinking. It's a common problem.

> Here is what I want to happen:
> 1) tell nagios to accept passive results for these 5 servers, display
> the last known status value it had for the service

Looks like you've got that configured properly.

> 2) don't perform any active checks for whatever I need to specify as a
> command

Again, it looks like you have that configured properly.

> 3) When my script places a status of OK, or CRITICAL (the only 2 cases),
> accept that as the new status value, and notify as appropriate
> until/unless the status is changed or the service is acknowledged.

This will happen as a natural occurrence of submitting passive checks.
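
Since your script only reports OK or CRITICAL, the one thing it has to get
right is the numeric return code in the result line. A small sketch of
that mapping, using the standard plugin return codes; the function name
is made up for illustration.

```shell
# Hypothetical helper: map a state name to the standard plugin return
# code used in PROCESS_SERVICE_CHECK_RESULT.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # anything else is treated as UNKNOWN
    esac
}

# Example call (not executed here):
#   code=$(state_to_code CRITICAL)
```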

> 4) repeat
> 
> After all this time, I thought I understood the basic operation of
> Nagios, but it doesn't seem that I do.

You're close. I'll bet it's state retention that's throwing you, based
on the information so far.

> (If someone has example configs for a passive service, could you please
> post your file entries so I can see how someone else does it)

Here's how I do it. Note that I have active checks enabled but the
check_period set to none. That prevents the annoying X from being
displayed in the GUI, but the command still never gets run as an active
check.

# Generic service definition template
define service{
        name                            generic-service
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized
        obsess_over_service             0       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        process_perf_data               0       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        is_volatile                     0
        check_period                    none
        max_check_attempts              4
        normal_check_interval           5
        retry_check_interval            3
        notification_interval           10080
        notification_period             none
        notification_options            c,r

        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

# Host definition
define host {
        use                     generic-host
        host_name               host-name
        alias                   The Renaissance Center
        address                 <ip address removed>
        }

#Service definition
define service {
        use                     generic-service
        host_name               host-name
        service_description     PING
        contact_groups          tnops
        check_command           check_ping
        }

# 'check_ping' command definition
define command{
        command_name    check_ping
        command_line    $USER1$/check_ping $HOSTADDRESS$ 30 60 500.0 1000.0 -p 10 -t 30
        }

--
Marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null