retention issue

Lori Adams ladams at cloudmark.com
Fri Nov 18 19:35:43 CET 2005


Nagios 1.2

Linux

I'm using a couple of templates for this particular check. There are
many service checks using this template. When one of these checks
becomes critical, the status in status.log changes to critical. If I
stop and start Nagios, the status saved in status.sav is incorrect and
says "No data yet (service was in a soft problem state during state
retention)".

Before anyone tells me to turn on state retention, here are the
templates:

define service{
        name                            generic-service-template
        ...
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        ...
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

define service {
        use                             generic-service-template
        name                            server-template
        host_name                       server
        contact_groups                  admins
        register                        0
        }

define service {
        use                             server-template
        name                            server-spool-template
        normal_check_interval           60
        retry_check_interval            30
        check_period                    workhours_with_weekend
        register                        0
        }

define service {
        use                             server-spool-template
        service_description             check
        check_command                   check_spool_nrpe!"-d /srv/smtp/Maildir/check -w 24hours -c 36hours -m 35000 -W 10000000 -C 20000000"
        }

From nagios.cfg:

retain_state_information=1
retention_update_interval=60
use_retained_program_state=1

I ran these commands immediately one after the other to show what is
happening.

root@aspire(var)# date; grep check status.log; \
    /etc/init.d/nagios-prod stop; date; grep check status.sav; \
    /etc/init.d/nagios-prod start; date; grep check status.log

Fri Nov 18 10:23:24 PST 2005
[1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;1132338029;1132339829;ACTIVE;1;1;1;1132338037;0;OK;4225413;0;0;0;0;0;1;3;0;1;0;0.00;0;1;1;1;/srv/smtp/Maildir/check last modified 11/14/05 16:49:00
Stopping network monitor: nagios
Fri Nov 18 10:23:24 PST 2005
Starting network monitor: nagios
21897 ?        00:00:00 nagios-prod
Fri Nov 18 10:23:26 PST 2005
[1132338205] SERVICE;server;spool-check;OK;1/4;HARD;1132338029;1132338377;ACTIVE;1;1;1;1132338037;0;OK;4225581;0;0;0;0;0;1;0;0;1;0;0.00;0;1;1;1;No data yet (service was in a soft problem state during state retention)

This is only happening when the checks using server-spool-template are
in a critical state.
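To line up the state fields of the two status.log entries above, a small awk filter is enough. This is a minimal sketch that assumes the ';'-separated field order seen in those lines (host, service, state, attempt/max attempts, state type); the function name service_states is made up for the example and is not part of Nagios:

```shell
#!/bin/sh
# Hypothetical helper (not part of Nagios): print host, service, state,
# attempt and state type for each SERVICE line of a status.log-style file.
# Assumption: fields are ';'-separated in the order seen in the Nagios 1.2
# lines above, and the first field looks like "[timestamp] SERVICE".
service_states() {
    awk -F';' '$1 ~ /SERVICE$/ { print $2, $3, $4, $5, $6 }' "$@"
}

# Feed it the leading fields of the two status.log lines quoted above.
printf '%s\n' \
    '[1132338202] SERVICE;server;spool-check;CRITICAL;1/4;SOFT;1132338029' \
    '[1132338205] SERVICE;server;spool-check;OK;1/4;HARD;1132338029' |
    service_states
# prints:
# server spool-check CRITICAL 1/4 SOFT
# server spool-check OK 1/4 HARD
```

Against the live file it would be run as `service_states status.log`. On the lines above it shows the service was CRITICAL but still in a SOFT state (attempt 1 of 4, if the 1/4 field is read as current/max attempts) before the restart, and OK/HARD afterwards.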

Thanks,

-Lori
