Service checks still not being executed

Nolan Martin Nolan.Martin at co.travis.tx.us
Fri Aug 16 23:59:50 CEST 2002


And you encountered this while you were running 1.0b4 (whereas I am on
1.0b3) and the problem went away (I assume) while you were/are still on
1.0b4?  In other words, I take it you resolved this problem prior to
this latest release...

I had seen in your previous e-mails that you deleted the status and
Nagios logs and restarted, and that this worked sometimes for you,
but I have not tried that yet, since it seemed to work only
intermittently for you.

I am currently running the process manually from a shell console
prompt.  I have found that if I simply stop it (Ctrl-C) and restart it,
things seem to get better, but only for a while.  But, as you also
mentioned, after some period of time the stuck checks just begin
working again on their own.

Here are my answers to the questions you posed...

>> Some questions that might help people help you:
>> Do you have any service checks that have long scheduled intervals?

I am currently running all service checks at 5 minute intervals.

>> Do you note that most services get checked, while a handful don't
>> ever get checked?
Sort of.  What I have found is that most continue to be checked
regularly, but some just stick indefinitely.  After extended periods
(hours or so), some of those may begin processing again, while others
stay stuck.

>> Do those services have a scheduled check time that they exceed?
Can you clarify this?  If I understand correctly, yes: when I watch the
Scheduling Queue, I can see that they are behind schedule, sitting at
the top of the queue.
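One way to make "behind schedule" concrete: given each service's next
scheduled check time (for instance, noted from the Scheduling Queue),
flag the ones whose time has already passed by more than a small grace
period.  This is a minimal Python sketch; the ScheduledCheck type, the
overdue_checks function, and the 60-second grace period are hypothetical
illustrations, not part of Nagios.

```python
from dataclasses import dataclass


@dataclass
class ScheduledCheck:
    # Hypothetical record; Nagios itself does not expose this type.
    host: str
    service: str
    next_check: int  # Unix timestamp of the next scheduled run


def overdue_checks(checks, now, grace_seconds=60):
    """Return checks whose scheduled time passed more than grace_seconds
    ago, most overdue first (the same order in which stuck services pile
    up at the top of the scheduling queue)."""
    late = [c for c in checks if now - c.next_check > grace_seconds]
    return sorted(late, key=lambda c: c.next_check)
```

With every check on a 5-minute interval, a service that is overdue by
well over 300 seconds has missed at least one full cycle, which matches
the "stuck" symptom described above.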

>> Do you sometimes leave the monitoring process offline for some time?
No, it is running constantly, and has run constantly for some time.

>> Does the problem start when you start up the process?
Actually, restarting the process seems to help clear things up, but
after a while, service checks start to hang or stop processing
again...

I appreciate any suggestions or assistance.  

Darren, you mentioned that you made a variety of tweaks and changes.
Is there anything you can think of that you did that might possibly
have corrected this?

Thanks.

>>> Darren Gamble <Darren.Gamble at sjrb.ca> 08/16/02 04:30PM >>>
Good day,

> Darren,
> 
> Did you reach a resolution to this?  I was following your thread, but
> it stops abruptly (unless I missed some e-mails).

It stopped abruptly.  There was no resolution.  However, I have not been
able to reproduce the problem since then, and my configuration has been
under constant tweaking since then, so it could have been any number of
things.

> I know Ethan recommended a few changes.  Did these changes resolve your
> problem?

No, they did not (most were already in place).  However, the
recommendations involved changing how Nagios schedules its services.
The problem I observed was that services didn't actually get executed
when they were scheduled.
 
> I am currently running Nagios 1.0b3 on a workstation class machine
> using Red Hat 7.3.  I recall that you were running Nagios on a
> reasonably robust machine and were not experiencing a heavy load.  
> 
> Appreciate your feedback or any others that might be able to assist. 

> 
> Thanks. 

I'm afraid I don't have much to offer.  The group here is a good bunch,
though.  Hopefully someone else will be able to offer some assistance.
As a workaround, try stopping the service, destroying your status and
nagios log, and restarting the process (forcing Nagios to reschedule
everything).  That _sometimes_ worked for me, when I had this problem.
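The workaround amounts to three steps: stop Nagios, delete the status
file and the log, and start it again.  Below is a minimal Python sketch
of the middle step only, assuming the process is already stopped; the
clear_state_files helper and the example paths are hypothetical, so
check the status_file and log_file settings in your own nagios.cfg
before using anything like it.  Deleting the log also discards its
history.

```python
from pathlib import Path


def clear_state_files(paths):
    """Delete each listed file that exists and return the ones removed.

    Meant to run only while Nagios is stopped, so that on restart it has
    no saved status and has to reschedule every check.
    """
    removed = []
    for p in map(Path, paths):
        if p.exists():
            p.unlink()
            removed.append(p)
    return removed


if __name__ == "__main__":
    # Hypothetical locations; substitute the status_file and log_file
    # values from your own nagios.cfg.
    candidates = [
        "/usr/local/nagios/var/status.log",
        "/usr/local/nagios/var/nagios.log",
    ]
    for p in clear_state_files(candidates):
        print(f"removed {p}")
```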

Some questions that might help people help you:

Do you have any service checks that have long scheduled intervals?
Do you note that most services get checked, while a handful don't ever
get checked?
Do those services have a scheduled check time that they exceed?
Do you sometimes leave the monitoring process offline for some time?
Does the problem start when you start up the process?

============================
Darren Gamble
Planner, Regional Services
Shaw Cablesystems GP
630 - 3rd Avenue SW
Calgary, Alberta, Canada
T2P 4L4
(403) 781-4948


-------------------------------------------------------
This sf.net email is sponsored by: OSDN - Tired of that same old
cell phone?  Get a new here for FREE!
https://www.inphonic.com/r.asp?r=sourceforge1&refcode1=vs3390 
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net 
https://lists.sourceforge.net/lists/listinfo/nagios-users

