<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">
<TITLE>host check strangeness - odd behavior in Nagios scheduling queue</TITLE>
</HEAD>
<BODY>

<P><FONT SIZE=2 FACE="Arial">Greetings All, </FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">I'm seeing a problem with our host check scheduling.  There are two major issues, and I can't tell whether they are symptoms of the same problem or two separate problems.  I've provided the configs and information that I know to be applicable; if other information would help, please let me know and I'll be happy to provide it.  </FONT></P>

<P><FONT SIZE=2 FACE="Arial">First, here's my Nagios setup:</FONT>

<BR><FONT SIZE=2 FACE="Arial">Single Nagios box (no distributed setup)</FONT>

<BR><FONT SIZE=2 FACE="Arial">64-bit RHEL 5.3</FONT>

<BR><FONT SIZE=2 FACE="Arial">Nagios 3.1.2 (I upgraded from 3.0.6 to see if that would fix the issues)</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">Problem 1. Some host checks are getting *stuck* in the scheduling queue.  When I look at the scheduling queue, these hosts are always listed with a 'last check' time identical to their 'next check' time.  See the attached screenshot (problem 1).  They typically stay at the top of the queue for an hour or two.</FONT></P>

<P><FONT SIZE=2 FACE="Arial">Host configuration for one of them:</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">        host_name               hostxxx</FONT>

<BR><FONT SIZE=2 FACE="Arial">        alias                   Oracle</FONT>

<BR><FONT SIZE=2 FACE="Arial">        use                     srvhost-os-2000,srvhost-physical,srvhost-oracle,srvhost-non-production,srvhost-all</FONT>

<BR><FONT SIZE=2 FACE="Arial">        notification_period             aperture</FONT>

<BR><FONT SIZE=2 FACE="Arial">        register                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">        }</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">Applicable Templates:</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     generic-host</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_period                             24x7</FONT>

<BR><FONT SIZE=2 FACE="Arial">       event_handler_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       flap_detection_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       process_perf_data                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_status_information                1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_nonstatus_information             1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notifications_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     generic-pnp</FONT>

<BR><FONT SIZE=2 FACE="Arial">       action_url                               /pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout="clear_g()"</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-all</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    All Servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_command                            check-nt-alive</FONT>

<BR><FONT SIZE=2 FACE="Arial">       use                                      generic-pnp,generic-host</FONT>

<BR><FONT SIZE=2 FACE="Arial">       max_check_attempts                       3</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_interval                           60</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retry_interval                           1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       active_checks_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       passive_checks_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       flap_detection_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       process_perf_data                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_status_information                1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_nonstatus_information             1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       contact_groups                           +servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_interval                    240</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_period                      24x7</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_options                     d,u,r</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notifications_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-non-production</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Non production servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_Cls-non-production</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_interval                           120</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retry_interval                           20</FONT>

<BR><FONT SIZE=2 FACE="Arial">       passive_checks_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       contact_groups                           +servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_interval                    480</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_period                      workhours</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_options                     d,u,r</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notifications_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-oracle</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Oracle servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_app-oracle</FONT>

<BR><FONT SIZE=2 FACE="Arial">       contact_groups                           +oracle</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-physical</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Servers that are running on physical hardware</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_platform-physical</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-os-2000</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Servers running Windows 2000 Server</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_os-win2000</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_command                            check-nt-alive</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
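<P><FONT SIZE=2 FACE="Arial">To make the template chain easier to follow, here is a rough Python sketch of how I understand the multiple inheritance to resolve for this host.  It is not Nagios code: the resolve() helper and the dictionaries are hypothetical stand-ins, only the scalar directives are copied from the definitions above, and the additive "+" directives (hostgroups, contact_groups) are left out.  It assumes that a value set on the host itself wins, and that otherwise the first template in the 'use' list that supplies a directive wins, which is how I read the object inheritance documentation.</FONT></P>

<PRE>
```python
# Hypothetical, simplified model of Nagios 3 multiple inheritance for scalar
# directives. Additive ("+") directives such as hostgroups are ignored here.

TEMPLATES = {
    "generic-host": {"check_period": "24x7"},
    "generic-pnp": {},
    "srvhost-all": {
        "use": ["generic-pnp", "generic-host"],
        "check_command": "check-nt-alive",
        "max_check_attempts": "3",
        "check_interval": "60",
        "retry_interval": "1",
        "notification_period": "24x7",
    },
    "srvhost-non-production": {
        "check_interval": "120",
        "retry_interval": "20",
        "notification_period": "workhours",
    },
    "srvhost-oracle": {},
    "srvhost-physical": {},
    "srvhost-os-2000": {"check_command": "check-nt-alive"},
}

HOST = {
    "host_name": "hostxxx",
    "use": ["srvhost-os-2000", "srvhost-physical", "srvhost-oracle",
            "srvhost-non-production", "srvhost-all"],
    "notification_period": "aperture",
}


def resolve(obj, templates):
    """Return obj's directives with inherited values filled in.

    Assumption: local values win; otherwise the first template in the
    "use" list that supplies a directive wins. Each template is itself
    resolved recursively before its values are considered.
    """
    resolved = {k: v for k, v in obj.items() if k != "use"}
    for name in obj.get("use", []):
        parent = resolve(templates[name], templates)
        for key, value in parent.items():
            # setdefault keeps any value found earlier in the chain
            resolved.setdefault(key, value)
    return resolved


effective = resolve(HOST, TEMPLATES)
for key in ("check_command", "check_interval", "retry_interval",
            "max_check_attempts", "check_period", "notification_period"):
    print(key, "=", effective[key])
```
</PRE>

<P><FONT SIZE=2 FACE="Arial">If that reading is right, hostxxx ends up with check_interval 120 and retry_interval 20 (from srvhost-non-production, which is listed before srvhost-all), and notification_period aperture from the host definition itself.</FONT></P>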
<BR>
<BR>

<P><FONT SIZE=2 FACE="Arial">Problem 2.  Many of our hosts are not running host checks: they are in the scheduling queue, but the checks don't execute.  Looking at the scheduling queue, I can see many hosts whose host 'last check' times are from several weeks ago.  They show up in the queue but never run their host checks (or don't seem to).  These same hosts run their service checks on time without issue.  Screenshot attached (problem 2).</FONT></P>

<P><FONT SIZE=2 FACE="Arial">Host config for one of the hosts not running host checks:</FONT>

<BR><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">        host_name                       hostxxxx</FONT>

<BR><FONT SIZE=2 FACE="Arial">        alias                           media server</FONT>

<BR><FONT SIZE=2 FACE="Arial">        use                             srvhost-production,srvhost-physical,srvhost-os-2003,srvhost-all</FONT>

<BR><FONT SIZE=2 FACE="Arial">        register                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">        }</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     generic-host</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_period                             24x7</FONT>

<BR><FONT SIZE=2 FACE="Arial">       event_handler_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       flap_detection_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       process_perf_data                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_status_information                1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_nonstatus_information             1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notifications_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     generic-pnp</FONT>

<BR><FONT SIZE=2 FACE="Arial">       action_url                               /pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout="clear_g()"</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
<BR>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-all</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    All Servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_command                            check-nt-alive</FONT>

<BR><FONT SIZE=2 FACE="Arial">       use                                      generic-pnp,generic-host</FONT>

<BR><FONT SIZE=2 FACE="Arial">       max_check_attempts                       3</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_interval                           60</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retry_interval                           1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       active_checks_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       passive_checks_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       flap_detection_enabled                   1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       process_perf_data                        1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_status_information                1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       retain_nonstatus_information             1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       contact_groups                           +servers</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_interval                    240</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_period                      24x7</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notification_options                     d,u,r</FONT>

<BR><FONT SIZE=2 FACE="Arial">       notifications_enabled                    1</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-os-2003</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Servers running Windows 2003</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_os-win2003</FONT>

<BR><FONT SIZE=2 FACE="Arial">       check_command                            check-nt-alive</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-physical</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    Servers that are running on physical hardware</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_platform-physical</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">define host {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       name                                     srvhost-production</FONT>

<BR><FONT SIZE=2 FACE="Arial">       alias                                    All servers in production mode</FONT>

<BR><FONT SIZE=2 FACE="Arial">       hostgroups                               +SRV_Cls-production</FONT>

<BR><FONT SIZE=2 FACE="Arial">       contact_groups                           +helpdesk,servers,servers-off-hours,thesolver</FONT>

<BR><FONT SIZE=2 FACE="Arial">       register                                 0</FONT>
<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">define command {</FONT>

<BR><FONT SIZE=2 FACE="Arial">       command_name                             check-nt-alive</FONT>

<BR><FONT SIZE=2 FACE="Arial">       command_line                             $USER1$/check_tcp -H $HOSTADDRESS$ -p 135 -t 30</FONT>

<BR><FONT SIZE=2 FACE="Arial">}</FONT>
</P>
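<P><FONT SIZE=2 FACE="Arial">To rule out the plugin itself, the host check can be approximated by hand.  The sketch below is not check_tcp; it is a hypothetical Python stand-in (the tcp_alive() name and the status text are my own) that makes the same kind of TCP connection to port 135 with a 30-second timeout and returns the usual plugin return codes, 0 for OK and 2 for CRITICAL.</FONT></P>

<PRE>
```python
import socket
import time

OK, CRITICAL = 0, 2  # standard Nagios plugin return codes


def tcp_alive(host, port=135, timeout=30.0):
    """Try a single TCP connect and return (return_code, status_line).

    Rough stand-in for: check_tcp -H host -p 135 -t 30
    """
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout, or no route
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return OK, "TCP OK - %.3f second response time on port %d" % (
                elapsed, port)
    except OSError as exc:
        return CRITICAL, "CRITICAL - connect to %s port %d failed: %s" % (
            host, port, exc)
```
</PRE>

<P><FONT SIZE=2 FACE="Arial">Calling tcp_alive() with the address of one of the affected hosts from the Nagios box should show whether the connection itself succeeds, independently of the scheduler.</FONT></P>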
<BR>

<P><FONT SIZE=2 FACE="Arial">Any ideas or help in tracking this down would be appreciated.  I'm pretty sure it's a bug in the code, but I suppose it's possible my configuration is off somehow.  :-) </FONT></P>

<P><FONT SIZE=2 FACE="Arial">Thanks again, </FONT>
</P>

<P><FONT SIZE=2 FACE="Arial">-greg</FONT>
</P>

</BODY>
</HTML>