Removing host checks for non-OK passive results

Bruce Campbell nagios-devel at vicious.dropbear.id.au
Wed May 24 11:37:42 CEST 2006


On Tue, 23 May 2006, Ton Voon wrote:

> On 19 May 2006, at 19:06, Bruce Campbell wrote:
>> Or more precisely, the host may well be 'down' from one monitoring node's 
>> point of view, and 'up' from another monitoring node's point of view. Imho, 
>> each monitoring node should maintain its own idea of host's up/down state, 
>> and not send/accept host check results between themselves. Service check 
>> results are a different issue.
>
> We set up distributed monitoring across internationally spread datacenters.
> With firewall policies, only the local monitoring server can ping their local 
> hosts. Thus the central monitoring server really has no idea about whether a 
> node is up or down - it has to rely on the slave monitoring server.

My own distributed setup doesn't have a central monitoring server, and all 
nodes are assumed to be able to ping all monitored hosts.  There's some 
other magic happening to ensure that only one notification is sent per 
event and that nsca doesn't loop on the same check result.

>> Ideally, Nagios just runs one host check after the first non-OK service 
>> result comes in, and uses the cached value as long as it is within the 
>> host's freshness_threshold.  Otherwise, your check_latency for everything 
>> goes way up, and you eventually write your own scheduler out of irritation 
>> at seeing service checks being executed at 5 hour intervals.
>
> Hmm, not sure about writing your own scheduler :)

Attached, together with the patches required for Nagios::Config 
(Nagios::Object 0.08).  Even follows most dependencies, although you could 
probably craft a configuration that would break this without too much 
effort.

I have this running on my hosts for just the host checks at the present 
time, pending some tuits to track down some weird service check 
interactions caused by leaving check_freshness enabled.

One obvious gotcha with it at the moment is that, when run as the Nagios
(host|service)_perfdata_file_processing_command, its first execution is at
Nagios_starttime + (host|service)_perfdata_file_processing_interval, not at
Nagios_starttime.  If the interval is made long to avoid the cpu load of
perl parsing the config file, Nagios won't receive any results for that
initial period of time.

> We considered using a "cache" value for a host status - I think the idea has 
> merit and would reduce a large number of host checks, especially if something 
> suddenly happened to a large set of services on one host. However, we baulked 
> at going ahead because there's bound to be some subtle situation where this 
> would be undesirable.

See the "Workaround for 'Host DOWN' false-positives" thread for another 
way of doing it (slurp in the entire status.dat file if you've got a small 
installation, submit passive host check results from a service check if 
you've got a large installation).  Both have the advantage of being driven 
by Nagios.
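
For the large-installation variant, here is a minimal sketch of what
submitting a passive host check result could look like.  The sub names are
made up for illustration and are not part of the attached script; the line
format is the standard PROCESS_HOST_CHECK_RESULT external command, and the
file to write to is whatever command_file points at in your nagios.cfg.

```perl
use strict;
use warnings;

# Build one external-command line for a passive host check result.
# $state: 0 = UP, 1 = DOWN, 2 = UNREACHABLE.
sub format_host_result {
    my ( $when, $host, $state, $output ) = @_;
    $output =~ s/[\r\n]+/ /g;    # the command must stay on a single line
    return sprintf( "[%d] PROCESS_HOST_CHECK_RESULT;%s;%d;%s\n",
        $when, $host, $state, $output );
}

# Append the line to the command file (a named pipe under Nagios).
# Returns 1 on success, 0 if the file is missing or cannot be written.
sub submit_result {
    my ( $command_file, $line ) = @_;
    return 0 unless -e $command_file;    # no file: Nagios is probably gone
    open( my $fh, '>>', $command_file ) or return 0;
    print {$fh} $line;
    return close( $fh ) ? 1 : 0;
}
```

A service check plugin could call submit_result( $command_file,
format_host_result( time, $host, 0, "PING OK" ) ) just before it exits with
its own service result.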

On further consideration, there is another subtle niggle in Nagios which 
would stop this from reliably working for the initial 
max_service_check_spread time.  You can see this niggle in action when you 
start up Nagios, and watch how long it takes for a service with a 
relatively low check interval to be executed.  If you're unlucky, the 
first execution of it will be several multiples of its check interval 
after Nagios has been started, and you will have seen its 'Next Check' 
time change several times.

> If the idea is validated through this thread (seems like the best way to test 
> a design!), then we may be able to subsidise the development of it at 
> Altinity.

-- 
   Bruce Campbell

   Freelance admin, coder and cynic.  Pessimistic commentary with
   sprinklings of sardonic humour a speciality.
-------------- next part --------------
#!/usr/bin/perl

# Main functionality.
$| = 1;
use Getopt::Long qw( :config no_ignore_case );
my $chkobj = new Nagios::CheckDaemon;

if( $chkobj->parse_argv() ){


        # Run the main routine.  Exit with the inverse of the value
        # returned so shell scripts are happy.
        exit( ! $chkobj->main() );
}else{
	$chkobj->show_help();
        print STDERR "$0: Unable to continue (Tried perldoc on me?)\n";
}

exit( 1 );

=head1 NAME

run_background_checks.pl / Nagios::CheckDaemon

=head1 SYNOPSIS

run_background_checks.pl -c nagios_base_config -t number_of_seconds_to_run
-H local_host_name -m max_load_avg -l load_average_file -d -E -S

nagios_base_config is the same configuration file that Nagios is running.

number_of_seconds_to_run is the number of seconds before new checks stop
being launched.  The program will exit after any checks running have finished
or have been killed.

local_host_name is required for a slight check ordering optimisation.

max_load_avg is the load average above which only one check is allowed to
run at a time, and the launch rate is kept to 1 per second.

load_average_file is a /proc/loadavg type file, or any file with a
floating-point number as the first field on the first line.  This is re-read
at most once every 10 seconds.

-d enables debug mode.

-E enables running of host checks (default disabled).

-S enables running of service checks (default disabled).
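
The parse_argv method is not shown in this part of the file, so the
following is only a sketch of how the options above could be read with
Getopt::Long.  The sub name parse_args and the shape of the returned hash
are assumptions for illustration, not the script's actual code.

```perl
    use strict;
    use warnings;
    use Getopt::Long qw( GetOptionsFromArray :config no_ignore_case );

    # Parse the SYNOPSIS options out of a list of arguments.
    # Returns a hash reference keyed by option letter, or nothing if the
    # arguments are invalid or the config file (-c) is missing.
    sub parse_args {
        my @argv = @_;
        my %opt = ( d => 0, E => 0, S => 0 );
        my $ok = GetOptionsFromArray( \@argv, \%opt,
            'c=s',    # nagios_base_config
            't=i',    # number_of_seconds_to_run
            'H=s',    # local_host_name
            'm=f',    # max_load_avg
            'l=s',    # load_average_file
            'd', 'E', 'S' );
        return unless $ok && defined $opt{c};
        return \%opt;
    }
```

With no_ignore_case set, -E and -S stay distinct from any lower-case -e or
-s that might be added later.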

=cut

package Nagios::CheckDaemon;

use Nagios::Config;
use POSIX;
use Fcntl;

sub new {
	my ($class, %args) = @_;
	my $self = {};

	bless $self, $class;

	return( $self);
}


=head1 DESCRIPTION

Nagios::CheckDaemon schedules and executes host and service checks contained
within a Nagios (www.nagios.org) configuration file.  Results from these
tests are submitted to a running Nagios using the external command file
as passive check results.

The Nagios scheduling algorithm is very much akin to the stereotypical
platoon sergeant who, on noticing that a particular latrine smells, will
bawl out a handy private to clean it up, and will keep scaring the pants
off the private until the latrine does not smell.

The problem with this method is that while the sergeant (Nagios) is making
sure that the initial latrine is cleaned up, other latrines are also
starting to smell.  By the time the first one is cleaned up, 3 others are
in a similar state, and the poor, hygiene-obsessed sergeant is left having
to rush around all three of these making sure that the mess is cleaned up.

Keeping within the analogy, this script's scheduling algorithm is like
the NCO with his clipboard.  At the time when the latrine should be
inspected, an inspection is duly done (service check) and the results handed
off to the sergeant (Nagios) to do something about it (notification).  Since
the NCO doesn't use the enlisted men's latrines, the NCO only cares about
reporting the state of them, and not about ordering someone out to clean them
(i.e., checks will be run fairly close to the prescribed intervals).

=head1 INTERACTION WITH NAGIOS

The perl script is intended to be run by Nagios as the 
host_perfdata_file_processing_command every few minutes.  On execution, 
the program will block while reading in the Nagios configuration, after which 
it will fork and hand control back to Nagios.  The child process then 
watches the status_file and executes any service checks which need to be 
executed.

Within the perl script, a package named 'Nagios::CheckDaemon' exists, 
simply because the self-contained OO style allows you to have implicit
global variables without too much fuss.

The package sends results of the check commands back to Nagios using
the (external) 'command_file' found within the configuration.  If this
file becomes unwritable, or the specified run time is exceeded, the 
program will exit.

Checks are run in a similar environment to being run under Nagios.  A subset
of macros are expanded, and where this is not possible, Nagios is told
to execute the specific check via the external command file.

When configuring Nagios to use this script/package, make sure that
passive checks are enabled, active checks are disabled and freshness
checking is also enabled with a threshold of at least a minute.  This 
ensures that Nagios will not execute the checks unless this script
fails to do so for some reason.

I also recommend that the small number of absolutely critical checks 
(cpu load on monitoring box, diskspace of nagios partition and ping to
the local router top my critical list) are left with active checks enabled, 
or with a smaller freshness threshold.  This ensures that with two different 
scheduling algorithms looking at it, these will definitely be executed.

The load time of the Nagios configuration (using Nagios::Config) and thus
the block time of the main Nagios process can be significant when used with
large installations.  For this reason, having a long 
host_perfdata_file_processing_interval is a good thing, something on the order 
of 10 minutes (600).  (The reading of the configuration is done before
forking so that any errors can be passed back to the Nagios process and thus
into the Nagios log file.)
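
The "load first, then fork" behaviour described above can be sketched as
follows.  This is an illustration under assumptions rather than the
script's real code: load_then_background and its two callbacks are
hypothetical names, and the real program hands control back to Nagios by
having the parent return from the processing command.

```perl
    use strict;
    use warnings;
    use POSIX ();

    # Run $load in the foreground so the caller sees any failure, then
    # fork and run $work in the child.  Returns the child's pid in the
    # parent, or 0 if loading or forking failed.
    sub load_then_background {
        my ( $load, $work ) = @_;
        my $config = $load->();          # this is the part that blocks
        return 0 unless defined $config;
        my $pid = fork();
        return 0 unless defined $pid;    # fork failed
        if ( $pid == 0 ) {
            # Child: do the long-running work, then leave without
            # flushing any output buffers inherited from the parent.
            $work->( $config );
            POSIX::_exit( 0 );
        }
        return $pid;                     # parent carries on at once
    }
```

The point of the ordering is that a broken configuration is reported while
Nagios is still waiting on the command, instead of being lost in a detached
child.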

=head1 ALGORITHM SPECIFICS

Each time the status_file used by Nagios changes, the checks to be run are
re-ordered by the 'run_order' method.  The checks are placed in the
following order:

=over 2

=item 1: Time specific checks

These are checks which should be executed now or earlier.  Unlike Nagios 
which reschedules missed checks to some vague point in the future, this
script executes such checks right away on the grounds that such checks
probably will never get executed otherwise.  They are further sorted 
based on:

=over 2

=item 1a: Checks where 'host_name' is that of the local host.

Without the health of the local host, how can remote checks be properly done?

=item 1b: Checks belonging to any of the same host groups as the local host.

This is for use in a distributed setup.

=item 1c: All others

=back

=item 2: Checks which have not been run yet.

These are the checks showing 'pending' in the status display.  I like to
get these done relatively quickly.

=item 3: Checks with high 'check_latency'.

These are checks where Nagios has scheduled a check, but it has been
generally executed late.

=back

HostDependencies (both explicit and implicit via 'parents') are followed,
and hosts which have a dependency on a host that is in a SOFT state mentioned
in 'execution_failure_criteria' will not have their check scheduled. 

ServiceDependencies (both explicit and implicit based on the host check
result) are followed, and services which have a dependency on a service that
is in a SOFT state mentioned in 'execution_failure_criteria' will not have 
their check scheduled.

Checks which have been executed within the last 2 reads of the status_file
are not scheduled.  This is because the Author wishes to avoid executing
a service check while the result from its previous execution may still be
being processed by Nagios, and thus not in the status_file as written.
However, this means that this program cannot be used to run very frequent
checks, with an interval length shorter than the interval used to write
out the status_file.  Most people will not encounter this limitation.
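
The three orderings above can be condensed into a small classifier.  This
is a simplified sketch, not the run_order method itself: classify_check is
a hypothetical name, the hash keys are the status_file variables, and the
local-host grouping (1a to 1c) and dependency handling are left out.

```perl
    use strict;
    use warnings;

    # Returns 1 (overdue, run now), 2 (never checked), 3 (high latency)
    # or 0 (nothing to do) for one host or service status entry.
    sub classify_check {
        my ( $s, $now, $interval_length ) = @_;
        my $last = $s->{last_check} || 0;
        my $next = $s->{next_check} || 0;
        my $base = $next > $last ? $next : $last;

        # Use the retry interval while a non-OK state is still retrying.
        my $interval = $s->{normal_check_interval} || 0;
        if ( $s->{current_state}
            && ( $s->{current_attempt} || 0 ) < ( $s->{max_attempts} || 0 ) ) {
            $interval = $s->{retry_check_interval} || 0;
        }

        return 1 if $base + $interval * $interval_length < $now;
        return 2 unless $s->{has_been_checked};
        return 3 if ( $s->{check_latency} || 0 ) > 2 * $interval_length;
        return 0;
    }
```

For example, with an interval_length of 60, a check whose later of
last_check and next_check is more than normal_check_interval minutes in the
past falls into ordering 1.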

=cut

# Ordering of checks is done as follows:
#	1/ Checks where max( next_check, last_check ) + the applicable
#		interval (retry_interval while a non-OK state is still being
#		retried, otherwise normal_interval) < current time.
#		1a/ Checks where 'host_name' is that of the localhost.
#		1b/ Checks belonging to any of the same groups as the localhost.
#		1c/ Other checks.
#	2/ Checks where 'has_been_checked' is 0 (ie, the command has not
#		been run within this run of Nagios).
#	3/ Checks where 'check_latency' is greater than 2 interval_lengths.
#
# This function is called after the status.dat file has been read in.
sub run_order {
	my $self = shift;

	local $SIG{ALRM} = sub { die "timeout in performing ordering\n" }; # NB: \n required
	alarm( 120 );

	$self->debug( "Starting ordering" );

	my $retval = 0;

	my %hoststatecheck = ( "0",	"o",
				"1",	"d",
				"2",	"u",
				);
	my %svcstatecheck = ( "0",	"o",
				"1",	"w",
				"2",	"c",
				"3",	"u",
				);

	# $self has the following variables (OO's globals):
	#	config - Nagios config object.
	#	status - Status hash. {hostname}{service_desc}{status.dat-var}
	# The following is set:
	#	runorder - @array containing time;hostname;service_desc

	my $interval_length = 60;
	my $localhostobj = undef;
	my %localhostgroups = ();
	my %localservicegroups = ();
	if( defined( $self->{'config'} ) ){
		if( defined( $self->{'config'}->get( 'interval_length') ) ){
			$interval_length = $self->{'config'}->get( 'interval_length');
		}
		if( $interval_length !~ /^\s*\d+\s*$/ ){
			$interval_length = 60;
		}
		$localhostobj = $self->{'config'}->find_object( $self->{'localhost'}, "Nagios::Host" );
		if( defined( $localhostobj ) ){
			foreach my $hostgroup( $localhostobj->list_hostgroups ){
				$localhostgroups{$hostgroup->name()}++;
				# $self->debug( "Got " . $hostgroup->name . " from $hostgroup\n" );
			}
			foreach my $service( $localhostobj->list_services ){
				# $self->debug( "Found service $service\n" );
				foreach my $servicegroup( $service->list_servicegroups ){
					# $self->debug( "Found servicegroup $servicegroup\n" );
					$localservicegroups{$servicegroup->name}++;
				}
			}
		}
	}

	@{$self->{'runorder'}} = ();

	# Start running through the status hash.
	foreach my $host( keys %{$self->{'status'}} ){
		next unless( defined( $host ) );

		# $self->debug( "Found host $host\n" );

		# Cache the status of this host's last check.  We'll
		# be referring to this fairly often.
		my $hoststatus = $self->{'status'}{$host}{$self->{'default_host_service'}}{'current_state'};
		$hoststatus = 0 unless( defined( $hoststatus ) );
		$hoststatus = 0 unless( $hoststatus =~ /^\d+$/ );
		my $hostchecked = $self->{'status'}{$host}{$self->{'default_host_service'}}{'has_been_checked'};
		$hostchecked = 1 unless( defined( $hostchecked ) );
		$hostchecked = 1 unless( $hostchecked =~ /^\d+$/ );

		# Scan the dependency tree for this host's parents.  Set to
		# 1 if any of them are down, which inhibits the check
		my $hostparents = 0;
		my %depchecks = ();
		my %dochecks = ( "$host;" . $self->{'default_host_service'}, "0" );
		my $doneall = 0;
		while( ! $doneall ){
			$doneall=1;
			foreach my $docheck( keys %dochecks ){
				next unless( defined( $docheck ) );
				next unless( $docheck =~ /^([^;]+)\;(.*)$/ );
				my $chost = $1;
				my $csvc = $2;
				next unless( defined( $self->{'depends'}{$chost}{$csvc} ) );

				foreach my $depval( @{$self->{'depends'}{$chost}{$csvc}} ){
					
					next unless( $csvc eq $self->{'default_host_service'} );
					next unless( defined( $depval ) );
					next if( defined( $depchecks{$depval} ) );
					$doneall = 0;
					$depchecks{$depval}++;

					my @scsplit = split( /;/, $depval );
					my $chkhost = $scsplit[0];
					my $chksvc = $scsplit[1];
					my $execfail = $scsplit[2];
					my $followparents = $scsplit[3];

					# Get the state and
					# has_been_checked of
					# the service.
					my $chkstate = $self->{'status'}{$chkhost}{$chksvc}{'current_state'};
					my $donecheck = $self->{'status'}{$chkhost}{$chksvc}{'has_been_checked'};
					$chkstate = 0 unless( defined( $chkstate ) );
					$donecheck = 0 unless( defined( $donecheck ) );

					# Run through the states where we do
					# not execute.
					my @statesplit = split( /,/, $execfail );
					foreach my $badstate( @statesplit ){
						$hostparents++ if( $badstate =~ /^\s*p\s*$/i && ! $donecheck );
						next unless( defined( $hoststatecheck{$chkstate} ) );
						my $statecmp = $hoststatecheck{$chkstate};
						# $self->debug( "$chkhost has $statecmp against $badstate" );
						$hostparents++ if( $badstate =~ /^\s*$statecmp\s*$/i );
					}


					# If we have to follow
					# the parents, add them
					# to the check.
					if( $followparents && ! $hostparents ){
						$dochecks{$chkhost . ";" . $chksvc}++;
					}
				}
			}
		}

		foreach my $svc( keys %{$self->{'status'}{$host}} ){
			next unless( defined( $svc ) );
			# $self->debug( "Found service $svc\n" );

			# Skip unless it is time to run, or it has not been
			# run.
			my $nextrun = 0;
			my $position = 0;
			my $has_check_latency = 0;

			my $last_check = $self->{'status'}{$host}{$svc}{'last_check'};
			$last_check = 0 unless( defined( $last_check ) );
			$last_check = 0 unless( $last_check =~ /^\s*\d+\s*$/ );
			my $next_check = $self->{'status'}{$host}{$svc}{'next_check'};
			$next_check = 0 unless( defined( $next_check ) );
			$next_check = 0 unless( $next_check =~ /^\s*\d+\s*$/ );
			my $use_check = $last_check;
			if( $next_check > $use_check ){
				$use_check = $next_check;
			}
			my $normal_check_interval = $self->{'status'}{$host}{$svc}{'normal_check_interval'};
			$normal_check_interval = 0 unless( defined( $normal_check_interval ) );
			$normal_check_interval = 0 unless( $normal_check_interval =~ /^\s*\d+\s*$/ );
			my $use_interval = $normal_check_interval;
			if( $self->{'status'}{$host}{$svc}{'current_state'} ){
				if( defined( $self->{'status'}{$host}{$svc}{'current_attempt'} ) && defined( $self->{'status'}{$host}{$svc}{'max_attempts'} ) ){
					if( $self->{'status'}{$host}{$svc}{'current_attempt'} < $self->{'status'}{$host}{$svc}{'max_attempts'} ){
						my $retry_check_interval = $self->{'status'}{$host}{$svc}{'retry_check_interval'};
						$retry_check_interval = 0 unless( defined( $retry_check_interval ) );
						$retry_check_interval = 0 unless( $retry_check_interval =~ /^\s*\d+\s*$/ );
						$use_interval = $retry_check_interval;
					}
				}
			}

			if( ( $use_check + ( $use_interval * $interval_length ) ) < time ){
				# Ordering 1, run now.
				$nextrun = time;
			}elsif( ! $self->{'status'}{$host}{$svc}{'has_been_checked'} ){
				# Ordering 2, run soon.
				$nextrun = $use_check + ( $use_interval * $interval_length );
				$position = 1;
			}elsif( defined( $self->{'status'}{$host}{$svc}{'check_latency'} ) ){
				# Ordering 3, run rather soon.
				if( $self->{'status'}{$host}{$svc}{'check_latency'} =~ /^\s*\d+(\.\d+)?\s*$/ ){
					if( $self->{'status'}{$host}{$svc}{'check_latency'} > ( $interval_length * 2 ) ){
						$nextrun = time + 5;
						$position = 1;
						$has_check_latency = 1;
					}
				}
			}

			next unless( $nextrun );

			# Follow dependencies all the time.
			# Had thought to still run checks which have a high
			# check latency, but this would be against the
			# reading of 'execution_failure_criteria'.
			if( ! $has_check_latency || 1 == 1 ){
				# First check.  Skip if the host has not 
				# been checked yet.
				my $badparents = 0;
				$badparents++ unless( $hostchecked );

				# Second check.  Skip if the host itself is 
				# down.
				$badparents++ if( $hoststatus != 0 && $svc ne $self->{'default_host_service'} );

				# Third check.  Skip if any of the host's 
				# parents are down.
				$badparents++ if( $hostparents );

				# Fourth check.  Run through the dependency
				# tree and make sure that everything that
				# this check depends on is up.
				%depchecks = ();
				%dochecks = ();
				%dochecks = ( "$host;$svc", "0" );
				$doneall = 0;
				$doneall = 1 if( $badparents );
				while( ! $doneall ){
					$doneall=1;
					foreach my $docheck( keys %dochecks ){
						next unless( defined( $docheck ) );
						next unless( $docheck =~ /^([^;]+)\;(.*)$/ );
						my $chost = $1;
						my $csvc = $2;
						next unless( defined( $self->{'depends'}{$chost}{$csvc} ) );

						foreach my $depval( @{$self->{'depends'}{$chost}{$csvc}} ){
							
							# Don't do it if this
							# is a host check, as 
							# we've already done 
							# that previously to 
							# result in the 
							# $hostparents var.
							next if( $csvc eq $self->{'default_host_service'} );
							next unless( defined( $depval ) );
							next if( defined( $depchecks{$depval} ) );
							$doneall = 0;
							$depchecks{$depval}++;

							my @scsplit = split( /;/, $depval );
							my $chkhost = $scsplit[0];
							my $chksvc = $scsplit[1];
							my $execfail = $scsplit[2];
							my $followparents = $scsplit[3];

							# Get the state and
							# has_been_checked of
							# the service.
							my $chkstate = $self->{'status'}{$chkhost}{$chksvc}{'current_state'};
							my $donecheck = $self->{'status'}{$chkhost}{$chksvc}{'has_been_checked'};
							$chkstate = 0 unless( defined( $chkstate ) );
							$donecheck = 0 unless( defined( $donecheck ) );

							# Run through the 
							# states where we do
							# not execute.
							my @statesplit = split( /,/, $execfail );
							foreach my $badstate( @statesplit ){
								$badparents++ if( $badstate =~ /^\s*p\s*$/i && ! $donecheck );
								my $strcmp = $svcstatecheck{$chkstate};
								# $self->debug( "$chkhost;$chksvc badstate is $badstate, strcmp is $strcmp, chkstate is $chkstate - $depval" );
								next unless( defined( $svcstatecheck{$chkstate} ) );
								$badparents++ if( $badstate =~ /^\s*$strcmp\s*$/i );
							}
	

							# If we have to follow
							# the parents, add them
							# to the check.
							if( $followparents && ! $badparents ){
								$dochecks{$chkhost . ";" . $chksvc}++;
							}

							# $self->debug( "Checking dependencies of $depval for $chost, $csvc - $badparents, $chkstate, $donecheck" );
						}
					}
				}

				if( $badparents ){
					$self->debug( "$host;$svc has parents in bad state, not scheduling" );
					next;
				}

			}

			# If this does not belong to any groups that the
			# local host belongs to, we put it right at the
			# end; position == 2.  This ensures that services
			# on the local host or within the local site get
			# priority over services on remote hosts or sites,
			# which makes the whole distributed thing work far 
			# smoother.
			my $thishostobj = $self->{'config'}->find_object( $host, "Nagios::Host" );
			my $insamegroup = 0;
			my $oktogo = 0;
			if( defined( $thishostobj ) && defined( $localhostobj ) ){
				$oktogo = 1;
			}
			if( $oktogo ){
				foreach my $hostgroup( $thishostobj->list_hostgroups ){
					if( defined( $localhostgroups{$hostgroup->name()} ) ){
						$insamegroup++;
					}
				}
			}

			# Listing of services takes a long time.  We don't do
			# it.
			if( $oktogo && ! $insamegroup && 1 == 2 ){
				foreach my $service( $thishostobj->list_services ){
					next if( $insamegroup );
					foreach my $servicegroup( $service->list_servicegroups ){
						if( defined( $localservicegroups{$servicegroup->name()} ) ){
							$insamegroup++;
						}
					}
				}
			}

			if( $oktogo ){
				if( defined( $self->{'status'}{$host}{$svc}{'plugin_output'} ) && ! $insamegroup ){
					if( $self->{'status'}{$host}{$svc}{'plugin_output'} =~ /\((H|S)[^\)]+\)\s*$/ ){
						# Matches path tracing output.
						# Leave until the end, as the
						# other site may well do it
						# before we get around to it.
						$position = 2;
					}
				}elsif( ! $insamegroup ){
					$position = 2;
				}elsif( $position == 0 && $svc ne $self->{'default_host_service'} ){
					# Check to see if the host is down.
					# If it is, we want to put this
					# test at the end, unless it is a
					# host check.
					my $curstate = $self->replace_vars( Statusvar => "current_state", Host => $host );
					if( $curstate != 0 ){
						$position = 2;
					}
				}
			}
			
			# put into the appropriate position.
			my $lenrun = scalar @{$self->{'runorder'}};
			my $insertpos = 0;
			if( $position == 1 ){
				$insertpos = int( $lenrun / 2 ) + 1;
			}elsif( $position == 2 ){
				$insertpos = $lenrun;
			}

			# $self->debug( "Adding entry as $insertpos - $nextrun;$host;$svc  - " . scalar @{$self->{'runorder'}} . " X \n" );
			splice( @{$self->{'runorder'}}, $insertpos, 0, $nextrun . ";" . $host . ";" . $svc );
			$retval++;
		}
	}

	alarm( 0 );
	return( $retval );
}

=head1 PROGRAM OPERATION

After the arguments are parsed, the config file read in, and the
program has forked, the 'main_loop' routine settles down to a fairly
easy life.

There are two conditions which cause the program to exit; the absence
of the external command file (indicating that Nagios has exited, and
there is nothing to submit results to anymore), and expiration of
the supplied run time.

Whilst waiting for either of these two conditions, the program checks
for an updated status_file, and on finding one, re-orders the 
checks to be run.  It then requests checks to be executed up to the 
max_concurrent_checks value read from the Nagios configuration file, assuming
that any checks can be executed based on time, and that the machine's 
cpu load is not over the supplied limit.

Checks executed are periodically checked to see whether they have finished
or have exceeded the greater of host_check_timeout or service_check_timeout,
at which point they are terminated.  Checks which have finished are
submitted to Nagios via the command_file.

If nothing was done, the program snoozes for a little period of time, then
starts all over again.
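
The load check and launch throttling described above can be sketched as
two small helpers.  The names read_load and launch_budget are hypothetical,
and this sketch leaves out the extra "one launch per second" timing test
that main_loop applies when over the load limit.

```perl
    use strict;
    use warnings;

    # Return the first field of a /proc/loadavg style file as a number,
    # or nothing if the file cannot be read or does not start with one.
    sub read_load {
        my ( $file ) = @_;
        open( my $fh, '<', $file ) or return;
        my $line = <$fh>;
        close( $fh );
        return unless defined $line && $line =~ /^\s*(\d+\.?\d*)(\s+.*)?$/;
        return $1 + 0;
    }

    # Work out how many new checks may be launched in this pass.
    sub launch_budget {
        my ( %a ) = @_;
        my $n = $a{max_concurrent} - $a{running};
        $n = int( $n / 2 ) if $n > 1;    # slow down the rate of launching
        $n = 0 if $n < 0;
        if ( $n > 0 && defined $a{max_load} && $a{load} > $a{max_load} ) {
            # Over the load limit: at most one check in flight.
            $n = ( $a{running} == 0 ) ? 1 : 0;
        }
        return $n;
    }
```

Halving the free slots on each pass means a burst of due checks is spread
over several passes of the loop instead of being forked all at once.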

=cut

# The main loop which ends on certain conditions.
sub main_loop {
	my $self = shift;

	my $status_file = $self->{'config'}->get( 'status_file' );
	my $command_file = $self->{'config'}->get( 'command_file' );

	if( ! defined( $self->{'endtime' } ) ){
		$self->{'endtime'} = time - 1;
	}

	$self->{'mustendtime'} = $self->{'endtime'} + $self->{'default_exectime'} + 5;

	my $doloop = 1;
	my $retval = 0;

	$doloop = 0 unless( defined( $status_file ) );
	$doloop = 0 unless( defined( $command_file ) );
	my $nextstatus = 0;
	my $nextload = 0;
	my $curload = 0;
	my $lastsub = 0;
	my $numrunning = 0;
	my $maxrun = $self->{'config'}->get( 'max_concurrent_checks' );
	$maxrun = 10 unless( defined( $maxrun ) );
	$maxrun = 10 unless( $maxrun =~ /^\s*\d+\s*$/ );

	while( $doloop ){

		# Check to see whether more processes can be launched.
		# Have we run out of time?
		my $canlaunch = 0;
		if( time < $self->{'endtime'} ){
			$canlaunch = 1;
		}else{
			# When we must end by.
			if( time > $self->{'mustendtime'} ){
				$retval = 0;
				$doloop = 0;
				next;
			}else{
				$retval = 1;
			}
		}

		# Check to see whether results can be submitted.
		if( defined( $self->{'last_commandfile_write'} ) ){
			if( ! $self->{'last_commandfile_write'} ){
				# The last command could not write the
				# results.  Presumably it has gone away,
				# and we should not continue.
				$canlaunch = 0;
				$retval = 0;
			}
		}

		# Grab the most recent timestamp for the status.dat file.
		my $readstatus = 1;
		if( defined( $self->{'statustimes'} ) ){
			$readstatus = 0;
			if( time > $nextstatus ){
				my $curstatustime = ${$self->{'statustimes'}}[0];
				my @stats = stat( $status_file );
				if( $stats[9] > $curstatustime ){
					$readstatus = 1;
				}
				# $self->debug( "Checked status file = " . $stats[9] . " vs $curstatustime - $readstatus - " . join( ":", @{$self->{'statustimes'}} ) . " " );
				$nextstatus = time + 5;
			}
		}

		# Read and sort.
		if( $readstatus > 0 ){
			if( $self->read_status( $status_file ) ){
				$self->run_order();

				# Clean up the 'executed' cache.  This is
				# a construct which stops checks from being
				# executed too frequently.  Three reads of
				# the status file must pass before they get
				# deleted, and thus can be executed again.
				if( scalar @{$self->{'statustimes'}} > 3 && defined( $self->{'executed'} ) ){
					my $delbefore = pop @{$self->{'statustimes'}};
					foreach my $host( keys %{$self->{'executed'}} ){
						my %todel = ();
						foreach my $svc( keys %{$self->{'executed'}{$host}} ){
							next unless( defined( $svc ) );
							next unless( defined( $self->{'executed'}{$host}{$svc} ) );
							next unless( $self->{'executed'}{$host}{$svc} =~ /^\s*\d+\s*$/ );
							next unless( $self->{'executed'}{$host}{$svc} < $delbefore );
							$todel{$svc}++;
						}
						foreach my $svc( keys %todel ){
							delete( $self->{'executed'}{$host}{$svc} );
						}
					}
				}
			}else{
				# No status.dat?  Nagios has probably exited,
				# but it could also be that we tried in the
				# middle of a write (on OSes where the move 
				# by Nagios from temp to status.dat is not
				# atomic).  We don't do anything.
			}
		}

		local $SIG{ALRM} = sub { die "timeout in checking launched results\n" }; # NB: \n required
		alarm( 120 );

		# Can we launch?
		if( $canlaunch ){
			# We can.  Calculate how many are being run right
			# now, and how many are allowed to be run, then
			# pass that number off to the launch function.
			$canlaunch = $maxrun - $numrunning;

			# Slow down the rate of launching.
			if( $canlaunch > 1 ){
				$canlaunch = int( $canlaunch / 2 );
			}

			# Put an upper bound on it.
			$canlaunch = 0 if( $canlaunch < 0 );
			$canlaunch = $maxrun if( $canlaunch > $maxrun );

			# Time to check the cpuload.
			if( $canlaunch > 0 && time > $nextload ){
				# Get the current cpu load.  We assume
				# that we're dealing with /proc/loadavg.
				if( defined( $self->{'default_load_file'} ) ){
					if( open( my $loadfh, '<', $self->{'default_load_file'} ) ){
						my $line = <$loadfh>;
						if( defined( $line ) && $line =~ /^\s*(\d+\.?\d*)(\s+.*)?$/ ){
							$curload = $1;
						}
						close( $loadfh );
					}
				}
				$nextload = time + 10;
			}

			# Over the maximum load, we only allow one process
			# to run at once.  This ensures that some things
			# are still run.  Ideally we would then reorder
			# and only run those that are important, but that
			# is hard to determine.
			if( $canlaunch > 0 ){
				if( defined( $self->{'default_max_load'} ) ){
					if( $curload > $self->{'default_max_load'} ){
						if( $numrunning == 0 && time > $lastsub ){
							$canlaunch = 1;
						}else{
							$canlaunch = 0;
						}
					}
				}
			}

			# Increment the count of how many are running.
			if( $canlaunch > 0 ){
				$numrunning += $self->launch_commands( $canlaunch );
			}else{
				# Snooze for a bit to be nice to the cpu.
				select( undef, undef, undef, 0.1 );
			}

			# We need this one's original value later.
			$canlaunch = 1;
		}

		# Scan for things to be collected.
		if( $numrunning > 0 && defined( $self->{'executing'} ) ){

			# Run through all of the possibles, check for
			# any pids that have exited or exceeded their
			# runtimes.  See if their filehandles are ready
			# to be read.
			my $curtime = time;
			my %todelhosts = ();
			my $numhosts = 0;
			my $numdid = 0;
			foreach my $host( keys %{$self->{'executing'}} ){
				next unless( defined( $host ) );
				$numhosts++;
				my $numsvcs = 0;
				my %todelsvcs = ();
				foreach my $svc( keys %{$self->{'executing'}{$host}} ){
					next unless( defined( $svc ) );
					$numsvcs++;
					next unless( defined( $self->{'executing'}{$host}{$svc}{'pid'} ) );

					# See if the pid has finished.
					my $waitret = waitpid( $self->{'executing'}{$host}{$svc}{'pid'}, WNOHANG );
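					# $? is -1 when waitpid found no child; keep that
					# as -1 rather than shifting it into a huge number.
					my $pidexit = ( $? == -1 ) ? -1 : ( $? >> 8 );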
					my $fh = $self->{'executing'}{$host}{$svc}{'fh'};
					if( $waitret != 0 ){
						# The pid possibly finished.
						$self->{'executing'}{$host}{$svc}{'exit'} = $pidexit;
					}elsif( $self->{'executing'}{$host}{$svc}{'end'} < $curtime ){
						$self->debug( "Killing process for $host, $svc, " . $self->{'executing'}{$host}{$svc}{'pid'} );
						kill 9, $self->{'executing'}{$host}{$svc}{'pid'};
						$self->{'executing'}{$host}{$svc}{'exit'} = 127;
					}

					# See if the filehandle is ready to 
					# give up anything.
					my $rbuf = '';
					my $rret = sysread( $fh, $rbuf, 2048 );
					if( defined( $rret ) ){
						$self->{'executing'}{$host}{$svc}{'output'} .= $rbuf;
						# $self->debug( "Sysread of $host, $svc returned $rbuf X" ) if( length( $rbuf ) > 0 );
					}else{
						# $self->debug( "No output from sysread\n" );
					}

					# If the process exited, we need to
					# submit the result and the output
					# to Nagios.
					if( defined( $self->{'executing'}{$host}{$svc}{'exit'} ) ){
						close( $fh );
						# $? is properly set after the close.
						my $pidret = ( $? == -1 ) ? -1 : ( $? >> 8 );
						if( $pidret != -1 && $self->{'executing'}{$host}{$svc}{'exit'} == -1 ){
							$self->{'executing'}{$host}{$svc}{'exit'} = $pidret;
						}

						$self->tell_nagios_output( Host => $host, Service => $svc, Output => $self->{'executing'}{$host}{$svc}{'output'}, Exit => $self->{'executing'}{$host}{$svc}{'exit'} );
						$todelsvcs{$svc}++;
						$numrunning--;
						$numdid++;
						$lastsub = time;
					}

					$self->{'executing'}{$host}{$svc}{'fh'} = $fh;
				}

				# Delete services that finished.
				foreach my $svc( keys %todelsvcs ){
					delete( $self->{'executing'}{$host}{$svc} );
					$numsvcs--;
				}

				# Flag this host for deletion if there
				# are no service entries here.
				if( $numsvcs <= 0 ){
					$todelhosts{$host}++;
				}
			}

			# Delete hosts with no service entries.
			foreach my $host( keys %todelhosts ){
				delete( $self->{'executing'}{$host} );
				$numhosts--;
			}

			# This is not needed I think.
			if( $numhosts <= 0 ){
				# delete( $self->{'executing'} );
			}

			# Snooze.
			if( $numdid == 0 ){
				select( undef, undef, undef, 0.2 );
			}


		}elsif( $canlaunch == 0 ){
			# Nothing running, and we were not allowed to
			# launch anything.  Obviously the end condition
			# was reached, either out of time or unable to
			# write.
			$doloop=0;
		}else{
			# Nothing running, and we are allowed to continue.
			# snooze until the next processes can be executed.
			select( undef, undef, undef, 0.4 );
		}
		alarm( 0 );
	}

	$self->debug( "Returning $retval" );
	return( $retval );
}

=head1 LAUNCHING CHECKS

The individual checks are executed by the 'launch_commands' method.  This
takes a single numeric argument giving the maximum number of checks to be
launched, and returns the number of checks launched.  When launching
the checks, no distinction is made between host and service checks (ie,
the infamous issue of the main routine blocking whilst waiting for a 
host check to complete does not occur).
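
As a rough illustration of why a slow check need not hold up the main
routine, the sketch below polls a child process with waitpid and WNOHANG
rather than waiting for it.  The helper names 'start_child' and
'reap_nonblocking' are hypothetical and are not part of this package.

```perl
use strict;
use warnings;
use POSIX qw( WNOHANG );

# Hypothetical helper: start a child that exits with the given status
# after a short delay, standing in for a slow check.
sub start_child {
	my( $status ) = @_;
	my $pid = fork();
	die "fork failed: $!\n" unless( defined( $pid ) );
	if( $pid == 0 ){
		select( undef, undef, undef, 0.2 );
		POSIX::_exit( $status );
	}
	return( $pid );
}

# Hypothetical helper: poll for the child without blocking.  Returns
# the exit status (undef if it never finished) and the number of polls.
sub reap_nonblocking {
	my( $pid, $max_polls ) = @_;
	my $polls = 0;
	while( $polls < $max_polls ){
		$polls++;
		my $waitret = waitpid( $pid, WNOHANG );
		if( $waitret == $pid ){
			return( $? >> 8, $polls );
		}
		# Other checks could be launched or collected here.
		select( undef, undef, undef, 0.1 );
	}
	return( undef, $polls );
}

my $pid = start_child( 3 );
my( $exit, $polls ) = reap_nonblocking( $pid, 50 );
print "child $pid exited with $exit after $polls polls\n";
```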

The package will not execute more than two service checks against the
same host at the same time, continuing the principles explained in the
Nagios 'checkscheduling' documents; avoid overloading a given host with
too many checks at once.
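
The per-host limit can be sketched as follows.  This is a minimal
illustration only: the data layout mirrors the 'executing' hash used by
this package, while the helper names 'running_per_host' and 'may_launch'
are assumptions made for the example.

```perl
use strict;
use warnings;

# Count the checks currently running against each host.  $executing
# is laid out as host => service => details.
sub running_per_host {
	my( $executing ) = @_;
	my %count = ();
	foreach my $host( keys %{$executing} ){
		$count{$host} = scalar keys %{$executing->{$host}};
	}
	return( \%count );
}

# Hypothetical predicate: may another check be started on this host?
# The default limit of two matches the limit described in the text.
sub may_launch {
	my( $count, $host, $maxhost ) = @_;
	$maxhost = 2 unless( defined( $maxhost ) );
	my $running = $count->{$host} || 0;
	return( $running < $maxhost ? 1 : 0 );
}

my %executing = (
	'web1' => { 'HTTP' => { 'pid' => 1001 }, 'SSH' => { 'pid' => 1002 } },
	'db1'  => { 'MySQL' => { 'pid' => 1003 } },
);
my $count = running_per_host( \%executing );
foreach my $host( 'web1', 'db1', 'mail1' ){
	print "$host: ", ( may_launch( $count, $host ) ? "launch" : "defer" ), "\n";
}
```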

Some of the Nagios macros are expanded.  For checks containing macros
which cannot be expanded, Nagios is told, via the external command file,
to schedule the check itself as soon as possible.
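
A minimal sketch of the $ARGn$ part of the expansion is shown here.  The
helper name 'expand_args' and the plugin path are made up for the
example; the second return value reports whether any macro is left
unexpanded, which is the case where the check would be handed back to
Nagios.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute $ARGn$ macros in a command line from
# the '!'-separated arguments of a check_command value.
sub expand_args {
	my( $command_line, $check_command ) = @_;
	my @parts = split( /!/, $check_command );
	foreach my $n ( 1 .. $#parts ){
		$command_line =~ s/\$ARG$n\$/$parts[$n]/g;
	}
	# Anything still wrapped in dollar signs was not expanded.
	my $unexpanded = ( $command_line =~ /\$[^\$]+\$/ ) ? 1 : 0;
	return( $command_line, $unexpanded );
}

my( $line, $left ) = expand_args(
	'/usr/lib/nagios/plugins/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$',
	'check_ping!100.0,20%!500.0,60%' );
print "$line\n";
print "macros remaining: $left\n";
```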

Note that this only launches checks which have been scheduled to run and
which do not have a dependency stopping them from being run.  If there
appear to be checks which should be run but are not, run the program in
debug mode to see which checks are not being scheduled because one or
more of their parents are down.

Depending on the number of services that you have defined, and the average
scheduled check_interval, you will want to tweak the max_concurrent_checks
value to ensure that all of the checks will be executed within the average
check_interval.  To calculate this, take the number of services defined
(875), remembering to also count the number of host checks (a further 100),
and divide this number by the average interval length (2 minutes), eg:
975 / 2 = 487.5.  Dividing this value by 60 (the interval length, in
seconds) gives us an average of 8.1 checks per second.  After applying a
fudge factor to cater for Nagios possibly requesting checks more
frequently, we end up with a figure of 10 for the max_concurrent_checks.

This figure differs from the Nagios suggestion for the same number of
service checks/interval length (73), as this package is not worrying about 
checks that are waiting for their result to be processed, only checks that 
still need to be executed.  In practice, putting the value up to 20 will 
ensure that all of the checks will be executed roughly around their
scheduled time.
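
The arithmetic above can be reproduced with a short sketch.  The helper
name 'checks_per_second' and the fudge factor of 1.2 are assumptions made
for illustration; the text does not say which factor was actually used.

```perl
use strict;
use warnings;
use POSIX qw( ceil );

# Hypothetical helper, not part of this package: the average number of
# checks that must be started each second.
sub checks_per_second {
	my( $services, $hosts, $avg_interval, $interval_length ) = @_;
	my $per_interval = ( $services + $hosts ) / $avg_interval;
	return( $per_interval / $interval_length );
}

# Figures from the text: 875 services, 100 hosts, an average interval
# of 2 units, and an interval length of 60 seconds.
my $rate = checks_per_second( 875, 100, 2, 60 );

# Assumed fudge factor of 20%, rounded up to a whole number of checks.
my $suggested = ceil( $rate * 1.2 );

printf( "%.3f checks per second, suggesting max_concurrent_checks of %d\n", $rate, $suggested );
```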

=cut

# Launch a series of commands
sub launch_commands {
	my $self = shift;
	local $SIG{ALRM} = sub { die "timeout in launching commands\n" }; # NB: \n required
	alarm( 120 );

	my $max_launch = shift;

	$max_launch = 20 unless( defined( $max_launch ) );
	$max_launch = 20 unless( $max_launch =~ /^\s*\d+\s*$/ );

	my $retval = 0;
	my $counter = 0;
	my @executed = ();

	my $stime = time;
	my $etime = $stime + 15;
	my $maxrun = scalar @{$self->{'runorder'}};

	# Get a list of hosts that currently have checks running on them.
	# We try to avoid running too many checks against the same host.
	my %numhosts = ();
	foreach my $host( keys %{$self->{'executing'}} ){
		next unless( defined( $host ) );
		foreach my $svc( keys %{$self->{'executing'}{$host}} ){
			next unless( defined( $svc ) );
			# Increment the count of service checks being run
			# on this host.
			$numhosts{$host}++;
		}
	}

	# How many checks can we run on the same host at a given time?
	my $maxhost = 2;


	while( $counter < $maxrun && $retval < $max_launch ){
		# Examine the next one.  If the exec time is less than
		# the current time, it gets executed.  If it is still pending,
		# we skip it.
		my $curtime = time;
		if( $curtime > $etime ){
			$counter = $maxrun;
			next;
		}

		my $exectime = $curtime;
		my $host = undef;
		my $svc = undef;
		my $cmd = undef;
		my $cmdobj = undef;
		my @cmdsplit = ();

		my $entry = ${$self->{'runorder'}}[$counter];
		my $oldcounter = $counter;

		# Increment here so the next portions don't need to.
		$counter++;
		if( defined( $entry ) ){
			if( $entry =~ /^(\d+);(\S+);(\S+.*)$/ ){
				( $exectime, $host, $svc ) = ($1, $2, $3);
				if( ! defined( $self->{'status'}{$host}{$svc}{'check_command'} ) ){
					next;
				}else{
					$cmd = $self->{'status'}{$host}{$svc}{'check_command'};
					if( defined( $cmd ) ){
						@cmdsplit = split( /!/, $cmd );
						$cmdobj = $self->{'config'}->find_object( $cmdsplit[0], "Nagios::Command" );
					}

				}
			}
			if( $exectime > $curtime ){
				next;
			}
		}else{
			next;
		}

		if( defined( $cmdobj ) && ! defined( $self->{'executed'}{$host}{$svc} ) && ! defined( $self->{'executing'}{$host}{$svc} ) ){
			# Got a command object, and this one has not been
			# executed yet.  Is this a valid one to execute?
			if( $svc eq $self->{'default_host_service'} ){
				if( $self->{'_run_host'} == 0 ){
					push @executed, $oldcounter;
					next;
				}
			}else{
				if( $self->{'_run_service'} == 0 ){
					push @executed, $oldcounter;
					next;
				}
			}


			# Need to do search and replace on the command string.
			my $cmdline = $cmdobj->command_line;
			next unless( defined( $cmdline ) );

			# Avoid if there are too many checks running on this
			# host.
			if( defined( $numhosts{$host} ) ){
				next if( $numhosts{$host} >= $maxhost );
			}

			my $cmdcount = 1;
			while( $cmdcount < 33 ){
				if( $cmdline =~ /\$ARG$cmdcount\$/ && defined( $cmdsplit[$cmdcount] ) ){
					$cmdline =~ s/\$ARG$cmdcount\$/$cmdsplit[$cmdcount]/g;
				}
				if( $cmdline =~ /\$USER$cmdcount\$/ ){
					my $resourcevar = $self->{'resource'}->get_attr( '$USER' . $cmdcount . '$' );
					if( defined( $resourcevar ) ){
						$cmdline =~ s/\$USER$cmdcount\$/$resourcevar/g;
					}
				}

				$cmdcount++;
			}

			# Any more expansions to be performed?  Minor
			# optimisation in skipping this.
			if( $cmdline =~ /\$[^\$]+\$/ ){
				$cmdline = $self->replace_vars( Line => $cmdline, Host => $host, Service => $svc );
			}

			next unless( $cmdline =~ /^(\S+)(\s+.*)?$/ );
			my $cmdexec = $1;
			my $cmdrest = $2;
			$cmdrest = "" unless( defined( $cmdrest ) );

			next unless( -x $cmdexec && ! -d $cmdexec );

			# Actually execute it.
			my $fh = undef;
			if( $cmdline !~ /\$[^\$]+\$/ ){
				my $pid = open( $fh, $cmdexec . $cmdrest . " |" );
				if( $pid > 0 ){
					# Set the filehandle to be non-blocking.
					$self->debug( "Executed $cmdline, got pid $pid\n" );
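					# F_GETFL hands back the current flags as its
					# return value (undef on failure).
					my $flags = fcntl( $fh, F_GETFL, 0 );
					$flags = 0 unless( defined( $flags ) );
					fcntl( $fh, F_SETFL, $flags | O_NONBLOCK );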
					$self->{'executing'}{$host}{$svc}{'pid'} = $pid;
					$self->{'executing'}{$host}{$svc}{'fh'} = $fh;
					$self->{'executing'}{$host}{$svc}{'start'} = $curtime;
					$self->{'executing'}{$host}{$svc}{'end'} = $self->{'default_exectime'} + $curtime;
					push @executed, $oldcounter;
					$self->{'executed'}{$host}{$svc} = $curtime;

					# Increment the count of service
					# checks currently running on this 
					# host. 
					$numhosts{$host}++;
					$retval++;
				}else{
					# An error happened in execution.
					# Treat this one as having been
					# executed this time so it gets 
					# deleted, but it will be tried 
					# again after the next read of 
					# the status.dat file.
					push @executed, $oldcounter;
				}
			}else{
				# Submit this using the Nagios command
				# file, as there appear to be expansions
				# in there which this program cannot perform.
				if( $self->tell_nagios_to_execute( Host => $host, Service => $svc, Time => $exectime ) ){
					push @executed, $oldcounter;
					$retval++;
					$self->debug( "$host;$svc has an unrecognised macro in the check command.  I told Nagios to execute it." );
					$self->{'executed'}{$host}{$svc} = $curtime;
				}
			}
		}
	}

	# $self->debug( "After loop - $stime, $etime, $curtime, $retval\n";

	# Run through the items executed, and remove them from the listing.
	if( ( scalar @executed ) > 0 ){
		# Must work from the highest index to the lowest, so that
		# removing one entry does not shift the position of the
		# entries still to be removed.
		foreach my $counter( reverse @executed ){
			splice( @{$self->{'runorder'}}, $counter, 1 );
		}
	}else{
		# We did not find anything.  Snooze for a little bit
		# to be nice to the cpu.
		select( undef, undef, undef, 0.1 );
	}

	# Reset the alarm
	alarm( 0 );

	# Return the number of processes that were launched.
	return( $retval );
}

# This script runs the background checks for Nagios, as the scheduling
# algorithm in Nagios is occasionally better suited for deciding when
# to clean out the latrines.  To be more precise, Nagios tends to latch
# onto services/hosts which smell, at the expense of performing regular
# checks on other services/hosts, which eventually smell leaving a much
# larger mess which Nagios is unable to clean up before the next event
# occurs.

# This script is intended to be run as one of the host or service performance
# processing commands, which is triggered by Nagios every X seconds without
# fail.  The operation of this script is as follows:
#
#	Run by Nagios
#	Read in Nagios configuration file.
#	fork
#	Parent exits, returning control to Nagios
#	Child runs in loop until just before next invocation of this
#		script, according to Nagios config file.
#	Within loop, the service and host check_command definitions are
#		run as defined in the Nagios config files.
#
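
=pod

The fork step in the outline above can be sketched as follows.  This is a
minimal illustration under stated assumptions, not the code used by this
script: the helper name 'run_detached' is hypothetical, and the work done
by the child is a stand-in for the real check loop.

```perl
use strict;
use warnings;
use POSIX qw( setsid );

# Hypothetical helper: run $work in a detached child and return the
# child's pid to the caller, or undef if the fork failed.
sub run_detached {
	my( $work ) = @_;
	my $pid = fork();
	return( undef ) unless( defined( $pid ) );
	if( $pid == 0 ){
		# Child: start a new session, do the work, then exit
		# without returning into the caller's code.
		setsid();
		$work->();
		POSIX::_exit( 0 );
	}
	# Parent: return straight away so the caller is not held up.
	return( $pid );
}

my $child = run_detached( sub { select( undef, undef, undef, 0.1 ) } );
if( defined( $child ) ){
	print "parent continuing, child is $child\n";
}else{
	print "fork failed: $!\n";
}
```

=cut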


# Main routine.
sub main {
	my $self = shift;

	# Read in the configuration file first.
	$self->debug( "Reading Nagios configuration file" );
	my $retval = 0;
	if( $self->read_config( $self->{'config_file'} ) ){

		# Good to go.  Make sure the status file is readable.
		$self->debug( "Reading Nagios status file" );
		if( $self->read_status( $self->{'config'}->get( 'status_file' ) ) ){

			# Fork, allowing the parent to return.
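			my $pid = fork();

			if( ! defined( $pid ) ){
				# The fork failed.  Do not fall through into
				# the child branch.
				$self->debug( "Unable to fork: $!" );
			}elsif( $pid == 0 ){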
				# We are the child.  Close handles that
				# we inherited.
				POSIX::setsid() || die "Can't start a new session: $!";
				close( STDIN );
				close( STDOUT );
				# close( STDERR );
				local $SIG{INT} = sub { die "Received INT signal\n" }; # NB: \n required
				local $SIG{TERM} = sub { die "Received TERM signal\n" }; # NB: \n required

				# We shouldn't be running for long enough to
				# require re-reading the configuration file.
				local $SIG{HUP} = 'IGNORE';
				local $SIG{PIPE} = 'IGNORE';

				# Perform the initial ordering.
				$self->run_order;

				# Invoke the loop.
				$retval = $self->main_loop;
			}else{
				# We are the parent process.  Exit.
				$retval = 1;
			}
		}
	}
	$self->debug( "Returning $retval (should be exiting)" );
	return( $retval );
}

# Display of help
sub show_help {
	my $self = shift;

	print "$0: External scheduler and check runner for Nagios\n";
	print "Usage: $0 -c nagios_config_file \n";
	print "\t\t-H localhostname\n";
	print "\t\t-t runtime_in_seconds\n";
	print "\t\t-m max_load_average\n";
	print "\t\t-l /proc/loadavg-type-file\n";
	print "\t\t-d  (enable debugging)\n";
	print "\t\t-E  (enable host checks, default disabled)\n";
	print "\t\t-S  (enable service checks, default disabled)\n";
	print "\n";
	print "Executes upcoming host and service checks using a different scheduling\n";
	print "algorithm than the one Nagios uses.  Communicates results back to Nagios\n";
	print "using the command file defined in the configuration.  Relies on Nagios\n";
	print "updating the status file to determine which checks need to be run next.\n";
}

# Parsing of command line arguments.
sub parse_argv {
	my $self = shift;

	my %getopthash = ();
	# Works with individuals, not when combined with @arrays.
	my $result = main::GetOptions( \%getopthash,
					"c=s",
					"H=s",
					"t=i",
					"m=s",
					"l=s",
					"E",
					"S",
					"d",
                                        );

	my $retval = 0;

	if( defined( $getopthash{"c"} ) ){
		# Config File.
		if( -f $getopthash{"c"} && -r $getopthash{"c"} ){
			$retval++;
			$self->{'config_file'} = $getopthash{"c"};
		}else{
			print STDERR "Configuration file found is not a file, or not readable.\n";
		}
	}else{
		print STDERR "No nagios base configuration file found via -c \n";
	}
	if( defined( $getopthash{"H"} ) && $retval ){
		$self->{'localhost'} = $getopthash{"H"};
	}else{
		print STDERR "Both localhost (-H) and configuration file (-c) must be supplied.\n";
		$retval = 0;
	}

	if( defined( $getopthash{"t"} ) && $retval ){
		$self->{'endtime'} = time + $getopthash{"t"};
	}else{
		print STDERR "Need -H, -c and number of seconds to run for (-t)\n";
		$retval = 0;
	}

	if( defined( $getopthash{"d"} ) ){
		$self->{'_debug'} = 1;
	}

	if( defined( $getopthash{"S"} ) ){
		$self->{'_run_service'} = 1;
	}else{
		$self->{'_run_service'} = 0;
	}
	if( defined( $getopthash{"E"} ) ){
		$self->{'_run_host'} = 1;
	}else{
		$self->{'_run_host'} = 0;
	}


	if( defined( $getopthash{"l"} ) && defined( $getopthash{"m"} ) ){
		if( $getopthash{"m"} =~ /^\s*\d+(\.\d+)?\s*$/ ){
			$self->{'default_max_load'} = $getopthash{"m"};
		}
		if( -r $getopthash{"l"} && ! -d $getopthash{"l"} ){
			$self->{'default_load_file'} = $getopthash{"l"};
		}
	}

	return( $retval );

}



# Reading in of the nagios.cfg file.
sub read_config {
	my $self = shift;
	my $base_file = shift;

	my $retval = 0;
	local $SIG{ALRM} = sub { die "timeout in reading config file\n" }; # NB: \n required
	alarm( 120 );

	if( -f $base_file && -r $base_file ){
		$self->{'config'} = Nagios::Config->new( Filename => $base_file );
		$retval = 1;
	}

	if( defined( $self->{'config'} ) ){
		$self->debug( "Have config of " . $self->{'config'} . "\n" );
		my $resourcefile = $self->{'config'}->get( 'resource_file' );
		$self->{'resource'} = Nagios::Config::File->new( $resourcefile );

		# Run through the config, looking for a unique name.
		if( ! defined( $self->{'default_host_service'} ) ){
			my %svcs = ();
			foreach my $svc( $self->{'config'}->list_services ){
				next unless( defined( $svc ) );
				next unless( ref( $svc ) );
				$svcs{$svc->service_description}++;
			}

			while( ! defined( $self->{'default_host_service'} ) ){
				my $testval = "__host_pinger_" . rand();
				if( ! defined( $svcs{$testval} ) ){
					$self->{'default_host_service'} = $testval;
				}
			}
		}

		# default_exectime - used for killing off processes.
		my $hosttime = $self->{'config'}->get( 'host_check_timeout' );
		my $svctime = $self->{'config'}->get( 'service_check_timeout' );
		$hosttime = 60 unless( defined( $hosttime ) );
		$hosttime = 60 unless( $hosttime =~ /^\s*\d+\s*$/ );
		$svctime = 60 unless( defined( $svctime ) );
		$svctime = 60 unless( $svctime =~ /^\s*\d+\s*$/ );
		my $usetime = $hosttime;
		if( $svctime > $usetime ){
			$usetime = $svctime;
		}
		$self->{'default_exectime'} = $usetime;

		# Working out service dependencies.
		$self->debug( "Processing Service Dependencies." );
		my $servicedeps = $self->{'config'}->list_servicedependencies;
		%{$self->{'depends'}} = ();
		foreach my $svcdep( @{$servicedeps} ){
			next unless( defined( $svcdep ) );
			my $dhost = $svcdep->dependent_host_name->host_name;
			my $dsvc = $svcdep->dependent_service_description->service_description;
			my $ahost = $svcdep->host_name->host_name;
			my $asvc = $svcdep->service_description->service_description;
			my $aexec = $svcdep->execution_failure_criteria;
			my $aipar = $svcdep->inherits_parents;
			$aipar = 0 unless( defined( $aipar ) );
			$aipar = 0 unless( $aipar =~ /^\d+$/ );
			$aexec = "" unless( defined( $aexec ) );
			
			push @{$self->{'depends'}{$dhost}{$dsvc}}, $ahost . ";" . $asvc . ";" . $aexec . ";" . $aipar;
		}

		$self->debug( "Processing Host Dependencies." );
		my $hostdeps = $self->{'config'}->list_hostdependencies;
		foreach my $hostdep( @{$hostdeps} ){
			next unless( defined( $hostdep ) );
			my $dhost = $hostdep->dependent_host_name->host_name;
			my $ahost = $hostdep->host_name->host_name;
			my $aexec = $hostdep->execution_failure_criteria;
			my $aipar = $hostdep->inherits_parents;
			$aipar = 0 unless( defined( $aipar ) );
			$aipar = 0 unless( $aipar =~ /^\d+$/ );
			$aexec = "" unless( defined( $aexec ) );
			push @{$self->{'depends'}{$dhost}{$self->{'default_host_service'}}}, $ahost . ";" . $self->{'default_host_service'} . ";" . $aexec . ";" . $aipar;
		}

		$self->debug( "Faking Host Parents as Host Dependencies." );
		foreach my $hostobj( $self->{'config'}->list_hosts ){
			next unless( defined( $hostobj ) );
			my $parents = ( $hostobj->parents );
			my $dhost = $hostobj->host_name;
			my $aexec = "d,u,p";
			my $aipar = 1;
			# Anything explicitly defined in a hostdependency
			# wins over this implicit stuff.
			my %prevs = ();
			foreach my $kval( @{$self->{'depends'}{$dhost}{$self->{'default_host_service'}}} ){
				my @scsplit = split( /;/, $kval );
				$prevs{$scsplit[0]}++;
			}
			foreach my $parentobj( @{$parents} ){
				next unless( defined( $parentobj ) );
				my $ahost = $parentobj->host_name;
				next if( defined( $prevs{$ahost} ) );
				push @{$self->{'depends'}{$dhost}{$self->{'default_host_service'}}}, $ahost . ";" . $self->{'default_host_service'} . ";" . $aexec . ";" . $aipar;
			}

			# The implicit dependency that service checks have
			# on their host check is performed in run_order.  There
			# is little point in putting them in here.
		}
	}

	alarm( 0 );
	return( $retval );
}

# Reading in of the status.dat file.  It is a simple structure
# consisting of an opening 'type {', a number of 'name=value' lines,
# and a closing '}', for each object.
sub read_status {
	my $self = shift;

	my $status_file = shift;

	# We like to be interrupted.  If reading the file takes longer
	# than the 120 seconds allowed below, something is wrong.
	local $SIG{ALRM} = sub { die "timeout in reading status file\n" }; # NB: \n required
	alarm( 120 );

	my $retval = 0;

	my %wantnames = (	"acknowledgement_type",			1,
				"active_checks_enabled",		1,
				"active_host_checks_enabled",		1,
				"active_service_checks_enabled",	1,
				"check_command",			1,
				"check_execution_time",			1,
				"check_host_freshness",			1,
				"check_latency",			1,
				"check_service_freshness",		1,
				"check_type",				1,
				"created",				1,
				"current_attempt",			1,
				"current_notification_number",		1,
				"current_state",			1,
				"daemon_mode",				1,
				"enable_event_handlers",		1,
				"enable_failure_prediction",		1,
				"enable_flap_detection",		1,
				"enable_notifications",			1,
				"event_handler",			1,
				"event_handler_enabled",		1,
				"failure_prediction_enabled",		1,
				"flap_detection_enabled",		1,
				"global_host_event_handler",		1,
				"global_service_event_handler",		1,
				"has_been_checked",			1,
				"host_name",				1,
				"is_flapping",				1,
				"last_check",				1,
				"last_command_check",			1,
				"last_hard_state",			1,
				"last_hard_state_change",		1,
				"last_log_rotation",			1,
				"last_notification",			1,
				"last_state_change",			1,
				"last_time_critical",			1,
				"last_time_down",			1,
				"last_time_ok",				1,
				"last_time_unknown",			1,
				"last_time_unreachable",		1,
				"last_time_up",				1,
				"last_time_warning",			1,
				"last_update",				1,
				"max_attempts",				1,
				"modified_attributes",			1,
				"modified_host_attributes",		1,
				"modified_service_attributes",		1,
				"nagios_pid",				1,
				"next_check",				1,
				"next_notification",			1,
				"no_more_notifications",		1,
				"notifications_enabled",		1,
				"obsess_over_host",			1,
				"obsess_over_hosts",			1,
				"obsess_over_service",			1,
				"obsess_over_services",			1,
				"passive_checks_enabled",		1,
				"passive_host_checks_enabled",		1,
				"passive_service_checks_enabled",	1,
				"percent_state_change",			1,
				"performance_data",			1,
				"plugin_output",			1,
				"problem_has_been_acknowledged",	1,
				"process_performance_data",		1,
				"program_start",			1,
				"scheduled_downtime_depth",		1,
				"service_description",			1,
				"should_be_scheduled",			1,
				"state_type",				1,
				"version",				1,
				);

	if( -f $status_file && -r $status_file ){
		if( open( STATUSFILE, '<', $status_file ) ){
			$self->debug( "Reading Nagios status file" );
			# Record when this file was last modified.
			my @stats = stat( STATUSFILE );
			splice( @{$self->{'statustimes'}}, 0, 0, $stats[9] );

			%{$self->{'status'}} = ();
			my $curtype = undef;
			my $curhost = undef;
			my $cursvc = undef;
			while( my $line = <STATUSFILE> ){
				chomp( $line );
				if( ! defined( $curtype ) ){
					next unless( $line =~ /^(host|service)\s+\{\s*$/ );
					$curtype = $1;
					if( $curtype eq "host" ){
						$cursvc = $self->{'default_host_service'};
					}
					next;
				}else{
					if( $line =~ /^\s*\}\s*$/ ){
						$curtype = undef;
						$curhost = undef;
						$cursvc = undef;
					}elsif( $line =~ /^\s*([^\s=]+)\=(\S+.*)\s*$/ ){
						my $varname = lc($1);
						my $value = $2;
						next unless( defined( $wantnames{"$varname"} ) );
						if( $varname eq "host_name" || $varname eq "service_description" ){
							$curhost = $value if( $varname eq "host_name" );
							$cursvc = $value if( $varname eq "service_description" );
							# Workaround.
							if( defined( $curhost ) && defined( $cursvc ) ){
								$self->{'status'}{$curhost}{$cursvc}{'host_name'} = $curhost;
								$self->{'status'}{$curhost}{$cursvc}{'service_description'} = $cursvc;
							}

						}elsif( defined( $curhost ) && defined( $cursvc ) ){
							# print "Storing $varname for $curhost and $cursvc\n";
							$self->{'status'}{$curhost}{$cursvc}{$varname} = $value;
							$retval++;
						}
					}
				}
			}
			close( STATUSFILE );
		}
	}

	alarm( 0 );
	return( $retval );
}
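
=pod

The block structure that read_status works through ('type {', a number of
'name=value' lines, and a closing '}') can be illustrated with a small
standalone parser.  This is a sketch only: the function name
'parse_status_text' and the '__host__' placeholder key are assumptions
made for the example, and the sample text is invented.

```perl
use strict;
use warnings;

# Parse status.dat-style text into host => service => name => value.
# '__host__' is a placeholder key for host-level entries, chosen for
# this sketch only.
sub parse_status_text {
	my( $text ) = @_;
	my %status = ();
	my( $type, %block );
	foreach my $line( split( /\n/, $text ) ){
		if( ! defined( $type ) ){
			# Outside a block: look for an opening line.
			$type = $1 if( $line =~ /^\s*(host|service)\s+\{\s*$/ );
			%block = ();
		}elsif( $line =~ /^\s*\}\s*$/ ){
			# End of the block: file what was collected.
			my $host = $block{'host_name'};
			my $svc = ( $type eq 'host' ) ? '__host__' : $block{'service_description'};
			if( defined( $host ) && defined( $svc ) ){
				$status{$host}{$svc} = { %block };
			}
			$type = undef;
		}elsif( $line =~ /^\s*([^\s=]+)=(.*?)\s*$/ ){
			# The name ends at the first '=' sign.
			$block{lc( $1 )} = $2;
		}
	}
	return( \%status );
}

my $text = <<'EOT';
host {
	host_name=web1
	current_state=0
	}

service {
	host_name=web1
	service_description=HTTP
	check_command=check_http!-H web1
	current_state=2
	}
EOT

my $status = parse_status_text( $text );
print "web1 host state: ", $status->{'web1'}{'__host__'}{'current_state'}, "\n";
print "web1 HTTP state: ", $status->{'web1'}{'HTTP'}{'current_state'}, "\n";
```

=cut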

# This replaces some of the Nagios $MACROS$ with their equivalents.
# Not all of the macros are replaced, as they depend on information which
# this program simply does not have.
sub replace_vars {
	my $self = shift;
	my %args = (	Line => undef,
			Host => undef,
			Service => undef,
			Configvar => undef,
			Statusvar => undef,
			@_,
			);

	my $retval = $args{"Line"};

	# Deal with a missing 'Service' argument.
	if( ! defined( $args{"Service"} ) ){
		$args{"Service"} = $self->{"default_host_service"};
	}

	# Variables that we know what to do with.
	my %allowvars = (
				"HOSTNAME",		"host_name",
				"HOSTALIAS",		"alias,host_name",
				"HOSTADDRESS",		"address,host_name",
				"HOSTSTATE",		"current_state:0=UP;1=DOWN;2=UNREACHABLE",
				"HOSTSTATEID",		"current_state",
				"HOSTSTATETYPE",	"state_type:0=SOFT;1=HARD",
				"HOSTATTEMPT",		"current_attempt",
				"HOSTLATENCY",		"check_latency",
				"HOSTEXECUTIONTIME",	"check_execution_time",
				"HOSTDURATIONSEC",	"\$curtime - last_state_change",
				"HOSTDOWNTIME",		"scheduled_downtime_depth",
				"HOSTPERCENTCHANGE",	"percent_state_change",
				"LASTHOSTCHECK",	"last_check",
				"LASTHOSTSTATECHANGE",	"last_state_change",
				"LASTHOSTUP",		"last_time_up",
				"LASTHOSTDOWN",		"last_time_down",
				"LASTHOSTUNREACHABLE",	"last_time_unreachable",
				"HOSTOUTPUT",		"plugin_output",
				"HOSTPERFDATA",		"performance_data",
				"HOSTCHECKCOMMAND",	"check_command",
				"SERVICEDESC",		"service_description",
				"SERVICESTATE",		"current_state:0=OK;1=WARNING;2=CRITICAL;3=UNKNOWN",
				"SERVICESTATEID",	"current_state",
				"SERVICESTATETYPE",	"state_type:0=SOFT;1=HARD",
				"SERVICEATTEMPT",	"current_attempt",
				"SERVICELATENCY",	"check_latency",
				"SERVICEEXECUTIONTIME",	"check_execution_time",
				"SERVICEDURATIONSEC",	"\$curtime - last_state_change",
				"SERVICEDOWNTIME",	"scheduled_downtime_depth",
				"SERVICEPERCENTCHANGE",	"percent_state_change",
				"LASTSERVICECHECK",	"last_check",
				"LASTSERVICESTATECHANGE", "last_state_change",
				"LASTSERVICEOK",	"last_time_ok",
				"LASTSERVICEWARNING",	"last_time_warning",
				"LASTSERVICECRITICAL",	"last_time_critical",
				"LASTSERVICEUNKNOWN",	"last_time_unknown",
				"SERVICEOUTPUT",	"plugin_output",
				"SERVICEPERFDATA",	"performance_data",
				"SERVICECHECKCOMMAND",	"check_command",
				"TIMET",		"\$curtime",
				"MAINCONFIGFILE",	"\$configfile",
				"STATUSDATAFILE",	"\$conf( status_file )",
				"COMMENTDATAFILE",	"\$conf( comment_file )",
				"DOWNTIMEDATAFILE",	"\$conf( downtime_file )",
				"RETENTIONDATAFILE",	"\$conf( retention_file )",
				"OBJECTCACHEFILE",	"\$conf( object_cache_file )",
				"TEMPFILE",		"\$conf( temp_file )",
				"LOGFILE",		"\$conf( log_file )",
				"RESOURCEFILE",		"\$conf( resource_file )",
				"COMMANDFILE",		"\$conf( command_file )",
				"HOSTPERFDATAFILE",	"\$conf( host_perfdata_file )",
				"SERVICEPERFDATAFILE",	"\$conf( service_perfdata_file )",
			);



	if( defined( $retval ) ){
		# $self->debug( "replace - got $retval X\n" );
		my @dolsp = split( /\$/, $retval );
		my $dollen = ( scalar @dolsp ) + 10;
		my $prevcount = 0;
		while( $retval =~ /^(.{$prevcount,}?)\$([^\$]+)\$(.*)$/ ){
			my $start = $1;
			my $var = $2;
			my $end = $3;

			$prevcount = length( $start );

			# $self->debug( "Matched S $start ($prevcount) V $var and E $end\n" );

			# See if there is a recognised variable in there.
			# Get the on-demand stuff out of the way.
			if( $var =~ /^([^:]+):([^:]+)(:[^:]+)?$/ ){
				my $chkvar = $1;
				my $rehost = $2;
				my $resvc = $3;
				if( defined( $resvc ) ){
					$resvc =~ s/^://;
				}
				my $rtext = $self->replace_vars( Line => '$' . $chkvar . '$', Host => $rehost, Service => $resvc );
				$prevcount += length( $rtext );
				$retval = $start . $rtext . $end;
			}elsif( defined( $allowvars{"$var"} ) ){
				my $newtext = $allowvars{"$var"};

				# $self->debug( "Got newtext - $newtext\n" );
				
				# Deal with specials first.
				if( $newtext =~ /^\$(\S+)/ ){
					my $spec = $1;
					if( $spec eq 'conf(' || $spec eq 'conf' ){
						# Look up something from the
						# configuration.  This is the
						# only text.
						if( $newtext =~ /^\$\S+\s*\(\s+(\S+)\s*\)\s*$/ ){
							my $lookval = $1;
							$newtext = $self->{'config'}->get( $lookval );
							$retval = $start . $newtext . $end;
							$prevcount += length( $newtext );
							next;
						}
					}elsif( $spec eq 'curtime' ){
						my $curtime = time;
						$newtext =~ s/^\$\S+/$curtime/;
						if( $newtext =~ /^\d+\s+\-\s+(\S+)\s*$/ ){
							my $chkvar = $1;
							my $newvar = $self->replace_vars( Statusvar => $chkvar, Host => $args{"Host"}, Service => $args{"Service"} );
							# Test definedness rather than truth, so
							# that a value of 0 is not discarded.
							$newvar = $self->replace_vars( Configvar => $chkvar, Host => $args{"Host"}, Service => $args{"Service"} ) unless( defined( $newvar ) );
							if( defined( $newvar ) ){
								if( $newvar =~ /^\s*\d+\s*$/ ){
									$newtext = $curtime - $newvar;
								}else{
									$newtext = "$curtime - $newvar";
								}
							}
						}
						$retval = $start . $newtext . $end;
						$prevcount += length( $newtext );
						next;
					}elsif( $spec eq 'configfile' ){
						$newtext = $self->{'config_file'};
						$retval = $start . $newtext . $end;
						$prevcount += length( $newtext );
						next;
					}
				}elsif( $newtext =~ /^([^:]+):(\S+)$/ ){
					my $chkvar = $1;
					my $remain = $2;
					my $newvar = $self->replace_vars( Statusvar => $chkvar, Host => $args{"Host"}, Service => $args{"Service"} );
					# Test definedness rather than truth, so that
					# a state of 0 (UP or OK) is still translated.
					$newvar = $self->replace_vars( Configvar => $chkvar, Host => $args{"Host"}, Service => $args{"Service"} ) unless( defined( $newvar ) );
					if( defined( $newvar ) ){
						my @vsp = split( /;/, $remain );
						my $donechange = 0;
						foreach my $chk( @vsp ){
							if( $chk =~ /^\Q$newvar\E=(\S+)$/ ){
								$newtext = $1;
								$donechange++;
							}
						}
						$newtext = $var unless( $donechange );
					}else{
						$newtext = $var;
					}
					$retval = $start . $newtext . $end;
					$prevcount += length( $newtext );
					next;
				}elsif( $newtext =~ /,/ ){
					my @newsp = split( /,/, $newtext );
					my $foundit = 0;
					foreach my $newv ( @newsp ){
						next if( $foundit > 0 );
						# $self->debug( "Checking $newv for " . $args{"Host"} . " and " . $args{"Service"} . " X\n" );
						my $stchk = $self->replace_vars( Statusvar => $newv, Host => $args{"Host"}, Service => $args{"Service"} ) || $self->replace_vars( Configvar => $newv, Host => $args{"Host"}, Service => $args{"Service"} );
						if( defined( $stchk ) ){
							# $self->debug( "Found $newv\n" );

							$newtext = $stchk;
							$foundit++;
						}
					}
					if( $foundit ){
						$retval = $start . $newtext . $end;
						$prevcount += length( $newtext );
						next;
					}else{
						$retval = $start . '$' . $var . '$' . $end;
						$prevcount += length( $var ) + 2;
						next;
					}

				}elsif( defined( $self->{'status'}{$args{"Host"}}{$args{"Service"}}{$newtext} ) ){
					$newtext = $self->{'status'}{$args{"Host"}}{$args{"Service"}}{$newtext};
					$retval = $start . $newtext . $end;
					$prevcount += length( $newtext );
					next;
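				}else{
					# No value is known for this macro.  Leave it
					# unexpanded and step past it, so that the loop
					# does not match the same macro again forever.
					$prevcount += length( $var ) + 2;
				}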
			}else{
				$retval = $start . '$' . $var . '$' . $end;
				$prevcount += length( $var ) + 2;
				next;
			}
		}
	}elsif( defined( $args{'Configvar'} ) ){
		# Scan through the config for the matching host or service.
		$args{"Service"} = $self->{'default_host_service'} unless( defined( $args{"Service"} ) );

		# Find the host object.
		my $thishostobj = $self->{'config'}->find_object( $args{"Host"}, "Nagios::Host" );
		if( $args{"Service"} eq $self->{'default_host_service'} && defined( $thishostobj ) ){
			# $self->debug( "checking host object" );
			# $retval = $thishostobj->$args{'Configvar'} if( defined( $thishostobj->{$args{'Configvar'}} ) );
			if( defined( $thishostobj->{$args{'Configvar'}} ) ){
				# my $evalstr = "\$retval = \$thishostobj->$args{'Configvar'}";
				my $evalstr = '$' . "retval = " . '$' . "thishostobj->" . $args{'Configvar'};
				# # $self->debug( "host object has var - evaling $evalstr" );
				eval $evalstr;
				# $self->debug( $thishostobj->address . " X $retval " );
			}
		}elsif( defined( $thishostobj ) ){
			# $self->debug( "checking service objects" );
			foreach my $svc( $thishostobj->list_services ){
				next unless( defined( $svc ) );
				next unless( $svc->service_description eq $args{"Service"} );
				# $retval = $svc->$args{'Configvar'} if( defined( $svc->{$args{'Configvar'}} ) );
				if( defined( $svc->{$args{'Configvar'}} ) ){
					my $evalstr = '$' . "retval = " . '$' . "svc->" . $args{'Configvar'};
					eval $evalstr;
				}
			}

			if( defined( $thishostobj->{$args{'Configvar'}} ) && ! defined( $retval ) ){
				# $self->debug( "checking host object" );
				my $evalstr = '$' . "retval = " . '$' . "thishostobj->" . $args{'Configvar'};
				eval $evalstr;
			}
		}else{
			# $self->debug( "No host object found matching " . $args{"Host"} );
		}

	}elsif( defined( $args{'Statusvar'} ) ){
		$retval = $self->{'status'}{$args{"Host"}}{$args{"Service"}}{$args{'Statusvar'}};
	}

	# $self->debug( "Replace - returning $retval\n";
	return( $retval );

}
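
=pod

The Configvar branch above looks up an accessor whose name is only known
at run time.  A minimal sketch of that kind of lookup, assuming the
accessors are ordinary (non-AUTOLOAD) methods.  C<Demo::Host> and
C<config_value> are hypothetical names used for illustration only; they
are not part of Nagios::Object or of this module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Nagios::Host-like object with one accessor.
package Demo::Host;
sub new     { my ( $class, %fields ) = @_; return bless { %fields }, $class; }
sub address { return $_[0]->{'address'}; }

package main;

# Return the value of the named accessor, or undef if the field is not
# set or the object has no method of that name.
sub config_value {
	my ( $obj, $name ) = @_;
	return undef unless( defined( $obj->{$name} ) );
	my $method = $obj->can( $name );
	return undef unless( defined( $method ) );
	return $obj->$method();
}

my $host = Demo::Host->new( address => '192.0.2.10' );
print config_value( $host, 'address' ), "\n";
```

If the accessors were generated through AUTOLOAD, C<can> would not see
them, and wrapping C<< $obj->$name() >> in an eval block would be the
safer choice.

=cut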

# Ask Nagios to run the check itself, by submitting a
# SCHEDULE_FORCED_HOST_CHECK or SCHEDULE_FORCED_SVC_CHECK command.
sub tell_nagios_to_execute {
	my $self = shift;
	my %args = (	Host => undef,
			Service => undef,
			Time => undef,
			@_,
			);

	my $retval = 0;

	# Work out what to say.
	my $submittype = "SVC";
	if( ! defined( $args{"Service"} ) ){
		$submittype = "HOST";
	}elsif( $args{"Service"} eq $self->{'default_host_service'} ){
		$submittype = "HOST";
	}

	if( defined( $args{"Host"} ) ){
		if( ! defined( $args{"Time"} ) ){
			$args{"Time"} = time;
		}elsif( $args{"Time"} !~ /^\d+$/ ){
			$args{"Time"} = time;
		}
		my $substr = "[" . time . "] SCHEDULE_FORCED_" . $submittype . "_CHECK;" . $args{"Host"} . ";";
		if( $submittype eq "SVC" ){
			$substr .= $args{"Service"} . ";";
		}
		$substr .= $args{"Time"};

		$retval = $self->notify_nagios( $substr );
	}
	return( $retval );
}

# Submit a passive check result (exit code and output) to Nagios.
sub tell_nagios_output {
	my $self = shift;
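	my %args = (	Host => undef,
			Service => undef,
			Output => undef,
			Exit => 2,
			# Accepted and validated below, but not used in the
			# submitted line, which is stamped with the current time.
			Time => undef,
			@_,
			);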

	my $retval = 0;

	# Work out what to say.
	my $submittype = "SERVICE";
	if( ! defined( $args{"Service"} ) ){
		$submittype = "HOST";
	}elsif( $args{"Service"} eq $self->{'default_host_service'} ){
		$submittype = "HOST";
	}

	if( defined( $args{"Host"} ) ){
		if( ! defined( $args{"Time"} ) ){
			$args{"Time"} = time;
		}elsif( $args{"Time"} !~ /^\d+$/ ){
			$args{"Time"} = time;
		}

		my $substr = "[" . time . "] PROCESS_" . $submittype . "_CHECK_RESULT;" . $args{"Host"} . ";" ;
		
		if( $submittype eq "SERVICE" ){
			$substr .= $args{"Service"} . ";";
		}

		$substr .= $args{"Exit"} . ";" . $args{"Output"};
		chomp( $substr );

		$retval = $self->notify_nagios( $substr );
	}
	return( $retval );
}
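
=pod

The two routines above build lines for the Nagios external command file.
A minimal sketch of the two line formats, under some simplifying
assumptions: an undefined service means "host" (the module also treats
its default host service name that way), and output is cut at the first
newline, which goes slightly further than the chomp above.
C<passive_result_line> and C<forced_check_line> are hypothetical helpers,
and the host and service names are made up.

```perl
use strict;
use warnings;

# Build a passive check result line.  $service is undef for a host result.
sub passive_result_line {
	my ( $now, $host, $service, $exit, $output ) = @_;
	my @fields = defined( $service )
		? ( 'PROCESS_SERVICE_CHECK_RESULT', $host, $service )
		: ( 'PROCESS_HOST_CHECK_RESULT', $host );
	# The command file is line-oriented, so keep only the first line.
	$output = '' unless( defined( $output ) );
	$output =~ s/\n.*//s;
	return "[$now] " . join( ';', @fields, $exit, $output );
}

# Build a forced check request.  $service is undef for a host check.
sub forced_check_line {
	my ( $now, $host, $service, $when ) = @_;
	my @fields = defined( $service )
		? ( 'SCHEDULE_FORCED_SVC_CHECK', $host, $service )
		: ( 'SCHEDULE_FORCED_HOST_CHECK', $host );
	return "[$now] " . join( ';', @fields, $when );
}

print passive_result_line( 1148463462, 'web1', 'HTTP', 2, "CRITICAL - timeout\n" ), "\n";
print forced_check_line( 1148463462, 'web1', undef, 1148463500 ), "\n";
```

Keeping the string building separate from the write to the command file
makes the formats easy to check without a running Nagios.

=cut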

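# Submit a line to Nagios through its external command file.  This
# function is always called with an alarm handler active in the
# (grand)parent routine; opening a named pipe for writing can block
# while nothing is reading it.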
sub notify_nagios {
	my $self = shift;
	my $line = shift;

	my $command_file = $self->{'config'}->get( 'command_file' );

	my $retval = 0;

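	if( defined( $command_file ) ){
		# Only write to an existing, writable named pipe.
		if( -p $command_file && -w $command_file ){
			# Three-argument open with a lexical filehandle, so that
			# characters in the path cannot alter the open mode.
			if( open( my $cmdfh, '>>', $command_file ) ){
				$retval++;
				$self->debug( "Submitting to $command_file $line X\n" );
				print {$cmdfh} $line . "\n";
				close( $cmdfh );
			}
		}
	}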

	# Record the success or otherwise, used to determine whether to
	# exit or not later.
	$self->{'last_commandfile_write'} = $retval;

	return( $retval );
}
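
=pod

A minimal sketch of the kind of caller-side timeout that the comment on
notify_nagios refers to, assuming the usual alarm/eval idiom.
C<write_line_with_timeout> is a hypothetical helper, not part of this
module, and the demonstration writes to a temporary regular file rather
than to a real command pipe.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Append one line to $path, giving up after $timeout seconds.
# Returns 1 on success, 0 on failure or timeout.
sub write_line_with_timeout {
	my ( $path, $line, $timeout ) = @_;
	my $ok = 0;
	eval {
		local $SIG{ALRM} = sub { die "timeout\n" };
		alarm( $timeout );
		if( open( my $fh, '>>', $path ) ){
			$ok = 1 if( print {$fh} $line . "\n" );
			$ok = 0 unless( close( $fh ) );
		}
		alarm( 0 );
	};
	alarm( 0 );	# make sure no alarm is left pending
	return $ok;
}

# Demonstration against a temporary file (removed at exit).
my ( $tmpfh, $tmpname ) = tempfile( UNLINK => 1 );
close( $tmpfh );
print write_line_with_timeout( $tmpname, '[0] EXAMPLE', 5 ) ? "written\n" : "failed\n";
```

Setting the handler with C<local> restores whatever alarm handler the
caller had installed once the eval block is left.

=cut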

# Display a debug message.
sub debug {
	my $self = shift;


	return unless( defined( $self->{'_debug'} ) );

	my $arg = shift;

	chomp( $arg );

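	# caller(1) is the frame of the sub that called debug(): element 3
	# is that sub's qualified name, elements 0 and 2 are the package and
	# line it was itself called from.
	my @calledwith = caller(1);
	my $callingname = $calledwith[3];
	my $callingpkg = $calledwith[0];
	my $lineno = $calledwith[2];
	my $selfref = ref( $self );
	# Strip the package prefix from the sub name.
	if( $selfref eq $callingpkg ){
		$callingname =~ s/^\Q$callingpkg\E:://;
	}else{
		$callingname =~ s/^.*:://;
	}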

	print STDERR "DEBUG: $lineno: $self" . "->" . "$callingname: $arg\n";
}
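
=pod

The debug routine above relies on C<caller> to label its output.  A
minimal sketch of how the frames line up, assuming one wants the name of
the enclosing sub together with the line of the call itself.
C<where_am_i> and C<outer> are hypothetical names.

```perl
use strict;
use warnings;

# Return "sub_name:line" for the place where_am_i() was called from.
sub where_am_i {
	my $line = ( caller( 0 ) )[2];	# line of the call to where_am_i()
	my $name = ( caller( 1 ) )[3];	# sub containing that call, if any
	$name = 'main' unless( defined( $name ) );
	$name =~ s/^.*:://;		# drop the package prefix
	return "$name:$line";
}

sub outer { return where_am_i(); }

print outer(), "\n";
```

Note that C<(caller(1))[2]>, as used for the line number in debug(), is
the line from which the enclosing sub was called, not the line of the
debug() call; C<(caller(0))[2]> gives the latter.

=cut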

=head1 COPYRIGHT

Copyright (c) 2006 Bruce Campbell.  All rights reserved.  
This program is free software; you can redistribute it
and/or modify it under the same terms as Perl itself.

=head1 AUTHOR

Bruce Campbell, 2006, Zerlargal.  http://cpan.zerlargal.org/ .

Bug Reports/Feature Requests for this should be made on the nagios-devel 
mailing list.

=cut

1;
-------------- next part --------------

*** lib/Nagios/Object/Config.pm	2006/05/18 07:47:59	1.1
--- lib/Nagios/Object/Config.pm	2006/05/18 09:57:18
***************
*** 454,460 ****
  Returns an array/arrayref of objects of the given type.
  
   ->list_hosts;
!  ->list_hostroups;
   ->list_services;
   ->list_timeperiods;
   ->list_commands;
--- 454,461 ----
  Returns an array/arrayref of objects of the given type.
  
   ->list_hosts;
!  ->list_hostgroups;
!  ->list_servicegroups;
   ->list_services;
   ->list_timeperiods;
   ->list_commands;
***************
*** 478,483 ****
--- 479,485 ----
  
  sub list_hosts                { shift->_list('host') }
  sub list_hostgroups           { shift->_list('hostgroup') }
+ sub list_servicegroups        { shift->_list('servicegroup') }
  sub list_services             { shift->_list('service') }
  sub list_timeperiods          { shift->_list('timeperiod') }
  sub list_commands             { shift->_list('command') }
***************
*** 521,526 ****
--- 523,610 ----
      return @retval;
  }
  
+ # List the hostgroups that this host entry belongs to.
+ sub Nagios::Host::list_hostgroups {
+ 	my $self = shift;
+ 	my $conf = $self->{object_config_object};
+ 	my @retval = ();
+ 	my %uhgs = ();
+ 
+ 	# Bug Workaround.  A host can be a member of a hostgroup in two
+ 	# ways: By being mentioned in a 'members' line in a 'hostgroup' object,
+ 	# or by mentioning a defined 'hostgroup' object in the 'hostgroups'
+ 	# line in the 'host' object.  Since Nagios::Config doesn't join the
+ 	# two together, we have to look in two places.
+ 	foreach my $hg ( $conf->list_hostgroups ) {
+                 foreach my $h ( @{$hg->members} ) {
+ 			if ( $h->host_name eq $self->host_name ) {
+ 				if( ! defined( $uhgs{$hg->hostgroup_name} ) ){
+ 					push @retval, $hg;
+ 					$uhgs{$hg->hostgroup_name}++;
+ 				}
+ 			}
+ 		}
+ 	}
+ 
+ 	if( defined( $self->{'hostgroups'} ) ){
+ 		foreach my $hg ( @{$self->hostgroups} ){
+ 			if( ! defined( $uhgs{$hg->hostgroup_name} ) ){
+ 				push @retval, $hg;
+ 				$uhgs{$hg->hostgroup_name}++;
+ 			}
+ 		}
+ 	}
+ 		
+ 	return @retval;
+ }
+ 
+ sub Nagios::Service::list_servicegroups {
+ 	my $self = shift;
+ 	my $conf = $self->{object_config_object};
+ 
+ 	my @retval = ();
+ 	my %usgs = ();
+ 
+ 	# foreach my $kkey( keys %{$self} ){
+ 		# print "$self has key $kkey - " . $self->{$kkey} . " X\n";
+ 	# }
+ 
+ 	# See comments in list_hostgroups above.
+ 	if( defined( $self->{'servicegroups'} ) ){
+ 		foreach my $sg ( @{$self->servicegroups} ){
+ 			push @retval, $sg;
+ 			$usgs{$sg->servicegroup_name}++;
+ 		}
+ 	}
+ 
+ 	# foreach my $sg ( $conf->list_servicegroups ) {
+ 		# my $donepush = 0;
+ 		# # print "SG $sg - " . $sg->servicegroup_name . "\n";
+ 		# next if( defined( $usgs{$sg->servicegroup_name} ) );
+ # 
+ 		# # Problem with the 'members' routine, depending on 
+ 		# # context.  If called as '@{$sg->members}', the program
+ 		# # dies.  If called as '$sg->members', it returns a
+ 		# # single string of the original listing.
+ 		# my @memlist = split( /,/, $sg->members );
+ 		# my $memmax = scalar @memlist;
+ 		# my $counter = 0;
+ 		# while( $counter < ( $memmax - 1 ) ){
+ 			# if( $memlist[$counter] eq $self->host_name ){
+ 				# if( $memlist[$counter+1] eq $self->service_description ){
+ 					# if( ! defined( $usgs{$sg->servicegroup_name} ) ){
+ 						# push @retval, $sg;
+ 						# $usgs{$sg->servicegroup_name}++;
+ 					# }
+ 				# }
+ 			# }
+ 			# $counter += 2;
+ 		# }
+ 	# }
+ 	return @retval;
+ }
+ 	
+ 
  # I use a patched version of Nagios right now, so I need these to
  # keep the parser from bombing when I test on my config. (Al Tobey)
  sub Nagios::Host::snmp_community { }
*** lib/Nagios/Object.pm	2006/05/17 11:26:43	1.1
--- lib/Nagios/Object.pm	2006/05/18 14:16:30
***************
*** 61,67 ****
          service_description           => ['STRING',                  10 ],
          host_name                     => [['Nagios::Host'],          10 ],
          hostgroup_name                => [['Nagios::HostGroup'],     10 ],
!         servicegroup_name             => [['Nagios::ServiceGroup'],  16 ],
          is_volatile                   => ['BINARY',                  8  ],
          check_command                 => ['Nagios::Command',         8  ],
          max_check_attempts            => ['INTEGER',                 8  ],
--- 61,67 ----
          service_description           => ['STRING',                  10 ],
          host_name                     => [['Nagios::Host'],          10 ],
          hostgroup_name                => [['Nagios::HostGroup'],     10 ],
!         servicegroups                 => [['Nagios::ServiceGroup'],  16 ],
          is_volatile                   => ['BINARY',                  8  ],
          check_command                 => ['Nagios::Command',         8  ],
          max_check_attempts            => ['INTEGER',                 8  ],
***************
*** 97,104 ****
          use                           => ['Nagios::ServiceGroup',    18 ],
          servicegroup_name             => ['STRING',                  18 ],
          alias                         => ['STRING',                  16 ],
!         members                       => [['Nagios::Host',
!                                            'Nagios::Service'],       16 ],
          name                          => ['servicegroup_name',       22 ],
          comment                       => ['comment',                 22 ],
          file                          => ['filename',                22 ]
--- 97,103 ----
          use                           => ['Nagios::ServiceGroup',    18 ],
          servicegroup_name             => ['STRING',                  18 ],
          alias                         => ['STRING',                  16 ],
!         members                       => [['Nagios::Host', 'Nagios::Service'],       16 ],
          name                          => ['servicegroup_name',       22 ],
          comment                       => ['comment',                 22 ],
          file                          => ['filename',                22 ]
***************
*** 109,117 ****
--- 108,124 ----
  	    alias                         => ['STRING',                  8  ],
  	    address                       => ['STRING',                  8  ],
  	    parents                       => [['Nagios::Host'],          8  ],
+ 	    hostgroups			  => [['Nagios::HostGroup'],	16  ],
  	    check_command                 => ['STRING',                  8  ],
  	    max_check_attempts            => ['INTEGER',                 8  ],
+ 	    check_interval                => ['INTEGER',                 8  ],
  	    checks_enabled                => ['BINARY',                  8  ],
+ 	    active_checks_enabled         => ['BINARY',                 16  ],
+ 	    passive_checks_enabled        => ['BINARY',                 16  ],
+ 	    notification_period           => [['Nagios::TimePeriod'],   16  ],
+             obsess_over_host              => ['BINARY',                 16  ],
+             check_freshness               => ['BINARY',                 16  ],
+             freshness_threshold           => ['INTEGER',                16  ],
  	    event_handler                 => ['STRING',                  8  ],
  	    event_handler_enabled         => ['BINARY',                  8  ],
  	    low_flap_threshold            => ['INTEGER',                 8  ],
***************
*** 126,132 ****
  	    notifications_enabled         => ['BINARY',                  8  ],
  	    stalking_options              => [[qw(o d u)],               8  ],
  	    contact_groups                => [['Nagios::ContactGroup'],  16 ],
!         failure_prediction_enabled    => ['BINARY',                  16 ],
  	    name                          => ['host_name',               6  ],
  	    comment                       => ['comment',                 6  ],
  	    file                          => ['filename',                6  ]
--- 133,139 ----
  	    notifications_enabled         => ['BINARY',                  8  ],
  	    stalking_options              => [[qw(o d u)],               8  ],
  	    contact_groups                => [['Nagios::ContactGroup'],  16 ],
!             failure_prediction_enabled    => ['BINARY',                  16 ],
  	    name                          => ['host_name',               6  ],
  	    comment                       => ['comment',                 6  ],
  	    file                          => ['filename',                6  ]
***************
*** 146,158 ****
          contact_name                  => ['STRING',                  8  ],
          alias                         => ['STRING',                  8  ],
          host_notification_period      => ['Nagios::TimePeriod',      8  ],
! 		service_notification_period   => ['Nagios::TimePeriod',      8  ],
          host_notification_options     => [[qw(d u r n)],             8  ],
! 		service_notification_options  => [[qw(w u c r n)],           8  ],
! 		host_notification_commands    => [['Nagios::Command'],       8  ],
! 		service_notification_commands => [['Nagios::Command'],       8  ],
! 		email                         => ['STRING',                  8  ],
! 		pager                         => ['STRING',                  8  ],
          address1                      => ['STRING',                  16 ],
          address2                      => ['STRING',                  16 ],
          address3                      => ['STRING',                  16 ],
--- 153,165 ----
          contact_name                  => ['STRING',                  8  ],
          alias                         => ['STRING',                  8  ],
          host_notification_period      => ['Nagios::TimePeriod',      8  ],
! 	service_notification_period   => ['Nagios::TimePeriod',      8  ],
          host_notification_options     => [[qw(d u r n)],             8  ],
! 	service_notification_options  => [[qw(w u c r n)],           8  ],
! 	host_notification_commands    => [['Nagios::Command'],       8  ],
! 	service_notification_commands => [['Nagios::Command'],       8  ],
! 	email                         => ['STRING',                  8  ],
! 	pager                         => ['STRING',                  8  ],
          address1                      => ['STRING',                  16 ],
          address2                      => ['STRING',                  16 ],
          address3                      => ['STRING',                  16 ],
***************
*** 183,189 ****
      },
      TimePeriod => {
          use                           => ['Nagios::TimePeriod',      8  ],
! 		timeperiod_name               => ['STRING',                  8  ],
          alias                         => ['STRING',                  8  ],
          sunday                        => ['TIMERANGE',               8  ],
          monday                        => ['TIMERANGE',               8  ],
--- 190,196 ----
      },
      TimePeriod => {
          use                           => ['Nagios::TimePeriod',      8  ],
! 	timeperiod_name               => ['STRING',                  8  ],
          alias                         => ['STRING',                  8  ],
          sunday                        => ['TIMERANGE',               8  ],
          monday                        => ['TIMERANGE',               8  ],
***************
*** 197,223 ****
          file                          => ['filename',                6  ]
      }, 
      ServiceEscalation => {
! 	    use                           => ['Nagios::ServiceEscalation',8 ],
! 		host_name                     => ['Nagios::Host',            8  ],
          hostgroup_name                => ['Nagios::HostGroup',       8  ],
! 		service_description           => ['Nagios::Service',         8  ],
          contact_groups                => [['Nagios::ContactGroup'],  8  ],
          first_notification            => ['INTEGER',                 8  ],
          last_notification             => ['INTEGER',                 8  ],
          notification_interval         => ['INTEGER',                 8  ],
          name                          => [['host_name',
                                             'service_description'],   6  ],
          comment                       => ['comment',                 6  ],
          file                          => ['filename',                6  ]
      },
      ServiceDependency => {
! 	    use                           => ['Nagios::ServiceDependency',8 ],
          dependent_host_name           => ['Nagios::Host',            8  ],
          dependent_service_description => ['Nagios::Service',         8  ],
! 		host_name                     => ['Nagios::Host',            8  ],
! 		service_description           => ['Nagios::Service',         8  ],
! 		execution_failure_criteria    => [[qw(o w u c n)],           8  ],
! 		notification_failure_criteria => [[qw(o w u c n)],           8  ],
          name                          => [[qw(dependent_host_name
                                                dependent_service_description
                                                host_name
--- 204,233 ----
          file                          => ['filename',                6  ]
      }, 
      ServiceEscalation => {
!         use                           => ['Nagios::ServiceEscalation',8 ],
! 	host_name                     => ['Nagios::Host',            8  ],
          hostgroup_name                => ['Nagios::HostGroup',       8  ],
! 	service_description           => ['Nagios::Service',         8  ],
          contact_groups                => [['Nagios::ContactGroup'],  8  ],
          first_notification            => ['INTEGER',                 8  ],
          last_notification             => ['INTEGER',                 8  ],
          notification_interval         => ['INTEGER',                 8  ],
+         escalation_period             => ['Nagios::TimePeriod',     16  ],
+ 	escalation_options            => [[qw(w u c r)],            16  ],
          name                          => [['host_name',
                                             'service_description'],   6  ],
          comment                       => ['comment',                 6  ],
          file                          => ['filename',                6  ]
      },
      ServiceDependency => {
! 	use                           => ['Nagios::ServiceDependency',8 ],
          dependent_host_name           => ['Nagios::Host',            8  ],
          dependent_service_description => ['Nagios::Service',         8  ],
! 	host_name                     => ['Nagios::Host',            8  ],
! 	service_description           => ['Nagios::Service',         8  ],
!         inherits_parent               => ['BINARY',                 16  ],
! 	execution_failure_criteria    => [[qw(o w u c n)],           8  ],
! 	notification_failure_criteria => [[qw(o w u c n)],           8  ],
          name                          => [[qw(dependent_host_name
                                                dependent_service_description
                                                host_name
***************
*** 233,249 ****
          first_notification            => ['INTEGER',                 8  ],
          last_notification             => ['INTEGER',                 8  ],
          notification_interval         => ['INTEGER',                 8  ],
          name                          => ['host_name',               6  ],
          comment                       => ['comment',                 6  ],
          file                          => ['filename',                6  ]
      },
      HostDependency => {
! 	    use                           => ['Nagios::HostDependency',  8  ],
          dependent_host_name           => ['Nagios::Host',            8  ],
! 		host_name                     => ['Nagios::Host',            8  ],
          inherits_parent               => ['INTEGER',                 16 ],
! 		notification_failure_criteria => [[qw(o w u c n)],           8  ],
! 		execution_failure_criteria    => [[qw(o w u c n)],           16 ],
          name                          => [['host_name',
                                             'dependent_host_name'],   6  ],
          comment                       => ['comment',                 6  ],
--- 243,261 ----
          first_notification            => ['INTEGER',                 8  ],
          last_notification             => ['INTEGER',                 8  ],
          notification_interval         => ['INTEGER',                 8  ],
+         escalation_period             => ['Nagios::TimePeriod',     16  ],
+ 	escalation_options            => [[qw(d u r)],              16  ],
          name                          => ['host_name',               6  ],
          comment                       => ['comment',                 6  ],
          file                          => ['filename',                6  ]
      },
      HostDependency => {
! 	use                           => ['Nagios::HostDependency',  8  ],
          dependent_host_name           => ['Nagios::Host',            8  ],
! 	host_name                     => ['Nagios::Host',            8  ],
          inherits_parent               => ['INTEGER',                 16 ],
! 	notification_failure_criteria => [[qw(o d u p n)],           8  ],
! 	execution_failure_criteria    => [[qw(o d u p n)],           16 ],
          name                          => [['host_name',
                                             'dependent_host_name'],   6  ],
          comment                       => ['comment',                 6  ],
***************
*** 642,647 ****
--- 654,662 ----
          elsif ( $_[1] eq 'name' ) {
              return $type;
          }
+ 	elsif( $_[1] eq 'members' ) {
+ 	    return $type;
+ 	}
          else {
              croak "bug tobeya\@cpan.org to fix this ...";
          }
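
The list_hostgroups routine added by the patch collects membership from
two places and de-duplicates by name.  A minimal sketch of that idea on
plain hashes, assuming made-up host and group names; groups_for_host is
a hypothetical helper and does not use Nagios::Object at all.

```perl
use strict;
use warnings;

# Hypothetical data: group => member list, and host => declared groups.
my %group_members = (
	web => [ 'www1', 'www2' ],
	db  => [ 'db1' ],
);
my %host_groups = (
	www1 => [ 'web', 'dmz' ],
);

# Return the sorted, de-duplicated names of every group a host belongs
# to, looking at both the groups' member lists and the host's own list.
sub groups_for_host {
	my ( $host, $members, $declared ) = @_;
	my %seen;
	foreach my $group ( keys %{$members} ){
		$seen{$group}++ if( grep { $_ eq $host } @{ $members->{$group} } );
	}
	$seen{$_}++ foreach( @{ $declared->{$host} || [] } );
	return sort keys %seen;
}

print join( ',', groups_for_host( 'www1', \%group_members, \%host_groups ) ), "\n";
```

The %seen hash plays the role of %uhgs in the patch: a group reached by
both routes is only reported once.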

