BUG/PATCH/WORKAROUND: Problem with Nagios state retention

bruce nagios-devel at vicious.dropbear.id.au
Fri Apr 7 13:54:13 CEST 2006


As many people have noticed over time, there is a persistent problem with
Nagios not writing out the state retention file in some installations.

The problem is not with the state retention code as such. Nagios
carefully writes the data out to a temporary file first (good), but for
the state retention file it uses a compiled-in temporary file path rather
than the one set by the configured 'temp_file' variable (bad).

To determine whether this problem affects your installation, check (as
the user running Nagios) whether that user has write permission to the
compiled-in 'tempfile' location, e.g.:

 	$ strings -a `which nagios` | grep tempfile
 	/var/log/nagios/tempfile
 	$ if [ ! -w /var/log/nagios ] ; then echo "I cannot write." ; fi
 	I cannot write.
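
The one-liner above tests only the directory. A slightly fuller sketch,
assuming the check should also cover a 'tempfile' that already exists, is
given here. The function name can_write_tempfile is made up for
illustration and is not part of Nagios, and the result only means
something when it is run as the user Nagios runs as:

```shell
# Hypothetical helper, not part of Nagios.
# Succeeds if the given temp file path could be written by the current user.
can_write_tempfile() {
    f=$1
    if [ -e "$f" ]; then
        # The file already exists: it must itself be writable.
        [ -w "$f" ]
    else
        # The file does not exist yet: its directory must be writable.
        [ -w "$(dirname "$f")" ]
    fi
}
```

For example:

 	$ can_write_tempfile /var/log/nagios/tempfile || echo "I cannot write."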

If, for whatever reason, you cannot give the user running Nagios
permission to write the compiled-in 'tempfile' file (usually for political
reasons, or to avoid odd issues with multiple Nagios installations on one
host), a viable workaround is to set up a Nagios service which copies the
status file to the state retention file, as the two have a compatible
format:

 	define service {
 		host_name		localhost
 		service_description	state-retention
 		check_command 		copy_status_to_retention
 		normal_check_interval	3
 		max_check_attempts	3
 		retry_check_interval	3
 		check_period		24x7
 		check_freshness		0
 		obsess_over_service	0
 		passive_checks_enabled	0
 		notification_interval	120
 		notification_period	24x7
 		notification_options	n
 		contact_groups		default
 	}

 	define command {
 		command_name		copy_status_to_retention
 		command_line		/bin/cp $STATUSDATAFILE$ $RETENTIONDATAFILE$ && exit 0 || exit 2
 	}
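
A plain cp writes straight into the retention file, so a Nagios restart
in the middle of a copy could read a truncated file. A minimal sketch of a
wrapper that copies to a scratch file beside the retention file and then
renames it into place is shown here. It assumes the Nagios user can
create files in the retention file's directory; the function name and the
.tmp.$$ suffix are invented for this example and are not anything Nagios
provides:

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Nagios.
# Usage: copy_status_to_retention STATUS_FILE RETENTION_FILE
# Returns 0 (OK) on success and 2 (CRITICAL) on any failure.
copy_status_to_retention() {
    src=$1
    dst=$2
    # Refuse to replace the retention file with a missing or empty status file.
    [ -s "$src" ] || return 2
    # Scratch file in the same directory, so the final mv is a rename.
    tmp="$dst.tmp.$$"
    if cp "$src" "$tmp" && mv -f "$tmp" "$dst"; then
        return 0
    fi
    # Clean up the scratch file if either step failed.
    rm -f "$tmp"
    return 2
}

# When run as a script with two arguments, do the copy.
if [ "$#" -eq 2 ]; then
    copy_status_to_retention "$1" "$2"
fi
```

If this were saved as, say, /usr/local/bin/copy_status_to_retention.sh
(a path chosen only for the example), the command definition would become:

 	command_line		/usr/local/bin/copy_status_to_retention.sh $STATUSDATAFILE$ $RETENTIONDATAFILE$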

The attached patch (against both 2.0b4 and the latest 2.1) ensures that
the state retention code uses the file pointed to by the configuration's
'temp_file' variable instead of the compiled-in 'tempfile' path.
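
With the patch applied, the path that matters is whatever the main
configuration file sets for 'temp_file'. A rough sketch for pulling that
value out, assuming the usual name=value layout with no spaces around the
'=' (the function name is illustrative only). It takes the last matching
line, as the patched loop would:

```shell
# Hypothetical helper, not part of Nagios.
# Prints the value of the last temp_file= line in the given config file.
configured_temp_file() {
    sed -n 's/^temp_file=//p' "$1" | tail -n 1
}
```

For example, with a config path that will differ between installations:

 	$ tf=`configured_temp_file /usr/local/nagios/etc/nagios.cfg`
 	$ if [ ! -w "`dirname "$tf"`" ] ; then echo "I cannot write." ; fi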

--==--
Bruce.

And now, back to tracking down excessive latency problems.
-------------- next part --------------
*** xdata/xrddefault.c	2006/04/07 10:26:00	1.1
--- xdata/xrddefault.c	2006/04/07 10:33:03
***************
*** 118,123 ****
--- 118,133 ----
  		if(temp_ptr==NULL)
  			continue;
  
+ 		/* temp file definition */
+ 		if( ! strcmp(temp_ptr,"temp_file") ){
+ 			temp_ptr=my_strtok(NULL,"\n");
+ 			if(temp_ptr==NULL)
+ 				continue;
+ 
+ 			strncpy(xrddefault_temp_file,temp_ptr,sizeof(xrddefault_temp_file)-1);
+ 			xrddefault_temp_file[sizeof(xrddefault_temp_file)-1]='\x0';
+ 		}
+ 
  		/* skip lines that don't specify the host config file location */
  		if(strcmp(temp_ptr,"xrddefault_retention_file") && strcmp(temp_ptr,"state_retention_file"))
  			continue;

