Problems with extensive passive monitoring

Mike Becher Mike.Becher at lrz-muenchen.de
Mon Oct 9 14:46:08 CEST 2006


Hi all,

In our environment we have run into a problem with extensive use of the 
passive monitoring feature of Nagios.

Description in short:
---------------------

In our environment we have more than 250 clients, each of which runs its 
own Nagios server to monitor itself. Each client runs up to 8 service 
checks and posts the results as external commands, via send_nsca/nsca, to 
a master Nagios server, which I call the cluster master Nagios server, or 
CMNS for short.

This CMNS is in turn a client of a site master Nagios server (SMNS for 
short). The CMNS must forward its messages to the SMNS as external 
commands, just as the clients do to the CMNS.

With the built-in features of Nagios (we use version 2.5) you can use 
send_nsca/nsca to forward messages from the CMNS to the SMNS too, but this 
results in:
    * heavy load on the CMNS, because at least one external command 
      (send_nsca) is forked for every message forwarded to the SMNS (in 
      our environment up to 1000 forks per minute).
    * up to 1000 nsca invocations per minute to deliver external command 
      messages from the clients to the CMNS.
    * loss of incoming messages from clients on the CMNS, because it 
      reads from the external command pipe for only 30 seconds and then 
      pauses.
    * child processes of the CMNS becoming children of `init', while all 
      of them keep writing into the pipe that connects them to the 
      Nagios master process.
    * these processes eating a lot of memory, so that a machine with 
      512MB RAM and 2GB swap must be rebooted after 2 days, otherwise it 
      hangs.

The full description can be read at:
  http://www.mountcup.de/tiki/tiki-index.php?page=mibe-nagios-passive-monitoring

My solution
-----------
Instead of calling an external program (ocsp_command or ochp_command) for 
each external command message in order to forward it from the CMNS to the 
SMNS, let the Nagios process write these messages into a named pipe. The 
attached patch provides this functionality for Nagios version 2.5.

Then let a helper program read from this named pipe on the CMNS side and 
forward the messages through a channel (as I call it here) to wherever you 
want, in this case to the SMNS. I have written a Perl program that does 
this; it is attached as well.
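
The attached helper is a Perl program; the loop below is a minimal C 
sketch of the same idea, not a port of it. The name forward_lines is 
hypothetical, and the "channel" is reduced to a stdio stream, which in 
practice might wrap a socket or a pipe to a long-running sender.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: copy newline-terminated messages from `in`
 * (e.g. a FIFO opened for reading) to `out` (the "channel").
 * Returns the number of complete lines forwarded, or -1 on a
 * write error. */
long forward_lines(FILE *in, FILE *out)
{
    char line[4096];
    long count = 0;

    while (fgets(line, sizeof(line), in) != NULL) {
        if (fputs(line, out) == EOF)
            return -1;
        /* count a message only once its terminating newline is seen */
        if (strchr(line, '\n') != NULL) {
            /* flush per message so results are not held in a buffer */
            if (fflush(out) == EOF)
                return -1;
            count++;
        }
    }
    return count;
}
```

A reader of a FIFO sees end-of-file whenever no writer has it open. If the 
writer opens and closes the FIFO for every message, a long-running helper 
would therefore reopen the FIFO (or hold a write descriptor of its own) 
and call the loop again.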

What do you think about the option to use named pipes in addition to 
ocsp_command and/or ochp_command running as external processes?
The NDO interface can't be used in this case because there aren't any 
connectors inside the code for external commands.
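
In the attached patch the two mechanisms coexist: a command whose name 
starts with namedpipe_ocsp_command: (or namedpipe_ochp_command:) is 
written to the FIFO named after the colon, and any other command is still 
run through my_system(). The prefix test can be sketched as follows. The 
name named_pipe_path is hypothetical, and unlike the patch this sketch 
rejects an empty path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if `command` starts with `prefix`, return the
 * non-empty text that follows the prefix (the FIFO path); otherwise
 * return NULL, meaning "run the command as an external process". */
const char *named_pipe_path(const char *command, const char *prefix)
{
    size_t plen;

    if (command == NULL || prefix == NULL)
        return NULL;

    plen = strlen(prefix);
    if (strncmp(command, prefix, plen) != 0)
        return NULL;            /* ordinary command */
    if (command[plen] == '\0')
        return NULL;            /* prefix without a path */

    return command + plen;      /* points into `command`, not a copy */
}
```

A caller would check the result once per message: a non-NULL path selects 
the named pipe, NULL falls back to the existing external command.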

best regards
  Mike

-----------------------------------------------------------------------------
 Mike Becher                              Mike.Becher at lrz-muenchen.de
 Leibniz-Rechenzentrum der                http://www.lrz.de
 Bayerischen Akademie der Wissenschaften  phone: +49-89-35831-8721
 Gruppe Hochleistungssysteme              fax:   +49-89-35831-9700
 Boltzmannstrasse 1
 D-85748 Garching bei Muenchen
 Germany
-----------------------------------------------------------------------------
-------------- next part --------------
diff -u -r -N nagios-2.5/base/config.c nagios-mibe-2.5/base/config.c
--- nagios-2.5/base/config.c	2005-12-27 00:18:14.000000000 +0100
+++ nagios-mibe-2.5/base/config.c	2006-09-26 07:39:56.000000000 +0200
@@ -2770,6 +2770,14 @@
 			write_to_logs_and_console(temp_buffer,NSLOG_VERIFICATION_ERROR,TRUE);
 			errors++;
 		        }
+    else {
+	    if(verify_config==TRUE){
+	      char raw_command_line[MAX_COMMAND_BUFFER];
+		    printf(" ocsp_command is set to \"%s\"\n", temp_command->name);
+	      get_raw_command_line(ocsp_command,raw_command_line,sizeof(raw_command_line),0);
+		    printf("         and uses macro \"%s\"\n", raw_command_line);
+      }
+    }
 	        }
 	if(ochp_command!=NULL){
 
@@ -2786,6 +2794,14 @@
 			write_to_logs_and_console(temp_buffer,NSLOG_VERIFICATION_ERROR,TRUE);
 			errors++;
 		        }
+    else {
+	    if(verify_config==TRUE){
+	      char raw_command_line[MAX_COMMAND_BUFFER];
+		    printf(" ochp_command is set to \"%s\"\n", temp_command->name);
+	      get_raw_command_line(ochp_command,raw_command_line,sizeof(raw_command_line),0);
+		    printf("         and uses macro \"%s\"\n", raw_command_line);
+      }
+    }
 	        }
 
 #ifdef DEBUG1
diff -u -r -N nagios-2.5/base/sehandlers.c nagios-mibe-2.5/base/sehandlers.c
--- nagios-2.5/base/sehandlers.c	2005-12-23 20:31:36.000000000 +0100
+++ nagios-mibe-2.5/base/sehandlers.c	2006-09-26 08:15:44.000000000 +0200
@@ -53,6 +53,45 @@
 extern time_t          program_start;
 
 
+static int my_npipe_fprintf(const char *pipe_name, const char *string_wo_newline){
+  struct stat st;
+  int nfd=-1;
+  FILE *npipe=NULL;
+
+  if(pipe_name == NULL) 
+		return ERROR;
+  if(string_wo_newline == NULL)
+		return ERROR;
+
+
+  /* use existing FIFO if possible */
+  if((stat(pipe_name, &st) < 0) ||
+     !S_ISFIFO(st.st_mode)){
+    /* create the external command file as a named pipe (FIFO) */
+    if(mkfifo(pipe_name, S_IRUSR | S_IWUSR | S_IRGRP | S_IWGRP)!=0){
+       return ERROR;
+    }
+  }
+
+  /* open the FIFO for writing (non-blocking); this fails with ENXIO
+   * while no reader has the FIFO open */
+  nfd = open(pipe_name, O_WRONLY|O_NONBLOCK);
+  if(nfd < 0){
+    return ERROR;
+  }
+  npipe = fdopen(nfd, "w");
+  if (npipe == NULL) {
+    close(nfd);
+    return ERROR;
+  }
+
+  /* write our data; errors may only surface when fclose() flushes */
+  if(fprintf(npipe, "%s\n", string_wo_newline) < 0){
+    fclose(npipe);
+    return ERROR;
+  }
+  return (fclose(npipe) == 0) ? OK : ERROR;
+}
 
 /******************************************************************/
 /************* OBSESSIVE COMPULSIVE HANDLER FUNCTIONS *************/
@@ -74,14 +113,17 @@
 #endif
 
 	/* bail out if we shouldn't be obsessing */
-	if(obsess_over_services==FALSE)
+	if(obsess_over_services==FALSE) {
 		return OK;
-	if(svc->obsess_over_service==FALSE)
+  }
+	if(svc->obsess_over_service==FALSE) {
 		return OK;
+  }
 
 	/* if there is no valid command, exit */
-	if(ocsp_command==NULL)
+	if(ocsp_command==NULL) {
 		return ERROR;
+  }
 
 	/* find the associated host */
 	temp_host=find_host(svc->host_name);
@@ -107,8 +149,40 @@
 	printf("\tProcessed obsessive compulsive service processor command line: %s\n",processed_command_line);
 #endif
 
-	/* run the command */
-	my_system(processed_command_line,ocsp_timeout,&early_timeout,&exectime,NULL,0);
+  if (strncmp(ocsp_command,"namedpipe_ocsp_command:",strlen("namedpipe_ocsp_command:")) == 0){
+    /* put it into pipe */
+    char *npipe_path = strchr(ocsp_command, ':');
+    npipe_path++;
+    if (my_npipe_fprintf(npipe_path, processed_command_line) == ERROR) {
+#ifdef MIBE_DEBUG
+      snprintf(temp_buffer,sizeof(temp_buffer),
+        "npipe: sending of ocsp data skipped for ->%s<- because an error occurred\n",
+        svc->host_name
+        );
+ 	 	  temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 		  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+    } else {
+      snprintf(temp_buffer,sizeof(temp_buffer),
+        "npipe: sending of ocsp data done for ->%s<-\n",
+        svc->host_name
+        );
+ 	 	  temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 		  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+    } 
+  } else {
+	  /* run the command */
+#ifdef MIBE_DEBUG
+    snprintf(temp_buffer,sizeof(temp_buffer),
+      "npipe: running ocsp_command ->%s<- for ->%s<-\n",
+      processed_command_line,
+      svc->host_name
+      );
+ 	  temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 	  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+	  my_system(processed_command_line,ocsp_timeout,&early_timeout,&exectime,NULL,0);
+  }
 
 	/* check to see if the command timed out */
 	if(early_timeout==TRUE){
@@ -140,14 +214,17 @@
 #endif
 
 	/* bail out if we shouldn't be obsessing */
-	if(obsess_over_hosts==FALSE)
+	if(obsess_over_hosts==FALSE){
 		return OK;
-	if(hst->obsess_over_host==FALSE)
+  }
+	if(hst->obsess_over_host==FALSE){
 		return OK;
+  }
 
 	/* if there is no valid command, exit */
-	if(ochp_command==NULL)
+	if(ochp_command==NULL){
 		return ERROR;
+  }
 
 	/* update macros */
 	clear_volatile_macros();
@@ -169,8 +246,39 @@
 	printf("\tProcessed obsessive compulsive host processor command line: %s\n",processed_command_line);
 #endif
 
-	/* run the command */
-	my_system(processed_command_line,ochp_timeout,&early_timeout,&exectime,NULL,0);
+  if (strncmp(ochp_command,"namedpipe_ochp_command:",strlen("namedpipe_ochp_command:")) == 0){
+    /* put it into pipe */
+    char *npipe_path = strchr(ochp_command, ':') + 1;
+    if (my_npipe_fprintf(npipe_path, processed_command_line) == ERROR) {
+#ifdef MIBE_DEBUG
+      snprintf(temp_buffer,sizeof(temp_buffer),
+        "npipe: sending of ochp data skipped for ->%s<- because an error occurred\n",
+        hst->name
+        );
+ 	 	  temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 		  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+    } else {
+      snprintf(temp_buffer,sizeof(temp_buffer),
+        "npipe: sending of ochp data done for ->%s<-\n",
+        hst->name
+        );
+ 	 	  temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 		  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+    }
+  } else {
+	  /* run the command */
+#ifdef MIBE_DEBUG
+    snprintf(temp_buffer,sizeof(temp_buffer),
+      "npipe: running ochp_command ->%s<- for ->%s<-\n",
+      processed_command_line,
+      hst->name
+      );
+    temp_buffer[sizeof(temp_buffer)-1]='\x0';
+ 	  write_to_logs_and_console(temp_buffer,NSLOG_RUNTIME_WARNING,TRUE);
+#endif
+	  my_system(processed_command_line,ochp_timeout,&early_timeout,&exectime,NULL,0);
+  }
 
 	/* check to see if the command timed out */
 	if(early_timeout==TRUE){
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fwd_nagios_results.pl.gz
Type: application/octet-stream
Size: 6576 bytes
Desc: fwd_nagios_results.pl.gz
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20061009/e12444ce/attachment.obj>