Monitoring - 1000+ hosts and latency

David Parrish david at dparrish.com
Mon Aug 11 00:16:13 CEST 2003


Patch attached...

On Fri, Aug 08, 2003 at 06:13:13PM +0000, solo molo wrote:
> From: "solo molo" <solomolo90 at hotmail.com>
> To: david at dparrish.com, Fred.Albrecht at za.tiscali.com
> Cc: nagios-users at lists.sourceforge.net
> Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> Date: Fri, 08 Aug 2003 18:13:13 +0000
> 
> I'm having the same problem.  Where is this patch located?
> 
> 
> >From: David Parrish <david at dparrish.com>
> >To: Fred Albrecht <Fred.Albrecht at za.tiscali.com>
> >CC: nagios-users at lists.sourceforge.net
> >Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> >Date: Thu, 7 Aug 2003 07:54:54 +1000
> >
> >I'm not going to guarantee anything. From what I saw, the external check
> >code that list patch applies to looks very similar in 1.0 & 1.1.
> >
> >Try it and see :)
> >
> >On Wed, Aug 06, 2003 at 07:33:31AM +0200, Fred Albrecht wrote:
> >
> > > Hi David
> > >
> > > Will this patch work on version 1.0 as well?
> > >
> > > :)
> > > fred
> > > > -----Original Message-----
> > > > From: David Parrish [mailto:david at dparrish.com]
> > > > Sent: 06 August 2003 01:32 AM
> > > > To: ftang at cgg.com
> > > > Cc: nagios-users at lists.sourceforge.net
> > > > Subject: Re: [Nagios-users] Monitoring - 1000+ hosts and latency
> > > >
> > > >
> > > > I'm running a nagios install monitoring nearly 7000 services and still
> > > > growing. It was unbearable until it was patched to not fork to process
> > > > external check results as often.
> > > >
> > > > If a large portion of your checks are passive, then this
> > > > patch will help a
> > > > lot. Probably not much for active checks though.
> > > >
> > > > My average check latency is 0.006 sec.
> > > >
> > > > On Tue, Aug 05, 2003 at 12:23:25PM +0100, ftang at cgg.com wrote:
> > > > > From: ftang at cgg.com
> > > > > To: nagios-users at lists.sourceforge.net
> > > > > Subject: [Nagios-users] Monitoring - 1000+ hosts and latency
> > > > > Date: Tue, 5 Aug 2003 12:23:25 +0100
> > > > >
> > > > > All,
> > > > >
> > > > > Is there anyone currenly using nagios (any version) on over
> > > > 1000 hosts
> > > > > (3000+ services)? I know that there have been emails about users
> > > > > experiencing extreme latencies, and you can imagine that I
> > > > too am in the
> > > > > same crowd. I like nagios (used netsaint in my last company
> > > > - only 15
> > > > > hosts), but I don''t think that it is suitable for my
> > > > current setup. I hope
> > > > > I am wrong.
> > > > >
> > > > > Thanks for your time.
> >
> >--
> >Regards,
> >David Parrish
> >0410 586 121
> ><< attach3 >>
> 
> _________________________________________________________________
> MSN 8 helps eliminate e-mail viruses. Get 2 months FREE*.  
> http://join.msn.com/?page=features/virus

-- 
Regards,
David Parrish
0410 586 121
-------------- next part --------------
diff -ur nagios-1.1-virgin/base/nagios.c nagios-1.1/base/nagios.c
--- nagios-1.1-virgin/base/nagios.c	Thu Jul 10 08:30:10 2003
+++ nagios-1.1/base/nagios.c	Thu Jul 10 08:32:20 2003
@@ -51,6 +51,8 @@
 #include <getopt.h>
 #endif
 
+#include <time.h>
+
 /******** BEGIN EMBEDDED PERL INTERPRETER DECLARATIONS ********/
 
 #ifdef EMBEDDEDPERL 
@@ -1497,6 +1499,8 @@
 		printf("Current/Max Outstanding Checks: %d/%d\n",currently_running_service_checks,max_parallel_service_checks);
 #endif
 
+		check_for_external_commands();
+
 		/* handle high priority events */
 		if(event_list_high!=NULL && (current_time>=event_list_high->run_time)){
 
@@ -1594,8 +1598,13 @@
 			        }
 
 			/* wait a second so we don't hog the CPU... */
-			else
-				sleep((unsigned int)sleep_time);
+			else {
+				struct timespec t;
+				t.tv_sec = 0;
+				t.tv_nsec = 1000 * 1000 * 50;
+				nanosleep(&t, NULL);
+				//check_for_external_commands();
+				}
 		        }
 
 		/* we don't have anything to do at this moment in time */
@@ -1606,7 +1615,12 @@
 				check_for_external_commands();
 
 			/* wait a second so we don't hog the CPU... */
-			sleep((unsigned int)sleep_time);
+
+			{ struct timespec t;
+			t.tv_sec = 0;
+			t.tv_nsec = 1000 * 1000 * 50;
+			nanosleep(&t, NULL); }
+
 		        }
 	        }
 
--- commands.c.old	Sat Aug  2 14:13:23 2003
+++ nagios-1.1/base/commands.c	Sat Aug  2 14:13:53 2003
@@ -347,8 +347,19 @@
 
 
 	/**** PROCESS ALL PASSIVE CHECK RESULTS AT ONE TIME ****/
-	if(passive_check_result_list!=NULL)
-		process_passive_service_checks();
+        {
+                static unsigned int last_checked = 0;
+                unsigned int t;
+                                                                                                                       
+                        /* Don't process more frequently than once every 5 seconds. */
+                        /* This does a fork!!! */
+                time(&t);
+                if(passive_check_result_list!=NULL && ((t - last_checked) > 5) ) {
+                        process_passive_service_checks();
+                        last_checked = t;
+                }
+        }
+
 
 
 #ifdef DEBUG0
--- commands.c.old	Sat Aug  2 15:37:31 2003
+++ nagios-1.1/base/commands.c	Sat Aug  2 15:44:36 2003
@@ -63,7 +63,7 @@
 
 extern FILE     *command_file_fp;
 
-passive_check_result    *passive_check_result_list;
+passive_check_result    *passive_check_result_list = NULL;
 
 int             flush_pending_commands=FALSE;
 
@@ -96,9 +96,6 @@
 	/* update the status log with new program information */
 	update_program_status(FALSE);
 
-	/* reset passive check result list pointer */
-	passive_check_result_list=NULL;
-
 	/* reset flush flag */
 	flush_pending_commands=FALSE;
 
@@ -1279,13 +1276,9 @@
 
 	new_pcr->next=NULL;
 
-	/* add the passive check result to the end of the list in memory */
-	if(passive_check_result_list==NULL)
-		passive_check_result_list=new_pcr;
-	else{
-		for(temp_pcr=passive_check_result_list;temp_pcr->next!=NULL;temp_pcr=temp_pcr->next);
-		temp_pcr->next=new_pcr;
-	        }
+	/* add the passive check result to the head of the list in memory */
+	new_pcr->next = passive_check_result_list;
+	passive_check_result_list = new_pcr;
 
 #ifdef DEBUG0
 	printf("cmd_process_service_check_result() end\n");
@@ -2845,6 +2838,17 @@
 
 		/* the grandchild process should submit the service check result... */
 		if(pid==0){
+			passive_check_result *t, *t1;
+			/* Reverse passive list to correct the reverse that happened in collection. */
+
+			t = passive_check_result_list;
+			passive_check_result_list = NULL;
+			while (t) {
+				t1 = t->next;
+				t->next = passive_check_result_list;
+				passive_check_result_list = t;
+				t = t1;
+			}
 
 			/* write all service checks to the IPC pipe for later processing by the grandparent */
 			for(temp_pcr=passive_check_result_list;temp_pcr!=NULL;temp_pcr=temp_pcr->next){
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/users/attachments/20030811/04450675/attachment.sig>


More information about the Users mailing list