RFC Proof of concept patch: Restarting embedded Perl Nagios periodically to halt memory consumption.

Andreas Ericsson ae at op5.se
Sat Sep 18 04:57:56 CEST 2004


Stanley Hopcroft wrote:
> Dear Ladies and Gentlemen,
> 
> Nag 2.x attempts unsuccessfully (on my bad advice) to limit the maximum 
> memory used by the embedded Perl Nag (ePN) process by periodically 
> deallocating the Perl interpreter and re-initialising it.
> 
> Since 1.2 is my Nag test bed, these changes were backported to it and 
> the negative results noted in a former letter.
> 
> However, changes to the reinit mechanism used by 2.x appear to deal with 
> the problem of increasing memory usage by an ePN by _restarting_ Nagios 
> periodically.
> 
> The changes are
> 
> 1 In utils.c/reinit_embedded_perl(void)
> 
> fork, and in the child process exec the Nag startup script with the 
> 'restart' parameter.
> 

About the ugliest solution I've heard of so far. How does it handle 
flushing initial status data to logs? Will this cause logfiles to grow 
at an alarming rate instead? I think you need to rethink this. Also, 
leaving the dirty work of cleaning up a process' memory space to the 
kernel is generally (and rightly so) considered bad practice. This 
routine takes bad practice to the next level.

> int reinit_embedded_perl(void){
> 
> #ifdef EMBEDDEDPERL
>         char buffer[MAX_INPUT_BUFFER];
>         pid_t pid ;
> 
>         snprintf(buffer,sizeof(buffer),"Restarting Nagios (to re-initialize embedded Perl interpreter) after %d uses ...\n",embedded_perl_calls);
>         buffer[sizeof(buffer)-1]='\x0';
>         write_to_logs_and_console(buffer,NSLOG_INFO_MESSAGE,TRUE);
> 
>         pid=fork();
> 
>         if(pid==-1)
>                 exit(STATE_UNKNOWN) ;
> 
>         else if(pid==0){
> 
>                 execlp("/usr/local/etc/rc.d/nagios.sh", "/usr/local/etc/rc.d/nagios.sh", "restart", (char *)NULL) ;
> 
>         } else {
> 
>                 exit(STATE_OK) ;
>         }
> #endif
>         return OK ;
> 
>         }
> 
> 
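
For comparison, a minimal sketch of the fork/exec step with its failure paths handled. In the routine quoted above, a child whose exec fails falls through to "return OK" and keeps running as a second copy, and the parent exits whether or not the script ever started. The helper name run_restart_script() is hypothetical and not part of Nagios, and waiting for the child is an assumption made only so the caller can see a failure; a real restart script would normally signal the caller before the wait returns.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of Nagios): run "<script> restart" in a
 * child process. Returns the child's exit status, 127 if the exec failed,
 * or -1 if fork/waitpid failed or the child was killed by a signal. */
int run_restart_script(const char *script)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;              /* fork failed: caller keeps running */

    if (pid == 0) {
        /* Child: the argument list must end with a null pointer. */
        execl(script, script, "restart", (char *)NULL);
        _exit(127);             /* reached only if execl() failed */
    }

    /* Parent: wait for the child, retrying if a signal interrupts us. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would pass the path of its own init script and treat any non-zero result as "the restart did not happen, keep running".
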
> 2 Make the Nag startup script suid root.
> 

Dangerous, since it doesn't do a lot of checking to ensure it doesn't 
clobber anything. A user gaining write access to the Nagios binary would 
be in for a walk in the park to escalate his privileges.

> 2.1 minor changes to the startup script (to remove the su) and have the 
> startup script append debug output to a file.
> 
> As with the 2.x code, reinit_embedded_perl() is called in checks.c 
> whenever the number of calls to the embedded interpreter exceeds a 
> threshold value.
> 
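
The threshold test described above can be sketched as follows. This is not the actual checks.c code; epn_call_exceeds() is a hypothetical helper, and the threshold of 100 used in the note after it is an assumed value, chosen because it is consistent with the "after 101 uses" lines in the log further down.

```c
/* Hypothetical helper: count one embedded Perl call and report whether the
 * counter has now exceeded the threshold (1 = time to reinit, 0 = not yet). */
int epn_call_exceeds(int *calls, int threshold)
{
    *calls += 1;
    return *calls > threshold;
}
```

With the counter starting at 0 and a threshold of 100, the first 100 calls return 0 and call 101 returns 1.
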
> It may well be that the restart is better done by the daemon process, 
> rather than in a child forked to perform a service check. (This way 
> seemed to me to be the fastest way to proceed [since there was already 
> 2.x code with this structure].)
> 
> Here is an extract from the Nagios log showing some test results
> 
> [1095429760] Restarting Nagios (to re-initialize embedded Perl 
> interpreter) after 101 uses ...
> [1095429760] Caught SIGTERM, shutting down...
> [1095429760] Nagios 1.2 starting... (PID=83831)
> [1095429760] Successfully shutdown... (PID=81306)
> [1095429760] Finished daemonizing... (New PID=83832)
> 
> [1095430344] Restarting Nagios (to re-initialize embedded Perl 
> interpreter) after 101 uses ...
> [1095430344] Caught SIGTERM, shutting down...
> [1095430344] Successfully shutdown... (PID=83832)
> [1095430344] Nagios 1.2 starting... (PID=86358)
> [1095430344] Finished daemonizing... (New PID=86359)
> 

So... that's about 10 minutes (584 seconds, going by the timestamps) 
between restarts. How many service checks is this for? What happens in 
networks large enough for the idea of the EPN to be really useful 
(5000+ services)?
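
A rough way to put numbers on that question. The log above shows 101 uses in 584 seconds (1095430344 - 1095429760). The 5-minute check interval used for the 5000-service case in the note below is an assumption, and restart_interval_s() is a hypothetical helper, not Nagios code.

```c
/* Hypothetical helper: seconds until `threshold` ePN calls have been made,
 * given that `checks` calls complete every `period_s` seconds.
 * Returns -1.0 if the rate is not positive. */
double restart_interval_s(double threshold, double checks, double period_s)
{
    if (checks <= 0.0 || period_s <= 0.0)
        return -1.0;
    return threshold * period_s / checks;
}
```

At the logged rate, restart_interval_s(100000, 101, 584) comes to roughly 578,218 seconds, or about 6.7 days, which is in line with the "about a week" estimate quoted below. For 5000 services each checked every 300 seconds, restart_interval_s(100000, 5000, 300) gives 6000 seconds, so the same threshold would mean a restart every 100 minutes.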

> I am now testing my prod Nag with this change and a threshold of 100_000 
> checks (should be about a week or a mem usage of 40-60 MB).
> 

40-60 MB * 50 concurrent Nagios processes. I don't have that much RAM.

Is dropping the EPN completely out of the question? It seems to me like 
it's been given a lot of work and has only gone from bad to worse, and I 
for one won't enable it with the requirements you just mentioned.

> Yours sincerely.
> 

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

