RFC Proof of concept patch: Restarting embedded Perl Nagios periodically to halt memory consumption.

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Sat Sep 18 04:31:07 CEST 2004


Dear Ladies and Gentlemen,

Nag 2.x attempts unsuccessfully (on my bad advice) to limit the maximum 
memory used by the embedded Perl Nag (ePN) process by periodically 
deallocating the Perl interpreter and re-initialising it.

Since 1.2 is my Nag test bed, these changes were backported to it, and 
the negative results were noted in an earlier letter.

However, changes to the reinit mechanism used by 2.x appear to deal with 
the problem of increasing memory usage by an ePN by _restarting_ Nagios 
periodically.

The changes are

1 In utils.c/reinit_embedded_perl(void)

fork, and in the child process exec the Nag startup script with the 
'restart' parameter.

int reinit_embedded_perl(void){

#ifdef EMBEDDEDPERL
        char buffer[MAX_INPUT_BUFFER];
        pid_t pid;

        snprintf(buffer,sizeof(buffer),"Restarting Nagios (to re-initialize embedded Perl interpreter) after %d uses ...\n",embedded_perl_calls);
        buffer[sizeof(buffer)-1]='\x0';
        write_to_logs_and_console(buffer,NSLOG_INFO_MESSAGE,TRUE);

        pid=fork();

        /* fork failed: nothing can be restarted, so give up */
        if(pid==-1)
                exit(STATE_UNKNOWN);

        /* child: replace this process with the startup script */
        else if(pid==0){

                execlp("/usr/local/etc/rc.d/nagios.sh","/usr/local/etc/rc.d/nagios.sh","restart",(char *)NULL);

                /* only reached if execlp() itself failed */
                _exit(STATE_UNKNOWN);

        } else {

                /* parent: exit and leave the restart to the script */
                exit(STATE_OK);
        }
#endif
        return OK;

        }


2 Make the Nag startup script suid root.

2.1 Minor changes to the startup script (to remove the su) and to have 
the startup script append debug output to a file.

As with the 2.x code, reinit_embedded_perl() is called in checks.c 
whenever the number of calls to the embedded interpreter exceeds a 
threshold value.
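
The threshold test described above can be sketched as follows. This is 
only an illustration: the struct and function names are hypothetical, 
the real checks.c code is not shown in this post, and the threshold of 
100 is an assumption inferred from the "after 101 uses" message in the 
test log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counter kept by the daemon
   (the patch itself uses a global named embedded_perl_calls). */
struct epn_counter {
        int calls;      /* uses of the embedded interpreter so far */
        int threshold;  /* restart once calls exceeds this value   */
};

/* Record one use of the embedded interpreter.
   Returns 1 when the number of calls exceeds the threshold,
   i.e. when reinit_embedded_perl() would be called; otherwise 0. */
int epn_record_call(struct epn_counter *c)
{
        assert(c != NULL);
        c->calls++;
        return c->calls > c->threshold;
}
```

With a threshold of 100, the first 100 calls return 0 and the 101st 
returns 1, which would match the "after 101 uses" log message if the 
real test has the same form.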

It may well be that the restart is better done by the daemon process, 
rather than in a child forked to perform a service check. (This way 
seemed to me to be the fastest way to proceed, since there was already 
2.x code with this structure.)

Here is an extract from the Nagios log showing some test results

[1095429760] Restarting Nagios (to re-initialize embedded Perl interpreter) after 101 uses ...
[1095429760] Caught SIGTERM, shutting down...
[1095429760] Nagios 1.2 starting... (PID=83831)
[1095429760] Successfully shutdown... (PID=81306)
[1095429760] Finished daemonizing... (New PID=83832)

[1095430344] Restarting Nagios (to re-initialize embedded Perl interpreter) after 101 uses ...
[1095430344] Caught SIGTERM, shutting down...
[1095430344] Successfully shutdown... (PID=83832)
[1095430344] Nagios 1.2 starting... (PID=86358)
[1095430344] Finished daemonizing... (New PID=86359)

I am now testing my prod Nag with this change and a threshold of 100_000 
checks (which should correspond to about a week of running, or a memory 
usage of 40-60 MB).
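
As a rough cross-check of the "about a week" figure, the two restarts 
in the log extract are 1095430344 - 1095429760 = 584 seconds apart, and 
each follows 101 uses. The sketch below extrapolates from that sample. 
It assumes that the production system runs embedded Perl checks at the 
same rate as the test bed, which is not stated in this post, and the 
function names are hypothetical.

```c
#include <assert.h>

/* Seconds needed to reach target_checks, extrapolated linearly from
   an observed sample of sample_checks taking sample_seconds. */
double estimate_seconds(long sample_checks, long sample_seconds,
                        long target_checks)
{
        assert(sample_checks > 0);
        return (double)sample_seconds / (double)sample_checks
               * (double)target_checks;
}

/* Convert seconds to days (86400 seconds per day). */
double seconds_to_days(double seconds)
{
        return seconds / 86400.0;
}
```

Calling seconds_to_days(estimate_seconds(101, 584, 100000)) gives 
roughly 6.7 days under that assumption.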

Yours sincerely.

-- 
Stanley Hopcroft

Network specialist, IT Infrastructure
IP Australia
Ph: (02) 6283 3189  Fax: (02) 6281 1353
PO Box 200 Woden  ACT 2606
http://www.ipaustralia.gov.au

