Unpredictable service check times fixed?

Stanley Hopcroft Stanley.Hopcroft at IPAustralia.Gov.AU
Tue Apr 15 05:14:30 CEST 2003


Dear Sir,

It seems to me that the delays in checking services are caused by Nagios
being unable to fork() itself to perform the service checks.

This fork failure is the cause of the problem; it is not a fault in
Nagios itself.

When Nagios wants to run a check, it calls the kernel's fork() system
call to create a new process, which then execve()s the service check
program. If fork() does not give Nagios a new process, Nagios cannot
check the service, simple as that.
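The fork-then-exec sequence can be sketched as follows. This is only an illustration in Python, assuming a POSIX host; Nagios itself is written in C and this is not its code. The function name run_check is hypothetical. It shows the point above: when fork() fails with EAGAIN or ENOMEM no child exists, so the check simply does not run on that pass.

```python
import errno
import os


def run_check(argv):
    """Hypothetical helper: fork, exec a check plugin, return its exit status.

    Returns None when fork() fails for a resource reason (EAGAIN/ENOMEM):
    no child process was created, so the check cannot be run at all.
    """
    try:
        pid = os.fork()
    except OSError as exc:
        if exc.errno in (errno.EAGAIN, errno.ENOMEM):
            return None  # out of processes or swap: check is skipped
        raise
    if pid == 0:
        # Child: replace this process image with the check program.
        try:
            os.execv(argv[0], argv)
        except OSError:
            pass
        os._exit(127)  # only reached if execv() itself failed
    # Parent: wait for the check and decode its exit status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)  # plugins use 0..3 = OK..UNKNOWN
    return -os.WTERMSIG(status)  # killed by a signal
```

For example, run_check(["/bin/sh", "-c", "exit 2"]) returns 2, which a plugin would use to mean CRITICAL.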

fork() can fail for a number of reasons, mainly resource-related. The
fork(2) man page on this FreeBSD system (happily checking 350 services)
lists:

ERRORS
Fork() will fail and no child process will be created if:
 
[EAGAIN]           The system-imposed limit on the total number of pro-
                   cesses under execution would be exceeded.  The limit
                   is given by the sysctl(3) MIB variable KERN_MAXPROC.
                   (The limit is actually ten less than this except for
                   the super user).
 
[EAGAIN]           The user is not the super user, and the system-imposed
                   limit on the total number of processes under execution
                   by a single user would be exceeded.  The limit is
                   given by the sysctl(3) MIB variable
                   KERN_MAXPROCPERUID.
 
[EAGAIN]           The user is not the super user, and the soft resource
                   limit corresponding to the resource parameter
                   RLIMIT_NPROC would be exceeded (see getrlimit(2)).
 
[ENOMEM]           There is insufficient swap space for the new process.
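To see which of these limits apply to the user Nagios runs as, something like the following Python sketch could be run as that user. It assumes a Unix host where the resource module provides RLIMIT_NPROC; the helper names are hypothetical. The kern.maxproc and kern.maxprocperuid sysctls are read with the sysctl(8) command and come back as None where they do not exist (for example on Linux).

```python
import os
import resource
import subprocess


def read_sysctl(name):
    """Return an integer sysctl value, or None if it is unavailable."""
    try:
        out = subprocess.run(["sysctl", "-n", name], capture_output=True,
                             text=True, check=True).stdout
        return int(out.strip())
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def fork_limits():
    """Hypothetical report of the limits behind fork()'s EAGAIN errors."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else value

    return {
        "superuser": os.geteuid() == 0,  # root is exempt from most limits
        "RLIMIT_NPROC soft": show(soft),
        "RLIMIT_NPROC hard": show(hard),
        "kern.maxproc": read_sysctl("kern.maxproc"),
        "kern.maxprocperuid": read_sysctl("kern.maxprocperuid"),
    }


if __name__ == "__main__":
    for key, value in fork_limits().items():
        print(f"{key}: {value}")
```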

I think the problem is that your Nagios host has either

. insufficient memory and/or swap. Your host may simply be overcommitted
with other applications. If you are running ntop, snort or an SQL DB,
you may have to get rid of them or upgrade your host.

. unprivileged user resource limits that are set too low.

Some of these limits can be changed dynamically; others may require a
kernel rebuild.
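As one example of a limit that can be changed at run time, an unprivileged process may raise its own soft RLIMIT_NPROC up to, but not past, the hard limit; only the superuser can raise the hard limit. A minimal Python sketch, assuming a Unix host where the resource module provides RLIMIT_NPROC (the function name is hypothetical):

```python
import resource


def raise_nproc_soft_limit():
    """Hypothetical helper: lift the soft RLIMIT_NPROC to the hard limit.

    Affects only this process and the children it starts afterwards.
    Returns (old_soft, new_soft, hard).
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    if old_soft != hard:
        # Raising soft up to hard needs no privileges.
        resource.setrlimit(resource.RLIMIT_NPROC, (hard, hard))
    new_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return old_soft, new_soft, hard
```

Because resource limits are inherited across fork() and execve(), a small wrapper that calls this and then execs the Nagios daemon would pass the higher limit on to Nagios and its checks. System-wide values such as the kern.* sysctls above still need root and, depending on the OS, a reboot or kernel rebuild.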

You should probably consult a local system administrator who is
familiar with tuning whichever OS Nagios is running under.

For comparison, this system (FreeBSD 4.7, 256 MB RAM and an 866 MHz
Celeron, running Nagios, Apache, smslink and sendmail) runs at a load
average of only 0.15.

Hope this helps.

Yours sincerely.

 -- 
------------------------------------------------------------------------
Stanley Hopcroft
------------------------------------------------------------------------

'...No man is an island, entire of itself; every man is a piece of the
continent, a part of the main. If a clod be washed away by the sea,
Europe is the less, as well as if a promontory were, as well as if a
manor of thy friend's or of thine own were. Any man's death diminishes
me, because I am involved in mankind; and therefore never send to know
for whom the bell tolls; it tolls for thee...'

from Meditation 17, J Donne.


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null