fork errors

Terry td3201 at gmail.com
Fri Sep 2 16:41:14 CEST 2005


I have a program that checks the logs every minute and pages us when the
fork errors occur, so we are responding within minutes.  I have looked
at the resources every time it happens, and we have plenty available.
Is there a single plugin I can put into debugging mode so
that when this happens I get more information about why it is giving
these errors?  Here are a few facts:
- the system never runs short of memory (resident or paging)
- there is not an unusual number of processes running, maybe around
200 at a time, nowhere near the ulimit setting
- ulimit for the 'nagios' user matches that of root (unlimited).  Here
is the ulimit output:
core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) 4
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 7168
virtual memory        (kbytes, -v) unlimited
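
A minimal Python sketch of the kind of extra information being asked
for, to be run as the nagios user while the problem is occurring.  It is
not part of Nagios; the function names are hypothetical, and the process
count assumes Linux with /proc mounted.  It prints the limits the process
actually inherits, counts that user's processes by state (Z is a zombie
that has not been reaped), and attempts one fork() to show the errno
behind a failure.

```python
import errno
import os
import resource


def describe_limits():
    """Return (soft, hard) for the limits most relevant to fork() failures."""
    names = {
        "max user processes (-u)": resource.RLIMIT_NPROC,
        "open files (-n)": resource.RLIMIT_NOFILE,
        "virtual memory (-v)": resource.RLIMIT_AS,
    }
    return {label: resource.getrlimit(rlim) for label, rlim in names.items()}


def count_states(uid):
    """Count processes owned by uid, keyed by state letter (R, S, Z, ...).

    Assumption: Linux /proc layout.  Only thread group leaders are listed
    here, while on Linux RLIMIT_NPROC also counts threads.
    """
    counts = {}
    try:
        entries = os.listdir("/proc")
    except OSError:
        return counts  # no /proc on this system
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            if os.stat("/proc/" + entry).st_uid != uid:
                continue
            with open("/proc/" + entry + "/stat") as fh:
                data = fh.read()
        except OSError:
            continue  # process exited while we were looking
        # The state letter is the first field after the closing ')' of comm.
        fields = data.rsplit(")", 1)[-1].split()
        if fields:
            counts[fields[0]] = counts.get(fields[0], 0) + 1
    return counts


def explain_fork_errno(code):
    """Map a fork() errno to the usual class of cause."""
    if code == errno.EAGAIN:
        return "EAGAIN: a process/thread limit was reached"
    if code == errno.ENOMEM:
        return "ENOMEM: the kernel could not allocate memory"
    return os.strerror(code)


def try_fork():
    """Attempt one fork(); return None on success or the errno on failure."""
    try:
        pid = os.fork()
    except OSError as exc:
        return exc.errno
    if pid == 0:
        os._exit(0)  # child: exit immediately
    os.waitpid(pid, 0)  # parent: reap the child
    return None


if __name__ == "__main__":
    for label, (soft, hard) in describe_limits().items():
        print(label, "soft =", soft, "hard =", hard)
    print("process states:", count_states(os.getuid()))
    failed = try_fork()
    print("fork:", "ok" if failed is None else explain_fork_errno(failed))
```

Note that a limit value of -1 means unlimited, and that the limits seen
from a login shell are not necessarily the ones the running nagios
daemon inherited when it was started.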

Thanks,
Terry



On 9/1/05, Fred <f1216 at yahoo.com> wrote:
> My guess would be to look at resource utilization on your system.  The
> most likely causes for fork() to fail are no more process slots, out of
> memory, or exceeding some kind of per-user (non-root) limit.  When this
> occurs, look at your system logs and ps output and see if you have *lots*
> of processes hanging around.  It could be that nagios has stopped reaping
> its children (or another unrelated process has sucked up the resources)
> and you have simply pushed your system to the edge.  It might be that you
> reach that situation and it backs off before you even notice, leaving
> nagios having problems dealing with the aftermath.
> 
> -FredC
> 
> --- Terry <td3201 at gmail.com> wrote:
> 
> > Hello,
> >
> > I have been having this issue for quite some time.  For some unknown
> > reason, nagios stops performing checks with these errors:
> >
> > [1125536952] Warning: The check of service 'PING' on host 'hostname'
> > could not be performed due to a fork() error.  The check will be
> > rescheduled.
> >
> > All checks fail like this until nagios is restarted.  When this
> > problem is occurring I can run the service checks manually both as the
> > nagios user and as the root user.  There are no resource problems that
> > I can see at the time.  We do not appear to be hitting a limit with
> > open files or anything like that either.  The nagios user mirrors the
> > root user in that area.
> >
> > What could be wrong?
> >
> > Thanks!
> >
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
> > Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
> > Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
> > _______________________________________________
> > Nagios-users mailing list
> > Nagios-users at lists.sourceforge.net
> > https://lists.sourceforge.net/lists/listinfo/nagios-users
> > ::: Please include Nagios version, plugin version (-v) and OS when reporting
> > any issue.
> > ::: Messages without supporting info will risk being sent to /dev/null
> >

