Re: Default Nagios process self-check

Ryan Gravlin rgravlin at elizacorp.com
Tue Sep 2 19:12:01 CEST 2008


Hi Sascha,

It seems that for every host, 3 processes are launched to do the host ping check: sh, ping, and nagios.  I currently have ~57 hosts that are in a down state and have been acknowledged as out of service.  I would assume that 57*3 processes, each lingering until the 30-second timeout, could account for this many processes at the same time.
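
As a rough back-of-the-envelope sketch of that estimate (Python; the function name is illustrative, and the worst case assumes every check on a down host is in flight at once, which may not hold in practice):

```python
# Figures taken from the message above: ~57 down hosts, and 3 processes
# (sh, ping, nagios) per host ping check.
DOWN_HOSTS = 57
PROCS_PER_HOST_CHECK = 3


def extra_processes(down_hosts, procs_per_check):
    """Worst case: all host checks for down hosts overlap within the timeout."""
    return down_hosts * procs_per_check


worst_case = extra_processes(DOWN_HOSTS, PROCS_PER_HOST_CHECK)
print(worst_case)  # 171
```

If that assumption holds, the host checks alone would add up to 171 processes on top of whatever else is running on the box.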

I guess that brings me to my next question.  I could disable active host checks for these out-of-service machines, which would most likely alleviate my warnings about the number of processes, but would I have to re-enable them once the machines are brought back up?  I currently just acknowledge the problem and leave a comment when a machine is put out of service, since out of service means that it will be back at some point.  When it does come back, the acknowledgement is gone and regular checks are still happening.  Does anyone know of a better way to do this?
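
For illustration only, here is a minimal Python sketch of toggling active host checks through the Nagios external command file, using the `DISABLE_HOST_CHECK` and `ENABLE_HOST_CHECK` external commands. The command file path and the host name are assumptions that vary by installation, and external commands must be enabled in the Nagios configuration for this to have any effect.

```python
import time

# ASSUMPTION: location of the external command file; check your own nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def host_check_command(host_name, enable, now=None):
    """Format an ENABLE_HOST_CHECK / DISABLE_HOST_CHECK external command line."""
    timestamp = int(time.time() if now is None else now)
    action = "ENABLE_HOST_CHECK" if enable else "DISABLE_HOST_CHECK"
    return f"[{timestamp}] {action};{host_name}\n"


def submit(command_line, command_file=COMMAND_FILE):
    """Write one command line to the command file (a named pipe read by Nagios)."""
    with open(command_file, "w") as pipe:
        pipe.write(command_line)


if __name__ == "__main__":
    # "oos-host-01" is a hypothetical host name used only for this example.
    print(host_check_command("oos-host-01", enable=False), end="")
    print(host_check_command("oos-host-01", enable=True), end="")
```

Calling `submit(host_check_command(name, enable=False))` when a machine goes out of service, and the `enable=True` variant when it returns, would be one way to script both halves of the workflow described above.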

Thanks so much,

Ryan Gravlin

________________________________
From: nagios-users-bounces at lists.sourceforge.net [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Sascha.Runschke at gfkl.com
Sent: Tuesday, September 02, 2008 11:01 AM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Re: Default Nagios process self-check


nagios-users-bounces at lists.sourceforge.net wrote on 02.09.2008 15:37:47:

> # of Hosts Monitored: 322
> # of Services Monitored: 35
>
> The localhost.cfg comes with a default process check with the values
> 250+ for warnings and 400+ for critical.  Usually about twice an
> hour from checking the event log I get this message:
>
> [09-02-2008 07:02:48] SERVICE ALERT: NAGIOS;Total Processes;WARNING;
> SOFT;1;PROCS WARNING: 370 processes with STATE = RSZDT
>
> It seems to me the machine itself is powerful enough to execute this
> many checks without even breaking a sweat.  Were these defaults
> configured in the thinking that there should never be that many processes?
>
> I'm by no means a Linux or Nagios expert and I was hoping someone
> could explain more of the thinking behind this check than what I
> see.  I can obviously just bump the numbers up but I want to make
> sure that I'm not ignoring something obvious that may have unwanted
> results after the fact.  Should I use these numbers I see here as
> the basis for my new thresholds?
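
The thresholds quoted above can be read as a simple classification of the process count. The Python sketch below is only a model of that reading ("250+" warns, "400+" is critical); the function name is hypothetical and the exact boundary handling of the real plugin is not reproduced here.

```python
def classify_process_count(count, warn=250, crit=400):
    """Map a process count to a Nagios-style state.

    ASSUMPTION: a count at or above a threshold triggers that state.
    """
    if count >= crit:
        return "CRITICAL"
    if count >= warn:
        return "WARNING"
    return "OK"


# The alert quoted above reported 370 processes.
state = classify_process_count(370)
print(state)  # WARNING
```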

These thresholds were never meant to be an upper limit; the maximum number
of concurrent checks your box can handle depends solely on your hardware.
Read them more as "if you have a Nagios installation that produces
that many concurrent checks, then you should know by now how to
change this behaviour" ;-)

But then, I fail to see how your setup with 322 hosts and only 35(?) service
checks could produce that many processes. It may be a good idea to
double-check what's going on there.

S

