Reply: Re: Reply: Default Nagios process self-check

Sascha.Runschke at gfkl.com
Wed Sep 3 09:40:51 CEST 2008


nagios-users-bounces at lists.sourceforge.net wrote on 02.09.2008 19:12:01:

Hello Ryan,

First of all: please do not top-post and full-quote at the same
time. It's considered rude on mailing lists, it wastes bandwidth
unnecessarily, and it makes mails hard to read, since I cannot tell
whether you also replied between the lines.
Please either don't quote at all, or post your replies between the
quotes, as I usually do. Thanks for your understanding.

> It seems that for every host, 3 processes are launched to do the 
> host ping check: sh, ping, and nagios.  I currently have ~57 hosts 
> that are in a down state and have been acknowledged as out of 
> service.  I would assume 57*3 plus the 30 second timeout could cause
> this many processes at the same time.
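One way to verify the sh/ping/nagios observation is to tally running processes by name. Below is a minimal Python sketch that assumes a Linux host (it reads `/proc/<pid>/comm`, which holds the command name truncated to 15 characters); the function names are hypothetical helpers, not part of Nagios.

```python
import os
from collections import Counter


def read_process_names(proc_root="/proc"):
    """Return the command name of every process listed under proc_root (Linux only)."""
    names = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/cpuinfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                names.append(f.read().strip())
        except OSError:
            continue  # process exited while scanning, or access was denied
    return names


def tally(names, wanted=("nagios", "sh", "ping")):
    """Count how many processes carry each of the wanted names."""
    counts = Counter(names)
    return {name: counts[name] for name in wanted}


if __name__ == "__main__":
    for name, count in tally(read_process_names()).items():
        print("%-8s %d" % (name, count))
```

Note that the "sh" count also includes unrelated shells on the box, so compare it against a baseline taken while no checks are hanging.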

OK, that seems like a plausible explanation. If you have a lot of checks
that are expected to time out, you will have a lot of "hanging"
processes waiting to exit. There is really nothing you can do about
that except lower your timeout value.
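How much a lower timeout helps can be estimated roughly with Little's law (average number in flight = start rate * lifetime). A rough Python sketch, assuming every check of a down host runs until the timeout, checks of the same host never overlap, and a hypothetical check interval of 60 s (substitute your own `check_interval`):

```python
def estimate_concurrent_processes(hosts, procs_per_check, timeout_s, check_interval_s):
    """Average number of live check processes from hosts whose checks always time out.

    Little's law: average in flight = start rate * lifetime. Each host starts one
    check per check_interval_s and each check lives timeout_s, so a host contributes
    timeout_s / check_interval_s checks on average, capped at 1 because checks of
    the same host are assumed not to overlap.
    """
    checks_per_host = min(1.0, timeout_s / check_interval_s)
    return hosts * checks_per_host * procs_per_check


if __name__ == "__main__":
    # 57 hosts and 3 processes per check are from the thread; 60 s is hypothetical.
    for timeout in (30, 10):
        print(timeout, estimate_concurrent_processes(57, 3, timeout, 60))
```

This is an average for checks that are spread out over time; if all down hosts happen to be checked at the same moment (for example right after a restart), the peak still approaches hosts * processes per check, i.e. the 57*3 = 171 mentioned above.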
Another option is to stop using check_ping and use check_icmp instead.
check_ping is a wrapper which calls /bin/ping and parses its output;
therefore you have the extra "sh" fork. check_icmp is a plugin with a
native ICMP implementation and doesn't require the extra fork. It's
faster, leaner and today's check of choice for host checks, at least
for intranet hosts in my opinion. WAN hosts are another matter,
though... but that might be just my personal opinion.
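To make the "wrapper" idea concrete: a wrapper-style check spawns the system ping binary and scrapes its summary text, whereas check_icmp sends the ICMP packets itself. Below is a minimal Python sketch of the wrapper approach only; it is not check_ping's actual source, and the regular expressions assume the summary format printed by Linux iputils ping (the RTT pattern also matches BSD's "round-trip" line).

```python
import re
import subprocess

# Summary lines as printed by Linux iputils ping, e.g.
#   "4 packets transmitted, 4 received, 0% packet loss, time 3004ms"
#   "rtt min/avg/max/mdev = 0.041/0.052/0.067/0.010 ms"
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def parse_ping_summary(output):
    """Return (packet_loss_percent, avg_rtt_ms or None) from ping's summary."""
    loss = LOSS_RE.search(output)
    if loss is None:
        raise ValueError("no packet-loss line in ping output")
    rtt = RTT_RE.search(output)  # absent when no reply came back
    return float(loss.group(1)), (float(rtt.group(2)) if rtt else None)


def ping_like_wrapper(host, count=3, timeout_s=10):
    """Spawn the ping binary (an extra process per check) and parse its text output."""
    proc = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return parse_ping_summary(proc.stdout)


if __name__ == "__main__":
    print(ping_like_wrapper("127.0.0.1"))
```

`subprocess.run` raises `subprocess.TimeoutExpired` if ping outlives `timeout_s`; every call also costs at least one extra process, which is exactly the overhead a native implementation avoids.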

> I guess that brings me to my next question.  I could disable active 
> host checks for these out of service machines which would most 
> likely alleviate my warnings about the number of processes, but 
> would I have to re-enable them once the machines are brought back 
> up?  I currently just acknowledge the problem and leave a comment 
> when a machine is put out of service, but this means that it will be
> back at some point.  When it does come back, acknowledgement is gone
> and regular checks are still happening.  Does anyone know of a 
> better way to do this?

There is really no way to solve that "problem"; it is what it is.
The real question is: how can you have 57 hosts that have been down
for an extended period? I do not know your environment, but I
question why such hosts would need monitoring at all.
But that's up to you to decide.
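On the question of disabling active host checks and re-enabling them later: Nagios accepts external commands written to its command file, and DISABLE_HOST_CHECK / ENABLE_HOST_CHECK are the relevant ones. A minimal Python sketch, assuming the default command-file path of a source install (check `command_file` in your nagios.cfg) and a hypothetical host name "web01". As far as I know, a host disabled this way stays disabled until ENABLE_HOST_CHECK is sent, so the re-enable step has to be done explicitly when the machine returns.

```python
import os
import stat
import time

# Assumption: default command file for a Nagios source install; the real
# path is whatever command_file is set to in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_external_command(command, *args, now=None):
    """Build one external command line: "[timestamp] COMMAND;arg1;...\\n"."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] %s\n" % (timestamp, ";".join((command,) + args))


def set_active_host_checks(host_names, enabled, command_file=COMMAND_FILE):
    """Send DISABLE_HOST_CHECK or ENABLE_HOST_CHECK for each host."""
    # Only write to a named pipe: os.stat raises if the path is missing, and a
    # regular file at that path would never be read by Nagios.
    if not stat.S_ISFIFO(os.stat(command_file).st_mode):
        raise RuntimeError("%s is not a named pipe; is Nagios running?" % command_file)
    command = "ENABLE_HOST_CHECK" if enabled else "DISABLE_HOST_CHECK"
    # Opening a FIFO for writing blocks until a reader (Nagios) has it open.
    with open(command_file, "w") as pipe:
        for host in host_names:
            pipe.write(format_external_command(command, host))


if __name__ == "__main__":
    # "web01" is a hypothetical host name; this only prints the command line.
    print(format_external_command("DISABLE_HOST_CHECK", "web01"), end="")
```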

Regards
        Sascha

-- 
Sascha Runschke
IT Infrastructure

GFKL Financial Services AG
Limbecker Platz 1
45127 Essen

Phone: +49 (201) 102-1879 Mobile: +49 (173) 5419665 Fax: +49 (201) 102-1102105




GFKL Financial Services AG
Executive Board: Dr. Peter Jänsch (Chairman), Jürgen Baltes, Dr. Till Ergenzinger, Dr. Tom Haverkamp
Chairman of the Supervisory Board: Dr. Georg F. Thoma
Registered office: Limbecker Platz 1, 45127 Essen; register court: Amtsgericht Essen, HRB 13522
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users