check_nagios isn't very smart.

Fischer, Thomas thomas.fischer at quadriga.com
Wed Oct 1 11:07:41 CEST 2003

Using Nagios to monitor itself is of limited use anyway. If Nagios falls over, you won't see any results until it is restarted.

The cron approach is the only way I see this going forward. If you use a distributed set-up, you can use send_nsca to transfer the status back to a central server. If not, let the cron job mail or page you. An additional way is to use service freshness checks with a distributed set-up: if no status is transferred for a certain amount of time, a check_command that you write alerts you.
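
A rough sketch of what such a cron heartbeat could look like, assuming send_nsca is installed on the monitored box. The service description, the central host name and the send_nsca.cfg path are placeholders I made up, and the helper function names are hypothetical. The functions only build the tab-separated line (host, service, return code, output) that send_nsca reads on stdin, so they can be tried out without a central server:

```shell
#!/bin/sh
# Sketch only: names, host and paths below are placeholders.

# Count processes whose command name is exactly "nagios".
# Reads "ps -eo comm" style output on stdin.
count_nagios() {
    grep -c '^nagios$'
}

# Build one passive service check result in the format send_nsca
# reads on stdin: host <TAB> service <TAB> return code <TAB> output.
nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Turn a process count into an OK (0) or CRITICAL (2) result line.
heartbeat() {
    host=$1 svc=$2 count=$3
    if [ "$count" -ge 1 ]; then
        nsca_line "$host" "$svc" 0 "OK: $count nagios process(es) running"
    else
        nsca_line "$host" "$svc" 2 "CRITICAL: no nagios process found"
    fi
}

# A check_command for the central server to run when results go stale
# could be as small as this: always report CRITICAL (exit status 2).
stale_alert() {
    echo "CRITICAL: no heartbeat received from $1"
    return 2
}

# From cron, for example once a minute (host and path are assumptions):
#   heartbeat "$(hostname)" "Nagios heartbeat" "$(ps -eo comm | count_nagios)" \
#     | send_nsca -H central.example.com -c /etc/nagios/send_nsca.cfg
```

On the central server the matching passive service would have freshness checking turned on (check_freshness and freshness_threshold in the service definition), with something like stale_alert as its check_command, so that a missing heartbeat turns into an alert.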

And yes, Nagios can fork multiple processes, so 1+ processes is OK.

Zombie processes you can watch with Nagios itself, and you can (and should) also incorporate this into the cron job to see if there are any dead nagios processes. To be honest, I never really had any zombies with Nagios as far as I can remember.
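
For the cron job, a zombie count could be sketched like this. It assumes a ps that accepts -eo stat,comm and marks defunct processes with a state starting with Z (true on Linux, worth checking elsewhere). The function name is hypothetical:

```shell
#!/bin/sh
# Count defunct (zombie) processes whose command name is "nagios".
# Reads "ps -eo stat,comm" output on stdin.
count_nagios_zombies() {
    awk '$1 ~ /^Z/ && $2 == "nagios" { n++ } END { print n + 0 }'
}

# In the cron job (sketch):
#   zombies=$(ps -eo stat,comm | count_nagios_zombies)
#   [ "$zombies" -eq 0 ] || echo "WARNING: $zombies zombie nagios process(es)"
```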

Tom

-----Original Message-----
From: jeff vier [mailto:jeff.vier at tradingtechnologies.com]
Sent: 30 September 2003 17:51
To: nagios-users
Subject: [Nagios-users] check_nagios isn't very smart.


Okay, I tuned our nagios system, here.

With an increase in efficiency and "intelligence" there are a lot fewer
false alerts.

However, that in itself is causing another problem.

Since check_nagios depends on the log being updated to figure out if
nagios is running, it often thinks it's dead.  We can easily go an hour
without an update to the log file.

I fixed this by setting log_service_retries=1, but that seems
ridiculous.  Turning on what amounts to debugging to trick another
element of nagios.

So, my question is, is there another way to watch nagios that doesn't
cause me to have to pile tons of garbage into my filesystem?

Some things I was considering, and the reasons I haven't [yet?]:

option 1 - cron once per 1 min (and have a 2 min nagios_check max):
	if [ "`ps -ef | grep nagios | grep -v grep | wc -l`" -gt 2 ]; then echo "[`date +%s`] Heartbeat" >> nagios.log; fi

  problem - What about zombied processes?  I'm falsely assuming 1 or
more nagios processes means it's okay.

option 2 - change the nagios_check_command in cgi.cfg to use a script
with a bunch more logic, but basically use
'lynx -head -dump -auth=user:pwd \
"http://localhost/nagios/cgi-bin/extinfo.cgi?type=1&host=hostname"'

  problem - I'm depending on http, which I guess is okay, since if http
is failing, I'd be updating the nagios.log anyway with that error and
sending out alerts.  also, I have to re-invent the process with, so far,
unknown feasibility, and I don't have much time to waste if it turns out
this is a bad idea for reasons I didn't think of (hence my asking).

Thoughts?  If I do end up figuring out a new way to do it, I'll
certainly post it.


