How often does Nagios need restarting? (Quis custodiet ipsos custodes?)

Marc Powell marc at ena.com
Mon Jun 29 23:44:35 CEST 2009


On Jun 29, 2009, at 4:20 PM, Kustner, Tom wrote:

> 2. Thanks for pointing out that host checks are not always performed
> unless a service has been detected as failing.  I value the service
> checking, but I assumed it was also pinging the host on a regular basis
> and that is apparently not the case.  I come from the background of
> using products such as Insight Manager and OpenManage which are
> vendor-specific solutions that have their limitations but which
> automatically perform pinging on a regular basis.  I'll look at the
> documentation for information on getting that set for us.  It explains
> my frustration as to why a server can reboot and Nagios not detect it.


Word of warning - you *do not* want to enable regularly scheduled host  
checks under nagios-2.x. The current logic of only checking a host  
when a service is not OK is more than sufficient under normal  
circumstances. Enabling regularly scheduled checks under 2.x will only  
hurt your performance. While service checks can be done in parallel,  
host checks are done serially in that version. While a host is being  
checked, nagios stops *all other activity* until the host check  
completes; other checks, logging, notifications, everything.

To illustrate, if you have 200 hosts and each host check sends 5 pings  
(~5 seconds to complete), it will take 200 (hosts) x 5 (seconds) =  
1000 seconds just to check your host status. That's over 16 minutes  
during which nagios is only checking those hosts and not checking any  
of the services on those hosts, or sending notifications, or anything  
else.
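
For anyone who wants to plug in their own numbers, here is a rough
sketch of that arithmetic in Python. The function name is made up for
illustration, and it assumes every host check takes the same time and
that the checks run strictly one after another, as described above.

```python
def serial_host_check_seconds(hosts: int, seconds_per_check: float) -> float:
    """Total wall-clock time when host checks run one at a time."""
    return hosts * seconds_per_check


total = serial_host_check_seconds(200, 5)
print(f"{total:.0f} seconds, about {total / 60:.1f} minutes")
# prints "1000 seconds, about 16.7 minutes"
```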

Nagios-3.x implements parallel host checks, just like service checks,  
but even then regularly scheduled host checks aren't really needed or  
encouraged and are just a waste of resources that could be used for  
service checks, IMHO.

Even then, unless you're checking _very_ frequently, a modern server  
can easily reboot in the time between checks. I'd recommend using  
check_snmp as a service check to look at the SNMP-reported uptime and  
alert if it's less than a threshold based on your normal check  
interval (say 5-10 minutes typically).
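
The threshold logic might look roughly like the Python sketch below.
This is not check_snmp itself, just an illustration of the comparison;
the helper name and the 600-second default are assumptions. It relies
on two things that are standard: SNMP uptime (sysUpTime) is reported in
TimeTicks, i.e. hundredths of a second, and Nagios plugins signal OK
with exit code 0 and CRITICAL with exit code 2.

```python
# Hypothetical helper for illustration; not part of Nagios or check_snmp.
TICKS_PER_SECOND = 100  # SNMP TimeTicks count hundredths of a second

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def uptime_status(uptime_ticks: int, threshold_seconds: int = 600) -> int:
    """Return CRITICAL if the host has been up for less than the threshold."""
    uptime_seconds = uptime_ticks / TICKS_PER_SECOND
    return CRITICAL if uptime_seconds < threshold_seconds else OK


print(uptime_status(12_000))     # 120 s of uptime, below 600 s: prints 2
print(uptime_status(8_640_000))  # one day of uptime: prints 0
```

A host that rebooted between two service checks shows up as a short
uptime on the next check, even though every ping would have succeeded.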

--
Marc

