Uptime Calculation Question

Kevin Keane subscription at kkeane.com
Fri Feb 11 13:03:23 CET 2011


The trick is to choose carefully what you actually check. You probably don't want to run 5000 checks every five minutes, and you don't need to: one check per server, or a few at most, that tells you whether whatever you are monitoring is up should be enough for your SLA. Make sure that check is computationally very inexpensive, and you can safely run it once per minute.
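
To put rough numbers on that trade-off, here is a small back-of-the-envelope sketch in Python. The function names and the 30-day reporting period are my own assumptions, not anything Nagios provides; it only does the arithmetic for server load versus measurement resolution.

```python
def checks_per_second(num_checks: int, interval_minutes: float) -> float:
    """Average number of checks the Nagios server has to start per second."""
    return num_checks / (interval_minutes * 60)


def samples_per_period(interval_minutes: float, period_days: int = 30) -> int:
    """Number of check results one service produces over a reporting period.

    period_days=30 is an assumed SLA reporting window.
    """
    return int(period_days * 24 * 60 / interval_minutes)


def resolution_percent(interval_minutes: float, period_days: int = 30) -> float:
    """Smallest availability step, in percent, that one check result represents."""
    return 100.0 / samples_per_period(interval_minutes, period_days)


if __name__ == "__main__":
    # All 5000 service checks every five minutes, as in the original question.
    print(round(checks_per_second(5000, 5), 2))
    # One cheap check per minute: samples per 30 days and resolution in percent.
    print(samples_per_period(1), round(resolution_percent(1), 4))
```

Running all 5000 checks every five minutes works out to roughly 17 check executions per second, while a single one-minute check per service yields 43200 samples over 30 days, so each result stands for a little over 0.002% of the period.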

For instance, for a Web site, check_http is a fairly inexpensive check, depending on the options you use.

That said, you may also want to look at other tools. I haven't used it myself, but I hear that many people use Cacti for this type of higher-resolution monitoring/measuring.

A third option is to create your own agent that monitors something important - for instance, it could monitor the Web server log files and generate an alert if no new entries have been added for 20 seconds, or if it sees a 500 error, things like that. Such an agent can submit check results to Nagios as a passive check result, basically right as it occurs. Drawback: if the server as a whole is down, such an agent wouldn't report a problem. Advantage: such an agent can be crafted very specifically to measure whatever parameters your SLA defines.
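
A minimal sketch of what such an agent could look like, in Python. It assumes the agent runs on the Nagios server itself (or forwards the line there by some other means), that external commands are enabled, and that the command file lives at the path shown; that path, the host name "web01" and the service name "weblog" are placeholders of mine. The PROCESS_SERVICE_CHECK_RESULT external command is the standard way to hand Nagios a passive service result. How you obtain the time of the last log entry and detect a 500 error is left out here.

```python
import time

# Assumed location; it depends on how Nagios was installed and configured.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

OK, WARNING, CRITICAL = 0, 1, 2  # standard plugin return codes


def evaluate(last_entry_time, now, saw_500, max_idle_seconds=20):
    """Turn log observations into a (return_code, message) pair."""
    idle = now - last_entry_time
    if saw_500:
        return CRITICAL, "HTTP 500 seen in access log"
    if idle > max_idle_seconds:
        return CRITICAL, f"no new log entries for {idle:.0f}s"
    return OK, "log is active, no 500 errors"


def format_passive_result(host, service, code, message, now):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    return (f"[{int(now)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{message}\n")


def submit(line, command_file=COMMAND_FILE):
    """Write the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


if __name__ == "__main__":
    now = time.time()
    # Hypothetical observation: last log entry was 45 seconds ago, no 500 seen.
    code, message = evaluate(now - 45, now, saw_500=False)
    print(format_passive_result("web01", "weblog", code, message, now), end="")
    # submit(...) is not called here so the sketch runs without a Nagios server.
```

The service has to be defined in Nagios with passive checks enabled for the result to be accepted. Nagios' freshness checking (check_freshness with a freshness_threshold) may help with the drawback mentioned above, since it can flag a service whose agent has gone silent.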

-----Original Message-----
From: Breandan Dezendorf [mailto:breandan at dezendorf.com] 
Sent: Thursday, February 10, 2011 6:50 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Uptime Calculation Question

On Thu, Feb 10, 2011 at 9:12 PM, Yueh-Hung Liu <yuehung.liu at gmail.com> wrote:
> nothing will be known without checking.
> you want more precise data you have to do more checks, that is, 
> decrease the "check_interval" value.

And the lower you set the check_interval, the harder the servers have to work to keep up with all the checks.  While the servers we are running could very well run all 5000 service checks every 5 minutes (or even faster), it would chew up a lot of our growth capacity for the server.

--
Breandan Dezendorf
breandan at dezendorf.com
bwdezend at gmail.com

------------------------------------------------------------------------------
The ultimate all-in-one performance toolkit: Intel(R) Parallel Studio XE:
Pinpoint memory and threading errors before they happen.
Find and fix more than 250 security defects in the development cycle.
Locate bottlenecks in serial and parallel code that limit performance.
http://p.sf.net/sfu/intel-dev2devfeb
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null





