negative check latency with Nagios as VM?

Mike Hawley mike.hawley at nspire.co.nz
Mon Aug 20 09:33:18 CEST 2007


Does the comment not to run Nagios on VMWare also apply to a Distribution
server?

Thanks in advance - Mike 



-----Original Message-----
From: nagios-users-bounces at lists.sourceforge.net
[mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf Of Steve
Shipway
Sent: Monday, August 20, 2007 5:24 PM
To: Frost, Mark {PBG}
Cc: nagios-users at lists.sourceforge.net
Subject: Re: [Nagios-users] negative check latency with Nagios as VM?

We run a lot of VMWare here, although we're running our Nagios on a physical
box for performance reasons.  I've spent a lot of time researching how to
monitor virtual hosts and the potential pitfalls...

> We're testing our Nagios 2.9 implementation on a VMWare server.  This 
> box does have the VMWare tools installed and is running NTP to sync 
> time.

Linux under VMWare seems to work best if you let VMWare Tools synch the time
to the ESX server (which uses NTP to synch its own time).  If you run NTP on
a virtual host, it can sometimes get confused as vmware-tools will also
adjust the time.  Similarly, a Windows guest should rely on vmware-tools for
clock synch rather than anything else.

> The performance on this box seems a bit worse, but roughly comparable to
> our physical box.  (Oddly enough, Nagios restarts almost instantaneously
> on the VM where it takes around 20 seconds to respond to the web
> interface on the physical box...)

If your old box was Nagios 1.x then that's the reason.  Nagios 2 is much,
much faster in the web interface because it preparses and caches the
configuration. Another possibility could be that your virtual disk is held
partly in memory cache on the ESX server, speeding up initial access.

> at one point I saw the minimum check time at -2.00 seconds.  This means
> this VM is so fast that it's running checks before they're even
> scheduled!  Wow!

This is because your clock is getting skewed.  VMWare is not good for
anything that is timing-sensitive at a resolution finer than 1 minute,
because the guest clock hops about a bit due to the virtualisation.
Particularly when you're running ntp *and* vmware-tools, it can cause weird
behaviour as they fight over who is authoritative.
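
As a rough illustration of how a clock step produces a negative latency,
here is a minimal Python sketch.  It assumes latency is simply the check's
start timestamp minus its scheduled timestamp; the function name and the
numbers are made up for the example and are not taken from the Nagios
source.

```python
def check_latency(scheduled_time: float, actual_start_time: float) -> float:
    """Illustrative latency: how long after its due time a check started."""
    return actual_start_time - scheduled_time


# A check due at t=1000.0 that starts 0.25 s late on a steady clock.
steady = check_latency(1000.0, 1000.25)

# The same check, but the guest clock was stepped back 2.25 s just before
# it ran, so the start timestamp reads earlier than the scheduled time.
clock_step = -2.25  # hypothetical adjustment made by a time-sync daemon
skewed = check_latency(1000.0, 1000.25 + clock_step)

print(f"steady clock: {steady:+.2f} s")
print(f"after step:   {skewed:+.2f} s")
```

The check did not run early in real time; only the timestamps moved.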

> In any case, I was concerned about this.  My biggest worry with a VM is
> that it doesn't track the time well enough.

This is very much the case: a guest OS under VMWare will experience weird
clock behaviour.  This is why plugins like check_net, check_cpu, and
anything rate-based are pointless and actually misleading if run via NRPE in
a VM.  A plugin which queries SNMP to get a counter and then calculates its
own rate on a different (physical) server is fine, as long as the rate
calculation is not run in a VM.
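
To make the rate problem concrete, here is a small Python sketch of a
counter-delta rate calculation.  The function name and the figures are
hypothetical; the point is only that the same two counter samples give a
different rate when the elapsed time is measured by a clock that has drifted.

```python
def rate_per_second(count_a: int, time_a: float,
                    count_b: int, time_b: float) -> float:
    """Rate between two counter samples, using the sampler's own clock."""
    elapsed = time_b - time_a
    if elapsed <= 0:
        # A clock stepped backwards makes the interval meaningless.
        raise ValueError("non-positive interval; did the clock go backwards?")
    return (count_b - count_a) / elapsed


# 6,000,000 bytes transferred over a true interval of 60 seconds.
physical_rate = rate_per_second(0, 0.0, 6_000_000, 60.0)

# A guest whose clock lost 10 seconds in that interval sees only 50 seconds
# elapse between the same two samples, so it overstates the rate.
guest_rate = rate_per_second(0, 0.0, 6_000_000, 50.0)

print(f"physical server: {physical_rate:.0f} bytes/s")
print(f"skewed guest:    {guest_rate:.0f} bytes/s")
```

The counter values are identical in both cases; only the divisor is wrong,
which is why doing the calculation on a physical server avoids the problem.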

> Or perhaps I'm just associating this with a VM and it's just Nagios 
> itself.  Has anyone seen this before?

I've seen it before in checks run under VMWare.  If you want to check CPU
usage under VMWare, look at check_esx: I'm working with some people at
Bright House Networks on the new version to support ESX3.  The old version
works with ESX2.

In brief -
* Don't run NTPD and vmware-tools together
* Don't run check_cpu, check_net or check_memory for a guest
* Don't run any rate-based checks on a virtual machine
* Don't run Nagios under VMWare if you can avoid it

Hope this helps,

Steve

-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems?  Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >>  http://get.splunk.com/
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting
any issue. 
::: Messages without supporting info will risk being sent to /dev/null

