Distributed Monitoring with passive Host checks

Marc Powell mpowell at ena.com
Fri Aug 1 17:19:47 CEST 2003



> -----Original Message-----
> From: Thomas Fischer [mailto:thomas.fischer at quadriga.com]
> Sent: Friday, August 01, 2003 7:33 AM
> To: nagios-users at lists.sourceforge.net
> 
> Guys and Gals,
> 
> Here's the situation. We have multiple remote sites (currently over 400,
> but expanding to over 2000 in the next 3 years) which internally use
> exactly the same addressing scheme. Don't ask why the same addresses,
> but be assured that I already pointed a 45 Magnum at the idiot who
> designed this. Anyway, back to the problem.
> 
> I want to remotely monitor all hosts in every site (well, not all,
> because that would already mean about 100k hosts), but I don't have
> direct access into each site at the moment. I would also need to add a
> second firewall to open a VPN tunnel to our HQ, where the central server
> sits, and an additional firewall would cost money. I hear you all crying
> out already "why 2 firewalls?", but unfortunately the sites use PIX
> firewalls, which are a piece of shi* and can't do split tunnels and NAT
> at the same time. Thankfully, I will sooner or later be able to use a
> Nokia with CP FW-1 for each site.
> 
> How can I do passive host checks from the central server without
> spending loads of development time and loads of money, and without
> having loads of headaches? Has anybody done that already? Oh, and no, I
> can't wait a minimum of 12 months until Nagios 2.0 comes out.
> 
> Any ideas, pointers, etc. are highly welcome. If anybody has an idea,
> just contact me and I can pass on more details about the network setup.
> 

I do not envy your endeavor. You are talking about a truly incredible
number of hosts you want to monitor and I'm not aware of anything that
could efficiently handle what you are going to be asking it to do.
However -- here are some thoughts on distributed monitoring with Nagios
from a user's perspective.

You're going to need more than one central server accepting the passive
checks. I think 1 server for every 3000-4000 checks might be optimistic.
Your biggest problems are going to be the speed at which the central
server is able to process the checks, and the web interfaces. The
command pipe is only about 4K in size on Linux machines. That's not a
whole lot of room for check data, so the other check results have to
queue until Nagios has a chance to clear the pipe. As for the web
interfaces, the Summary views currently get really slow with a large
number of hosts/services (about 4 minutes for 1800 hosts and 2200
services for me). This is due to be corrected in 2.0, but I highly doubt
that enough optimization can be done to efficiently display 100K hosts
and services without a very significant overhaul.
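To make the command pipe point concrete, here is a minimal sketch of what a remote collector might do before writing to the central server's command file: format each passive service result in Nagios's external command syntax ("[timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;plugin_output") and group the lines into writes of at most 4096 bytes, so no single write splits a command. The 4096-byte limit (Linux's PIPE_BUF), the function names and the example host names are assumptions for illustration; the real command file path comes from your nagios.cfg.

```python
import time

# Assumption: writes of at most this many bytes to a pipe are atomic
# (PIPE_BUF is 4096 on Linux). Adjust for your platform.
MAX_WRITE = 4096


def format_result(host, service, code, output, ts=None):
    """Format one passive service check result as a Nagios external command."""
    if ts is None:
        ts = int(time.time())
    # The command file expects one command per line, so flatten any
    # multi-line plugin output.
    clean = output.replace("\n", " ")
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{clean}\n"


def batch_commands(lines, limit=MAX_WRITE):
    """Group command lines into chunks of at most `limit` bytes each.

    Lines are never split, so each write carries only whole commands.
    """
    batches, current, size = [], [], 0
    for line in lines:
        n = len(line.encode())
        if n > limit:
            raise ValueError("single command exceeds the write limit")
        if size + n > limit:
            batches.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        batches.append("".join(current))
    return batches


if __name__ == "__main__":
    results = [format_result(f"site{i:04d}-gw", "PING", 0, "PING OK")
               for i in range(200)]
    for chunk in batch_commands(results):
        # A real collector would write each chunk to the command file here.
        print(len(chunk.encode()), "bytes")
```

Batching only keeps each write whole; it does not make Nagios drain the pipe any faster, so the queueing described above still applies.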

Use fast disks and lots of RAM where you expect the most processing to
be done. The faster/more the merrier.

Config file management -- create a database to store the host/service
information, then use Perl scripts to generate the specific host and
service configs for each device. That'll save you time in the long run.
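The post suggests Perl for this; purely as an illustration of the same idea, here is a small Python sketch that renders Nagios host definitions from a database table. The "devices" table, its columns and the "generic-host" template name are hypothetical. Because every site reuses the same addresses, the sketch prefixes each host_name with its site ID so names stay unique on the central server.

```python
import sqlite3

# Hypothetical layout: one row per monitored device, with the template it uses.
HOST_TEMPLATE = """define host{{
        use                     {template}
        host_name               {site}-{name}
        alias                   {name} at {site}
        address                 {address}
        }}
"""


def generate_host_configs(conn):
    """Render one Nagios host definition per row of the devices table."""
    rows = conn.execute(
        "SELECT site, name, address, template FROM devices ORDER BY site, name"
    )
    return "".join(
        HOST_TEMPLATE.format(site=s, name=n, address=a, template=t)
        for s, n, a, t in rows
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE devices (site TEXT, name TEXT, address TEXT, template TEXT)"
    )
    conn.executemany(
        "INSERT INTO devices VALUES (?, ?, ?, ?)",
        [("site0001", "gw", "10.0.0.1", "generic-host"),
         ("site0002", "gw", "10.0.0.1", "generic-host")],
    )
    print(generate_host_configs(conn))
```

Everything that is not device-specific (check attempts, notification settings, and so on) is left to the template named in "use", which keeps the generated files small.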

Template inheritance, template inheritance, template inheritance. Say it
once more.
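As a rough mental model of why templates save so much work, the toy Python sketch below resolves an object's attributes by following a single "use" chain, with locally defined values overriding inherited ones and the template bookkeeping attributes ("name", "use", "register") not being inherited. This is not Nagios's own code; the template names and attribute values are made up for illustration.

```python
# Hypothetical templates; register 0 marks them as templates, not real hosts.
TEMPLATES = {
    "generic-host": {"name": "generic-host", "register": "0",
                     "max_check_attempts": "3", "notifications_enabled": "1"},
    "site-host": {"name": "site-host", "use": "generic-host", "register": "0",
                  "max_check_attempts": "1"},
}


def resolve(obj, templates, _seen=None):
    """Return obj's attributes with values inherited via its `use` chain."""
    seen = set() if _seen is None else _seen
    merged = {}
    parent = obj.get("use")
    if parent is not None:
        if parent in seen:
            raise ValueError(f"template loop at {parent!r}")
        seen.add(parent)
        merged = resolve(templates[parent], templates, seen)
    # Template bookkeeping attributes are not passed down.
    for key in ("name", "use", "register"):
        merged.pop(key, None)
    # Locally defined attributes override inherited ones.
    merged.update(obj)
    return merged


if __name__ == "__main__":
    host = {"use": "site-host", "host_name": "site0001-gw", "address": "10.0.0.1"}
    print(resolve(host, TEMPLATES))
```

With 100K objects, changing a value like max_check_attempts in one template instead of in every generated definition is where the savings come from.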

Disable host checking or limit it to a single ping, at least with 1.x.
Nagios gets very aggressive when it thinks a host is down and will
effectively stop everything else it is doing until it has verified the
status of the host. I think that in your case you'll need to disable it
altogether since 1.x doesn't yet support passive host checks and your
central servers will try to verify in addition to the remote collectors.

Bandwidth -- Just reporting 100K check results back to the central
servers is going to require roughly 0.27 Mbit/s (assuming 100 bytes per
check at 5-minute intervals: 100,000 x 100 bytes x 8 bits / 300 s).
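The same estimate as a small Python function, so the inputs can be varied. The 100 bytes per result is the post's assumption, and the figure ignores TCP, VPN and retransmission overhead, so treat it as a floor rather than a sizing number.

```python
def report_bandwidth_mbit(checks, bytes_per_check=100, interval_s=300):
    """Average bandwidth (Mbit/s) to ship every check result once per interval."""
    bits_per_interval = checks * bytes_per_check * 8
    return bits_per_interval / interval_s / 1_000_000


if __name__ == "__main__":
    for interval in (300, 600):
        rate = report_bandwidth_mbit(100_000, interval_s=interval)
        print(f"{interval} s interval: {rate:.3f} Mbit/s")
```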

I'm sure there's lots more, but those come to mind immediately.
Also, the above comments apply to reasonable check intervals. 100K
checks every 5 or 10 minutes is significantly more intense than 100K
checks over a week.
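To put the interval comment and the earlier "1 server for every 3000-4000 checks" guess into numbers, here is a back-of-the-envelope sketch. The per-server capacity is only the rough estimate from this post, not a measured figure.

```python
import math

WEEK_S = 7 * 24 * 3600


def checks_per_second(checks, interval_s):
    """Sustained rate at which results arrive if every check runs once per interval."""
    return checks / interval_s


def servers_needed(checks, checks_per_server):
    """Central servers required at a given (estimated) per-server capacity."""
    return math.ceil(checks / checks_per_server)


if __name__ == "__main__":
    for label, interval in (("5 min", 300), ("10 min", 600), ("1 week", WEEK_S)):
        print(f"{label}: {checks_per_second(100_000, interval):.2f} results/s")
    print("servers:", servers_needed(100_000, 4000), "to",
          servers_needed(100_000, 3000))
```

At a 5-minute interval that works out to over 300 results per second arriving at the central side, versus well under one per second when spread over a week.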

--
marc


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null