Monitoring tool for a large enterprise? Is Nagios suitable to any degree?

Arno Lehmann al at its-lehmann.de
Thu Jun 2 13:52:59 CEST 2005


Hello,

Ralf Strandell wrote:

> Hi,
> 
> I'm new to Nagios and currently evaluating its suitability for my professional needs.

Fine.

> What I need to monitor:
> ------------------------------------------------------
> 1) NETWORK
> I need to monitor hundreds of Juniper/Cisco/Microsoft/other devices including routers, switches, firewalls, vpn gateways, dsl/isdn, dns-servers, dhcp-servers and uninterruptible power supplies using SNMP.

Ok.

> I need to know about connectivity, reboots, uptime, cpu load, memory, bandwidth utilization (octets/time and % of max), traffic distribution *changes* (by protocol, by port), up/down interfaces and VPN tunnels, routing *changes* and alarms (traps) and UPS battery and electricity status.

Ok, as long as you are prepared to invest some effort, including doing
some programming on your own.
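As a rough illustration of the "programming on your own" part, here is a minimal Python sketch of the arithmetic behind "% of max" bandwidth utilization: two samples of an interface octet counter (such as ifInOctets) taken some seconds apart, converted to bits per second and divided by the interface speed, then mapped to the usual Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). Fetching the counters over SNMP and keeping the previous sample between runs are left out; the counter is assumed to be a 32-bit Counter32 that wraps at most once between samples, and the function names and the `util` performance-data label are hypothetical.

```python
from typing import Tuple

# Hypothetical helpers, not part of Nagios or its plugins: compute interface
# utilization from two octet-counter samples and map it to a plugin status.

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# ifInOctets/ifOutOctets are 32-bit counters that wrap back to 0.
COUNTER32_MODULUS = 2**32


def octet_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Octets counted between two samples, assuming at most one wrap."""
    return (current - previous) % modulus


def utilization_percent(previous: int, current: int, seconds: float, if_speed_bps: int) -> float:
    """Average bits per second over the interval as a percentage of ifSpeed."""
    bits = octet_delta(previous, current) * 8
    return 100.0 * bits / (seconds * if_speed_bps)


def classify(percent: float, warn: float, crit: float) -> int:
    """Map a utilization percentage onto a plugin status."""
    if percent >= crit:
        return CRITICAL
    if percent >= warn:
        return WARNING
    return OK


def plugin_result(percent: float, warn: float, crit: float) -> Tuple[int, str]:
    """Return (exit code, output line); text after '|' is performance data."""
    status = classify(percent, warn, crit)
    text = f"BANDWIDTH {STATUS_NAMES[status]} - {percent:.1f}% of link used"
    perfdata = f"util={percent:.1f}%;{warn};{crit};0;100"
    return status, f"{text} | {perfdata}"


if __name__ == "__main__":
    # 75,000,000 octets in 60 s on a 100 Mbit/s link is 10% utilization.
    pct = utilization_percent(0, 75_000_000, 60, 100_000_000)
    print(plugin_result(pct, 80, 90))
```

The modulo in `octet_delta` is what makes a single counter wrap harmless; on fast links polled slowly a 32-bit counter can wrap more than once, which this sketch cannot detect.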

> 
> 2) HOSTS
> I need to monitor server parameters including connectivity, reboots, uptime, cpu load, memory, swapping, disk I/O, disk space and services.

Ok.

> 
> 3) APPLICATIONS
> Including databases (deadlocks, logs, free space...) and everything one could find in a modern big enterprise's data center.

Ok, but this might also require some programming on your part.

> 
> 4) SERVICE LEVEL
> I also want to monitor the point-to-point bandwidth and response time (ping roundtrip, http response, general tcp connect, database connections).

Ok, to my knowledge, although I have never used those checks myself.
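The standard Nagios plugins already cover much of this (check_ping, check_tcp and check_http, for instance), so custom code is mainly needed where they fall short. To show how small such a plugin can be, here is a minimal Python sketch of a TCP connect response-time check: it times the handshake with `socket.create_connection` and compares the result with warning and critical thresholds in seconds. The function names, thresholds and output wording are assumptions for illustration, not the behaviour or output format of the real check_tcp.

```python
import socket
import sys
import time
from typing import Tuple

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2


def tcp_connect_time(host: str, port: int, timeout: float) -> float:
    """Seconds needed to complete a TCP handshake with host:port."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it again immediately
    return time.monotonic() - start


def check_tcp_response(host: str, port: int, warn: float, crit: float,
                       timeout: float = 10.0) -> Tuple[int, str]:
    """Hypothetical check: return (exit code, output line) for one connect."""
    try:
        elapsed = tcp_connect_time(host, port, timeout)
    except OSError as exc:  # refused, unreachable, timed out, name lookup failed
        return CRITICAL, f"TCP CRITICAL - {host}:{port} not reachable ({exc})"
    if elapsed >= crit:
        status, name = CRITICAL, "CRITICAL"
    elif elapsed >= warn:
        status, name = WARNING, "WARNING"
    else:
        status, name = OK, "OK"
    return status, (f"TCP {name} - {elapsed:.3f} s to connect to {host}:{port}"
                    f" | time={elapsed:.6f}s;{warn};{crit};0")


if __name__ == "__main__" and len(sys.argv) == 5:
    # Usage: check_tcp_response.py HOST PORT WARN_SECONDS CRIT_SECONDS
    code, line = check_tcp_response(sys.argv[1], int(sys.argv[2]),
                                    float(sys.argv[3]), float(sys.argv[4]))
    print(line)
    sys.exit(code)
```

Treating every connection failure as CRITICAL is a design choice of this sketch; some setups would rather report a failed name lookup as UNKNOWN.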
> 
> 5) BUSINESS PROCESSES (ABSTRACTION LAYER)
> I want some basic root cause analysis capability (i.e. unreachable vs. down) and an abstraction layer between polls/traps and alerts. I want to define compound events that happen when several events coincide. Examples: Disk is more than 90% full for more than 10 minutes. Primary network connection has been lost for more than 10 minutes, or both the primary and backup connections have been down for more than three minutes. Event A and B happened, but not C, and all this has lasted for more than 10 minutes, and it is not Sunday between 1am and 2am. These rules/scripts/compound events are important for my monitoring needs. I need to monitor a big enterprise with several data centers, a complicated network topology and business systems comprising several servers working together.

Well, this might need some work, but since Nagios can be extended with
any sort of event handler scripts and you can access the (relatively)
raw data in its logs, this should be possible.
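To make the "compound event" idea concrete, here is a minimal Python sketch of two of the rules quoted above, evaluated each time fresh check results arrive (for example from an event handler script; how the results get there is left open). The building block is a condition that must have held continuously for longer than a given duration. The class names, the polling model, and the convention that the Sunday exclusion runs from 01:00 up to but not including 02:00 local time are assumptions, not anything Nagios provides.

```python
from datetime import datetime, timedelta
from typing import Optional


class SustainedCondition:
    """True once a condition has held continuously for longer than `duration`."""

    def __init__(self, duration: timedelta) -> None:
        self.duration = duration
        self.since: Optional[datetime] = None  # when the condition became true

    def update(self, holds: bool, now: datetime) -> bool:
        if not holds:
            self.since = None  # any interruption restarts the clock
            return False
        if self.since is None:
            self.since = now
        return now - self.since > self.duration


def in_sunday_window(now: datetime) -> bool:
    """Sunday (weekday() == 6) from 01:00 up to, not including, 02:00."""
    return now.weekday() == 6 and now.hour == 1


class AbcRule:
    """A and B happened, not C, for more than 10 minutes, outside the window."""

    def __init__(self) -> None:
        self.sustained = SustainedCondition(timedelta(minutes=10))

    def update(self, a: bool, b: bool, c: bool, now: datetime) -> bool:
        fired = self.sustained.update(a and b and not c, now)
        return fired and not in_sunday_window(now)


class ConnectivityRule:
    """Primary link down > 10 min, or primary and backup both down > 3 min."""

    def __init__(self) -> None:
        self.primary_down = SustainedCondition(timedelta(minutes=10))
        self.both_down = SustainedCondition(timedelta(minutes=3))

    def update(self, primary_up: bool, backup_up: bool, now: datetime) -> bool:
        # Update both trackers on every sample so neither loses its history.
        long_outage = self.primary_down.update(not primary_up, now)
        total_outage = self.both_down.update(not primary_up and not backup_up, now)
        return long_outage or total_outage
```

Each rule only sees the samples it is fed, so its resolution is the check interval: with 5-minute checks, "more than 10 minutes" is first detected at the 15-minute sample.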
> ----------------------------------------------------------------
> 
> Plus...
> 
> All this collected or deduced data should be stored in an event database and used for history reports, snapshot reports, service level reports, trends/graphics,... everything. Naturally it needs a web user interface with at least two user levels (admin, monitor) and several views (network view, business view...). Flexibility and manageability are more important than instant ease of use.

Ok, as far as I know. I have never tried this myself, but you can
process the check output and insert it into a database.
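As a sketch of that idea, assuming a script is handed the host name, the service description and one line of plugin output (which Nagios command feeds it is left open here), the following Python code splits the line at "|" into status text and performance data, parses each `label=value[unit];warn;crit;min;max` item, and stores the values in a SQLite table from which history and trend reports could be queried. SQLite, the table layout and the function names are assumptions; quoted labels containing spaces are not handled.

```python
import re
import sqlite3
from datetime import datetime
from typing import List, Tuple

# Hypothetical table for check results; one row per performance-data value.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metrics (
    ts      TEXT NOT NULL,   -- ISO 8601 time the check result was stored
    host    TEXT NOT NULL,
    service TEXT NOT NULL,
    label   TEXT NOT NULL,
    value   REAL NOT NULL,
    uom     TEXT NOT NULL    -- unit of measure, e.g. 'ms', '%', 'MB'
);
"""

VALUE_WITH_UNIT = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_plugin_output(line: str) -> Tuple[str, List[Tuple[str, float, str]]]:
    """Split 'text | perfdata' and return (text, [(label, value, unit), ...])."""
    text, _, perfdata = line.partition("|")
    metrics = []
    for item in perfdata.split():
        label, sep, rest = item.partition("=")
        if not sep:
            continue  # not a label=value item; ignore it
        match = VALUE_WITH_UNIT.match(rest.split(";")[0])
        if match:
            metrics.append((label, float(match.group(1)), match.group(2)))
    return text.strip(), metrics


def store_result(conn: sqlite3.Connection, host: str, service: str,
                 line: str, when: datetime) -> int:
    """Insert all performance values from one plugin output line."""
    _, metrics = parse_plugin_output(line)
    rows = [(when.isoformat(), host, service, label, value, uom)
            for label, value, uom in metrics]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO metrics VALUES (?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

A trend or service-level report is then an ordinary SQL query over `metrics`, for example averaging `value` per host and day.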
> 
> These requirements rule out about 100% of the monitoring tools I have found. Please help. I'm lost.

:-)

> This would be used as a professional monitoring tool. It's a day job. Usually 5 x 8hrs, so I don't need anything "simple yet powerful". It can also cost a bit. It can be hard to learn - no problem. So, do I have any other choice than HP OpenView or Tivoli Enterprise Console? What can nagios do for me?

Well, looking at your needs and your job description (and budget) I'd 
say it like this: Nagios can provide an extensive framework for your 
higher-level needs and supply almost all of the basic functionality.

Now, I'm not billing you as a consultant :-) but after such a short
description I'd say that, given the above, it should be possible to
implement something that offers everything you need. You will need to
spend some time studying Nagios, discussing with the developers, and
doing some scripting yourself. You will need to set up a number of
monitoring hosts to distribute the workload (depending, of course, on
the temporal detail you need). Assuming you work alone, you will need
about two months for the basic monitoring and load distribution, and
after that comes the setup of compound events and the sort of reporting
you need. You will need a management that understands that its requests
need funding, time, and discussion.
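The "temporal detail" trade-off can be put into rough numbers: the check rate a setup has to sustain is the number of services divided by the check interval, and the number of monitoring hosts follows from how many checks per second one host can comfortably run. The Python sketch below does only that arithmetic; the per-host capacity and the 50% headroom are hypothetical parameters that would have to be measured on real hardware with real checks, and the example figures are made up.

```python
import math


def checks_per_second(services: int, interval_seconds: float) -> float:
    """Average check rate needed to run every service once per interval."""
    return services / interval_seconds


def monitoring_hosts_needed(services: int, interval_seconds: float,
                            capacity_per_host: float, headroom: float = 0.5) -> int:
    """Hosts needed if each runs at `headroom` fraction of its measured capacity."""
    rate = checks_per_second(services, interval_seconds)
    usable = capacity_per_host * headroom
    return max(1, math.ceil(rate / usable))


if __name__ == "__main__":
    # Made-up figures: 3000 services, hosts measured at 20 checks/s.
    for interval in (300, 60):
        print(interval, checks_per_second(3000, interval),
              monitoring_hosts_needed(3000, interval, capacity_per_host=20))
```

With those made-up figures, checking every 5 minutes needs 10 checks per second (one host), while checking every minute needs 50 (five hosts): shortening the interval multiplies the load directly.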

In the end, you will have done a lot of the necessary work yourself, but
you will have something that fits your business, that was most probably
less expensive in license costs, and that does not require more support
than something commercial. And, even better, you will have contributed
valuable experience and work to all Nagios users :-)

To sum up: I think you can use Nagios. Try it on one test machine and
set up some monitoring. Implement reports, notifications, and
distributed monitoring in small steps. Spend the necessary time and
money. Read and use the mailing lists. After a relatively short time
you will be able to decide for yourself.

Arno


-- 
IT-Service Lehmann                    al at its-lehmann.de
Arno Lehmann                  http://www.its-lehmann.de


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




