Monitoring tool for a large enterprise? Is Nagios suitable to any degree?

Ralf Strandell ralf.strandell at auriamail.net
Thu Jun 2 13:21:35 CEST 2005


Hi,

I'm new to Nagios and currently evaluating its suitability for my professional needs.

I have searched the internet for monitoring tools, but almost everything I can find seems to belong to the "simple yet powerful" category. I need something better. I have been using a heavily extended Big Brother monitoring system, and it is not flexible or powerful enough. Nagios might do. I don't know. I don't know how well Nagios works with MIBs and SNMP traps, or whether it supports compound events etc. The documentation is extensive, but it's hard to find the relevant information.

Thus I need to ask you. Sorry for a long email...

What I need to monitor:
------------------------------------------------------
1) NETWORK
I need to monitor hundreds of Juniper/Cisco/Microsoft/other devices including routers, switches, firewalls, VPN gateways, DSL/ISDN, DNS servers, DHCP servers and uninterruptible power supplies using SNMP.

I need to know about connectivity, reboots, uptime, CPU load, memory, bandwidth utilization (octets/time and % of max), traffic distribution *changes* (by protocol, by port), up/down interfaces and VPN tunnels, routing *changes*, alarms (traps), and UPS battery and electricity status.
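
As a rough illustration of the bandwidth figures meant here (octets/time and % of max), the calculation from two SNMP counter samples might look like the sketch below. It assumes the poller already supplies two readings of an octet counter such as ifInOctets and the interface speed in bits per second (ifSpeed); the function name and parameters are hypothetical, and the wrap handling assumes the counter wrapped at most once between samples.

```python
def utilization(octets_prev, octets_now, seconds, if_speed_bps, counter_bits=32):
    """Return (octets per second, percent of interface capacity).

    octets_prev / octets_now: two readings of an octet counter (assumed
    to come from an SNMP poll, e.g. ifInOctets).
    if_speed_bps: interface speed in bits per second (e.g. ifSpeed).
    """
    if seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("seconds and if_speed_bps must be positive")
    # The modulo absorbs a single counter wrap between the two samples.
    delta = (octets_now - octets_prev) % (2 ** counter_bits)
    octets_per_sec = delta / seconds
    # 8 bits per octet; result is a percentage of the nominal speed.
    percent = 100.0 * octets_per_sec * 8 / if_speed_bps
    return octets_per_sec, percent


if __name__ == "__main__":
    rate, pct = utilization(1000, 3_751_000, 300, 100_000_000)
    print(f"{rate:.0f} octets/s, {pct:.2f}% of max")
```

On fast links a 32-bit counter can wrap more than once per polling interval, so a real setup would likely poll more often or use 64-bit counters.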

2) HOSTS
I need to monitor server parameters including connectivity, reboots, uptime, CPU load, memory, swapping, disk I/O, disk space and services.

3) APPLICATIONS
Including databases (deadlocks, logs, free space...) and everything else one could find in a modern big enterprise's data center.

4) SERVICE LEVEL
I also want to monitor point-to-point bandwidth and response time (ping round trip, HTTP response, general TCP connect, database connections).

5) BUSINESS PROCESSES (ABSTRACTION LAYER)
I want some basic root cause analysis capability (i.e. unreachable vs. down) and an abstraction layer between polls/traps and alerts. I want to define compound events that fire when several events coincide. Examples: A disk has been more than 90% full for more than 10 minutes. The primary network connection has been lost for more than 10 minutes, or both the primary and backup connections have been down for more than three minutes. Events A and B happened, but not C, all this has lasted for more than 10 minutes, and it is not Sunday between 1am and 2am. These rules/scripts/compound events are important for my monitoring needs. I need to monitor a big enterprise with several data centers, a complicated network topology and business systems consisting of several servers working together.
----------------------------------------------------------------
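
To make the compound events under 5) more concrete, here is a minimal sketch of the primary/backup connection example combined with the Sunday 1am to 2am exclusion. It is only an illustration of the kind of rule meant, not a feature of any particular tool; the names Sustained, LinkRule and in_quiet_window are hypothetical, and the poll results are assumed to be passed in by whatever does the actual checking.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Sustained:
    """Tracks whether a condition has held continuously for more than `hold`."""
    hold: timedelta
    since: Optional[datetime] = None  # when the condition last became true

    def update(self, active: bool, now: datetime) -> bool:
        if not active:
            self.since = None  # condition cleared, restart the clock
            return False
        if self.since is None:
            self.since = now
        return now - self.since > self.hold


def in_quiet_window(now: datetime) -> bool:
    # Sunday between 01:00 and 02:00 (weekday(): Monday is 0, Sunday is 6).
    return now.weekday() == 6 and now.hour == 1


class LinkRule:
    """Primary down > 10 min, or primary and backup both down > 3 min."""

    def __init__(self) -> None:
        self.primary_down = Sustained(timedelta(minutes=10))
        self.both_down = Sustained(timedelta(minutes=3))

    def evaluate(self, primary_up: bool, backup_up: bool, now: datetime) -> bool:
        # Update both timers on every poll before combining them, so that
        # short-circuiting never skips a state update.
        a = self.primary_down.update(not primary_up, now)
        b = self.both_down.update(not primary_up and not backup_up, now)
        return (a or b) and not in_quiet_window(now)


if __name__ == "__main__":
    rule = LinkRule()
    t0 = datetime(2005, 6, 2, 12, 0)
    print(rule.evaluate(False, True, t0))
    print(rule.evaluate(False, True, t0 + timedelta(minutes=11)))
```

The same Sustained helper would cover the "disk more than 90% full for more than 10 minutes" case by feeding it the result of the threshold comparison on each poll.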

Plus...

All this collected or deduced data should be stored in an event database and used for history reports, snapshot reports, service level reports, trends/graphics... everything. Naturally it needs a web user interface with at least two user levels (admin, monitor) and several views (network view, business view...). Flexibility and manageability are more important than instant ease of use.

These requirements rule out about 100% of the monitoring tools I have found. Please help. I'm lost.

This would be used as a professional monitoring tool. It's a day job. Usually 5 x 8hrs, so I don't need anything "simple yet powerful". It can also cost a bit. It can be hard to learn - no problem. So, do I have any choice other than HP OpenView or Tivoli Enterprise Console? What can Nagios do for me?




