Monitoring tool for a large enterprise? Is Nagios suitable to any degree?

Subhendu Ghosh sghosh at sghosh.org
Thu Jun 2 15:37:42 CEST 2005


On Thu, 2 Jun 2005, Ralf Strandell wrote:

> Hi,
>
> I'm new to Nagios and currently evaluating its suitability for my 
> professional needs.
>
> I have searched the internet for monitoring tools, but almost everything 
> I can find seems to belong to the "simple yet powerful" category. I need 
> something better. I have been using a heavily extended Big Brother 
> monitoring system and it is not flexible and powerful enough. Nagios 
> might do. I don't know. I don't know how well Nagios works with MIBs and 
> SNMP traps or whether it supports compound events etc. The documentation 
> is extensive, but it's hard to find the relevant information.
>
> Thus I need to ask you. Sorry for a long email...
>
> What I need to monitor:
> ------------------------------------------------------
> 1) NETWORK
> I need to monitor hundreds of Juniper/Cisco/Microsoft/other devices 
> including routers, switches, firewalls, vpn gateways, dsl/isdn, 
> dns-servers, dhcp-servers and uninterruptible power supplies using SNMP.
>
> I need to know about connectivity, reboots, uptime, cpu load, memory, 
> bandwidth utilization (octets/time and % of max), traffic distribution 
> *changes* (by protocol, by port), up/down interfaces and VPN tunnels, 
> routing *changes* and alarms (traps) and UPS battery and electricity 
> status.
>
> 2) HOSTS
> I need to monitor server parameters including connectivity, reboots, 
> uptime, cpu load, memory, swapping, disk I/O, diskspace and services.
>
> 3) APPLICATIONS
> Including databases (deadlocks, logs, free space...) and everything one 
> could find in a modern big enterprise's data center.
>
> 4) SERVICE LEVEL
> I also want to monitor the point-to-point bandwidth and response time 
> (ping roundtrip, http response, general tcp connect, database 
> connections).
>
> 5) BUSINESS PROCESSES (ABSTRACTION LAYER)
> I want some basic root cause analysis capability (i.e. unreachable vs. 
> down) and an abstraction layer between polls/traps and alerts. I want to 
> define compound events that happen when several events coincide. 
> Examples: Disk is more than 90% full for more than 10 minutes. Primary 
> network connection has been lost for more than 10 minutes or both the 
> primary and backup connections have been down for more than three 
> minutes. Event A and B happened, but not C, and all this has lasted for 
> more than 10 minutes and it is not Sunday between 1am and 2am. These 
> rules/scripts/compound events are important for my monitoring needs. I 
> need to monitor a big enterprise with several data centers, complicated 
> network topology and business systems consisting of several servers 
> working together.
> ----------------------------------------------------------------
>
> Plus...
>
> All this collected or deduced data should be stored in an event 
> database and used for history reports, snapshot reports, service level 
> reports, trends/graphics,... everything. Naturally it needs a web user 
> interface with at least two user levels (admin, monitor) and several 
> views (network view, business view...). Flexibility and manageability 
> are more important than instant ease of use.
>
> These requirements rule out about 100% of the monitoring tools I have 
> found. Please help. I'm lost.
>
> This would be used as a professional monitoring tool. It's a day job. 
> Usually 5 x 8hrs, so I don't need anything "simple yet powerful". It 
> can also cost a bit. It can be hard to learn - no problem. So, do I have 
> any other choice than HP OpenView or Tivoli Enterprise Console? What can 
> Nagios do for me?
>
>
>
>

Even with HPOV you will need additional tool sets to reach the level of 
detail you are looking for.  For most of these categories there are 
best-of-breed applications.

For host and network fault monitoring - Nagios
For link traffic/app response times  - MRTG/Cricket/Cacti with alerts to 
Nagios
For link ping response times - SmokePing
For protocol distribution - Netflow from routers/ntop and FlowScan
For routing changes - custom plugin that looks at nexthop for specific 
routes
For router config management - RANCID  - could probably define an 
alert/trap to feed into Nagios
For business process abstraction - service dependencies in Nagios are great 
- they are pretty close to HPOV's service manager in design functions
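
The "custom plugin that looks at nexthop" idea from the list above could be 
sketched roughly as follows. This is only an illustration, not an existing 
plugin: the function name check_next_hops and the sample routes in the usage 
note are made up, and how the observed next hops are collected from the 
router (for example over SNMP) is deliberately left out. The one fixed part 
is the Nagios plugin convention of a one-line status message plus exit code 
0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_next_hops(expected, observed):
    """Compare observed next hops against the expected ones.

    expected / observed: dict mapping route prefix -> next-hop address.
    Returns (exit_code, one-line status message).
    Hypothetical helper: collecting `observed` from the router is not shown.
    """
    if not observed:
        return UNKNOWN, "ROUTES UNKNOWN - no routing data available"

    # Routes we care about that are gone from the routing table.
    missing = sorted(p for p in expected if p not in observed)
    # Routes that are present but point at a different next hop.
    changed = sorted(p for p in expected
                     if p in observed and observed[p] != expected[p])

    if missing:
        return CRITICAL, "ROUTES CRITICAL - missing: " + ", ".join(missing)
    if changed:
        details = ", ".join(
            f"{p} via {observed[p]} (expected {expected[p]})" for p in changed
        )
        return WARNING, "ROUTES WARNING - changed: " + details
    return OK, f"ROUTES OK - {len(expected)} route(s) use the expected next hop"
```

A wrapper script would print the returned message and exit with the returned 
code, e.g. for expected = {"0.0.0.0/0": "192.0.2.254"} and an observed next 
hop of 192.0.2.253 the result is a WARNING. Whether a changed next hop 
should be WARNING or CRITICAL is a local policy choice.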

You will probably want to write your own web interface to integrate the 
various data sets.
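
On the compound-event examples in the question (primary link down for more 
than 10 minutes, or both links down for more than three minutes), the logic 
itself is small enough to script outside the monitoring tool. The sketch 
below is not a Nagios feature; the class Condition and the function 
link_alert are hypothetical names, and it assumes something else feeds in 
the up/down results with timestamps.

```python
from datetime import datetime, timedelta


class Condition:
    """Tracks how long a boolean condition has been continuously true."""

    def __init__(self):
        self.since = None  # time the condition last became true, or None

    def update(self, is_true, now):
        """Record the latest check result taken at time `now`."""
        if is_true:
            if self.since is None:
                self.since = now  # start of a new "true" period
        else:
            self.since = None  # condition cleared, reset the timer

    def held_for(self, duration, now):
        """True if the condition has been true for more than `duration`."""
        return self.since is not None and now - self.since > duration


def link_alert(primary_down, backup_down, now):
    """Primary down > 10 min, or primary and backup both down > 3 min."""
    long_outage = primary_down.held_for(timedelta(minutes=10), now)
    both_down = (primary_down.held_for(timedelta(minutes=3), now)
                 and backup_down.held_for(timedelta(minutes=3), now))
    return long_outage or both_down
```

Each poll result goes into update(), and link_alert() is evaluated after 
every poll. A time-window exclusion such as "not Sunday between 1am and 
2am" would be one more boolean test on `now` in the same function.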


-- 

-sg


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users