Help with large scale planning

Mark Potter mpotter at x-iss.com
Tue May 20 16:20:39 CEST 2008


Hello List,

It has been a while since I was able to post regularly, having been
given the opportunity to seek other employment! I have landed,
gracefully, in a position where I have been tasked with designing a
large-scale Nagios installation. The requirements all come from the
client and are essentially mandatory. I need a little advice on where to
start. I will describe the environment and then lay out how I see the
design coming together.

The environment is an HPCC environment, and the requirements are based
almost exclusively on that. Initially the monitoring will cover only the
clusters; it will later expand to other servers outside the HPCC
environment.

1. No client installed on compute nodes (there is an HA head node where
a client or a full install could be done).
2. No active checks directly to compute nodes
3. Ganglia is available for node data

That covers most of the requirements. Ganglia makes things a bit
easier, but I am not sure how much easier. It looks like GroundWork could
handle this, but I don't see the large-scale features in the open source
version.

The environment is as follows:

1. 80 clusters
2. Each cluster has 70-72 compute nodes (roughly 5,600-5,760 in total)


The client wants a single point of monitoring for this environment. I am
looking at the following setup:

Use the Ganglia plugin from Nagios Exchange on the HA head node to
gather and parse the data, and have it report back to a main Nagios
server (HA) that acts as the single point of monitoring.
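
As a rough illustration of the "report back" step, the head node could
turn each Ganglia value into a passive check result and hand it to
send_nsca, which reads tab-delimited lines (host, service, return code,
output) on standard input. This is only a sketch: the thresholds, the
service name "Load", and the helper names are placeholders, not part of
any existing plugin.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def load_state(load_one, warn=8.0, crit=16.0):
    """Map a 1-minute load average to a Nagios state.

    The warn/crit defaults are placeholders, not recommended values.
    """
    if load_one >= crit:
        return CRITICAL
    if load_one >= warn:
        return WARNING
    return OK


def nsca_line(host, service, state, output):
    """Format one passive service result as a send_nsca input line."""
    return "\t".join([host, service, str(state), output]) + "\n"


def results_for_cluster(loads):
    """Build send_nsca input for a {host_name: load_one} mapping."""
    names = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}
    lines = []
    for host in sorted(loads):
        state = load_state(loads[host])
        output = "%s - load_one=%.2f" % (names[state], loads[host])
        lines.append(nsca_line(host, "Load", state, output))
    return "".join(lines)
```

The resulting text would be piped to send_nsca on the head node, so the
central server only receives passive results and nothing contacts the
compute nodes directly.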

What I don't know is how Nagios 3.x will scale with ~5000-6000 hosts
reporting into a single point of monitoring. The checks must not cause
any degradation in the HPCC environment. Ganglia is already in place and
its overhead is accounted for, so querying the Ganglia daemons is
allowed, but the client would prefer that the data be pulled from gmetad
rather than gmond.
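
For reference, pulling from gmetad might look like the sketch below.
It assumes gmetad is serving its XML dump on its default xml_port
(8651) and that the dump uses the usual GRID / CLUSTER / HOST / METRIC
elements. The staleness rule (TN greater than four times TMAX) is an
assumption, and the function names are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmetad_xml(host, port=8651, timeout=10.0):
    """Read the XML dump that gmetad writes on connect, until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_clusters(xml_data):
    """Return {cluster_name: {host_name: info}} from a gmetad XML dump."""
    root = ET.fromstring(xml_data)
    clusters = {}
    for cluster in root.iter("CLUSTER"):
        hosts = clusters.setdefault(cluster.get("NAME"), {})
        for host in cluster.findall("HOST"):
            hosts[host.get("NAME")] = {
                "ip": host.get("IP"),
                # TN: seconds since the host last reported.
                "tn": int(host.get("TN", "0")),
                # TMAX: longest expected gap between reports.
                "tmax": int(host.get("TMAX", "0")),
                "metrics": {m.get("NAME"): m.get("VAL")
                            for m in host.findall("METRIC")},
            }
    return clusters


def is_stale(host_info, factor=4):
    """Assumed rule: a host is down if silent for factor * TMAX seconds."""
    return host_info["tn"] > factor * host_info["tmax"]
```

One connection returns every host gmetad knows about, so a single poll
per interval covers a whole cluster without touching gmond on the
compute nodes.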

Also, on the Nagios management side, I would like to know whether there
is a way to add hosts automatically when a new host appears in the
Ganglia data. This is not a deal breaker, but it would make life much
easier in the long run. I have likely left out some information, but I
think there is enough here to get a discussion started.
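
One possible shape for the automatic host addition is to compare the
host names seen in Ganglia with those already defined in Nagios and
generate object definitions for the difference. This is a sketch only:
the "generic-host" template, the per-cluster hostgroup, and the
function names are assumptions, and the hostgroup would have to exist
already.

```python
HOST_TEMPLATE = """define host {{
    use         {template}
    host_name   {name}
    address     {address}
    hostgroups  {group}
}}
"""


def new_hosts(ganglia_hosts, known_hosts):
    """Return names present in Ganglia but not yet in Nagios, sorted."""
    return sorted(set(ganglia_hosts) - set(known_hosts))


def render_host(name, address, group, template="generic-host"):
    """Render one Nagios host definition (template name is hypothetical)."""
    return HOST_TEMPLATE.format(
        template=template, name=name, address=address, group=group)


def render_new_hosts(cluster, ganglia_hosts, known_hosts):
    """Render definitions for every host in {name: address} that is new."""
    return "".join(
        render_host(name, ganglia_hosts[name], cluster)
        for name in new_hosts(ganglia_hosts, known_hosts))
```

The output would be written to a file under a cfg_dir, checked with
"nagios -v", and followed by a reload, since Nagios only reads object
definitions at startup or reload.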

It's good to be back!
Regards,

Mark L. Potter
eXcellence in IS Solutions, Inc. (X-ISS) 
Office: 713-862-9200 x219
Email: mpotter at x-iss.com
http://www.x-iss.com





