Best way to monitor application clusters

Paul Weaver paul.weaver at bbc.co.uk
Mon Sep 24 18:11:47 CEST 2007


I've recently started using Nagios in our development environment, and
have knocked up a few plugins for some of our programs (e.g. monitoring a
log on a remote server to make sure it's growing, but not too fast or too
slow, or jumbo pings between two remote machines). Nagios has been very
impressive.

One thing I would like to monitor is a group of hosts/services, and flag
a warning if x% are not available, and a critical if y% are offline. A
common example would be checking DNS services. If you have 4 DNS
servers, you don't want to be woken up at 3AM if one falls offline, but
if 3 are offline you would, and if 4 are offline you want an APB. You
still want to see on a webpage that the servers are offline, though, and
possibly get a notification in work hours.
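
To make the x%/y% idea concrete, here is a minimal sketch in Python of how the
share of unavailable group members could be mapped to a state. The function
name group_state and the 25%/75% thresholds are illustrative assumptions; only
the 0/1/2 exit codes follow the usual Nagios plugin convention.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def group_state(total, unavailable, warn_pct, crit_pct):
    """Map the percentage of unavailable members to a Nagios state."""
    if total <= 0:
        raise ValueError("group must contain at least one member")
    pct_down = 100.0 * unavailable / total
    if pct_down >= crit_pct:
        return CRITICAL
    if pct_down >= warn_pct:
        return WARNING
    return OK


# Four DNS servers. The thresholds are assumptions chosen so that one
# server down is only a warning and three or more down is critical.
for down in range(5):
    print(down, "down ->", group_state(4, down, warn_pct=25, crit_pct=75))
```

With these assumed thresholds, two servers down (50%) would still only be a
warning; whether that is the right cut-off depends on the service.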

I'm aware of host/service groups as one way of doing it; however, I'm
unsure whether notifications can be based on the percentage of
hosts/services available in a group.

Another way would be a "virtual host", with a custom "check_host_alive"
which checks all hosts in a collection, and returns an
OK/critical/warning based on the number of failures, and likewise with
"virtual services". The original hosts could then be monitored
separately, or even not at all.
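
The custom "check_host_alive" idea could look roughly like the sketch below.
It is not an existing Nagios plugin: check_collection and ping_once are
hypothetical names, the ping flags assume a Linux-style ping, and the probe is
passed in as a parameter so that any other per-host test could be substituted.

```python
import subprocess

# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def ping_once(host):
    """Hypothetical probe: one ICMP echo via the system ping.

    Assumes Linux-style flags (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_collection(hosts, warn_failures, crit_failures, probe=ping_once):
    """Probe every host and derive one state from the number of failures."""
    failed = [h for h in hosts if not probe(h)]
    if len(failed) >= crit_failures:
        state, label = CRITICAL, "CRITICAL"
    elif len(failed) >= warn_failures:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    message = "%s - %d of %d hosts down" % (label, len(failed), len(hosts))
    if failed:
        message += ": " + ", ".join(failed)
    return state, message
```

A plugin wrapper would print the message on one line and exit with the
returned state, which is how Nagios reads plugin results.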

For example, a service I would like to check is whether 3 MySQL
databases are in sync with each other. I currently have a web page that
compares the log positions. It seems to me that logically the service
should run on the MySQL boxes; however, I only want it running on 

Another example: I have a piece of Java software (call it "A")
that must run on at least one of 4 machines, and preferably on 2 of
them. I don't care which machine it's on, but if it's not running I want
to be notified in red lights.

I could have a simple "virtual service A", which would go critical on 0,
warn on 1 and be OK on 2 or more running instances. This would be
attached to "virtual host A", which would go critical on 0, warn on 1 and
be OK on 2 or more of the servers that the service runs on.
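
The "critical on 0, warn on 1, OK on 2 or more" rule is a threshold on the
number of members that are up, rather than on failures. A minimal sketch,
assuming a hypothetical helper state_for_count that both the virtual service
and the virtual host could share:

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL = 0, 1, 2


def state_for_count(available, preferred=2, minimum=1):
    """Critical below `minimum`, warning below `preferred`, otherwise OK."""
    if available < minimum:
        return CRITICAL
    if available < preferred:
        return WARNING
    return OK


# Virtual service A: number of machines where the software is running.
# Virtual host A: number of candidate servers that are reachable.
for count in range(5):
    print(count, "available ->", state_for_count(count))
```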

I'd also like a "simple" login to the web page which would only display
the "clusters" of services/hosts, rather than the total view, which
would allow our support engineers to easily see real problems, and
allow management to sleep happily with lots of green lights.

I must admit I'm leaning towards the virtual host/service approach, but I
was wondering if there's a standard/better way of monitoring this kind of
thing?

Thanks
