RFC: Nagios + AMQP

Hans Engelen engelenh at gmail.com
Tue Dec 16 13:27:30 CET 2008


> http://community.nagios.org/wiki/index.php/Nagios_AMQP

Thanks

> Ok, although I still have some difficulties towards implementing that.
> Maybe I wasn't looking in the right place, because I had a hard time
> finding C, Python or Perl APIs for JMS. C would be a requirement to
> integrate into Nagios, and I'd need Perl or Python APIs for writing a
> POC. With AMQP I have up-to-date APIs for both C and Python (from
> Apache Qpid) that can talk 0-8, 0-9 and 0-10.
>
> If anyone could point me in the right direction I'll be more than happy.
> I'm just not going to write a C implementation of JMS from scratch if I
> can use AMQP APIs directly :).

Truth be told, I was looking for the exact same thing. I have examples
based on the MQ Series libs. Essentially, if you download the MQ Series
v7 client for Unix/Linux you get a really complete set of well
documented examples that illustrate how to use it, but they still use
the MQ Series shared libs. It might give you some ideas though.

I also think there is one thing that has not yet been considered:
interoperability with third party products. For instance, where I work
we already have a fair number of applications that use messaging,
among them data middleware solutions such as Ab Initio. This
middleware converts gigabytes of data every day for us from one third
party software package to another third party software package.
Obviously this is something you would want to monitor. Rather than
work with 'complex' checks that need to sift through log files,
getting Ab Initio to feed passive check information back into Nagios
would be relatively easy with messaging. Ab Initio has support for a
number of messaging solutions but not yet for AMQP.

> AMQP is an emerging standard; the final v1 spec isn't even finished yet.
> However there's already a handful of AMQP brokers available and the
> standard is being pushed by a consortium of many big companies as a
> replacement for big proprietary messaging platforms.

This is probably a bit of a problem too: it's still an emerging
standard and it's still being heavily developed. Don't get me wrong, I
think it shows promise and is long overdue, but for the moment it's
still a little rough around the edges.

>> The way I'd imagined this working was that the client-side app would
>> still make a call out to a local binary (say a messaging 'send_nsca'
>> equivalent).  That binary would have a config file that told it how to
>> send that message to that environment's messaging tier.  It could
>> probably even be made argument-compatible with send_nsca if that was of
>> benefit.  This approach means that it's easy(er) to make working
>> binaries for Windows, Unix, Linux or any other tier that can build JMS
>> code.

I chose messaging myself to get rid of send_nsca externals where
possible. Having a send_nsca-like tool is good, but most of my
'homemade' checks have the messaging logic built in. This way I
have better control from within the check should there be a problem
delivering the check results. For instance, if the messaging server is
not running I could queue up the results in a spool file so that once
the messaging server problem is solved the backlog of results would
still be sent. MQ Series, by the way, handles this scenario better
because it supports message expiry: such a backlog would be cleaned
up on the MQ server, where messages that are considered too old would
be expired and purged automatically.
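
To make the spool idea concrete, here is a minimal Python sketch. It is
an illustration only, not my actual check code: the publish callable,
the spool file format and the max_age value are all assumptions, and
the expiry is done client-side when the spool is flushed, standing in
for the broker-side expiry that MQ Series offers.

```python
import json
import os
import time


def send_or_spool(result, publish, spool_path, now=None):
    """Try to publish one check result; append it to the spool on failure.

    `publish` is a hypothetical callable wrapping whatever messaging
    client is in use; it is assumed to raise OSError when the broker
    cannot be reached.
    """
    now = time.time() if now is None else now
    try:
        publish(result)
        return True
    except OSError:
        # Broker unreachable: keep the result for later delivery.
        record = {"queued_at": now, "result": result}
        with open(spool_path, "a", encoding="utf-8") as spool:
            spool.write(json.dumps(record) + "\n")
        return False


def flush_spool(publish, spool_path, max_age=3600, now=None):
    """Resend spooled results, dropping any older than max_age seconds.

    Returns a (sent, expired) tuple.
    """
    now = time.time() if now is None else now
    if not os.path.exists(spool_path):
        return 0, 0
    with open(spool_path, encoding="utf-8") as spool:
        records = [json.loads(line) for line in spool if line.strip()]
    sent, expired, remaining = 0, 0, []
    broker_up = True
    for record in records:
        if now - record["queued_at"] > max_age:
            expired += 1  # too old to be useful, discard
            continue
        if broker_up:
            try:
                publish(record["result"])
                sent += 1
                continue
            except OSError:
                broker_up = False  # stop trying, keep the rest spooled
        remaining.append(record)
    # Rewrite the spool with whatever could not be delivered yet.
    with open(spool_path, "w", encoding="utf-8") as spool:
        for record in remaining:
            spool.write(json.dumps(record) + "\n")
    return sent, expired
```

A check would call send_or_spool() for each result and flush_spool()
at the start of its next run. With broker-side expiry the max_age
logic would not be needed on the client at all.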

> I would like to keep all existing functionality - *MQ is not something
> small users will want to play with. I do think however that there would
> be a huge benefit in large-scale and highly distributed installations.

Indeed, another reason why I started on it. We have Groundwork Pro set
up in a cluster.

> * An NRPE-like (or DNX-like depending on how you see it) daemon for
> executing checks. It could eventually end up straight to the servers
> (especially if they have their own broker already) but would also be the
> main Nagios plugin executors.

I agree, although large parts of DNX could be gutted and replaced with
messaging. The parts I see staying are the Nagios module and the
worker node client daemon. On the other hand, I am left to wonder if
there is even any value to it. I was thinking more of sticking with
the external command file as long as possible. To facilitate active
checks, all one would need to do is replace the check_plugin with
something that writes a request on the appropriate queue and waits for
the response. A remote daemon subscribed to that queue (either on a
worker node or on the monitored host itself) forks the real
check_plugin, captures the results and sends them back as a reply, to
be picked up by the check_plugin replacement on the Nagios server.
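
The request/reply flow described above could look roughly like the
Python sketch below. To keep it self-contained, in-process queue.Queue
objects stand in for the broker's queues; a real version would use an
AMQP or MQ client instead. The names check_via_queue and worker_loop
and the message layout are made up for illustration.

```python
import queue
import subprocess
import threading
import uuid

request_queue = queue.Queue()  # stand-in for the broker's request queue
reply_queues = {}              # stand-in for per-request reply queues


def worker_loop():
    """Remote daemon: take a request, run the real plugin, send the reply."""
    while True:
        request = request_queue.get()
        if request is None:  # shutdown marker, only used by this sketch
            break
        try:
            proc = subprocess.run(request["argv"], capture_output=True,
                                  text=True)
            reply = {"return_code": proc.returncode,
                     "output": proc.stdout.strip()}
        except OSError as exc:
            # The plugin could not be started: report UNKNOWN (exit code 3).
            reply = {"return_code": 3, "output": "UNKNOWN - %s" % exc}
        reply_queue = reply_queues.get(request["reply_to"])
        if reply_queue is not None:  # the requester may have given up
            reply_queue.put(reply)


def check_via_queue(argv, timeout=10.0):
    """Plugin replacement: post a request, wait for the reply, relay it."""
    reply_to = uuid.uuid4().hex
    reply_queues[reply_to] = queue.Queue()
    request_queue.put({"argv": argv, "reply_to": reply_to})
    try:
        reply = reply_queues[reply_to].get(timeout=timeout)
    except queue.Empty:
        # No worker answered in time: report UNKNOWN (exit code 3).
        return 3, "UNKNOWN - no reply from worker"
    finally:
        del reply_queues[reply_to]
    return reply["return_code"], reply["output"]
```

The replacement plugin would print the returned output and exit with
the returned code, so Nagios itself would not notice the difference.
A worker is started with
threading.Thread(target=worker_loop, daemon=True).start() in this
sketch; in practice it would be a separate daemon on the worker node
or monitored host.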

> * A nsca daemon sending messages instead of writing to the pipe

You lost me here.

> * A send-nsca compatible program to send messages (check results and
> also Nagios commands)

Again a little fuzzy.

> * Possibly a daemon receiving messages and writing them down to the pipe
> (to have a non-MQ Nagios receive messages).

I see now: you would want to alter the base Nagios product and extend
it with messaging. Not sure that is needed. I see no reason why it
could not work as an add-on to Nagios that uses the external command
file as its point of entry to Nagios and a check_send_message
check_plugin to initiate remote checks (with remote being a worker
node or the actual monitored host). This would not guarantee maximum
compatibility with others that embed nagios such as GW.
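
As a rough sketch of the add-on side, the daemon that receives
messages would only need to turn each one into a
PROCESS_SERVICE_CHECK_RESULT line and write it to the external command
file. The helper names below are made up, and the default command file
path is an assumption (the usual location for a source install); check
the command_file setting in nagios.cfg for the real one.

```python
import time


def format_service_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT line for the command file."""
    timestamp = int(time.time() if now is None else now)
    # An external command must fit on a single line.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, output)


def submit(command_line,
           command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write one command to the Nagios external command file.

    The command file is a named pipe, so opening it blocks until
    Nagios is reading from the other end.
    """
    with open(command_file, "w") as pipe:
        pipe.write(command_line)
```

For example, format_service_result("web01", "Ab Initio load", 2,
"CRITICAL - job failed") produces a line Nagios would accept as a
passive result, provided passive checks are enabled for that service.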

>> I'm less interested in seeing performance improvements with this (we
>> don't really see performance issues with NSCA).  I also don't really
>> have performance concerns with messaging either as we've seen MQ
>> transfers can be very fast.

For some reason GW have nonetheless replaced NSCA with Bronx, which is
loaded as a module into Nagios. I am not sure why.

> The performance improvement is in regard to running checks. What if
> nagios could concentrate solely on scheduling checks and interpreting
> results, and you could have an army of plugin executors monitoring a
> huge distributed network.

It would help on my end for sure.

> A remote web interface could also use messaging to control a Nagios daemon.

Indeed, I was thinking of a small tool to quickly schedule downtime
for hosts should an admin have to do maintenance on a server during
the 'check period' defined for said server. Without this a systems
admin would either have to go over to the nagios server to do it there
(and not all our admins use Nagios since we have HP who do much of our
support for us during the night) or would trigger the Nagios server to
send out notifications more or less needlessly. A small application like
this however could alleviate this problem. Just run it, type a reason
in a text field and click the first of two buttons (labeled 'Start
Maintenance') and the tool would send a scheduled downtime request for
say 24 hours. Once the maintenance is done click the second button
(labeled 'End Maintenance') and a request to remove the scheduled
downtime entry is sent.
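
The two requests such a tool would send map onto the
SCHEDULE_HOST_DOWNTIME and DEL_HOST_DOWNTIME external commands. A
minimal Python sketch of building them follows; the function names are
made up, and how the lines travel to the Nagios server (as a message
body, for instance) is left open.

```python
import time


def start_maintenance(host, author, reason, hours=24, now=None):
    """Build a SCHEDULE_HOST_DOWNTIME command starting immediately."""
    start = int(time.time() if now is None else now)
    duration = int(hours * 3600)
    end = start + duration
    # Fields after the host name: start, end, fixed (1 = fixed window),
    # trigger_id (0 = not triggered by another downtime), duration,
    # author, comment.
    return "[%d] SCHEDULE_HOST_DOWNTIME;%s;%d;%d;1;0;%d;%s;%s\n" % (
        start, host, start, end, duration, author, reason)


def end_maintenance(downtime_id, now=None):
    """Build a DEL_HOST_DOWNTIME command for an existing downtime entry."""
    stamp = int(time.time() if now is None else now)
    return "[%d] DEL_HOST_DOWNTIME;%d\n" % (stamp, downtime_id)
```

One wrinkle: DEL_HOST_DOWNTIME takes the downtime ID rather than the
host name, so the 'End Maintenance' button would need some way to look
that ID up on the Nagios side first.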

Cheers,
Hans
