[PATCH] common/macros.c:2185:grab_standard_servicegroup_macro() speed up & Service check execution problem report

Stephane LAPIE stephane.lapie at darkbsd.org
Tue Jan 4 12:40:47 CET 2011


On 01/04/2011 07:54 PM, Andreas Ericsson wrote:
> On 01/04/2011 11:25 AM, Stephane LAPIE wrote:
>> On 01/04/2011 06:38 PM, Andreas Ericsson wrote:
>>> http://www.op5.org/community/plugin-inventory/op5-projects/merlin
>>> http://git.op5.org/git/?p=nagios/merlin.git;a=blob;f=HOWTO;hb=master
>>> http://git.op5.org/git/?p=nagios/merlin.git;a=blob;f=README;hb=HEAD
>>>
>>> Make especially sure you read the first paragraph of the README.
>>
>> Oh, I see.
>>
>> I have been working with Nagios 3.2.0 since around Nov 2009 for our
>> monitoring setup, so I went and implemented my own thing using ssh keys,
>> a control shell script with specific commands, tuned configurations to
>> limit duplication of files, and enhancements to NSCA client/server to
>> make them workable, and such.
>>
> 
> Sounds like you made the detour more pleasant, whereas Merlin cuts a
> new path. If your way works to satisfaction, I guess that should keep
> on working for you.

To be fair, my way is still very "static" and sub-optimal (it requires a
lot of care to keep a configuration the specific way I made it).

Also, it is not so much a cluster, as a "master node holding the GUI and
doing no checks", and "slave nodes doing all the checking for their
assigned servers".

This means there is no real redundancy should the master blow up (its
status information would then be lost, unless I go through the trouble
of rsync'ing it or holding it on NFS or something for a standby master,
which introduces its own share of troubles).

(The lack of redundancy on the slaves matters less, since they are only
there to send the latest information to the master node.)
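
To illustrate the slave-to-master flow described above, here is a minimal
sketch in C. It assumes results reach the master as passive service check
results written to the external command file, using the standard
PROCESS_SERVICE_CHECK_RESULT external command. The function name and the
host and service names in the usage note are hypothetical, not taken from
the setup described in this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format one passive service check result as an
 * external command line:
 *   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;code;output
 * Returns the line length, or -1 if a field is invalid or buf is too small.
 */
int format_passive_result(char *buf, size_t len, time_t when,
                          const char *host, const char *service,
                          int return_code, const char *output)
{
    int n;

    assert(buf != NULL && host != NULL && service != NULL && output != NULL);

    /* ';' separates fields and '\n' ends the command, so reject them
     * where they would corrupt the line. */
    if (strpbrk(host, ";\n") != NULL || strpbrk(service, ";\n") != NULL)
        return -1;
    if (strchr(output, '\n') != NULL)
        return -1;

    n = snprintf(buf, len, "[%ld] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n",
                 (long)when, host, service, return_code, output);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For example, format_passive_result(buf, sizeof buf, time(NULL), "web01",
"HTTP", 0, "OK") would yield one line that a slave-side script could hand
to whatever transport forwards it to the master.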

So, while I have some level of "distribution" and "static load
balancing", this setup can't do automagic load balancing intuitively,
and it can't hope to provide complete redundancy, as it is. :)

This leads me to the conclusion that the only realistic solution is to
have a database handle all that nitty-gritty for you. That database then
needs to be very closely integrated with the monitoring system itself,
which is not the case in stock 3.2.0 from what I have seen.


Also, I ran into problems quite a few times with the external command
file being clobbered when restarting Nagios (this is bothersome on the
master server in my setup, as it means it won't, EVER, receive any
information :)). It would be nice to be able to send SIGUSR1 (or any
other signal) to the Nagios process to force it to reopen the file,
without interrupting process execution.
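
As a rough sketch of the idea in C (this is not how Nagios 3.2.0 behaves;
all function names here are hypothetical): the signal handler only sets a
flag, and the main loop checks the flag and reopens the command FIFO,
recreating it if it has gone missing or been replaced by a regular file.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t reopen_requested = 0;

static void on_sigusr1(int signo)
{
    (void)signo;
    reopen_requested = 1;   /* only async-signal-safe work in a handler */
}

/* Install the SIGUSR1 handler; returns 0 on success, -1 on error. */
int install_reopen_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* do not interrupt slow system calls */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Close the old descriptor, make sure path is a FIFO, and open it.
 * Returns the new descriptor, or -1 on error. */
int reopen_command_fifo(const char *path, int old_fd)
{
    struct stat st;

    assert(path != NULL);
    if (old_fd >= 0)
        close(old_fd);
    /* A clobbered command file is typically a regular file: remove it. */
    if (stat(path, &st) == 0 && !S_ISFIFO(st.st_mode)) {
        if (unlink(path) < 0)
            return -1;
    }
    if (mkfifo(path, 0660) < 0 && errno != EEXIST)
        return -1;
    return open(path, O_RDONLY | O_NONBLOCK);
}

/* Call once per main-loop iteration; returns the descriptor to use. */
int service_reopen_request(const char *path, int fd)
{
    if (!reopen_requested)
        return fd;
    reopen_requested = 0;
    return reopen_command_fifo(path, fd);
}
```

With something like this in place, "kill -USR1 <pid>" would trigger the
reopen on the next loop iteration while checks keep running.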

>> I never tried to touch the DB side of things with a vanilla Nagios base,
>> because it wouldn't be proper to handle that as a side-hack, and would
>> most likely kill any measure of performance.
>>
> 
> You needn't bother with the db parts of Merlin if you don't want to,
> but its distributed/loadbalanced nature has some perks that you just
> can't get without an eventbroker module (such as command forwarding
> and automagic loadbalancing).

Actually, I'd gladly welcome having a database (with its own solid
redundancy system) to keep the monitoring data, if the Nagios GUI CGI
scripts could use it, which is not the case as I understand it.

Also, I didn't want to touch the C code of Nagios any more than I could
avoid, because my setup is already alien enough as is, and this would
introduce more unforeseeable side effects :)

Therefore, so far I did my stuff by relying only on external scripts and
whatever configuration Nagios itself allows. I guess I am reaching the
limit of what one can do that way, so I'll be taking a look at the event
brokers and such.

> You're welcome. Let me know how it pans out. We've had some troubles
> on *BSD systems in the past, but they should all be ironed out by now.
> I have limited testing capabilities though, so feedback is most welcome.

I would be strongly inclined to say it is solid on OpenBSD, in the 3.2.0
incarnation at least. The core process can keep on running for months on
end without a hitch.

Best regards,
-- 
Stephane LAPIE, EPITA SRS, Promo 2005
"Even when they have digital readouts, I can't understand them."
--MegaTokyo

_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

