Multiple Nagios processes running.

Andreas Ericsson ae at op5.se
Wed Jul 27 21:13:01 CEST 2005


Chris Wilson wrote:
> Hi Andreas,
> 
> 
>>What you think and don't think is, sadly, irrelevant. It's a fact that 
>>Ethan doesn't actively track down bugs or prioritise bug-reports on the 
>>1.x-branch. If you're interested you could of course backport the fixes 
>>from the 2.x branch. I'm sure Ethan would welcome a patch.
> 
> 
> Irrelevant to who?


To the facts. Nagios is a GPL-licensed project. Ethan gets no money for it 
(although a great deal of appreciation), and he has only so much time. I 
for one am quite happy that he doesn't bother with old code but instead 
focuses on making 2.x as good as it can and should be.


> I will try and find time to maintain it myself if
> nobody else wants to, and it doesn't annoy anyone too much (but it would
> be a "fork" of Nagios 1.2).

Whatever floats your dinghy. Just make sure you get Ethan's permission 
to use the name Nagios though, or you might end up in a trademark dispute.


> It will probably be a few years before I
> trust 2.x enough to use it.

Your loss. It has a great deal to recommend it.

> I guess from the number of people reporting
> this issue on the mailing list that I'm not the only one.
> 

You're probably not.

> If Ethan will accept patches for 1.2, then great. I could even take some
> responsibility for maintaining the official 1.x branch if that would
> help.
> 

It probably won't. Ethan is adamant when it comes to not handing out CVS 
access, and developing without it is quite fruitless. I've offered to 
help myself. Considering I've had 20-something patches accepted, I don't 
think it's an issue of code quality.


> 
>>Nothing's wrong with it per se. To work around it I added the redhatish 
>>concept of lockfiles that are created by the init-script. Several nagios 
>>instances can still be spawned so long as you don't use the init-script, 
>>but on platforms that have the "service" script it's not often useful to 
>>do so anyways.
> 
> 
> I think my patch makes nagios.lock work the way it should, so a separate
> lockfile isn't necessary. But I would definitely welcome comments.
> 
> 
>>>Nagios tries to do the
>>>mutual exclusion, but fails for reasons that I don't understand yet.
>>>
>>
>>I take it you haven't read the code. The mutex part simply isn't there. 
>>It's fairly easy to follow if you take it from main() and just read on 
>>down to event_execution_loop() (or something).
> 
> 
> How do you think I wrote a patch without reading the code?

Guesswork? I've seen a fair amount of it from you on other topics.


> base/utils.c
> daemon_init() doesn't use mutexes

Mutexes are thread constructs. 1.x is a single-threaded app.

> at all in 1.2. It uses fcntl(F_SETLK),
> but that apparently doesn't work (at least there is no mutual exclusion
> on Linux). I made it tougher by checking whether the process listed in
> the PID file is still running, and aborting with an appropriate error if
> it is.
> 

I'll have to look at that patch. Can you send it again?
I don't have high hopes though, considering the fact that you proposed 
using the extremely non-portable setproctitle() to discern the master 
process from the slaves.

How exactly do you check that the process listed isn't running? AFAIK 
there aren't any portable syscalls available for getting the process 
name from a pid, and the /proc filesystem works differently on different 
platforms.
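
For reference, about the only portable existence check I know of is kill() 
with signal 0, which POSIX defines as "do the error checking but send 
nothing". A rough sketch follows; pid_exists() is a name made up for this 
mail and is not in the Nagios source. Note that it only says whether *some* 
process has that pid, not whether that process is nagios, so a stale pid 
that has been reused by an unrelated program would still look "running".

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of Nagios.
 * Returns 1 if a process with this pid exists, 0 if it does not,
 * and -1 for an invalid pid or an unexpected error.
 */
int pid_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;          /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;           /* signal 0: nothing is sent, only checks run */

    if (errno == EPERM)
        return 1;           /* exists, but we may not signal it */

    if (errno == ESRCH)
        return 0;           /* no such process */

    return -1;
}
```

The result can also go stale immediately: the process may exit, or the pid 
may be handed to something else, between this check and whatever is done 
next.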

> How is the version in 2.x "more complete"? What can be more complete
> than properly checking that the process specified by the lockfile is not
> still running?
> 

See fcntl(2). If another process already holds the lock you are trying to 
acquire, fcntl(F_SETLK) returns -1 with errno set to EACCES or EAGAIN. 
That's the right way to do it. Checking the pid and trying to find a 
matching process is the cumbersome, incorrect and non-portable way.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

