Multiple Nagios processes running.

Andreas Ericsson ae at op5.se
Thu Aug 11 15:24:46 CEST 2005


Chris Wilson wrote:
> Hi Andreas,
> 
> 
>>Rather than exiting if it finds a process running with the same pid, it 
>>should try and kill it (using SIGTERM, sleeping 5 seconds and then 
>>issuing SIGKILL). This is because we can't be sure WHAT process it 
>>found, just that it has the same pid as the one that used to be nagios, 
>>and on a restart attempt where the previous daemon failed to exit the 
>>logical thing to do is to re-read the configuration.
> 
> 
> You're right that we can't identify what the other process is, but
> killing it sounds much worse than just aborting! What if the user is
> running several daemons as the same UID (e.g. nobody, daemon) and
> another one gets the PID that Nagios was using before?
> 

True. For a proper fix, the lockfile would be locked against writing by
the old process. If there is no such process *AND* the file isn't
locked, it's fairly safe to assume that no nagios daemon is running. If
the lock is held, but the pid is wrong, some process is running but has
failed to update the pid in the file (a bug in its own right), and if a
process exists but no lock is held, it's safe to assume that the process
running is not a nagios daemon. However, that leaves us with the old
checking system pretty much in place, and your patch becoming something
of an extra clarification. In short: filelock held = nagios running; no
filelock = nagios possibly not running, or running with some weird
permissions, or some such.
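
A minimal sketch of the lock-plus-pid check discussed above, assuming the
daemon holds a POSIX write lock on its lockfile and stores its pid in it.
The function names and result labels are hypothetical, not taken from the
Nagios source. The "process exists but no lock" case is read here as a
process that merely reused the pid, in line with "no filelock = nagios
possibly not running".

```python
import errno
import fcntl
import os


def pid_exists(pid):
    """Return True if some process currently has this pid."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence only, sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def lock_is_held(path):
    """Return True if another process holds a write lock on path.

    Must be called from a process that does not hold the lock itself,
    since POSIX locks never conflict within a single process.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                return True
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)  # we only probed; release at once
        return False
    finally:
        os.close(fd)


def classify(lock_held, pid_alive):
    """Map the two observations onto the four cases (labels are made up)."""
    if lock_held:
        return "running" if pid_alive else "running-stale-pid"
    return "pid-reused" if pid_alive else "not-running"


def inspect_lockfile(path):
    """Read the pid from the lockfile and classify what we find."""
    with open(path) as handle:
        text = handle.read().strip()
    pid = int(text) if text.isdigit() else None
    return classify(lock_is_held(path), pid_exists(pid))
```

Only the "not-running" result would let a new daemon start without
anyone's intervention; what to do in the other three cases is the open
question in this thread.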


However, in this scenario the filelock should always be attempted as 
root (or at least as the most privileged user nagios starts as), because 
root can sometimes (always, but sometimes silently) override filelocks 
held by processes with lesser privileges.

> Surely it's safer to abort so that the user finds out something is
> wrong, checks for and removes the old Nagios process, and then deletes
> the lockfile?

This assumes user intervention, which I assumed was what you were trying 
to move away from.

> It's at least better than the current behaviour (on Linux
> at least) of silently carrying on :-)
> 

Indeed, but that behaviour is flawed on its own merit.

> But if you insist that killing the other process is the right thing to
> do, I will implement it.
> 

I don't. It is only the right thing if Nagios is running as a dedicated
pseudo-user, which it won't necessarily be. One could of course in such
cases submit a RELOAD command to the external pipe. I'm not sure how
many hoops one should jump through though, or even whether that's the
right one to jump through next.
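
A rough sketch of what submitting a command to the external pipe could
look like. The "[timestamp] COMMAND" line format is the usual external
command syntax; the command name RESTART_PROGRAM and the pipe path in
the usage comment are assumptions standing in for the "RELOAD" mentioned
above, so check them against your own installation. The helper names are
hypothetical.

```python
import os
import time


def format_external_command(command, now=None):
    """Build one external-command line: "[unix-timestamp] COMMAND"."""
    timestamp = int(time.time() if now is None else now)
    return "[%d] %s\n" % (timestamp, command)


def submit_external_command(pipe_path, command, now=None):
    """Write a single command line to the command pipe.

    O_NONBLOCK makes the open fail with ENXIO when pipe_path is a FIFO
    that no process is reading, instead of blocking forever.
    """
    line = format_external_command(command, now)
    fd = os.open(pipe_path, os.O_WRONLY | os.O_NONBLOCK)
    try:
        os.write(fd, line.encode("ascii"))
    finally:
        os.close(fd)
    return line


# Usage (path and command name are assumptions):
# submit_external_command("/usr/local/nagios/var/rw/nagios.cmd",
#                         "RESTART_PROGRAM")
```

A side effect of the non-blocking open is that a failure with ENXIO
hints that no daemon is reading the pipe at all, which is itself useful
information in the restart scenario.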

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users