Version 2.0b4 how does cgi's nagios_check_command work?

Andreas Ericsson ae at op5.se
Thu Oct 13 16:54:36 CEST 2005


John P. Rouillard wrote:
> In message <434D5AF2.7010504 at op5.se>,
> Andreas Ericsson writes:
> 
> 
>>John P. Rouillard wrote:
>>
>>>In message <43467920.4070508 at op5.se>,
>>>Andreas Ericsson writes:
>>>
>>>>John P. Rouillard wrote:
>>>>
>>>>
>>>>>In message <43465AB9.6020304 at op5.se>,
>>>>>Andreas Ericsson writes:
>>>>>
>>>>>
>>>>>>John P. Rouillard wrote:
>>>
>>>
>>>>>>>The reason I ask is
>>>>>>>that nagios was down and the cgi's all happily reported that it was
>>>>>>>up. Could this be because the host and service status files were
>>>>>>>available since the machine crashed?
>>>>>>
>>>>>>Yes, that's almost certainly it. There is no really good way of 
>>>>>>detecting that nagios is actually running unless you're logged in as 
>>>>>>root.
>>>>>
>>>>>Hmm, I am not sure I follow why you need to be logged in as root.
>>>>
>>>>Because otherwise you shouldn't have access to reading process 
>>>>information about another user's process.
>>>>
>>>>
>>>>>Why not stat the status.log file and check to see if its (mtime)
>>>>>timestamp is less than the setting of:
>>>>>
>>>>>	status_update_interval*2
>>>>>
>>>>>if aggregate_status_updates is enabled? One could also allow a setting
>>>>>"freshness_threshold" in cgi.cfg that is the number of seconds/minutes
>>>>>old the status.dat file is allowed to be if aggregate_status_updates
>>>>>isn't set.
>>>>
>>>>Good idea. Write the code for it and submit a patch.
>>>
>>>Actually not so much a good idea. There is actually a creation
>>>datestamp in the status.dat file I was going to use, but I decided to
>>>run an experiment first. I have my status_update_interval set to 3
>>>seconds.
>>>
>>>I used check_fileage to warn me if the file's age was over 3 seconds
>>>and ran it in a while loop. It failed often. The longest interval was
>>>139 seconds between updates with a number of periods of 20-30 seconds.
>>>
>>>My guesses are: nagios only writes the status file when it needs to.
>>
>>This is correct. The status_update_interval is never checked, although 
>>the status is updated every time a service changes either state or 
>>output (or a host, for that matter).
> 
> 
> Ideally nagios would provide a next_check_time in the status.dat, but
> I wonder if that could be usefully intuited from:
> 
>   min(
>       min(next_check time on services) + service_check_timeout,
>       min(next_check time on hosts) + host_check_timeout
>      )
> 
> Possible problems: on demand host checks (if part of a network is
> down) could screw up the timing since everything else stops.
> 
> Just because a service check is scheduled doesn't mean that it is
> going to run (time period may be wrong etc), but if it's determined to
> be non-runnable, the rescheduled time for it should cause a re-write of
> the status.dat file, correct?
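
For what it's worth, the estimate above is plain arithmetic once the next_check values are in hand. A rough sketch in Python, for brevity; the function name is made up, the inputs are assumed to be epoch timestamps already pulled out of status.dat, and it does nothing about the on-demand host check problem mentioned above:

```python
def expected_update_deadline(service_next_checks, host_next_checks,
                             service_check_timeout, host_check_timeout):
    """Estimate the epoch time by which status.dat should be rewritten.

    Hypothetical helper: takes the earliest scheduled service check and
    the earliest scheduled host check, adds the matching timeout to
    each, and returns whichever deadline comes first.
    """
    candidates = []
    if service_next_checks:
        candidates.append(min(service_next_checks) + service_check_timeout)
    if host_next_checks:
        candidates.append(min(host_next_checks) + host_check_timeout)
    if not candidates:
        raise ValueError("no scheduled checks to estimate from")
    return min(candidates)
```

A GUI could compare the current time against that deadline and get suspicious if the file hasn't changed by then, but treat it as a guess rather than proof.
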
> 
> There has to be an easier way of determining if nagios is running,
> doesn't there?
> 

Easy isn't the problem. The trick is to get it to work from a different 
and almost always less privileged user. Perhaps a simple neb-module can 
touch some file every 10 seconds and if it's 30 seconds old the GUI 
could then reasonably suspect that nagios has crashed.
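
The reading side of that could look roughly like this. It is only a sketch in Python (the real CGIs are C), the function name is invented, and it assumes some module is already touching the file; a missing file is treated the same as a stale one:

```python
import os
import time


def heartbeat_is_stale(path, max_age=30.0, now=None):
    """Return True if the heartbeat file is missing or too old.

    Hypothetical helper: `path` is whatever file the module touches,
    `max_age` is the age in seconds at which we start suspecting a
    crash, and `now` can be passed in for testing.
    """
    if now is None:
        now = time.time()
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # No heartbeat file at all: assume nagios is not running.
        return True
    return (now - mtime) >= max_age
```

Only read access to the file is needed, so it works from the web server's user without any access to process information.
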

However, I haven't noticed nagios crashing on a modern system. It used 
to, with glibc-2.0.35 and linuxthreads-0.7 (which was really buggy). 
Since upgrading to glibc-2.3.30 (or some such) and linuxthreads-0.10 
everything is running smoothly, so this isn't really a problem for me or 
any of our customers.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users