Problems with many hanging Nagios processes (Nagios spawning rogue nagios processes eventually crashing Nagios server)

Ethan Galstad nagios at nagios.org
Wed Jan 3 04:31:23 CET 2007


Hendrik Bäcker wrote:
> Hi all,
> 
> as mentioned in Ethan's thread about testing the current branch version, I
> am afraid the problems are not only in the buffers.
> 
> I have talked to a colleague of mine while looking at the sources,
> specifically event.c at line 1079:
> 
> ####
>                         if(run_event==TRUE){
> 
>                                 /* remove the first event from the timing loop */
>                                 temp_event=event_list_low;
>                                 event_list_low=event_list_low->next;
> 
>                                 /* handle the event */
> 
>                                 handle_timed_event(temp_event);
> // This is 1079 -----------^
>                                 /* reschedule the event if necessary */
>                                 if(temp_event->recurring==TRUE)
>                                         reschedule_event(temp_event,&event_list_low);
> 
>                                 /* else free memory associated with the event */
>                                 else
>                                         free(temp_event);
>                                 }
> ####
> 
> The function itself is defined at line 1154 onward.
> 
> If I am right, this is the worker part that does everything for Nagios:
> it starts checks, gets check results (reaper), runs freshness checks and
> everything else.
> 
> Does this part run serially (one event after another), or is it
> threaded somewhere earlier?
> If it is serialized, couldn't it be parallelized?
> 
> Does anyone know how long handle_timed_event takes to run?
> (Just asking first; after sending this mail I will test it by compiling
> with debug3.)
> 
> Just my 2 cents.
> 
> Best wishes
> Hendrik
> 
[snip]

Most things in Nagios are performed in a serial fashion.  They include 
event handlers, starting service checks, running the OCSP command, 
updating the status log, etc.  All these actions are kicked off by the 
handle_timed_event() function, which runs each thing serially.
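To make the serial dispatch concrete, here is a minimal sketch in C of a loop shaped like the quoted event.c excerpt. The struct, new_event(), schedule_event(), run_due_events() and the tracing handler are hypothetical simplifications, not Nagios code: the sketch assumes a simulated clock, keeps events in a list sorted by due time, and runs each due handler to completion before touching the next one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified event record (not the real struct from event.c). */
typedef struct timed_event {
    int id;                                  /* label, used only for tracing */
    long run_time;                           /* when the event is due (simulated seconds) */
    int recurring;                           /* nonzero: reschedule after running */
    long interval;                           /* seconds between runs if recurring */
    void (*handler)(struct timed_event *);   /* the work to do */
    struct timed_event *next;
} timed_event;

static int trace[32];      /* ids of events in the order they ran */
static int trace_len = 0;

/* Example handler: just records which event ran. */
static void record_event(timed_event *ev) {
    if (trace_len < 32)
        trace[trace_len++] = ev->id;
}

static timed_event *new_event(int id, long run_time, int recurring, long interval) {
    timed_event *ev = malloc(sizeof *ev);
    if (ev == NULL)
        return NULL;
    ev->id = id;
    ev->run_time = run_time;
    ev->recurring = recurring;
    ev->interval = interval;
    ev->handler = record_event;
    ev->next = NULL;
    return ev;
}

/* Insert into a list kept sorted by run_time; ties go after existing entries. */
static void schedule_event(timed_event *ev, timed_event **list) {
    while (*list != NULL && (*list)->run_time <= ev->run_time)
        list = &(*list)->next;
    ev->next = *list;
    *list = ev;
}

/* Run every event due at or before `now`, strictly one at a time. */
static int run_due_events(timed_event **list, long now) {
    int ran = 0;
    while (*list != NULL && (*list)->run_time <= now) {
        timed_event *ev = *list;     /* remove the first event from the list */
        *list = ev->next;
        ev->handler(ev);             /* nothing else runs until this returns */
        ran++;
        if (ev->recurring) {
            assert(ev->interval > 0);   /* otherwise the loop would never end */
            ev->run_time += ev->interval;
            schedule_event(ev, list);
        } else {
            free(ev);
        }
    }
    return ran;
}
```

Because run_due_events() only moves on after each handler returns, one slow handler postpones every event queued behind it; that is the price of the simple, predictable serial design.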

Although the process of starting service checks is handled serially, the 
actual execution is run in parallel, and the service check reaper (which 
collects service check results) runs as its own thread.  The processing 
of service check results is handled in a serial fashion, although this 
is not a time-intensive process like the actual execution of a check.
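The split between serial launching and parallel execution can be sketched with POSIX fork() and waitpid(). This is a simplified model and not Nagios' actual mechanism (there, results reach the reaper by a different route); run_fake_plugin() and launch_and_reap() are hypothetical, and each child merely sleeps and exits with a plugin-style return code. The parent starts every child without waiting, so the checks run concurrently, and then collects exit statuses in whatever order the children finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_CHECKS 16

/* Hypothetical stand-in for a plugin: pretend to work for 100 ms, then exit
   with a plugin-style return code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN). */
static void run_fake_plugin(int return_code) {
    struct timespec delay = {0, 100000000L};
    nanosleep(&delay, NULL);
    _exit(return_code);
}

/* Launch checks one after another without waiting for any of them, so they
   run concurrently, then reap children in whatever order they finish.
   results[i] gets the exit code of check i (UNKNOWN if it did not exit
   normally). Returns how many results were collected, which is fewer than n
   if a fork failed. */
static int launch_and_reap(const int *codes, int n, int *results) {
    pid_t pids[MAX_CHECKS];
    assert(n >= 0 && n <= MAX_CHECKS);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {          /* stop launching; still reap what started */
            n = i;
            break;
        }
        if (pid == 0)
            run_fake_plugin(codes[i]);   /* child: never returns */
        pids[i] = pid;                   /* parent: move straight on */
    }
    int collected = 0;
    while (collected < n) {
        int status;
        pid_t done = waitpid(-1, &status, 0);   /* first child to finish */
        if (done < 0)
            break;
        for (int i = 0; i < n; i++) {
            if (pids[i] == done) {
                results[i] = WIFEXITED(status) ? WEXITSTATUS(status) : 3;
                collected++;
                break;
            }
        }
    }
    return collected;
}
```

With return codes {0, 2, 1}, the three children sleep at the same time, so the batch takes roughly 100 ms rather than 300 ms, while the collection step stays a simple serial loop in the parent.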

The execution of host checks is the huge holdup in Nagios 2.x, and it is 
(for the most part) parallelized in Nagios 3.x, so that will help in 
the future.  Things like event handlers, notifications, etc. are hard to 
run in parallel while ensuring that certain things happen in a 
particular, repeatable order.
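A back-of-the-envelope model shows why a blocking host check is such a holdup when everything runs serially. The function below is a hypothetical illustration, not Nagios code: given events sorted by due time and how long each one keeps the loop busy, it computes when each actually starts and the worst delay any of them suffers.

```c
#include <assert.h>

/* Hypothetical model: event i becomes due at due[i] and keeps the loop busy
   for cost[i] seconds. Events are given in due-time order and run serially,
   so each one starts at its due time or when the previous one finishes,
   whichever is later. start[] receives each start time; the return value is
   the worst lateness (start time minus due time) seen by any event. */
static long serial_max_lateness(const long *due, const long *cost, int n, long *start) {
    long busy_until = n > 0 ? due[0] : 0;
    long worst = 0;
    for (int i = 0; i < n; i++) {
        assert(i == 0 || due[i] >= due[i - 1]);   /* input must be sorted */
        start[i] = due[i] > busy_until ? due[i] : busy_until;
        busy_until = start[i] + cost[i];
        if (start[i] - due[i] > worst)
            worst = start[i] - due[i];
    }
    return worst;
}
```

With due times {0, 0, 1, 2} and a 10-second host check at the front (costs {10, 1, 1, 1}), every later event starts 10 seconds late; if every event takes 1 second, the worst delay is 1 second.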

Hope that helps.  It can get more confusing the more you look into 
things. :-)


Ethan Galstad,
Nagios Developer
---
Email: nagios at nagios.org
Website: http://www.nagios.org
