Host and Services update function called twice

Andreas Ericsson ae at op5.se
Thu May 14 20:09:15 CEST 2009


Matthieu Kermagoret wrote:
> Hello,
> 
> I'm Matthieu and I work with Julien, here at Merethis. I see that
> there's a bit of a misunderstanding so I'll try to clarify and explain
> what we believe to be a bug in Nagios. All code, dumps and
> explanations below are extracted from the latest CVS revision of
> Nagios.
> 
> On Thu, May 14, 2009 at 1:44 PM, Andreas Ericsson <ae at op5.se> wrote:
>> The figures you posted are really just crap to me as I have no idea what
>> the different figures are supposed to mean.
>>
> 
> Those are just plain text dumps of what ndomod sends to ndo2db. The
> format is really simple. Just notice that each "paragraph" is a
> separate event that will generate a DB query (i.e. if the same
> paragraph appears twice in a row, the same query is executed twice on
> the DB).
> 
>> A hook such as the one below would let you debug this
>> properly:
>>
>> [...]
>>        if (ds->type != NEBTYPE_SERVICE_CHECK_PROCESSED) {
>>                return 0;
>>        }
>> [...]
>>
> 
> That's what tipped me off. In fact we weren't talking about
> SERVICE_CHECK events but about SERVICE_STATUS events! So I guess your
> explanation about the DNX support code is off the table, right?
> 
> Now that we're clear, here are my first investigations.
> 
> It seems that for each service status update on Nagios, the
> update_service_status() function from common/statusdata.c is called
> twice. This function generates a NEBTYPE_SERVICESTATUS_UPDATE event
> each time it's called. Below is what I believe to be the offending
> code from base/checks.c:
> 

Ah, right. Now at least it makes sense :)

> <code>
> 
> int handle_async_service_check_result(service *temp_service, check_result *queued_check_result){
> [...]
> 		/* schedule a non-forced check if we can */
> 		if(temp_service->should_be_scheduled==TRUE)
> 			schedule_service_check(temp_service,temp_service->next_check,CHECK_OPTION_NONE);
> [...] /* No modification of temp_service in between. */
> 	update_service_status(temp_service,FALSE);
> 
> </code>
> 
> Here's what to notice:
>   - the call to schedule_service_check() with temp_service
>   - the call to update_service_status() below, with no modification of
>     temp_service in between
> 
> <code>
> 
> void schedule_service_check(service *svc, time_t check_time, int options){
> [...]
> 	/* update the status log */
> 	update_service_status(svc,FALSE);
> 
> </code>
> 
> Unfortunately, when the next service check is scheduled, the same
> temp_service object may be reused, with only its next check time
> updated. So the event can be broadcast a first time in
> schedule_service_check() and a second time in
> handle_async_service_check_result().
> 
> So what do you think about it? I'm new to the Nagios code so I might be mistaken.
> 

It seems you're right. I'll have to investigate this in more depth. I'll file
it in Mantis at tracker.nagios.org for now though.
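
For reference, the call pattern described above can be reduced to a minimal,
self-contained sketch. This is not Nagios code: the service struct is cut down
to two fields, status_events and count_status_events() are hypothetical names
added purely for illustration, and the stand-in update_service_status() only
counts calls where the real one would emit a NEBTYPE_SERVICESTATUS_UPDATE event.

```c
#include <assert.h>
#include <stddef.h>

#define TRUE  1
#define FALSE 0

/* Hypothetical, cut-down stand-in for the Nagios service struct. */
typedef struct {
    int  should_be_scheduled;
    long next_check;
} service;

/* Hypothetical counter: one increment per status update that would be
 * sent to the event broker (and, via ndomod, to ndo2db). */
static int status_events = 0;

/* Stand-in for update_service_status() in common/statusdata.c. */
static void update_service_status(service *svc, int aggregated_dump) {
    (void)svc;
    (void)aggregated_dump;
    status_events++;
}

/* Stand-in for schedule_service_check(): it ends with a status update. */
static void schedule_service_check(service *svc, long check_time, int options) {
    (void)options;
    svc->next_check = check_time;
    update_service_status(svc, FALSE);      /* first update */
}

/* Stand-in for handle_async_service_check_result(): it may schedule the
 * next check and then unconditionally updates the status again. */
static void handle_async_service_check_result(service *svc) {
    assert(svc != NULL);
    if (svc->should_be_scheduled == TRUE)
        schedule_service_check(svc, svc->next_check, 0);
    /* No modification of svc in between. */
    update_service_status(svc, FALSE);      /* second update */
}

/* Hypothetical helper: number of status updates produced while handling
 * a single check result. */
int count_status_events(int should_be_scheduled) {
    service svc = { should_be_scheduled, 0 };
    status_events = 0;
    handle_async_service_check_result(&svc);
    return status_events;
}
```

Under these assumptions, count_status_events(TRUE) reports two status updates
for a single check result and count_status_events(FALSE) reports one, which
would be consistent with the duplicated paragraphs in the ndomod dump.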

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Register now for Nordic Meet on Nagios, June 3-4 in Stockholm
 http://nordicmeetonnagios.op5.org/

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
