Host and Services update function called twice

Matthieu Kermagoret mkermagoret at merethis.com
Thu May 14 17:52:21 CEST 2009


Hello,

I'm Matthieu and I work with Julien, here at Merethis. I see that
there's a bit of a misunderstanding so I'll try to clarify and explain
what we believe to be a bug in Nagios. All code, dumps and
explanations below are extracted from the latest CVS revision of
Nagios.

On Thu, May 14, 2009 at 1:44 PM, Andreas Ericsson <ae at op5.se> wrote:
> The figures you posted are really just crap to me as I have no idea what
> the different figures are suppose to mean.
>

Those are just plain-text dumps of what ndomod sends to ndo2db. The
format is really simple. Just notice that each "paragraph" is a
separate event that will generate a DB query (i.e. if the same
paragraph appears twice in a row, the same query is executed twice on
the DB).
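To make the "same paragraph twice in a row" check mechanical, here is a minimal sketch of a helper that counts consecutive identical paragraphs in such a dump. It is not part of ndomod or ndo2db: the function name count_repeated_paragraphs() is hypothetical, and it assumes that events are separated by at least one blank line, which may not match the exact dump format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ndomod or ndo2db.
 * Assumption: each event ("paragraph") is separated from the next
 * by at least one blank line. Returns how many paragraphs are an
 * exact repeat of the paragraph immediately before them. */
size_t count_repeated_paragraphs(const char *dump)
{
    const char *prev = NULL;  /* start of the previous paragraph */
    size_t prev_len = 0;      /* its length, without trailing newlines */
    size_t repeats = 0;
    const char *cur = dump;

    for (;;) {
        /* skip the blank line(s) between paragraphs */
        while (*cur == '\n')
            cur++;
        if (*cur == '\0')
            break;

        /* the paragraph ends at the next blank line or at end of input */
        const char *end = strstr(cur, "\n\n");
        size_t len = end ? (size_t)(end - cur) : strlen(cur);

        /* ignore a trailing newline on the last paragraph */
        while (len > 0 && cur[len - 1] == '\n')
            len--;

        if (prev != NULL && len == prev_len && memcmp(prev, cur, len) == 0)
            repeats++;

        prev = cur;
        prev_len = len;
        cur += len;
    }
    return repeats;
}
```

Each repeat counted this way would correspond to one redundant query on the DB, under the assumption stated above.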

> A hook such as the one below would let you debug this
> properly:
>
> [...]
>        if (ds->type != NEBTYPE_SERVICE_CHECK_PROCESSED) {
>                return 0;
>        }
> [...]
>

That's what tipped me off. In fact, we weren't talking about
SERVICE_CHECK events but about SERVICE_STATUS events! So I guess your
explanation about the DNX support code is off the table, right?

Now that we're clear, here are my first findings.

It seems that for each service status update in Nagios, the
update_service_status() function from common/statusdata.c is called
twice. This function generates a NEBTYPE_SERVICESTATUS_UPDATE event
each time it's called. Below is what I believe to be the offending
code from base/checks.c:

<code>

int handle_async_service_check_result(service *temp_service, check_result *queued_check_result){
[...]
		/* schedule a non-forced check if we can */
		if(temp_service->should_be_scheduled==TRUE)
			schedule_service_check(temp_service,temp_service->next_check,CHECK_OPTION_NONE);
[...] /* No modification of temp_service in between. */
	update_service_status(temp_service,FALSE);

</code>

Two things to notice here:
  - the call to schedule_service_check() with temp_service
  - the later call to update_service_status(), with no modification of
    temp_service in between

<code>

void schedule_service_check(service *svc, time_t check_time, int options){
[...]
	/* update the status log */
	update_service_status(svc,FALSE);

</code>

Unfortunately, when the next service check is scheduled, the same
temp_service object may be reused, with only its next check time
updated. The event could therefore be broadcast a first time in
schedule_service_check() and a second time in
handle_async_service_check_result().
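The suspected call path can be reproduced in isolation with a small simulation. This is only a sketch: the sim_ functions and the sim_status_events_sent counter are stand-ins I made up for illustration, with simplified signatures, and the reduced service struct keeps only the two fields used in the excerpts. It is not the real Nagios code.

```c
#include <time.h>

/* Simplified stand-in for the Nagios service struct (assumption:
 * only the two fields used in the excerpts are modelled). */
typedef struct {
    int should_be_scheduled;
    time_t next_check;
} service;

/* Counts simulated NEBTYPE_SERVICESTATUS_UPDATE events. */
int sim_status_events_sent = 0;

/* Stand-in for update_service_status(): one event per call. */
void sim_update_service_status(service *svc)
{
    (void)svc;
    sim_status_events_sent++;
}

/* Stand-in for schedule_service_check(): updates the next check
 * time, then reports the status, as in the second excerpt. */
void sim_schedule_service_check(service *svc, time_t check_time)
{
    svc->next_check = check_time;
    sim_update_service_status(svc);   /* first event */
}

/* Stand-in for handle_async_service_check_result(): schedules the
 * next check if allowed, then reports the status again. */
void sim_handle_async_service_check_result(service *svc)
{
    if (svc->should_be_scheduled)
        sim_schedule_service_check(svc, svc->next_check);
    sim_update_service_status(svc);   /* second event */
}
```

Calling sim_handle_async_service_check_result() once on a service with should_be_scheduled set raises the counter by two, while a service that is not rescheduled raises it by one. If the real code behaves like this model, that would match the duplicated paragraphs seen in the dump.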

So what do you think? I'm new to the Nagios code, so I might be mistaken.

Best regards,

-- 
Matthieu KERMAGORET | Developer

mkermagoret at merethis.com

MERETHIS is the publisher of the Centreon software.
