NODutils: Duplicate lines in servicechecks table for each service check

Michael Friedrich michael.friedrich at univie.ac.at
Tue Nov 3 12:36:15 CET 2009


Hi,

Øyvind Nordang wrote:
>
> When I check the servicechecks table each check of a service is listed
> twice, but not with the same information in both rows.
>   
I've noticed that myself and that seems to be a bug. Maybe it concerns 
the unique constraint where the UPDATE should happen. But it could also 
be that some data is sent incorrectly and therefore no matching can be done.
> In dbhandlers.c on line 1518 all I can think of is that ON DUPLICATE
> KEY UPDATE does not work as it should. I haven't had time to look at
> the code and programming isn't my strong side.
>
> 	if(asprintf(&buf,"INSERT INTO %s SET %s, command_object_id='%lu',
> command_args='%s', command_line='%s' ON DUPLICATE KEY UPDATE %s"
> 		    ,ndo2db_db_tablenames[NDO2DB_DBTABLE_SERVICECHECKS]
> 		    ,buf1
> 		    ,command_id
> 		    ,es[0]
> 		    ,es[1]
> 		    ,buf1
> 		   )==-1)
> 		buf=NULL;
>   
Also the creation of "buf1" is interesting. The approach of building the 
query buffer for both INSERT and UPDATE might lead into a real error 
too. The only difference is that the INSERT statement gets 
command_object_id, command_args and command_line

The defined keys in db/mysql.sql

  KEY `instance_id` (`instance_id`),
  KEY `service_object_id` (`service_object_id`),
  KEY `start_time` (`start_time`)

Kind of strange, no unique definition ... but let's step further into 
analysis.

Problem regarding that routine is, that the constraint never matches.
I'm trying to resolve this error from Icinga IDOUtils where the error 
exists too. The main differences are that the queries are normalized 
(INSERT INTO .. () VALUES() instead of INSERT INTO .. SET ..=..) but the 
constraints are the same as in NDOUtils. Also the debugoutput is quite 
enhanced to see what's going on.

I've run several tests with Postgres and Oracle, and regarding 
especially right now Oracle, the defined constraint also matches and 
entries do not get duplicated. Postgres also works fine.

The reason why this is happening is quite simple - the queries point 
exactly to the unique constraint.

Postgres:
UPDATE %s SET ... WHERE instance_id=%lu AND service_object_id=%lu AND 
start_time=%s AND start_time_usec=%lu

Oracle:
MERGE INTO %s USING DUAL ON (instance_id=%lu AND service_object_id=%lu 
AND start_time=%s AND start_time_usec=%lu) WHEN MATCHED THEN UPDATE SET 
... WHEN NOT MATCHED THEN INSERT ... VALUES ...

Conclusion to that is that it is a MySQL problem regarding the defined 
keys and the ON DUPLICATE KEY statement.

To step further into that, I've only selected the columns which are 
necessary to see the difference::

mysql> select servicecheck_id, service_object_id, start_time, 
start_time_usec, end_time, end_time_usec, command_object_id, 
execution_time from icinga_servicechecks;
+-----------------+-------------------+---------------------+-----------------+---------------------+---------------+-------------------+----------------+
| servicecheck_id | service_object_id | start_time          | 
start_time_usec | end_time            | end_time_usec | 
command_object_id | execution_time |
+-----------------+-------------------+---------------------+-----------------+---------------------+---------------+-------------------+----------------+

This block should be have been updated e.g.

|             521 |              6421 | 2009-11-03 10:34:58 |          
245848 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             522 |              6422 | 2009-11-03 10:34:58 |          
339129 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             523 |              6423 | 2009-11-03 10:35:02 |          
188397 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             524 |              6424 | 2009-11-03 10:35:02 |          
329014 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             525 |              6425 | 2009-11-03 10:35:03 |           
91160 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             526 |              6307 | 2009-11-03 10:35:03 |          
305725 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             527 |              6426 | 2009-11-03 10:35:04 |           
57256 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             528 |              6427 | 2009-11-03 10:35:04 |          
150767 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             529 |              6429 | 2009-11-03 10:35:05 |          
172237 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             530 |              6428 | 2009-11-03 10:35:05 |          
241257 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |

But it doesn't, you can see the INSERT statements right after a while in 
the history.

|             531 |              6421 | 2009-11-03 10:34:58 |          
245848 | 2009-11-03 10:35:02 |        798210 |                 0 
|        4.55236 |
|             532 |              6422 | 2009-11-03 10:34:58 |          
339129 | 2009-11-03 10:35:02 |        801988 |                 0 
|        4.46286 |
|             533 |              6430 | 2009-11-03 10:35:16 |          
721993 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             534 |              6423 | 2009-11-03 10:35:02 |          
188397 | 2009-11-03 10:35:06 |        395320 |                 0 
|        4.20692 |
|             535 |              6424 | 2009-11-03 10:35:02 |          
329014 | 2009-11-03 10:35:06 |        786733 |                 0 
|        4.45772 |
|             536 |              6307 | 2009-11-03 10:35:03 |          
305725 | 2009-11-03 10:35:08 |         45368 |                 0 
|        4.73964 |
|             537 |              6425 | 2009-11-03 10:35:03 |           
91160 | 2009-11-03 10:35:08 |        389047 |                 0 |        
5.29789 |
|             538 |              6426 | 2009-11-03 10:35:04 |           
57256 | 2009-11-03 10:35:08 |        674087 |                 0 |        
4.61683 |
|             539 |              6427 | 2009-11-03 10:35:04 |          
150767 | 2009-11-03 10:35:09 |        322330 |                 0 
|        5.17156 |
|             540 |              6429 | 2009-11-03 10:35:05 |          
172237 | 2009-11-03 10:35:09 |        524681 |                 0 
|        4.35244 |
|             541 |              6428 | 2009-11-03 10:35:05 |          
241257 | 2009-11-03 10:35:09 |        900865 |                 0 
|        4.65961 |
|             542 |              6431 | 2009-11-03 10:35:17 |           
26969 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             543 |              6432 | 2009-11-03 10:35:17 |          
133886 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |
|             544 |              6430 | 2009-11-03 10:35:16 |          
721993 | 2009-11-03 10:35:20 |        996513 |                 0 
|        4.27452 |
|             545 |              6431 | 2009-11-03 10:35:17 |           
26969 | 2009-11-03 10:35:21 |        391540 |                 0 |        
4.36457 |
|             546 |              6432 | 2009-11-03 10:35:17 |          
133886 | 2009-11-03 10:35:21 |        396431 |                 0 
|        4.26255 |


Let's focus on service_object_id=6421 (i've cleared the IP address)

The first query generates the entry where end_time is 0 ... 
FROM_UNIXTIME(0), 0 ... start_time is 1257240898 (= 2009-11-03 10:34:58 
GMT+1)

|             521 |              6421 | 2009-11-03 10:34:58 |          
245848 | 1970-01-01 01:00:00 |             0 |              6193 
|              0 |

[1257240916.978113] [002.0] [pid=2673] INSERT INTO icinga_servicechecks 
(instance_id, service_object_id, check_type, current_check_attempt, 
max_check_attempts, state, state_type, start_time, start_time_usec, 
end_time, end_time_usec, timeout, early_timeout, execution_time, 
latency, return_code, output, long_output, perfdata, command_object_id, 
command_args, command_line) VALUES (1, 6421, 0, 1, 3, 0, 1, 
FROM_UNIXTIME(1257240898), 245848, FROM_UNIXTIME(0), 0, 60, 0, 0.000000, 
0.245000, 0, 'PING OK - Packet loss = 0%, RTA = 17\.43 ms', '', '', 
6193, '100\.0,20%!500\.0,60%', '/opt/icinga/libexec/check_ping -H 
1\.2\.3\.4 -w 100\.0,20% -c 500\.0,60% -p 5 -4') ON DUPLICATE KEY UPDATE 
check_type='0', current_check_attempt='1', max_check_attempts='3', 
state='0', state_type='1', start_time=FROM_UNIXTIME(1257240898), 
start_time_usec='245848', end_time=FROM_UNIXTIME(0), end_time_usec='0', 
timeout='60', early_timeout='0', execution_time='0.000000', 
latency='0.245000', return_code='0', output='PING OK - Packet loss = 0%, 
RTA = 17\.43 ms', long_output='', perfdata=''

For that check, we are waiting for end_time and end_time_usec and truly 
need the UPDATE condition, the INSERT will create this faulty and 
duplicate entry (short version!).

|             531 |              6421 | 2009-11-03 10:34:58 |          
245848 | 2009-11-03 10:35:02 |        798210 |                 0 
|        4.55236 |

[1257240917.081916] [002.0] [pid=2673] INSERT INTO icinga_servicechecks 
(instance_id, service_object_id, check_type, current_check_attempt, 
max_check_attempts, state, state_type, start_time, start_time_usec, 
end_time, end_time_usec, timeout, early_timeout, execution_time, 
latency, return_code, output, long_output, perfdata, command_object_id, 
command_args, command_line) VALUES (1, 6421, 0, 1, 3, 0, 1, 
FROM_UNIXTIME(1257240898), 245848, FROM_UNIXTIME(1257240902), 798210, 
60, 0, 4.552360, 0.245000, 0, 'PING OK - Packet loss = 0%, RTA = 75\.77 
ms', '', '', 0, '', '') ON DUPLICATE KEY UPDATE check_type='0', 
current_check_attempt='1', max_check_attempts='3', state='0', 
state_type='1', start_time=FROM_UNIXTIME(1257240898), 
start_time_usec='245848', end_time=FROM_UNIXTIME(1257240902), 
end_time_usec='798210', timeout='60', early_timeout='0', 
execution_time='4.552360', latency='0.245000', return_code='0', 
output='PING OK - Packet loss = 0%, RTA = 75\.77 ms', long_output='', 
perfdata=''

As you may see on a full output on that row, the INSERT puts empty 
command_line, empty command_object_id etc into the DB which is wrong.

Now let's compare on the unique constraint, where ON DUPLICATE KEY shoud 
trigger on the UPDATE statement:

instance_id - same!
service_object_id - same!
start_time - same!

Well regarding the fact that the unique constraint is missing and 
instead normal keys are defined, let's turn the other way around - why 
are the hostchecks running fine without duplicate entries? Hmmm, because 
the unique key is defined in a correct manner.

servicechecks:
KEY `instance_id` (`instance_id`),
KEY `service_object_id` (`service_object_id`),
KEY `start_time` (`start_time`)

hostchecks:
UNIQUE KEY `instance_id` 
(`instance_id`,`host_object_id`,`start_time`,`start_time_usec`)

Whoever did that to the SQL-Statement during table creation......

Correct way of getting that to work will be defining a real unique key 
and then ON DUPLICATE KEY will trigger the UPDATE statement.

I will work on a patch and provide it when finished testing.

Kind regards,
Michael








> Hope someone can clarify this for me, is this a bug in nagios, ndoutil
> or have I done something wrong...
>
> Thanks in advance.
>
> --
> Øyvind Nordang
>
> ------------------------------------------------------------------------------
> Come build with us! The BlackBerry(R) Developer Conference in SF, CA
> is the only developer event you need to attend this year. Jumpstart your
> developing skills, take BlackBerry mobile applications to market and stay 
> ahead of the curve. Join us from November 9 - 12, 2009. Register now!
> http://p.sf.net/sfu/devconference
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
>   

------------------------------------------------------------------------------
Come build with us! The BlackBerry(R) Developer Conference in SF, CA
is the only developer event you need to attend this year. Jumpstart your
developing skills, take BlackBerry mobile applications to market and stay 
ahead of the curve. Join us from November 9 - 12, 2009. Register now!
http://p.sf.net/sfu/devconference




More information about the Developers mailing list